internal/cpu: VEX prefixed instructions require OSXSAVE #41022

Open
zhangyoufu opened this issue Aug 25, 2020 · 21 comments

@zhangyoufu zhangyoufu commented Aug 25, 2020

What did you expect to see?

HasFMA should report false on operating systems that do not support XSAVE (HasOSXSAVE=false).

What did you see instead?

HasFMA=true on Windows Vista.

@gopherbot gopherbot added this to the Unreleased milestone Aug 25, 2020
Contributor

@martisch martisch commented Aug 25, 2020

Any useful code that uses FMA with ymm registers also needs to check for AVX (e.g. for mov instructions to ymm). HasAVX is only true if HasOSXSAVE is true.

Example from standard library:

var useFMA = cpu.X86.HasAVX && cpu.X86.HasFMA

I think FMA can also be used with xmm registers on Windows Vista, which does not require HasOSXSAVE (but I don't have a machine to check). Either way, a check for SSE/SSE2 would also be needed here.

Author

@zhangyoufu zhangyoufu commented Aug 26, 2020

Another example from standard library:

x86HasFMA = cpu.X86.HasFMA

addF("math", "FMA",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
if !s.config.UseFMA {
a := s.call(n, callNormal)
s.vars[n] = s.load(types.Types[TFLOAT64], a)
return s.variable(n, types.Types[TFLOAT64])
}
v := s.entryNewValue0A(ssa.OpHasCPUFeature, types.Types[TBOOL], x86HasFMA)

I managed to run the latest Go releases (1.14/1.15) on unsupported Windows versions (XP/Vista) and found that math.FMA crashed with STATUS_ILLEGAL_INSTRUCTION.

Contributor

@martisch martisch commented Aug 26, 2020

The above code uses FMA with an xmm register (which does not require OSXSAVE), not with a ymm (AVX) register, and is only used by amd64, which has SSE2 and xmm register support as a requirement. It should therefore be fine unless Vista doesn't support SSE2, which would cause other problems unrelated to FMA.

Note that Windows XP/Vista is not supported by Go 1.14 and 1.15:
https://golang.org/doc/install#requirements

To understand the actual problem instead of the proposed solution please give the following information:

  • The output of go env
  • what is the CPU name and vendor that Go is run on
  • the stack trace of the crash (so we can identify where the STATUS_ILLEGAL_INSTRUCTION is happening)
  • if the STATUS_ILLEGAL_INSTRUCTION is happening in code not from the Go standard library/runtime please post a link to the source code that is run
  • does this reproduce on Windows 7 or newer? (if possible to check)
Author

@zhangyoufu zhangyoufu commented Aug 26, 2020

Environment

  • Machine: MacBook16,2 with Intel Core i5-1038NG7
  • Hypervisor: VMWare Fusion 11.5.6 (16696540)
  • Guest OS: Windows Server 2008 (6.0.6003)
  • Go: 1.14.7 (PeMinimumTargetMajorVersion adjusted to 5 to be able to run on old unsupported Windows)

CPU-Z reported that FMA is available to the guest, but old Windows versions do not support it. See also https://support.sisoftware.co.uk/knowledgebase.php?article=70.

I understand that my use case is not supported. I found X86.HasAVX = isSet(ecx1, cpuid_AVX) && osSupportsAVX, and osSupportsAVX depends on X86.HasOSXSAVE. So I thought maybe HasFMA can change in the same way.
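The dependency chain described here can be modeled in a few lines. This is a minimal sketch, not the actual internal/cpu source: the `features` type, the `detect` function, and the `gateFMA` switch are hypothetical names introduced for illustration, and gating HasFMA on the same OS check as HasAVX is only an assumption about what the proposed change would look like. The bit positions are those of CPUID leaf 1 ECX (FMA = bit 12, OSXSAVE = bit 27, AVX = bit 28) and XCR0 (SSE state = bit 1, AVX state = bit 2).

```go
package main

import "fmt"

// Bit positions in CPUID leaf 1, register ECX.
const (
	cpuidFMA     = 1 << 12
	cpuidOSXSAVE = 1 << 27
	cpuidAVX     = 1 << 28
)

// features is a hypothetical stand-in for a few fields of internal/cpu.X86.
type features struct {
	HasOSXSAVE, HasAVX, HasFMA bool
}

// detect derives feature flags from raw CPUID.1:ECX and XCR0 values.
// xcr0 is only consulted when OSXSAVE is set, because XGETBV cannot be
// executed otherwise. gateFMA selects the behavior proposed in this issue.
func detect(ecx1 uint32, xcr0 uint64, gateFMA bool) features {
	var f features
	f.HasOSXSAVE = ecx1&cpuidOSXSAVE != 0

	// The OS must enable both SSE (bit 1) and AVX (bit 2) state in XCR0.
	osSupportsAVX := f.HasOSXSAVE && xcr0&0x6 == 0x6

	f.HasAVX = ecx1&cpuidAVX != 0 && osSupportsAVX
	f.HasFMA = ecx1&cpuidFMA != 0
	if gateFMA {
		f.HasFMA = f.HasFMA && osSupportsAVX
	}
	return f
}

func main() {
	// The CPU advertises AVX and FMA, but the OS has not enabled XSAVE.
	ecx1 := uint32(cpuidAVX | cpuidFMA)
	fmt.Printf("current:  %+v\n", detect(ecx1, 0, false))
	fmt.Printf("proposed: %+v\n", detect(ecx1, 0, true))
}
```

In this model the only difference between the two calls is the value of HasFMA when OSXSAVE is clear; HasAVX is false in both.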

Windows 7 and newer do not have any problem.

stack trace
$ /c/go1.14.7/bin/go test -run TestFMA math
Exception 0xc000001d 0x0 0x0 0x52ad65
PC=0x52ad65

math_test.TestFMA(0xc000122120)
        C:/go1.14.7/src/math/all_test.go:3060 +0xb5
testing.tRunner(0xc000122120, 0x58c140)
        C:/go1.14.7/src/testing/testing.go:1039 +0xe3
created by testing.(*T).Run
        C:/go1.14.7/src/testing/testing.go:1090 +0x379

goroutine 1 [chan receive]:
testing.(*T).Run(0xc000122120, 0x57fe31, 0x7, 0x58c140, 0x482a01)
        C:/go1.14.7/src/testing/testing.go:1091 +0x3a0
testing.runTests.func1(0xc000122000)
        C:/go1.14.7/src/testing/testing.go:1334 +0x7f
testing.tRunner(0xc000122000, 0xc000071e10)
        C:/go1.14.7/src/testing/testing.go:1039 +0xe3
testing.runTests(0xc000004540, 0x6a7a60, 0x47, 0x47, 0x0)
        C:/go1.14.7/src/testing/testing.go:1332 +0x2ae
testing.(*M).Run(0xc000110000, 0x0)
        C:/go1.14.7/src/testing/testing.go:1249 +0x1be
main.main()
        _testmain.go:394 +0x13c
rax     0x27
rbx     0xc000122120
rcx     0x69a7c0
rdi     0x19db1ded53e8000
rsi     0xbfc992aceb3a1c38
rbp     0xc00003ff70
rsp     0xc00003fe88
r8      0xc00003fce8
r9      0x657b50
r10     0x5bb4e0
r11     0x31602
r12     0x5ad6a4
r13     0x0
r14     0x0
r15     0x2030000
rip     0x52ad65
rflags  0x10202
cs      0x33
fs      0x53
gs      0x2b
FAIL    math    0.024s
FAIL
go env
set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\zhangyoufu\AppData\Local\go-build
set GOENV=C:\Users\zhangyoufu\AppData\Roaming\go\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=C:\Users\zhangyoufu\go
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=C:\go1.14.7
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=C:\go1.14.7\pkg\tool\windows_amd64
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\ZHANGY~1\AppData\Local\Temp\go-build319242354=/tmp/go-build -gno-record-gcc-switches
Contributor

@martisch martisch commented Aug 26, 2020

FMA can be used without AVX registers, which, as far as I understand, does not need OSXSAVE. So I think requiring OSXSAVE here is not the right fix, as this would prevent code from using FMA with xmm (SSE) registers when the OS does not support saving ymm (AVX) registers.

I think to understand what's happening we need to disassemble go1.14.7 math_test.TestFMA and see which instruction stream is the problem. Maybe VMWare pretends FMA is supported while it isn't, but it doesn't seem to be an issue with OSXSAVE if this reproducibly faults at the same PC.

Please run these two commands in /go1.14.7/src/math/ and post the output:
go test -c
go tool objdump -s "math_test.TestFMA" math.test

Author

@zhangyoufu zhangyoufu commented Aug 26, 2020

disasm
$ /c/go1.14.7/bin/go tool objdump -s "math_test.TestFMA" math.test.exe
TEXT math_test.TestFMA(SB) C:/go1.14.7/src/math/all_test.go
  all_test.go:3058      0x52acb0                65488b0c2528000000      MOVQ GS:0x28, CX
  all_test.go:3058      0x52acb9                488b8900000000          MOVQ 0(CX), CX
  all_test.go:3058      0x52acc0                488d442490              LEAQ -0x70(SP), AX
  all_test.go:3058      0x52acc5                483b4110                CMPQ 0x10(CX), AX
  all_test.go:3058      0x52acc9                0f86c8040000            JBE 0x52b197
  all_test.go:3058      0x52accf                4881ecf0000000          SUBQ $0xf0, SP
  all_test.go:3058      0x52acd6                4889ac24e8000000        MOVQ BP, 0xe8(SP)
  all_test.go:3058      0x52acde                488dac24e8000000        LEAQ 0xe8(SP), BP
  all_test.go:3059      0x52ace6                488b05cb661700          MOVQ math_test.fmaC+8(SB), AX
  all_test.go:3059      0x52aced                488b0dbc661700          MOVQ math_test.fmaC(SB), CX
  all_test.go:3059      0x52acf4                4885c0                  TESTQ AX, AX
  all_test.go:3059      0x52acf7                0f8ef2000000            JLE 0x52adef
  all_test.go:3059      0x52acfd                4889442468              MOVQ AX, 0x68(SP)
  all_test.go:3059      0x52ad02                31d2                    XORL DX, DX
  all_test.go:3059      0x52ad04                eb17                    JMP 0x52ad1d
  all_test.go:3059      0x52ad06                488b9c2490000000        MOVQ 0x90(SP), BX
  all_test.go:3059      0x52ad0e                4883c320                ADDQ $0x20, BX
  all_test.go:3059      0x52ad12                4889d9                  MOVQ BX, CX
  all_test.go:3059      0x52ad15                4889c2                  MOVQ AX, DX
  all_test.go:3059      0x52ad18                488b442468              MOVQ 0x68(SP), AX
  all_test.go:3059      0x52ad1d                48898c2490000000        MOVQ CX, 0x90(SP)
  all_test.go:3059      0x52ad25                4889542460              MOVQ DX, 0x60(SP)
  all_test.go:3059      0x52ad2a                f20f104118              MOVSD_XMM 0x18(CX), X0
  all_test.go:3059      0x52ad2f                f20f11442458            MOVSD_XMM X0, 0x58(SP)
  all_test.go:3059      0x52ad35                f20f104910              MOVSD_XMM 0x10(CX), X1
  all_test.go:3059      0x52ad3a                f20f114c2450            MOVSD_XMM X1, 0x50(SP)
  all_test.go:3059      0x52ad40                f20f105108              MOVSD_XMM 0x8(CX), X2
  all_test.go:3059      0x52ad45                f20f11542448            MOVSD_XMM X2, 0x48(SP)
  all_test.go:3059      0x52ad4b                f20f1019                MOVSD_XMM 0(CX), X3
  all_test.go:3059      0x52ad4f                f20f115c2440            MOVSD_XMM X3, 0x40(SP)
  all_test.go:3060      0x52ad55                803dbfd81a0000          CMPB $0x0, runtime.x86HasFMA(SB)
  all_test.go:3060      0x52ad5c                0f84ea030000            JE 0x52b14c
  all_test.go:3060      0x52ad62                0f10e1                  MOVUPS X1, X4
  all_test.go:3060      0x52ad65                c4e2e1b9ca660f2e        MOVL $0x2e0f66ca, CX
  bits.go:39            0x52ad6d                c9                      LEAVE
  all_test.go:2093      0x52ad6e                7506                    JNE 0x52ad76
  all_test.go:2093      0x52ad70                0f8ba0030000            JNP 0x52b116
  bits.go:39            0x52ad76                660f2ec0                UCOMISD X0, X0
  all_test.go:2092      0x52ad7a                7506                    JNE 0x52ad82
  all_test.go:2092      0x52ad7c                0f8b94030000            JNP 0x52b116
  all_test.go:2092      0x52ad82                bb01000000              MOVL $0x1, BX
  all_test.go:3061      0x52ad87                84db                    TESTL BL, BL
  all_test.go:3061      0x52ad89                0f8413020000            JE 0x52afa2
  all_test.go:3064      0x52ad8f                488b15f2551700          MOVQ math_test.PortableFMA(SB), DX
  all_test.go:3064      0x52ad96                f20f111c24              MOVSD_XMM X3, 0(SP)
  all_test.go:3064      0x52ad9b                f20f11542408            MOVSD_XMM X2, 0x8(SP)
  all_test.go:3064      0x52ada1                f20f11642410            MOVSD_XMM X4, 0x10(SP)
  all_test.go:3064      0x52ada7                488b02                  MOVQ 0(DX), AX
  all_test.go:3064      0x52adaa                ffd0                    CALL AX
  all_test.go:3064      0x52adac                f20f10442418            MOVSD_XMM 0x18(SP), X0
  bits.go:39            0x52adb2                660f2ec0                UCOMISD X0, X0
  all_test.go:2093      0x52adb6                7506                    JNE 0x52adbe
  all_test.go:2093      0x52adb8                0f8bdc010000            JNP 0x52af9a
  bits.go:39            0x52adbe                f20f104c2458            MOVSD_XMM 0x58(SP), X1
  bits.go:39            0x52adc4                660f2ec9                UCOMISD X1, X1
  all_test.go:2092      0x52adc8                7506                    JNE 0x52add0
  all_test.go:2092      0x52adca                0f8b97010000            JNP 0x52af67
  all_test.go:2092      0x52add0                b801000000              MOVL $0x1, AX
  all_test.go:3065      0x52add5                84c0                    TESTL AL, AL
  all_test.go:3065      0x52add7                7426                    JE 0x52adff
  all_test.go:3059      0x52add9                488b442460              MOVQ 0x60(SP), AX
  all_test.go:3059      0x52adde                48ffc0                  INCQ AX
  all_test.go:3059      0x52ade1                488b4c2468              MOVQ 0x68(SP), CX
  all_test.go:3059      0x52ade6                4839c8                  CMPQ CX, AX
  all_test.go:3059      0x52ade9                0f8c17ffffff            JL 0x52ad06
  all_test.go:3059      0x52adef                488bac24e8000000        MOVQ 0xe8(SP), BP
  all_test.go:3059      0x52adf7                4881c4f0000000          ADDQ $0xf0, SP
  all_test.go:3059      0x52adfe                c3                      RET
  all_test.go:3064      0x52adff                f20f11442438            MOVSD_XMM X0, 0x38(SP)
  all_test.go:3066      0x52ae05                f20f10442440            MOVSD_XMM 0x40(SP), X0
  all_test.go:3066      0x52ae0b                f20f110424              MOVSD_XMM X0, 0(SP)
  all_test.go:3066      0x52ae10                e85bf8edff              CALL runtime.convT64(SB)
  all_test.go:3066      0x52ae15                488b442408              MOVQ 0x8(SP), AX
  all_test.go:3066      0x52ae1a                4889842488000000        MOVQ AX, 0x88(SP)
  all_test.go:3066      0x52ae22                f20f10442448            MOVSD_XMM 0x48(SP), X0
  all_test.go:3066      0x52ae28                f20f110424              MOVSD_XMM X0, 0(SP)
  all_test.go:3066      0x52ae2d                e83ef8edff              CALL runtime.convT64(SB)
  all_test.go:3066      0x52ae32                488b442408              MOVQ 0x8(SP), AX
  all_test.go:3066      0x52ae37                4889842480000000        MOVQ AX, 0x80(SP)
  all_test.go:3066      0x52ae3f                f20f10442450            MOVSD_XMM 0x50(SP), X0
  all_test.go:3066      0x52ae45                f20f110424              MOVSD_XMM X0, 0(SP)
  all_test.go:3066      0x52ae4a                e821f8edff              CALL runtime.convT64(SB)
  all_test.go:3066      0x52ae4f                488b442408              MOVQ 0x8(SP), AX
  all_test.go:3066      0x52ae54                4889442478              MOVQ AX, 0x78(SP)
  all_test.go:3066      0x52ae59                f20f10442438            MOVSD_XMM 0x38(SP), X0
  all_test.go:3066      0x52ae5f                f20f110424              MOVSD_XMM X0, 0(SP)
  all_test.go:3066      0x52ae64                e807f8edff              CALL runtime.convT64(SB)
  all_test.go:3066      0x52ae69                488b442408              MOVQ 0x8(SP), AX
  all_test.go:3066      0x52ae6e                4889442470              MOVQ AX, 0x70(SP)
  all_test.go:3066      0x52ae73                f20f10442458            MOVSD_XMM 0x58(SP), X0
  all_test.go:3066      0x52ae79                f20f110424              MOVSD_XMM X0, 0(SP)
  all_test.go:3066      0x52ae7e                e8edf7edff              CALL runtime.convT64(SB)
  all_test.go:3066      0x52ae83                488b442408              MOVQ 0x8(SP), AX
  all_test.go:3066      0x52ae88                488dbc2498000000        LEAQ 0x98(SP), DI
  all_test.go:3066      0x52ae90                0f57c0                  XORPS X0, X0
  all_test.go:3066      0x52ae93                488d7fd0                LEAQ -0x30(DI), DI
  all_test.go:3066      0x52ae97                48896c24f0              MOVQ BP, -0x10(SP)
  all_test.go:3066      0x52ae9c                488d6c24f0              LEAQ -0x10(SP), BP
  all_test.go:3066      0x52aea1                e8cfa8f3ff              CALL 0x465775
  all_test.go:3066      0x52aea6                488b6d00                MOVQ 0(BP), BP
  all_test.go:3066      0x52aeaa                488d0d0f0f0200          LEAQ runtime.types+81344(SB), CX
  all_test.go:3066      0x52aeb1                48898c2498000000        MOVQ CX, 0x98(SP)
  all_test.go:3066      0x52aeb9                488b942488000000        MOVQ 0x88(SP), DX
  all_test.go:3066      0x52aec1                48899424a0000000        MOVQ DX, 0xa0(SP)
  all_test.go:3066      0x52aec9                48898c24a8000000        MOVQ CX, 0xa8(SP)
  all_test.go:3066      0x52aed1                488b942480000000        MOVQ 0x80(SP), DX
  all_test.go:3066      0x52aed9                48899424b0000000        MOVQ DX, 0xb0(SP)
  all_test.go:3066      0x52aee1                48898c24b8000000        MOVQ CX, 0xb8(SP)
  all_test.go:3066      0x52aee9                488b542478              MOVQ 0x78(SP), DX
  all_test.go:3066      0x52aeee                48899424c0000000        MOVQ DX, 0xc0(SP)
  all_test.go:3066      0x52aef6                48898c24c8000000        MOVQ CX, 0xc8(SP)
  all_test.go:3066      0x52aefe                488b542470              MOVQ 0x70(SP), DX
  all_test.go:3066      0x52af03                48899424d0000000        MOVQ DX, 0xd0(SP)
  all_test.go:3066      0x52af0b                48898c24d8000000        MOVQ CX, 0xd8(SP)
  all_test.go:3066      0x52af13                48898424e0000000        MOVQ AX, 0xe0(SP)
  all_test.go:3066      0x52af1b                488b8424f8000000        MOVQ 0xf8(SP), AX
  all_test.go:3066      0x52af23                8400                    TESTB AL, 0(AX)
  all_test.go:3066      0x52af25                48890424                MOVQ AX, 0(SP)
  all_test.go:3066      0x52af29                488d15ddd90500          LEAQ go.string.*+37909(SB), DX
  all_test.go:3066      0x52af30                4889542408              MOVQ DX, 0x8(SP)
  all_test.go:3066      0x52af35                48c744241024000000      MOVQ $0x24, 0x10(SP)
  all_test.go:3066      0x52af3e                488d9c2498000000        LEAQ 0x98(SP), BX
  all_test.go:3066      0x52af46                48895c2418              MOVQ BX, 0x18(SP)
  all_test.go:3066      0x52af4b                48c744242005000000      MOVQ $0x5, 0x20(SP)
  all_test.go:3066      0x52af54                48c744242805000000      MOVQ $0x5, 0x28(SP)
  all_test.go:3066      0x52af5d                e85e0afbff              CALL testing.(*common).Errorf(SB)
  all_test.go:3066      0x52af62                e972feffff              JMP 0x52add9
  all_test.go:2095      0x52af67                660f2ec1                UCOMISD X1, X0
  all_test.go:2095      0x52af6b                7526                    JNE 0x52af93
  all_test.go:2095      0x52af6d                7a24                    JP 0x52af93
  unsafe.go:23          0x52af6f                66480f7ec1              MOVQ X0, CX
  signbit.go:9          0x52af74                480fbae13f              BTQ $0x3f, CX
  signbit.go:9          0x52af79                0f92c1                  SETB CL
  unsafe.go:23          0x52af7c                66480f7ecb              MOVQ X1, BX
  signbit.go:9          0x52af81                480fbae33f              BTQ $0x3f, BX
  signbit.go:9          0x52af86                0f92c3                  SETB BL
  all_test.go:2096      0x52af89                38cb                    CMPL CL, BL
  all_test.go:2096      0x52af8b                0f94c0                  SETE AL
  all_test.go:3065      0x52af8e                e942feffff              JMP 0x52add5
  all_test.go:3065      0x52af93                31c0                    XORL AX, AX
  all_test.go:3065      0x52af95                e93bfeffff              JMP 0x52add5
  all_test.go:2095      0x52af9a                f20f104c2458            MOVSD_XMM 0x58(SP), X1
  all_test.go:2093      0x52afa0                ebc5                    JMP 0x52af67
  all_test.go:3060      0x52afa2                f20f114c2430            MOVSD_XMM X1, 0x30(SP)
  all_test.go:3062      0x52afa8                f20f111c24              MOVSD_XMM X3, 0(SP)
  all_test.go:3062      0x52afad                e8bef6edff              CALL runtime.convT64(SB)
  all_test.go:3062      0x52afb2                488b442408              MOVQ 0x8(SP), AX
  all_test.go:3062      0x52afb7                4889842488000000        MOVQ AX, 0x88(SP)
  all_test.go:3062      0x52afbf                f20f10442448            MOVSD_XMM 0x48(SP), X0
  all_test.go:3062      0x52afc5                f20f110424              MOVSD_XMM X0, 0(SP)
  all_test.go:3062      0x52afca                e8a1f6edff              CALL runtime.convT64(SB)
  all_test.go:3062      0x52afcf                488b442408              MOVQ 0x8(SP), AX
  all_test.go:3062      0x52afd4                4889842480000000        MOVQ AX, 0x80(SP)
  all_test.go:3062      0x52afdc                f20f10442450            MOVSD_XMM 0x50(SP), X0
  all_test.go:3062      0x52afe2                f20f110424              MOVSD_XMM X0, 0(SP)
  all_test.go:3062      0x52afe7                e884f6edff              CALL runtime.convT64(SB)
  all_test.go:3062      0x52afec                488b442408              MOVQ 0x8(SP), AX
  all_test.go:3062      0x52aff1                4889442478              MOVQ AX, 0x78(SP)
  all_test.go:3062      0x52aff6                f20f10442430            MOVSD_XMM 0x30(SP), X0
  all_test.go:3062      0x52affc                f20f110424              MOVSD_XMM X0, 0(SP)
  all_test.go:3062      0x52b001                e86af6edff              CALL runtime.convT64(SB)
  all_test.go:3062      0x52b006                488b442408              MOVQ 0x8(SP), AX
  all_test.go:3062      0x52b00b                4889442470              MOVQ AX, 0x70(SP)
  all_test.go:3062      0x52b010                f20f10442458            MOVSD_XMM 0x58(SP), X0
  all_test.go:3062      0x52b016                f20f110424              MOVSD_XMM X0, 0(SP)
  all_test.go:3062      0x52b01b                e850f6edff              CALL runtime.convT64(SB)
  all_test.go:3062      0x52b020                488b442408              MOVQ 0x8(SP), AX
  all_test.go:3062      0x52b025                488dbc2498000000        LEAQ 0x98(SP), DI
  all_test.go:3062      0x52b02d                0f57c0                  XORPS X0, X0
  all_test.go:3062      0x52b030                488d7fd0                LEAQ -0x30(DI), DI
  all_test.go:3062      0x52b034                48896c24f0              MOVQ BP, -0x10(SP)
  all_test.go:3062      0x52b039                488d6c24f0              LEAQ -0x10(SP), BP
  all_test.go:3062      0x52b03e                e832a7f3ff              CALL 0x465775
  all_test.go:3062      0x52b043                488b6d00                MOVQ 0(BP), BP
  all_test.go:3062      0x52b047                488d0d720d0200          LEAQ runtime.types+81344(SB), CX
  all_test.go:3062      0x52b04e                48898c2498000000        MOVQ CX, 0x98(SP)
  all_test.go:3062      0x52b056                488b942488000000        MOVQ 0x88(SP), DX
  all_test.go:3062      0x52b05e                48899424a0000000        MOVQ DX, 0xa0(SP)
  all_test.go:3062      0x52b066                48898c24a8000000        MOVQ CX, 0xa8(SP)
  all_test.go:3062      0x52b06e                488b942480000000        MOVQ 0x80(SP), DX
  all_test.go:3062      0x52b076                48899424b0000000        MOVQ DX, 0xb0(SP)
  all_test.go:3062      0x52b07e                48898c24b8000000        MOVQ CX, 0xb8(SP)
  all_test.go:3062      0x52b086                488b542478              MOVQ 0x78(SP), DX
  all_test.go:3062      0x52b08b                48899424c0000000        MOVQ DX, 0xc0(SP)
  all_test.go:3062      0x52b093                48898c24c8000000        MOVQ CX, 0xc8(SP)
  all_test.go:3062      0x52b09b                488b542470              MOVQ 0x70(SP), DX
  all_test.go:3062      0x52b0a0                48899424d0000000        MOVQ DX, 0xd0(SP)
  all_test.go:3062      0x52b0a8                48898c24d8000000        MOVQ CX, 0xd8(SP)
  all_test.go:3062      0x52b0b0                48898424e0000000        MOVQ AX, 0xe0(SP)
  all_test.go:3062      0x52b0b8                488b8424f8000000        MOVQ 0xf8(SP), AX
  all_test.go:3062      0x52b0c0                8400                    TESTB AL, 0(AX)
  all_test.go:3062      0x52b0c2                48890424                MOVQ AX, 0(SP)
  all_test.go:3062      0x52b0c6                488d1586b50500          LEAQ go.string.*+29019(SB), DX
  all_test.go:3062      0x52b0cd                4889542408              MOVQ DX, 0x8(SP)
  all_test.go:3062      0x52b0d2                48c74424101c000000      MOVQ $0x1c, 0x10(SP)
  all_test.go:3062      0x52b0db                488d9c2498000000        LEAQ 0x98(SP), BX
  all_test.go:3062      0x52b0e3                48895c2418              MOVQ BX, 0x18(SP)
  all_test.go:3062      0x52b0e8                48c744242005000000      MOVQ $0x5, 0x20(SP)
  all_test.go:3062      0x52b0f1                48c744242805000000      MOVQ $0x5, 0x28(SP)
  all_test.go:3062      0x52b0fa                e8c108fbff              CALL testing.(*common).Errorf(SB)
  all_test.go:3064      0x52b0ff                f20f10542448            MOVSD_XMM 0x48(SP), X2
  all_test.go:3064      0x52b105                f20f105c2440            MOVSD_XMM 0x40(SP), X3
  all_test.go:3064      0x52b10b                f20f10642450            MOVSD_XMM 0x50(SP), X4
  all_test.go:3062      0x52b111                e979fcffff              JMP 0x52ad8f
  all_test.go:2095      0x52b116                660f2ec8                UCOMISD X0, X1
  all_test.go:2095      0x52b11a                7529                    JNE 0x52b145
  all_test.go:2095      0x52b11c                7a27                    JP 0x52b145
  unsafe.go:23          0x52b11e                66480f7ece              MOVQ X1, SI
  signbit.go:9          0x52b123                480fbae63f              BTQ $0x3f, SI
  signbit.go:9          0x52b128                400f92c6                SETB SI
  unsafe.go:23          0x52b12c                66480f7ec7              MOVQ X0, DI
  signbit.go:9          0x52b131                480fbae73f              BTQ $0x3f, DI
  signbit.go:9          0x52b136                400f92c7                SETB DI
  all_test.go:2096      0x52b13a                4038f7                  CMPL SI, DI
  all_test.go:2096      0x52b13d                0f94c3                  SETE BL
  all_test.go:3061      0x52b140                e942fcffff              JMP 0x52ad87
  all_test.go:3061      0x52b145                31db                    XORL BX, BX
  all_test.go:3061      0x52b147                e93bfcffff              JMP 0x52ad87
  all_test.go:3060      0x52b14c                f20f111c24              MOVSD_XMM X3, 0(SP)
  all_test.go:3060      0x52b151                f20f11542408            MOVSD_XMM X2, 0x8(SP)
  all_test.go:3060      0x52b157                f20f114c2410            MOVSD_XMM X1, 0x10(SP)
  all_test.go:3060      0x52b15d                e85e89f6ff              CALL math.FMA(SB)
  all_test.go:3060      0x52b162                f20f104c2418            MOVSD_XMM 0x18(SP), X1
  all_test.go:3059      0x52b168                488b442468              MOVQ 0x68(SP), AX
  all_test.go:3059      0x52b16d                488b8c2490000000        MOVQ 0x90(SP), CX
  all_test.go:3059      0x52b175                488b542460              MOVQ 0x60(SP), DX
  bits.go:39            0x52b17a                f20f10442458            MOVSD_XMM 0x58(SP), X0
  all_test.go:3064      0x52b180                f20f10542448            MOVSD_XMM 0x48(SP), X2
  all_test.go:3064      0x52b186                f20f105c2440            MOVSD_XMM 0x40(SP), X3
  all_test.go:3064      0x52b18c                f20f10642450            MOVSD_XMM 0x50(SP), X4
  all_test.go:3060      0x52b192                e9d3fbffff              JMP 0x52ad6a
  all_test.go:3058      0x52b197                e8047af3ff              CALL runtime.morestack_noctxt(SB)
  all_test.go:3058      0x52b19c                e90ffbffff              JMP math_test.TestFMA(SB)
  :-1                   0x52b1a1                cc                      INT $0x3
  :-1                   0x52b1a2                cc                      INT $0x3
  :-1                   0x52b1a3                cc                      INT $0x3
  :-1                   0x52b1a4                cc                      INT $0x3
  :-1                   0x52b1a5                cc                      INT $0x3
  :-1                   0x52b1a6                cc                      INT $0x3
  :-1                   0x52b1a7                cc                      INT $0x3
  :-1                   0x52b1a8                cc                      INT $0x3
  :-1                   0x52b1a9                cc                      INT $0x3
  :-1                   0x52b1aa                cc                      INT $0x3
  :-1                   0x52b1ab                cc                      INT $0x3
  :-1                   0x52b1ac                cc                      INT $0x3
  :-1                   0x52b1ad                cc                      INT $0x3
  :-1                   0x52b1ae                cc                      INT $0x3
  :-1                   0x52b1af                cc                      INT $0x3

GODEBUG=cpu.fma=off does not help.

Author

@zhangyoufu zhangyoufu commented Aug 26, 2020

According to Intel's manual, 0f10e1 MOVUPS X1, X4 is categorized as a 128-bit Legacy SSE instruction and requires CR4.OSFXSR[bit 9] to be set.

Contributor

@martisch martisch commented Aug 26, 2020

Thanks for all the information. I think we are on to something, but it seems related to general SSE/SSE2 support and to checking that it works at all (not just FMA).

What is the state of CR4.OSFXSR on Vista? (Note that OSFXSR is not OSXSAVE)
What does CPU-Z say for SSE and SSE2 CPUID feature flags? (Those should be enabled)

The minimum requirement for Go on amd64 is SSE2. So if SSE/SSE2 is not supported, then yes, Go does not work (with or without FMA). MOVUPS is an SSE instruction.

The action I would then see here is to check CR4.OSFXSR as well as the SSE/SSE2 CPUID flags at Go runtime start and, if they are not set, stop right there with a warning message (similar to MMX on 386), regardless of whether FMA is used.
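The CPUID half of such a startup check might look roughly like the following. It is a sketch under stated assumptions, not runtime code: `checkBaseline` is a hypothetical name, the EDX value is passed in as a plain integer instead of being read with the CPUID instruction, and CR4.OSFXSR is not inspected at all. The bit positions (SSE = bit 25, SSE2 = bit 26 of CPUID leaf 1 EDX) are the architectural ones.

```go
package main

import (
	"fmt"
	"strings"
)

// Bit positions in CPUID leaf 1, register EDX.
const (
	cpuidSSE  = 1 << 25
	cpuidSSE2 = 1 << 26
)

// checkBaseline is a hypothetical startup check. It reports an error
// naming every baseline feature that the given CPUID.1:EDX value lacks.
func checkBaseline(edx1 uint32) error {
	var missing []string
	if edx1&cpuidSSE == 0 {
		missing = append(missing, "SSE")
	}
	if edx1&cpuidSSE2 == 0 {
		missing = append(missing, "SSE2")
	}
	if len(missing) > 0 {
		return fmt.Errorf("CPU does not report required features: %s",
			strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	// Hypothetical CPUID.1:EDX values; a real runtime would execute CPUID.
	for _, edx1 := range []uint32{cpuidSSE | cpuidSSE2, cpuidSSE, 0} {
		if err := checkBaseline(edx1); err != nil {
			fmt.Println("refusing to start:", err)
			continue
		}
		fmt.Println("baseline OK")
	}
}
```

A check like this only says what the CPU advertises. Whether the OS has enabled SSE state is a separate question that the CPUID flags alone do not answer.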

Author

@zhangyoufu zhangyoufu commented Aug 26, 2020

What is the state of CR4.OSFXSR on Vista?

I need to find a way to dump CR4. GDB cannot dump control registers using info registers.

What does CPU-Z say for SSE and SSE2 CPUID feature flags?

[image: CPU-Z screenshot]

Contributor

@martisch martisch commented Aug 26, 2020

It seems reading CR4 is privileged, so Go likely won't be able to read it. There seems to be a Windows API way to do this: https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-getenabledxstatefeatures

If there is no easy way to check from Go code whether Windows supports SSE, and Go works on all supported Windows versions, I don't think there will be a fix to prevent Go from starting. The issue is not related to FMA and could trigger in other Go code too. SSE/SSE2 itself is supported by all amd64-compatible CPUs, so on amd64 it can only be the OS that doesn't support them.

Author

@zhangyoufu zhangyoufu commented Aug 26, 2020

Windows Server 2008 x64

GDB>r cr4
cr4=0x6f8

Windows Server 2008 R2 x64

GDB>r cr4
cr4=0x406f8
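To make the two register values easier to compare, the relevant bits can be decoded mechanically. The snippet below is an illustrative sketch; `cr4Flags` and `decodeCR4` are hypothetical helper names, and only three of the CR4 bits are decoded (OSFXSR = bit 9, OSXMMEXCPT = bit 10, OSXSAVE = bit 18).

```go
package main

import "fmt"

// Selected CR4 control-register bits.
const (
	cr4OSFXSR     = 1 << 9  // OS supports FXSAVE/FXRSTOR (needed for SSE)
	cr4OSXMMEXCPT = 1 << 10 // OS handles unmasked SIMD floating-point exceptions
	cr4OSXSAVE    = 1 << 18 // OS supports XSAVE/XRSTOR and XGETBV
)

// cr4Flags is a hypothetical helper type holding the decoded bits.
type cr4Flags struct {
	OSFXSR, OSXMMEXCPT, OSXSAVE bool
}

// decodeCR4 extracts the three bits above from a raw CR4 value.
func decodeCR4(cr4 uint64) cr4Flags {
	return cr4Flags{
		OSFXSR:     cr4&cr4OSFXSR != 0,
		OSXMMEXCPT: cr4&cr4OSXMMEXCPT != 0,
		OSXSAVE:    cr4&cr4OSXSAVE != 0,
	}
}

func main() {
	// Values taken from the two debugger sessions above.
	fmt.Printf("Server 2008:    %+v\n", decodeCR4(0x6f8))
	fmt.Printf("Server 2008 R2: %+v\n", decodeCR4(0x406f8))
}
```

For these two values, OSFXSR and OSXMMEXCPT are set in both, and the only difference among the decoded bits is OSXSAVE, which is set only in the R2 value.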
Contributor

@martisch martisch commented Aug 26, 2020

But what is CR4 on Vista?

At any rate, Go on amd64 is not supported when SSE and SSE2 are not supported. Adding more checks to FMA won't change that, and the FMA instruction wasn't at fault here.

Author

@zhangyoufu zhangyoufu commented Aug 26, 2020

AFAIK, Vista and Server 2008 (without R2 suffix) share the same(?) kernel. I can check cr4 on Vista if you insist.

I think the behavior of MOVUPS (0F 10) on my hardware does not match Intel's manual. Somehow it generated #UD when OSXSAVE=0.

Contributor

@martisch martisch commented Aug 26, 2020

I can check cr4 on Vista if you insist

I don't think there is anything that Go could do better here.

OSXSAVE is not related to MOVUPS. OSXSAVE is only required for AVX.

MOVUPS needs to be supported for Go to work, since Go's amd64 requirement is SSE/SSE2 support by both the CPU and the OS.

@zhangyoufu
Author

@zhangyoufu zhangyoufu commented Aug 26, 2020

I think I misread something. I tried a simple C program with movups and it does not fail on Server 2008.

#include <stdio.h>

int main() {
    __asm__ __volatile__ (
        "movups %%xmm0, %%xmm4\n\t"
    : /* no output */
    : /* no input */
    : "%xmm0", "xmm4"
    );
    puts("Done");
    return 0;
}

I need to reduce TestFMA's failure to a minimal reproduction.

@martisch
Contributor

@martisch martisch commented Aug 26, 2020

The instruction from the dump above that is faulting is:
c4 e2 e1 b9 ca vfmadd231sd xmm1,xmm3,xmm2

This uses SSE registers, so SSE (required) + FMA (HasFMA) should be enough. There are no AVX registers involved that would require OSXSAVE.

@zhangyoufu
Author

@zhangyoufu zhangyoufu commented Aug 26, 2020

go tool objdump generated an incorrect disassembly for the FMA instruction. I assumed that the PC given by the stack trace was biased and focused on MOVUPS. My bad.

I reproduced this issue with

package main

import (
	"log"
	"math"
)

func main() {
	log.Print(math.FMA(0, 0, 0))
}

Debugged with x64dbg, stopped at

00000000004A8684 | C4E2F9B9C0               | vfmadd231sd xmm0,xmm0,xmm0              |
00000000004A8689 | F2:0F110424              | movsd qword ptr ss:[rsp],xmm0           |

While go tool objdump shows

  a.go:9                0x4a8684                c4e2f9b9c0f20f11        MOVL $0x110ff2c0, CX
  a.go:9                0x4a868c                0424                    ADDL $0x24, AL

FMA fast path should not be used when OSXSAVE=0.

@martisch
Contributor

@martisch martisch commented Aug 26, 2020

But why should it not be used without OSXSAVE? The FMA instruction above uses xmm registers, which are supported (otherwise other SSE/SSE2 instructions wouldn't work either), and the CPU says it supports FMA.

Requiring OSXSAVE will just mask the fact that Vista doesn't support FMA even though the FMA CPUID bit is set to 1 and SSE is supported. I wasn't able to find any documentation saying that FMA with xmm registers also requires OSXSAVE.

I think it is perfectly fine for the OS to store/restore xmm registers with FXSAVE and FXRSTOR which needs to be done for other SSE instructions at any rate.

@zhangyoufu
Author

@zhangyoufu zhangyoufu commented Aug 26, 2020

This behavior is documented in Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2.

See Other Exceptions section under VFMADD231SD and 2.4.3 Exceptions Type 3.

If I didn't misread, this instruction uses a VEX prefix and therefore requires CR4.OSXSAVE=1.
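
To make the VEX point concrete, here is a rough Go sketch that decodes the three-byte VEX prefix of the two encodings quoted in this thread (`c4 e2 f9 b9 c0` and `c4 e2 e1 b9 ca`). The field layout follows my reading of the Intel SDM; the type `vex3` and the function `decodeVEX3` are hypothetical names, and the sketch only handles the 0xC4 form in 64-bit mode.

```go
package main

import "fmt"

// vex3 holds the fields of a three-byte VEX prefix (first byte 0xC4).
type vex3 struct {
	MapSelect uint8 // mmmmm: 1 = 0F, 2 = 0F38, 3 = 0F3A
	W         bool  // VEX.W
	VVVV      uint8 // extra source register (stored inverted in the encoding)
	L         bool  // false = 128-bit or scalar, true = 256-bit
	PP        uint8 // implied legacy prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2
}

// decodeVEX3 decodes code[0:3] as a three-byte VEX prefix.
// ok is false if the slice is too short or does not start with 0xC4.
func decodeVEX3(code []byte) (v vex3, ok bool) {
	if len(code) < 3 || code[0] != 0xC4 {
		return vex3{}, false
	}
	b1, b2 := code[1], code[2]
	return vex3{
		MapSelect: b1 & 0x1F,
		W:         b2&0x80 != 0,
		VVVV:      (^b2 >> 3) & 0x0F, // undo the inverted encoding
		L:         b2&0x04 != 0,
		PP:        b2 & 0x03,
	}, true
}

func main() {
	for _, code := range [][]byte{
		{0xC4, 0xE2, 0xF9, 0xB9, 0xC0}, // vfmadd231sd xmm0,xmm0,xmm0
		{0xC4, 0xE2, 0xE1, 0xB9, 0xCA}, // vfmadd231sd xmm1,xmm3,xmm2
	} {
		v, ok := decodeVEX3(code)
		fmt.Printf("% x -> ok=%t %+v\n", code, ok, v)
	}
}
```

Both encodings decode as VEX-prefixed with L=0, so they only touch xmm registers, yet the VEX prefix is still present.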

@martisch
Contributor

@martisch martisch commented Aug 26, 2020

You are right! (Thanks for indulging my questions)

The issue is the VEX prefix in 64-bit and protected mode.

So I guess we need to guard all VEX-prefixed instruction set extensions with an OSXSAVE check, so that their feature flags are not set to true without it. I think this will apply to other instruction set additions like BMI too.

The specific issue here is in internal/cpu and it needs to be fixed in x/sys/cpu too.
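
A minimal sketch of what such a guard could look like, written against raw register values so it runs anywhere. This is not the actual internal/cpu or x/sys/cpu code: the names `features` and `detect` are hypothetical, and the bit positions (CPUID.1:ECX bit 12 = FMA, bit 27 = OSXSAVE, bit 28 = AVX; XCR0 bits 1 and 2 = SSE and AVX state) are taken from the Intel SDM.

```go
package main

import "fmt"

// Bit masks for CPUID leaf 1 ECX and for XCR0 (assumed from the Intel SDM).
const (
	cpuidFMA     = 1 << 12
	cpuidOSXSAVE = 1 << 27
	cpuidAVX     = 1 << 28

	xcr0SSE = 1 << 1
	xcr0AVX = 1 << 2
)

// features is a hypothetical, cut-down feature set.
type features struct {
	HasOSXSAVE, HasAVX, HasFMA bool
}

// detect derives feature flags from a raw CPUID.1:ECX value and a raw XCR0
// value. xcr0 is ignored when OSXSAVE is clear, because XGETBV cannot be
// executed in that case.
func detect(ecx1 uint32, xcr0 uint64) features {
	var f features
	f.HasOSXSAVE = ecx1&cpuidOSXSAVE != 0
	osSupportsAVX := f.HasOSXSAVE && xcr0&xcr0SSE != 0 && xcr0&xcr0AVX != 0
	f.HasAVX = ecx1&cpuidAVX != 0 && osSupportsAVX
	// The guard discussed above: the FMA CPUID bit alone is not enough.
	f.HasFMA = ecx1&cpuidFMA != 0 && f.HasOSXSAVE
	return f
}

func main() {
	// CPU advertises FMA and AVX, but the OS has not enabled XSAVE.
	fmt.Printf("%+v\n", detect(cpuidFMA|cpuidAVX, 0))
	// Same CPU with OSXSAVE set and SSE and AVX state enabled in XCR0.
	fmt.Printf("%+v\n", detect(cpuidFMA|cpuidAVX|cpuidOSXSAVE, 0x7))
}
```

The sketch gates FMA on OSXSAVE only, as proposed above. A stricter variant might also require XCR0 bits 1 and 2, since the SDM exception tables appear to list that condition for VEX-prefixed instructions as well.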

@martisch martisch changed the title from "x/sys/cpu: HasFMA should check HasOSXSAVE is true" to "internal/cpu: VEX prefixed instructions require OSXSAVE" Aug 26, 2020
@martisch martisch added NeedsFix and removed WaitingForInfo labels Aug 26, 2020
@martisch martisch self-assigned this Aug 26, 2020
@martisch martisch modified the milestones: Unreleased, Go1.16 Aug 26, 2020
@zhangyoufu
Author

@zhangyoufu zhangyoufu commented Aug 26, 2020

FYI, I created another issue for the incorrect objdump of FMA instruction. #41043
