Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Illegal instructions (SSE4) in go1.19+ -race builds (fails on some Intel Atom cores) #53743

Closed
cwedgwood opened this issue Jul 8, 2022 · 16 comments
Assignees
Milestone

Comments

@cwedgwood
Copy link
Contributor

What version of Go are you using (go version)?

go version go1.19rc1 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

linux/amd64

What did you do?

go build -race generates SSE4 instruction(s) that are executed on non-SSE4 capable CPUs (old Intel Atom).

(pinsrq specifically)

go1.18 works as expected

What did you expect to see?

program should run

What did you see instead?

crash, no output, fails in early runtime:

$ ./program
Illegal instruction

using gdb:

Program received signal SIGILL, Illegal instruction.
0x0000000000431737 in __sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__sanitizer::AP32> >::Refill(
__sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__sanitizer::AP32> >::PerClass*, __sanitizer::SizeC
lassAllocator32<__sanitizer::AP32>*, unsigned long) ()

...

   0x0000000000431726 <+406>:   movdqa %xmm1,%xmm0
   0x000000000043172a <+410>:   mov    $0x14,%edx
   0x000000000043172f <+415>:   shl    $0x4,%rax
   0x0000000000431733 <+419>:   shl    $0xa,%rdi
=> 0x0000000000431737 <+423>:   pinsrq $0x1,%rax,%xmm0
   0x000000000043173e <+430>:   mov    %rsi,%rax
   0x0000000000431741 <+433>:   add    $0x1,%rsi
   0x0000000000431745 <+437>:   movups %xmm0,0x8(%rbx,%rdi,1)
@randall77
Copy link
Contributor

Sounds like we need an -march when building the race detector.
I see this only in the linux .syso. The others seem fine for some reason.

@randall77
Copy link
Contributor

This was done intentionally in https://reviews.llvm.org/D106948

I'm not sure if there's anything we can do here. We could conceivably remove that patch and rebuild, but I'm not sure what the implications would be. Presumably just performance? The other OSes don't bump up the requirement, so it presumably still works without it.

The other option would be to require sse4 when running with -race. In that case, we should provide a better error message.

@dvyukov

@randall77 randall77 added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jul 8, 2022
@randall77 randall77 added this to the Go1.19 milestone Jul 8, 2022
@randall77 randall77 self-assigned this Jul 8, 2022
@gopherbot
Copy link

Change https://go.dev/cl/416477 mentions this issue: cmd/compile: improve GOAMD64=1 violation test

@randall77
Copy link
Contributor

Getting rid of the -msse=4.2 does fix the problem, and introduces no others that I can see. We should probably revert the Go parts of the tsan patch I mentioned.

@thanm This hasn't reached the windows .syso yet, but it will once the changes you're working on land. We should coordinate the rollback so we don't introduce a sse41 dependency in the new windows .syso.

@cwedgwood
Copy link
Contributor Author

It looks to me like the low-power Atom cores didn't support this until 2013 and in some cases 2016.

Might it be possible to have slower fallback paths should they be required? These older cores are still quite prevalent in some places.

@dvyukov
Copy link
Member

dvyukov commented Jul 9, 2022

I suspect a runtime switch will be too expensive. Besides the load-check-branch itself, inlining both version or adding a call will break register alloc.
A separate build of syso files is possible, though. Theoretically we could provide both syso files and select one during linking...
Or just provide a fallback version somewhere on the side and tell people to copy it over.

A proper runtime check in the tsan runtime would be good. The info should be somewhere in CPUID, right?

@randall77
Copy link
Contributor

A separate build of syso files is possible, though. Theoretically we could provide both syso files and select one during linking...

Yes, we could condition on the GOAMD64 version. Currently only the compiler looks at that, but the linker could also.

A proper runtime check in the tsan runtime would be good. The info should be somewhere in CPUID, right?

Yes. There's probably a whole set that are required by -m=sse4.2. Go has code to read them in internal/cpu/cpu_x86.{go,s}.

Do you know how much adding -msse4.2 helps? I'd rather just leave it out and not have people need to jump through hoops. Especially at this point in the release cycle for 1.19. We can get fancier for 1.20.

@dvyukov
Copy link
Member

dvyukov commented Jul 9, 2022

Running go install -race std cmd and then time go install -race -a std cmd gives me with -msse4.2:

$ time go install -race -a std cmd
real 3m2.300s
user 12m41.828s
sys 0m59.242s

$ time go install -race -a std cmd
real 3m4.621s
user 12m43.517s
sys 1m0.203s

$ time go install -race -a std cmd
real 3m4.426s
user 12m38.915s
sys 0m59.916s

without -msse4.2:

$ time go install -race -a std cmd
real 3m41.980s
user 14m38.523s
sys 1m1.868s

$ time go install -race -a std cmd
real 3m40.421s
user 14m32.731s
sys 1m1.647s

$ time go install -race -a std cmd
real 3m41.872s
user 14m35.246s
sys 1m1.504s

So -msse4.2 is -17.3% of real time.

Running some randomly selected TensorFlow C++ tests with -msse4.2:

test 1: Stats over 100 runs: max = 65.4s, min = 13.8s, avg = 39.5s, dev = 11.9s
test 2: Stats over 100 runs: max = 164.0s, min = 77.9s, avg = 96.5s, dev = 11.8s
test 3: Stats over 100 runs: max = 77.8s, min = 13.7s, avg = 39.3s, dev = 16.0s
test 4: Stats over 100 runs: max = 21.8s, min = 8.2s, avg = 13.9s, dev = 2.9s
test 5: Stats over 100 runs: max = 37.9s, min = 22.5s, avg = 28.6s, dev = 3.8s

without -msse4.2:

test 1: Stats over 100 runs: max = 111.6s, min = 13.2s, avg = 43.7s, dev = 18.4s
test 2: Stats over 100 runs: max = 134.7s, min = 80.4s, avg = 106.3s, dev = 8.9s
test 3: Stats over 100 runs: max = 89.0s, min = 17.3s, avg = 50.2s, dev = 17.4s
test 4: Stats over 100 runs: max = 25.0s, min = 9.8s, avg = 16.9s, dev = 3.1s
test 5: Stats over 100 runs: max = 43.9s, min = 20.4s, avg = 30.4s, dev = 5.1s

For avg time this is -9.6%, -9.2%, -21.7%, -17.8%, -5.9% with -msse4.2.

@randall77
Copy link
Contributor

I'm ok with a ~15% slowdown, to keep -race working on older processors.
We can come up with GOAMD64 versioning scheme for 1.20.

@thanm
Copy link
Contributor

thanm commented Jul 10, 2022

I'll put together an LLVM CL for this later in the week.

@thanm
Copy link
Contributor

thanm commented Jul 11, 2022

Sent https://reviews.llvm.org/D129482 with change to Go build scripts.

thanm added a commit to llvm/llvm-project that referenced this issue Jul 11, 2022
This is a partial revert of https://reviews.llvm.org/D106948, changing
just the Go build rules to remove -msse4.2 and revert back to -msse3,
so as to preserve support for older x86 machines. More details at
golang/go#53743.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D129482
@thanm
Copy link
Contributor

thanm commented Jul 11, 2022

LLVM change is submitted. I will work on sending another CL that updates the linux, freebsd, and darwin syso images.

@gopherbot
Copy link

Change https://go.dev/cl/416857 mentions this issue: runtime/race: update amd64 syso images to avoid sse4

@thanm thanm removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jul 11, 2022
@thanm
Copy link
Contributor

thanm commented Jul 11, 2022

OK, I am going to submit CL 416857, which should resolve this. Note that this updates darwin, freebsd, and linux. The current windows/amd64 syso is old enough that it shouldn't have the problem, and for openbsd/netbsd there are other things blocking the update.

@cwedgwood
Copy link
Contributor Author

@thanm thanks! I confirm this is working.

gopherbot pushed a commit that referenced this issue Jul 20, 2022
Add more opcodes that are only available in >v1 modes.

This test will now correctly detect the regression in -race mode
for #53743.

Change-Id: Icfbb1384e4333d7b4ff167c9ebcb6f4c7aeb6134
Reviewed-on: https://go-review.googlesource.com/c/go/+/416477
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
jproberts pushed a commit to jproberts/go that referenced this issue Aug 10, 2022
Rebuild selected amd64 syso images with updated LLVM build rules that
avoid the use of SSE4, so as to ensure that the Go race detector
continues to work on older x86 cpus. No changes to the syso files for
openbsd/amd64 (upstream support has been removed in LLVM) or
netbsd/amd64 (work still in progress there).

Fixes golang#53743.

Change-Id: I738ae4d1e0528c6e06dd4ddb78e7039a30a51779
Reviewed-on: https://go-review.googlesource.com/c/go/+/416857
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
jproberts pushed a commit to jproberts/go that referenced this issue Aug 10, 2022
Add more opcodes that are only available in >v1 modes.

This test will now correctly detect the regression in -race mode
for golang#53743.

Change-Id: Icfbb1384e4333d7b4ff167c9ebcb6f4c7aeb6134
Reviewed-on: https://go-review.googlesource.com/c/go/+/416477
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
This is a partial revert of https://reviews.llvm.org/D106948, changing
just the Go build rules to remove -msse4.2 and revert back to -msse3,
so as to preserve support for older x86 machines. More details at
golang/go#53743.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D129482
@gopherbot
Copy link

Change https://go.dev/cl/444396 mentions this issue: runtime/race: add GOAMD64=v3 version of linux race .syso

gopherbot pushed a commit that referenced this issue Oct 21, 2022
Makes -race mode faster, in the 15% speedup range.

Update #53743

Change-Id: I735eb71902b41c924c9f885ded8f7a350a56b751
Reviewed-on: https://go-review.googlesource.com/c/go/+/444396
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
romaindoumenc pushed a commit to TroutSoftware/go that referenced this issue Nov 3, 2022
Makes -race mode faster, in the 15% speedup range.

Update golang#53743

Change-Id: I735eb71902b41c924c9f885ded8f7a350a56b751
Reviewed-on: https://go-review.googlesource.com/c/go/+/444396
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
arichardson pushed a commit to CTSRD-CHERI/compiler-rt that referenced this issue Sep 12, 2023
This is a partial revert of https://reviews.llvm.org/D106948, changing
just the Go build rules to remove -msse4.2 and revert back to -msse3,
so as to preserve support for older x86 machines. More details at
golang/go#53743.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D129482
@golang golang locked and limited conversation to collaborators Oct 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants