Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: go binaries not working on exynos 64 bit CPUs #28431

Closed
noxer opened this issue Oct 26, 2018 · 27 comments
Closed

cmd/compile: go binaries not working on exynos 64 bit CPUs #28431

noxer opened this issue Oct 26, 2018 · 27 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Milestone

Comments

@noxer
Copy link

noxer commented Oct 26, 2018

Go binaries have stopped working on Galaxy S9+. The breaking commit seems to be 0a7ac93.

What version of Go are you using (go version)?

go 1.11.1 (worked in go 1.10.4)

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

Android on Arm64 (Android 8.0 on a Galaxy S9+)

What did you do?

Used gomobile to compile the following code and used it in an Android app to display a toast notification.

package hello

import (
	"runtime"
)

func Message() string {
	return "Hello: " + runtime.Version()
}

What did you expect to see?

The app displaying a Toast with the text "Hello: "

What did you see instead?

The app crashing with a SIGILL.

Credits

The problem was brought to our attention in the gophers slack channel by @shipit.

Git bisect output

0a7ac93c27c9ade79fe0f66ae0bb81484c241ae5 is the first bad commit
commit 0a7ac93c27c9ade79fe0f66ae0bb81484c241ae5
Author: Wei Xiao <wei.xiao@arm.com>
Date:   Fri Nov 3 02:05:28 2017 +0000

    cmd/compile: improve atomic add intrinsics with ARMv8.1 new instruction
    
    ARMv8.1 has added new instruction (LDADDAL) for atomic memory operations. This
    CL improves existing atomic add intrinsics with the new instruction. Since the
    new instruction is only guaranteed to be present after ARMv8.1, we guard its
    usage with a conditional on CPU feature.
    
    Performance result on ARMv8.1 machine:
    name        old time/op  new time/op  delta
    Xadd-224    1.05µs ± 6%  0.02µs ± 4%  -98.06%  (p=0.000 n=10+8)
    Xadd64-224  1.05µs ± 3%  0.02µs ±13%  -98.10%  (p=0.000 n=9+10)
    [Geo mean]  1.05µs       0.02µs       -98.08%
    
    Performance result on ARMv8.0 machine:
    name        old time/op  new time/op  delta
    Xadd-46      538ns ± 1%   541ns ± 1%  +0.62%  (p=0.000 n=9+9)
    Xadd64-46    505ns ± 1%   508ns ± 0%  +0.48%  (p=0.003 n=9+8)
    [Geo mean]   521ns        524ns       +0.55%
    
    Change-Id: If4b5d8d0e2d6f84fe1492a4f5de0789910ad0ee9
    Reviewed-on: https://go-review.googlesource.com/81877
    Run-TryBot: Cherry Zhang <cherryyz@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Cherry Zhang <cherryyz@google.com>

:040000 040000 848bfd65d49b5fe8b5472954c50e6596a4ad15a6 c77dcbdac660c41fb79e07785745459b5d704489 M	src
@noxer
Copy link
Author

noxer commented Oct 26, 2018

grafik
CPU info

@randall77
Copy link
Contributor

Seems like a likely culprit.
The question is: is the go toolchain not guarding the new instruction correctly, or is your system incorrectly reporting that the instruction is available, when it isn't?

I see "atomics" on your list of cpu features. That's the feature we're using to guard the new instruction.
Is that not the right one? Or is your chip lying to you?

@noxer
Copy link
Author

noxer commented Oct 26, 2018

How can I help to find that out?

@randall77
Copy link
Contributor

Read the docs for that new instruction (LDADDAL)?
I notice that your chip has two different speed cores on it. Maybe some support LDADDAL and some don't?
Read through src/internal/cpu/cpu_arm64.go and look for the atomics flag. See if the constants there agree with the hwcap bits we should be checking.
Compile a binary and make sure the guard code around the new instruction looks correct.

@noxer
Copy link
Author

noxer commented Oct 26, 2018

We've encountered the error only in the 64 bit version, compiling it with 32 bit support only worked perfectly.

@noxer
Copy link
Author

noxer commented Oct 26, 2018

I notice that your chip has two different speed cores on it. Maybe some support LDADDAL and some don't?

All cores seem to support atomics (flags are set).

Read through src/internal/cpu/cpu_arm64.go and look for the atomics flag. See if the constants there agree with the hwcap bits we should be checking.

They do

Compile a binary and make sure the guard code around the new instruction looks correct.

This goes a little beyond my capabilities.

@noxer
Copy link
Author

noxer commented Oct 26, 2018

Reading through the ARM manual (https://static.docs.arm.com/ddi0487/ca/DDI0487C_a_armv8_arm.pdf) I suspect there is a check in ID_AA64ISAR0_EL1 (PDF page 2518) for the Arm 8.1 atomics missing.

@noxer
Copy link
Author

noxer commented Oct 26, 2018

Also a hint is the CPU info from the device:

$ cat /proc/cpuinfo                                                                                   
processor	: 0
BogoMIPS	: 52.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd05
CPU revision	: 1
// + 3 more

processor	: 4
BogoMIPS	: 52.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer	: 0x53
CPU architecture: 8
CPU variant	: 0x1
CPU part	: 0x002
CPU revision	: 0
// + 3 more

as you can see the small cores are ARM 8.0 and the big cores are 8.1.

I notice that your chip has two different speed cores on it. Maybe some support LDADDAL and some don't?

Seems like your first suspicion was right, while they all support atomics, LDADDAL is only available on Arm 8.1, not 8.0.

@randall77
Copy link
Contributor

We've encountered the error only in the 64 bit version, compiling it with 32 bit support only worked perfectly.

That's expected, the new instruction is only used for 64-bit builds.

All cores seem to support atomics (flags are set).

Well, they claim to. We should test that claim. Can you write some assembly with a LDADDAL instruction in it, compile and run it? No Go, just C and/or assembly.
There may be a way to use gcc intrinsics to get that assembly op without writing assembly, not sure.

This goes a little beyond my capabilities.

I took a look and it seems ok at first glance, but the Go arm64 disassembler gives up on the instruction in question, so I can't be sure. I need access to the platform disassembler. Could you build this program (go build program.go), run objdump -d on the executable, and post the disassembled code for main.main?

package main

import "sync/atomic"

var p int64

func main() {
	atomic.AddInt64(&p, 1)
}

@randall77
Copy link
Contributor

We could add a test for the CPU variant if there's an easy way to get it.
Will the OS move a Go process between cores of different types? If so, we need to find the lowest common denominator feature set somehow.

I wish the OS didn't lie to us and present the atomics feature when all cores don't support it. It's going to be hard (possibly, impossible) for us to work around it.

@noxer
Copy link
Author

noxer commented Oct 26, 2018

Piece of information while I try to get you your objdump ready
smartselect_20181027-011758_termux

@noxer
Copy link
Author

noxer commented Oct 26, 2018

Here is the object dump

0000000000087c08 <main.main>:
   87c08:	90000520 	adrp	x0, 12b000 <runtime.trace+0xf2a0>
   87c0c:	913a8c00 	add	x0, x0, #0xea3
   87c10:	39400000 	ldrb	w0, [x0]
   87c14:	b5000120 	cbnz	x0, 87c38 <main.main+0x30>
   87c18:	b24003e0 	orr	x0, xzr, #0x1
   87c1c:	90000521 	adrp	x1, 12b000 <runtime.trace+0xf2a0>
   87c20:	913c8021 	add	x1, x1, #0xf20
   87c24:	c85ffc22 	ldaxr	x2, [x1]
   87c28:	8b000042 	add	x2, x2, x0
   87c2c:	c81bfc22 	stlxr	w27, x2, [x1]
   87c30:	b5ffffbb 	cbnz	x27, 87c24 <main.main+0x1c>
   87c34:	d65f03c0 	ret
   87c38:	b24003e0 	orr	x0, xzr, #0x1
   87c3c:	90000521 	adrp	x1, 12b000 <runtime.trace+0xf2a0>
   87c40:	913c8021 	add	x1, x1, #0xf20
   87c44:	f8e00022 	ldaddal	x0, x2, [x1]
   87c48:	8b000042 	add	x2, x2, x0
   87c4c:	17fffffa 	b	87c34 <main.main+0x2c>
	...

@randall77
Copy link
Contributor

Thanks. That code looks correct to me.
I think your taskset test says it best. Some of your chips just can't execute the instruction, even though the OS says they can.
I'm not sure what the fix would be.
How would a C program handle this? Can any C program actually use the new instructions without being run inside a taskset cage?

@niaow
Copy link

niaow commented Oct 27, 2018

@randall77 the only way I can think of would be for the C code to put itself in a taskset cage. Go code could potentially do something similar by using syscall.SYS_SCHED_SETSCHEDULER (syscall equivalent of taskset), but this seems extremely messy. Auto-detecting this may also be difficult.

@niaow
Copy link

niaow commented Oct 27, 2018

Also, ARM documentation states that the big and little cores must have the same instruction support. Therefore, this is really a hardware bug. Unfortunately, it is a widespread bug.

Source: https://developer.arm.com/technologies/big-little

Clarification: The 4 big cores in the Samsung Exynos 9810 are custom Mongoose 3 cores which support ARMv8.3 ARMv8.0, and the 4 little cores are standard ARM Cortex-A55 cores which support ARMv8.2.

Except LDADDAL is supposed to be ARMv8.1 so the A55 might actually have full ARMv8.2 support.
I was wrong, LDADDAL is supported by the Cortex-A55. It is the other way around - the little cores have a newer ISA version than the big cores.

@katiehockman katiehockman added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 29, 2018
@katiehockman katiehockman added this to the Go1.12 milestone Oct 29, 2018
@niaow
Copy link

niaow commented Oct 29, 2018

If we can confirm that the Go compiler is doing this right, then the best solution may be to report this to Samsung. The most reasonable fix I can think of would be updating the kernel to report the lowest common feature denominator instead of that of the local core.

@randall77
Copy link
Contributor

I agree, this isn't a bug in Go. It's a bug in the feature set reported by the kernel.
I'm going to close this issue on the Go end.
It would probably be helpful for someone to write a C/asm program to demonstrate the problem and report that to the kernel or chip maintainers.

@niaow
Copy link

niaow commented Oct 31, 2018

Update:

I have made a minimal C example of this. It is available on a gist: https://gist.github.com/jadr2ddude/aee7b47a91b70ee3dd75cad97e5ff4e0

Also on twitter: https://twitter.com/CompuJad/status/1057744906943377408

@eliasnaur
Copy link
Contributor

Even though it's a bug in the kernel or hardware, I think Go should avoid triggering it. It seems gomobile (1.11) programs are completely broken on one of the most widely used Android phones.

@eliasnaur eliasnaur reopened this Nov 3, 2018
@steeve
Copy link
Contributor

steeve commented Nov 3, 2018

We have also experienced this behaviour, however only in debug binaries. We could not trigger it with release binaries (somehow).
Perhaps the process was scheduled on ARM v8.2 cores...

@niaow
Copy link

niaow commented Nov 4, 2018

In order to detect this, we would need to read /proc/cpuinfo, and check for a mongoose 3 core. Cannot rely on the reported architecture variant - the kernel reports this incorrectly.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/147377 mentions this issue: runtime: avoid arm64 8.1 atomics on Android

@eliasnaur
Copy link
Contributor

The easy sledgehammer is disabling the optimization if runtime.GOOS == "android".

@gopherbot please consider this for backport to 1.11. It's an unavoidable crash on a popular Android phone and easy to avoid from Go.

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #28586 (for 1.11).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@eliasnaur
Copy link
Contributor

I don't have a Samsung S9+ to test CL 147377. Can someone with the offending hardware verify the fix?

@noxer
Copy link
Author

noxer commented Nov 4, 2018

I can confirm that it works now (android app built with gomobile and this Go version).

@niaow
Copy link

niaow commented Feb 10, 2019

Update: the kernel problem was resolved in an update. The kernel no longer reports support for atomics. Image from @noxer of my atomics test.

smartselect_20190210-013206_termux

@golang golang locked and limited conversation to collaborators Feb 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Projects
None yet
Development

No branches or pull requests

7 participants