Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory corruption on linux/386 with float32 arithmetic, GO386=387, buildmode pie/c-archive #41503

Closed
BenLubar opened this issue Sep 20, 2020 · 23 comments

Comments

@BenLubar
Copy link

@BenLubar BenLubar commented Sep 20, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.1 linux/amd64

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="386"
GOBIN="/home/ben/go/bin"
GOCACHE="/home/ben/.cache/go-build"
GOENV="/home/ben/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/ben/.golang-path/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="android"
GOPATH="/home/ben/.golang-path"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go-1.15"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.15/pkg/tool/linux_amd64"
GCCGO="gccgo"
GO386="387"
AR="ar"
CC="/storage/android/android-sdk/ndk-bundle/toolchains/llvm/prebuilt/linux-x86_64/bin/i686-linux-android30-clang"
CXX="/storage/android/android-sdk/ndk-bundle/toolchains/llvm/prebuilt/linux-x86_64/bin/i686-linux-android30-clang++"
CGO_ENABLED="1"
GOMOD="/home/ben/src/games/spy-cards/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m32 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build991622513=/tmp/go-build -gno-record-gcc-switches"

What did you do?

https://play.golang.org/p/Ha0xVDFyLCd

gomobile build and then run on the android emulator

What did you expect to see?

The second log line should print the same pointer twice.

What did you see instead?

I/GoLog: 0x9640e040 "(blank sprite)" (initial)
I/GoLog: 0xc6be79c3, 0x9640e040, equals: true "(blank sprite)" (in struct)
@BenLubar
Copy link
Author

@BenLubar BenLubar commented Sep 20, 2020

printing sprite with %+v format: &{S0:-0 S1:-1.950005e-09 T0:1 T1:1 X0:-0.5 X1:-1.5831242e-18 Y0:7.966319e-21 Y1:-1.5831242e-18 spriteSheet:0xc6c73923}

the same with all float32 changed to float64: &{S0:0 S1:1 T0:0 T1:1 X0:-0.5 X1:0.5 Y0:-0.5 Y1:0.5 spriteSheet:0x96914100}

doing float64 arithmetic but having all fields be float32: &{S0:0 S1:1 T0:0 T1:1 X0:-0.5 X1:0.5 Y0:-0.5 Y1:0.5 spriteSheet:0x9640e040}

@odeke-em
Copy link
Member

@odeke-em odeke-em commented Sep 20, 2020

Thank you for this report @BenLubar. Kindly ccing some mobile folks @eliasnaur @hyangah.

@eliasnaur
Copy link
Contributor

@eliasnaur eliasnaur commented Sep 20, 2020

Does this problem reproduce on linux/386?

@eliasnaur
Copy link
Contributor

@eliasnaur eliasnaur commented Sep 20, 2020

FWIW, I failed to reproduce the error on an Android API 29 emulator, with

$ go version
go version go1.15.2 linux/amd64
@BenLubar
Copy link
Author

@BenLubar BenLubar commented Sep 20, 2020

the problem does not reproduce on linux/386

@BenLubar
Copy link
Author

@BenLubar BenLubar commented Sep 20, 2020

full repro instructions:

go get golang.org/x/mobile/cmd/gomobile
mkdir issue41503 && cd issue41503
go mod init issue41503
wget -O main.go https://play.golang.org/p/Ha0xVDFyLCd.go
gomobile build

running the resulting APK on system-images;android-29;default;x86 (revision 7) does reproduce the memory corruption.
running the resulting APK on system-images;android-29;default;x86_64 (revision 7) does not reproduce the memory corruption.

I'm running the emulator on a Windows 10 machine as my Linux machine is headless.

@eliasnaur
Copy link
Contributor

@eliasnaur eliasnaur commented Sep 21, 2020

Thanks. I created an "system-images;android-29;default;x86" avd, but had no luck in reproducing the corruption. Can you try Go 1.15.2? What's your NDK version? I'm using version 21.1.6352462.

@BenLubar
Copy link
Author

@BenLubar BenLubar commented Sep 21, 2020

ben@urist:~$ go version
go version go1.15.2 linux/amd64
ben@urist:~$ gomobile version
gomobile version +973feb4 Sat Aug 1 11:21:45 2020 +0000 (android); androidSDK=/storage/android/android-sdk/platforms/android-30

NDK is 21.3.6528147

@BenLubar
Copy link
Author

@BenLubar BenLubar commented Sep 21, 2020

in case disassembly helps diagnose this, here's the APK generated by gomobile: https://ben.lubar.me/dump/issue41503.apk

@eliasnaur
Copy link
Contributor

@eliasnaur eliasnaur commented Sep 21, 2020

That's odd, your apk does reproduce the differing pointers, but I'm unable to build a version myself that does it. I tried NDK 21.3.6528147, no luck.

@eliasnaur
Copy link
Contributor

@eliasnaur eliasnaur commented Sep 21, 2020

I managed to reproduce the bug by setting GO386=387 like your go env. Is there a reason you set that? Given #40255, I don't think this issue is worth exploring further if that setting is the culprit.

@BenLubar
Copy link
Author

@BenLubar BenLubar commented Sep 21, 2020

I'm not sure how GO386 is being set. I can manually set it to sse2, but it's not being set to 387 in my environment variables or my wrapper script that sets up variables like CC.

@BenLubar BenLubar closed this Sep 21, 2020
@martisch
Copy link
Contributor

@martisch martisch commented Sep 21, 2020

Does the computer you build on support SSE2? Otherwise 387 is set.

if cansse2() {

Can the problem be reproduced on linux/386 with GO386=387?

It is likely still good to understand what the root cause is here especially if reproducible outside android.

@BenLubar
Copy link
Author

@BenLubar BenLubar commented Sep 21, 2020

The build machine has a Ryzen 5 1600, so it definitely supports SSE2.

I'm using https://launchpad.net/~longsleep/+archive/ubuntu/golang-backports so the 387 might be from there.

@eliasnaur
Copy link
Contributor

@eliasnaur eliasnaur commented Sep 21, 2020

Does the computer you build on support SSE2? Otherwise 387 is set.

if cansse2() {

Can the problem be reproduced on linux/386 with GO386=387?

It is likely still good to understand what the root cause is here especially if reproducible outside android.

Note that buildmode c-archive is used for Android. That may help to reproduce on linux/386.

@eliasnaur eliasnaur reopened this Sep 21, 2020
@eliasnaur
Copy link
Contributor

@eliasnaur eliasnaur commented Sep 21, 2020

Got it:

$ CGO_ENABLED=1 GOARCH=386 GO386=387 go run -buildmode=pie main.go
2020/09/21 22:49:40 0x57c0e030 "(blank sprite)" (initial)
2020/09/21 22:49:40 0x566b2823, 0x57c0e030, equals: true "(blank sprite)" (in struct)

Note the different pointer values on the second line.

@eliasnaur eliasnaur changed the title memory corruption on android/386 with float32 arithmetic memory corruption on linux/386 with float32 arithmetic Sep 21, 2020
@eliasnaur eliasnaur changed the title memory corruption on linux/386 with float32 arithmetic memory corruption on linux/386 with float32 arithmetic, GO386=387, buildmode pie/c-archive Sep 21, 2020
@martisch
Copy link
Contributor

@martisch martisch commented Sep 22, 2020

@randall77
Copy link
Contributor

@randall77 randall77 commented Sep 24, 2020

To reproduce, I had to run:

$ sudo apt-get install libglvnd-dev:i386

Then Elias's command worked.

I get a fault, not just a bad print:

$ CGO_ENABLED=1 GOARCH=386 GO386=387 go run -buildmode=pie issue41503.go
2020/09/24 12:09:31 0x5780e030 "(blank sprite)" (initial)
unexpected fault address 0x566d6509
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x2 addr=0x566d6509 pc=0x566d64ff]

goroutine 1 [running, locked to thread]:
runtime.throw(0x566da320, 0x5)
	/home/khr/go1.13.4/src/runtime/panic.go:774 +0x70 fp=0x57885efc sp=0x57885ee8 pc=0x56653a00
runtime.sigpanic()
	/home/khr/go1.13.4/src/runtime/signal_unix.go:401 +0x368 fp=0x57885f14 sp=0x57885efc pc=0x56667fe8
main.(*spriteSheet).sprite(0x5780e030, 0x0, 0x0, 0x40000000, 0x40000000, 0x3f000000, 0x3f000000, 0x2)
	/home/khr/gowork/issue41503/issue41503.go:53 +0x15f fp=0x57885f74 sp=0x57885f14 pc=0x566d64ff
main.init()
	/home/khr/gowork/issue41503/issue41503.go:36 +0xc4 fp=0x57885f9c sp=0x57885f74 pc=0x566d6a64
runtime.doInit(0x567d0380)
	/home/khr/go1.13.4/src/runtime/proc.go:5222 +0x6b fp=0x57885fb0 sp=0x57885f9c pc=0x5666192b
runtime.main()
	/home/khr/go1.13.4/src/runtime/proc.go:190 +0x234 fp=0x57885ff0 sp=0x57885fb0 pc=0x56655634
runtime.goexit()
	/home/khr/go1.13.4/src/runtime/asm_386.s:1325 +0x1 fp=0x57885ff4 sp=0x57885ff0 pc=0x5667de31

goroutine 6 [select]:
golang.org/x/mobile/app.pump.func1(0x5781c1c0, 0x57810208)
	/home/khr/gopath/pkg/mod/golang.org/x/mobile@v0.0.0-20200801112145-973feb4309de/app/app.go:164 +0x15d
created by golang.org/x/mobile/app.pump
	/home/khr/gopath/pkg/mod/golang.org/x/mobile@v0.0.0-20200801112145-973feb4309de/app/app.go:147 +0x92
exit status 2

That crash is happening here:

   e34e8:	d9 7c 24 18          	fnstcw 0x18(%esp)
   e34ec:	e8 6f 39 f5 ff       	call   36e60 <__x86.get_pc_thunk.cx>
   e34f1:	d9 a9 73 88 0f 00    	fldcw  0xf8873(%ecx)
   e34f7:	de fa                	fdivrp %st,%st(2)
   e34f9:	d9 6c 24 18          	fldcw  0x18(%esp)
   e34fd:	d9 c1                	fld    %st(1)
   e34ff:	d9 59 18             	fstps  0x18(%ecx)          <- this one
   e3502:	d9 c4                	fld    %st(4)
   e3504:	dd da                	fstp   %st(2)

My guess is that the CX used for global variable access, set at e34ec, is overwriting the CX which contains the pointer to the sprite. (The 0x18 is the offset of the field Y0 in the Sprite type, written at line 53).

@randall77
Copy link
Contributor

@randall77 randall77 commented Sep 24, 2020

Here's a simple reproducer:

package main

type T [2]float32

//go:noinline
func f(p *T, i int, x, y, z, w float32) {
	a0 := g0
	a1 := g1
	a2 := g2
	p[i] = x / y
	p[i+1] = z / w
	g0 = a0
	g1 = a1
	g2 = a2
}

func main() {
	var t T
	f(&t, 0, 1, 1, 2, 2)
	println(g0, g1, g2)
}

var g0, g1, g2 int

Run with

CGO_ENABLED=1 GOARCH=386 GO386=387 go run -buildmode=pie tmp1.go

It should print 0 0 0 but instead it prints 0 0 <some random big integer>.

@randall77
Copy link
Contributor

@randall77 randall77 commented Sep 24, 2020

The code needs to do a FLDCW (float load control word) from a global to set to 32-bit rounding for the divide.
To load that global, we need a register to put the PC in, because we're compiling with -pie.
But all the registers are being used. Oops.
Normally access to globals is ok because we always operate on them with LEAL or MOVL (search for "thunk" in cmd/compile/internal/ssa/gen/386.rules). But that's only for operations on globals generated in the normal SSA pipeline. Operations on globals introduced in cmd/compile/internal/ssa/x86/387.go aren't like that (and they need to be).

I think we need to do some register shuffling to ensure correct compatibility of 387+pie.

@gopherbot
Copy link

@gopherbot gopherbot commented Sep 24, 2020

Change https://golang.org/cl/257277 mentions this issue: cmd/compile: prevent 387+float32+pie from clobbering registers

@randall77
Copy link
Contributor

@randall77 randall77 commented Sep 24, 2020

@gopherbot Please open backport issues.

@gopherbot
Copy link

@gopherbot gopherbot commented Sep 24, 2020

Backport issue(s) opened: #41619 (for 1.14), #41620 (for 1.15).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@gopherbot gopherbot closed this in ea106cc Sep 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.