Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/link: nil pointer dereference crash when building with an Android NDK toolchain #34788

Open
mvdan opened this issue Oct 9, 2019 · 10 comments

Comments

@mvdan
Copy link
Member

commented Oct 9, 2019

The only requisite is a version of Go and the NDK r18 installed at $ANROID_NDK. I can reproduce on go version devel +a38a917aee Wed Oct 9 06:14:44 2019 +0000 linux/amd64 and go version go1.13.1 linux/amd64.

Below is the script to reproduce the crash:

#!/bin/bash

set -e

go version
cat $ANDROID_NDK/source.properties

$ANDROID_NDK/build/tools/make-standalone-toolchain.sh --arch=x86 --install-dir=toolchain
TOOLCHAIN=$PWD/toolchain

git clone --depth=1 -b obfs4proxy-0.0.11 https://github.com/Yawning/obfs4 2>/dev/null

cd obfs4/obfs4proxy

GOARCH=386 GOOS=android CC=$TOOLCHAIN/bin/i686-linux-android-clang CGO_ENABLED=1 go build -buildmode=exe .

Here is the output from Go master:

go version devel +a38a917aee Wed Oct 9 06:14:44 2019 +0000 linux/amd64
Pkg.Desc = Android NDK
Pkg.Revision = 18.0.5002713
HOST_OS=linux
HOST_EXE=
HOST_ARCH=x86_64
HOST_TAG=linux-x86_64
HOST_NUM_CPUS=8
BUILD_NUM_CPUS=16
Toolchain installed to toolchain.
# gitlab.com/yawning/obfs4.git/obfs4proxy
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x57a7f3]

goroutine 1 [running]:
cmd/link/internal/ld.elfrelocsect(0xc0006fe000, 0xc004cde190, 0xc000378000, 0x1a0c, 0x389c)
        /home/mvdan/tip/src/cmd/link/internal/ld/elf.go:1356 +0x783
cmd/link/internal/ld.Elfemitreloc(0xc0006fe000)
        /home/mvdan/tip/src/cmd/link/internal/ld/elf.go:1381 +0x10b
cmd/link/internal/x86.asmb2(0xc0006fe000)
        /home/mvdan/tip/src/cmd/link/internal/x86/asm.go:712 +0x3d6
cmd/link/internal/ld.Main(0x825c40, 0x10, 0x20, 0x1, 0x4, 0x8, 0x67ca5d, 0x12, 0x67fc3d, 0x18, ...)
        /home/mvdan/tip/src/cmd/link/internal/ld/main.go:271 +0xd7d
main.main()
        /home/mvdan/tip/src/cmd/link/main.go:65 +0x1bc

And from 1.13.1:

go version go1.13.1 linux/amd64
Pkg.Desc = Android NDK
Pkg.Revision = 18.0.5002713
HOST_OS=linux
HOST_EXE=
HOST_ARCH=x86_64
HOST_TAG=linux-x86_64
HOST_NUM_CPUS=8
BUILD_NUM_CPUS=16
Toolchain installed to toolchain.
# gitlab.com/yawning/obfs4.git/obfs4proxy
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x58275f]

goroutine 1 [running]:
cmd/link/internal/ld.elfrelocsect(0xc000056000, 0xc004097c20, 0xc005472000, 0x1b74, 0x39fe)
        /usr/lib/go/src/cmd/link/internal/ld/elf.go:1356 +0x77f
cmd/link/internal/ld.Elfemitreloc(0xc000056000)
        /usr/lib/go/src/cmd/link/internal/ld/elf.go:1381 +0x10b
cmd/link/internal/x86.asmb2(0xc000056000)
        /usr/lib/go/src/cmd/link/internal/x86/asm.go:712 +0x3d6
cmd/link/internal/ld.Main(0x84bc40, 0x10, 0x20, 0x1, 0x4, 0x8, 0x6964d6, 0x12, 0x69969f, 0x18, ...)
        /usr/lib/go/src/cmd/link/internal/ld/main.go:271 +0xd9a
main.main()
        /usr/lib/go/src/cmd/link/main.go:65 +0x1d6

It seems like the line it fails on is an error line, so I assume this should error gracefully instead of panicking. I'm not sure though.

My intuition is that the build should succeed; this project in question (@grote's) was previously using Go 1.11.x without an issue. If the build should succeed here too, I'd suggest we backport a fix for 1.13.x.

/cc @cherrymui @mdempsky @ianlancetaylor @thanm as per the owners doc.

@mvdan mvdan added the NeedsFix label Oct 9, 2019
@mvdan mvdan added this to the Go1.14 milestone Oct 9, 2019
@thanm thanm self-assigned this Oct 9, 2019
@thanm

This comment has been minimized.

Copy link
Member

commented Oct 9, 2019

I'll take look.

@cherrymui

This comment has been minimized.

Copy link
Contributor

commented Oct 9, 2019

I haven't looked into this. My guess is that this may be related to CL http://golang.org/cl/173437. Maybe we missed a r.InitExt somewhere. But as @mvdan said, this probably will only get it from crashing to failing. So maybe there is something else.

(My previous comment is wrong.)

@thanm

This comment has been minimized.

Copy link
Member

commented Oct 9, 2019

This appears to be related to the recent changes to the way TLS works on Android (see issue 29674).

As part of Go 1.13 we transitioned to a new TLS access sequence for Android that loads up the newly defined "runtime.tls_g" symbol. If you compile a Go source file with

GOARCH=386 GOOS=android go tool compile x.go

Here is what the prolog looks like (from go tool objdump):

  x.go:5   0x751  8b0d00000000   MOVL 0, CX          [2:6]R_ADDR:runtime.tls_g
  x.go:5   0x757  8b8900000000   MOVL 0(CX), CX      [2:6]R_TLS_LE
  x.go:5   0x75d  3b6108         CMPL 0x8(CX), SP        

Note the R_TLS_LE relocation. Here is the description from
cmd/internal/objabi/reloctype.go:

	// R_TLS_LE, used on 386, amd64, and ARM, resolves to the offset of the
	// thread-local symbol from the thread local base and is used to implement the
	// "local exec" model for tls access (r.Sym is not set on intel platforms but is
	// set to a TLS symbol -- runtime.tlsg -- in the linker when externally linking).

At the point of the crash in the linker, we're trying to process an R_TLS_LE relocation on a text symbol (regular Go function), but the relocation's Xsym field is nil, so it tries to execute this line:

Errorf(s, "missing xsym in relocation %#v %#v", r.Sym.Name, s)

which faults because r.Sym is also nil. The code that ordinarily would have filled the r.Sym is here:

cmd/link/internal/ld/data.go:230

	case objabi.R_TLS_LE:
		if ctxt.LinkMode == LinkExternal && ctxt.IsELF {
			r.Done = false
			if r.Sym == nil {
				r.Sym = ctxt.Tlsg
			}
			r.Xsym = r.Sym
			r.Xadd = r.Add

however ctxt.Tlsg is now also nil due to CL 169618.

Not quite sure what the right fix is here-- as I understand it we are no longer using the "local exec" model for TLS (is this right?) in which case we should not have the R_TLS_RE in this case?

@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@cherrymui

This comment has been minimized.

Copy link
Contributor

commented Oct 9, 2019

Thanks @thanm for looking into this!

in which case we should not have the R_TLS_RE in this case?

I think this is the right fix.

@thanm thanm modified the milestones: Backlog, Go1.14 Oct 10, 2019
@thanm

This comment has been minimized.

Copy link
Member

commented Oct 10, 2019

OK, I have a tentative fix, which I will send shortly. I am wondering where would be the place to put a test for this... I will consult with the team.

@gopherbot

This comment has been minimized.

Copy link

commented Oct 10, 2019

Change https://golang.org/cl/200337 mentions this issue: cmd/compile: fix spurious R_TLE_LE reloc on android/386

@mvdan

This comment has been minimized.

Copy link
Member Author

commented Oct 10, 2019

Great! Should the build succeed after this patch, or simply error without a panic?

@thanm

This comment has been minimized.

Copy link
Member

commented Oct 10, 2019

@mvdan the build should succeed.

@thanm

This comment has been minimized.

Copy link
Member

commented Oct 10, 2019

@gopherbot Please open a backport to 1.13.

@gopherbot

This comment has been minimized.

Copy link

commented Oct 10, 2019

Backport issue(s) opened: #34825 (for 1.13).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.