cmd/link: Arbitrary linking failures with go1.15 #43820

Open
dwillemv opened this issue Jan 21, 2021 · 12 comments

Comments


@dwillemv dwillemv commented Jan 21, 2021

What version of Go are you using (go version)?

$ go version
go1.15.7

Does this issue reproduce with the latest release?

This happens with go1.15.7, but not on go1.14 or tip.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN="/home/username/devel/next/bin"
GOCACHE="/home/username/.cache/go-build"
GOENV="/home/username/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/username/go/pkg/mod"
GOOS="linux"
GOPATH="/home/username/go"
GOROOT="/home/username/devel/go"
GOTMPDIR=""
GOTOOLDIR="/home/username/devel/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/username/devel/next/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/dev/shm/go-build423997605=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Ran go install folder/... folder2/... from a Makefile in a newly cleaned environment. This is a large private repo. Importantly, this does not happen on a subsequent build if the cache is not cleared.

What did you expect to see?

Successful build

What did you see instead?

/home/username/devel/go/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
/dev/shm/go-link-811975522/go.o: file not recognized: File format not recognized
collect2: error: ld returned 1 exit status

More details

The file (and its containing directory for that matter) that the error message refers to does not exist in the filesystem after the build fails.
This only seems to happen in a newly cleaned build environment, i.e. a clean go cache. It doesn't happen every time make is run and the failing package differs each time.
Git bisect points to a possible cause:
commit d40b0a1
Author: Steven Hartland steven.hartland@multiplay.co.uk
Date: Fri May 8 12:09:00 2020 +0000

cmd/link: fix mode parameter to fallocate on Linux

It stops happening at:

commit 987ce93
Author: Cherry Zhang cherryyz@google.com
Date: Fri Jun 26 16:35:49 2020 -0400

[dev.link] cmd/link: emit ELF relocations in mmap
Contributor

@cherrymui cherrymui commented Jan 21, 2021

> /dev/shm/go-link-811975522/go.o

Does the filesystem at your /dev/shm support fallocate syscall correctly? Does it fail if you try a different place for the temporary output file?

> The file (and its containing directory for that matter) that the error message refers to does not exist in the filesystem after the build fails.

The linker deletes temporary files at exit (success or not). You can use -ldflags=-tmpdir=<dir> flag to set a temporary directory and let the linker keep it. (The directory needs to exist before linking.)
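As a rough illustration of the -tmpdir suggestion, the sketch below drives go build from a small Go program. The helper buildArgs, the directory /tmp/go-link-keep, and the package path ./cmd/program are all hypothetical placeholders; only the -ldflags=-tmpdir=<dir> flag itself comes from the comment above.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// buildArgs returns `go build` arguments that pass -tmpdir to the linker so
// that its temporary files are kept in tmpdir instead of deleted at exit.
// The helper name is made up for this example.
func buildArgs(tmpdir string, pkgs ...string) []string {
	args := []string{"build", "-ldflags=-tmpdir=" + tmpdir}
	return append(args, pkgs...)
}

func main() {
	tmpdir := "/tmp/go-link-keep" // hypothetical location; any directory works
	// The directory needs to exist before linking.
	if err := os.MkdirAll(tmpdir, 0o755); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	cmd := exec.Command("go", buildArgs(tmpdir, "./cmd/program")...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	buildErr := cmd.Run()

	// List whatever the linker left behind, even if the build failed.
	entries, err := os.ReadDir(tmpdir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, e := range entries {
		fmt.Println(e.Name())
	}
	if buildErr != nil {
		os.Exit(1)
	}
}
```

The same flag can of course be passed directly on the command line; the program form is just convenient when the build has to be repeated many times.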

@seankhliao changed the title from "Arbitrary linking failures with go1.15" to "cmd/link: Arbitrary linking failures with go1.15" on Jan 21, 2021
Author

@dwillemv dwillemv commented Jan 21, 2021

/dev/shm is a tmpfs filesystem and the kernel version is 3.17. It seems like most fallocate features were introduced before then.
Thanks for your suggestions. I will see what the output looks like and also try writing to a different type of filesystem. What is the expected content of go.o in general?

Contributor

@cherrymui cherrymui commented Jan 21, 2021

> What is the expected content of go.o in general?

An ELF object file. You should be able to open it with tools like objdump, readelf, or nm.

Author

@dwillemv dwillemv commented Jan 22, 2021

Linkmode was set to auto (I assume it defaulted to internal since there was no output), so I changed linkmode to external and .o files were written to tmpdir.

The output of objdump -d go.o was the same for a failing and a successful build. However the error message from ld still refers to a directory in /dev/shm.

This got me wondering why ld is being invoked in the first place if linkmode is internal. Doesn't the Go linker handle all linking?

Contributor

@cherrymui cherrymui commented Jan 22, 2021

The linkmode defaults to auto, which means internal linking for pure-Go programs, external linking for cgo programs. You can use -ldflags=-v to tell. If it is external linking, it will print "host link" with the invocation of the external linker.
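When the -v output is long, a tiny filter can pull out the relevant lines. This sketch only assumes what is said above, namely that external linking prints a line containing "host link"; the rest of the output format is not assumed, and hostLinkLines is a made-up helper name.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// hostLinkLines returns every line of linker -v output that mentions
// "host link", the marker for external linking described above.
func hostLinkLines(output string) []string {
	var found []string
	for _, line := range strings.Split(output, "\n") {
		if strings.Contains(line, "host link") {
			found = append(found, line)
		}
	}
	return found
}

func main() {
	// Reads the combined build output from standard input.
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading stdin:", err)
		os.Exit(1)
	}
	lines := hostLinkLines(string(data))
	if len(lines) == 0 {
		fmt.Println("no \"host link\" line found, so this looks like internal linking")
		return
	}
	fmt.Println("external linking, host linker invocation(s):")
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

It is meant to be fed the combined stdout and stderr of a build run with -ldflags=-v, for example by piping the build output into it.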

> The output of objdump -d go.o was the same for a failing and a successful build. However the error message from ld still refers to a directory in /dev/shm.

Is there any difference whether the tmpdir is in /dev/shm?

Author

@dwillemv dwillemv commented Jan 23, 2021

> Is there any difference whether the tmpdir is in /dev/shm?

The error message looks the same whether or not I specify a tmpdir, and regardless of which linkmode I use. The file path that the error message refers to is always a temp directory in /dev/shm. When I set a tmpdir, the only difference is that with linkmode set to external it generates some .o files in the tmpdir, but not with auto or internal.

Contributor

@cherrymui cherrymui commented Jan 25, 2021

> When I set a tmpdir, the only difference is that with linkmode set to external it generates some .o files in the tmpdir, but not with auto or internal.

I'm very confused by this. -tmpdir should not affect linkmode. If the link is external by default, it should emit some .o files. If it is not, it should not invoke gcc. Could you paste the output of go build -x -ldflags="-tmpdir=<somedir> -v"? Thanks.

Author

@dwillemv dwillemv commented Jan 26, 2021

I am struggling to generate output suitable to post here. The failure only triggers with two or more pkg/... paths and -x makes the output quite verbose as quite a few packages are involved.

Interestingly the failure did not happen for more than a thousand runs when the paths ./subpackage/... ./pkg2/cmd/program were passed, but did happen for ./subpackage/... ./pkg2/cmd/program/... the second time it ran.

Contributor

@cherrymui cherrymui commented Jan 26, 2021

It is okay to post a large output, maybe as an attachment, if you like. If you prefer not to do that, the output with only -ldflags="-tmpdir=<somedir> -v" would probably be helpful as well.

Author

@dwillemv dwillemv commented Jan 27, 2021

After adding -v to the ldflags some panics show up in the output:
/usr/bin/ld: final link failed: File truncated
collect2: error: ld returned 1 exit status

unexpected fault address 0x7f17ef7b5d32
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0x7f17ef7b5d32 pc=0x4e8e61]

goroutine 82 [running]:
runtime.throw(0x6d029c, 0x5)
/usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0xc00516ac28 sp=0xc00516abf8 pc=0x436512
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:739 +0x485 fp=0xc00516ac58 sp=0xc00516ac28 pc=0x44cc85
encoding/binary.littleEndian.PutUint32(...)
/usr/local/go/src/encoding/binary/binary.go:73
encoding/binary.(*littleEndian).PutUint32(0x8b2440, 0x7f17ef7b5d32, 0x4f9, 0x6435bb, 0x0)
<autogenerated>:1 +0x41 fp=0xc00516ac78 sp=0xc00516ac58 pc=0x4e8e61
cmd/link/internal/ld.(*relocSymState).relocsym(0xc00516af80, 0x4d23, 0x7f17ef7b5c5f, 0x5cc, 0x64368e)
/usr/local/go/src/cmd/link/internal/ld/data.go:574 +0x884 fp=0xc00516af20 sp=0xc00516ac78 pc=0x57ddc4
cmd/link/internal/ld.(*Link).reloc.func3(0xc000065880, 0xc000065c00, 0xc004cecd10)
/usr/local/go/src/cmd/link/internal/ld/data.go:672 +0xd6 fp=0xc00516afc8 sp=0xc00516af20 pc=0x600536
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc00516afd0 sp=0xc00516afc8 pc=0x46b161
created by cmd/link/internal/ld.(*Link).reloc
/usr/local/go/src/cmd/link/internal/ld/data.go:668 +0x10a

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0xc004cecd18)
/usr/local/go/src/runtime/sema.go:56 +0x45
sync.(*WaitGroup).Wait(0xc004cecd10)
/usr/local/go/src/sync/waitgroup.go:130 +0x65
cmd/link/internal/ld.(*Link).reloc(0xc000065880)
/usr/local/go/src/cmd/link/internal/ld/data.go:677 +0x118
cmd/link/internal/ld.Main(0x8718a0, 0x20, 0x20, 0x1, 0x7, 0x10, 0x0, 0x0, 0x6dbb6a, 0x1b, ...)
/usr/local/go/src/cmd/link/internal/ld/main.go:332 +0x1836
main.main()
/usr/local/go/src/cmd/link/main.go:68 +0x1dc

goroutine 81 [runnable]:
cmd/link/internal/ld.foldSubSymbolOffset(0xc000065c00, 0x1d0a, 0x1d08, 0xf20000)
/usr/local/go/src/cmd/link/internal/ld/data.go:126 +0x10b
cmd/link/internal/ld.(*relocSymState).relocsym(0xc005169f80, 0x65d1, 0x7f17eeed5580, 0xa8, 0xf23d6d)
/usr/local/go/src/cmd/link/internal/ld/data.go:340 +0xfb7
cmd/link/internal/ld.(*Link).reloc.func2(0xc000065880, 0xc000065c00, 0xc004cecd10)
/usr/local/go/src/cmd/link/internal/ld/data.go:664 +0xcf
created by cmd/link/internal/ld.(*Link).reloc
/usr/local/go/src/cmd/link/internal/ld/data.go:661 +0xd4

Contributor

@cherrymui cherrymui commented Jan 27, 2021

I'm really confused here. The panic comes after the invocation of ld? Invoking ld is the last step, so that shouldn't be possible. Can you paste the full, exact output of the linker with -v? That should include how ld (through gcc) is invoked.

Author

@dwillemv dwillemv commented Jan 27, 2021

Apologies, we would prefer that the information we post is as generic as possible.
The ld error is for */package/cmd/program1; */package/cmd/program2 compiles successfully after that, and the panic happens for */package/cmd/program3.
