Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/link: linkname directive on userspace variable can override runtime variable #72032

Open
srosenberg opened this issue Feb 28, 2025 · 5 comments
Assignees
Labels
BugReport Issues describing a possible bug in the Go implementation. compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@srosenberg
Copy link

Go version

go version go1.23.6 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/stan_cockroachlabs_com/.cache/go-build'
GOENV='/home/stan_cockroachlabs_com/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/stan_cockroachlabs_com/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/stan_cockroachlabs_com/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/stan_cockroachlabs_com/go/src/github.com/cockroachdb/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/stan_cockroachlabs_com/go/src/github.com/cockroachdb/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.6'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/stan_cockroachlabs_com/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1873938282=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Background

For years, CockroachDB has been using the go:linkname hack to stay ahead of the runtime. In this particular case, we've been using go:linkname sched runtime.sched for (periodically) getting a precise count of the runnable goroutines; in [1], you can see this has been running, at least as of Go 1.15.

The regression I'll describe below snuck into our master after the upgrade to Go 1.23 PR was merged [2]. Luckily, one of our nightlies tripped up [3]. What followed after was a very long (several hours) investigation. The issue is best illustrated with a pared down example.

Reproduction

Build the following code using go 1.22 and go 1.23,

cat linkname_regression.go

package main

import _ "unsafe"

type schedt struct{}

//go:linkname sched runtime.sched
var sched schedt

func main() {
	select {
	default:
		println("Hello World!")
	}
}

Output of ./linkname_regression_1_22,

Hello World!

Output of GOTRACEBACK=crash ./linkname_regression_1_23,

Quit (core dumped)

dlv core ./linkname_regression_1_23 core.linkname_regres

(dlv) bt
0  0x0000000000465763 in runtime.futex
   at /usr/local/go/src/runtime/sys_linux_amd64.s:558
1  0x000000000042b250 in runtime.futexsleep
   at /usr/local/go/src/runtime/os_linux.go:69
2  0x0000000000409bc5 in runtime.lock2
   at /usr/local/go/src/runtime/lock_futex.go:116
3  0x0000000000432b93 in runtime.lockWithRank
   at /usr/local/go/src/runtime/lockrank_off.go:24
4  0x0000000000432b93 in runtime.lock
   at /usr/local/go/src/runtime/lock_futex.go:52
5  0x0000000000432b93 in runtime.mcommoninit
   at /usr/local/go/src/runtime/proc.go:932
6  0x0000000000432612 in runtime.schedinit
   at /usr/local/go/src/runtime/proc.go:823
7  0x0000000000461a7c in runtime.rt0_go
   at /usr/local/go/src/runtime/asm_amd64.s:349
   
(dlv) print runtime.sched
main.schedt {}

Summary

As of Go 1.23, go:linkname sched runtime.sched, runtime.sched is linked to the local sched, instead of the other way around. This obviously causes memory corruption since the local struct is a subset of the target/remote. The fact of this regression can be further illustrated via objdump, by comparing the sizes of the linked runtime.sched; in 1.23, it's empty, and in 1.22 it has the expected size.

objdump -t ./linkname_regression_1_23 |grep "runtime.sched$"
000000000050d300 g     O .noptrbss	0000000000000000 runtime.sched
objdump -t ./linkname_regression_1_22 |grep "runtime.sched$"
00000000004daf00 g     O .bss	0000000000001aa8 runtime.sched

[1] https://github.com/search?q=%22go%3Alinkname+sched+runtime.sched%22&type=code
[2] cockroachdb/cockroach#140626
[3] cockroachdb/cockroach#141977 (comment)

What did you see happen?

Test executions would non-deterministically fail, suggesting some form of memory corruption. In hindsight, there wasn't any obvious change in [2], which could have caused it.

What did you expect to see?

We expected the go:linkname hack to continue working. The changes described in [1] don't mention the fact that a "Handshake" would result in linking the local (user) struct into the runtime. Granted the use of go:linkname is a hack, it has been tacitly supported until 1.23.

We have a fix, which basically moves our code into a forked version of the runtime. I think it's still worth mentioning that this was a surprising regression. I realize it will likely not be addressed. Nevertheless, perhaps this issue could be a warning sign for others. Hacking the runtime can cause you delayed pain many years after :)

[1] #67401

@gabyhelp gabyhelp added the BugReport Issues describing a possible bug in the Go implementation. label Feb 28, 2025
@ianlancetaylor ianlancetaylor changed the title go:linkname sched runtime.sched regression in 1.23.x cmd/link: linkname directive on userspace variable can override runtime variable Feb 28, 2025
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Feb 28, 2025
@ianlancetaylor
Copy link
Member

That is definitely odd behavior. After #67401 we should be combining a non-linkname variable in the runtime with a linktime variable in the user program. We should either separate the two variables or print an error.

This is related to the general problem of linkname on variables, which is that there is no separation between definition and use. Maybe we should finally fix that.

CC @golang/compiler

@prattmic prattmic added this to the Go1.25 milestone Feb 28, 2025
@prattmic prattmic added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 28, 2025
@cherrymui
Copy link
Member

cherrymui commented Mar 3, 2025

We do combine the two variables. But there is no guarantee which one we choose in the final binary. In particular, if we choose the one in the main package

type schedt struct{}

//go:linkname sched runtime.sched
var sched schedt

which is of size zero, the runtime will also operate on a zero-sized sched. And any reads/writes on its fields will end up reading/writing to different data, which would lead to weird behaviors.

The current rule is that for the two packages that declare the variable (regardless which one has linkname, or both), if one is statically initialized (DATA symbol in linker term) and the other is not (BSS symbol in linker term), we treat the initialized one as the definition and the other as a reference, and choose the initialized one (size and data content) in the binary. I think this also matches C toolchain's behavior with -fcommon.

If both are defined with initializers, the link fails.

If both are not initialized, like in the case of this issue, the choice is arbitrary. In the implementation it depends on the symbol loading order, which depends on many factors and is not guaranteed to be stable. This is where things changed between Go 1.22 and 1.23.

Perhaps we could have another heuristic: if both are uninitialized, we pick the one with larger size. This probably works for most cases -- the reference side wants to access some fields but not all. And it should make the current build work. Ideally both sides should have exactly same size (and layout, which is more expensive to check at link time). Perhaps we should require that in the future.

As Ian mentioned, the problem is that there is no separation between definition and use for linknamed variables. Perhaps we should introduce different directives for them. On the other hand, linkname should only be used in legacy code, which we expect to support without code change, and after #67401 we should not add more externally-visible linknames.

@ianlancetaylor
Copy link
Member

I note that picking the variable with the larger size is exactly how traditional C (and Fortran) linkers handle common symbols.

@mknyszek
Copy link
Contributor

mknyszek commented Mar 5, 2025

In triage, @cherrymui do you plan to look into this? Optimistically assigning you, but feel free to unassign.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BugReport Issues describing a possible bug in the Go implementation. compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Development

No branches or pull requests

7 participants