cmd/link: linkname directive on userspace variable can override runtime variable #72032
go:linkname sched runtime.sched
regression in 1.23.x
That is definitely odd behavior. After #67401 we shouldn't be combining a non-linkname variable in the runtime with a linkname variable in the user program. We should either separate the two variables or print an error. This is related to the general problem of linkname on variables, which is that there is no separation between definition and use. Maybe we should finally fix that. CC @golang/compiler
We do combine the two variables. But there is no guarantee which one we choose in the final binary. In particular, if we choose the one in the main package, which is of size zero, the runtime will also operate on a zero-sized …

The current rule is that for the two packages that declare the variable (regardless of which one has the linkname, or both), if one is statically initialized (a DATA symbol in linker terms) and the other is not (a BSS symbol in linker terms), we treat the initialized one as the definition and the other as a reference, and choose the initialized one (size and data content) in the binary. I think this also matches the C toolchain's behavior with … If both are defined with initializers, the link fails. If neither is initialized, as in this issue, the choice is arbitrary. In the implementation it depends on the symbol loading order, which depends on many factors and is not guaranteed to be stable. This is where things changed between Go 1.22 and 1.23.

Perhaps we could have another heuristic: if both are uninitialized, we pick the one with the larger size. This probably works for most cases -- the reference side wants to access some fields but not all. And it should make the current build work. Ideally both sides should have exactly the same size (and layout, which is more expensive to check at link time). Perhaps we should require that in the future.

As Ian mentioned, the problem is that there is no separation between definition and use for linknamed variables. Perhaps we should introduce different directives for them. On the other hand, linkname should only be used in legacy code, which we expect to support without code changes, and after #67401 we should not add more externally visible linknames.
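The resolution rule above can be modeled in a few lines. This is a hypothetical sketch, not the cmd/link implementation: linkSym, resolve and the preferLarger flag are invented names, the sizes are illustrative, and "the first argument wins" stands in for the load-order-dependent choice. It shows how the zero-sized user declaration can end up in the binary today, and how the proposed larger-size heuristic (the same rule traditional C linkers apply to common symbols) would keep the runtime's declaration instead.

```go
package main

import (
	"errors"
	"fmt"
)

// linkSym is a hypothetical, simplified view of one package's declaration of a
// linknamed variable. It is not the real cmd/link symbol representation.
type linkSym struct {
	pkg         string // declaring package
	size        int64  // size in bytes
	initialized bool   // true: statically initialized (DATA); false: BSS
}

var errDupDef = errors.New("duplicate definition: both declarations are initialized")

// resolve picks the declaration that ends up in the binary:
//   - an initialized (DATA) symbol is the definition and beats a BSS one;
//   - two initialized symbols fail the link;
//   - for two BSS symbols, preferLarger applies the proposed heuristic
//     (keep the larger one); otherwise the first argument wins, standing in
//     for the unstable, load-order-dependent choice.
func resolve(a, b linkSym, preferLarger bool) (linkSym, error) {
	switch {
	case a.initialized && b.initialized:
		return linkSym{}, errDupDef
	case a.initialized:
		return a, nil
	case b.initialized:
		return b, nil
	case preferLarger && b.size > a.size:
		return b, nil
	default:
		return a, nil
	}
}

func main() {
	// Illustrative sizes: a zero-sized user declaration vs. the runtime's.
	user := linkSym{pkg: "main", size: 0}
	rt := linkSym{pkg: "runtime", size: 1024}

	got, _ := resolve(user, rt, false)
	fmt.Println("load-order choice:", got.pkg, got.size) // load-order choice: main 0

	got, _ = resolve(user, rt, true)
	fmt.Println("larger-size heuristic:", got.pkg, got.size) // larger-size heuristic: runtime 1024
}
```

Note that the heuristic only changes the uninitialized/uninitialized case; the DATA-beats-BSS rule and the duplicate-definition error are unaffected.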
I note that picking the variable with the larger size is exactly how traditional C (and Fortran) linkers handle common symbols.
In triage, @cherrymui do you plan to look into this? Optimistically assigning you, but feel free to unassign.
Go version
go version go1.23.6 linux/amd64
Output of go env in your module/workspace:
What did you do?
Background
For years, CockroachDB has been using the go:linkname hack to stay ahead of the runtime. In this particular case, we've been using go:linkname sched runtime.sched to (periodically) get a precise count of the runnable goroutines; in [1], you can see this has been in use since at least Go 1.15. The regression I'll describe below snuck into our master after the Go 1.23 upgrade PR was merged [2]. Luckily, one of our nightlies tripped [3]. What followed was a very long (several hours) investigation. The issue is best illustrated with a pared-down example.
Reproduction
Build the following code using go 1.22 and go 1.23:
Output of ./linkname_regression_1_22:
Output of GOTRACEBACK=crash ./linkname_regression_1_23:
Summary
As of Go 1.23, with go:linkname sched runtime.sched, runtime.sched is linked to the local sched instead of the other way around. This causes memory corruption: the local struct is only a subset of the target (runtime) struct, yet the runtime keeps accessing the symbol as if it had the full size. The regression can be further illustrated via objdump by comparing the sizes of the linked runtime.sched: in 1.23 it's empty, and in 1.22 it has the expected size.
[1] https://github.com/search?q=%22go%3Alinkname+sched+runtime.sched%22&type=code
[2] cockroachdb/cockroach#140626
[3] cockroachdb/cockroach#141977 (comment)
What did you see happen?
Test executions would fail non-deterministically, suggesting some form of memory corruption. In hindsight, there wasn't any obvious change in [2] that could have caused it.
What did you expect to see?
We expected the go:linkname hack to continue working. The changes described in [1] don't mention that a "Handshake" would result in linking the local (user) struct into the runtime. Granted, the use of go:linkname is a hack, but it had been tacitly supported until 1.23.

We have a fix, which basically moves our code into a forked version of the runtime. I think it's still worth mentioning that this was a surprising regression. I realize it will likely not be addressed. Nevertheless, perhaps this issue can serve as a warning sign for others. Hacking the runtime can cause you delayed pain many years later :)
[1] #67401