cmd/link: occasional failures with "unreachable sym in relocation" since 2021-09-12 #49752

Closed
bcmills opened this issue Nov 23, 2021 · 12 comments
Labels: NeedsInvestigation, release-blocker

@bcmills
Member

bcmills commented Nov 23, 2021

greplogs --dashboard -md -l -e ': unreachable sym in relocation: '

2021-11-22T23:00:32-0244343/freebsd-amd64-13_0
2021-11-19T21:57:03-6027b21/freebsd-amd64-13_0
2021-11-16T19:18:26-8122e49-6c36c33/freebsd-amd64-12_2
2021-09-12T01:06:53-0d8a4bf/linux-riscv64-unmatched

CC @cherrymui @thanm
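
To see at a glance which builders are affected, the log identifiers above can be tallied per builder. The following is a minimal sketch, not part of greplogs: it assumes each identifier has the form `<timestamp>-<commit>[-<commit>]/<builder>`, and the helper names `builderOf` and `countByBuilder` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// builderOf returns the builder name from a log identifier, assuming the
// form "<timestamp>-<commit>[-<commit>]/<builder>". It reports false if the
// identifier has no "/" or nothing after it.
func builderOf(id string) (string, bool) {
	i := strings.LastIndex(id, "/")
	if i < 0 || i == len(id)-1 {
		return "", false
	}
	return id[i+1:], true
}

// countByBuilder tallies identifiers per builder, skipping malformed ones.
func countByBuilder(ids []string) map[string]int {
	counts := make(map[string]int)
	for _, id := range ids {
		if b, ok := builderOf(strings.TrimSpace(id)); ok {
			counts[b]++
		}
	}
	return counts
}

func main() {
	// The four identifiers reported in this issue.
	ids := []string{
		"2021-11-22T23:00:32-0244343/freebsd-amd64-13_0",
		"2021-11-19T21:57:03-6027b21/freebsd-amd64-13_0",
		"2021-11-16T19:18:26-8122e49-6c36c33/freebsd-amd64-12_2",
		"2021-09-12T01:06:53-0d8a4bf/linux-riscv64-unmatched",
	}
	counts := countByBuilder(ids)

	// Sort builder names so the output order is stable.
	builders := make([]string, 0, len(counts))
	for b := range counts {
		builders = append(builders, b)
	}
	sort.Strings(builders)
	for _, b := range builders {
		fmt.Printf("%s\t%d\n", b, counts[b])
	}
}
```

For the list above, three of the four failures fall on FreeBSD builders.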

@bcmills
Member Author

bcmills commented Nov 23, 2021

These failures started within the 1.18 cycle, so marking as release-blocker to determine whether this is a regression (and whether it may affect other platforms too).

@bcmills added the NeedsInvestigation, okay-after-beta1, and release-blocker labels Nov 23, 2021
@bcmills added this to the Go1.18 milestone Nov 23, 2021
@cherrymui
Member

cherrymui commented Nov 23, 2021

These failures look weird, especially being nondeterministic. Maybe it is due to memory corruption somewhere?

@thanm
Contributor

thanm commented Nov 23, 2021

Not really sure what to do with this bug, given the failure mode. Normally I would lease a builder and run some test or set of tests in a loop to try to reproduce -- in this case, even if I were able to reproduce it a few times, what then? I don't see any commonality in terms of the symbols that are suddenly being switched from reachable to unreachable.
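
The "run it in a loop" approach could be sketched roughly as follows. This is not goswarm or any builder tooling, only a hypothetical local loop: it reruns a command given on the command line and counts how many runs print the error string from the greplogs query. The names `runner`, `countMatches`, and `commandRunner` are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// failureMarker is the substring used in the greplogs query for this issue.
const failureMarker = ": unreachable sym in relocation: "

// runner performs one attempt and returns its combined output.
type runner func() (string, error)

// countMatches calls run n times and returns how many runs produced output
// containing marker. The error from run is ignored on purpose: a failing
// build exits non-zero, and only the output text matters here.
func countMatches(run runner, n int, marker string) int {
	hits := 0
	for i := 0; i < n; i++ {
		out, _ := run()
		if strings.Contains(out, marker) {
			hits++
		}
	}
	return hits
}

// commandRunner wraps an external command as a runner that captures both
// stdout and stderr.
func commandRunner(name string, args ...string) runner {
	return func() (string, error) {
		out, err := exec.Command(name, args...).CombinedOutput()
		return string(out), err
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: repro <command> [args...]")
		return
	}
	const attempts = 10 // arbitrary; a rare failure needs far more runs
	run := commandRunner(os.Args[1], os.Args[2:]...)
	hits := countMatches(run, attempts, failureMarker)
	fmt.Printf("%d of %d runs reported %q\n", hits, attempts, failureMarker)
}
```

Such a loop only tells you how often the failure occurs. As noted above, it does not by itself explain why unrelated symbols become unreachable.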

@bcmills
Member Author

bcmills commented Nov 23, 2021

> Maybe it is due to memory corruption somewhere?

“possible memory corruption on FreeBSD” is #46272, open since May. 😩

Still, it seems odd to me that this particular failure mode doesn't show up in the logs until recently. Maybe the memory corruption has a non-trivial interaction with the 1.18 GC changes? (Perhaps the corruption is triggered by having the GC active during some step of the linking process?)

@thanm
Contributor

thanm commented Nov 23, 2021

Just out of curiosity, how far back does "greplogs" go? What's the oldest log that it will fetch?

@bcmills
Member Author

bcmills commented Nov 23, 2021

Depends on what you've downloaded with fetchlogs. (It looks like the semi-arbitrary cutoff for my database for the main repo is currently 2019-06-06.)

@toothrot
Contributor

toothrot commented Dec 8, 2021

Checking on this as a release blocker. Are there any updates? @thanm

@cherrymui
Member

cherrymui commented Dec 8, 2021

Still looks like memory corruption. No new failures seem to have occurred since 11-22. I haven't tried to reproduce it.

@thanm
Contributor

thanm commented Dec 8, 2021

I kicked off a goswarm on freebsd-amd64-12_2 this morning to try to reproduce the issue. I'll leave it running for a day or two and see if that catches any new instances of the bug. I tried to do something similar for an AMD-specific VM (e.g. netbsd-amd64-9_0-n2d), but all.bash crashes almost immediately there (most of the time it doesn't make it past the bootstrap).

@thanm
Contributor

thanm commented Dec 8, 2021

OK, it took a while, but the swarm I started this morning did find a new instance of this problem on tip with freebsd-amd64-12_2 just now, so it definitely looks like this bug is alive and kicking.

@thanm
Contributor

thanm commented Dec 13, 2021

Related: #46272 (comment). I will retest this bug once we have an updated version of FreeBSD with the fix on our builders.

@cherrymui removed the okay-after-beta1 label Dec 14, 2021
@thanm
Contributor

thanm commented Jan 11, 2022

I ran a goswarm test (using the updated freebsd-amd64-13_0 builder image, which includes the fix) and did not detect any new failures over the course of a 40-hour run with 10 gomotes, so I am going to go ahead and close out this bug. See #46272 for more details on the kernel problem and the fix.
