Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: debug call injection tests contain many unsafe write barriers #49169

Open
mknyszek opened this issue Oct 26, 2021 · 1 comment
Open
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@mknyszek
Copy link
Contributor

mknyszek commented Oct 26, 2021

It's possible for the debug call injection tests to segfault, because (*debugCallHandler).inject and (*debugCallHandler).handle have write barriers but they're called in a signal handler. Normally write barriers would be prevented here via their caller, sighandler, which has go:nowritebarrierrec but unfortunately because they're called indirectly via the testSigtrap global variable, the analysis doesn't plumb through.

The failure is very rare; a GC has to be actively running during the injection. But, it is possible, and has been reproduced locally by myself and is observable on trybots, if you run the runtime tests enough times.

This is only a test problem, also. Real debug call injection implementations will usually be coordinated by a separate process; the tricky bit here is we're doing all the machinery in the signal handler of the same process.

I tried to do the simple thing and mark the two methods above as go:nowritebarrierrec, but there are enough write barriers in each function that it's tough to capture them all without the code being mostly fragile comments about why it's safe to elide all these write barriers. I think this test code just needs a rethink, so that the primary driver is not on the thread getting all the signals.

The trouble with that is dealing with the result of a panicking injected function. The panic value it returns may be the only reference to that value, so the signal handler needs to put it somewhere the GC can see. That's at odds with the fact that any such store would require a write barrier. There may be something clever here, but I haven't thought of it yet.

CC @prattmic @aclements

@mknyszek mknyszek added NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed NeedsFix The path to resolution is known, but the work has not been done. labels Oct 26, 2021
@mknyszek mknyszek added this to the Backlog milestone Oct 26, 2021
@mknyszek
Copy link
Contributor Author

mknyszek commented Nov 8, 2021

#49370 works around this for now, as it got more likely.

We should reconsider how the test implementation works.

@mknyszek mknyszek changed the title runtime: debug call injection tests sometimes segfault runtime: debug call injection tests contain many unsafe write barriers Nov 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

1 participant