
runtime: morestack on gsignal signal: trace/breakpoint trap due to g0 stack misattribution #43853

Closed
prattmic opened this issue Jan 22, 2021 · 4 comments

@prattmic (Member) commented Jan 22, 2021

In rare cases on linux/amd64 race builds we've seen crashes that look like:

fatal: morestack on gsignal

signal: trace/breakpoint trap (core dumped)

The root cause is signal delivery on a sigaltstack allocated very close to the g0 stack. When cgo is enabled, mstart estimates the g0 stack bounds (cgo side), but this is a rough estimate and the g0 stack.lo may actually be beyond the end of the g0 stack.

On signal delivery, adjustSignalStack may then incorrectly determine that the signal was delivered on the g0 stack. Since the overlap is likely to be very close to g0 stack.lo, functions in signal handling have a high probability of "running out of stack space" and calling morestack. Boom.

Here's one example of overlap I captured:

Our SP on sigtrampgo entry: 0x7f99841fe328
sigaltstack from sigcontext: [0x7f99841ef000, 0x7f99841ff000)
g0 stack from gp.m.g0.stack: [0x7f99841fded8, 0x7f99849fdad8)
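
To make the overlap concrete, here is a minimal sketch that runs the three captured values through a plain range check. The stackRange type and its contains method are illustrative names for this example, not the runtime's own definitions.

```go
package main

import "fmt"

// stackRange is a half-open address interval [lo, hi), matching how the
// bounds are written above. Illustrative only; not a runtime type.
type stackRange struct {
	lo, hi uint64
}

// contains reports whether sp lies inside the interval.
func (r stackRange) contains(sp uint64) bool {
	return sp >= r.lo && sp < r.hi
}

// Values copied from the captured example.
var sp uint64 = 0x7f99841fe328
var sigalt = stackRange{lo: 0x7f99841ef000, hi: 0x7f99841ff000}
var g0Stack = stackRange{lo: 0x7f99841fded8, hi: 0x7f99849fdad8}

func main() {
	fmt.Println("SP within sigaltstack:", sigalt.contains(sp))
	fmt.Println("SP within recorded g0 stack:", g0Stack.contains(sp))
	fmt.Println("bytes between SP and g0 stack.lo:", sp-g0Stack.lo)
}
```

Both checks report true, and SP sits only 1104 bytes above the recorded g0 stack.lo, which is why treating this SP as a g0 stack address leaves so little apparent headroom.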

mstart contains a fudge factor of 1024 to try to address this inaccuracy, but checking against pthread_attr_getstack indicates that the mstart SP is actually 9616 bytes below the top of the stack. That figure may be off by 1 page (4096 bytes), which I need to double check, but either way it is well over 1024 bytes.
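
The arithmetic can be sketched as follows. This is a rough model, not the runtime's code: it assumes the estimate has the form lo = sp - size + fudge, that the real stack spans exactly size bytes below its top, and that the thread stack size is 8 MiB (a value chosen because it is consistent with the captured g0 bounds). The function names are hypothetical.

```go
package main

import "fmt"

const (
	fudge       = 1024 // fudge factor mentioned above
	measuredGap = 9616 // reported SP-to-top distance; may be off by one 4096-byte page
)

// estimateLo models the estimated lower bound. ASSUMPTION: the estimate
// takes the form sp - size + fudge.
func estimateLo(sp, size uint64) uint64 {
	return sp - size + fudge
}

// actualLo models the true lower bound, assuming the stack spans exactly
// size bytes below a top that sits gap bytes above sp.
func actualLo(sp, size, gap uint64) uint64 {
	return sp + gap - size
}

func main() {
	const size = 8 << 20            // ASSUMPTION: 8 MiB thread stack
	sp := uint64(0x7f99849fdad8)    // g0 stack.hi from the captured example, used as the mstart SP

	est := estimateLo(sp, size)
	act := actualLo(sp, size, measuredGap)
	fmt.Printf("estimated lo: %#x\n", est)
	fmt.Printf("modelled true lo: %#x\n", act)
	fmt.Println("estimate undershoots by:", act-est, "bytes")
}
```

Under these assumptions the estimated lower bound comes out at 0x7f99841fded8, the same value as the captured g0 stack.lo, and it lies 8592 bytes (9616 - 1024) below the modelled true bottom of the stack. Any sigaltstack placed in that gap would appear to overlap the g0 stack.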

cc @cherrymui @aclements

@prattmic (Member, Author) commented Jan 22, 2021

There is pthread_attr_getstack, which can provide accurate stack bounds. However, I'm not convinced we can use this portably. For example, glibc's implementation looks like it always succeeds, but NetBSD's appears to be able to return NULL for the stack address.

@gopherbot commented Jan 22, 2021

Change https://golang.org/cl/285772 mentions this issue: runtime: check for g0 stack last in signal handler
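
Going only by the CL title, the idea is that the order of the range checks matters when the ranges overlap. The sketch below illustrates that point with the captured values; it is not the CL's code, and the candidate type and classify function are hypothetical names.

```go
package main

import "fmt"

// candidate pairs a stack name with a half-open address range [lo, hi).
// Illustrative only; not a runtime type.
type candidate struct {
	name   string
	lo, hi uint64
}

// classify returns the name of the first candidate whose range contains sp,
// so earlier entries win when ranges overlap.
func classify(sp uint64, order []candidate) string {
	for _, c := range order {
		if sp >= c.lo && sp < c.hi {
			return c.name
		}
	}
	return "unknown"
}

// Values copied from the captured example earlier in the thread.
var sigalt = candidate{"sigaltstack", 0x7f99841ef000, 0x7f99841ff000}
var g0 = candidate{"g0", 0x7f99841fded8, 0x7f99849fdad8}

func main() {
	sp := uint64(0x7f99841fe328)
	fmt.Println("g0 checked first:", classify(sp, []candidate{g0, sigalt}))
	fmt.Println("g0 checked last: ", classify(sp, []candidate{sigalt, g0}))
}
```

With the g0 range checked first, the overlapping SP is attributed to g0; with it checked last, the same SP is attributed to the sigaltstack, whose bounds come from the signal context rather than from an estimate.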

@cherrymui (Contributor) commented Jan 22, 2021

#26061 is related.

@gopherbot closed this in 3a778ff Jan 22, 2021

@prattmic (Member, Author) commented Jan 22, 2021

Shall we do a backport for this? It certainly affects 1.15 and I believe 1.14 as well.
