Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.Sign up
runtime: print all threads in GOTRACEBACK >= all #13161
Currently, GOTRACEBACK=all is a misnomer. It prints stacks for all goroutines that happen to be non-running or running on the current OS thread, but it does not print stacks for goroutines that are running on other OS threads. This is frustrating. For purely internal reasons, it's currently necessary to set GOTRACEBACK=crash in order to get stacks for goroutines on other threads, but that also gets you runtime frames and an abort at the end, which is often undesirable.
We should make GOTRACEBACK=all (or higher) print stacks for all goroutines, regardless of what thread they're running on. This will make "all" do what it says in the name and will make the only difference between "system" and "crash" be whether or not it aborts at the end of the traceback.
In other words, this is the current behavior of the GOTRACEBACK settings:
This is what it should be:
With this, we would eliminate the distinction between "show other goroutines" and "show other threads", and each GOTRACEBACK level would enable exactly one additional feature.
We could do this using the same signal hand-off mechanism GOTRACEBACK=crash currently uses to interrupt the other threads. Historically we couldn't do this because this mechanism wasn't entirely robust, but it's been improved to the point where it should be reliable.
I looked into this for an hour or so. The mechanism does not seem quite robust enough still. After arranging for GOTRACEBACK=all to pass a SIGQUIT around like in GOTRACEBACK=crash, I only managed to find the missing goroutine running on another thread about half the time. I assume the other half of the time it had stopped by the time the SIGQUIT came in. We will probably need to be more aggressive about searching for the missing goroutines in order to be sure to print them all. And doing so will require cleaning up the SIGQUIT token passing a bit, so I'm not posting my code here. It was awful.
Goroutine 7 is the one that only shows up half the time.
Now that we have #24543 (asynchronous preemption) we could make it so that
Also, just to reference what restarted this conversation: census-instrumentation/opencensus-go#1200 was missing a goroutine in the default traceback, and folks had to add