runtime: print all threads in GOTRACEBACK >= all #13161

Open
aclements opened this Issue Nov 5, 2015 · 3 comments

@aclements
Member

aclements commented Nov 5, 2015

Currently, GOTRACEBACK=all is a misnomer. It prints stacks for all goroutines that happen to be non-running or running on the current OS thread, but it does not print stacks for goroutines that are running on other OS threads. This is frustrating. For purely internal reasons, it's currently necessary to set GOTRACEBACK=crash in order to get stacks for goroutines on other threads, but that also gets you runtime frames and an abort at the end, which is often undesirable.

We should make GOTRACEBACK=all (or higher) print stacks for all goroutines, regardless of what thread they're running on. This will make "all" do what it says in the name and will make the only difference between "system" and "crash" be whether or not it aborts at the end of the traceback.

In other words, this is the current behavior of the GOTRACEBACK settings:

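| | none | single | all | system | crash |
|---|---|---|---|---|---|
| show user frames | N | Y | Y | Y | Y |
| show runtime frames | N | N | N | Y | Y |
| show other goroutines | N | N | Y | Y | Y |
| show other threads | N | N | N | N | Y |
| abort | N | N | N | N | Y |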

This is what it should be:

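| | none | single | all | system | crash |
|---|---|---|---|---|---|
| show user frames | N | Y | Y | Y | Y |
| show runtime frames | N | N | N | Y | Y |
| show other goroutines | N | N | Y | Y | Y |
| show other threads | N | N | Y | Y | Y |
| abort | N | N | N | N | Y |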

With this, we would eliminate the distinction between "show other goroutines" and "show other threads", and each GOTRACEBACK level would enable exactly one additional feature.
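To make the "exactly one additional feature" claim concrete, here is a minimal sketch that models the proposed settings as data and checks the property. The `level` type, the `proposed` table, and the `features` helper are hypothetical names for illustration only; nothing here is runtime code, and "other goroutines" is assumed to already include goroutines running on other threads, as proposed.

```go
package main

import "fmt"

// level is a hypothetical model of what one GOTRACEBACK setting would
// print under the proposal. It is not a type from the Go runtime.
type level struct {
	name            string
	userFrames      bool
	runtimeFrames   bool
	otherGoroutines bool // assumed to cover goroutines on other threads too
	abort           bool
}

// proposed mirrors the "what it should be" table, with the
// "other goroutines" and "other threads" rows merged into one feature.
var proposed = []level{
	{"none", false, false, false, false},
	{"single", true, false, false, false},
	{"all", true, false, true, false},
	{"system", true, true, true, false},
	{"crash", true, true, true, true},
}

// features counts how many features a level enables.
func features(l level) int {
	n := 0
	for _, on := range []bool{l.userFrames, l.runtimeFrames, l.otherGoroutines, l.abort} {
		if on {
			n++
		}
	}
	return n
}

func main() {
	for i, l := range proposed {
		fmt.Printf("%-6s enables %d feature(s)\n", l.name, features(l))
		if i > 0 && features(l) != features(proposed[i-1])+1 {
			fmt.Println("  unexpected: not exactly one more than the previous level")
		}
	}
}
```

Under the current behavior the same check would fail at "crash", which adds both "show other threads" and "abort" in one step.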

We could do this using the same signal hand-off mechanism GOTRACEBACK=crash currently uses to interrupt the other threads. Historically we couldn't do this because this mechanism wasn't entirely robust, but it's been improved to the point where it should be reliable.
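As a rough analogy for the hand-off idea, the toy below passes a token through a chain of simulated threads, each of which records a line when it holds the token and then hands it on. This is only a sketch using goroutines and channels: it does not use signals, it is not how the runtime implements the mechanism, and `dumpAll` is a hypothetical name.

```go
package main

import "fmt"

// dumpAll simulates n threads that each "dump their stack" when they
// receive the token, then pass the token to the next thread.
// It is a toy model; the real runtime hands off a signal between OS threads.
func dumpAll(n int) []string {
	// chans[i] delivers the token to simulated thread i;
	// chans[n] returns the token to the caller at the end.
	chans := make([]chan []string, n+1)
	for i := range chans {
		chans[i] = make(chan []string)
	}
	for i := 0; i < n; i++ {
		go func(id int) {
			dumps := <-chans[id] // wait for the token
			dumps = append(dumps, fmt.Sprintf("thread %d: stack dumped", id))
			chans[id+1] <- dumps // hand the token to the next thread
		}(i)
	}
	chans[0] <- nil // the crashing thread starts the hand-off
	return <-chans[n]
}

func main() {
	for _, line := range dumpAll(3) {
		fmt.Println(line)
	}
}
```

In this model every thread is guaranteed to be waiting when the token arrives. The real mechanism has no such guarantee, which is why its robustness matters.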

/cc @rsc @ianlancetaylor @randall77

@rsc rsc modified the milestones: Go1.7, Go1.7Early Dec 28, 2015

@bradfitz

Member

bradfitz commented May 5, 2016

@aclements, please decide if this is happening now or kick it down the road and update its milestone.

@aclements

Member

aclements commented May 5, 2016

Too invasive for the freeze. Commencing kicking.

@aclements aclements modified the milestones: Go1.8Early, Go1.7Early May 5, 2016

@quentinmit quentinmit added the NeedsFix label Sep 30, 2016

@rsc

Contributor

rsc commented Oct 20, 2016

I looked into this for an hour or so. The mechanism still does not seem quite robust enough. After arranging for GOTRACEBACK=all to pass a SIGQUIT around as GOTRACEBACK=crash does, I only managed to find the missing goroutine running on another thread about half the time. I assume the other half of the time it had stopped by the time the SIGQUIT came in. We will probably need to be more aggressive about searching for the missing goroutines in order to be sure to print them all. And doing so will require cleaning up the SIGQUIT token passing a bit, so I'm not posting my code here. It was awful.

Test program:

package main

import "time"

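func main() {
    // Three goroutines that block forever; these are parked, not running.
    for i := 0; i < 3; i++ {
        go func() {
            select {}
        }()
    }
    // One goroutine that spins, so it is typically still running on
    // another OS thread when main panics.
    go func() {
        for {
        }
    }()
    // Give the goroutines time to start before crashing.
    time.Sleep(2 * time.Millisecond)
    panic(1)
}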

Goroutine 7 is the one that only shows up half the time.
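To check repeated runs for a missing goroutine without reading each dump by eye, one option is to scan the traceback for `goroutine N [state]:` header lines. The sketch below does that. `goroutineIDs` is a hypothetical helper, and the sample text is hand-written for illustration rather than captured from a real run of the test program.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// header matches the start of a goroutine section in a traceback,
// for example "goroutine 7 [running]:".
var header = regexp.MustCompile(`(?m)^goroutine (\d+) \[`)

// goroutineIDs returns the set of goroutine IDs that have a header
// line in the given traceback text.
func goroutineIDs(traceback string) map[int]bool {
	ids := make(map[int]bool)
	for _, m := range header.FindAllStringSubmatch(traceback, -1) {
		if id, err := strconv.Atoi(m[1]); err == nil {
			ids[id] = true
		}
	}
	return ids
}

func main() {
	// Illustrative sample only; not real output from the test program.
	sample := "panic: 1\n\n" +
		"goroutine 1 [running]:\nmain.main()\n\n" +
		"goroutine 5 [select (no cases)]:\nmain.main.func1()\n"
	ids := goroutineIDs(sample)
	fmt.Println("goroutine 7 present:", ids[7])
}
```

Feeding the saved stderr of many runs through a helper like this would give a count of how often a given goroutine is absent.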

@rsc rsc modified the milestones: Go1.9Early, Go1.8Early Oct 20, 2016

@bradfitz bradfitz modified the milestones: Go1.9Maybe, Go1.9Early May 3, 2017

@aclements aclements modified the milestones: Go1.10, Go1.9Maybe Jul 18, 2017

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@aclements aclements modified the milestones: Go1.11, Unplanned Jul 3, 2018
