Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: a sane, simple, Goish way to handle uncaught panics #16340

Closed
zellyn opened this issue Jul 13, 2016 · 10 comments

Comments

Projects
None yet
6 participants
@zellyn
Copy link

commented Jul 13, 2016

Proposal: a sane, simple, Goish way to handle uncaught panics

I'd like to discuss a mechanism for catching/handling/logging panics before they cause the entire Go program to exit, similar to Java's setDefaultUncaughtExceptionHandler, but Goish.

The problem

You've carefully wrapped your calls to that third-party library with a defer/recover, but then that code executes go func() { x := 0; fmt.Println(3/x) }(), and your entire process exits before you can notice, log a stack trace, recover, etc.

Basic mechanism:

Add runtime.SetUncaughtPanicDeferred(func()), which accepts and sets a (global) no-arg function.

When a panic occurs, all deferred functions are popped and executed. If all deferred functions run without recovering, _and the entire process is about to exit, and the global handler function is not nil, execute it as it it were a normal deferred function.

As a normal deferred function, the handler may call recover, may exit without error, or may re-panic.

What this is not:

  • Atexit: "not lgtm. Atexit is a way to make your program wedge instead of exiting. It's too easy to misuse, even limited to package testing." -- rsc
  • OnPanic: "This cannot go into the runtime. It is a disastrous idea." -- Rob Pike
  • Trying to run code while crashing, or running code in a weird, unusual way. The pseudo-deferred is run in precisely the same manner that an actual deferred function would have run, had there been one.
  • a way to try to catch things that normal defer/recover pairs would not catch

Alternative solutions/proposals:

  • panicwrap: fork, and have the parent parse stderr of the child to watch for panics

  • https://groups.google.com/forum/#!topic/golang-nuts/xahrOSZYpf4 - have panic/recover work on goroutine subtrees

  • writing a wrapper function, and using it every time you would have used the go keyword:

    func Go(f func()) {
        go func() {
            defer func() {
                if p := recover(); p != nil {
                    logging.Errorf("Uncaught panic from Goroutine: %v", p)
                    panic(p)
                }
            }()
            f()
        }()
    }
    

Implementation concerns

  • gccgo has its own implementation of panic and recover, so this would require changes there too.
  • currently, deferred functions are followed by a small piece of code that handles what happens if they return - a similar piece of code would need to be generated for the pseudo-deferred function

Questions

The biggest question: is this useful/necessary? I chatted with a couple of Go team members, and they didn't immediately hate the idea…

In my particular case, I'm working with a deployment system that reliably logs to persistent storage (via Kafka) when using explicit calls to logging functions, but loses panics dumped to stderr -- a bad "12-factor" violation that we plan to fix when we can.

But, it does seem like it comes up often (I quoted only some of the past discussions), and has spawned some … interesting … workarounds.

@minux

This comment has been minimized.

Copy link
Member

commented Jul 13, 2016

@minux minux added the Proposal label Jul 13, 2016

@josharian

This comment has been minimized.

Copy link
Contributor

commented Jul 13, 2016

If a third party library panics in an unexpected place, its internal data structures might be inconsistent. Continuing is worse than crashing.

For your needs, I wonder whether something much more specific would suffice--some way to set what fd uncaught panics/throws log to. Although I'd be surprised if there weren't existing tools to help 12 factor apps send stdout/stderr to more appropriate places.

@davecheney

This comment has been minimized.

Copy link
Contributor

commented Jul 14, 2016

Not lgtm I'm afraid. I'm a firm advocate of crash only software.

On Wed, 13 Jul 2016, 08:41 Josh Bleecher Snyder notifications@github.com
wrote:

If a third party library panics in an unexpected place, its internal data
structures might be inconsistent. Continuing is worse than crashing.

For your needs, I wonder whether something much more specific would
suffice--some way to set what fd uncaught panics/throws log to. Although
I'd be surprised if there weren't existing tools to help 12 factor apps
send stdout/stderr to more appropriate places.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#16340 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAAcA-IdnfZNCiZXoGeZBQ_dHoUt6GM-ks5qVPj7gaJpZM4JLBzn
.

@zellyn

This comment has been minimized.

Copy link
Author

commented Jul 15, 2016

This proposal was trying to allay crash-only concerns by doing no more than
a defer/recover could have done had you added one. However, the concern of
leaving third party libraries in an inconsistent state of you're foolish
enough to recover and do anything other than log and re-panic is a killer.
:-(

Perhaps the idea of being able to catch the final stack trace output would
still work...

On Thu, Jul 14, 2016, 4:23 PM Dave Cheney notifications@github.com wrote:

Not lgtm I'm afraid. I'm a firm advocate of crash only software.

On Wed, 13 Jul 2016, 08:41 Josh Bleecher Snyder notifications@github.com
wrote:

If a third party library panics in an unexpected place, its internal data
structures might be inconsistent. Continuing is worse than crashing.

For your needs, I wonder whether something much more specific would
suffice--some way to set what fd uncaught panics/throws log to. Although
I'd be surprised if there weren't existing tools to help 12 factor apps
send stdout/stderr to more appropriate places.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#16340 (comment), or
mute
the thread
<
https://github.com/notifications/unsubscribe/AAAcA-IdnfZNCiZXoGeZBQ_dHoUt6GM-ks5qVPj7gaJpZM4JLBzn

.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#16340 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AACDWWIOahF2u47V0pbnGEtJ0Pb34clgks5qVpregaJpZM4JLBzn
.

@minux

This comment has been minimized.

Copy link
Member

commented Jul 15, 2016

@zellyn

This comment has been minimized.

Copy link
Author

commented Jul 15, 2016

@minux I agree the answer in my particular case is to fix the deploy environment. However, this has come up enough times that I thought my solution would be generally useful.

But I think the third-party library concern kills this. I'm going to close this proposal, so it doesn't clutter things up.

@tejasmanohar

This comment has been minimized.

Copy link

commented Apr 4, 2017

I think this is an important feature. Here's a real world example-- I have a buffered stats client that needs to be flushed before exit, but I can't easily do this on panic in a program with a lot of goroutines. Currently, I make it flush on os.Interrupt, other signals, and before the program gracefully exits but not when the failure is unexpected (which is arguably more important than when things are OK).

@davecheney

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2017

@tejasmanohar

This comment has been minimized.

Copy link

commented Apr 4, 2017

@davecheney Of course, in a perfect world, we (or the runtime) would never panic, but that's besides the point. It should be easy to write debuggable systems that can gracefully handle failures. Stats, logs, and other buffered things are most useful when things go wrong. Another example is I may want to print some info in the stack of other goroutines atexit.

@zellyn

This comment has been minimized.

Copy link
Author

commented Apr 4, 2017

I agree this is a useful feature. You are unlikely to have any success though: although my solution would be equivalent to wrapping all go calls in bog-standard defer recover handlers, you are likely to get the "You can't assume you can compute while crashing" response from folks.

Godspeed, my friend.

ps. To be fair, wrapping even bog-standard defer/recover pairs around third-party library-spawned goroutines is a bit wonky. Oh well.

@golang golang locked and limited conversation to collaborators Apr 4, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.