Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider handling panics of goroutines #10

Closed
GiedriusS opened this issue Mar 23, 2019 · 5 comments
Closed

Consider handling panics of goroutines #10

GiedriusS opened this issue Mar 23, 2019 · 5 comments

Comments

@GiedriusS
Copy link

GiedriusS commented Mar 23, 2019

run.Group spawns one or more goroutines which execute given functions. However, goroutine do not and cannot inherit panic handlers from parent goroutines so a panic() in one of the child goroutines will kill the whole program. Please consider adding a new parameter to Group.Run() which will add a recover() and then return it as an error to the caller. Until then we are forced to add a wrapper function which will do that like here: thanos-io/thanos#932.

@GiedriusS
Copy link
Author

GiedriusS commented Mar 23, 2019

@peterbourgon cc. Would you be open to adding this to oklog/run?

@peterbourgon
Copy link
Member

No, recovering from panics would definitely be incorrect in a tool like run.Group. With few exceptions, panics should crash your program.

@bwplotka
Copy link

Thx for quick response @peterbourgon

With few exceptions,

Can you give example of those exception?

I think I agree that panics should crash your app / microservice. However currently we have case like this: thanos-io/thanos#874 particularly, some nil pointer exception for no really reason caused by this function (not filling a Posting): https://github.com/improbable-eng/thanos/blob/master/pkg/store/bucket.go#L1357:29

The issue is, that sure, panics should crash, but it completely gives almost 0 debuggability for issues like this. We got a gRPC request which nukes the Go binary leaving us with nothing just meaningless stack trace. Only few people can repro this, on few certain queries that touches few certain blocks (:

The optional logic for run.Group that recovers panics is something useful here, probably controled by some flag, right? Anyway, we are happy host some wrapper over run.Group which does that in Thanos for now.

I believe, that once such issue is fixed we can get our panics to crash app again. ;p

@peterbourgon
Copy link
Member

Can you give example of those exception?

net/http installs a panic handler with each request-serving goroutine, which I personally wouldn't do if I were designing the package from scratch, but makes reasonable sense. Other than that class of service, no other exception comes to mind.

The issue is, that sure, panics should crash, but it completely gives almost 0 debuggability for issues like this. We got a gRPC request which nukes the Go binary leaving us with nothing just meaningless stack trace.

Adding panic recovery to a run.Group actor yourself is trivial.

package main

import (
	"context"
	"log"
	"time"

	"github.com/oklog/run"
)

func main() {
	var g run.Group
	{
		ctx, cancel := context.WithCancel(context.Background())
		g.Add(func() error {
			return foo(ctx)
		}, func(error) {
			cancel()
		})
	}
	log.Printf("exit: %v", g.Run())
}

func foo(ctx context.Context) error {
	oneshot := time.After(time.Second)
	for {
		select {
		case <-oneshot:
			panic("kaboom")
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
$ go run pr.go
panic: kaboom

goroutine 4 [running]:
main.foo(0x10eaf60, 0xc00005a0c0, 0xc000044788, 0x109a6c5)
        /Users/pbourgon/tmp/pr.go:29 +0x141
main.main.func1(0x10aadc0, 0x0)
        /Users/pbourgon/tmp/pr.go:16 +0x33
github.com/oklog/run.(*Group).Run.func1(0xc000022120, 0xc00000c060, 0xc000010200)
        /Users/pbourgon/src/github.com/oklog/run/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
        /Users/pbourgon/src/github.com/oklog/run/group.go:37 +0xbe
exit status 2
		ctx, cancel := context.WithCancel(context.Background())
		g.Add(func() error {
+			defer func() {
+				if x := recover(); x != nil {
+					log.Printf("debug: %v", x)
+				}
+			}()
			return foo(ctx)
		}, func(error) {
			cancel()
		})
$ go run pr.go
2019/03/25 10:23:10 debug: kaboom
2019/03/25 10:23:10 exit: <nil>

Is there a reason this won't work?

@GiedriusS
Copy link
Author

It will and that's more or less what we are planning to do however we thought it would be nicer to add this kind of functionality into oklog/run - it would be nice to just be able to pass true or false to catch panics and return them as error objects but I understand your POV as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants