New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

context: ease debugging of where a context was canceled? #26356

Open
matthewceravolo opened this Issue Jul 12, 2018 · 17 comments

Comments

Projects
None yet
8 participants
@matthewceravolo

matthewceravolo commented Jul 12, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.10 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/home/matthew/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/matthew/work"
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build356255000=/tmp/go-build -gno-record-gcc-switches"

What did you do?

used context.WithTimeout() to make requests to google calendar api and outlook calendar api

If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.

What did you expect to see?

Making requests using contexts with timeouts should cancel when the timeout is reached

What did you see instead?

Contexts with timeouts are instantly failing with "context canceled" even though the timeout is set to time.Minute. The error goes away if I remove the timeout context and use one without any limit. It also seems to be transient to some extent

@gopherbot gopherbot added this to the Unreleased milestone Jul 12, 2018

@ianlancetaylor ianlancetaylor changed the title from x/net context canceled to context: context canceled Jul 12, 2018

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jul 12, 2018

Sorry, but there isn't enough information here to say anything useful. Please show us your code, or tell us how to recreate the problem. Thanks.

@matthewceravolo

This comment has been minimized.

matthewceravolo commented Jul 12, 2018

import (
	"net/http"
	"os"
	"time"

	"golang.org/x/net/context"
	"golang.org/x/oauth2"
	"golang.org/x/oauth2/google"
)

var (
	backgroundContext = context.Background()
	oauthConfig       = &oauth2.Config{
		ClientID:     os.Getenv("GOOGLE_PROJECT_CLIENT_ID"),
		ClientSecret: os.Getenv("GOOGLE_PROJECT_SECRET_KEY"),
		Endpoint:     google.Endpoint,
	}
)

type calendarClient struct {
	*calendar.Service
	requestTimeout time.Duration
}

// NewCalendarClient creates a new authenticated Google Calendar client.
func NewCalendarClient(refreshToken string) (wf.Client, error) {
	client, err := newClient(refreshToken)
	if err != nil {
		return nil, err
	}

	service, err := calendar.New(client)
	if err != nil {
		return nil, err
	}

	return &calendarClient{Service: service, requestTimeout: time.Minute}, nil
}

func newClient(refreshToken string) (*http.Client, error) {
	contextWithTimeout, cancel := context.WithTimeout(backgroundContext, time.Minute)
	defer cancel()
	return oauthConfig.Client(contextWithTimeout, &oauth2.Token{RefreshToken: refreshToken}), nil
}

func (client *calendarClient) GetCalendars() (*calendar.CalendarList, error) {
	contextWithTimeout, cancel := context.WithTimeout(backgroundContext, time.Minute)
	defer cancel()
	return client.CalendarList.List().Context(contextWithTimeout).Do()
}

The issue is that when we call GetCalendars for example, it will instantly return "context canceled". When I use an empty context with no timeout, everything works as expected. This all began failing in the past couple of days since updates were made in the /x/net package

@meirf

This comment has been minimized.

Member

meirf commented Jul 13, 2018

I think the behavior you're seeing is expected. I don't see a Go bug.

NewClient (called by oauthConfig.Client):

The returned client is not valid beyond the lifetime of the context.

Your code has:

func newClient(refreshToken string) (*http.Client, error) {
    contextWithTimeout, cancel := context.WithTimeout(backgroundContext, time.Minute)
    defer cancel()
    return oauthConfig.Client(contextWithTimeout, &oauth2.Token{RefreshToken: refreshToken}), nil
}

The defer cancel() causes the context to be cancelled before the client even gets to calendar.New.

"When I use an empty context with no timeout, everything works as expected. This all began failing in the past couple of days since updates were made in the /x/net package." You are describing to things that might have changed in your code (no longer using empty context, changes in x/net). If my theory above is correct: if you undo the x/net changes (to only change one variable) you'll see that any non-empty context will still have the problem you're seeing.

@bcmills

This comment has been minimized.

Member

bcmills commented Jul 13, 2018

I've seen this sort of issue crop up several times now.

I wonder if context.Context should record a bit of caller information (say, the PC of the immediate calling function) in context.WithTimeout and in calls to the CancelFunc returned by context.WithCancel. Then we could add a debugging hook to interrogate why a particular context.Context was cancelled.

Perhaps something along the lines of:

package context

// A DoneReasoner describes the reason a Context is done.
// The Context implementations returned by this package implement DoneReasoner.
type DoneReasoner interface {
	DoneReason() string
}

// DoneReason returns a human-readable description of the reason that ctx.Done() is closed,
// or the empty string if ctx.Done() is not closed.
func DoneReason(ctx context.Context) string {
	select {
	case <-ctx.Done():
	default:
		return ""
	}
	if r, ok := ctx.(DoneReasoner); ok {
			return r.DoneReason()
	}
	return ctx.Err().Error()
}
@bcmills

This comment has been minimized.

Member

bcmills commented Jul 13, 2018

(CC @Sajmani @bradfitz for Context debuggability.)

@bradfitz bradfitz changed the title from context: context canceled to context: ease debugging of where a context was canceled? Jul 13, 2018

@matthewceravolo

This comment has been minimized.

matthewceravolo commented Jul 13, 2018

@meirf I think there was a misunderstanding in how I described the problem. This code has always been using two contexts like I described but only in the past two days since we pulled updates to the vendor package did it start failing. There were not any changes whatsoever related to any of this code for several months besides pulling in the package update.

Only after we pulled in the update and everything started failing, we switched to contexts without timeouts (still multiple) and started seeing things succeed. If your explanation is correct, the defer cancel was not previously working as expected

However, I believe your explanation is incorrect because the error message is returned when GetCalendars is called, not on client retrieval. We always receive a calendarClient back, so calendar.New is being called correctly

@matthewceravolo matthewceravolo changed the title from context: ease debugging of where a context was canceled? to context.WithTimeout is returning context canceled immediately Jul 13, 2018

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jul 13, 2018

Do you happen to have a record of which revision of the context package you were using previously, and which one you updated to?

@bcmills

This comment has been minimized.

Member

bcmills commented Jul 13, 2018

only in the past two days since we pulled updates to the vendor package did it start failing.

Which vendored packages did you update? (Just x/net, or also x/oauth2?)

@matthewceravolo

This comment has been minimized.

matthewceravolo commented Jul 13, 2018

I've narrowed down the precise commit where things began to fail, and there are more vendor updates here than I'd previously thought. It looks like both packages (x/net and x/oauth2) were updated and that the previous version was decently old such that it may be very difficult to narrow anything down, but here are a couple of samples from the revisions:

"checksumSHA1": "dr5+PfIRzXeN+l1VG+s0lea9qz8=", => "checksumSHA1": "GtamqiJoL7PGHsN454AoffBFMa8=",
"path": "golang.org/x/net/context", => "path": "golang.org/x/net/context",
"revision": "f5079bd7f6f74e23c4d65efa0f4ce14cbd6a3c0f", => "revision": "1e491301e022f8f977054da4c2d852decd59571f",
"revisionTime": "2017-07-19T21:11:51Z" => "revisionTime": "2018-05-30T06:29:46Z"

"checksumSHA1": "hyK05cmzm+vPH1OO+F1AkvES3sw=", => "checksumSHA1": "xaHFy3C2djwUXtiURpX3/ruY8IA=",
"path": "golang.org/x/oauth2", => "path": "golang.org/x/oauth2",
"revision": "314dd2c0bf3ebd592ec0d20847d27e79d0dbe8dd", => "revision": "1e0a3fa8ba9a5c9eb35c271780101fdaf1b205d7",
"revisionTime": "2016-12-14T09:25:55Z" => "revisionTime": "2018-06-02T17:55:33Z"

Also, I re-tested the above code by leaving everything the same and completely removing the second context.WithTimeout inside of GetCalendars such that only the client has a timeout configured, and I see the same error

@bcmills

This comment has been minimized.

Member

bcmills commented Jul 13, 2018

https://golang.org/cl/45370 is in that range and loosely matches @meirf's diagnosis. It looks like your vendor update uncovered a latent bug within your code.

@matthewceravolo

This comment has been minimized.

matthewceravolo commented Jul 13, 2018

I've confirmed that commenting out the defer cancel() inside of newClient does in fact successfully return our calendar data as expected, but it is still not matching what was diagnosed above since when the error occurs it is on GetCalendars, it isn't canceling prior to Calendar.New. I don't feel I have a great understanding of exactly what's happening here other than the pre-mature cancel seems to be killing the client's ability to make any future successful requests

Is the expectation that we exercise the client we retrieve to its fullest extent only to call cancel() at the last moment? And that if it's set on this client we shouldnt even need to bother with client.CalendarList.List().Context(), correct?

Can we rely on GC and just not bother holding onto the cancel func?

@fraenkel

This comment has been minimized.

Contributor

fraenkel commented Jul 13, 2018

I think you have it backwards. Get rid of the context in newClient.
I assume you want to have a timeout per service call rather than across the Client use.

@matthewceravolo

This comment has been minimized.

matthewceravolo commented Jul 13, 2018

@fraenkel I'm fine with whichever, but doesn't putting it on the client prevent me from needing to duplicate this context code across all calls?

@bradfitz

This comment has been minimized.

Member

bradfitz commented Jul 14, 2018

@matthewceravolo, please don't retitle the bug. We have a convention for how we use titles.

It still reads above like this is user error and you got lucky in the past, so I'd like to keep this bug about thinking about ways to make this debugging easier for other users making similar errors.

@bradfitz bradfitz changed the title from context.WithTimeout is returning context canceled immediately to context: ease debugging of where a context was canceled? Jul 14, 2018

@Sajmani

This comment has been minimized.

Contributor

Sajmani commented Jul 15, 2018

@bcmills I wouldn't use type assertion to introspect a Context because they are wrapped so often. If you want to attach a Done reason to a Context, I'd use a Context value. This is overhead, though, so I'd make it opt-in in somehow. Perhaps provide a context.WithDebug method that adds a Context value for storing debugging info. The code in WithTimeout and WithCancel can check for this debug value to decide whether to walk the call stack and record any information there. The context debug value needs to be installed at the top of the context stack to be seen by all derived contexts, so I'd enable it in the request handler (say) with a parameter like ?debugcontext=1.

@bcmills

This comment has been minimized.

Member

bcmills commented Jul 17, 2018

I wouldn't use type assertion to introspect a Context because they are wrapped so often.

Good point. A Context.Value call with an exported (but opaque) key to retrieve the reason code should be fine.

If you want to attach a Done reason to a Context, I'd use a Context value. This is overhead, though, so I'd make it opt-in in somehow. Perhaps provide a context.WithDebug method that adds a Context value for storing debugging info. The code in WithTimeout and WithCancel can check for this debug value to decide whether to walk the call stack and record any information there.

Ironically, Context.Value is O(N) with the depth of Context nesting, whereas runtime.Caller(1) is O(1): the mere act of checking whether debugging is enabled may be significantly more costly than storing the debug information unconditionally.

@Sajmani

This comment has been minimized.

Contributor

Sajmani commented Jul 19, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment