Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: context: add APIs for writing and reading cancelation cause #51365

Open
Sajmani opened this issue Feb 25, 2022 · 24 comments
Open

proposal: context: add APIs for writing and reading cancelation cause #51365

Sajmani opened this issue Feb 25, 2022 · 24 comments
Labels
Projects
Milestone

Comments

@Sajmani
Copy link
Contributor

@Sajmani Sajmani commented Feb 25, 2022

This proposal addresses issue #26356 by extending package context with new functions for writing and reading the "cause" of the context's cancelation. This approach is similar to what was proposed in #46273.

Design notes and alternatives

Naming

I chose the name Cause because it's short. Alternatives include Reason (as in #46273) and WhyDone.

Cause is an error

Based on user feedback in #26356, we've made cause an error so that the programmer can include structured data and introspect it with errors.Is/As. Alternatives would be a string or a stack trace (captured automatically).

Compatibility

As package context is covered by the Go 1 compatibility guarantee, we are unable to change any of the existing types or functions in the package. In particular

  • we cannot add Cause as a method of the Context interface
  • we cannot change the existing CancelFunc type to take a cause
  • we cannot change the existing WithCancel, WithDeadline, or WithTimeout function signatures
  • we cannot change Err to return any values other than nil, Canceled, or DeadlineExceeded

Performance considerations

Based on user feedback, our goal is to introduce no new performance overhead for existing uses of Context (hence the rejection of the design that captured stack traces automatically for existing cancelations). We consider it acceptable to increase the size of some context implementation structs by a small amount (specifically, adding an error field to cancelCtx).

Cause(ctx) vs. ctx.Err()

@andreimatei pointed out that in the current prototype, it's possible for Cause(ctx) to be nil when ctx.Err() is non-nil. This is working as intended: we expect users to check ctx.Err() before reading Cause(ctx). However, @andreimatei suggested users might want to check Cause(ctx) instead of ctx.Err(), in which case we'd need to guarantee that Cause(ctx) is non-nil exactly when ctx.Err() is non-nil. If we do this, I'd recommend Cause(ctx) == ctx.Err() at all times except when the cause has been explicitly set to non-nil. This would complicate the handling of code that sets the cause to nil.

Concerns for custom Context implementations

@andreimatei raised concerns about the fact that this design does not provide a way for custom Context implementations to control the behavior of Cause(ctx). As a result, it's possible for Cause(ctx) to return a non-nil error while the custom Context's Done channel has not yet been closed. I pointed out that a custom Context can control Cause(ctx) by wrapping a standard Context returned by WithCancelCause; @andreimatei pointed out that this makes it difficult for custom Context implementations to achieve their efficiency goals.

We could address this concern by exposing a key for Context.Value that looks up the Cause, which would allow custom Context implementations to override the lookup for that key. This would require changing the current prototype that reuses the existing cancelCtxKey to find the cause.

Proposed API changes

I've prototyped this change in CL https://go-review.googlesource.com/c/go/+/375977, including several tests to exercise various interleaving of cancelations with and without causes.

package context

// WithCancelCause behaves like WithCancel but returns a CancelCauseFunc instead of a CancelFunc.
// Calling cancel with a non-nil error (the "cause") records that error in ctx;
// it can then be retrieved using Cause(ctx).
//
// Example use:
//   ctx, cancel := context.WithCancelCause(parent)
//   cancel(myError)
//   ctx.Err() // returns context.Canceled
//   context.Cause(ctx) // returns myError
func WithCancelCause(parent Context) (ctx Context, cancel CancelCauseFunc)

// A CancelCauseFunc behaves like a CancelFunc but additionally sets the cancelation cause.
// This cause can be retrieved by calling Cause on the canceled Context or any of its derived Contexts.
// If the context has already been canceled, CancelCauseFunc does not set the cause.
type CancelCauseFunc func(cause error)

// Cause returns a non-nil error if a parent Context was canceled using a CancelCauseFunc that was passed that error.
// Otherwise Cause returns nil.
func Cause(c Context) error

// WithDeadlineCause behaves like WithDeadline but also sets the cause of the
// returned Context when the deadline is exceeded. The returned CancelFunc does
// not set the cause.
func WithDeadlineCause(parent Context, d time.Time, cause error) (Context, CancelFunc)

// WithTimeoutCause behaves like WithDeadlineCause(parent, time.Now().Add(timeout), cause).
func WithTimeoutCause(parent Context, timeout time.Duration, cause error) (Context, CancelFunc)
@gopherbot gopherbot added this to the Proposal milestone Feb 25, 2022
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Feb 25, 2022
@WIZARDISHUNGRY
Copy link

@WIZARDISHUNGRY WIZARDISHUNGRY commented Feb 25, 2022

// Cause returns a non-nil error if a parent Context was canceled using a CancelCauseFunc that was passed that error.
// Otherwise Cause returns nil.
func Cause(c Context) error

It would be nice to use errors.Is or errors.Unwrap directly on ctx.Err(), but I assume a ton of existing code does err == context.Canceled.

@Sajmani
Copy link
Contributor Author

@Sajmani Sajmani commented Feb 25, 2022

@WIZARDISHUNGRY That's right: the Context API guarantees that Err only ever returns nil, Canceled, or DeadlineExceeded, and plenty of client code relies on this. We can't break it.

@rsc
Copy link
Contributor

@rsc rsc commented Mar 2, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

@rsc rsc commented Mar 9, 2022

Commented on #26356 and #46273 to try to get feedback from people who are interested in this problem.

@OneOfOne
Copy link
Contributor

@OneOfOne OneOfOne commented Mar 9, 2022

I've needed this in multiple projects.

@bullgare
Copy link

@bullgare bullgare commented Mar 10, 2022

I think, it could work if will be adopted by packages like net/http, sql, and so on.

@Sajmani
Copy link
Contributor Author

@Sajmani Sajmani commented Mar 10, 2022

Hi Dmitry, what would integration with http and sql look like? They already integrate with Context.

@egonelbre
Copy link
Contributor

@egonelbre egonelbre commented Mar 10, 2022

@Sajmani from the documentation https://pkg.go.dev/context#Context.Err it's not clear to me that it only returns those values. To me it looks like it just lists some of the usual errors you might get. I've always assumed that custom context implementations can return different errors.

@bullgare
Copy link

@bullgare bullgare commented Mar 10, 2022

@Sajmani If that libraries would add a cancellation reason.
To know at least which library did it and, ideally, why.
I had a case when on production I got random "context canceled" log messages. And in the case like that you don't even know where to dig and how to investigate it further. Or how to reproduce it on a local machine.
Cancellation cause could help with that potentially.

@Sajmani
Copy link
Contributor Author

@Sajmani Sajmani commented Mar 14, 2022

@egonelbre While we might reasonably interpret the documentation of Err to mean it could return other values, I believe there is code that already assumes it only returns those values. I believe it's best to leave Err's behavior unchanged to avoid breaking such code.

@egonelbre
Copy link
Contributor

@egonelbre egonelbre commented Mar 14, 2022

@Sajmani sure, I do fear that changing the behavior in that case would cause more problems. However, it also means that code checking Err() == context.Canceled might be wrong already, when custom contexts are used. It might be good to either suggest in context.Context documentation to use errors.Is for checking the error or alternatively change the wording to make clear that only those values can be returned.

I haven't found an easy way to find how common that problem is. One that I did find is https://github.com/pion/ice/blob/39ae3a4fa890e29069c764039a98dff44ca94255/context.go#L23 -- which suggests that there are more such contexts.

@ChrisHines
Copy link
Contributor

@ChrisHines ChrisHines commented Mar 14, 2022

FWIW, I have always interpreted the Err docs to mean it can only return nil, Canceled, or DeadlineExceeded, which is why I asked #46273 (comment).

If we add WithCancelReason how do you propose to update the docs for the Err method to reflect the new behavior?

I do prefer to use errors.Is(err, context.DeadlineExceeded) in new code since Go 1.13.

But that's not the end of it. In order for #46273 to be acceptable I think the context.Context interface docs should probably specify that all implementations only return errors for which errors.Is(err, DeadlineExceeded) || errors.Is(err, Canceled) is true.

My rationale is that the current docs pretty clearly (at least to me) specify that there are only two ways a context can become canceled and the context package provides error values to distinguish those two dispositions. Code that wants to make choices about how to handle a context cancelation based on the reason currently has a finite set of cases to handle whether you do it with == or errors.Is. Allowing Err to return arbitrary error values changes that equation.

The proposal sidesteps that by allowing contexts to provide two errors.

Questions about this proposal:

The proposed design of WithDeadlineCause and WithTimeoutCause are curious. I am trying to think through how to use them and why I might want to.

This seems maybe useful:

func orchestrate(ctx context.Context) {
    ctx, cancel := context.WithTimeoutCause(ctx, time.Minute, errors.New("outer timeout"))
    defer cancel()
    ...
    doWork(ctx)
}

func doWork(ctx context.Context) {
    ctx, cancel := context.WithTimeoutCause(ctx, time.Second, errors.New("inner timeout"))
    defer cancel()
    ...
    if err := ctx.Err(); err != nil {
        log.Printf("aborting work: %v, because: %v", err, context.Cause(ctx))
    }
}

Now our logs will tell us which of the two timeouts triggered the work to abort. It may still be ambiguous in the case that the inner timeout ended up having the same deadline as its parent due to it potentially being started near the end of the parent context's time range.

But what if orchestrate also has a reason to explicitly cancel the work before the timeout?

func orchestrate(ctx context.Context) {
    ctx, cancel := context.WithTimeoutCause(ctx, time.Minute, errors.New("outer timeout"))
    defer cancel()
    ...
    go doWork(ctx)
    ...
    if err := isSomethingWrong(); err != nil {
        cancel()
    }
}

Now we lack the ability to provide a cause without adding another wrapping context like this:

func orchestrate(ctx context.Context) {
    ctx, hardCancel := context.WithCancelCause(ctx)
    defer cancel(nil) // Is this idiomatic?
    ctx, cancel := context.WithTimeoutCause(ctx, time.Minute, errors.New("outer timeout"))
    defer cancel()
    ...
    go doWork(ctx)
    ...
    if err := isSomethingWrong(); err != nil {
        hardCancel(err)
    }
}

This seems a bit unfortunate and I wonder if there is a better way.

@Sajmani
Copy link
Contributor Author

@Sajmani Sajmani commented Mar 14, 2022

Great point about having causes both on deadline and direct cancelation.

IMO the code that calls WithCancelCause and WithTimeoutCause is the clearest: it makes it clear that there are two different reasons why this context might be canceled and provides them both.

cancel(nil) should work fine: it will cancel the context and record a nil Cause. Later cancelations won't write the cause because the Context has already been canceled.

While you don't technically need to call the second cancel—since the first deferred cancel will cascade down to the derived context and cancel it—it seems like good hygiene to write it as you suggest.

@bcmills
Copy link
Member

@bcmills bcmills commented Mar 14, 2022

That example does suggest a synchronization consideration, though: both WithTimeoutCause and CancelCauseFunc should probably avoid annotating the cause if the Context was already done for some reason (for example, a parent cancellation).

That is: there should be at most one reason attributed to the Context, and it should be the one (if any) that actually caused the transition from not-done to done.

@rsc
Copy link
Contributor

@rsc rsc commented Mar 30, 2022

Does anyone object to adding this API? Specifically (quoting the top comment):

package context

// WithCancelCause behaves like WithCancel but returns a CancelCauseFunc instead of a CancelFunc.
// Calling cancel with a non-nil error (the "cause") records that error in ctx;
// it can then be retrieved using Cause(ctx).
//
// Example use:
//   ctx, cancel := context.WithCancelCause(parent)
//   cancel(myError)
//   ctx.Err() // returns context.Canceled
//   context.Cause(ctx) // returns myError
func WithCancelCause(parent Context) (ctx Context, cancel CancelCauseFunc)

// A CancelCauseFunc behaves like a CancelFunc but additionally sets the cancelation cause.
// This cause can be retrieved by calling Cause on the canceled Context or any of its derived Contexts.
// If the context has already been canceled, CancelCauseFunc does not set the cause.
type CancelCauseFunc func(cause error)

// Cause returns a non-nil error if a parent Context was canceled using a CancelCauseFunc that was passed that error.
// Otherwise Cause returns nil.
func Cause(c Context) error

// WithDeadlineCause behaves like WithDeadline but also sets the cause of the
// returned Context when the deadline is exceeded. The returned CancelFunc does
// not set the cause.
func WithDeadlineCause(parent Context, d time.Time, cause error) (Context, CancelFunc)

// WithTimeoutCause behaves like WithDeadlineCause(parent, time.Now().Add(timeout), cause).
func WithTimeoutCause(parent Context, timeout time.Duration, cause error) (Context, CancelFunc)

@hherman1
Copy link

@hherman1 hherman1 commented Mar 31, 2022

Which cancel cause will be reported if multiple parent contexts have been cancelled with cause? I’m assuming it’s the nearest parent?

@andreimatei
Copy link
Contributor

@andreimatei andreimatei commented Mar 31, 2022

I would make the following changes and/or clarifications to the API:

  1. Make and specify Cause(ctx) to return nil if ctx.Error() == nil.
  2. Make Cause(ctx) return Canceled or DeadlineExceeded (depending on the case) when ctx was canceled with through CancelFunc and not through a CancelCauseFunc, and when the timeout expired.
  3. Specify cancel(nil) to result in Cause(ctx) == Canceled.
  4. Make WithDeadlineCause and WithTimeoutCause return CancelCauseFunc instead of CancelFunc, with semantics similar to WithCancelCause. (*)
  5. Specify how a custom ctx implementation can participate in the Cause(ctx) determination, and export the stdlib key required to do so.

(*) About this point, @Sajmani you've said, in response to @ChrisHines:

IMO the code that calls WithCancelCause and WithTimeoutCause is the clearest: it makes it clear that there are two different reasons why this context might be canceled and provides them both.

I personally don't buy it. A single call to WithTimeoutCause returning a CancelCauseFunc that you immediately defer would also make it clear that there are two reasons why the ctx might be canceled.

@bcmills
Copy link
Member

@bcmills bcmills commented Mar 31, 2022

@hherman1

Which cancel cause will be reported if multiple parent contexts have been cancelled with cause?

Personally, I would prefer that it be left unspecified (because synchronization makes guarantees difficult), but implementations should try to preserve the Cause that caused the Done channel to become closed. The other potential “causes” are not actually causal: they don't have any other observable effect on the Context.

@Sajmani
Copy link
Contributor Author

@Sajmani Sajmani commented Apr 1, 2022

@hherman1 @bcmills Indeed the behavior of the current prototype is that the cause associated with the first cancelation is propagated all the way down the chain, just like the Err is. This means a later cancelation, even if it's "nearer", won't update the cause, since the Context has already been canceled.

@Sajmani
Copy link
Contributor Author

@Sajmani Sajmani commented Apr 1, 2022

@andreimatei Of your suggestions, #5 is the one I'm most in agreement with, as that's a fundamental issue with the ability to define custom Contexts. The rest are all attempts to make Cause(ctx) an alternative to ctx.Err(), which is not my proposal. IMO it's best for code to check ctx.Err() as they do today, and consult Cause(ctx) for additional information. I'm happy to hear other points of view on this, of course, and if the proposal committee prefers your semantics, I'm OK with that.

@rsc
Copy link
Contributor

@rsc rsc commented Apr 13, 2022

On the one hand I'd like to enable custom contexts, but on the other hand I am not sure about exposing the implementation detail that is the context key. What stops someone from using WithCancelCause and then wrapping the result with their own implementation? Why do we need to expose the key?

@andreimatei
Copy link
Contributor

@andreimatei andreimatei commented Apr 27, 2022

On the one hand I'd like to enable custom contexts, but on the other hand I am not sure about exposing the implementation detail that is the context key. What stops someone from using WithCancelCause and then wrapping the result with their own implementation? Why do we need to expose the key?

Nothing technically stops a custom ctx implementation from wrapping WithCancelCause, but that would mean at least one extra memory allocation.

On the one hand I'd like to enable custom contexts, but on the other hand I am not sure about exposing the implementation detail that is the context key.

I'd also bring a principle argument: Context is an interface to allow for custom implementations that interoperate with the standard library, but the proposal is to reserve functionality exclusively to the standard library; that seems wrong. I'd argue that, by making Cause use a special key, that "key" is no longer specific to the implementation of the standard library. Rather, it's logically part of the Context interface.

@rsc
Copy link
Contributor

@rsc rsc commented May 4, 2022

I'd argue that, by making Cause use a special key, that "key" is no longer specific to the implementation of the standard library. Rather, it's logically part of the Context interface.

Right, and that's a larger API change, which it would be good to avoid, at least at first. What if we want to optimize the implementation at some future point?

How often do custom contexts need this and would it be so awful for them to call WithCancelCause?

@natefinch
Copy link
Contributor

@natefinch natefinch commented May 5, 2022

The rest are all attempts to make Cause(ctx) an alternative to ctx.Err(), which is not my proposal. IMO it's best for code to check ctx.Err() as they do today, and consult Cause(ctx) for additional information.

Why make people check two things? I'd prefer for ctx.Err() to be left as a legacy API that only exists for backwards compatibility, and point everyone at context.Cause as the new way to check context results.

As long as we use error wrapping, we can put both the existing sentinel errors and the passed-in cause error in the same value returned by context.Cause(ctx). We then specify that you have to use errors.Is/As/Unwrap to examine the error.

The vast majority of functions only return a single error. If you want to pass the error that describes what happened to the context up the stack, it has to be a single error value. If you split the information into two errors, now you're putting the onus on the caller to figure out how to merge these two pieces of information.

I would propose error.Is(context.Cause(ctx), context.Cancelled) always reports the same as ctx.Err() == context.Canceled and similarly for context.DeadlineExceeded.

And then, also return true for errors.Is/As when passed the error/type that was passed in via cause(myerror).

And finally, that errors.Unwrap(context.Cause(ctx)) return the error value that was passed to cancel(myerror).

For applications that use 1.13 error wrapping (the vast majority of new code these days), the error returned from Cause would then be the only thing you'd need to return up the stack. For code that doesn't care about the cause of the error, it can just check errors.Is(err, context.Canceled) or errors.Is(err, context.DeadlineExceeded) ... but for code that wants access to the cause, it can extract that error with errors.Unwrap(), or just log the whole thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

No branches or pull requests