Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: context: add a way to detach from parent cancellation #40221

Open
CAFxX opened this issue Jul 15, 2020 · 16 comments
Open

proposal: context: add a way to detach from parent cancellation #40221

CAFxX opened this issue Jul 15, 2020 · 16 comments
Labels
Projects
Milestone

Comments

@CAFxX
Copy link
Contributor

@CAFxX CAFxX commented Jul 15, 2020

What

I propose to add a function in context that, given a parent Context returns a new child Context with the same values of the parent, but that will not be canceled when the parent is canceled. This function can initially live in x/net/context, and be migrated to context later.

In addition to the above, we should clarify a point that is only laid out implicitly in the context API docs, i.e., that values attached to the Context can also be used after the Context has been canceled

Why

This is useful in multiple frequently recurring and important scenarios:

  • handling of rollback/cleanup operations in the context of an event (e.g., HTTP request) that have to continue regardless of whether the triggering event is canceled (e.g., due to timeout or the client going away)
  • handling of long-running operations triggered by an event (e.g., HTTP request) that terminates before the termination of the long-running operation

This is doable today by not propagating the triggering event's context and replacing it instead with a new context obtained from context.Background(). This is problematic though, as the new context does not contain any of the values attached to the context of the triggering event, and these values are important to e.g., ensure correct logging/tracing/error recovery functionality (a common scenario when using a middleware-based approach to request/event handling).

As noted below with @davecheney, a nice consequence that naturally falls out of this approach is that it effectively turns cancellation into a regular context value that can be overridden in children contexts. It doesn't solve the problem of cancellation being conflated with the intent of context being just a "bag of values" (that would almost certainly require breaking changes to solve) but it's an effective step into alleviating the situation, and it's backwards compatible.

As noted further below with @martisch the benefit of this approach is that it's pretty much as minimal, composable and inline with the current design of the context package as possible, requiring a single new public API and eschewing conflating additional mechanisms (goroutines).

An important point to be made is that all existing implementations of this (see below) rely on an internal/undocumented guarantee of cancelCtx. If this proposal is shot down at least that guarantee should be explicitly documented in the exported API.

Existing implementations

Looking around it is possible to find multiple reimplementations of this proposal, almost identicals but with different names. I'm not advocating for a specific name here.

Implementations are trivial and would add a single public function to context.

@gopherbot gopherbot added this to the Proposal milestone Jul 15, 2020
@ianlancetaylor ianlancetaylor changed the title Proposal: context: add a way to detach from parent cancellation proposal: context: add a way to detach from parent cancellation Jul 15, 2020
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Jul 15, 2020
@davecheney
Copy link
Contributor

@davecheney davecheney commented Jul 15, 2020

I can't help but feel that this request is a direct by product of context's conflation of cancellation and a skiplist of values. I agree with the rationale for this, but also feel that this is pushing the use case of context as a bag of values beyond its intention.

@CAFxX
Copy link
Contributor Author

@CAFxX CAFxX commented Jul 15, 2020

I can't help but feel that this request is a direct by product of context's conflation of cancellation and a skiplist of values.

Yes, without said conflation this proposal wouldn't be needed.

I agree with the rationale for this, but also feel that this is pushing the use case of context as a bag of values beyond its intention.

While my primary intent here (that is much more immediate) wasn't to move that discussion forward... in a way, this proposal could also be seen as a first step in sidestepping (undoing?) the conflation, since it effectively turns cancellation into a regular copy-on-write value like all other values carried by the Context. It doesn't magically solve all the other problems with cancellation, but it's nevertheless a step in the direction of alleviating the problems with conflation.

@martisch
Copy link
Contributor

@martisch martisch commented Jul 15, 2020

I also have seen a detach package before with a function FromContext that takes a parent context and creates a new context without propagation of the cancelation deadline and returns a new context that keeps all the values of the parent.

It might however be better to encapsulate the specifc use case of detached background tasks by having a Task struct and a Go(ctx, f(ctx)) Task method that takes a function f to execute in a new goroutine. That Task struct can then offer methods itself e.g. useful for canceling and checking if the task is finished. The Go function can have an internal mechanisms with registration or an additional parameter to decide what values to expose in the detached context.

This might also avoid adding more exposed API to context itself.

@CAFxX
Copy link
Contributor Author

@CAFxX CAFxX commented Jul 15, 2020

@martisch i wouldn't add goroutines to the mix. There are legitimate use cases (rollback/cleanup operations that can run in a defer or inside an if err != nil {} being an example) where they are not needed.

Also, your counterproposal seem to be way more complex in terms of number of new exposed APIs (one new package, one new struct/interface and at least one new method) compared to the one proposed here. You're right though it wouldn't need to be in context, although I question the intuitiveness of that (since all packages in the wild are basically called some close variation of context, I have to assume that's where people do expect such a function to live).

To be 100% clear though, I completely agree that the current design of context is less than ideal due to the conflation mentioned above. And it would be ideal if a new context design solved that.

But, as mentioned above, my intent here was not really to propose a new design for context: it's just to address an omission that fails to cover important use cases, within the framework of the current context design. To that end, creating a new/separated "task" package with a design that seem to differ significantly from that of context could lead to a significantly less intuitive and discoverable mechanism (but maybe this is just because I misunderstood your idea... in which case an example/sketch of what you have in mind may help clarify that).

It's worth noting that a new context design that solved the conflation would naturally address the scenarios mentioned in the proposal. Therefore, even if this proposal was accepted, it would not pose additional issues in case users had to migrate to the new design. As such, there's not much for the argument that these scenarios must wait for a context redesign to be addressed.

@navytux
Copy link
Contributor

@navytux navytux commented Jul 17, 2020

Directly related:

#19643
#36503 (comment)

@bcmills
Copy link
Member

@bcmills bcmills commented Jul 17, 2020

Note that the detach package mentioned by @martisch is essentially the same API as was proposed in #19643 (thanks @navytux for the cross-reference!). See #19643 (comment) in particular for some discussion of why that API is needed.

To summarize: values stored in a Context, such as logging or tracing entries, may be scoped to a specific call or operation and flushed or finalized at the end of that call or operation. The API for detaching from such a Context should provide a means to extend or annotate such a value while it is still valid.

@CAFxX
Copy link
Contributor Author

@CAFxX CAFxX commented Jul 17, 2020

values stored in a Context, such as logging or tracing entries, may be scoped to a specific call or operation and flushed or finalized at the end of that call or operation

Do we have any actual example of this need? I am thinking at all middlewares we are using and can't think of any that would need that (including tracing and logging).

Furthermore, even if it was really needed, middleware (again, thinking mostly of logging and tracing) will normally already happily let you override the current values (e.g. a logger or trace/span), so I'm not really convinced about the need for a specific, more complicated to cover it. Maybe a concrete example may help understand.

@bcmills
Copy link
Member

@bcmills bcmills commented Jul 17, 2020

Do we have any actual example of this need?

...and presumably others: anything with a local buffer of logs or traces, or a local tree of spans or regions, that needs the buffer or parts of the tree to be flushed to a disk or a remote log server when they are “complete”.

@bcmills
Copy link
Member

@bcmills bcmills commented Jul 17, 2020

Furthermore, even if it was really needed, middleware (again, thinking mostly of logging and tracing) will normally already happily let you override the current values (e.g. a logger or trace/span)

You can only override values that you know about. The point of Context.Value is that most parts of the program don't need to know what's in it. You should be able to use a library without telling that library in particular about which (if any) logging and/or tracing libraries the rest of your program (above and below it) happens to be using.

In practice, that means one of two things:

  1. Synchronous functions and operations, as (potentially) in the case of your “rollback/cleanup operations” use-case. But those should generally be agnostic to cancellation anyway: either they cannot fail or block (in which case there is nothing to cancel), or they must be robust to failure to clean up (in which case why bother cleaning up at all?), or they leak arbitrary resources (in which case we are back to “failure to clean up” by way of an OOM signal).

    • “Detach” would be a misnomer for such an API, because it is not actually detaching from the parent context or the parent operation — rather, it is ignoring cancellation for a task that is still logically part of and scoped within the same operation.
  2. Some global, standardized notion of “the [non-]copyable parts of a context.Context” that decouples ”a library spawning an asynchronous task” from “the set of scoped logging systems in use in the program”.

    • This is exactly #19643. You could perhaps swap the defaults (“keep by default” vs. “discard by default”), but I don't see a way to make the API fundamentally simpler.
@CAFxX
Copy link
Contributor Author

@CAFxX CAFxX commented Jul 20, 2020

Do we have any actual example of this need?

...and presumably others: anything with a local buffer of logs or traces, or a local tree of spans or regions, that needs the buffer or parts of the tree to be flushed to a disk or a remote log server when they are “complete”.

Thanks for the list. I am familiar with some, but not all, of these... can you help me understand exactly what you wouldn't be able to do with the current proposal?

My current impression is that flushing a logger when the triggering request completes is not a very convincing example, as whatever middleware added the logger to the context could trivially flush the logger when the request terminates; similarly the asynchronous task knows which logger it is using, so it can flush it when the task completes (I am setting aside for a second the fact that flushing logs at every request is probably not such a common requirement; all loggers and tracers I worked with normally flush based on time or buffer size, and at process shutdown).

The other use-case I read somewhere is IIUC ensuring that values attached to the context that are needed before cancellation but not after are not retained while the derived child context stays alive. I must say I never faced this problem... maybe some concrete examples of where this was needed could help me understand better.

“Detach” would be a misnomer for such an API, because it is not actually detaching from the parent context or the parent operation — rather, it is ignoring cancellation for a task that is still logically part of and scoped within the same operation.

Clarified I did not specifically argue for a specific name, thanks! (our internal impl is not called Detach, just to give context)

You can only override values that you know about. The point of Context.Value is that most parts of the program don't need to know what's in it. You should be able to use a library without telling that library in particular about which (if any) logging and/or tracing libraries the rest of your program (above and below it) happens to be using.

Doesn't this apply in reverse to #19643 as well? The code that registers a value as "preservable" should be aware of all possible uses of a certain context value.

Also, in that design, what would happen if different code paths were to require different values to be preserved?

Synchronous functions and operations, as (potentially) in the case of your “rollback/cleanup operations” use-case. But those should generally be agnostic to cancellation anyway: either they cannot fail or block (in which case there is nothing to cancel), or they must be robust to failure to clean up (in which case why bother cleaning up at all?), or they leak arbitrary resources (in which case we are back to “failure to clean up” by way of an OOM signal).

I'm not sure about the point here; the use case I'm trying to cover is the non-synchronous one (w.r.t. parent cancellation; if the parent has already been cancelled because the client connection broke there's no real difference between running in a separate goroutine or not)

Some global, standardized notion of “the [non-]copyable parts of a context.Context” that decouples ”a library spawning an asynchronous task” from “the set of scoped logging systems in use in the program”.

I understand that there's a pre-existing design in #19643, but I think it's worth noting that it was declined because of lack of evidence it was the correct one. Has this changed?

@bcmills
Copy link
Member

@bcmills bcmills commented Jul 20, 2020

Thanks for the list. I am familiar with some, but not all, of these... can you help me understand exactly what you wouldn't be able to do with the current proposal?

Depends on the library, but in general you end up with some form of use-after-Close bug when you invoke some library that adds a logging or trace span during an asynchronous call. That could manifest as a silent failure to log something you intended to log, or an explicit error return from a call you expected to always succeed, or in extreme cases a run-time panic.

@bcmills
Copy link
Member

@bcmills bcmills commented Jul 20, 2020

Doesn't this apply in reverse to #19643 as well? The code that registers a value as "preservable" should be aware of all possible uses of a certain context value.

Also, in that design, what would happen if different code paths were to require different values to be preserved?

Different code paths cannot require different values. (Values should be preserved by intersecting the set of preservable keys with the set of keys found in the parent context.)

The preservation semantics are based on the meaning of the context value, not any detail about its usage, so the “preserver” code is independent of the point of use. For example, if you have a tracing library, “detaching” for an asynchronous operation should be an event in the parent trace that results in its own (related but distinct) tracing span: that is the only coherent way to “preserve” continuity of the trace, and it should be correct to do any time a context is used asynchronously.

@bcmills
Copy link
Member

@bcmills bcmills commented Jul 20, 2020

I understand that there's a pre-existing design in #19643, but I think it's worth noting that it was declined because of lack of evidence it was the correct one. Has this changed?

I think we have pretty solid evidence that using a context asynchronously is incorrect, or at least error-prone, but I'm not sure that we have any more evidence than we did about whether the API proposed in #19643 is ideal.

@bsiegert
Copy link
Contributor

@bsiegert bsiegert commented Nov 18, 2020

FWIW, speaking for the Frameworks Go team within Google, having a way to detach from the parent context would be enormously beneficial for us. We want the trace context, request IDs and some other values preserved in all work that is started from a request handler, or from the framework-specific Task primitive.

In fact, we have static analysis that flags the use of context.Background in this kind of handler, and it keeps tripping up on open-source code that the internal stuff depends on.

@CAFxX
Copy link
Contributor Author

@CAFxX CAFxX commented Feb 4, 2021

I think we have pretty solid evidence that using a context asynchronously is incorrect, or at least error-prone

To advance the conversation it would be ideal to have the evidence about being "error-prone" be available for discussion.

As for incorrect, I am not sure where this may come from, as I see nothing in the Context contract that seem to suggest that. If anything, it seems to (indirectly) suggest the opposite:

Use context Values only for request-scoped data that transits processes and APIs, not for passing optional parameters to functions.

The same Context may be passed to functions running in different goroutines; Contexts are safe for simultaneous use by multiple goroutines.

I'm assuming that "request-scoped" can't be interpreted to mean that as soon as the request is "over" the values in the Context can't be used anymore: otherwise, it would be illegal to e.g. use a logger attached to a context as soon as the context is canceled (since your code may be running in a different goroutine, and the response may already have been fully completed by the time your code attempts to use said logger). That interpretation would basically make the use of any value inherently racy since there would be no way for a user to ensure that a value remains valid until they are done with it.

Depends on the library, but in general you end up with some form of use-after-Close bug when you invoke some library that adds a logging or trace span during an asynchronous call. That could manifest as a silent failure to log something you intended to log, or an explicit error return from a call you expected to always succeed, or in extreme cases a run-time panic.

If we were to agree on the previous point, then this would be an issue with the specific library - as it's relying on unspecified behavior, no? True, if you have already flushed something it's unrealistic to pretend to unflush it so in some cases you won't be able to satisfy the call (I don't really consider the other two cases, error and panic, as valid; in the former, it would be on the user that ignores the returned error, in the latter it would be a bug in the library for relying on unspecified behavior).

But, crucially, this is not something that is allowed today because anyway all values are lost when people are forced to use context.Background() to achieve similar "detach semantics" - so it's not like we would be breaking existing code. Only new code that were to use the proposed Detach would be affected.

That being said, I agree it we may at least consider the possibility of having values attached to a context opt-in to (or opt-out of) propagation to a detached context (e.g. via an optional interface on the context key? or with yet another value in the context?)

@matttproud
Copy link
Contributor

@matttproud matttproud commented Feb 26, 2021

evidence about being "error-prone" be available for discussion.

I have not seen a particularly cogent demonstration of these to date (at least in defect reports), which is not to say they could exist. From my own experience, however, I have seen many cases of folks attempting to "work around" cancellation and simulate detachment by using context.Background inappropriately (in non-program root contexts) and create other errors that are easier to induce in production. Such errors are particularly poignant with loss of a comprehensible cancellation or shutdown semantics. Those problems are material and concrete today.

But generally speaking even the loss of context metadata for (not exclusively for but rather citing it because it is concrete) tracing is particularly bad when using context.Background in non-root circumstances, and each metadata-setting library needs to provide a way to let a user re-attach the data as needed, and I have not seem a holistic solution beyond our in-house detachment APIs.

func (b *BankService) TransferMoney(ctx context.Context, in *transferpb.Request) (out *transferpb.Response, error) {
  // Validate request: sufficient funds, source and destination accounts are correct.
  
  // Low latency and internally batched for later if failures arise.
  err := b.moveMoney(ctx, in.GetSourceIBAN(), in.GetDestIBAN(), in.GetDate(), in.GetAmount())
  if err != nil {
    return nil, err
  }

  // Keep data store tidy as a way to minimize transfer kiting (https://en.wikipedia.org/wiki/Check_kiting).
  // 
  // It is slow and requires a partial data set audit and compaction of records and should not be done
  // directly in the critical path.  Execution is best-effort, so we do not care about errors.  If it fails,
  // periodic data store reconciliation/cleansing efforts find, report, and act on problems.  This is merely
  // to provide a more realistic view of account standing.
  //
  // The operation itself is derived from the request TransferMoney receives, so it should still receive its
  // authorization credentials that were attached to the ctx as well as any trace spans for purposes of
  // operational forensics.
  //
  // But because the data store reconciliation can be slow and is run at batch priority (don't want to starve)
  // user-interactive requests, its execution often outlives the remaining bits of TransferMoney.
  if in.GetAmount() > outstandingLimit {
    // Dilemma:
    // 1. use context.Background() and lose any sense of metadata and have to do a bespoke re-attachment dance for
    //     all relevant data
    // 2. use local ctx and run the risk of compaction not running to completion
    go b.compactRecords(ctx, in.GetSourceIBAN(), in.GetDestIBAN())
  }

  // Skim the millicents from transfers involving Initech clients.
}

In short, I am a proponent of either an API in the /x/ hierarchy or something that is a first-class companion of package context that enables formal detachment capability. A formal API to do this is a critical part of a well-lit path of the user journey to service these workloads. On quick survey of our product's users, I estimate we have about several hundred uses for long-running operation (LRO) alone.

request-scoped data

I am glad you raised that excerpt. This has been something I have wanted to correct in the documentation, at least propose. In short, that it is an overspecification. My bread and butter are servers, so you'll see that bias what I write. Typical servers deal with two scopes (possibly three):

  • server-scoped: bound to the lifetime of the server itself. This is relevant for setting up the domain types used by the server for its lifetime (e.g., open up a well-known database to service user queries).

  • request-scoped: bound to individual requests in HTTP or RPC handlers or similar.

  • less common ones, that may be derived from either server- or request-scoped contexts:

    • session scoped: oriented around a persistent user connection over multiple operations
    • pipeline scoped: a long set of batch operation that may outlive the initiating request

Contexts remain valid for all of these situations, and the documentation indicating otherwise does it a disservice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Proposals
Incoming
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
8 participants