proposal: Go 2: use structured concurrency #29011

Closed
smurfix opened this issue Nov 29, 2018 · 21 comments

@smurfix
commented Nov 29, 2018

It is increasingly apparent that unstructured goroutines tend to behave, conceptually, like the "go to" statement which Edsger Dijkstra famously Considered Harmful. A good introduction to the problem is
https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/

Other languages are also starting to "get it", e.g. Kotlin:
https://medium.com/@elizarov/structured-concurrency-722d765aa952

In my experience, contrasting Python+asyncio ("go") vs. Python+Trio ("structured"), thinking about coroutines in a "structured" way (and, importantly, having runtime support for enforcing said structure) helps avoid a whole class of bugs and is very helpful WRT structuring code in a way that can simplify problem spaces immensely.

As one possible and not-too-disruptive step towards structured concurrency I would like to propose that in Go 2, "go func()" shall return some opaque value which must be assigned to some variable (possibly of built-in type goroutine). The idea is that at the point where that variable goes out of scope, the Go runtime shall wait until the associated goroutine has terminated. Thus, if required for compatibility, an easy way to get the current behavior would be to assign to a global, or append to a global list.
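In current Go the proposed behavior can be approximated by hand. The following is a minimal sketch, not the proposal's actual semantics: `Handle` and `Go` are hypothetical names, and because Go has no scope-exit destructors, a `defer h.Wait()` stands in for "wait when the variable goes out of scope".

```go
package main

import "fmt"

// Handle is a hypothetical stand-in for the proposed built-in value
// returned by a go statement.
type Handle struct {
	done chan struct{}
}

// Go starts f in a new goroutine and returns a Handle for it.
func Go(f func()) *Handle {
	h := &Handle{done: make(chan struct{})}
	go func() {
		defer close(h.done) // mark termination when f returns
		f()
	}()
	return h
}

// Wait blocks until the goroutine behind h has terminated.
func (h *Handle) Wait() { <-h.done }

// fill writes squares into out, computing the first half in a second goroutine.
func fill(out []int) {
	mid := len(out) / 2
	h := Go(func() {
		for i := 0; i < mid; i++ {
			out[i] = i * i
		}
	})
	// Stand-in for the proposed rule: the goroutine is joined when the
	// handle's scope (here, the whole function) ends.
	defer h.Wait()

	for i := mid; i < len(out); i++ {
		out[i] = i * i
	}
}

func main() {
	out := make([]int, 6)
	fill(out)
	fmt.Println(out) // [0 1 4 9 16 25]
}
```

Because the two goroutines write disjoint halves of `out` and `fill` cannot return before the helper has finished, the caller always sees a fully written slice.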

Other, more disruptive implementations are of course possible; in particular, other languages use the concept of a CoroutineScope (Kotlin) / TaskGroup (Python 3.8) / Nursery (Trio) which all coroutines must be attached to, primarily so that they may be cancelled when another coroutine runs into an error that requires the whole group to end prematurely (or regularly – one interesting example is the Happy Eyeballs algorithm for opening a TCP connection, where the terminating condition is "a coroutine successfully opens a connection"). The obvious problem is that Go so far doesn't have any way to generically signal a goroutine to please terminate itself, but maybe somebody else has an idea how to do that.

@gopherbot gopherbot added this to the Proposal milestone Nov 29, 2018

@gopherbot gopherbot added the Proposal label Nov 29, 2018

@ianlancetaylor ianlancetaylor changed the title Proposal: Go 2: Use structured concurrency proposal: Go 2: use structured concurrency Nov 29, 2018

@ianlancetaylor

Contributor

commented Nov 29, 2018

I think these ideas are definitely interesting. But your specific suggestion would break essentially all existing Go code, so that is a non-starter.

@bcmills

Member

commented Nov 29, 2018

The obvious problem is that Go so far doesn't have any way to generically signal a goroutine to please terminate itself, but maybe somebody else has an idea how to do that.

We typically use a context.Context for that. (See also #28342.)

(The errgroup package coordinates errors from goroutines with Context cancellation. Perhaps it should be moved into the standard library?)

@smurfix

Author

commented Nov 30, 2018

@bcmills Thanks; not yet being too familiar with Go I missed that (and errgroup). Interestingly, errgroup already does almost everything we'd need for structured concurrency, except with a lot of required boilerplate.

Thus, as a first approximation, I'd add these bits:

  • Store the current group within the context; add a method to retrieve it.
  • The background context already contains an errgroup.Group, which main runs in. For backwards compatibility, every go func() also (lazily) creates its own top-level group, because terminating main needs to be able to abort goroutines instead of waiting for them.
  • There now is no need to explicitly create a group that's not already a member of some other group, which allows you to …
  • Add implicit contexts, instead of passing them around. This requires context.Context.Run (f func()) and context.Current() methods and, most likely, a bit of compiler support.
  • You now can add some nice syntax to simplify errgroup.Current(context.Current()).Go(func). go2 func(args…)? ;-)
  • You now can add ubiquitous cancellation support: add context.Current().Done() to every select statement ever.
  • As soon as that's done, code analysis is able to determine whether it's safe to cancel "plain" goroutines instead of aborting them when main ends.
@njsmith

commented Nov 30, 2018

I'm not at all an expert on Go, but @smurfix pointed me to this because I wrote that "considered harmful" article that's been going around. (Apologies to anyone who's had to deal with it being waved around like it was some kind of brutal takedown of Go; that was never my intention.)

If someone (unwisely) put me in charge of figuring out how to retrofit structured concurrency to Go, I'd also look hard at ways to make errgroup, and its two components of cancellation support and error propagation, as minimal-boilerplate and ubiquitous as possible. Errgroup is already pretty verbose, and my uninformed impression is that making cancellation and error propagation easier is pretty high up most Go programmers' desiderata regardless of the whole structured concurrency thing.

To toot my own horn a bit more, this post has thoughts on making cancellation more usable, and specifically discusses the C#/Go cancel token model.

@bcmills

Member

commented Nov 30, 2018

Thus, as a first approximation, I'd add these bits:

  • Store the current group within the context; add a method to retrieve it.

Some more detail on the use-cases for this would be helpful. If you're writing synchronous functions, you can nearly always start a subgroup instead.

(See the first section of my 2018 GopherCon talk for more depth on this point.)

  • The background context already contains an errgroup.Group, which main runs in. For backwards compatibility, every go func() also (lazily) creates its own top-level group, because terminating main needs to be able to abort goroutines instead of waiting for them.

Programs that don't want to wait for goroutines can already call os.Exit to terminate immediately. (I would not expect that to change with a more structured approach to concurrency.)

  • Add implicit contexts, instead of passing them around. This requires context.Context.Run (f func()) and context.Current() methods and, most likely, a bit of compiler support.

Is that more-or-less #21355?

@smurfix

Author

commented Nov 30, 2018

Store the current group within the context; add a method to retrieve it.

After thinking about this a bit more, IMHO it's better to be explicit about the (current) group.

  • Rename errgroup.WithContext to GoGroup (or whatever) and let it return just the new Group.
  • The "child" context can be implicitly passed to the goroutines started using errgroup.Group.Go; the parent doesn't need it.

Programs that don't want to wait for goroutines can already call os.Exit to terminate immediately

Sure, but the current semantics is that falling off the end of main aborts all running goroutines, and changing that would be a "many programs will still compile but no longer terminate" problem which arguably is worse than breaking compilation.

Is that more-or-less #21355?

Yes. Most of the problems described there do not apply when you limit the "implicit context" to structured concurrency. The traditional "go func()" will still require an explicit ctx argument (if that context is not in static scope anyway).

@smurfix

Author

commented Nov 30, 2018

Also, I'd like to add another reason why implicit contexts should be part of the language: math. Do you round up, down, to zero or to even? Is creating and/or using NaN and/or Inf a fatal error? How accurate are trigonometric results required to be?
You can't pass an explicit context to a multiplication …

@creker

commented Nov 30, 2018

You can't pass an explicit context to a multiplication

I don't want to pass any context to a multiplication, explicit or not. Why would you even want to do that?

@njsmith

commented Dec 1, 2018

He's talking about IEEE-754 floating point operations, which have a bunch of configuration switches that are passed implicitly, like which rounding mode you want to use, and whether certain operations should generate a SIGFPE. CPUs all support these, people doing numeric work sometimes want to set these, and I guess in theory Go might want to allow people to change the FP settings within a single goroutine without those settings leaking out to affect other goroutines that might be scheduled onto the same OS thread.

I doubt this is a high priority for the Go devs though, and even if it was I doubt you'd want to manage this through Go's Context object, and even if you did then it doesn't have much to do with structured concurrency, so maybe better to stay on topic.

@njsmith

commented Dec 1, 2018

The most fundamental idea in "structured concurrency" is that if you encourage people to structure their code so that goroutine lifetimes are bounded by the lifetime of the creating function, then this has many benefits.

Right now you can structure things this way in Go, but it's way more cumbersome than just typing go myfunc(), so Go ends up encouraging the "unstructured" style. In the analogy, it's like a language that has both for loops and go to, but every time you use a for loop you have to type an extra 3 lines of code. So just some better sugar would help. Ideally the structured form should be as easy to use as go myfunc(), and statically distinguishable so the more extremist have the option of using a linter to outlaw unstructured concurrency within their codebase.

One of the major advantages of structured concurrency is that it helps with error propagation, because when a goroutine exits with an error, it means you have somewhere to propagate that error to.
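As a sketch of the error-propagation point, the hypothetical `RunAll` helper below bounds every task's lifetime to the call and returns each task's error to the caller. It also recovers panics and reports them as errors, which is only possible because each goroutine has a parent waiting for it; a panic in an unstructured `go f()` that nobody recovers crashes the whole program.

```go
package main

import (
	"fmt"
	"sync"
)

// RunAll (hypothetical) runs each task in its own goroutine, waits for all
// of them, and returns their errors indexed like tasks. Panics are recovered
// and turned into errors for the caller.
func RunAll(tasks ...func() error) []error {
	errs := make([]error, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task func() error) {
			defer wg.Done()
			defer func() {
				if r := recover(); r != nil {
					errs[i] = fmt.Errorf("task %d panicked: %v", i, r)
				}
			}()
			errs[i] = task() // each goroutine writes only its own slot
		}(i, task)
	}
	wg.Wait() // no task outlives the call
	return errs
}

func main() {
	errs := RunAll(
		func() error { return nil },
		func() error { panic("oops") },
	)
	fmt.Println(errs[0], errs[1]) // <nil> task 1 panicked: oops
}
```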

And then the other big thing is the connection to cancellation – structured concurrency and standardized ubiquitous cancellation support go really well together. If your concurrency is structured, then that makes it much easier to implicitly propagate cancellation: if you cancel the parent function, of course that should propagate to the children whose lifetimes are bound to it. And, if you want to automatically propagate errors out of children, then you need a way to automatically unwind the siblings, which cancellation provides.

I think the arguments for making cancellation a first-class language feature are compelling in any case (e.g. it's seriously unfortunate that in Go 1, socket operations not only use a totally different cancellation system than everything else, but that hooking socket cancellation up to Context cancellation is extremely convoluted). But structured concurrency makes it even more compelling, and more viable.

@creker

commented Dec 1, 2018

I don't like the idea of implicit cancellation propagation. It would be better if the compiler/runtime could somehow implement it automatically or provide some default behavior, but I don't think that's possible. In all cases cancellation is implemented by hand and depends very much on the specific operation. The semantics also differ: how exactly the operation is cancelled, when, and whether it can even be cancelled properly or will just hang around in the background and discard any results it produces.

That's where I have a problem. With explicit cancellation propagation I have a clear contract that this specific operation supports cancellation; I don't even need documentation. With implicit propagation I first have to find out whether the operation supports cancellation at all. Documentation could be missing, out of date or even misleading, so I would often have to read through source code that might be very complex. As a library consumer/implementer I would constantly have to ask: do I have to implement cancellation everywhere?

What I'm in support of is making cancellation a built-in type like error. That way we at least solve the problem of stuttering and long type name but also encourage people to use it, as it's now an actual language feature, not some library helper.

@jonas-schulze

commented Dec 1, 2018

Using the error approach, if a function has a (something, error) return value, the runtime could inject the cancellation by replacing

  • any successful return (error == nil) of still-running functions
  • any new call to a similar function

by the proposed "canceled" error at any point of intervention (e.g. where it would otherwise schedule a different goroutine). All we would need is to execute all deferred statements (thus encouraging something like defer cleanup(), if really needed).

This might be too much implicit magic, though. Also, there are some use cases where defers are replaced for performance reasons, so some new optional defer (= handle cancellation) would be needed anyway. Alternatively, one could get rid of the defer stack, have the compiler hard-code the deferred calls at all return points, and only run the defer stack on cancellation at a runtime intervention. The "fixed" (= having defers removed) code would still be broken, but the reason to replace them in the first place would be gone.

@jonas-schulze

commented Dec 1, 2018

Having this cancellation behavior should be compatible with a nursery described by Python+Trio, it should even allow for an implicit one.

@njsmith

commented Dec 1, 2018

Yes, by far the hardest part of built-in cancellation is deciding which operations are cancellable.

You definitely don't want to make every schedule point a cancellation point. It's essentially impossible to write code that works correctly under these circumstances. Java and Windows have both tried, and both got burned, badly.

On the other hand, the core reason you need cancellation is to terminate operations that otherwise would never terminate, so the minimal set of cancel points is: every operation that can block indefinitely. Enabling cancellation for just this set seems to work well in practice.

Fortunately, these are generally I/O primitives like reading from a socket, so they can already fail and return an error, and it's not a huge stretch to add "cancelled" as a new error type that they can return. And then unwinding state to handle cancellation uses the regular error unwinding code you need to write anyway.

The article I mentioned earlier covers all this in more detail.

@creker

commented Dec 1, 2018

Having cancellation implemented at preemption points would put too much of a burden on the programmer. You now have to think about the possibility of being cancelled at any point and have to write correct clean-up code everywhere to support that. It brings back memories of C++ exceptions, where you have pretty much the same problem and a lot of people's code is not exception-safe.

It would also constrain the runtime implementation. There's a plan to implement preemptive scheduling, which would probably eliminate these preemption points. Cancellation would also suffer from the same problems that the current scheduler does: preemption is not possible in busy loops, for example, and inserting preemption points into loops doesn't seem to work well.

I think safe points for cancellation should be defined by a programmer, not injected by the compiler.

these are generally I/O primitives like reading from a socket

There are other places where you would like cancellation, for example channel operations and select. The current cancellation context fits into this, as its Done method returns a channel you can wait on.

@ianlancetaylor

Contributor

commented Dec 18, 2018

There are likely good ideas in the area of structured concurrency that we can do better at, in the language or the standard library or both. This specific proposal, though, just proposes one technique: a way to simplify waiting for a goroutine to complete. It's not clear that we should modify the language to support that specific case while ignoring other cases.

In particular if we change the go statement to return a value (which in practice would have to be optional) then there are quite a few things we could do with that value. Is this one idea the best choice?

I think we need to have a larger discussion about structured concurrency, like at least trying to enumerate all the interesting cases, before it makes sense to propose specific language changes.

@njsmith

commented Jan 6, 2019

@ianlancetaylor

I think we need to have a larger discussion about structured concurrency, like at least trying to enumerate all the interesting cases, before it makes sense to propose specific language changes.

Yeah, that makes a lot of sense. Where do these discussions generally happen?

@ianlancetaylor

Contributor

commented Jan 7, 2019

@njsmith There are various forums--see https://golang.org/wiki/Questions. For this the golang-nuts mailing list is probably the best place.

@dansouza

commented Feb 11, 2019

FWIW: it is considered good practice in Go to never create a goroutine if you don't have a plan for it to terminate, so if you're doing it the Go way, all your goroutines finish at some point, or are cancellable (through context.Context).

Also, Go already provides a mechanism to group goroutines into a logical group and know when all of them have finished their work (akin to Trio, as far as I understand): the sync.WaitGroup API. It's just more explicit, but there's nothing keeping one from creating helpers.

@sustrik

commented Feb 15, 2019

For the people interested in the topic: We've created a forum to discuss structured concurrency in cross-language way. Feel free to add insights from the Go's point of view.

https://trio.discourse.group/c/structured-concurrency

@jab

commented Jun 2, 2019

Just found my way here via a link from https://trio.discourse.group/t/structured-concurrency-in-golang/174. Linking back there from here in case anyone is interested in continuing to discuss over there.
