proposal: sync: mechanism to select on condition variables #16620

bradfitz · 2016-08-05T17:07:17Z

Sometimes condition variables are the best fit for a problem, and sometimes channels are. They don't compose well, though.

I've increasingly been finding myself in a position where I would like to select on both channel(s) and a condition variable. This has come up a few times in http2, where I use condition variables for flow control, but there are plenty of channels (including the net/http.Request.Cancel channel) in play too. Sometimes I need to both wait for flow control, or wait for a user to cancel their request, or the peer side to close their connection, etc.

I would love to be able to select on a condition variable, like if the sync.Cond type had a method:

// WaitChan returns a channel which receives a value when awoken
// by Broadcast or Signal. Unlike Wait, WaitChan does not unlock
// or lock the c.L mutex.
func (c *Cond) WaitChan() <-chan struct{} {
     // ....
}

The semantics would be the receiving from it acts like a runtime_Syncsemacquire on c.sema. We'd need to define what happens if another case in the select fires first. (does the semacquire get canceled?).

Thoughts?

The text was updated successfully, but these errors were encountered:

bradfitz · 2016-08-05T17:07:59Z

/cc @ianlancetaylor @griesemer @randall77 @josharian

ianlancetaylor · 2016-08-05T17:29:01Z

It seems to me that you can turn a condition variable into a channel by writing

c := make(chan struct{})
go func() {
    cond.L.Lock()
    cond.Wait()
    c <- struct{}{}
}()

I'm sure I'm missing something, but what?

bradfitz · 2016-08-05T17:39:00Z

Yeah, except then I can leak goroutines if I'm not careful. I also pay the additional 4+ KB cost per select, which can be blocking for some time.

bradfitz · 2016-08-05T20:29:46Z

I would still reacquire the mutex afterwards.

Instead of this example from our docs:

c.L.Lock()
for !condition() {
    c.Wait()
}
... make use of condition ...
c.L.Unlock()

I would write:

c.L.Lock()
for !condition() {
    c.L.Unlock()
    select {
    case <-ctx.Done():
    case <-c.WaitChan():
    }
    c.L.Lock()
}
... make use of condition ...
c.L.Unlock()

bradfitz · 2016-08-05T20:57:16Z

There's nothing atomically special about the Cond.Wait method. It's just an unlock, wait, and lock:

https://github.com/golang/go/blob/release-branch.go1.7/src/sync/cond.go#L53

I don't see how this is any more of a footgun than anybody using the sync package in general. Or anybody using two goroutines and sharing global state without the sync package. Bad Go code is full of data races.

I'm proposing this as a mechanism for people who do know what they're doing. Not as some recommended high-level mechanism.

minux · 2016-08-05T22:49:49Z

A more generic solution is perhaps for the compiler to pass stack frame size hints to the runtime. for example, in Ian's example, the compiler could recognize that the function will usually use very little stack, even less than the minimum stack, so it could hint the runtime to start the goroutine with a smaller stack. In general, if we make the compiler determine the initial stack size for a new goroutine, could that solve at least half of the problem? (goroutine leaking might still occur, but that's another issue.)

bradfitz · 2016-08-05T23:02:03Z

Kinda. Really I want a magic type of hchan (sentinel value of hchan.elemtype) which makes hchan.buf mean it's a pointer to a notifyList, and have the select implementation on entry to select do the notifyListAdd, and then if separate case fires in select, we do something like a notifyListRemove, and unregister that waiter.

Otherwise I'm afraid of the leaks where my condition variable is woken up (because, say, I selected on my context.Done()), but then I find that my condition is satisfied and I exit the function, and never wait on the condition variable again. I need to tell the cond.Wait running in the goroutine to stop. Which makes me think it's better done in the implementation of select with a special channel type.

This wouldn't be a language change. But it would involve modifying the runtime to support an API addition.

minux · 2016-08-05T23:13:26Z

could you design a user facing sync.Cond API for this?

we need to automatically lock cond.L when the channel
unblocks and the unlock needs to happen for the goroutine
that is actually winning the race. This seems too magical
to me. Maybe I'm missing something.

bradfitz · 2016-08-05T23:15:40Z

@minux, I covered all this above.

ianlancetaylor · 2016-09-01T19:57:59Z

Something that has come to trouble me about this: channels are level-triggered. A channel has a value or it does not, and values do not disappear until somebody reads them.

Condition variables are edge-triggered. A call to the Signal method is a trigger that wakes up anything that is waiting right now. If nothing is waiting, nothing happens.

If we turn an edge-triggered mechanism into a level-triggered mechanism, then we either do something laborious to drain the level (read the value from the channel) if nothing needs it, or we have too many triggers.

This is particularly obvious when we consider the Broadcast method. The Broadcast method is easy to understand in terms of edge-triggering: everything that is waiting right now is alerted. Implementing that was level-triggering, without race conditions, is much more complex. Do we wake all select statements waiting on the channel right now? What value do we hand them?

The select statement, unlike channels, can be either edge-triggered or level-triggered. It would be meaningful to design some way to attach an edge trigger to a select statement. It just wouldn't go through a channel.

So I think this is a fairly subtle change.

bcmills · 2016-09-08T21:38:36Z

@ianlancetaylor

channels are level-triggered. A channel has a value or it does not, and values do not disappear until somebody reads them.

default cases convert between "edge-triggered" and "level-triggered" events on channels. (For example, consider the usage of time.Ticker.C.)

In particular, a send-with-default converts an edge-triggered event into a level-triggered event by "latching" on the first edge-crossing. A receive-with-default converts a level-triggered event to an edge-triggered event by "unlatching" the channel when the edge-trigger no longer applies.

The Broadcast method is easy to understand in terms of edge-triggering: everything that is waiting right now is alerted. Implementing that was level-triggering, without race conditions, is much more complex. Do we wake all select statements waiting on the channel right now? What value do we hand them?

On a Broadcast, we would wake all select statements that have evaluated the case to receive it (and reacquire the lock before unblocking the case, which would either require a language change or a very deep runtime hook).

With the "channel as cond-var" pattern we can implement without such a hook, Broadcast corresponds to close (and unfortunately you end up needing to allocate a new channel every time you Broadcast).

With a built-in mechanism, we could do even better: we could have the Signal and Broadcast operations receive values, and the Wait operation return the value from the corresponding signal. (If we want something compatible with sync.Cond, I would propose that we send c.L as the value.) You could imagine using the same (or a similar) mechanism for implementing futures natively: a future is a selectable that can be sent to exactly once but received from any number of times. And there are also some interesting possibilities involving Broadcast with a channel (or another Cond) as a parameter...

The tricky part, I think, is constructing the API such that either the sender or the receiver can abandon a case after it has marked the other side as "ready".

funny-falcon · 2016-10-16T15:23:37Z

I add proposal for "future" internal type #17466 , and it looks similar (in spirit) to this proposal.

rsc · 2016-11-07T21:49:50Z

After implementation of #8898 we'd necessarily have the runtime support for alternate kinds of channels. At that point it might make sense to try this.

josharian · 2016-12-04T21:28:19Z

If we do this, we might also want to extend it to sync.Lockers. I'm working with some code now where I have a sync.Mutex that might be held by someone else for a long time and a context.Context to respect. For the same reasons as Brad already outlined, I'd like to be able to use a select to either acquire the mutex or read from ctx.Done().

select {
case <-mu.C():
	// do work
	mu.Unlock()
case <-ctx.Done():
	// clean up, mu is not locked
}

funny-falcon · 2016-12-04T23:40:50Z

I'd like to be able to use a select to either acquire the mutex or read from ctx.Done().

@josharian Looks like channel with buffer of size 1:

Lock() is write to channel,
Unlock() is read from.

we'd necessarily have the runtime support for alternate kinds of channels.

Then, why not implement "future"?

But, perhaps I didn't quite understand it. Excuse me then.

eaburns · 2017-06-29T12:14:20Z

Sorry to stumble in, but this edge/level-trigger goroutine leak issue reminds me of an old golang-dev thread [1] about "non-parallel channels". On that thread, I (admittedly without much thought) proposed a new way to make a channel/goroutine pair where the two are intimately tied, such that when all receivers are garbage collected, the goroutine can also be garbage collected. It's a language change, and not fully-thought-out, but perhaps some such thing could present a more general solution to this type of goroutine leak without needing special handling for select or timer-specific channels.

[1] https://groups.google.com/d/msg/golang-dev/VoswK7OBckY/-4ZPi3XoSI4J

cespare · 2017-07-22T03:15:18Z

I keep running into this when plumbing context through things.

For example, I have an implementation of a connection pool which is cleanly expressed with a mutex and a sync.Cond. If there are no available connections nor capacity to create new ones, a caller waits to borrow with cond.Wait. I want to be able to pass a context.Context through and cancel the wait, so it would be nice to select on both cond.Wait and ctx.Done().

bcmills · 2017-07-24T18:11:46Z

@cespare Are you running into significant cases where you can't easily replace the sync.Cond with a channel (for example, due to too much overhead from allocating the channel)? If so, could you describe them in a bit more detail?

zombiezen · 2017-07-24T19:09:01Z

IIUC, @bcmills, you are proposing that folks do something like https://play.golang.org/p/zyi-LBlyBN?

That would solve the use-case I stumbled upon this issue for. But I wouldn't say it's obvious. :) I have no idea about the runtime efficiency.

bcmills · 2017-07-24T19:40:38Z

@zombiezen Provided that you can keep the locked:unlocked ratio low (to reduce contention), I would only use a mutex for the Cond, not for the Mutex itself.

But I think the example you linked is a bit too simple anyway: the Cond usage you're showing there looks more like Signal than Broadcast, and for that kind of usage the channel can function as both the Mutex and the Cond: https://play.golang.org/p/beoQFQEx0e

bcmills · 2017-07-26T21:47:12Z

@jba sync.Cond itself is not subject to spurious wakeups, but if you're using the same Cond to signal crossings of many independent thresholds, each waiter gets signaled for potentially many threshold crossings that aren't the one it is interested in.

cespare · 2017-09-05T21:05:45Z

@bcmills following up on my July comments:

@cespare Are you running into significant cases where you can't easily replace the sync.Cond with a channel (for example, due to too much overhead from allocating the channel)? If so, could you describe them in a bit more detail?

Let's set aside efficiency for now (though in some cases, as others have pointed out, that's definitely a relevant concern).

I took one of my sync.Cond use cases, which is a connection pool that I mentioned over in #21165 (comment), and extracted out some the interesting stuff into an example that I can share. (The example does not include the full complexity, but I think it's interesting enough to discuss.)

Check it out here: https://github.com/cespare/misc/tree/master/condchan

I included the sync.Cond-based implementation (which is what we use in our real code) and I also wrote a channel-based implementation modeled on your demo code at https://play.golang.org/p/beoQFQEx0e.

I'd be interested in what you think about the code. Do you see simplifications that preserve the desired properties that I listed?

I found that writing the channel-based version was a fun and interesting exercise, and I had to think pretty hard to come up with something that works (and to convince myself that it's correct). I had a bunch of previous iterations that had fairly subtle bugs. I'm sure that the cond-based solution is also bug-prone in this way, but on balance, I feel like the channel-based version is significantly trickier/subtler.

bcmills · 2017-09-06T17:23:57Z

I'd be interested in what you think about the code. Do you see simplifications that preserve the desired properties that I listed?

Yes: cespare/misc#1

I found that writing the channel-based version was a fun and interesting exercise, and I had to think pretty hard to come up with something that works

Agreed: a typical CS education covers the usage of condition variables, so using channels instead can make it harder to map the problems we find to known patterns.

EdSchouten · 2018-10-04T19:57:46Z

Also running into this issue, where I have a gRPC call that waits for a state transformation on a server and reports changes. If clients cancel these blocking/waiting RPCs, I currently leave goroutines behind, waiting on the condition variable.

What about having a func (c *Cond) WaitWithContext(context.Context) error? By using context objects, support for timeouts comes for free.

bcmills · 2018-10-04T21:27:29Z

@EdSchouten, is there a reason you can't replace the condition variable with a channel?

See my talk on Rethinking Classical Concurrency Patterns: the section on condition variables starts on slide 37, and there is more detail on patterns for broadcast events in the backup slides (101-105).

virtuald · 2018-10-11T16:01:35Z

I just found an implementation of a condition variable using channels at https://github.com/anacrolix/missinggo/blob/master/chancond.go . It's similar to some of the backup slides in @bcmills talk referenced above. While it seems useful for event notifications, because of its locking behaviors it would be prone to race conditions if you tried to use it for some of the things one might traditionally use a condition variable for.

jonas-jasas · 2019-01-28T16:16:54Z

Cancellable Cond package that implements all sync.Cond methods and passes all sync.Cond tests.
https://gitlab.com/jonas.jasas/condchan

Please write your comments about it.

rogpeppe · 2019-01-28T17:36:38Z

@jasajona I'm not convinced, I'm afraid. It can panic when Signal is called at the same time as Broadcast.

For example, this code panics immediately on my machine:

package main

import (
	"gitlab.com/jonas.jasas/condchan"
)

func main() {
	var cond condchan.CondChan
	go func() {
		for {
			cond.Signal()
		}
	}()
	go func() {
		for {
			cond.Broadcast()
		}
	}()
	select {}
}

Concurrency code is tricky!

jonas-jasas · 2019-01-28T19:41:21Z

Thank you for pointing out a bug! Fixed it and made additional test:
https://gitlab.com/jonas.jasas/condchan/commit/96de89d3cf0e4cd6b14e519d06ddc87fc4c365cf
That was a small optimization that caused this bug. Please share if you have any other ideas how to break CondChan lib!

smasher164 · 2020-11-23T01:55:47Z

Is the evaluation of this proposal still on hold for #8898?

ianlancetaylor · 2020-11-23T16:19:19Z

@smasher164 I suppose so. But I'm not sure that there is a clear consensus that this proposal should be implemented. See also #21165.

mei-rune · 2022-07-20T00:21:19Z

#9578

Jille · 2023-05-27T11:46:24Z

Both https://github.com/anacrolix/missinggo/blob/master/chancond.go and https://gitlab.com/jonas.jasas/condchan have a race condition where they can drop a Signal() if the waiting thread isn't yet receiving from the channel. (I filed an issue).

For those needing it, I've implemented a Cond with WaitContext(): https://pkg.go.dev/github.com/Jille/contextcond

benhoyt · 2023-10-06T03:32:26Z

For those watching this, the shiny new context.AfterFunc function and the Cond example in the docs help solve this. I've just updated some tedious code that manages a map of waiter channels to use a context.AfterFunc that calls cond.Broadcast, and it's a lot simpler and seems to work nicely.

I couldn't actually use context.AfterFunc directly as this project is still on Go 1.20, but I simulated it based on this. Once we've updated to Go 1.21 I'll just swap that out for context.AfterFunc proper.

Jille · 2023-10-06T08:36:48Z

@benhoyt That only works if you can Broadcast your semaphore every time one context gets cancelled. If you use Signal instead of Broadcast, context.AfterFunc won't solve this.

bradfitz added the Proposal label Aug 5, 2016

bradfitz changed the title ~~sync: mechanism to select on conditional variables~~ proposal: sync: mechanism to select on conditional variables Aug 5, 2016

This comment has been minimized.

Sign in to view

quentinmit added this to the Proposal milestone Aug 8, 2016

funny-falcon mentioned this issue Oct 16, 2016

Proposal: add "future" internal type (similar to channel) #17466

Closed

bradfitz added the Proposal-Hold label Nov 7, 2016

rsc changed the title ~~proposal: sync: mechanism to select on conditional variables~~ proposal: sync: mechanism to select on condition variables Apr 10, 2017

bcmills mentioned this issue Jun 8, 2017

sync: add example for Cond #20491

Open

bcmills mentioned this issue Sep 7, 2017

proposal: context.WithSignal #21521

Closed

lukechampine mentioned this issue Sep 26, 2017

rewrite the whole upload process NebulousLabs/Sia#2354

Merged

bradfitz mentioned this issue Dec 21, 2017

proposal: sync: provide Cond.TimedWait(d time.Duration) #23211

Closed

djherbis mentioned this issue Jan 7, 2018

Add Stream.Cancel() djherbis/stream#5

Merged

scrye mentioned this issue Feb 26, 2018

Implement infra Transport for normal UDP scionproto/scion#1418

Merged

nvanbenschoten mentioned this issue Oct 1, 2018

perf: kv falls off a cliff at ~12 nodes cockroachdb/cockroach#26178

Closed

bcmills mentioned this issue Oct 24, 2018

proposal: Go 2: non-reference channel types #28366

Open

gandro mentioned this issue Mar 11, 2020

container/ring: Fix various data races and race conditions cilium/hubble#177

Merged

sea-postkassa pushed a commit to sea-postkassa/condchan that referenced this issue Jun 23, 2021

Bug: golang/go#16620 (comment)

96de89d

aojea mentioned this issue Mar 1, 2022

cacher: don't accept requests if stopped kubernetes/kubernetes#108414

Merged

mei-rune mentioned this issue Jul 29, 2022

Can add SubmitWithContext(ctx context.Context, task func(ctx context.Context)) error panjf2000/ants#237

Closed

DeedleFake mentioned this issue Oct 27, 2022

proposal: x/exp/future: new package to implement futures #56461

Open

flycash mentioned this issue Nov 2, 2022

syncx.Cond: Wait(time.Duration) 调用 ecodeclub/ekit#116

Closed

benhoyt mentioned this issue Oct 3, 2023

feat(state): add core support for Pebble Notices canonical/pebble#295

Merged

eikemeier mentioned this issue Apr 22, 2024

proposal: sync/atomic: Add Chan atomic channel #66960

Open

proposal: sync: mechanism to select on condition variables #16620

proposal: sync: mechanism to select on condition variables #16620

Comments

bradfitz commented Aug 5, 2016

bradfitz commented Aug 5, 2016

ianlancetaylor commented Aug 5, 2016

bradfitz commented Aug 5, 2016

This comment has been minimized.

bradfitz commented Aug 5, 2016

This comment has been minimized.

bradfitz commented Aug 5, 2016

This comment has been minimized.

minux commented Aug 5, 2016 via email

bradfitz commented Aug 5, 2016

minux commented Aug 5, 2016 • edited Loading

bradfitz commented Aug 5, 2016

ianlancetaylor commented Sep 1, 2016

bcmills commented Sep 8, 2016

funny-falcon commented Oct 16, 2016

rsc commented Nov 7, 2016

josharian commented Dec 4, 2016

funny-falcon commented Dec 4, 2016

eaburns commented Jun 29, 2017

cespare commented Jul 22, 2017

bcmills commented Jul 24, 2017

zombiezen commented Jul 24, 2017 • edited Loading

bcmills commented Jul 24, 2017 • edited Loading

bcmills commented Jul 26, 2017

cespare commented Sep 5, 2017

bcmills commented Sep 6, 2017

EdSchouten commented Oct 4, 2018

bcmills commented Oct 4, 2018

virtuald commented Oct 11, 2018

jonas-jasas commented Jan 28, 2019

rogpeppe commented Jan 28, 2019 • edited Loading

jonas-jasas commented Jan 28, 2019

smasher164 commented Nov 23, 2020

ianlancetaylor commented Nov 23, 2020

mei-rune commented Jul 20, 2022

Jille commented May 27, 2023

benhoyt commented Oct 6, 2023

Jille commented Oct 6, 2023

minux commented Aug 5, 2016 •

edited

Loading

zombiezen commented Jul 24, 2017 •

edited

Loading

bcmills commented Jul 24, 2017 •

edited

Loading

rogpeppe commented Jan 28, 2019 •

edited

Loading