New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: make sure blocked channels run operations in FIFO order #11506

Closed
rsc opened this Issue Jul 1, 2015 · 12 comments

Comments

Projects
None yet
@rsc
Contributor

rsc commented Jul 1, 2015

I've had mail conversations with two different projects using Go recently, both centered around the surprise that if many goroutines are blocked on a particular channel, and that channel becomes available, it is still possible for running goroutines who "drive by" at the right time to get their operation in before the goroutines that are blocked, and it is possible for the goroutines who are blocked to be reordered as a result. If this happens repeatedly, the effect is that the blocked goroutines can block arbitrarily long even though the channel is known to be ready at regular intervals. I wonder if we should adjust the channel implementation to ensure that when a channel does become available for sending or receiving, blocked operations take priority over "drive by" operations. It seems to me that this can be done by completing the blocked operation as part of the operation that unblocks it.

Sends and receives on unbuffered channels already behave this way: a send with blocked receivers picks the first receiver goroutine off the queue, delivers the value to it, readies it (puts it on a run queue), and continues execution. That is, the send completes the blocked operation. Receives on unbuffered channels similarly complete blocked sends.

Sends and receives on buffered channels do not behave this way:

A send into a buffered channel with blocked receivers stores the value in the buffer, wakes up a receiver, and continues executing. When the receiver is eventually scheduled, it checks the channel, and maybe it gets lucky and the value is still there. But maybe not, in which case it goes to the end of the queue.

A receive out of a buffered channel copies a value out of the buffer, wakes a blocked sender, and continues. When the sender is eventually scheduled, it checks the channel, and maybe it gets lucky and there is still room in the buffer for the send. But maybe not, in which case it goes to the end of the queue.

It seems to me that it would be easy and symmetric with the unbuffered channel operations for a send with blocked receivers to deliver the value straight to a receiver, completing the first pending receive operation. Similarly a receive with blocked senders could take its own value out of the channel buffer and then complete the send into the newly emptied space. That would be the only operation of the four that needs to transfer a pair of values instead of just one.

It doesn't seem to me that it would hurt performance to do this, and in fact it might help, since there would not be all the unnecessary wakeups that happen now. It would give much more predictable behavior when goroutines queue up on a stuck channel: once queued, that's the order they're going to be served, guaranteed.

From a "happens before" point of view, I'm talking nonsense. There is no "happens before" for two different goroutines blocking on the same channel, and so there is no difference between all the execution orders. I understand that. But from a "what actually happens" point of view, there certainly is a difference: if you know that the goroutines have been stacking up one per minute for an hour before the channel finally gets unblocked and they all complete, you know what order they blocked and might reasonably expect them to unblock in that same order. This remains roughly as true at shorter time scales.

Especially since so many Go programs care about latency, making channels first come, first served seems worth doing. And it looks to me like it can be done trivially and at no performance cost. The current behavior looks more like a bug than an intended result.

I'm not talking about a language change, just a change to our implementation.

Thoughts?

@robpike @dvyukov @randall77 @aclements

@rsc rsc self-assigned this Jul 1, 2015

@rsc rsc added this to the Go1.6Early milestone Jul 1, 2015

@dr2chase

This comment has been minimized.

Show comment
Hide comment
@dr2chase

dr2chase Jul 1, 2015

Contributor

I would make this change. The current behavior sounds unfriendly to me, and years past I used a similar policy for locks and waits (in Java) and thought it gave good behavior. Starvation is bad.

Contributor

dr2chase commented Jul 1, 2015

I would make this change. The current behavior sounds unfriendly to me, and years past I used a similar policy for locks and waits (in Java) and thought it gave good behavior. Starvation is bad.

@robpike

This comment has been minimized.

Show comment
Hide comment
@robpike

robpike Jul 2, 2015

Contributor

Language semantics require that the values be FIFO, but the implications of that with multiple goroutines reading the values are unspecified and muddy at best. That said, I believe the claim here that it seems more natural for blocked ones to go first; it just seems intuitive.

I think you could even make a case that the current situation is almost a bug.

There may be a noticeable performance hit for heavy contention, though, because I believe this style requires asymptotically more context switches. I still think channels should behave as you suggest.

Contributor

robpike commented Jul 2, 2015

Language semantics require that the values be FIFO, but the implications of that with multiple goroutines reading the values are unspecified and muddy at best. That said, I believe the claim here that it seems more natural for blocked ones to go first; it just seems intuitive.

I think you could even make a case that the current situation is almost a bug.

There may be a noticeable performance hit for heavy contention, though, because I believe this style requires asymptotically more context switches. I still think channels should behave as you suggest.

@i3d

This comment has been minimized.

Show comment
Hide comment
@i3d

i3d Jul 2, 2015

Contributor

I'd fully support this. I think this is a more "natural" result that I was expecting from one of the discussion threads we had.

Contributor

i3d commented Jul 2, 2015

I'd fully support this. I think this is a more "natural" result that I was expecting from one of the discussion threads we had.

@randall77

This comment has been minimized.

Show comment
Hide comment
@randall77

randall77 Jul 2, 2015

Contributor

I already have a CL for basically exactly this.

https://go-review.googlesource.com/#/c/9345/

Contributor

randall77 commented Jul 2, 2015

I already have a CL for basically exactly this.

https://go-review.googlesource.com/#/c/9345/

@aclements

This comment has been minimized.

Show comment
Hide comment
@aclements

aclements Jul 2, 2015

Member

From a specification perspective, it's true that our current approach satisfies the happens-before graph, but it does not satisfy liveness. That's a formally defined, reasonable, and desirable property we could specify as a requirement of channels without specifying something as specific as FIFO blocking.

Implementation-wise, I'm concerned about making blocking strictly FIFO order. This seems like the exact same mistake as making map order deterministic or goroutine scheduling deterministic. People can and will come to depend on this and eventually we may want to weaken it. In particular, this is asking for scalability problems. One of the cardinal rules that came out of my thesis was to not impose strict ordering on operations unless it is a fundamental requirement. It is by no means a fundamental requirement of the blocking order.

I fully support addressing the liveness issue, and I like the approach of eliminating the wake-up race, but if we're going to do that, we should think about whether we can introduce bounded randomness into the blocking order to prevent people from depending on strict FIFO blocking.

Member

aclements commented Jul 2, 2015

From a specification perspective, it's true that our current approach satisfies the happens-before graph, but it does not satisfy liveness. That's a formally defined, reasonable, and desirable property we could specify as a requirement of channels without specifying something as specific as FIFO blocking.

Implementation-wise, I'm concerned about making blocking strictly FIFO order. This seems like the exact same mistake as making map order deterministic or goroutine scheduling deterministic. People can and will come to depend on this and eventually we may want to weaken it. In particular, this is asking for scalability problems. One of the cardinal rules that came out of my thesis was to not impose strict ordering on operations unless it is a fundamental requirement. It is by no means a fundamental requirement of the blocking order.

I fully support addressing the liveness issue, and I like the approach of eliminating the wake-up race, but if we're going to do that, we should think about whether we can introduce bounded randomness into the blocking order to prevent people from depending on strict FIFO blocking.

@cespare

This comment has been minimized.

Show comment
Hide comment
@cespare

cespare Jul 2, 2015

Contributor

@aclements FIFO blocking seems harder to incorrectly rely on than map ordering. Perhaps, like goroutine scheduling, it should only be randomized with -race?

Contributor

cespare commented Jul 2, 2015

@aclements FIFO blocking seems harder to incorrectly rely on than map ordering. Perhaps, like goroutine scheduling, it should only be randomized with -race?

@sougou

This comment has been minimized.

Show comment
Hide comment
@sougou

sougou Jul 2, 2015

Contributor

This change will improve tail latency during bursty load, which is what we observed in one of our benchmarks in vitess.
If accepted, should the language specification be amended to reflect this behavior?

Contributor

sougou commented Jul 2, 2015

This change will improve tail latency during bursty load, which is what we observed in one of our benchmarks in vitess.
If accepted, should the language specification be amended to reflect this behavior?

@dr2chase

This comment has been minimized.

Show comment
Hide comment
@dr2chase

dr2chase Jul 2, 2015

Contributor

Flip a coin (linear feedback shift register), choose one of the top two, till we come up with a better reason for some other policy.Starvation is exponentially rare, service is nearly fifo, but order will usually differ if there's more than one waiter.

Contributor

dr2chase commented Jul 2, 2015

Flip a coin (linear feedback shift register), choose one of the top two, till we come up with a better reason for some other policy.Starvation is exponentially rare, service is nearly fifo, but order will usually differ if there's more than one waiter.

@randall77

This comment has been minimized.

Show comment
Hide comment
@randall77

randall77 Jul 2, 2015

Contributor

I think we're overthinking the randomization. With randomized goroutine scheduling (which is already going in, at least for -race testing), we'll get randomized channel ordering for free. The random scheduling should change the order in which waiters get queued and the order in which sends get matched to the waiters.

Contributor

randall77 commented Jul 2, 2015

I think we're overthinking the randomization. With randomized goroutine scheduling (which is already going in, at least for -race testing), we'll get randomized channel ordering for free. The random scheduling should change the order in which waiters get queued and the order in which sends get matched to the waiters.

@dvyukov

This comment has been minimized.

Show comment
Hide comment
@dvyukov

dvyukov Jul 2, 2015

Member

You seem to assume that the entity that benefits from reduced latency is a goroutine only. This is true when a goroutine services a request and needs to acquire some aux resource (e.g. a database connection). But this is not true for all producer-consumer/pipelining scenarios, where the entity that benefits from reduced latency is a message in a chan. Today messages in chans are serviced strictly FIFO and with minimal latency: the next running goroutine picks up the first message in a chan. What you propose improves the database connection pool scenario but equally worsens the producer-consumer scenario. Because under your proposal we handoff the first message in the chan to a goroutine that will run who-knows-when, and a goroutine that drives by the chan next moment either blocks or services the second message ahead of the first one. If we switch the implementation we will start receiving complaints about the other scenarios.
Also sase synchronization primitives is usually the wrong level for fairness. They can't ensure user-perceived fairness. Consider that to service a request you need to do 10 DB queries. DB pool has fair
queue, however older requests (that do 10-th query ) complete with newer requests (that do first query). As the result some requests can experience no waiting time, while others can wait for a hundred of requests in total.
What you propose is known to significantly reduce performance due to requirement of lock-step scheduling order. Which becomes significantly worse if you add a bunch of unrelated goroutines to the mix, so that you have lock-step with a very long step time.
Regarding the unnecessary wakeup, it's easy to fix. See #8900.

Member

dvyukov commented Jul 2, 2015

You seem to assume that the entity that benefits from reduced latency is a goroutine only. This is true when a goroutine services a request and needs to acquire some aux resource (e.g. a database connection). But this is not true for all producer-consumer/pipelining scenarios, where the entity that benefits from reduced latency is a message in a chan. Today messages in chans are serviced strictly FIFO and with minimal latency: the next running goroutine picks up the first message in a chan. What you propose improves the database connection pool scenario but equally worsens the producer-consumer scenario. Because under your proposal we handoff the first message in the chan to a goroutine that will run who-knows-when, and a goroutine that drives by the chan next moment either blocks or services the second message ahead of the first one. If we switch the implementation we will start receiving complaints about the other scenarios.
Also sase synchronization primitives is usually the wrong level for fairness. They can't ensure user-perceived fairness. Consider that to service a request you need to do 10 DB queries. DB pool has fair
queue, however older requests (that do 10-th query ) complete with newer requests (that do first query). As the result some requests can experience no waiting time, while others can wait for a hundred of requests in total.
What you propose is known to significantly reduce performance due to requirement of lock-step scheduling order. Which becomes significantly worse if you add a bunch of unrelated goroutines to the mix, so that you have lock-step with a very long step time.
Regarding the unnecessary wakeup, it's easy to fix. See #8900.

@RLH

This comment has been minimized.

Show comment
Hide comment
@RLH

RLH Jul 2, 2015

Contributor

I like the imagery of a drive by goroutine a lot, the literature also uses
the boring term barging. To put a number on the cost of fair locks Doug Lea
(http://gee.cs.oswego.edu/dl/papers/aqs.pdf) reports a 1 to 2 orders of
magnitude slowdown over locks that allow barging. They
(java.util.concurrency circa 2004) also stepped back from a definition of
fair to be less than a strict FIFO.
In the intervening decade since that paper Hannes Payer has done some
promising work on queues with even weaker semantics such as a allowing a
pop to return some value near, as opposed to at, the front of the queue.
I believe we can build fairness on top of weaker unfair semantics so not
cooking the stronger semantics into Go is the way forward.

On Thu, Jul 2, 2015 at 7:52 AM, Dmitry Vyukov notifications@github.com
wrote:

You seem to assume that the entity that benefits from reduced latency is a
goroutine only. This is true when a goroutine services a request and needs
to acquire some aux resource (e.g. a database connection). But this is not
true for all producer-consumer/pipelining scenarios, where the entity that
benefits from reduced latency is a message in a chan. Today messages in
chans are serviced strictly FIFO and with minimal latency: the next
running goroutine picks up the first message in a chan. What you
propose improves the database connection pool scenario but equally worsens
the producer-consumer scenario. Because under your proposal we handoff the
first message in the chan to a goroutine that will run who-knows-when, and
a goroutine that drives by the chan next moment either blocks or services
the second message ahead of the first one. If we switch the implementation
we will start receiving complaints about the other scenarios.
Also sase synchronization primitives is usually the wrong level for
fairness. They can't ensure user-perceived fairness. Consider that to
service a request you need to do 10 DB queries. DB pool has fair
queue, however older requests (that do 10-th query ) complete with newer
requests (that do first query). As the result some requests can experience
no waiting time, while others can wait for a hundred of requests in total.
What you propose is known to significantly reduce performance due to
requirement of lock-step scheduling order. Which becomes significantly
worse if you add a bunch of unrelated goroutines to the mix, so that you
have lock-step with a very long step time.
Regarding the unnecessary wakeup, it's easy to fix. See #8900
#8900.


Reply to this email directly or view it on GitHub
#11506 (comment).

Contributor

RLH commented Jul 2, 2015

I like the imagery of a drive by goroutine a lot, the literature also uses
the boring term barging. To put a number on the cost of fair locks Doug Lea
(http://gee.cs.oswego.edu/dl/papers/aqs.pdf) reports a 1 to 2 orders of
magnitude slowdown over locks that allow barging. They
(java.util.concurrency circa 2004) also stepped back from a definition of
fair to be less than a strict FIFO.
In the intervening decade since that paper Hannes Payer has done some
promising work on queues with even weaker semantics such as a allowing a
pop to return some value near, as opposed to at, the front of the queue.
I believe we can build fairness on top of weaker unfair semantics so not
cooking the stronger semantics into Go is the way forward.

On Thu, Jul 2, 2015 at 7:52 AM, Dmitry Vyukov notifications@github.com
wrote:

You seem to assume that the entity that benefits from reduced latency is a
goroutine only. This is true when a goroutine services a request and needs
to acquire some aux resource (e.g. a database connection). But this is not
true for all producer-consumer/pipelining scenarios, where the entity that
benefits from reduced latency is a message in a chan. Today messages in
chans are serviced strictly FIFO and with minimal latency: the next
running goroutine picks up the first message in a chan. What you
propose improves the database connection pool scenario but equally worsens
the producer-consumer scenario. Because under your proposal we handoff the
first message in the chan to a goroutine that will run who-knows-when, and
a goroutine that drives by the chan next moment either blocks or services
the second message ahead of the first one. If we switch the implementation
we will start receiving complaints about the other scenarios.
Also sase synchronization primitives is usually the wrong level for
fairness. They can't ensure user-perceived fairness. Consider that to
service a request you need to do 10 DB queries. DB pool has fair
queue, however older requests (that do 10-th query ) complete with newer
requests (that do first query). As the result some requests can experience
no waiting time, while others can wait for a hundred of requests in total.
What you propose is known to significantly reduce performance due to
requirement of lock-step scheduling order. Which becomes significantly
worse if you add a bunch of unrelated goroutines to the mix, so that you
have lock-step with a very long step time.
Regarding the unnecessary wakeup, it's easy to fix. See #8900
#8900.


Reply to this email directly or view it on GitHub
#11506 (comment).

@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot commented Oct 23, 2015

CL https://golang.org/cl/9345 mentions this issue.

@gopherbot gopherbot closed this in 8e496f1 Nov 5, 2015

gopherbot pushed a commit that referenced this issue Nov 8, 2015

runtime: simplify chan ops, take 2
This change is the same as CL #9345 which was reverted,
except for a small bug fix.

The only change is to the body of sendDirect and its callsite.
Also added a test.

The problem was during a channel send operation.  The target
of the send was a sleeping goroutine waiting to receive.  We
basically do:
1) Read the destination pointer out of the sudog structure
2) Copy the value we're sending to that destination pointer
Unfortunately, the previous change had a goroutine suspend
point between 1 & 2 (the call to sendDirect).  At that point
the destination goroutine's stack could be copied (shrunk).
The pointer we read in step 1 is no longer valid for step 2.

Fixed by not allowing any suspension points between 1 & 2.
I suspect the old code worked correctly basically by accident.

Fixes #13169

The original 9345:

This change removes the retry mechanism we use for buffered channels.
Instead, any sender waking up a receiver or vice versa completes the
full protocol with its counterpart.  This means the counterpart does
not need to relock the channel when it wakes up.  (Currently
buffered channels need to relock on wakeup.)

For sends on a channel with waiting receivers, this change replaces
two copies (sender->queue, queue->receiver) with one (sender->receiver).
For receives on channels with a waiting sender, two copies are still required.

This change unifies to a large degree the algorithm for buffered
and unbuffered channels, simplifying the overall implementation.

Fixes #11506

Change-Id: I57dfa3fc219cffa4d48301ee15fe5479299efa09
Reviewed-on: https://go-review.googlesource.com/16740
Reviewed-by: Ian Lance Taylor <iant@golang.org>

@golang golang locked and limited conversation to collaborators Nov 4, 2016

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.