You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In some cases parallelism is not needed/desirable -- fine-grained chan-based
generators/iterators, or "coroutine-like" approach when you have several
independent stacks and manually switch between them.
Sync channels substantially limit parallelism, to the point where pipeline-like
parallelization does not work. But they do not eliminate it entirely, as after an
operation both producer and consumer are runnable. Current implementation handles this
"contact bounce" under-parallelization very inefficiently -- goroutines are
constantly scheduled and de-scheduled, threads are constantly parked and unparked, work
migrate between threads, etc.
The idea is to limit parallelism on sync channels even more aggressively. So that
fine-grained communication over sync channels can be very efficient. But w/o eliminating
concurrency nor affecting semantics.
Implementation:
1. Each P has an additional local work queue, let's call it Domain. Work form Domain
can't be stolen by other P's. Domain is has maximum size of 4 goroutines.
2. When a goroutine unblocks another goroutine on a sync chan, it adds the other
goroutine to own Domain (Domain of the current P) instead of putting it to the normal
work queue. This effectively limits parallelism. We can use Domains in other situations,
but this is outside of context of this bug.
3. When a goroutine blocks, P schedules one of the goroutines from Domain (if it's not
empty). In some circumstances (e.g. a goroutine is preempted due to time slice
exhaustion, or blocks in syscall) P can decide to just move all goroutines in the Domain
to normal work queue.
Then, *before* a goroutine blocks on a sync chan it saves Hchan and elem pointers in G
and switches to another goroutine in Domain *once*. Or if it sees that a goroutine in
the Domain want to accomplish a pairing operation on the same chan, it executes the
operation with that other goroutine.
This allows to implement tight communication over sync channels with the cost of just
several function calls and w/o several degradation with GOMAXPROCS>1. Communication
domains form and break up dynamically. Program semantics are not affected.
Here is a prototype implementation:
https://golang.org/cl/74180043/
I've benchmarked it on a real application where chan-based generators were abounded due
to forementioned reasons:
https://groups.google.com/d/msg/golang-dev/0IElw_BbTrk/cGHMdNoHGQEJ
The change provides 17.5x speedup (and it's not the limit as runtime.switchto can be
significantly more efficient).
The text was updated successfully, but these errors were encountered:
The text was updated successfully, but these errors were encountered: