New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: make chan-based generators faster #8903

Open
dvyukov opened this Issue Oct 7, 2014 · 0 comments

Comments

Projects
None yet
2 participants
@dvyukov
Member

dvyukov commented Oct 7, 2014

In some cases parallelism is not needed/desirable -- fine-grained chan-based
generators/iterators, or "coroutine-like" approach when you have several
independent stacks and manually switch between them.
Sync channels substantially limit parallelism, to the point where pipeline-like
parallelization does not work. But they do not eliminate it entirely, as after an
operation both producer and consumer are runnable. Current implementation handles this
"contact bounce" under-parallelization very inefficiently -- goroutines are
constantly scheduled and de-scheduled, threads are constantly parked and unparked, work
migrate between threads, etc.

The idea is to limit parallelism on sync channels even more aggressively. So that
fine-grained communication over sync channels can be very efficient. But w/o eliminating
concurrency nor affecting semantics.

Implementation:
1. Each P has an additional local work queue, let's call it Domain. Work form Domain
can't be stolen by other P's. Domain is has maximum size of 4 goroutines.
2. When a goroutine unblocks another goroutine on a sync chan, it adds the other
goroutine to own Domain (Domain of the current P) instead of putting it to the normal
work queue. This effectively limits parallelism. We can use Domains in other situations,
but this is outside of context of this bug.
3. When a goroutine blocks, P schedules one of the goroutines from Domain (if it's not
empty). In some circumstances (e.g. a goroutine is preempted due to time slice
exhaustion, or blocks in syscall) P can decide to just move all goroutines in the Domain
to normal work queue.
Then, *before* a goroutine blocks on a sync chan it saves Hchan and elem pointers in G
and switches to another goroutine in Domain *once*. Or if it sees that a goroutine in
the Domain want to accomplish a pairing operation on the same chan, it executes the
operation with that other goroutine.

This allows to implement tight communication over sync channels with the cost of just
several function calls and w/o several degradation with GOMAXPROCS>1. Communication
domains form and break up dynamically. Program semantics are not affected.

Here is a prototype implementation:
https://golang.org/cl/74180043/

I've benchmarked it on a real application where chan-based generators were abounded due
to forementioned reasons:
https://groups.google.com/d/msg/golang-dev/0IElw_BbTrk/cGHMdNoHGQEJ
The change provides 17.5x speedup (and it's not the limit as runtime.switchto can be
significantly more efficient).

@rsc rsc added this to the Unplanned milestone Apr 10, 2015

@rsc rsc removed release-none labels Apr 10, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment