Join GitHub today
GitHub is home to over 36 million developers working together to host and review code, manage projects, and build software together.Sign up
runtime: Add loosely ordered channels? #16364
What version of Go are you using (
What operating system and processor architecture are you using (
What did you do?
I ran a benchmark to see how much time it is needed to process N elements using multiple cores. So what benchmark below does is it runs "myvalue += 1" N times in each of 8 goroutines for both consumer and producer threads and checks the results.
Generally in some cases it would be great to use a single channel for distributing load (in this case, adding "1") among workers and actually get job done faster when you use more cores if the operations themselves do not take much time.
It is not achievable with current channels because they imply ordering constraints for events and sometimes you don't need that. So I suggest to consider (maybe?) adding loosely-ordered channels that would allow one to both reduce channel send and receive cost as well as allowing them to scale.
In this "+1" example the only solution that actually benefits from adding more cores is the sharded channel one.
What did you expect to see?
I would really like for channels to scale when more cores are used instead of them slowing down. I do not believe it is possible with current channel constraints so option to allow creation of loosely-ordered channels would be nice instead.
What did you see instead?
You can see that the only solution that scales (ns/op decreases when you add more cores) is sharded channel one. I have 4 physical cores and 8 logical ones so do not pay too much attention to results of 8 threads.
As far as I can see, the only difference between a loosely-ordered channel and an unbuffered channel would be that with a loosely-ordered channel it would be unpredictable whether the goroutine reading from the channel would be able to see memory writes done by the goroutine sending on the channel before the actual send. My apologies if I misunderstand.
First I would say that I think that would be very difficult to implement. Any use of channels implies locking. So (I think) you are suggesting that the channel implementation should be rewritten to use only relaxed memory reads and writes.
Second, my first reaction is that these would be very hard to use correctly. C++ has many different kinds of atomic operations, and they are extremely hard for non-experts to use correctly. We explicitly do not want to emulate that in Go.
Sorry, Ian, I am not really sure about why memory writes would not be seen by a reader side, but maybe you are right that it can be possible if there are no memory barriers when working with a loosely-ordered channel.
The simplest implementation of loosely-ordered channel is just a sharded one, basically. So you still need to take a mutex when trying to read or write to each shard in this case so all the memory guarantees are the same as for a mutex or a channel.
The only downside of sharded channel is that if distribution is not even enough then some shards would not have any entries while others could have too many. So if that happens you might just try to "steal" entries from other shards (e.g. take random shard and try to get entries from there). If you did not manage to find any entries for reasonable amount of tries (e.g. 3 tries) then you could force shards rebalancing (take a mutex per each shard and then shuffle elements around).
All of that will work well if you have a huge stream of events which, in my opinion, is not uncommon to try to process in go.
So, I forgot to mention why it is even a proposal for go runtime. Go does not have generics and does not have good means to block when there are no events to get. So it would be really ugly if implemented in go. I saw one (strange) attempt to do this here: http://zhen.org/blog/ring-buffer-variable-length-low-latency-disruptor-style/
I'm sorry, I don't understand what you mean by a loosely-ordered channel. Can you explain more precisely?
In Go it is already possible for many goroutines to read from a single channel (and for many goroutines to write to a single channel) so I don't understand what a sharded channel would look like.
Are you suggesting that when a buffered channel has many readers, we implement several different buffers for the channel, and let each goroutine read from one buffer? Thus there would theoretically be less lock contention? How would we decide when to use multiple buffers?
By loosely-ordered channel I understand a channel that does not guarantee FIFO. What it means more specifically is that two writers could write "a" and "b" to a channel (in that order) and readers could get it as "b" and "a". It is also possible that "a" and "b" could be written at the same moment in time (e.g. in less than 1 cycle of a time difference from different cores) so there is no sensible way to even define what order of events here means. It is a weaker guarantee than a FIFO and allows to receive and send events with much higher throughput.
Yes, if implemented and used properly :). Basically any "proper usage" would mean that you need several (e.g. 4+) goroutines trying to rcv/send to a channel at the same time.
If you decide to use multiple buffers at runtime then it would break FIFO guarantee of a channel so it must be specified when doing make(...). Very limited suggestion for syntax would be
I would suggest looking at this problem from the following standpoint: there used to be an issue with garbage collector latency that was too high sometimes. You, as a program developer, could split your executable into several instances and shard data manually between instances if it was possible. Sometimes it is not as easy to shard data though so programs that really need huge heaps, a lot of connections, or both, had to find other ways around it to achieve reasonable GC latency.
My suggestion about adding opportunity to specify the fact that you do not care about order of events allows you to avoid doing custom sharding for channels when channel communication is very convenient but becomes a bottleneck. Sometimes distribution of events would be uneven so you need to have some kind of rebalancing sometimes. And the further you go with it the more obvious it becomes that it might be better to solve this problem once and share it with everyone :)
I like this idea and would take it further.
I suggest having a set of channel types (or annotations) that allow users to make tradeoffs when they know what they are doing.
For example, let's have different optimized channel implementations for SPSC, SPMC, and MPSC situations[*].
In general this would let us 'respect the developer' a bit more, instead of assuming a worst-case MPMC situation.
[*] For future searchers: SPSC means Single-producer/Single-consumer, SPMC means Single-producer/Multi-consumer, MPSC means Multi-producer/Single-consumer.