Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upproposal: clone and splice, new channel primitives #26282
Comments
robpike
added
the
Proposal
label
Jul 9, 2018
gopherbot
added this to the Proposal milestone
Jul 9, 2018
josharian
added
LanguageChange
Go2
labels
Jul 9, 2018
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
urandom
Jul 10, 2018
Are there any more details on how the clone works? Say that we have the following code:
var parent, cl chan int
go func() {
for {
select {
case i := <-parent:
fmt.Println(i)
}
}
}()
go func() {
for {
select {
case i := <-cl:
fmt.Println("Clone", i)
if i == 3 { return }
}
}
}()
for i := 0; i <5; i++ {
if i == 2 {
cl = clone parent
}
parent <- i
}Would that print something like:
0
1
2
Clone 2
3
Clone 3
4
I.e. would writing 4 to parent block since reading from cl stopped after receiving 3? How would things change if parent was buffered, would it matter at all? What if parent wasn't read from at all, would writing to it block if there are active clones?
Also, I have no idea what splice does from the description :)
urandom
commented
Jul 10, 2018
•
|
Are there any more details on how the clone works? Say that we have the following code: var parent, cl chan int
go func() {
for {
select {
case i := <-parent:
fmt.Println(i)
}
}
}()
go func() {
for {
select {
case i := <-cl:
fmt.Println("Clone", i)
if i == 3 { return }
}
}
}()
for i := 0; i <5; i++ {
if i == 2 {
cl = clone parent
}
parent <- i
}Would that print something like:
I.e. would writing Also, I have no idea what |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
kardianos
Jul 10, 2018
Contributor
If I read it right, splice c1 c2 makes c1 point to c2 so a write to c1 can be read from c2.
|
If I read it right, splice c1 c2 makes c1 point to c2 so a write to c1 can be read from c2. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
urandom
Jul 10, 2018
@kardianos
So this is useful for cases where you want to weave two 'streams' into one, as it work. I.e. you read from c2, and write to both c1 and c2 and get the values from both channels while you read, as they are written? Or the reverse of clone as it were?
urandom
commented
Jul 10, 2018
•
|
@kardianos |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jimmyfrasche
Jul 10, 2018
Member
edit: discussion moved to #26343
I'd like to add drain(c) for consideration.
drain(c) would be roughly equivalent to the current idiom of
go func() {
for range c {
}
}()
except that drain doesn't create a goroutine: it just has c discard any receives in the scheduler without blocking the sender and causes other attempts to receive from c to panic. It's kind of like close for readers.
The code it replaces follows a fairly simple pattern but, even if the compiler recognized it, it wouldn't be able to perform the same optimization as that relies on the certainty that c can no longer be read from in another goroutine.
This is less useful than splice or clone but this seems like a good place to talk about additional channel primitives that would be more efficient and less fragile as builtins.
Also, if the drained channel was created by clone it could be removed from the fan out logic since the runtime knows that no one's listening, so it is somewhat more related than it appears.
|
edit: discussion moved to #26343 I'd like to add
except that The code it replaces follows a fairly simple pattern but, even if the compiler recognized it, it wouldn't be able to perform the same optimization as that relies on the certainty that This is less useful than Also, if the drained channel was created by |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
neild
Jul 11, 2018
Contributor
Would splice c1 c2 require that c1 and c2 have the same element type, or merely compatible ones? Can I splice a chan int64 into a chan time.Duration? A chan int32?
|
Would |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
I would expect that c1 and c2 would need identical channel types. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
urandom
Jul 11, 2018
@jimmyfrasche
When a channel is set to be drained, would that include a special syntax on the write side that would inform the writer that the channel is being drained and no longer processing the data?
if drained := c <- someData; drained {
return
}
urandom
commented
Jul 11, 2018
|
@jimmyfrasche if drained := c <- someData; drained {
return
}
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
urandom
Jul 11, 2018
Also, if the channel being drained is a clone, would the stream automatically stop sending to it?
urandom
commented
Jul 11, 2018
|
Also, if the channel being drained is a clone, would the stream automatically stop sending to it? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jimmyfrasche
Jul 11, 2018
Member
Never said this explicitly: It would only be safe to use drain when there was a single reader (possibly of a cloned chan) the same way it's only safe to close a chan when there's a single writer.
@urandom drain(clone(c)) would cancel out. Whatever runtime voodoo handles the fan out necessary to support clone could immediately drop any drained channel. That's just an optimization, but it seems like a useful/easy one.
I suppose you would need to detect a drained channel. There's something disquieting about making the send statement into an expression, but detection would need to be paired with a send.
If you couldn't detect it, something like
go func() {
for id := 1;; id++ {
c <- id
}
}()
would just spin indefinitely. Though that is the case now if you create a drain goroutine manually. If this example detected a drain, it could just return.
That makes me wonder about something like
for {
select {
case drainedChan <- 1:
case regularChan <- 2:
}
}
drainedChan can always send but regularChan might not be able to. To keep the semantics you'd need to hit the drainedChan case but you may need to prioritize sends to regularChan since it's the only one doing work outside of the goroutine. In this example, if you could detect that the chan was drained, you could set the chan to nil.
(I'm still waiting for someone smarter than me to explain why this is a terrible idea, though
|
Never said this explicitly: It would only be safe to use drain when there was a single reader (possibly of a cloned chan) the same way it's only safe to close a chan when there's a single writer. @urandom I suppose you would need to detect a drained channel. There's something disquieting about making the send statement into an expression, but detection would need to be paired with a send. If you couldn't detect it, something like
would just spin indefinitely. Though that is the case now if you create a drain goroutine manually. If this example detected a drain, it could just return. That makes me wonder about something like
(I'm still waiting for someone smarter than me to explain why this is a terrible idea, though |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
neild
Jul 11, 2018
Contributor
Channels currently pass data in only one direction. Being able to detect a drained channel would make that bidirectional. It seems simpler and more consistent for a drain to be precisely equivalent to go func() { for _ = range ch { } )() (aside from not creating a new goroutine).
|
Channels currently pass data in only one direction. Being able to detect a drained channel would make that bidirectional. It seems simpler and more consistent for a drain to be precisely equivalent to |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
robpike
Jul 11, 2018
Contributor
This has become a discussion about an operation not even mentioned by the proposer. If you want to talk about drain, which is a fine thing to talk about, please open a new issue for that.
|
This has become a discussion about an operation not even mentioned by the proposer. If you want to talk about drain, which is a fine thing to talk about, please open a new issue for that. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jimmyfrasche
Jul 11, 2018
Member
@robpike sorry for the hijack. To return to the original proposal, the definition of clone seems to imply infinitely-buffered channels.
I would have expected cloning a channel to behave something more like:
For an unbuffered chan, c, c <- v blocks until c and all clones of c have received v.
For a buffered chan, c, c <- v blocks if c or any clones of c have filled their buffer.
|
@robpike sorry for the hijack. To return to the original proposal, the definition of clone seems to imply infinitely-buffered channels. I would have expected cloning a channel to behave something more like: For an unbuffered chan, For a buffered chan, |
jimmyfrasche
referenced this issue
Jul 12, 2018
Closed
proposal: new channel primitive, drain(c) #26343
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jimmyfrasche
Jul 12, 2018
Member
What happens if I do this:
splice(c1, c2)
close(c1)
c2 <- v
Do receives on c2 always succeed instantly getting a zero value unless there's a blocked send on c2?
Does c2 close as well?
Does it take c1 out of the equation leaving c2 as it was before the splice, effectively unsplicing the two?
Does it just panic?
|
What happens if I do this:
Do receives on c2 always succeed instantly getting a zero value unless there's a blocked send on c2? Does c2 close as well? Does it take c1 out of the equation leaving c2 as it was before the splice, effectively unsplicing the two? Does it just panic? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
bakul
Jul 15, 2018
Splice:
The use case of splice seems to be this: a filter that receives data from c1 and sends some subset of, or transformed, data out to c2. If at some point the filter just wants to passthrough data, doing no filtering, splice c1 c2 would patch through the data. That is anything sent to c1 can be read from c2, allowing the filter goroutine to get out of the way (or terminate). If so, any buffered data in c1 can be copied out to c2 at the time of splice (though that would potentially block splice until there is enough space in c2!).
Clone:
Is c1 := clone c0; c2 := clone c1 equivalent to c1 := clone c0; c2 :- clone c0? Assuming no intervening reads.
Given a channel with an N item bluffer, the leading item can not be discarded until all the receiver pick it up, right? Would it make sense to add an operation to expand buffering?
This is not strictly a clone operation since not all the functionality is cloned. It is more of a fanout. There can be a corresponding fanin (which sort of overlaps with the splice operation). Conceptually:
cA, Cb, Cc, ... := fanout cR
fanin cA, cB, Cc .. cW
For fanout, cR can be a readonly channel. For fanin, cW can be a write only channel.
Fanout from cR means anything written to the other end of cR can be read from cA, cB, cC etc.
Fanin to cW means anything written to cA, cB, cC etc. can be read from the other end of cW.
An equally interesting operation would be the inverse of splice. Suppose the same filter now needs to be reapplied based on some condition (or you want to tap into a channel). I can envision
cIn, cOut := unsplice c
go myFilter(cIn, cOut)
The effect is as follows:
before: producer =>c=> consumer
after: producer =>c ~~ cIn=> filter =>cOut ~~ c=> consumer
The reader and writer of c need not be aware of the existence of the filter tap. Any buffered data in c at the time of unsplice can be read from cIn by the filter (and not by the consumer of c), since a decision to insert a filter can be triggered based on received items from c.
PS:
- I used fanin & fanout to make the symmetry explicit. I don't care what actual keywords get used.
- I think Go should've generated read and write end explicitly (just like unix pipe(2)). Something like
cIn, cOut := make(chan int, ...). Too late now. Another useful function may have been to allow creating bidirectional channels. E.g.c1, c2 := make(chan int, 5, "bidirectional")so that writes on c1 get read from c2 and vice versa. This would make unsplice even more useful (e.g. inserting an protocol or encryption module based on negotiation). - Would've preferred it if these things were possible to implement in user code, rather than build in the language.
bakul
commented
Jul 15, 2018
|
Splice: Clone: Given a channel with an N item bluffer, the leading item can not be discarded until all the receiver pick it up, right? Would it make sense to add an operation to expand buffering? This is not strictly a clone operation since not all the functionality is cloned. It is more of a fanout. There can be a corresponding fanin (which sort of overlaps with the splice operation). Conceptually:
For fanout, cR can be a readonly channel. For fanin, cW can be a write only channel. An equally interesting operation would be the inverse of
The effect is as follows:
The reader and writer of c need not be aware of the existence of the filter tap. Any buffered data in c at the time of unsplice can be read from cIn by the filter (and not by the consumer of c), since a decision to insert a filter can be triggered based on received items from c. PS:
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jimmyfrasche
Jul 15, 2018
Member
[A]ny buffered data in c1 can be copied out to c2 at the time of splice (though that would potentially block splice until there is enough space in c2!).
That does seem reasonable to me and a bit more user-friendly. I have been wondering about that restriction, too. Maybe there's a good reason for it that I'm not seeing.
This is not strictly a clone operation since not all the functionality is cloned.
That's true. Perhaps split would be a more descriptive name. (And split/splice is a fun word combo.)
fanin/fanout
This could be achieved by multiple calls to splice/clone.
An equally interesting operation would be the inverse of splice. Suppose the same filter now needs to be reapplied based on some condition (or you want to tap into a channel).
Could you file a separate proposal for that?
Would've preferred it if these things were possible to implement in user code, rather than build in the language.
The equivalent functionality is possible in user code, but it requires additional channels and goroutines. This proposal would avoid that overhead and make it simpler to build more complex topologies.
That does seem reasonable to me and a bit more user-friendly. I have been wondering about that restriction, too. Maybe there's a good reason for it that I'm not seeing.
That's true. Perhaps
This could be achieved by multiple calls to
Could you file a separate proposal for that?
The equivalent functionality is possible in user code, but it requires additional channels and goroutines. This proposal would avoid that overhead and make it simpler to build more complex topologies. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
bakul
Jul 17, 2018
Could you file a separate proposal for that?
I am just exploring features that may dovetail with this proposal at the moment.
The equivalent functionality is possible in user code, but it requires additional channels and goroutines.
What I was getting at is to be able to implement something similar in user code without needing additional channels or goroutines. The proposed changes seem non-trivial and are perhaps better suited as a library component of some sort. Plus I would like to be able to avoid context switching overhead for simple but useful operations. For example:
func sum(a, b <-chan int) <-chan int {
c := make(chan int, ...)
go func() for { c <- (<- a) + (<- b) }()
return c
}
[yeah, I know this has problems; the code is just to illustrate the idea]
Conceptually this can be implemented without a goroutine: a receive on c will result in a receive on a nd b and if either blocks, so does the receive on c. The receive returns when both a and b yield a value and a sum can be written into c. Symmetrically, if a value is written into a or b, the sum can be done apriori and written into c. Syntax wise c := a + b captures this perfectly but I don't expect any Go authors to go for that!
IMHO a pipe is a better abstraction than an iterator in other languages and it would be nice if it is available without needing to use goroutines. Since this proposal started from a suggestion by Doug McIlroy, I couldn't help wondering if we can make Go channels as easy to use as pipes are in a shell!
On the other hand I can't think of a lower level building block to do this as yet.
bakul
commented
Jul 17, 2018
I am just exploring features that may dovetail with this proposal at the moment.
What I was getting at is to be able to implement something similar in user code without needing additional channels or goroutines. The proposed changes seem non-trivial and are perhaps better suited as a library component of some sort. Plus I would like to be able to avoid context switching overhead for simple but useful operations. For example:
[yeah, I know this has problems; the code is just to illustrate the idea] IMHO a pipe is a better abstraction than an iterator in other languages and it would be nice if it is available without needing to use goroutines. Since this proposal started from a suggestion by Doug McIlroy, I couldn't help wondering if we can make Go channels as easy to use as pipes are in a shell! On the other hand I can't think of a lower level building block to do this as yet. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
CAFxX
Jul 18, 2018
Contributor
Regarding splice-like functionality: I have sometimes encountered the case in which I have two channels, one from which I'm supposed to read and another I'm supposed to write to but -and here's the tricky bit- only when the two operations (receive and send) are not going to block. Something similar to this, but in the following example the send on writeCh could block so it's not quite what I needed:
for {
select {
// ... other cases ...
case e := <-readCh:
// ...optionally drop/manipulate e here...
writeCh <- e
}
}
There are some ways in which this can be written to satisfy the non-blocking requirements but they tend to get hairy and in any case they (in general) require buffering at least one element (even if both channels are unbuffered) or creating dedicated "channel pump" goroutines.
I am not sure if this is a common enough scenario, but I wanted to submit it for consideration anyway because, as I mentioned above, I found myself in this situation a couple of times. I am aware that refactoring the code would solve the problem: to my credit, in both cases, when I encountered this the channels were provided by libraries, so there wasn't much I could do.
|
Regarding
There are some ways in which this can be written to satisfy the non-blocking requirements but they tend to get hairy and in any case they (in general) require buffering at least one element (even if both channels are unbuffered) or creating dedicated "channel pump" goroutines. I am not sure if this is a common enough scenario, but I wanted to submit it for consideration anyway because, as I mentioned above, I found myself in this situation a couple of times. I am aware that refactoring the code would solve the problem: to my credit, in both cases, when I encountered this the channels were provided by libraries, so there wasn't much I could do. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
pepijndevos
Jul 18, 2018
I've ran into the exact problem @CAFxX described, and indeed solved it by completely refactoring everything. But I think it'd be really useful to have completely unbuffered pipelines. May I suggest the following extension to splice:
splice a b func
Which would be of type
chan a, chan b, func(b) a
This essentially imlements a completely unbuffered map operation on channels. Maybe that opens a whole new can of worms about things like filter and reduce, but nonetheless I think it is useful to be able to write pipelines without implicit buffers.
pepijndevos
commented
Jul 18, 2018
|
I've ran into the exact problem @CAFxX described, and indeed solved it by completely refactoring everything. But I think it'd be really useful to have completely unbuffered pipelines. May I suggest the following extension to splice:
Which would be of type
This essentially imlements a completely unbuffered |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
bcmills
Aug 21, 2018
Member
This essentially [implements] a completely unbuffered
mapoperation on channels.
Polymorphic functions (#15292) seem like a better fit for that than broad built-in operators.
The operators that @robpike proposes interact with channel internals in an interesting way: they potentially allow the runtime to alias buffers between channels.
A splice variant that feeds through a transform function, in contrast, requires an extra allocation for each channel (to store the transformation function) and does not allow repeated operations to collapse: you can't skip any function in the chain. That makes it different enough that I wouldn't want to call it the same operator.
Polymorphic functions (#15292) seem like a better fit for that than broad built-in operators. The operators that @robpike proposes interact with channel internals in an interesting way: they potentially allow the runtime to alias buffers between channels. A |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
j7b
Aug 26, 2018
The splice operation is a dangerous notion for a type that's intended to be shared between goroutines (how can c1 become write-only without invalidating declarations?), and generally unnecessary, for example https://play.golang.org/p/YaqdWEwcxQa eliminates Copy by giving rat and channels thereof a common interface. On a related note, the operations on series as implemented don't need a split that spawns goroutines, at most one term of a source series needs to be "buffered" and the reads happen in a well-defined order. The common interface was inspired by a footnote in "Squinting at Power Series" and the inference about terms and order is from same (I feel indebted to the author, his work made objecting to this proposal one of the most challenging (the paper's well above my pay grade) and rewarding experiences I've had as a humble student of computer science).
The clone operation as proposed makes a key property of channels, synchronization, ambiguous. If a shared buffer is what's desired, a channel is a great way to synchronize access to it, as the original implementation of Split does (where the buffer is basically a linked list of goroutine stacks guarded by channels used as mutexes). Arbitrarily large channel capacity and demand-based allocation of channel buffers could also be a useful alternative to clone, #20352 proposes explicit creation of channels of infinite capacity.
My thanks again to the proposer and for powser[12].go.
j7b
commented
Aug 26, 2018
|
The splice operation is a dangerous notion for a type that's intended to be shared between goroutines (how can c1 become write-only without invalidating declarations?), and generally unnecessary, for example https://play.golang.org/p/YaqdWEwcxQa eliminates Copy by giving rat and channels thereof a common interface. On a related note, the operations on series as implemented don't need a split that spawns goroutines, at most one term of a source series needs to be "buffered" and the reads happen in a well-defined order. The common interface was inspired by a footnote in "Squinting at Power Series" and the inference about terms and order is from same (I feel indebted to the author, his work made objecting to this proposal one of the most challenging (the paper's well above my pay grade) and rewarding experiences I've had as a humble student of computer science). The clone operation as proposed makes a key property of channels, synchronization, ambiguous. If a shared buffer is what's desired, a channel is a great way to synchronize access to it, as the original implementation of Split does (where the buffer is basically a linked list of goroutine stacks guarded by channels used as mutexes). Arbitrarily large channel capacity and demand-based allocation of channel buffers could also be a useful alternative to clone, #20352 proposes explicit creation of channels of infinite capacity. My thanks again to the proposer and for powser[12].go. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
networkimprov
commented
Sep 15, 2018
•
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ianlancetaylor
Oct 2, 2018
Contributor
Presumably after executing c1 := clone(c), writing a value to c will block until there is space available for both c and c1.
What is the buffer size of the result of calling clone? Do you get the same buffer size as the argument? Is clone permitted on unbuffered channels? It could still work though you could only write to the original channel when each clone had a goroutine reading from it.
What happens if you write to the channel that is returned by calling clone? What happens if you read from a channel that is passed to splice? Do the operations panic or simply fail in some way?
|
Presumably after executing What is the buffer size of the result of calling What happens if you write to the channel that is returned by calling |
robpike commentedJul 9, 2018
Here are a couple of suggestions made by Doug McIlroy, original author of test/chan/powser[12].go and instigator of pipes in Unix. They are intriguing.
In Doug's words:
====
splice c1 c2where channel c1 contains no buffered data,
identifies the write end of channel c1 with that
of channel c2. Channel c1 becomes write-only.
clone cmakes a new read-only channel positioned at the
same place in the data stream as readable
channel c. Thereafter both channels read the
same data sequence from the stream at unrelated
rates.
Splice allows a filter to elide itself from a pipeline
when it has no further substantive work to do, rather than
going into a copy loop.
Clone enables (buffered) fanout of a data stream.
Buffered data items may be garbage-collected when they
have been delivered to all readers.
These two capabilities are of general utility in stream
processing. golang.org/test/chan/powser1.go is one
application that could profitably use them--to eliminate
the many instances of Copy and also the per-element
go-routines spawned for every Split. Some Unix variants
have offered pipe splicing, and fanout is a staple of
dataflow algorithms. The current workarounds consume an
awful lot of bootless machine cycles.