Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: clone and splice, new channel primitives #26282

Open
robpike opened this issue Jul 9, 2018 · 32 comments

Comments

@robpike
Copy link
Contributor

commented Jul 9, 2018

Here are a couple of suggestions made by Doug McIlroy, original author of test/chan/powser[12].go and instigator of pipes in Unix. They are intriguing.

In Doug's words:

====

splice c1 c2
where channel c1 contains no buffered data,
identifies the write end of channel c1 with that
of channel c2. Channel c1 becomes write-only.

clone c
makes a new read-only channel positioned at the
same place in the data stream as readable
channel c. Thereafter both channels read the
same data sequence from the stream at unrelated
rates.

Splice allows a filter to elide itself from a pipeline
when it has no further substantive work to do, rather than
going into a copy loop.

Clone enables (buffered) fanout of a data stream.
Buffered data items may be garbage-collected when they
have been delivered to all readers.

These two capabilities are of general utility in stream
processing. golang.org/test/chan/powser1.go is one
application that could profitably use them--to eliminate
the many instances of Copy and also the per-element
go-routines spawned for every Split. Some Unix variants
have offered pipe splicing, and fanout is a staple of
dataflow algorithms. The current workarounds consume an
awful lot of bootless machine cycles.

@robpike robpike added the Proposal label Jul 9, 2018

@gopherbot gopherbot added this to the Proposal milestone Jul 9, 2018

@urandom

This comment has been minimized.

Copy link

commented Jul 10, 2018

Are there any more details on how the clone works? Say that we have the following code:

var parent, cl chan int

go func() {
    for {
          select {
          case i := <-parent:
                fmt.Println(i)
          }
    }
}()

go func() {
    for {
          select {
          case i := <-cl:
               fmt.Println("Clone", i)
               if i == 3 { return }
          }
    }
}()

for i := 0; i <5; i++ {
     if i == 2 {
             cl = clone parent
     }
     parent <- i
}

Would that print something like:

0
1
2
Clone 2
3
Clone 3
4

I.e. would writing 4 to parent block since reading from cl stopped after receiving 3? How would things change if parent was buffered, would it matter at all? What if parent wasn't read from at all, would writing to it block if there are active clones?

Also, I have no idea what splice does from the description :)

@kardianos

This comment has been minimized.

Copy link
Contributor

commented Jul 10, 2018

If I read it right, splice c1 c2 makes c1 point to c2 so a write to c1 can be read from c2.

@urandom

This comment has been minimized.

Copy link

commented Jul 10, 2018

@kardianos
So this is useful for cases where you want to weave two 'streams' into one, as it work. I.e. you read from c2, and write to both c1 and c2 and get the values from both channels while you read, as they are written? Or the reverse of clone as it were?

@jimmyfrasche

This comment has been minimized.

Copy link
Member

commented Jul 10, 2018

edit: discussion moved to #26343

I'd like to add drain(c) for consideration.

drain(c) would be roughly equivalent to the current idiom of

go func() {
  for range c {
  }
}()

except that drain doesn't create a goroutine: it just has c discard any receives in the scheduler without blocking the sender and causes other attempts to receive from c to panic. It's kind of like close for readers.

The code it replaces follows a fairly simple pattern but, even if the compiler recognized it, it wouldn't be able to perform the same optimization as that relies on the certainty that c can no longer be read from in another goroutine.

This is less useful than splice or clone but this seems like a good place to talk about additional channel primitives that would be more efficient and less fragile as builtins.

Also, if the drained channel was created by clone it could be removed from the fan out logic since the runtime knows that no one's listening, so it is somewhat more related than it appears.

@neild

This comment has been minimized.

Copy link
Contributor

commented Jul 11, 2018

Would splice c1 c2 require that c1 and c2 have the same element type, or merely compatible ones? Can I splice a chan int64 into a chan time.Duration? A chan int32?

@robpike

This comment has been minimized.

Copy link
Contributor Author

commented Jul 11, 2018

I would expect that c1 and c2 would need identical channel types.

@urandom

This comment has been minimized.

Copy link

commented Jul 11, 2018

@jimmyfrasche
When a channel is set to be drained, would that include a special syntax on the write side that would inform the writer that the channel is being drained and no longer processing the data?

if drained := c <- someData; drained {
    return
}
@urandom

This comment has been minimized.

Copy link

commented Jul 11, 2018

Also, if the channel being drained is a clone, would the stream automatically stop sending to it?

@jimmyfrasche

This comment has been minimized.

Copy link
Member

commented Jul 11, 2018

Never said this explicitly: It would only be safe to use drain when there was a single reader (possibly of a cloned chan) the same way it's only safe to close a chan when there's a single writer.

@urandom drain(clone(c)) would cancel out. Whatever runtime voodoo handles the fan out necessary to support clone could immediately drop any drained channel. That's just an optimization, but it seems like a useful/easy one.

I suppose you would need to detect a drained channel. There's something disquieting about making the send statement into an expression, but detection would need to be paired with a send.

If you couldn't detect it, something like

go func() {
  for id := 1;; id++ {
    c <- id
  }
}()

would just spin indefinitely. Though that is the case now if you create a drain goroutine manually. If this example detected a drain, it could just return.

That makes me wonder about something like

for {
  select {
  case drainedChan <- 1:
  case regularChan <- 2:
  }
}

drainedChan can always send but regularChan might not be able to. To keep the semantics you'd need to hit the drainedChan case but you may need to prioritize sends to regularChan since it's the only one doing work outside of the goroutine. In this example, if you could detect that the chan was drained, you could set the chan to nil.

(I'm still waiting for someone smarter than me to explain why this is a terrible idea, though 😆 )

@neild

This comment has been minimized.

Copy link
Contributor

commented Jul 11, 2018

Channels currently pass data in only one direction. Being able to detect a drained channel would make that bidirectional. It seems simpler and more consistent for a drain to be precisely equivalent to go func() { for _ = range ch { } )() (aside from not creating a new goroutine).

@robpike

This comment has been minimized.

Copy link
Contributor Author

commented Jul 11, 2018

This has become a discussion about an operation not even mentioned by the proposer. If you want to talk about drain, which is a fine thing to talk about, please open a new issue for that.

@jimmyfrasche

This comment has been minimized.

Copy link
Member

commented Jul 11, 2018

@robpike sorry for the hijack. To return to the original proposal, the definition of clone seems to imply infinitely-buffered channels.

I would have expected cloning a channel to behave something more like:

For an unbuffered chan, c, c <- v blocks until c and all clones of c have received v.

For a buffered chan, c, c <- v blocks if c or any clones of c have filled their buffer.

@jimmyfrasche

This comment has been minimized.

Copy link
Member

commented Jul 12, 2018

What happens if I do this:

splice(c1, c2)
close(c1)
c2 <- v

Do receives on c2 always succeed instantly getting a zero value unless there's a blocked send on c2?

Does c2 close as well?

Does it take c1 out of the equation leaving c2 as it was before the splice, effectively unsplicing the two?

Does it just panic?

@bakul

This comment has been minimized.

Copy link

commented Jul 15, 2018

Splice:
The use case of splice seems to be this: a filter that receives data from c1 and sends some subset of, or transformed, data out to c2. If at some point the filter just wants to passthrough data, doing no filtering, splice c1 c2 would patch through the data. That is anything sent to c1 can be read from c2, allowing the filter goroutine to get out of the way (or terminate). If so, any buffered data in c1 can be copied out to c2 at the time of splice (though that would potentially block splice until there is enough space in c2!).

Clone:
Is c1 := clone c0; c2 := clone c1 equivalent to c1 := clone c0; c2 :- clone c0? Assuming no intervening reads.

Given a channel with an N item bluffer, the leading item can not be discarded until all the receiver pick it up, right? Would it make sense to add an operation to expand buffering?

This is not strictly a clone operation since not all the functionality is cloned. It is more of a fanout. There can be a corresponding fanin (which sort of overlaps with the splice operation). Conceptually:

cA, Cb, Cc, ... := fanout cR
fanin cA, cB, Cc .. cW

For fanout, cR can be a readonly channel. For fanin, cW can be a write only channel.
Fanout from cR means anything written to the other end of cR can be read from cA, cB, cC etc.
Fanin to cW means anything written to cA, cB, cC etc. can be read from the other end of cW.

An equally interesting operation would be the inverse of splice. Suppose the same filter now needs to be reapplied based on some condition (or you want to tap into a channel). I can envision

cIn, cOut := unsplice c
go myFilter(cIn, cOut)

The effect is as follows:

before: producer =>c=> consumer
after:  producer =>c ~~ cIn=> filter =>cOut ~~ c=> consumer

The reader and writer of c need not be aware of the existence of the filter tap. Any buffered data in c at the time of unsplice can be read from cIn by the filter (and not by the consumer of c), since a decision to insert a filter can be triggered based on received items from c.

PS:

  1. I used fanin & fanout to make the symmetry explicit. I don't care what actual keywords get used.
  2. I think Go should've generated read and write end explicitly (just like unix pipe(2)). Something like cIn, cOut := make(chan int, ...). Too late now. Another useful function may have been to allow creating bidirectional channels. E.g. c1, c2 := make(chan int, 5, "bidirectional") so that writes on c1 get read from c2 and vice versa. This would make unsplice even more useful (e.g. inserting an protocol or encryption module based on negotiation).
  3. Would've preferred it if these things were possible to implement in user code, rather than build in the language.
@jimmyfrasche

This comment has been minimized.

Copy link
Member

commented Jul 15, 2018

@bakul

[A]ny buffered data in c1 can be copied out to c2 at the time of splice (though that would potentially block splice until there is enough space in c2!).

That does seem reasonable to me and a bit more user-friendly. I have been wondering about that restriction, too. Maybe there's a good reason for it that I'm not seeing.

This is not strictly a clone operation since not all the functionality is cloned.

That's true. Perhaps split would be a more descriptive name. (And split/splice is a fun word combo.)

fanin/fanout

This could be achieved by multiple calls to splice/clone.

An equally interesting operation would be the inverse of splice. Suppose the same filter now needs to be reapplied based on some condition (or you want to tap into a channel).

Could you file a separate proposal for that?

Would've preferred it if these things were possible to implement in user code, rather than build in the language.

The equivalent functionality is possible in user code, but it requires additional channels and goroutines. This proposal would avoid that overhead and make it simpler to build more complex topologies.

@bakul

This comment has been minimized.

Copy link

commented Jul 17, 2018

Could you file a separate proposal for that?

I am just exploring features that may dovetail with this proposal at the moment.

The equivalent functionality is possible in user code, but it requires additional channels and goroutines.

What I was getting at is to be able to implement something similar in user code without needing additional channels or goroutines. The proposed changes seem non-trivial and are perhaps better suited as a library component of some sort. Plus I would like to be able to avoid context switching overhead for simple but useful operations. For example:

func sum(a, b <-chan int) <-chan int {
    c := make(chan int, ...)
    go func() for {  c <- (<- a) + (<- b) }()
    return c
}

[yeah, I know this has problems; the code is just to illustrate the idea]
Conceptually this can be implemented without a goroutine: a receive on c will result in a receive on a nd b and if either blocks, so does the receive on c. The receive returns when both a and b yield a value and a sum can be written into c. Symmetrically, if a value is written into a or b, the sum can be done apriori and written into c. Syntax wise c := a + b captures this perfectly but I don't expect any Go authors to go for that!

IMHO a pipe is a better abstraction than an iterator in other languages and it would be nice if it is available without needing to use goroutines. Since this proposal started from a suggestion by Doug McIlroy, I couldn't help wondering if we can make Go channels as easy to use as pipes are in a shell!

On the other hand I can't think of a lower level building block to do this as yet.

@CAFxX

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2018

Regarding splice-like functionality: I have sometimes encountered the case in which I have two channels, one from which I'm supposed to read and another I'm supposed to write to but -and here's the tricky bit- only when the two operations (receive and send) are not going to block. Something similar to this, but in the following example the send on writeCh could block so it's not quite what I needed:

for {
  select {
  // ... other cases ...
  case e := <-readCh:
    // ...optionally drop/manipulate e here...
    writeCh <- e
  }
}

There are some ways in which this can be written to satisfy the non-blocking requirements but they tend to get hairy and in any case they (in general) require buffering at least one element (even if both channels are unbuffered) or creating dedicated "channel pump" goroutines.

I am not sure if this is a common enough scenario, but I wanted to submit it for consideration anyway because, as I mentioned above, I found myself in this situation a couple of times. I am aware that refactoring the code would solve the problem: to my credit, in both cases, when I encountered this the channels were provided by libraries, so there wasn't much I could do.

@pepijndevos

This comment has been minimized.

Copy link

commented Jul 18, 2018

I've ran into the exact problem @CAFxX described, and indeed solved it by completely refactoring everything. But I think it'd be really useful to have completely unbuffered pipelines. May I suggest the following extension to splice:

splice a b func

Which would be of type

chan a, chan b, func(b) a

This essentially imlements a completely unbuffered map operation on channels. Maybe that opens a whole new can of worms about things like filter and reduce, but nonetheless I think it is useful to be able to write pipelines without implicit buffers.

@bcmills

This comment has been minimized.

Copy link
Member

commented Aug 21, 2018

This essentially [implements] a completely unbuffered map operation on channels.

Polymorphic functions (#15292) seem like a better fit for that than broad built-in operators.

The operators that @robpike proposes interact with channel internals in an interesting way: they potentially allow the runtime to alias buffers between channels.

A splice variant that feeds through a transform function, in contrast, requires an extra allocation for each channel (to store the transformation function) and does not allow repeated operations to collapse: you can't skip any function in the chain. That makes it different enough that I wouldn't want to call it the same operator.

@j7b

This comment has been minimized.

Copy link

commented Aug 26, 2018

The splice operation is a dangerous notion for a type that's intended to be shared between goroutines (how can c1 become write-only without invalidating declarations?), and generally unnecessary, for example https://play.golang.org/p/YaqdWEwcxQa eliminates Copy by giving rat and channels thereof a common interface. On a related note, the operations on series as implemented don't need a split that spawns goroutines, at most one term of a source series needs to be "buffered" and the reads happen in a well-defined order. The common interface was inspired by a footnote in "Squinting at Power Series" and the inference about terms and order is from same (I feel indebted to the author, his work made objecting to this proposal one of the most challenging (the paper's well above my pay grade) and rewarding experiences I've had as a humble student of computer science).

The clone operation as proposed makes a key property of channels, synchronization, ambiguous. If a shared buffer is what's desired, a channel is a great way to synchronize access to it, as the original implementation of Split does (where the buffer is basically a linked list of goroutine stacks guarded by channels used as mutexes). Arbitrarily large channel capacity and demand-based allocation of channel buffers could also be a useful alternative to clone, #20352 proposes explicit creation of channels of infinite capacity.

My thanks again to the proposer and for powser[12].go.

@networkimprov

This comment has been minimized.

Copy link

commented Sep 15, 2018

EDIT:
Could clone be called tee? Clone sounds like a variation of make that duplicates the buffer. And it's easily misread as close.

@j7b for "demand-based allocation of channel buffers" see #20868

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2018

Presumably after executing c1 := clone(c), writing a value to c will block until there is space available for both c and c1.

What is the buffer size of the result of calling clone? Do you get the same buffer size as the argument? Is clone permitted on unbuffered channels? It could still work though you could only write to the original channel when each clone had a goroutine reading from it.

What happens if you write to the channel that is returned by calling clone? What happens if you read from a channel that is passed to splice? Do the operations panic or simply fail in some way?

@jdef

This comment has been minimized.

Copy link

commented Oct 17, 2018

Similar, but maybe not the exact same problem (re: splice) that's cropped up a few of my projects: I want to select from an arbitrary number of chans. It's actually possible, and ugly, to do this via the reflect package by aggregating SelectCase's and invoking reflect.Select. Did I mention that it's ugly? It would be more natural to express along these lines:

select {
case <-done:
case signal := <-control:
case i, v, ok := <-sliceOfIdenticallyTypedChans:
  // i, v, ok are the same as returned by `reflect.Select`
  // https://golang.org/pkg/reflect/#Select
}
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 17, 2018

@jdef Thanks, but that is really a different problem that should be discussed on a different issue. This one is about splice and clone.

@wsc1

This comment has been minimized.

Copy link

commented Oct 18, 2018

splice c1 c2
where channel c1 contains no buffered data,
identifies the write end of channel c1 with that
of channel c2. Channel c1 becomes write-only.

How does one determine that c1 contains no buffered data? maybe the tieing of write ends should happen after buffered data made it out or be restricted to unbuffered channels (which would be simpler but more limited in applicable use cases).

Also, for use case of temporarily pass-through copying a transformer/filter, one would need to undo the splice somehow.

@wsc1

This comment has been minimized.

Copy link

commented Oct 19, 2018

Also, for use case of temporarily pass-through copying a transformer/filter, one would need to undo the splice somehow.

Actually, one could add

splice c1 c2 N

which would block for N elements passing through and then return with c1 and c2 back to normal.

@wsc1

This comment has been minimized.

Copy link

commented Oct 19, 2018

In Doug's words:

====

splice c1 c2
where channel c1 contains no buffered data,
identifies the write end of channel c1 with that
of channel c2. Channel c1 becomes write-only.

I find the term splice a bit off, it's more of a fuse or tie. To me, splice has meaning for magnetic tape, where a piece of one roll of tape is placed into some position of another.

@RalphCorderoy

This comment has been minimized.

Copy link

commented Oct 19, 2018

Hi @wsc1, Long before magnetic tape, splice was used for weaving the ends of two ropes together to become one: Splice the main brace! It seems apt, and is in use for a similar operation in Linux's splice(2). https://en.wikipedia.org/wiki/Splice

@wsc1

This comment has been minimized.

Copy link

commented Oct 19, 2018

The splice operation is a dangerous notion for a type that's intended to be shared between goroutines (how can c1 become write-only without invalidating declarations?),

If splice c1 c2 just functions as a return from a goroutine, I don't see how this could be a problem

similarly if splice c1 c2 N blocks.

@wsc1

This comment has been minimized.

Copy link

commented Oct 19, 2018

The clone operation as proposed makes a key property of channels, synchronization, ambiguous.

As initially stated, yes, but I think @jimmyfrasche's take:

I would have expected cloning a channel to behave something more like:

For an unbuffered chan, c, c <- v blocks until c and all clones of c have received v.

For a buffered chan, c, c <- v blocks if c or any clones of c have filled their buffer.

fixes the ambiguity (and potentially infinite buffer size) nicely.

@wsc1

This comment has been minimized.

Copy link

commented Oct 19, 2018

Hi @wsc1, Long before magnetic tape, splice was used for weaving the ends of two ropes together to become one: Splice the main brace! It seems apt, and is in use for a similar operation in Linux's splice(2). https://en.wikipedia.org/wiki/Splice

I had forgotten the nautical sense. Thanks, it does indeed seem like a good name in light of that.

@wsc1

This comment has been minimized.

Copy link

commented Oct 19, 2018

Given that these ideas for splice and clone are in part motivated by stream processing chains, perhaps it makes sense to consider them in conjunction with making channels have an extra parameter to specify a block size in addition to channel capacity? Would others consider this issue an appropriate place to discuss this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.