proposal: use channels as iterators #48567

@deanveloper

Establishing a standard for iteration

Related: #43557 and #47707

Problem

With the proposals for the slices and maps packages, as well as other proposals about generic data structures (e.g. container/list and container/set), it has become clear that we need a standard pattern for iterating over structures.

Common iteration patterns

There are two common iterator patterns from what I have seen.

The first is a pretty standard pattern, aptly named the "Iterator Pattern". In this pattern, you return an iterator that the caller invokes repeatedly, and each call returns the next value. This is the iteration pattern that #43557 has recommended for Go. One of the great benefits of this pattern is that it is standard among many languages. However, these iterators tend to be quite cumbersome because the writer of the iterator needs to manage state between calls, which can be extremely difficult, especially when dealing with complex data structures like maps, or recursive data structures like trees.
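To make the state-management burden concrete, here is a minimal sketch of this pattern for a slice. The SliceIter type and its Next method are hypothetical names chosen for illustration; they are not the API proposed in #43557.

```go
package main

import "fmt"

// SliceIter is a hypothetical pull-style iterator over a slice of ints.
// It has to remember its position between calls to Next.
type SliceIter struct {
	items []int
	pos   int
}

// Next returns the next element and true, or 0 and false once exhausted.
func (it *SliceIter) Next() (int, bool) {
	if it.pos >= len(it.items) {
		return 0, false
	}
	v := it.items[it.pos]
	it.pos++
	return v, true
}

func main() {
	it := &SliceIter{items: []int{10, 20, 30}}
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		fmt.Println(v)
	}
}
```

For a slice the saved state is a single index. For a tree, the iterator would instead have to carry an explicit stack of pending nodes, which is where this pattern starts to get cumbersome.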

The second pattern is a bit more modern and is typically named the "Generator Pattern". In this pattern, the iterator is a single function which is passed a "yield" function (or in many languages, a yield keyword which may only be used in generator functions). Each time yield is called, control goes to the caller of the iterator, and then control is given back to the iterator again, until yield is called again or the iterator ends. The amazing thing about the generator pattern is that the iterator is extremely easy to write, and it ends up looking almost like a solution where one would simply append to a slice (and return the slice at the end). This is the pattern which #47707 recommends.
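As a rough sketch of the generator pattern in today's Go, the yield function can be modeled as a plain callback. The Evens function is a hypothetical example, and this simplified form assumes the caller never needs to stop early.

```go
package main

import "fmt"

// Evens is a hypothetical generator: it calls yield once for each even
// number in [0, limit). The only state is the loop variable itself.
func Evens(limit int, yield func(int)) {
	for i := 0; i < limit; i += 2 {
		yield(i)
	}
}

func main() {
	Evens(7, func(v int) {
		fmt.Println(v)
	})
}
```

Note how the body reads like an ordinary loop: there is no position to save and restore between values.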

Using channels as iterators

In Go, we actually already have a form of iterating using the generator pattern: goroutines and channels. This can be done by writing a function which returns a receive-only channel. The function spawns a goroutine, which sends values on the channel and closes it when done. It would look something like this, which is valid Go code today:

package main

import "fmt"

func IntRangeIter(from, to int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)

		for i := from; i < to; i++ {
			ch <- i
		}
	}()
	return ch
}

func main() {
	for i := range IntRangeIter(0, 5) {
		fmt.Printf("%d, ", i)
	}
	fmt.Println()
}
// Output: 0, 1, 2, 3, 4, 

There are a few problems with this pattern in current Go. The main issue is that we must exhaust the iterator channel in order for the spawned goroutine to be destroyed. Otherwise, the goroutine in the iterator will simply hang on ch <- i forever.
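The leak can be observed with runtime.NumGoroutine. The following is a small sketch, with LeakedAfterBreak as a hypothetical helper name; it repeats IntRangeIter so that it is self-contained.

```go
package main

import (
	"fmt"
	"runtime"
)

// IntRangeIter mirrors the function above: it sends [from, to) on a channel.
func IntRangeIter(from, to int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for i := from; i < to; i++ {
			ch <- i
		}
	}()
	return ch
}

// LeakedAfterBreak reports how many extra goroutines exist after a caller
// abandons the iterator early.
func LeakedAfterBreak() int {
	before := runtime.NumGoroutine()
	for i := range IntRangeIter(0, 5) {
		if i == 1 {
			break // the sender still has 2, 3 and 4 left to send
		}
	}
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("leaked goroutines:", LeakedAfterBreak())
}
```

In a program with no other goroutine activity, this should report one leaked goroutine: the sender, which stays blocked on its next send because nothing will ever receive from the channel again.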

The second issue is performance. Channels in Go unfortunately have a lot of overhead. In my own testing, iterating over a binary tree with a recursive channel-based iterator took ~500x longer than calling a function on each element. Iterating over a slice instead of a tree, channels took ~100x longer.
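For readers who want to try a comparison themselves, here is a rough sketch of the two approaches over a slice. It is not the benchmark behind the numbers above, the function names are made up for illustration, and a serious measurement would use the testing package's benchmark support instead of wall-clock timing.

```go
package main

import (
	"fmt"
	"time"
)

// SumViaCallback visits each element by calling a function on it.
func SumViaCallback(items []int) int {
	sum := 0
	each := func(f func(int)) {
		for _, v := range items {
			f(v)
		}
	}
	each(func(v int) { sum += v })
	return sum
}

// SumViaChannel visits each element by receiving it from a goroutine.
func SumViaChannel(items []int) int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for _, v := range items {
			ch <- v
		}
	}()
	sum := 0
	for v := range ch {
		sum += v
	}
	return sum
}

func main() {
	items := make([]int, 1000000)
	for i := range items {
		items[i] = i
	}

	start := time.Now()
	a := SumViaCallback(items)
	callbackTime := time.Since(start)

	start = time.Now()
	b := SumViaChannel(items)
	channelTime := time.Since(start)

	// Timings vary by machine and Go version.
	fmt.Println(a == b, callbackTime, channelTime)
}
```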

Optimizations can definitely be made, though: JavaScript's generator pattern (which also uses coroutines in some form) iterates over the binary tree in only ~10x the time it takes Go to call a function on each element.

Proposal

This proposal has three parts:

  1. Establish a standard of using channels for iterators in the standard library.
  2. Reconsider #19702 (proposal: runtime: garbage collect goroutines blocked forever), and garbage-collect goroutines which are blocked forever.
  3. Add optimizations for goroutines/channels so that using them as iterators does not cost so much time.

For part 1, this would mean we take actions such as adding Iter() <-chan T for container/list and container/set.
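As a sketch of what such a method might look like, here is a small generic list with an Iter method. The List type below is hypothetical and only stands in for a generic container/list; it assumes type parameters are available.

```go
package main

import "fmt"

// List is a hypothetical generic singly linked list, standing in for a
// generic container/list.
type List[T any] struct {
	head, tail *node[T]
}

type node[T any] struct {
	value T
	next  *node[T]
}

// Push appends a value to the end of the list.
func (l *List[T]) Push(v T) {
	n := &node[T]{value: v}
	if l.tail == nil {
		l.head, l.tail = n, n
		return
	}
	l.tail.next = n
	l.tail = n
}

// Iter returns a receive-only channel that yields each element in order.
func (l *List[T]) Iter() <-chan T {
	ch := make(chan T)
	go func() {
		defer close(ch)
		for n := l.head; n != nil; n = n.next {
			ch <- n.value
		}
	}()
	return ch
}

func main() {
	var l List[string]
	l.Push("a")
	l.Push("b")
	l.Push("c")
	for s := range l.Iter() {
		fmt.Println(s)
	}
}
```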

For part 2, we reconsider #19702 (such that cleanup is not done - the goroutine vanishes when it is GC'd). This allows us to spawn these goroutines, and the caller of the iterator does not need to communicate to the iterator that we are done iterating.
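For comparison, this is roughly what callers have to do today to avoid the leak: pass a done channel and remember to close it. IntRangeIterDone and FirstN are hypothetical names used only for this sketch.

```go
package main

import "fmt"

// IntRangeIterDone is a hypothetical variant of IntRangeIter that stops
// when done is closed, so that an abandoned iterator does not leak.
func IntRangeIterDone(done <-chan struct{}, from, to int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for i := from; i < to; i++ {
			select {
			case ch <- i:
			case <-done:
				return
			}
		}
	}()
	return ch
}

// FirstN collects at most n values and then tells the iterator to stop.
func FirstN(n int) []int {
	done := make(chan struct{})
	defer close(done) // the caller must remember to do this

	var out []int
	for i := range IntRangeIterDone(done, 0, 100) {
		if len(out) == n {
			break
		}
		out = append(out, i)
	}
	return out
}

func main() {
	fmt.Println(FirstN(3))
}
```

If blocked goroutines were garbage-collected, the done channel and the select could be dropped, and the iterator would look like the original IntRangeIter again.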

Part 3 is likely the least important of the three, but it still matters a great deal. Currently, this pattern is two orders of magnitude slower than calling a function on each element. Other languages, such as Kotlin and Rust, are already adopting this pattern with a coroutine implementation, but do not see the same immense performance hit that Go does.

The wonderful thing about this update is that nothing about the language itself needs to change. Tooling does not need to be updated, as channels are already range-able.

Example

func FibonacciIter() <-chan int {
	ch := make(chan int)

	go func() {
		defer close(ch)

		a, b := 0, 1
		for {
			ch <- a

			c := a + b
			a, b = b, c
		}
	}()

	return ch
}

// Usage with a tree iterator (TreeOf and InOrderIter are not defined in this snippet)
func main() {
	//          0
	//       1     4
	//      2 3   5
	t := TreeOf[int](0, 1, 2, 3, 4, 5)
	for i := range t.InOrderIter() {
		fmt.Printf("%d, ", i)
	}
	fmt.Println()
}
// Output: 2, 1, 3, 0, 5, 4, 
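
The usage above relies on a tree type that is not shown. Here is a minimal sketch of how an InOrderIter method might be written with a goroutine and a channel. The Tree type and the sampleTree helper are assumptions made for illustration, and the tree is built by hand because the TreeOf constructor is not reproduced here.

```go
package main

import "fmt"

// Tree is a hypothetical binary tree node.
type Tree[T any] struct {
	Left, Right *Tree[T]
	Value       T
}

// InOrderIter returns a channel that yields the values in in-order.
func (t *Tree[T]) InOrderIter() <-chan T {
	ch := make(chan T)
	go func() {
		defer close(ch)
		t.walk(ch)
	}()
	return ch
}

// walk sends the left subtree, then the node itself, then the right subtree.
// It is safe to call on a nil *Tree.
func (t *Tree[T]) walk(ch chan<- T) {
	if t == nil {
		return
	}
	t.Left.walk(ch)
	ch <- t.Value
	t.Right.walk(ch)
}

// sampleTree builds the tree drawn in the usage comment above.
func sampleTree() *Tree[int] {
	leaf := func(v int) *Tree[int] { return &Tree[int]{Value: v} }
	return &Tree[int]{
		Value: 0,
		Left:  &Tree[int]{Value: 1, Left: leaf(2), Right: leaf(3)},
		Right: &Tree[int]{Value: 4, Left: leaf(5)},
	}
}

func main() {
	for v := range sampleTree().InOrderIter() {
		fmt.Printf("%d, ", v)
	}
	fmt.Println()
}
```

The recursive walk is the whole iterator: the goroutine's own call stack holds the traversal state that a pull-style iterator would have to manage by hand.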

Other solutions

Many solutions were discussed in #43557, and I highly recommend checking them out. Here are a couple of other approaches that would also allow channels to be used as iterators:

  • Adding runtime.Deadlocked() to tell if the current goroutine is deadlocked.

This is my second favorite solution behind allowing goroutines to be GC'd.

Alternatively, this could be unexported, and imported (via //go:linkname) in something like chans.Generator. This would give the generator function a yield function, which would select between receiving from runtime.deadlocked() and sending on the iterator channel.

chans.Generator would look something like this:

//go:linkname deadlocked runtime.deadlocked
func deadlocked() <-chan struct{}

// Creates an iterator based off of a generator function.
func Generator(generator func(yield func(int))) <-chan int {
	ch := make(chan int)

	go func() {
		defer close(ch)
		generator(func(incoming int) {
			select {
			case ch <- incoming:
			case <-deadlocked():
				runtime.Goexit()
			}
		})
	}()
	return ch
}

// usage
func FibonacciIter() <-chan int {
	return chans.Generator(func(yield func(int)) {
		a, b := 0, 1
		for {
			yield(a)

			c := a + b
			a = b
			b = c
		}
	})
}

  • Using finalizers to allow us to close the channel.

Finalizers are a bit hacky, but they somewhat solve our problem. @ianlancetaylor shows a cool example in the generics draft about Rangers. Unfortunately, these are not actually rangeable, but the example does show that we could possibly use a finalizer to clean up a goroutine.
