discussion: standard iterator interface #54245
Aug 4, 2022 · 84 comments · 465 replies

This is a discussion that is intended to lead to a proposal.

This was written with lots of input from @jba and @rsc.

Background

Most languages provide a standardized way to iterate over values stored in containers using an iterator interface (see the appendix below for a discussion of other languages). Go provides for range for use with maps, slices, strings, arrays, and channels, but it does not provide any general mechanism for user-written containers, and it does not provide an iterator interface.

Go does have examples of non-generic iterators:

  • runtime.CallersFrames returns a runtime.Frames that iterates over stack frames; Frames has a Next method that returns a Frame and a bool that reports whether there are more frames.
  • bufio.Scanner is an iterator through an io.Reader, where the Scan method advances to the next value. The value is returned by a Bytes method. Errors are collected and returned by an Err method.
  • database/sql.Rows iterates through the results of a query, where the Next method advances to the next value and the value is returned by a Scan method. The Scan method can return an error.

Even this short list reveals that there are no common patterns in use today. This is in part because before generics were introduced, there was no way to write an interface that described an iterator. And of course there may be no simple pattern that covers all of these use cases.

Today we can write an interface Iter[E] for an iterator over a container that has elements of type E. The existence of iterators in other languages shows that this is a powerful facility. This proposal is about how to write such an interface in Go.

What we want from Go iterators

Go is of course explicit about errors. Iterators over containers can't fail. For the most common uses it doesn't make any more sense to have iterators return an error than it does for a for range statement to return an error. Algorithms that use iterators should often behave differently when using iterators that can fail. Therefore, rather than try to combine non-failing and failing iterators into the same interface, we should instead return explicit errors from iterators that can fail. These errors can be part of the values returned by the iterator, or perhaps they can be returned as additional values.

Iterators have two fundamental operations: retrieve the current value, and advance to the next value. For Go we can combine these operations into a single method, as runtime.Frames does.

In the general case we may want to implement iterators with some additional state that is not trivially garbage collected, such as an open file or a separate goroutine. In C++, for example, this state would be cleaned up by a destructor, but of course Go does not have destructors. Therefore, we should have some explicit way to indicate that we no longer need an iterator. This should be optional, as many iterators do not require any special cleanup. We should encourage iterators to use finalizers if necessary to clean up resources, and also to clean up after themselves when reaching the end of an iteration.

In Go the builtin type map permits values to be inserted and removed while iterating over the map, with well-defined behavior. In general for Go we should be flexible, though of course the program should never simply crash. We should let each container type define how it behaves if the container is modified while iterators are active. For example, container modification may cause arbitrary elements to be skipped or returned two or more times during the iteration. In some cases, hopefully rare, container modification may cause uses of existing iterators to panic, or to return values that have been removed from the container.

Proposal

We define a new package iter that defines a set of interfaces. The expectation is that containers and other types will provide functions and methods that return values that implement these interfaces. Code that wants to work with arbitrary containers will use the interfaces defined in this package. That will permit people to write functions that work with containers but are agnostic to the actual container type being used, much as interfaces like io.Reader permit code to be agnostic to the source of the data stream.

iter.Iter

The core interface in the iterators package is iter.Iter[E].

// Iter supports iterating over a sequence of values of type `E`.
type Iter[E any] interface {
	// Next returns the next value in the iteration if there is one,
	// and reports whether the returned value is valid.
	// Once Next returns ok==false, the iteration is over,
	// and all subsequent calls will return ok==false.
	Next() (elem E, ok bool)
}

We also define a related interface for containers, such as maps, for which elements inherently have two values.

// Iter2 is like Iter but each iteration returns a pair of values.
type Iter2[E1, E2 any] interface {
	Next() (E1, E2, bool)
}

An iterator that can fail will either return a single value that includes an error indication, or it will implement Iter2[E, error]. It's not yet clear which of those options is better.

As mentioned above, some iterators may have additional state that may be discarded when no more values are expected from an iterator (for example, a goroutine that sends values on a channel). Telling the iterator that no more values are expected is done using an optional interface that an iterator may implement.

// StopIter is an optional interface for Iter.
type StopIter[E any] interface {
	Iter[E]

	// Stop indicates that the iterator will no longer be used.
	// After a call to Stop, future calls to Next may panic.
	// Stop may be called multiple times;
	// all calls after the first will have no effect.
	Stop()
}

// StopIter2 is like StopIter, but for Iter2.
type StopIter2[E1, E2 any] interface {
	Iter2[E1, E2]
	Stop()
}

The Stop method should always be considered to be an optimization. The program should work correctly even if Stop is never called. If an iterator is read to the end (until Next returns false) calling Stop should be a no-op. If necessary, iterator implementations should use finalizers to clean up cases where Stop is not called.

As a matter of programming style, the code that calls a function to obtain a StopIter is responsible for calling the Stop method. A function that accepts an Iter should not use a type assertion to detect and call the Stop method. This is similar to the way that a function that accepts an io.Reader should not use a type assertion to detect and call the Close method.

iter.New functions

iter.Iter provides a convenient way for the users of a container to iterate over its contents. We also want to consider the other side of that operation, and provide convenient ways for containers to define iterators.

// NewGen creates a new iterator from a generator function gen.
// The gen function is called once.  It is expected to call
// yield(v) for every value v to be returned by the iterator.
// If yield(v) returns false, gen must stop calling yield and return.
func NewGen[E any](gen func(yield func(E) bool)) StopIter[E]

// NewGen2 is like NewGen for Iter2.
func NewGen2[E1, E2 any](gen func(yield func(E1, E2) bool)) StopIter2[E1, E2]

An appendix below discusses how these functions can be implemented efficiently.

Simpler containers may be able to easily capture all required state in a function.

// NewNext creates a new iterator from a next function.
// The next function is called for each call of the iterator's Next method.
func NewNext[E any](next func() (E, bool)) Iter[E]

// NewNext2 is like NewNext for Iter2.
func NewNext2[E1, E2 any](next func() (E1, E2, bool)) Iter2[E1, E2]

Iterators for standard containers

The iter package will define iterators for the builtin container types.

// FromChan returns an iterator over a channel.
func FromChan[E any](<-chan E) Iter[E]

// FromMap returns an iterator over a map.
func FromMap[K comparable, V any](map[K]V) Iter2[K, V]

// FromSlice returns an iterator over a slice.
func FromSlice[E any]([]E) Iter[E]

Functions that accept iterators

The iter package could define functions that operate on iterators. We should be conservative here to start. It's not yet clear which of these functions will be useful.

// Map returns a new iterator whose elements are f applied to
// the elements of it.
func Map[E1, E2 any](f func(E1) E2, it Iter[E1]) Iter[E2]

// Filter returns a new iterator whose elements are those
// elements of it for which f returns true.
func Filter[E any](f func(E) bool, it Iter[E]) Iter[E]

// Reduce uses a function to reduce the elements of an
// iterator to a single value.  The init parameter is
// passed to the first call of f.  If the input iterator
// is empty, the result is init.
func Reduce[E1, E2 any](f func(E2, E1) E2, it Iter[E1], init E2) E2

// ToSlice collects the elements of the iterator into a slice.
// [ Perhaps this should be slices.FromIter. ]
func ToSlice[E any](it Iter[E]) []E

// ToMap collects the elements of the iterator into a map.
// [ Perhaps this should be maps.FromIter. ]
func ToMap[K comparable, V any](it Iter2[K, V]) map[K]V

// Concat returns the concatenation of two iterators.
// The resulting iterator returns all the elements of the
// first iterator followed by all the elements of the second.
func Concat[E any](it1, it2 Iter[E]) Iter[E]

Range loops

The for range syntax will be expanded to support iterators. Note that this is the only language change in this proposal. Everything else is library code and programming conventions.

If the argument to range implements Iter[E] or Iter2[E1, E2], then the loop will iterate through the elements of the iterator. For example, this code:

for e := range it {
	// statements
}

will be equivalent to this code:

for e, _ok := it.Next(); _ok; e, _ok = it.Next() {
	// statements
}

Here _ok is a hidden variable that is not seen by user code.

Note that breaking out of the loop will leave the iterator at that position, such that Next will return the next elements that the loop would have seen.

Using range with an Iter2[E1, E2] will permit using two variables in the for statement, as with range over a map.

Compatibility note: if the type of it is a slice, array, pointer-to-array, string, map, or channel type, then the Next method will be ignored and the for range will operate in the usual way. This is required for backward compatibility with existing code.

Because it's inconvenient to write for v := range c.Range(), we propose a further extension: we permit range c if c has a method Range that returns a value that implements Iter[E] or Iter2[E1, E2]. If the Range result type implements StopIter[E] or StopIter2[E1, E2] then the range loop will ensure that Stop is called when exiting the loop. (Here whether the result implements StopIter is a static type check, not a dynamic type assertion: if the Range method is declared to return Iter[E], Stop will not be called even if the dynamic type has a Stop method.)

For example:

for v := range c {
	// statements
}

where c.Range returns a value that implements StopIter[E], is roughly equivalent to:

_it := c.Range()
defer _it.Stop()
for e, _ok := _it.Next(); _ok; e, _ok = _it.Next() {
	// statements
	// Any goto L or continue L statement where L is outside the loop
	// is replaced by
	//   _it.Stop(); goto L (or continue L)
}
_it.Stop()

The compiler will arrange for _it.Stop to be called if the loop statements panic, even if some outer defer in the function recovers the panic. That is, the defer in the roughly equivalent code is run when leaving the loop, not just when leaving the function.

Note that if we adopt this change it will be the first case in which a language construct invokes a user-defined method.

That is all

That completes the proposal.

Optional future extensions

We can use optional interfaces to extend the capabilities of iterators.

For example, some iterators permit deleting an element from a container.

// DeleteIter is an Iter that implements a Delete method.
type DeleteIter[E any] interface {
	Iter[E]

	// Delete deletes the current iterator element;
	// that is, the one returned by the last call to Next.
	// Delete should panic if called before Next or after
	// Next returns false.
	Delete()
}

We could then implement

// Delete removes all elements of it for which f returns true.
func Delete[E any](it DeleteIter[E], f func(E) bool)

Similarly some iterators permit setting a value.

// SetIter is an Iter that implements a Set method.
type SetIter[E any] interface {
	Iter[E]

	// Set replaces the current iterator element with v.
	// Set should panic if called before Next or after
	// Next returns false.
	Set(v E)
}

We could then implement

// Replace replaces all elements e with f(e).
func Replace[E any](it SetIter[E], f func(E) E)

Bi-directional iterators can implement a Prev method.

// PrevIter is an iterator with a Prev method.
type PrevIter[E any] interface {
	Iter[E]

	// Prev moves the iterator to the previous position.
	// After calling Prev, Next will return the value at
	// that position in the container. For example, after
	//   it.Next() returning (v, true)
	//   it.Prev()
	// another call to it.Next will again return (v, true).
	// Calling Prev before calling Next may panic.
	// Calling Prev after Next returns false will move
	// to the last element, or, if there are no elements,
	// to the iterator's initial state.
	Prev()
}

This is just a sketch of possible future directions. These ideas are not part of the current proposal. However, we want to deliberately leave open the possibility of defining additional optional interfaces for iterators.

Examples

This is some example code showing how to use and create iterators. If a function in this section is not mentioned above, then it is purely an example, and is not part of the proposal.

// ToSlice returns a slice containing all the elements in an iterator.
// [ This might be in the slices package, as slices.FromIter. ]
func ToSlice[E any](it iter.Iter[E]) []E {
	var r []E
	for v := range it {
		r = append(r, v)
	}
	return r
}

// ToSliceErr returns a slice containing all the elements
// in an iterator, for an iterator that can fail.
// The iteration stops on the first error.
// This is just an example, this may not be the best approach.
func ToSliceErr[E any](it iter.Iter2[E, error]) ([]E, error) {
	var r []E
	for v, err := range it {
		if err != nil {
			return nil, err
		}
		r = append(r, v)
	}
	return r
}

// Map returns a new iterator that applies f to each element of it.
func Map[E1, E2 any](f func(E1) E2, it Iter[E1]) Iter[E2] {
	return iter.NewNext(func() (E2, bool) {
		e, ok := it.Next()
		var r E2
		if ok {
			r = f(e)
		}
		return r, ok
	})
}

// Filter returns a new iterator that only contains the elements of it
// for which f returns true.
func Filter[E any](f func(E) bool, it Iter[E]) Iter[E] {
	return iter.NewNext(func() (E, bool) {
		for {
			e, ok := it.Next()
			if !ok || f(e) {
				return e, ok
			}
		}
	})
}

// Reduce reduces an iterator to a value using a function.
func Reduce[E1, E2 any](f func(E2, E1) E2, it Iter[E1], init E2) E2 {
	r := init
	for v := range it {
		r = f(r, v)
	}
	return r
}

// iter.FromSlice returns an iterator over a slice.
// For example purposes only, this iterator implements
// some of the optional interfaces mentioned earlier.
func FromSlice[E any](s []E) Iter[E] {
	return &sliceIter[E]{
		s: s,
		i: -1,
	}
}

type sliceIter[E any] struct {
	s []E
	i int
}

func (it *sliceIter[E]) Next() (E, bool) {
	it.i++
	ok := it.i >= 0 && it.i < len(it.s)
	var v E
	if ok {
		v = it.s[it.i]
	}
	return v, ok
}

// Prev implements PrevIter.
func (it *sliceIter[E]) Prev() {
	it.i--
}

// Set implements SetIter.
func (it *sliceIter[E]) Set(v E) {
	it.s[it.i] = v
}

// FromChan returns an iterator for a channel.
func FromChan[E any](c <-chan E) Iter[E] {
	return iter.NewNext(func() (E, bool) {
		v, ok := <-c
		return v, ok
	})
}

// NewNext takes a function that returns (v, bool) and returns
// an iterator that calls the function until the second result is false.
func NewNext[E any](f func() (E, bool)) Iter[E] {
	return funcIter[E](f)
}

// funcIter is used by NewNext to implement Iter.
type funcIter[E any] func() (E, bool)

// Next implements Iter.
func (f funcIter[E]) Next() (E, bool) {
	return f()
}

// Equal reports whether two iterators have the same values
// in the same order.
func Equal[E comparable](it1, it2 Iter[E]) bool {
	for {
		v1, ok1 := it1.Next()
		v2, ok2 := it2.Next()
		if v1 != v2 || ok1 != ok2 {
			return false
		}
		if !ok1 {
			return true
		}
	}
}

// MergeIter takes two iterators that are expected to be in sorted order,
// and returns a new iterator that merges the two into a single
// iterator in sorted order.
func MergeIter[E constraints.Ordered](it1, it2 iter.Iter[E]) iter.Iter[E] {
	val1, ok1 := it1.Next()
	val2, ok2 := it2.Next()
	return &mergeIter[E]{
		it1:  it1,
		it2:  it2,
		val1: val1,
		ok1:  ok1,
		val2: val2,
		ok2:  ok2,
	}
}

type mergeIter[E constraints.Ordered] struct {
	it1, it2   iter.Iter[E]
	val1, val2 E
	ok1, ok2   bool
}

func (m *mergeIter[E]) Next() (E, bool) {
	var r E
	if m.ok1 && m.ok2 {
		if m.val1 < m.val2 {
			r = m.val1
			m.val1, m.ok1 = m.it1.Next()
		} else {
			r = m.val2
			m.val2, m.ok2 = m.it2.Next()
		}
		return r, true
	}
	if m.ok1 {
		r = m.val1
		m.val1, m.ok1 = m.it1.Next()
		return r, true
	}
	if m.ok2 {
		r = m.val2
		m.val2, m.ok2 = m.it2.Next()
		return r, true
	}
	return r, false
}

// Tree is a binary tree.
type Tree[E any] struct {
	val         E
	left, right *Tree[E]
}

// Range returns an in-order iterator over the tree.
// This shows how to use iter.NewGen to iterate over a
// complex data structure.
func (t *Tree[E]) Range() iter.StopIter[E] {
	return iter.NewGen(t.gen)
}

// gen is used by Range.  This is here just because we want
// to return bool from t.iterate but iter.NewGen takes a function
// with no results.
func (t *Tree[E]) gen(yield func(E) bool) {
	t.iterate(yield)
}

// iterate is used by Range.
func (t *Tree[E]) iterate(yield func(E) bool) bool {
	if t == nil {
		return true
	}
	// Keep providing values until yield returns false
	// or we have finished the tree.
	return t.left.iterate(yield) &&
		yield(t.val) &&
		t.right.iterate(yield)
}

// SQLRowsRange returns an iterator over a sql.Rows.
// This shows one way to adapt an existing iteration
// mechanism to the new interface.
// We wouldn't design things this way from scratch.
func SQLRowsRange(r *sql.Rows) iter.StopIter[SQLRowVal] {
	it := &SQLRowsIter{r}
	runtime.SetFinalizer(it, func(it *SQLRowsIter) { it.r.Close() })
	return it
}

// SQLRowsIter implements iter.Iter[SQLRowVal].
type SQLRowsIter struct {
	r *sql.Rows
}

// Next implements iter.Iter.
func (it *SQLRowsIter) Next() (SQLRowVal, bool) {
	ok := it.r.Next()
	var rit SQLRowVal
	if ok {
		rit = SQLRowVal{it.r}
	} else {
		it.r.Close()
		runtime.SetFinalizer(it, nil)
	}
	return rit, ok
}

// Stop implements iter.StopIter.
func (it *SQLRowsIter) Stop() {
	// We don't care about the error result here.
	// It's never a new error from the close itself,
	// just a saved error from earlier.
	// If the caller cares, they should check during the loop.
	it.r.Close()
	runtime.SetFinalizer(it, nil)
}

// SQLRowVal is an iteration value.
type SQLRowVal struct {
	r *sql.Rows
}

// Err returns any error for the current row.
func (i1 SQLRowVal) Err() error {
	return i1.r.Err()
}

// Scan fetches values from the current row.
func (i1 SQLRowVal) Scan(dest ...any) error {
	return i1.r.Scan(dest...)
}

// Total is an example of how SQLRowsRange might be used.
// Note how the function uses the proposed Map and Reduce functions.
func Total(r *sql.Rows) (int, error) {
	var rowsErr error
	toInt := func(i1 SQLRowVal) int {
		if err := i1.Err(); err != nil {
			rowsErr = err
			return 0
		}
		var v int
		if err := i1.Scan(&v); err != nil {
			rowsErr = err
		}
		return v
	}
	it := SQLRowsRange(r)
	defer it.Stop()
	ints := iter.Map(toInt, it)
	sum := iter.Reduce(func(v1, v2 int) int { return v1 + v2 }, ints, 0)
	// Capture an error that was not returned by any iteration, if any.
	if rowsErr == nil {
		rowsErr = r.Err()
	}
	return sum, rowsErr
}

Appendix: Iterators in other languages

C++

The C++ Standard Template Library defines a variety of iterator APIs. These are consistently implemented by C++ containers and are also used by other types such as files and streams. This makes it possible to write standard algorithms that work with all C++ containers.

C++ containers provide begin and end methods that return iterators. The begin method returns an iterator that refers to the beginning of the container. The end method returns an iterator that refers to the position just past the end of the container. Iterators to the same container may be compared for equality using the == and != operators. Any valid iterator (not the iterator returned by end) refers to a value in the container. That value is accessible using the unary * operator (which is the pointer dereference operator, thus iterators act like pointers into the container, and ordinary pointers act like iterators). The unary ++ operator advances the iterator to refer to the next element in the container. For any C++ container one can loop over all elements in the container by writing

  for (containerType::iterator p = c.begin(); p != c.end(); ++p)
    doSomething(*p);

As of C++11 this pattern is built into the language via the range-based for loop.

  for (auto&& var : container)
    doSomething(var);

This calls the begin and end methods of the container and loops as shown above.

Some C++ iterators have optional additional capabilities. Iterators can be grouped into five types.

  • Input iterators support the operations described above. They can be used to do a single sequential pass over a container. Example: an iterator that reads values from a file.
  • Output iterators permit setting a value through the iterator (*p = v), but do not permit retrieving it. Example: an iterator that writes values to a file.
  • Forward iterators support both input and output operations. Example: an iterator over a singly linked list.
  • Bidirectional iterators additionally support the unary -- operator to move to the preceding element. Example: an iterator over a doubly linked list.
  • Random access iterators additionally support adding or subtracting an integer, and getting the difference between two iterators to get the number of values between them, and comparing two iterators using < and friends, and indexing off an iterator to refer to a value. Example: an iterator over a slice (which C++ calls a vector).

C++ algorithms can use function overloading to implement the same algorithm in different ways depending on the characteristics of the iterator. For example, std::reverse, which reverses the elements in a container, can be implemented with a bidirectional iterator, but uses a more efficient algorithm when called with a random access iterator.

C++ iterators do not provide any form of error handling. Iterators over containers typically can't fail. An iterator associated with a file handles I/O errors by setting the file object into an error state, which can optionally cause an exception to be thrown.

Each C++ container type defines rules for when a modification to the container invalidates existing iterators. For example, inserting an element in a linked list does not invalidate iterators pointing to other elements, but inserting an element in a vector may invalidate them. Using an invalid iterator is undefined behavior, which can cause the program to crash or arbitrarily misbehave.

Java

Java also supports iterators to step through containers. Java iterators are much simpler than C++ iterators. Java defines an interface Iterator<E> that has three main methods: hasNext, next, and remove. Calling next in a situation where hasNext would return false will throw an exception, and in general Java iterators throw an exception for any error. (By the way, in C++ removing an element via an iterator is generally implemented as an erase method on the container type that takes an iterator as an argument.)

A Java container will have an iterator method that returns an Iterator that walks over the elements of the container. This too is described as a Java interface: Iterable<E>.

Java has an iteration loop syntax like that of C++11 (C++ copied the syntax from Java):

  for (elementType var : container)
    doSomething(var);

This calls the iterator method on the container and then calls hasNext and next in a loop.

As far as I know Java does not have a standard implementation of output iterators or random access iterators. Specific containers will implement iterators with an additional set method that permits changing the value to which the iterator refers.

If a Java iterator is used after a container is modified in some way that the iterator can't support, the iterator methods will throw an exception.

Python

A container will implement an __iter__ method that returns an iterator object. An iterator will implement a __next__ method that returns the next element in the container, and raises a StopIteration exception when done. Code will normally call these methods via the builtin iter and next functions.

The Python for loop supports iterators.

  for var in container:
    doSomething(var)

This calls iter and next, and handles the StopIteration exception, as one would expect.

Python iterators generally don't permit modifying the container while an iterator is being used, but it's not clear to me precisely how they behave when it happens.

Discussion

For C++ and Python, iterators are a matter of convention: any type that implements the appropriate methods can return an iterator, and an iterator itself must simply implement the appropriate methods and (for C++) operator overloads. For Java, this is less true, as iterators explicitly implement the Iterator<E> interface. The for loop in each language just calls the appropriate methods.

These conventions are powerful because they permit separating the details of an algorithm from the details of a container. As long as the container implements the iterator interface, an algorithm written in terms of iterators will work.

Iterators do not handle errors in any of these languages. This is in part because errors can be handled by throwing exceptions. But it is also because iterating over a container doesn't fail. Iteration failure is only possible when a non-container, such as a file, is accessed via the iterator interface.

It's worth noting that the C++ use of paired begin and end iterators permits a kind of sub-slicing, at least for containers that support bidirectional or random access iterators.

Appendix: Efficient implementation of iter.NewGen

The natural way to implement iter.NewGen is to use a separate goroutine and a channel. However, we know from experience that that will be inefficient due to scheduling delays. A more efficient way to implement NewGen will be to use coroutines: let the generator function produce a new value and then do a coroutine switch to the code using the iterator. When that code is ready for the next value, do a coroutine switch back to the generator function. A coroutine switch can be fast: simply change the stack pointer and reload the registers. No need to go through the scheduler.

Of course Go doesn't have coroutines, but we can use compiler optimizations to achieve the same effect without any language changes. This approach, and much of the text below, is entirely due to @rsc.

First, we identify programming idioms that provide concurrency without any opportunity for parallelism, such as a send immediately followed by a receive. Second, we adjust the compiler and runtime to recognize the non-parallel idioms and optimize them to simple coroutine switches instead of using the thread-aware goroutine scheduler.

Coroutine idioms

A coroutine switch must start another goroutine and then immediately stop the current goroutine, so that there is no opportunity for parallelism.

There are three common ways to start another goroutine: a go statement creating a new goroutine, a send on a channel where a goroutine is blocked, and a close on a channel where a goroutine is blocked.

There are three common ways to immediately stop the current goroutine: a receive of one value from a channel with no available data, a comma-ok receive of two values from such a channel, and a return from the top of a goroutine stack, exiting the goroutine.

Optimizations

The three common goroutine starts and three common goroutine stops combine for nine possible start-stop pairs. The compiler can recognize each pair and translate each to a call to a fused runtime operation that does both together. For example a send compiles to chansend1(c, &v) and a receive compiles to chanrecv1(c, &v). A send followed by a receive can compile to chansend1recv1(c1, &v1, c2, &v2).

The compiler fusing the operations creates the opportunity for the runtime to implement them as coroutine switches. Without the fusing, the runtime cannot tell whether the current goroutine is going to keep running on its own (in which case parallelism is warranted) or is going to stop very soon (in which case parallelism is not warranted). Fusing the operations lets the runtime correctly predict the next thing the goroutine will do.

The runtime implements each fused operation by first checking to see if the operation pair would start a new goroutine and stop the current one. If not, it falls back to running the two different operations sequentially, providing exactly the same semantics as the unfused operations. But if the operation pair does start a new goroutine and stop the current one, then the runtime can implement that as a direct switch to the new goroutine, bypassing the scheduler and any possible confusion about waking new threads (Ms) or trying to run the two goroutines in different threads for a split second.

Note that recognizing these coroutine idioms would have potential uses beyond iterators.

NewGen

Here is an implementation of iter.NewGen that takes advantage of this technique.

// NewGen creates a new iterator from a generator function gen.
// The gen function is called once.  It is expected to call
// yield(v) for every value v to be returned by the iterator.
// If yield(v) returns false, gen must stop calling yield and return.
func NewGen[E any](gen func(yield func(E) bool)) StopIter[E] {
	cmore := make(chan bool)
	cnext := make(chan E)

	generator := func() {
		// coroutine switch back to client until Next is called (1)
		var zero E
		cnext <- zero
		if !<-cmore {
			close(cnext)
			return
		}
		gen(func(v E) bool {
			// coroutine switch back to client to deliver v (2)
			cnext <- v
			return <-cmore
		})

		// coroutine switch back to client marking end (3)
		close(cnext)
	}

	// coroutine switch to start generator (4)
	go generator()
	<-cnext

	r := &genIter[E]{cnext: cnext, cmore: cmore}
	runtime.SetFinalizer(r, (*genIter[E]).Stop)
	return r
}

// genIter implements Iter[E] for NewGen.
type genIter[E any] struct {
	cnext  chan E
	cmore  chan bool
	closed atomic.Bool
}

// Next implements Iter[E]
func (it *genIter[E]) Next() (E, bool) {
	// coroutine switch to generator for more (5)
	// (This panics if Stop has been called.)
	it.cmore <- true
	v, ok := <-it.cnext
	return v, ok
}

// Stop implements StopIter[E]
func (it *genIter[E]) Stop() {
	// Use the closed field to make Stop idempotent.
	if !it.closed.CompareAndSwap(false, true) {
		return
	}
	runtime.SetFinalizer(it, nil)
	// coroutine switch to generator to stop (6)
	close(it.cmore)
	<-it.cnext
}

The compiler would need to fuse the commented operation pairs for potential optimization by the runtime: send followed by receive (1, 2), close followed by return (3), go followed by receive (4), send followed by comma-ok receive (5), and close followed by receive (6).

Replies


Why is the special method for use in for range loops called Range? It is a container method that returns an iterator on the container; wouldn't Iter be a more natural name for it? That would be naming the method for what it does, rather than Range, which names the method based on what the calling code is thought likely to be doing with it.

8 replies
@ianlancetaylor

To me it seems natural for a statement like for v := range c to invoke c.Range rather than c.Iter. After all, range is the relevant keyword. And "range" is already a general Go term for iterating over a container, so writing it := c.Range() also seems natural.

But I don't feel strongly about it. Happy to hear other opinions.

@rothskeller

[deleted a reply that, after reconsideration, was flawed]

@jimmyfrasche

Feels super strange to me. It returns an iter not a range. Range is what you do to the iter. I don't think there should be magic interface at all, though. I'd rather just write for k, v := range x.Iter(), so feel free to take my strong opinion about the name for something I don't think should exist with a grain of salt.

An iterator that can fail will either return a single value that includes an error indication, or it will implement Iter2[E, error]. It's not yet clear which of those options is better.

Would it be possible to have an auxiliary method, Err() error, that works in the same way as bufio.Scanner's Err method?

Compatibility note: if the type of it is a slice, array, pointer-to-array, string, map, or channel type, then the Next method will be ignored and the for range will operate in the usual way. This is required for backward compatibility with existing code.

If the user wants to use an iterator method in place of the 'natural' range iteration, would this be possible to signal to the for loop? One way that I could see this working would be to wrap it as an embedded field in a struct.

Because it's inconvenient to write for v := c.Range(),

Should this be for v := range c.Range()?

38 replies
@jhenstridge

The problem with a separate Err() method is the integration with the range syntax. If I write code like for v := range container {...}, then I never see the iterator returned by container.Range() and can't call that method.

@kortschak

Yes, that is a consideration — I thought about it when I was making the suggestion. My reasoning was that it's easy to add a single line to get the iterator before the range in the case that that is needed. (Without evidence) I'll say that iterating with a need for errors is going to be much less common than without. ISTM that it would be better to compose the error handling in rather than either requiring all iterators support it or doubling the number of iterator types, even given the small cost of needing to declare the iterator before the range in cases where error handling is needed.

@jhenstridge

If your proposed method is part of the standard iterator interface, then any range loop is a potential programming error. If I'm reviewing code, I'd need to check what type was returned by the container's Range() method, and see if that iterator type's Err() method might ever return something other than nil.

I'd prefer not to have the feature at all than open up that can of worms.

I'm excited by the coroutine optimizations. I've lost track of how many times a goroutine-generator pattern would have been the cleanest way to implement something, but couldn't be used because of the inefficiencies. I support this whole proposal, which I think is necessary and valuable — but the coroutine optimizations would be hugely valuable even if the rest of this proposal didn't happen.

3 replies
@blizzy78

Perhaps the coroutine optimizations should be extracted into a separate proposal.

@betamos

Can you help me understand? The whole section on generators seems so strange:

    // If yield(v) returns false, gen must stop calling yield and return.
    func NewGen[E any](gen func(yield func(E) bool)) StopIter[E]

So yield is just a callback, and in order to provide generator semantics (1) the compiler needs to ensure that the user respects the return value of yield (what should happen if it can't?) and (2) an extra goroutine, a regular unbuffered channel, a back-channel, an atomic, and a finalizer are needed. To me, this seems like a heroic set of complexity based on non-existing optimizations, to deliver a moderately useful feature. Have I misunderstood or misrepresented something?

I'm not against either generator functions or compile-time channel optimizations, but it seems like it would be way easier to implement generator functions the old-fashioned way, which I assume is a new language construct in a separate proposal that applies the 1:1 coroutine transfer under the hood at the yield-point, without seemingly unrelated (?) multithreaded constructs like goroutines and atomics.

@ianlancetaylor

@blizzy78 There is no need to put the coroutine optimizations into a separate proposal, because they are simply optimizations. They can be done at any time, independent of this proposal. They are described in this proposal to show how NewGen can be implemented efficiently.

@betamos We have opposite ideas of "easier." To me it's much easier to not require a new language construct. A new language construct would have to fit in well with the rest of the language and work orthogonally with the go statement and channels and the select statement. It doesn't seem easy at all.


At the risk of inviting bike-shedding, let me raise a minor concern on a naming choice.

Iter2 would be a widely-used stdlib interface. Unlike other widely used interfaces from the stdlib (io.Reader, io.Writer, etc), it includes a digit in its name, which could impact cognitive chunking while reading code. The name IterPair could convey the same information but would not have this feature. That said, it is longer.

6 replies
@lpar

Also, as per the naming conventions for interfaces, an Iter is a thing which Its. Given that you only write the iterator code once and then use it from regular for...range loops, I don't see a huge benefit to shortening the name.

Basically, I like this proposal except for the naming.

@jannotti

Just because a noun ends in er doesn't mean it's forbidden in an interface because there's no corresponding verb. If my interface should have a method called Beer() or something like that, so be it.

@lpar

Maybe since the primary purpose is enabling for...range, it should be called a Ranger rather than an Iter.

I use the following pattern to solve this problem in my project. I think it is simpler than the interface from the proposal, and than the C++ implementation.

func (ci *Db) MustRangeCallback(req RangeCallback_Req, cb func(row RowData) bool) {
   ...
}

num := 0
db.MustRangeCallback(req, func(row RowData) bool {
   num += row.GetCount()
   return num <= 100
})
fmt.Println(num)

Returning true from cb means "continue, give me the next one in the loop"; returning false from cb means "break out of this loop".

Maybe we can make this pattern easier to call with Go's for range grammar, or else not add any grammar to Go at all.
The performance is good right now. I have confirmed that cb in this pattern will be inlined if it is not too large, so there is no extra memory allocation when you use this pattern compared to embedding the loop in the implementation of the callback.

7 replies
@rothskeller

@bronze1man, a difficulty with the pattern that you propose is that it generally requires the callback function to be a closure — such as for access to the num variable in your example. That means every usage of this pattern results in a heap allocation. In addition, every element in the container requires a function call, and if the function body cannot be inlined, that can be very expensive. It seems unlikely that the pattern you describe can be rendered efficient in the general case.

@bronze1man

@rothskeller I think there is no big difference in compiler implementation between my closure version and the proposal. The compiler can identify this kind of closure usage and generate code that looks like the proposal's if that speeds things up.

About heap allocation:

  • Converting a struct to an interface needs a heap allocation, and allocating a pointer to a new struct needs a heap allocation, so the proposal will need at least one heap allocation in the general case without inlining. But I can cache and reuse the interface in the proposal.
  • A closure will need a heap allocation without inlining, so the closure version will also need at least one heap allocation in the general case without inlining. But I can cache and reuse the closure.

But my version only needs a human to write one function to implement the iterator, so it is the simpler way.

@Merovius

return true in cb means continue give me next one in the loop, return false in cb means break this loop.

Notably, there is no way to return out of the loop.


You should be able to query if an iterator is empty without pulling an element out of it. The Java model is better for that reason. Example. The iterator interface needs another method.

18 replies
@kortschak

It is always possible for the types to implement a Len() int method. The proposal here is for teaching the language to use the iterator via for range. That doesn't need to include the length.

@seancfoley

It doesn't need to be a Len method, which would have to calculate the size - and that is not always easy to calculate. It could be a simple method that asks whether the iterator is empty, which is always easy to answer. There is no reason why it cannot be part of the interface, resulting in better polymorphism.

@seancfoley

It is fairly common to want a method that indicates emptiness without the side-effect of state change. Sure, people can wrap the original to achieve this, but frankly it just seems like it's no cost to make it part of the original standard interface. Yes, Go likes small interfaces, but two methods is still small.

With respect to integration with the range syntax, would it make sense to have two methods so the container can specifically handle one- vs. two-value iteration? I'm thinking of cases where the container can implement the iterator faster if it knows the second value won't be needed.

Taking an example from the language's builtins, consider slices: if I write for idx := range slice { ... }, the slice is just producing a sequence of integers. If I do for idx, val := range slice { ... }, it is also creating copies of each element of the slice.

If I wanted to implement my own container type that provided slice-like iteration, I'd need to make the Range method return an Iter2[int, E] and have its Next method always copy the elements. If the method is short enough, it might get inlined and the copy optimised out for a one-value range loop, but there's probably going to be cases where that doesn't happen. It'd be nice not to rely on the optimiser for this.

7 replies
@kalexmills

There's no reason that a type can't implement both the Iter and the Iter2 interfaces. The Go compiler would just use whichever interface makes sense based on the range syntax used.

@jhenstridge

Iter[E] and Iter2[E1, E2] define Next methods with incompatible signatures, so it'd be impossible for a type to implement both interfaces.

Also, I think the container might want to provide two different iterator implementations here. The decision between one-value and two-value mode is made once, rather than each time you iterate. I was thinking something more like having both a Range and a Range2 method (although with better names).

@kalexmills

Oooh. Good catch. That would be a problem.

I think a simple fix then might be to rename Iter2's Next method to NextPair().

I don't like the complexity of adding generators. Their usefulness is suspect - almost always, if a generator makes sense it is because the size is unbounded, which usually equates to a more complex process underlying the collection - and in that case the block/notify is almost always going to be needed at some level anyway.

It may make sense for trivial series generation - but for those cases you can just as easily code it with a state machine.

I also don't like the xxxx2 interfaces. I would rather see KeyValue and IndexValue interfaces created to be used as the return types - so Iter[KeyValue[K,V]] or Iter[IndexValue[I,V]] or Iter[Any[E]]. The compiler can use similar inspection code for the range operator.

Other than that it is a great step forward for Go.

34 replies
@rothskeller

The usefulness of a design pattern is not invalidated by the existence of other ways to achieve the same goal. In a large software system, ease of maintenance is one of the primary design criteria — often a more important one than the more traditional measures like speed, memory usage, etc. Experienced people who enjoy designing new software will design it in such a way that it's easy to maintain, so that they don't get trapped in legacy software support and are free to continue designing new software. To that end, they will use whatever design pattern most clearly communicates to a future maintainer what's going on in the system. The existence and usage of generators in various languages makes it undeniable that, for some classes of problems, they are the clearest and most intuitive approach.

@kalexmills

Go already includes a data type with potentially unbounded size -- channels. It makes sense to me for the stdlib's Iterator interface to be rich enough to capture channels, so I can see why generators were included.

@robaho

Agreed. Generators are much harder to reason about for most people, and in a complex system they are of limited usefulness - because there is housekeeping required for the producer/consumer channels. Like I admitted, it makes simple series generation easier - the languages that rely on generators like python, javascript do so because they have very poor concurrency support.

Because it's inconvenient to write for v := c.Range(), we propose a further extension: we permit range c if c has a method Range that returns a value that implements Iter[E] or Iter2[E]. If the Range method implements StopIter[E] or StopIter2[E] then the range loop will ensure that Stop is called when exiting the loop.

One thing that strikes me as unfortunate about this is that there is no way for a function to take advantage of the same mechanism to work on either a full collection or a bare iterator. So, say I have Contains[E comparable](x TODO, v E) bool, I have to ask myself what TODO should be. If it is Iter[E], then I have to manually handle the (optional) Stop method with type-assertions. If it is interface{ Range() Iter[E] }, the function does not work with a pure iterator. It would be great if Contains could also take advantage of the same thing and just do

func Contains[E comparable](x TODO, v E) bool {
    for e := range x {
        if e == v {
            return true
        }
    }
    return false
}

which would handle Stop transparently. Not sure how to do this, though.

This also brings up another thing: The Range method doesn't "implement" anything, it's what the method returns, presumably. And when the proposal says "if the return value of Range implements StopIter[E]", does that mean statically, or will the compiler emit code to do an interface type-assertion? This should be clarified.

It might be useful to do a type-assertion, as it would be possible for a wrapping iterator to implement StopIter[E] if and only if the wrapped iterator does - which means the wrapping function needs to return Iter[E] (i.e. it needs to return an interface that does not statically implement StopIter[E]).

15 replies
@Merovius

Not sure how to do this, though.

FWIW one way to theoretically do it would be to magically have iter.Iter[E] implement interface { Range() Iter[E] } where the implementation returns itself. Arguably, iter.Iter is already somewhat magic. But I don't like it.

Another way to do it would be to define interface { Range() Iter[E] } as a standard interface and codify the convention that an iterator should implement this, returning itself. Or maybe even make it part of the iter.Iter interface, if we want to really nail down that it should exist.

@jhenstridge

Python's iterator protocol is essentially your "always call Range" suggestion.

Objects are iterable if they implement a __iter__ method that returns an iterator. An iterator then implements a __next__ method to produce values. If you wanted to pass an iterator directly to a for loop, it'd need to have an __iter__ method that returns itself.

@Merovius

If it is Iter[E], then I have to manually handle the (optional) Stop method with type-assertions.

Uhm, this is of course violating the code-style suggestions of the proposal. Instead, the issue is that the caller has to write

it := c.Range()
defer it.Stop()
Contains(it, e)

which is inconvenient. The point remains that the special handling of for loops that make them work well both with collections and iterators over them can't be taken advantage of across function boundaries.

I believe the intention is that the code instead just does Contains(c.Range()), leaving the actual Stoping to the finalizer. But I find that dissatisfying, personally. I've been trained not to rely on finalizers and not calling Stop when we know perfectly well when it should be called just feels inefficient.

I like the general direction of this proposal. There are some minor disagreements though:

  1. There is no need for two Iter interfaces. Only Iter[K, V any] is needed, and the for-range loop will discard values as appropriate depending on whether one or two values are needed. Hence, for a simple array-like iterator implementation, the iterator should return the index and the value.
  2. Stop interface can be merged with Iter interface. I think cleaning up is an essential part of the iterator.
  3. Iterators should be for reading values only. Insertion, updating, and deletion should be manually done. I disagree with the 'optional future extension' part.
  4. I am ambivalent about generators as they can be convenient.

Another important point is to keep the number of "must-know" interfaces small. So try not to add too many of those.

14 replies
@Merovius

Only Iter[K,V any] is needed

How would a chan T based iterator implement this?

@henryas

chan T iterator can just return -1 and T.

@kortschak

That then needs to be written down somewhere and understood by code.

I would like us to encourage, from the start, to return exported, concrete types as iterators, not the iter.Iter{,2} interfaces. Concrete types make it possible to add extra methods if desired (for example for one of the possible future extensions). They might also, in some cases, make it easier for the compiler to do inlining and escape analysis, thus eliminating some copies and allocations.

In particular, I would like all the top-level functions of the iter-package to do that. In some cases, this means exporting extra types. In others (e.g. funcIter) it can mean eliminating a top-level function altogether.

A possible exception to this would be functions which exist to compose iterators, like Map/Filter/Reduce. Returning interfaces can reflect that composition aspect quite well. Though even for those, I think it might be sensible to return concrete types. If we ever added "refined method constraints" to the language to solve "optional interfaces"¹, it would enable such functions to implement StopIter if the underlying type implements StopIter, for example (and similar for other possible future extensions).

[1] I'm not sure if we've standardized on these terms, but I'm referring to the solution the FGG paper suggests for the expression problem

15 replies
@kalexmills

I agree. I would definitely like to see a reminder to "accept interfaces, return structs" included in the package documentation should this proposal be accepted.

@ianlancetaylor

There is a conflicting style guideline which is: as a general rule, don't return unexported types. So if we stick to that rule and to the rule that you suggest, we wind up having to export types like FuncIter. Is that useful?

I agree that if there is any possibility of adding more methods to the type, then the type should be exported and returned directly.

@robaho

That is the underlying problem I was getting at. You need to create a lot of extra exported types. To simplify my earlier example:

s := StudentService()

itr := s.getStudents(predicate_function)

getStudents() needs to return a type StudentStopIter[Student] but the natural implementation in the service would be:

return iter.Filter(sql.itr,predicate_function)

How is having to create StudentStopIter any better than the way we have now?

Or, you ignore the "return concrete types rule", and return iter.StopIter[Student] but you still need an unexported local type within the service to convert the iter.Filter() iterator into a StopIter.

I would like to suggest adding two functions for composability and potentially reducing redundancy in APIs:

// Left returns an iterator over the first element type of an Iter2, dropping the second.
func Left[E1, E2 any](it Iter2[E1, E2]) Iter[E1]

// Right returns an iterator over the second element type of an Iter2, dropping the first.
func Right[E1, E2 any](it Iter2[E1, E2]) Iter[E2]

(bikeshed colors up for discussion. These names are taken from Haskell's Either, because "First" and "Last" are misleading in the context of iterators).

Far less useful, but suggesting it just for completeness:

type Nop struct{}

// ToLeft returns an Iter2 that uses it for its first element type
func ToLeft[E any](it Iter[E]) Iter2[E, Nop]

// ToRight returns an Iter2 that uses it for its second element type
func ToRight[E any](it Iter[E]) Iter2[Nop, E]
3 replies
@Merovius

An example usecase:

// Count counts how often any element appears in it.
func Count[E comparable](it Iter[E]) map[E]int

// Duplicates returns all elements from it which appear more than once.
func Duplicates[E comparable](it Iter[E]) Iter[E] {
    return iter.Left(iter.Filter2(func(_ E, count int) bool {
        return count > 1
    }, iter.FromMap(Count(it)))
}

The key point here is that the Iter2 needs to be filtered while both key and value are available, so we can't rely on, say maps.Keys or maps.Values to give us an Iter - the projection to Iter needs to happen after we massaged the Iter2 into shape.

@carlmjohnson

I think there's a typo and you left the 2 out of Iter2, e.g. func Left[E1, E2 any](it Iter2[E1, E2]) Iter[E1].

@carlmjohnson

I'm not sure how useful ToLeft/Right would be. I think most cases of wanting an Iter2 could be handled by Enumerate (to use the name Python uses) or Count (to use a slightly less jargony name for the same thing), plus one for errors called NilError.

Clarification:

Using range with an Iter2[E1, E2] will permit using two variables in the for statement, as with range over a map.

Does this mean it only permits the two-variable form, or does it mean to permit either the one or the two-variable form (as with range over a map)?

3 replies
@ianlancetaylor

The intent is to permit either form.

@jhenstridge

@ianlancetaylor: that seems sub-optimal for the error return case. If a container has a Range() Iter2[E, error] method, then I can write for v := range container { ... } and it isn't obvious that I'm ignoring the error values.

@Merovius

I'm not sure I like that. The design suggests using Iter2[T, error] for iterators which can fail. If we allow the one-variable form for Iter2, it seems far too easy to accidentally ignore those errors. [edit] sorry, @jhenstridge's comment didn't appear for me until I submitted my duplicate comment [/edit]

Perhaps I'm slow, but I just realized that copying an iterator by assignment or passing it to a function behaves very differently under this proposal than with, say, C++ iterators. For example, consider this code using FromSlice from the proposal:

func Sum(it Iter[int]) int {
	sum := 0
	for e := range it {
		sum += e
	}
	return sum
}

func main() {
	s := []int{2, 3, 5, 7}
	begin := FromSlice(s)
	fmt.Println(Sum(begin))
	fmt.Println(Sum(begin))
}

(working code at https://go.dev/play/p/IZY3wmlTlRn)

One might expect this to print 17 twice. But in fact it prints 17 and 0. The reason is that these iterators are pointers to the actual iterator state, so it inside Sum and begin inside main share the same state; they are effectively the same iterator, and calling Next on it also advances begin. After the first call to Sum, begin points to the end of the slice.

This is not a quirk of this particular implementation. It is a natural consequence of the way iterators work in this proposal. FromSlice pretty much has to return a pointer (or equivalent to a pointer) in order to be useful.

Given this, consider another example. How would one write a function that iterates over an ordered container of integers, and prints everything after the last 0, without caching the container contents? I think that would look something like this (untested):

type ClonableIter[E any] interface {
    Iter[E]
    // Clone returns a duplicate of the original iterator, pointing to the same position within the iteration.
    // This duplicate can be advanced independently of the original.
    Clone() ClonableIter[E]
}

func PrintAfterLast0(it ClonableIter[int]) {
    zero := it.Clone()
    for {
        if e, ok := it.Next(); !ok {
            break
        } else if e == 0 {
            zero = it.Clone()
        }
    }
    for e := range zero {
        fmt.Println(e)
    }
}

Furthermore, I would expect that most iterators over containers would be ClonableIters, not plain Iters.

7 replies
@AndrewHarrisSPU

This is a good observation. There is some language in the proposal that I think targets these concerns:

We should let each container type define how it behaves if the container is modified while iterators are active.

The suggestion is that the contractual understanding of iterators is relatively weak. In turn I think I feel like trying to do much with iterators leads to a variation of Greenspun's tenth rule - "Any sufficiently complicated implementation of iter.Iter contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."

How would one write a function that iterates over an ordered container of integers, and prints everything after the last 0

Not that I think this is the ultimate answer, but what I think the contract for the basic Iter suggests: don't pass the iterator, pass the container and get a new iterator. The additional SetIter, DeleteIter are suggestive of further refinements.

@Merovius

How would one write a function that iterates over an ordered container of integers, and prints everything after the last 0, without caching the container contents?

I think with the design as-is, you'd argue from a position of analogy with io.Reader and do something like bufio. That, of course, is what you mean by "caching the container contents".

Otherwise, I think the PrevIter extension would help. Your extension would also help. So, yes, you'd need some form of extension for this to work.

I'm not super worried about supporting that use-case from the get-go, TBQH.

@pat42smith

This is a good observation. There is some language in the proposal that I think targets these concerns:

We should let each container type define how it behaves if the container is modified while iterators are active.

Sorry, but I don't understand what you mean. The containers aren't being modified in my examples.

Not that I think this is the ultimate answer, but what I think the contract for the basic Iter suggests: don't pass the iterator, pass the container and get a new iterator. The additional SetIter, DeleteIter are suggestive of further refinements.

But as I pointed out at #54245 (comment), passing the container has problems. Also, it wouldn't solve this example unless you want to iterate over the container twice: first time counting the number of items before the first (edit: last) 0, and second time skipping that many items before starting to print items.

After reviewing this draft and the many comment threads attached to it, I'm left with the feeling that as currently defined the proposal might be "over-reaching", trying to address many different concerns all at once. I suppose then that in a sense this comment is an objection to the "What we want from Go iterators" section in particular, although of course it indirectly addresses the whole thing in the sense that the rest of the proposal is in response to the problem statement.

Specifically, I see a few different things here that don't seem to all necessarily need to be designed and implemented together (in no particular order):

  1. Generic "streams" over values, which can produce an arbitrary number of values, can fail, and be closed when no longer needed. I read this as a generalization of io.Reader and io.Closer over arbitrary types rather than just bytes.
  2. Allowing for composable functional-style primitives over lazily-built infallible sequences, such as the typical "map" and "filter" functions, without assuming a particular underlying data structure for the source sequence or necessarily buffering the whole sequence in memory.
  3. Implementing for ... range for custom collection types.

While I can certainly respect the desire to do more with less, it seems like the current proposal has a number of ergonomic challenges for each one of these situations that arise from the needs of the other two. For example:

  • There is no precedent for for ... range to handle failure either on entry into the loop or during iteration, but fallible streams require special control flow to handle errors.
  • for ... range has built into it the idea that a particular type may have either one or two iteration variables when "ranging over" it, but this idea of choosing either a 1-tuple or 2-tuple seems rather arbitrary when considering the general idea of infallible sequences.
  • The idea of "closing" a fallible stream has an obvious analog in io.Closer, allowing the close operation itself to fail, but that doesn't seem so obviously needed for infallible sequences. (I do see the argument that the iterator might have some resources in it that don't get cleaned up naturally as part of GC, such as an open filehandle, but that problem doesn't seem unique to iterators and in all other situations those must either be handled explicitly in user code or via a finalizer.)

Might it be helpful to start by designing an "idealized" solution for each of the scenarios, and then afterwards look for opportunities for reuse? I have an admittedly-unjustified hunch that "the solution" here might instead be three different patterns along with some glue to explicitly compose them together, rather than a single design that addresses all three.


I feel like I should give an example of what I mean rather than just raising a problem. I'm not intending these as specific design proposals for each scenario -- I'm not wedded to the approach I'm about to describe. Instead, I'm using this as an example of how solving each of these problems separately and then "gluing" them together might work to make each situation more ergonomic in isolation while still allowing programmers to explicitly compose these primitives when it makes sense to do so.

First, let's consider a possible design for "streams" (my scenario 1):

package streams

type Reader[T any] interface {
    Read() (T, error)
}

type Closer interface {
    Close() error
}

var EOF = errors.New("EOF")

type ReadCloser[T any] interface {
    Reader[T]
    Closer
}

This is intentionally modelled after io.Reader, io.Closer, and io.ReadCloser. Just as for those, both individual reads and the final close are fallible. The patterns for working with these would be similar to those for their io equivalents.

And now a hypothetical design for iterating over infallible sequences (my scenario 2) which is the bare minimum required to meet that need (as far as I can tell):

package sequences

type Iter[T any] interface {
    Next() (T, bool)
}

func Map[T1, T2 any](seq Iter[T1], f func(elem T1) T2) Iter[T2] {
    // <returns an iterator whose Next calls seq.Next and passes its result into "f">
}

This general idea of a sequence involves no notion of returning different numbers of values: it just models a lazy sequence that a caller can read from one value at a time until it's "done".

Finally, here's a hypothetical design for for ... range over custom types (scenario 3):

package sequences

type RangeItem[T1, T2 any] struct {
    First T1
    Second T2
}

type Unused struct {}

type Ranger[T1, T2 any] interface {
    Range() Iter[RangeItem[T1, T2]]
}

This part is admittedly still quite clunky. I'm not really sure how to solve the 1-tuple vs. 2-tuple problem ergonomically, but my main intent here is for it to be addressed separately, layered on top of infallible sequences, rather than as part of the infallible-sequence design itself, so that functions that work with infallible sequences in general don't need to concern themselves with tuple arity. My strawman here is that a user-defined collection would implement Ranger[T1, T2] for some specific T1 and T2, and may leave either one as the sentinel type Unused to represent the 1-tuple case. I feel ambivalent about whether the compiler should treat Unused in any special way here (for example, disallowing a second iteration variable if T2 is Unused). But the concrete proposal is not really the point here, so much as treating it as something separate from the design of infallible sequences.

The Ranger[T1, T2] interface shows the bridge between the for ... range design and the infallible sequences design. This interface is explicitly for for ... range and not intended to serve any other purpose, but an author might implement it using the sequences.Map function to concisely adapt a sequences.Iter[T] provided elsewhere into the form the for ... range construct expects. Note that my intent here is that for ... range will only use Ranger[T1, T2], and will not attempt to automatically handle any other type; if someone wants to write an iterator that can also be used directly in for ... range then that iterator can itself implement Ranger[T1, T2], presumably just returning a sequences.Map derived from itself.

That then leaves the bridges back and forth between fallible streams and infallible sequences. Going from infallible to fallible is straightforward:

package streams

type SequenceStream[T any] struct {
    Iter sequences.Iter[T]
}

// streams.Reader implementation
func (s *SequenceStream[T]) Read() (T, error) {
    v, ok := s.Iter.Next()
    if !ok {
        var zero T
        return zero, EOF
    }
    return v, nil
}

// streams.Closer implementation
func (s *SequenceStream[T]) Close() error {
    return nil
}

Going from fallible to infallible requires the recipient of the value to do a little more ceremony. The following is inspired by the design of bufio.Scanner:

package sequences

type StreamIter[T any] struct {
    stream streams.Reader[T]
    err error
}

func NewStreamIter[T any](stream streams.Reader[T]) *StreamIter[T] {
    return &StreamIter[T]{
        stream: stream,
    }
}

// sequences.Iter implementation
func (it *StreamIter[T]) Next() (T, bool) {
    var v T
    v, it.err = it.stream.Read()
    if it.err == streams.EOF {
        it.err = nil // normal end of stream; Err should report only real failures
        return v, false
    }
    return v, it.err == nil
}

// sequences.Ranger implementation
func (it *StreamIter[T]) Range() Iter[RangeItem[T, Unused]] {
    return Map(it, func(v T) RangeItem[T, Unused] {
        return RangeItem[T, Unused]{First: v}
    })
}

// This method isn't an implementation of any particular interface;
// it's specifically for StreamIter.
func (it *StreamIter[T]) Err() error {
    return it.err
}

The pattern for using sequences.StreamIter[T] as a sequence might then look like this:

stream := AnyStream()
defer stream.Close() // if and only if stream is a streams.Closer too! (and handle errors if needed)
iter := sequences.NewStreamIter(stream)
// <do normal infallible-sequence-ish things with "iter" here>
if err := iter.Err(); err != nil {
    // <handle the error>
}

Since sequences.StreamIter[T] also implements sequences.Ranger, it could also be used in a for ... range loop by logical extension of the previous example:

stream := AnyStream()
defer stream.Close() // if and only if stream is a streams.Closer too! (and handle errors if needed)
iter := sequences.NewStreamIter(stream)
for v := range iter {
    // <do something with "v">
}
if err := iter.Err(); err != nil {
    // <handle the error>
}

The above is comparable to the documented examples for bufio.Scanner, since I think bufio.Scanner would make a good example of a fallible stream of something other than bytes but where it's still very convenient to use it with functions that expect an infallible sequence. I hope it'd be clear to an author who is using StreamIter to pass a stream to a function that works with iterators that it's their responsibility, as the "owner" of the original stream that's being wrapped, to close it if necessary and handle any errors that arose during iteration.

I want to reinforce that my intent with showing this "counter-proposal" is only to illustrate that these three problems seem separable while still allowing explicit composition of the three concepts, not to propose these particular designs. I think it would be interesting to split this proto-proposal into at least three parts that could in principle be implemented separately -- although admittedly the for ... range part does depend on the "infallible sequence" part, so they are not truly independent, at least as I've framed it here.

12 replies
@seancfoley

I think I like your suggestions here better than the original proposal. I also agree that the distinction between fallible (i.e. can fail) and infallible is key. For the most part, other languages do not use iterators for fallible streams; iterators tend to be used for things like containers and data structures, for which iteration is nothing more than following a sequence, not a more complex operation involving error scenarios. I agree that trying to combine these two concepts (fallible and infallible) into a single interface pattern is not elegant. It is especially problematic when trying to integrate the fallible scenario with for ... range. Sure, it's not impossible to force it, as done in this proposal, but it's clunky.

I think your Reader interface suits the fallible scenario better.

I also like the separation of for ... range from the iterator, but linking of the two.

I would also suggest going with:

type Ranger[T1 any] interface {
    Range() Iter[T1]
}

type RangePair[T1, T2 any] struct {
    First T1
    Second T2
}

type PairRanger[T1, T2 any] interface {
    Range() Iter[RangePair[T1, T2]]
}

For the latter, the PairRanger, users could choose one of for i := range, for i, j := range, or for _, j := range, as is the case today with a slice.

And of course this could be extended to triplets if so desired, a TripletRanger:

type RangeTriplet[T1, T2, T3 any] struct {
    RangePair[T1, T2]
    Third T3
}

type TripletRanger[T1, T2, T3 any] interface {
    Range() Iter[RangeTriplet[T1, T2, T3]]
}
@carlmjohnson

I find the idea of changing the proposal's Next() (T, bool) to Next() (T, error) intriguing. There could be an iter.ErrStop that is returned to signal normal termination, like io.EOF.

@carlmjohnson

I don't like using bufio.Scanner as a model. I think it's too common for users to accidentally forget to check s.Err() at the end of a scan, so I would prefer if the next step in Go iteration avoids that pitfall if at all possible.

I'm surprised he hasn't mentioned it here yet: @rogpeppe also suggested at some point using an API in the style of package io for batched streams. That's technically also fulfilling the idea of providing iterators. It supports error handling, stopping (via Closer), random access iterators…

I don't really like it, but I have to admit that I find it hard coming up with good reasons why. And it does demonstrate some out of the box thinking and is a familiar, tried and tested API.

3 replies
@Merovius

The iterator API from the design is essentially io.ByteReader in that analogy (just that ByteReader returns an error, not bool).

@rogpeppe

I think that API is tried and tested for very low cost-per-element iterators, but at significantly higher implementation cost (writing io.Reader implementations that wrap other io.Readers is always a little tricky, as there are a bunch of edge cases to think about).

I believe that it's possible to implement Iter in terms of a generic io.Reader, and vice versa, so I feel both ideas have their place.

@Merovius

I believe that it's possible to implement Iter in terms of a generic io.Reader, and vice versa.

Indeed. To me, this kind of mitigates the "implementation cost" argument.


Let me say first that I really like this proposal. It addresses a lot of design issues that I've been puzzling over for years - in particular, the NewGen design and associated compiler/runtime optimisations are beautiful. I had reservations about Iter2 at first (in particular, wondering why Iter2 isn't spelled Iter[Pair[K, V]] or similar), but I see that the distinction is important for ambiguity reasons when used in Range.

I'm not keen on the suggestion for iterators that can error to be of the form Iter2[E, error]. That means that when doing:

    for x := range iter2 { ... }

the final value of x will always be the zero value (the iterator could potentially buffer one item so that it could return it along with any subsequent error, but that doesn't work if there's an error retrieving the very first item). I'm inclined to say that iterators that can error should either have an Err method that must be called explicitly (yes, I know that's historically error-prone), or avoid implementing the iterator interface entirely, forcing users to call the error-returning method directly themselves.

I have reservations about the potential performance implications of using iterators. ISTM that NewGen, although wonderfully elegant, is a garbage factory. I'm not sure that the compiler will in general be able to tell that the coroutine, its stack, and the two channels don't escape to the heap. Luckily, most iterator composition operators can be implemented easily without using NewGen.

The coroutine optimisations rely on checking that a send/receive on a channel will block. Is that something that can be done without using (relatively) expensive synchronization primitives?

Debugging code that's using NewGen might turn out more awkward than debugging a regular iterator, because of the goroutines, but perhaps that's something a good debugger could help with.

The NewGen docs should probably document that the function is called in another goroutine, therefore care needs to be taken with side-effects. For example, should this code be considered racy or not?

	x := 0
	iter := NewGen(func(yield func(int) bool) {
		for ; x < 10; x++ {
			yield(x)
		}
	})
	defer iter.Stop()
	for i := range iter {
		x++
		fmt.Println(i)
	}

One small concrete suggestion: perhaps using range over a nil iterator could terminate immediately. So the following code would print no values.

var it iter.Iter[int]
for x := range it {
    fmt.Println(x)
}
10 replies
@jhenstridge

With the proposed implementation of NewGen, that example would be non-racy: the sends on the pair of channels create a "happens before" relationship at the point where execution switches between the two goroutines.

@AndrewHarrisSPU

perhaps using range over a nil iterator could terminate immediately

Would this imply non-equivalent behavior between for...range (short-circuit) and calling Next() (panic)? My sense is that this would be surprising. Also the for...range rewrites the "defer" of Stop in a slightly subtle way; would/should this also short-circuit rather than panic?

@dylan-bourque

Personally I would hope/expect that a for...range over a nil iterator would short-circuit to a no-op the same as for a nil slice/map rather than panic and crash the application.

1 reply
@robaho

Also, it is trivial to wrap an in-memory container to create a stream, but difficult to turn io-based iteration into a simple iterator.

I’d like to suggest an alternative design for cleanup:

  • Remove the StopIter type and the Stop method.
  • Instead of returning a StopIter, NewGen returns (Iter[E], func()). The second return value is the cleanup function. It behaves exactly like the Stop method.
  • The expression after range in a range loop can be the two values (Iter[E], func()) in addition to just Iter[E]. The loop arranges to call the cleanup function when it finishes, just as with the Stop method.
  • All these changes apply to the two-type-parameter forms as well.

These changes greatly improve the likelihood that programs will clean up iterators, because they expose the need to stop the iterator in the syntax.

Cleanup is easily missed when using functions that accept iterators. Here is some correct code that doesn’t involve cleanup, where s is a slice and f is some function that transforms elements of s:

iter.ToSlice(iter.Map(f, iter.FromSlice(s)))

Now say we have an iterator that requires cleanup. For example, it is backed by a streaming RPC that should be closed when no longer needed. In the current proposal, the constructor function for the iterator would return a StopIter and should be used like this:

it := NewIter()
defer it.Stop()
iter.ToSlice(iter.Map(f, it))

But iterators that require cleanup are uncommon, so programmers are likely to write the more concise

iter.ToSlice(iter.Map(f, NewIter()))

Now, that code is still correct, because iterators should clean up after themselves when run to completion, as this one is. But say we had a function

func FirstN[E any](n int, it iter.Iter[E]) iter.Iter[E]

that returned an iterator with only the first n values of its argument iterator. The line

iter.ToSlice(iter.Map(f, FirstN(5, NewIter())))

now does not call Stop when the iterator has more than 5 elements. This is because the iterator is not run to completion, and, as the draft proposal correctly observes, functions that work on iterators should not call Stop themselves. If NewIter returned two values—an Iter and a cleanup function—instead of one, this line wouldn’t compile.

The draft proposal accounts for this case by saying that cleanup should always be an optimization. It claims that failing to call Stop explicitly here is not a big deal, because eventually a finalizer will call it. But one program’s optimization is another’s performance bug. An unfinished RPC, even an idle one, may consume resources on the local machine, the network, and the remote machine (and that machine's backends). Perhaps a garbage collection will run in time to trigger finalizers that will prevent this accumulated waste from bogging down the system. But perhaps the program is tuned to allocate little or no memory, so that the GC will run rarely or not at all. Or maybe worst of all, the GC runs frequently enough, but the sluggishness still appears between GCs, as significant but almost undetectable peanut butter.

It's tempting to hope that a vet check will catch calls like iter.Map(f, NewIter()) where NewIter returns a StopIter. But it isn't easy to make that vet check reliable. It can't just look for code of the form g(..., NewIter(), ...) because it doesn't know if g calls Stop or not. Or what if g puts the StopIter into a field of a struct which it then returns? The problem is essentially the same as determining whether Close is called on a file, and that is hard.

Removing StopIter from the proposal has the additional benefit that we don't need any design guidelines about who should call the Stop method or when. As @Merovius has repeatedly and correctly stated, the right answer is that it is the responsibility of the constructor's caller. Having the constructor return it separately makes that quite clear.

7 replies
@Merovius

It's tempting to hope that a vet check will catch calls like iter.Map(f, NewIter()) where NewIter returns a StopIter. But it isn't easy to make that vet check reliable. It can't just look for code of the form g(..., NewIter(), ...) because it doesn't know if g calls Stop or not. Or what if g puts the StopIter into a field of a struct which it then returns? The problem is essentially the same as determining whether Close is called on a file, and that is hard.

I would argue, though, that this is letting the perfect stand in the way of the good enough. I think even a relatively simple vet check like "Stop must be deferred or called in every branch" can play well with the conventions we seek to establish (like functions not accepting/type-asserting on StopIter). And splitting Stop off doesn't really fix the convention-defying hard cases (g can accept the extra func() as well, for example). That being said, in practice the split probably makes things clearer and a vet check less necessary.

I think overall I like the practicalities of this idea, but I dislike the aesthetics. To me, it "looks nicer" for Stop to be a method. Can't explain it better. I'd be fine with making this change but I'd also be fine without it.

@carlmjohnson

I love this change. It fits in with existing Go conventions, eg around context cancellation, and makes doing the wrong thing by accident harder.

@carlmjohnson

🚲: for v := range it, stop() { or for v := range it; stop() {? Should this be allowed: for i := range someslice; somefunc(somevalue) {? Seems like yes for simplicity.

All the functions in this design provide and accept iterators. The downside is that an iterator can't be reused.

In Java there is an Iterable interface that provides an Iterator, and in .NET there is an IEnumerable interface that provides an IEnumerator, and collection/stream functions operate on and return -ables, not -ators.

This design for Go mentions a Ranger (? name not explicitly stated) interface with a Range() method that returns an iterator, which would be the analogue.

I'd suggest that the functions FromChan, FromMap, FromSlice, Map, Filter, Reduce, ToSlice, ToMap and Concat should accept/return this "Ranger" interface instead of the Iter interface.

1 reply
@Merovius

This design for Go mentions a Ranger (? name not explicitly stated) interface with a Range() method that returns an iterator, which would be the analogue.

There is a problem with passing that around, though. Namely, that it can either return a stoppable or a non-stoppable iterator. If you accept an interface{ Range() iter.Iter }, the Stop method would not implicitly be called. Also, Go doesn't have covariance, so this would imply that the Range method can only return an iter.Iter, not a concrete type (which can be more useful).

I'd suggest that the functions FromChan, FromMap, FromSlice, Map, Filter, Reduce, ToSlice, ToMap and Concat should accept/return this "Ranger" interface instead of the Iter interface.

FromChan can't reasonably implement Ranger, at the very least.

Personally, after the thread I linked to, I tend to feel that accepting/returning iter.Iter is fine. I think use cases which require re-starting iteration are relatively rare.

A function which does need it can always accept a func() iter.Iter.

Instead of adding an Iter2, was it ever considered to add a Pair[E1, E2 any] struct instead? Maybe that would simplify the API surface, especially over time if extension interfaces keep being added. Really, you would not need to add a pair type at all; the packages that need such a construct could declare their own, like a KeyValue[K, V any] in a map collections package.

5 replies
@Merovius

Really you would not need to add a pair type at all

If you want the two-variable range loop to work, you somehow need to put it into the spec.

@codyoss

you somehow need to put it into the spec.

That is fair. The two-variable range seems like a nice-to-have imo. But not essential. But if there were a Pair type I suppose you could make it magical.

@Merovius

Making it magical means putting it into the spec, means it must live in a package or be predeclared (as it's not an interface, but a concrete type). And then the question arises if and how we want it to interact with multiple return types. So all of this comes with its own can of worms that we've been traditionally hesitant to open.

But it would certainly be possible.

By the axiom that errors are values, I think Iter2[E,error] as-is is probably reasonable and well-reasoned. Still, on iterators and errors, there have been a lot of musings and suggestions:

  • an Err() method
  • an IterErr[E] type
  • a struct{ E, error } convention
  • a Result[E] type
  • sentinel errors

To add another idea, we could have an iter.Results[E] (emphatically, a plural Results[E]):

type Results[E any] struct{ Iter Iter2[E, error] }

This type would not be an iterator, as there is no Next() method defined for it. But it could easily be coerced into Iter and Iter2 forms.

The path from iter.Results[E] to iter.Iter2[E,error] is straightforward, and more or less means restoring the Next() method.

	var it iter.Iter2[E,error]
	it = results.Iter

The path from iter.Results[E] to iter.Iter[E] depends on a Try function:

func Try[E any](it Results[E], policy func(E, error) bool) Iter[E]

where the policy can be things like Log, Skip, Retry, Panic, Fail, etc., or whatever matches a policy function signature.

	var it Iter[E]
	var policy func(E, error) bool

	policy = iter.Log
	it = iter.Try(results, policy)

This does not cover the gamut of error handling around iterators, in fact none of the really interesting stuff. It is a way to deal with pretty trivial but maybe verbose error handling while still being a little bit structured.

0 replies

After a lot of reading and consideration, I now agree that adding a standardized iterator interface would be good.
From this arises a question: do we want it to be part of a package, say iter.Iter, or would it be better as a built-in? The comparable constraint for generics, for instance, is predeclared rather than part of the constraints package.

As for the range loops I still strongly oppose a change that would allow language construct to invoke a user-defined method.
The explicitness of the language is one of its defining features. Being able to skim the code and understand what is called when is a huge plus. Changing that so we could save a few keystrokes is a big no for me.

I would agree that the following snippet is the direct opposite of an elegant piece of code:

for e, ok := it.Next(); ok; e, ok = it.Next() {
	// statements
}

But, it could be solved with a change to the interface.
A simple example:

type Iter[T any] interface {
	Next() bool
	Element() T
}

type PairIter[K, T any] interface {
	Iter[T]
	ElementPair() (K, T)
}

We could always add more interfaces or expand them

type StopIter[T any] interface {
	Iter[T]
	Stop()
}

type PairStopIter[K, T any] interface {
	PairIter[K, T]
	Stop()
}

Or maybe we could add an Err() error method to the Iter[T] interface.

Using these interfaces would allow us to iterate over them in an agnostic way

for it.Next() {
    el := it.Element() 
    // or
    key, el := it.ElementPair()

    // do stuff with the data....
}
0 replies

func FromMap[K comparable, V any](map[K]V) Iter2[K, V]

How is this going to get implemented? If it uses NewGen, it would have to return StopIter. Otherwise it would have to use reflection or //go:linkname to call into the runtime, correct?

3 replies
@szabba

Which part of the spec would require that? Isn't this an acceptable implementation?

@magical

I'm guessing Merovius meant FromMap, not Map.

@Merovius

@magical You guessed correctly. Embarrassing typo/copy-paste fail. Edited the comment. Thanks for calling this out @szabba

It occurs to me that we might want to specify whether Next() is allowed to return re-used values. This is relevant if the element type contains pointers. For example, an iterator producing []byte values could re-use a buffer for every returned value, saving allocations. But it can only do so if it can rely on the caller not persisting the returned slice across the next call to Next.

On the other hand, this is kind of similar to the infamous "closure over loop variable" problem, in that every loop iteration re-uses the same variable/storage. So, if we do specify that, we might open ourselves up to similar bugs?

It might also be worth considering how this interplays with optimizations inlining Next calls in a range loop. Depending on the semantics, they might or might not be able to take advantage of a re-used loop variable.

0 replies

Some thoughts:

  • I kinda don't like the iter/Iter names. It reads like "ite-er", but my brain hears "it-er". I can also imagine "iter" being a common variable name, which we don't want package names to conflict with if possible. How about iterators.Iterator (only 2 letters longer than Reader)? Or iterators.I (like testing.T)?

  • Iter and Iter2 can't be implemented by the same type because the method names are shared and the method signatures conflict. This means I can't make my own Map type that imitates the map type, where it can be iterated by keys only, or by keys and elements.

  • Iter doesn't have HasNext() bool, so there's no way to peek if there are more items and then pass the iterator to another function for processing. Prev() would provide a way to call Next and then rewind the iterator one place to simulate HasNext, but Prev isn't included out of the box (and might not ever; it's in the optional future extensions section).

  • We also define a related interface for containers, such as maps, for which elements inherently have two values.

    The map type doesn't have two values per element. It has "keys" and associated "elements." See reflect.Type.Elem/Key, reflect.MapOf, and https://go.dev/ref/spec#Map_types. This is just a special case of iterating the items of a "container," doing another operation with each item to get a second value, and then packaging both values together for a specialized purpose.

    Where does it stop? What about 3 values per item? 4? 5? We shouldn't be adding special methods and interfaces to handle special cases. We have parameterized types for a reason: reusing code, regardless of types. We can handle 2-value, 3-value, 4-value, etc. cases with a single, parameterized type returned by the iteration that contains those values. If constructing and handling 2-value, 3-value, etc. types aren't facilitated by the current language features, then that's a sign that something needs to be added to the language to accommodate that, like tuples.

    Worst case, we can always implement temporary 2/3/4/etc.-value types ourselves, e.g.

    type MyIterVals struct { A string; B int; C map[rune]any; D error }
  • Stop or any other cleanup code shouldn't be part of the standard iterator interfaces. It should be handled by the thing that owns the resource in the first place, e.g.

    var file, err = os.Open(...)
    // ...
    var t = NewThing(file)
    defer file.Close() // or defer t.Cleanup()
    var i = t.Iter()
    for e, ok := i.Next(); ok; e, ok = i.Next() { ... }
    return thing // file cleaned up without iterator

    The iter, stop := x.Iter() idea proposed in another comment effectively standardizes another interface, which I don't think is a good idea (assuming you want it to work with for/range).

  • Use Iter() instead of Range(). Iterators have uses outside of range loops.

  • Overall, I suggest sticking to the Python or Java iterator interfaces, or the Go equivalent at least, which is basically the Go channel "interface": one single value at a time, without a way to express failure in the iteration itself. If we need to model errors per value, then we can stick them in the iteration values alongside their corresponding possible success values. Look at how Haskell does it: [Maybe Int], with the ability to convert to Maybe [Int] with trivial iteration code.

9 replies
@Merovius

I can also imagine "iter" being a common variable name, which we don't want package names to conflict with if possible. How about iterators.Iterator (only 2 letters longer than Reader)? Or iterators.I (like testing.T)?

I don't think iter is a particularly common variable name (I have almost universally seen/used it instead). And Iterator might be only 2 letters longer than Reader, but iterators.Iterator is far longer than io.Reader.

Personally, I think iter is a fairly common abbreviation of "Iterator". I think I like it better than any of the alternatives you present.

What I do dislike about iter.Iter is the stuttering, though. I don't see much of a way around it, unless we are fine with unnatural type names like I, or are willing to make it a predeclared identifier.

Iter and Iter2 can't be implemented by the same type because the method names are shared and the method signatures conflict. This means I can't make my own Map type that imitates the map type, where it can be iterated by keys only, or by keys and elements.

The intent is for Iter2 to allow both forms, with the second value automatically being discarded if the 1-value form is used.

Iter doesn't have HasNext() bool, so there's no way to peek if there are more items and then pass the iterator to another function for processing.

See discussion here.

Where does it stop? What about 3 values per item? 4? 5?

I think the design is pretty clear where it stops: At 2. A bit of discussion on this here. The TLDR is that key/value iteration is common enough to justify being taken into consideration for the design, but more than 2 values are not.

Stop or any other cleanup code shouldn't be part of the standard iterator interfaces. It should be handled by the thing that owns the resource in the first place, e.g. […] The iter, stop := x.Iter() idea proposed in another comment effectively standardizes another interface, which I don't think is a good idea (assuming you want it to work with for/range).

I don't really understand what you are arguing for/against here. The reason to have a standard interface of one kind or another for stopping iterators is for it to work with range. You seem to be acknowledging that need, but also say you don't want us to standardize on anything.

FWIW, personally I think having a standard way to do this is useful regardless. Just like io.Closer is useful.

@willfaught

@Merovius:

I think the design is pretty clear where it stops: At 2.

You seem to have missed my point.

The reason to have a standard interface of one way or another for stopping iterators is for it to work with range. You seem to be acknowledging that need, but also say you don't want us to standardize on anything.

Range doesn't need to clean up anything. Neither does custom iteration code. Other code can and should do that, as I demonstrated in my example.

@Merovius

You seem to have missed my point.

Maybe. My understanding of your point is that there is a slippery slope: that special-casing 2 would lead to special-casing any other number, and there is no natural "stop". And to be fair, I linked to some discussion on this and I also countered with actual reasoning as to why 2 is the natural stop.

If I missed your point, feel free to expand on it and/or explain why you don't find that reasoning convincing.

Range doesn't need to clean up anything.

Yes, it does. If you do for v := range c, the iterator returned by c.Range() might have to be stopped. The design sets out the clear (and important) convention that the caller of Range must call Stop. In this case, the caller of Range is the range loop itself, so it has to call Stop.

Other code can and should do that, as I demonstrated in my example.

Your example, crucially, does not use range. So it is ill-suited to showing how we can integrate with range without requiring a Stop method.

Again, your original comment says "assuming you want it to work with for/range" - and yes, that's exactly why the design includes StopIter, and why the alternative suggests standardizing on returning a separate stop function: to make it work with for/range.

I love the suggestion to have an Iterator interface in the stdlib!

I work heavily with iterator patterns because it allows me to apply information hiding in my business use case.
My business logic doesn't have to know about the implementation details of the data provider, just use and consume it.
This is highly beneficial TDD-wise as I can transparently use both in-memory variants and concrete database implementations.

I would like to share my experience working with iterators in Go.

The value of supporting error use-cases in the iterator interface

In other languages, the idiom to raise/throw an exception solves the integration of Error use-cases.
In Go, the idiomatic way to do that is to either return an error value on an action
or provide access to the error value through a method. (e.g.: context.Context.Err())

I use iterators primarily to decouple the implementation details of the data provider from the place they consume it.
For this, the ability to communicate errors during iteration is a must for my use-cases.

Most iterators in the stdlib already lean towards an OOP direction.
This way, the iterator manages the Next and the Err simultaneously,
while leaving space for extending it further for resource management.
(e.g.: sql.Rows, or bufio.Scanner)

type Iter[E any] interface {
	Next() (elem E, more bool)

	// Err returns the error cause.
	// if an error occurs du