discussion: standard iterator interface #54245
Replies: 95 comments 539 replies
-
Why is the special method for use in |
Beta Was this translation helpful? Give feedback.
-
Would it be possible to have an auxilliary method,
If the user wants to use an iterator method in place of the 'natural'
Should this be |
Beta Was this translation helpful? Give feedback.
-
I'm excited by the coroutine optimizations. I've lost track of how many times a goroutine-generator pattern would have been the cleanest way to implement something, but couldn't be used because of the inefficiencies. I support this whole proposal, which I think is necessary and valuable — but the coroutine optimizations would be hugely valuable even if the rest of this proposal didn't happen. |
Beta Was this translation helpful? Give feedback.
This comment has been hidden.
This comment has been hidden.
-
At the risk of inviting bike-shedding, let me raise a minor concern on a naming choice.
|
Beta Was this translation helpful? Give feedback.
-
I use following pattern to solve this problem in my project, I think it is simpler than interface from proposal , and c++ implementation.
return true in cb means continue give me next one in the loop, return false in cb means break this loop. Maybe we can make this pattern easier to call with golang |
Beta Was this translation helpful? Give feedback.
This comment has been hidden.
This comment has been hidden.
-
You should be able to query if an iterator is empty without pulling an element out of it. The Java model is better for that reason. Example. The iterator interface needs another method. |
Beta Was this translation helpful? Give feedback.
-
With respect to integration with the Taking an example from the language's builtins, consider slices: if I write If I wanted to implement my own container type that provided slice-like iteration, I'd need to make the |
Beta Was this translation helpful? Give feedback.
-
I don't like the complexity of adding generators. Their usefulness is suspect - as almost always if a generator makes sense it is because the size is unbounded which usually equates to a more complex process underlying the collection - the block/notify is almost always going to be needed at some level anyway in this case. It may make sense for trivial series generation - but for those cases you can just as easily code it with a state machine. I also don't like the xxxx2 interfaces. I would rather see KeyValue and IndexValue interfaces created to be used as the return types - so Iter[KeyValue[K,V]] or Iter[IndexValue[I,V]] or Iter[Any[E]]. The compiler can use similar inspection code for the range operator. Other than that it is a great step forward for Go. |
Beta Was this translation helpful? Give feedback.
-
One thing that strikes me as unfortunate about this is that there is no way for a function to take advantage of the same mechanism to work on either a full collection or a bare iterator. So, say I have
which would handle This also brings up another thing: The It might be useful to do a type-assertion, as it would be possible for a wrapping iterator to implement |
Beta Was this translation helpful? Give feedback.
-
This is a discussion that is intended to lead to a proposal.
This was written with lots of input from @jba and @rsc.
Background
Most languages provide a standardized way to iterate over values stored in containers using an iterator interface (see the appendix below for a discussion of other languages). Go provides
for
range
for use with maps, slices, strings, arrays, and channels, but it does not provide any general mechanism for user-written containers, and it does not provide an iterator interface.Go does have examples of non-generic iterators:
runtime.CallersFrames
returns aruntime.Frames
that iterates over stack frames;Frames
has aNext
method that returns aFrame
and a bool that reports whether there are more frames.bufio.Scanner
is an iterator through anio.Reader
, where theScan
method advances to the next value. The value is returned by aBytes
method. Errors are collected and returned by anErr
method.database/sql.Rows
iterates through the results of a query, where theNext
method advances to the next value and the value is returned by aScan
method. TheScan
method can return an error.Even this short list reveals that there are no common patterns in use today. This is in part because before generics were introduced, there was no way to write an interface that described an iterator. And of course there may be no simple pattern that will cover all of these use cases..
Today we can write an interface
Iter[E]
for an iterator over a container that has elements of typeE
. The existence of iterators in other languages shows that this is a powerful facility. This proposal is about how to write such an interface in Go.What we want from Go iterators
Go is of course explicit about errors. Iterators over containers can't fail. For the most common uses it doesn't make any more sense to have iterators return an error than it does for a
for
range
statement to return an error. Algorithms that use iterators should often behave differently when using iterators that can fail. Therefore, rather than try to combine non-failing and failing iterators into the same interface, we should instead return explicit errors from iterators that can fail. These errors can be part of the values returned by the iterator, or perhaps they can be returned as additional values.Iterators have two fundamental operations: retrieve the current value, and advance to the next value. For Go we can combine these operations into a single method, as
runtime.Frames
does.In the general case we may want to implement iterators with some additional state that is not trivially garbage collected, such as an open file or a separate goroutine. In C++, for example, this state would be cleared up by a destructor, but of course Go does not have destructors. Therefore, we should have some explicit way to indicate that we no longer need an iterator. This should be optional, as many iterators do not require any special cleanup. We should encourage iterators to use finalizers if necessary to clean up resources, and also to clean up after themselves when reaching the end of an iteration.
In Go the builtin type map permits values to be inserted and removed while iterating over the map, with well-defined behavior. In general for Go we should be flexible, though of course the program should never simply crash. We should let each container type define how it behaves if the container is modified while iterators are active. For example, container modification may cause arbitrary elements to be skipped or returned two or more times during the iteration. In some cases, hopefully rare, container modification may cause uses of existing iterators to panic, or to return values that have been removed from the container.
Proposal
We define a new package
iter
that defines a set of interfaces. The expectation is that containers and other types will provide functions and methods that return values that implement these interfaces. Code that wants to work with arbitrary containers will use the interfaces defined in this package. That will permit people to write functions that work with containers but are agnostic to the actual container type being used, much as interfaces likeio.Reader
permit code to be agnostic as the source of the data stream.iter.Iter
The core interface in the iterators package is
iter.Iter[E]
.We also define a related interface for containers, such as maps, for which elements inherently have two values.
An iterator that can fail will either return a single value that includes an error indication, or it will implement
Iter2[E, error]
. It's not yet clear which of those options is better.As mentioned above, some iterators may have additional state that may be discarded when no more values are expected from an iterator (for example, a goroutine that sends values on a channel). Telling the iterator that no more values are expected is done using an optional interface that an iterator may implement.
The
Stop
method should always be considered to be an optimization. The program should work correctly even ifStop
is never called. If an iterator is read to the end (untilNext
returnsfalse
) callingStop
should be a no-op. If necessary, iterator implementations should use finalizers to clean up cases whereStop
is not called.As a matter of programming style, the code that calls a function to obtain a
StopIter
is responsible for calling theStop
method. A function that accepts anIter
should not use a type assertion to detect and call theStop
method. This is similar to the way that a function that accepts anio.Reader
should not use a type assertion to detect and call theClose
method.iter.New functions
iter.Iter
provides a convenient way for the users of a container to iterate over its contents. We also want to consider the other side of that operation, and provide convenient ways for containers to define iterators.An appendix below discusses how these functions can be implemented efficiently.
Simpler containers may be able to easily capture all required state in a function.
iterators for standard containers
The iter package will define iterators for the builtin container types.
Functions that accept iterators
The iter package could define functions that operate on iterators. We should be conservative here to start. It's not yet clear which of these functions will be useful.
Range loops
The
for
range
syntax will be expanded to support iterators. Note that this is the only language change in this proposal. Everything else is library code and programming conventions.If the argument to
range
implementsIter[E]
orIter2[E1, E2]
, then the loop will iterate through the elements of the iterator. For example, this code:will be equivalent to this code:
Here
_ok
is a hidden variable that is not seen by user code.Note that breaking out of the loop will leave the iterator at that position, such that
Next
will return the next elements that the loop would have seen.Using
range
with anIter2[E1, E2]
will permit using two variables in thefor
statement, as withrange
over a map.Compatibility note: if the type of
it
is a slice, array, pointer-to-array, string, map, or channel type, then theNext
method will be ignored and thefor
range
will operate in the usual way. This is required for backward compatibility with existing code.Because it's inconvenient to write
for v := range c.Range()
, we propose a further extension: we permitrange c
ifc
has a methodRange
that returns a value that implementsIter[E]
orIter2[E]
. If theRange
method implementsStopIter[E]
orStopIter2[E]
then the range loop will ensure thatStop
is called when exiting the loop. (Here whether the result implementsStopIter
is a static type check, not a dynamic type assertion: if theRange
method returns the typeIter[E]
,Stop
will not be called even if the actual type has aStop
method.)For example:
where
c.Range
returns a value that implementsStopIter[E]
, is roughly equivalent to:The compiler will arrange for
_it.Stop
to be called if the loop statements panic, even if some outerdefer
in the function recovers the panic. That is, thedefer
in the roughly equivalent code is run when leaving the loop, not just when leaving the function.Note that if we adopt this change it will be the first case in which a language construct invokes a user-defined method.
That is all
That completes the proposal.
Optional future extensions
We can use optional interfaces to extend the capabilities of iterators.
For example, some iterators permit deleting an element from a container.
We could then implement
Similarly some iterators permit setting a value.
We could then implement
Bi-directional iterators can implement a
Prev
method.This is just a sketch of possible future directions. These ideas are not part of the current proposal. However, we want to deliberately leave open the possibility of defining additional optional interfaces for iterators.
Examples
This is some example code showing how to use and create iterators. If a function in this section is not mentioned above, then it is purely an example, and is not part of the proposal.
Appendix: Iterators in other languages
C++
The C++ Standard Template Library defines a variety of iterator APIs. These are consistently implemented by C++ containers and are also used by other types such as files and streams. This makes it possible to write standard algorithms that work with all C++ containers.
C++ containers provide
begin
andend
methods that return iterators. Thebegin
method returns an iterator that refers to the beginning of the container. Theend
method returns an iterator that refers to the position just past the end of the container. Iterators to the same container may be compared for equality using the==
and!=
operators. Any valid iterator (not the iterator returned byend
) refers to a value in the container. That value is accessible using the unary*
operator (which is the pointer dereference operator, thus iterators act like pointers into the container, and ordinary pointers act like iterators). The unary++
operator advances the iterator to refer to the next element in the container. For any C++ container one can loop over all elements in the container by writingAs of C++11 this pattern is built into the language via the range-based
for
loop.This calls the
begin
andend
methods of the container and loops as shown above.Some C++ iterators have optional additional capabilities. Iterators can be grouped into five types.
*p = v
), but do not permit retrieving it. Example: an iterator that writes values to a file.--
operator to move to the preceding element. Example: an iterator over a doubly linked list.<
and friends, and indexing off an iterator to refer to a value. Example: an iterator over a slice (which C++ calls a vector).C++ algorithms can use function overloading to implement the same algorithm in different ways depending on the characteristics of the iterator. For example,
std::reverse
, which reverses the elements in a container, can be implemented with a bidirectional iterator, but uses a more efficient algorithm when called with a random access iterator.C++ iterators do not provide any form of error handling. Iterators over containers typically can't fail. An iterator associated with a file handles I/O errors by setting the file object into an error state, which can optionally cause an exception to be thrown.
Each C++ container type defines rules for when a modification to the container invalidates existing iterators. For example, inserting an element in a linked list does not invalidate iterators pointing to other elements, but inserting an element in a vector may invalidate them. Using an invalid iterator is undefined behavior, which can cause the program to crash or arbitrarily misbehave.
Java
Java also supports iterators to step through containers. Java iterators are much simpler than C++ iterators. Java defines an interface
Iterator<E>
that has three main methods:hasNext
,next
, andremove
. Callingnext
in a situation wherehasNext
would returnfalse
will throw an exception, and in general Java iterators throw an exception for any error. (By the way, in C++ removing an iterator from a container is generally implemented as anerase
method on the container type that takes an iterator as an argument.)A Java container will have an
iterator
method that returns anIterator
that walks over the elements of the container. This too is described as a Java interface:Iterable<E>
.Java has a iterator using loop syntax like that of C++11 (C++ copied the syntax from Java):
This calls the
iterator
method on the container and then callshasNext
andnext
in a loop.As far as I know Java does not have a standard implementation of output iterators or random access iterators. Specific containers will implement iterators with an additional
set
method that permits changing the value to which the iterator refers.If a Java iterator is used after a container is modified in some way that the iterator can't support, the iterator methods will throw an exception.
Python
A container will implement an
__iter__
method that returns an iterator object. An iterator will implement a__next__
method that returns the next element in the container, and raises aStopIteration
exception when done. Code will normally call these methods via the builtiniter
andnext
functions.The Python
for
loop supports iterators.This calls
iter
andnext
, and handles theStopIteration
exception, as one would expect.Python iterators generally don't permit modifying the container while an iterator is being used, but it's not clear to me precisely how they behave when it happens.
Discussion
For C++ and Python, iterators are a matter of convention: any type that implements the appropriate methods can return an iterator, and an iterator itself must simply implement the appropriate methods and (for C++) operator overloads. For Java, this is less true, as iterators explicitly implement the
Iterator<E>
interface. Thefor
loop in each language just calls the appropriate methods.These conventions are powerful because they permit separating the details of an algorithm from the details of a container. As long as the container implements the iterator interface, an algorithm written in terms of iterators will work.
Iterators do not handle errors in any of these languages. This is in part because errors can be handled by throwing exceptions. But it is also because iterating over a container doesn't fail. Iteration failure is only possible when a non-container, such as a file, is accessed via the iterator interface.
It's worth noting that the C++ use of paired begin and end iterators permit a kind of sub-slicing, at least for containers that support bidirectional or random access iterators.
Appendix: Efficient implementation of
iter.NewGen
The natural way to implement
iter.NewGen
is to use a separate goroutine and a channel. However, we know from experience that that will be inefficient due to scheduling delays. A more efficient way to implementNewGen
will be to use coroutines: let the generator function produce a new value and then do a coroutine switch to the code using the iterator. When that code is ready for the next value, do a coroutine switch back to the generator function. A coroutine switch can be fast: simply change the stack pointer and reload the registers. No need to go through the scheduler.Of course Go doesn't have coroutines, but we can use compiler optimizations to achieve the same effect without any language changes. This approach, and much of the text below, is entirely due to @rsc.
First, we identify programming idioms that provide concurrency without any opportunity for parallelism, such as a send immediately followed by a receive. Second, we adjust the compiler and runtime to recognize the non-parallel idioms and optimize them to simple coroutine switches instead of using the thread-aware goroutine scheduler.
Coroutine idioms
A coroutine switch must start another goroutine and then immediately stop the current goroutine, so that there is no opportunity for parallelism.
There are three common ways to start another goroutine: a go statement creating a new goroutine, a send on a channel where a goroutine is blocked, and a close on a channel where a goroutine is blocked.
There are three common ways to immediately stop the current goroutine: a receive of one or two values (with or without comma-ok) from a channel with no available data and a return from the top of a goroutine stack, exiting the goroutine.
Optimizations
The three common goroutine starts and three common goroutine stops combine for nine possible start-stop pairs. The compiler can recognize each pair and translate each to a call to a fused runtime operation that does both together. For example a send compiles to
chansend1(c, &v)
and a receive compiles tochanrecv1(c, &v)
. A send followed by a receive can compile tochansend1recv1(c1, &v1, c2, &v2)
.The compiler fusing the operations creates the opportunity for the runtime to implement them as coroutine switches. Without the fusing, the runtime cannot tell whether the current goroutine is going to keep running on its own (in which case parallelism is warranted) or is going to stop very soon (in which case parallelism is not warranted). Fusing the operations lets the runtime correctly predict the next thing the goroutine will do.
The runtime implements each fused operation by first checking to see if the operation pair would start a new goroutine and stop the current one. If not, it falls back to running the two different operations sequentially, providing exactly the same semantics as the unfused operations. But if the operation pair does start a new goroutine and stop the current one, then the runtime can implement that as a direct switch to the new goroutine, bypassing the scheduler and any possible confusion about waking new threads (Ms) or trying to run the two goroutines in different threads for a split second.
Note that recognizing these coroutine idioms would have potential uses beyond iterators.
NewGen
Here is an implementation of
iter.NewGen
that takes advantage of this technique.The compiler would need to fuse the commented operation pairs for potential optimization by the runtime: send followed by receive (1, 2), close followed by return (3), go followed by receive (4), send followed by comma-ok receive (5), and close followed by receive (6).
Beta Was this translation helpful? Give feedback.
All reactions