iter: new package for iterators #61897
Comments
I think that's supposed to say |
In the "Pulling Values" example, I think |
Wouldn't this kind of API be preferable? Consider a case where you need to pass the next and stop functions separately to a function or struct; it might be annoying. type PullIter[T any] struct{}
func (*PullIter[T]) Stop()
func (*PullIter[T]) Next() (T, bool)
func Pull[V any](seq Seq[V]) PullIter[V] |
Based on:
the range in the |
I believe

// Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
	...
	next, stop := iter.Pull(it)
	...
}

should be

// Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
	...
	next, stop := iter.Pull(seq)
	...
} |
Meta: Most of the first post is about iterators in general. It's not obviously clear on a first reading what is actually going into the iter package. Also the formatting is a little rough. |
Given that
one way to avoid having |
I know having a separate function/method to call to get the error can lead to people forgetting to do so, but I really don't see how It's easy to ignore with It's not easy to not ignore without being awkward. Neither

var k Key
var err error
for k, err = range f {
	if err != nil {
		break
	}
	proc(k)
}
if err != nil {
	handle(err)
}

nor

for k, err := range f {
	if err != nil {
		handle(err)
		break
	}
	proc(k)
}

look especially readable to me. |
@jimmyfrasche |
I think the other way around, you should have a Seq2 → Seq[Pair[T1, T2]] mapping and make Seq the default most places. Edit: I guess you should have both mappings 1 → 2 and 2 → 1, but I also think 1 will end up being the default for things like Filter and TakeWhile etc. |
I think in most cases pull iters will be used in place, not passed around, so a pair of funcs is better than an object with methods. |
This proposal has been added to the active column of the proposals project |
xiter isn't addressing error-flavored 2-ary forms. There has also been a significant amount of prior discussion about e.g. If I thought an |
I think the method name for iterating through a sequence shouldn't be |
@djordje200179 IMO |
Although the proposal is attractive, to be honest, the Introduce tuple type for What do you think about this potential problem, @rsc?
Re: errors, the proposal says:
In practice, the .Err() pattern has led to a lot of bugs where .Err() is omitted. The most glaring example is at https://github.com/features/copilot/ (click on write_sql.go and note the lack of .Err()). I think the existing .Err() iterators in the std library should probably just stick around because they already exist, but we want a new pattern moving forward. Re: SQL, see #61637. |
Cosmetic suggestion: I like |
Change https://go.dev/cl/565935 mentions this issue: |
I can detect range functions via reflection (below is an example for a 1-value range function). But how would I range over them?
|
You can always use This works for me:

package main

import (
	"fmt"
	"iter"
	"reflect"
)

func main() {
	Range(Iterate([]int{1, 2, 3}))
}

// Range calls v, which must be a 1-value range function, passing it a
// reflect-constructed yield that prints each element and always
// continues.
func Range(v any) {
	rv := reflect.ValueOf(v)
	yt := rv.Type().In(0)
	rf := reflect.MakeFunc(yt, func(in []reflect.Value) []reflect.Value {
		fmt.Println(in[0].Interface())
		return []reflect.Value{reflect.ValueOf(true)}
	})
	rv.Call([]reflect.Value{rf})
}

// Iterate returns an iterator over the elements of s.
func Iterate[T any](s []T) iter.Seq[T] {
	return func(yield func(T) bool) {
		for _, e := range s {
			if !yield(e) {
				return
			}
		}
	}
} |
Thank you, @Merovius ! As a reusable function:

func ToSeqAny(v any) (seq iter.Seq[any], ok bool) {
	if !IsRangeFunc(v) {
		return nil, false
	}
	rv := reflect.ValueOf(v)
	yt := rv.Type().In(0)
	return func(yield func(any) bool) {
		rf := reflect.MakeFunc(yt, func(in []reflect.Value) []reflect.Value {
			return []reflect.Value{reflect.ValueOf(yield(in[0].Interface()))}
		})
		rv.Call([]reflect.Value{rf})
	}, true
}

// IsRangeFunc reports whether v is a 1-value range function, i.e. has
// the shape func(yield func(T) bool).
func IsRangeFunc(v any) bool {
	rt := reflect.TypeOf(v)
	if rt.Kind() == reflect.Func && rt.NumIn() == 1 && rt.NumOut() == 0 {
		yt := rt.In(0)
		if yt.Kind() == reflect.Func && yt.NumIn() == 1 && yt.NumOut() == 1 && yt.Out(0).Kind() == reflect.Bool {
			return true
		}
	}
	return false
} |
What is the significance of adding a |
Tracing is currently broken when using iter.Pull from the rangefunc experiment partly because the "tracing is off" fast path in traceAcquire was deemed too expensive to check (an atomic load) during the coroutine switch. This change adds trace.enabled, a non-atomic indicator of whether tracing is enabled. It doubles trace.gen, which is the source of truth on whether tracing is enabled. The semantics around trace.enabled are subtle. When tracing is enabled, we need to be careful to make sure that if gen != 0, goroutines enter the tracer on traceAcquire. This is enforced by making sure trace.enabled is published atomically with trace.gen. The STW takes care of synchronization with most Ms, but there's still sysmon and goroutines exiting syscalls. We need to synchronize with those explicitly anyway, which luckily takes care of trace.enabled as well. When tracing is disabled, it's always OK for trace.enabled to be stale, since traceAcquire will always double-check gen before proceeding. For #61897. Change-Id: I47c2a530fb5339c15e419312fbb1e22d782cd453 Reviewed-on: https://go-review.googlesource.com/c/go/+/565935 Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
This change resolves a TODO in the coroutine switch implementation (used exclusively by iter.Pull at the moment) to enable tracing. This was blocked on eliminating the atomic load in the tracer's "off" path (completed in the previous CL in this series) and the addition of new tracer events to minimize the overhead of tracing in this circumstance. This change introduces 3 new event types to support coroutine switches: GoCreateBlocked, GoSwitch, and GoSwitchDestroy. GoCreateBlocked needs to be introduced because the goroutine created for the coroutine starts out in a blocked state. There's no way to represent this in the tracer right now, so we need a new event for it. GoSwitch represents the actual coroutine switch, which conceptually consists of a GoUnblock, a GoBlock, and a GoStart event in series (unblocking the next goroutine to run, blocking the current goroutine, and then starting the next goroutine to run). GoSwitchDestroy is closely related to GoSwitch, implementing the same semantics except that GoBlock is replaced with GoDestroy. This is used when exiting the coroutine. The implementation of all this is fairly straightforward, and the trace parser simply translates GoSwitch* into the three constituent events. Because GoSwitch and GoSwitchDestroy imply a GoUnblock and a GoStart, they need to synchronize with other past and future GoStart events to create a correct partial ordering in the trace. Therefore, these events need a sequence number for the goroutine that will be unblocked and started. Also, while implementing this, I noticed that the coroutine implementation is actually buggy with respect to LockOSThread. In fact, it blatantly disregards its invariants without an explicit panic. While such a case is likely to be rare (and inefficient!) we should decide how iter.Pull behaves with respect to runtime.LockOSThread. Lastly, this change also bumps the trace version from Go 1.22 to Go 1.23. 
We're adding events that are incompatible with a Go 1.22 parser, but Go 1.22 traces are all valid Go 1.23 traces, so the newer parser supports both (and the CL otherwise updates the Go 1.22 definitions of events and such). We may want to reconsider the structure and naming of some of these packages though; it could quickly get confusing. For #61897. Change-Id: I96897a46d5852c02691cde9f957dc6c13ef4d8e7 Reviewed-on: https://go-review.googlesource.com/c/go/+/565937 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
I wonder whether

// Elements is a go1.23 iterator over the collection's elements.
func (c Collection[T]) Elements(yield func(T) bool) {
	_ = c.All(yield) // discard unwanted boolean
}

// All reports whether f is true for all elements of the collection.
func (Collection[T]) All(f func(T) bool) bool { ... }

it seems a shame to force the client to use more logic (and the machinery of a stackless coroutine) to re-materialize the discarded boolean:

all := true
for elem := range collection.Elements {
	if !f(elem) {
		all = false
		break
	}
} |
As ballast for this: this is what |
@adonovan To me, the fact this argument doesn't work for the dual

func (c Collection[T]) Elements(yield func(T) bool) {
	// well, can't be implemented in terms of Some…
}

// Alternatively: What would Some(yield) be?
// Some reports whether f is true for at least one element in the collection.
func (Collection[T]) Some(f func(T) bool) bool { ... }

Meanwhile, when considering these as functions on iterators, they have perfectly dual implementations:

func All[E any](s iter.Seq[E], f func(E) bool) bool {
	for v := range s {
		if !f(v) {
			return false
		}
	}
	return true
}

func Some[E any](s iter.Seq[E], f func(E) bool) bool {
	for v := range s {
		if f(v) {
			return true
		}
	}
	return false
} |
Some (usually called Any) isn't essential: using De Morgan's laws, you can express it as My point is that iterators can be trivially expressed as a wrapper around the slightly more general |
It's not like there are no alternatives. In JavaScript, this function is named "every". We also have an "any" function in the slices package, there it's called "ContainsFunc" (named "some" in JavaScript, by the way). It's evident that there's no consistency across languages anyway. Personally, I advocate for using the good and concise name "All" for the most common operation, which is to get a sequence of all elements. |
@adonovan Yes, but my argument wasn't that you can't express (Also, I'll note the grain of salt that |
@adonovan opened issue #66637 and it reminded me that I'm not sure whether the question of giving a specific signature to yield functions by returning a defined boolean type had been addressed. I know the idea came up a few times during the discussion but I can't recall having seen a rationale indicating that using plain Seems to me that returning a defined type (e.g. It would be less likely for a type to implement the iterable interface by mistake, for instance. Is this sensible? |
Returning a defined boolean type means that type needs to be defined somewhere. Given that the spec has to define the behavior of Add to this that I don't think a defined boolean type (different from |
Yes it would have to be defined somewhere. On the other hand I see the unassignability as the intended purpose. It makes the semantics explicit. Even for documentation purposes. It's also easy to convert since we have higher order functions/closures. |
The point of mentioning the mutual non-assignability is that it makes clear that "somewhere" has to mean "the spec". Because if we put Under the premise that we would be willing to put something like that into the spec, yes, the unassignability would become a bug, not a feature. But you should consider my argument as a whole.
That's an incorrect categorization. "Iterator" is characterized by "a function that takes [what you said]". Which is exotic enough that confusion should be rare enough, that "don't do that then" is an adequate response. |
Yes I did intend for the spec to include a There is also a point about legibility/readability, especially for newcomers, which is also reflected in tooling. For instance, the current yield signature looks like every other predicate and the bool return does not necessarily convey that it is for iteration termination. |
That's a valid point; this squishiness in the API makes me uneasy too. I can't think of a case where the ability to plop in a typical predicate unaltered as the OTOH, I also can't picture how a thing as deliberate as that could happen by accident, how a bug, caused by misuse of a predicate as |
I think this situation will be improved with the advent of generic type aliases, due to land in 1.23.
So, when #46477 lands and For the record, I don't support adding a special named |
@rogpeppe 👍🏿 that should work. (Should be good enough) |
Is there a full list of functions available? I.e. where would I find |
We propose to add a new package iter that defines helpful types for iterating over sequences. We expect these types will be used by other APIs to signal that they return iterable functions. This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted.
See also:
Note regarding push vs pull iterator types: The vast majority of the time, push iterators are more convenient to implement and to use, because setup and teardown can be done around the yield calls rather than having to implement those as separate operations and then expose them to the caller. Direct use (including with a range loop) of the push iterator requires giving up storing any data in control flow, so individual clients may occasionally want a pull iterator instead. Any such code can trivially call Pull and defer stop.
I’m unaware of any significant evidence in favor of a parallel set of pull-based APIs: instead, iterators can be defined in push form and preserved in that form by any general combination functions and then only converted to pull form as needed at call sites, once all adapters and other transformations have been applied. This avoids the need for any APIs that make cleanup (stop functions) explicit, other than Pull. Adapters that need to convert to pull form inside an iterator function can defer stop and hide that conversion from callers. See the implementation of the adapters in #TODO for examples.
Note regarding standard library: There are a few important reasons to convert the standard library:
Note that os.ReadDir and filepath.Glob do not get iterator treatment, since the sorted results imply they must collect the full slice before returning any elements of the sequence. filepath.SplitList could add an iterator form, but PATH lists are short enough that it doesn’t seem worth adding new API. A few other packages, like bufio, archive/tar, and database/sql might benefit from iterators as well, but they are not used as much, so they seem okay to leave out from the first round of changes.
The iter package would be:
/*
Package iter provides basic definitions and operations related to
iterators over sequences.
Iterators
An iterator is a function that passes successive elements of a
sequence to a callback function, conventionally named yield, stopping
either when the sequence is finished or when yield breaks the sequence
by returning false. This package defines [Seq] and [Seq2]
(pronounced like seek - the first syllable of sequence)
as shorthands for iterators that pass 1 or 2 values per sequence element
to yield:
Seq2 represents a sequence of paired values, conventionally key-value,
index-value, or value-error pairs.
Yield returns true when the iterator should continue with the next
element in the sequence, false if it should stop. The iterator returns
true if it finished the sequence, false if it stopped early at yield's
request. The iterator function's result is used when composing
iterators, such as in [Concat]:
Iterator functions are most often called by a range loop, as in:
Naming
Iterator functions and methods are named for the sequence being walked:
The iterator method on a collection type is conventionally named All,
as in the second example, because it iterates a sequence of all the
values in the collection.
When there are multiple possible iteration orders, the method name may
indicate that order:
If an iterator requires additional configuration, the constructor function
can take additional configuration arguments:
Single-Use Iterators
Most iterators provide the ability to walk an entire sequence:
when called, the iterator does any setup necessary to start the
sequence, then calls yield on successive elements of the sequence,
and then cleans up before returning. Calling the iterator again
walks the sequence again.
Some iterators break that convention, providing the ability to walk a
sequence only once. These “single-use iterators” typically report values
from a data stream that cannot be rewound to start over.
Calling the iterator again after stopping early may continue the
stream, but calling it again after the sequence is finished will yield
no values at all, immediately returning true. Doc comments for
functions or methods that return single-use iterators should document
this fact:
Errors
If iteration can fail, it is conventional to iterate value, error pairs:
Pulling Values
Functions and methods that are iterators or accept or return iterators
should use the standard yield-based function signature, to ensure
compatibility with range loops and with other iterator adapters.
The standard iterators can be thought of as “push iterators”, which push values to the yield function.
Sometimes a range loop is not the most natural way to consume values
of the sequence. In this case, [Pull] converts a standard push iterator
to a “pull iterator”, which can be called to pull one value at a time
from the sequence. [Pull] starts an iterator and returns a pair
of functions next and stop, which return the next value from the iterator
and stop it, respectively.
For example:
Clients must call stop if they do not read the sequence to completion,
so that the iterator function can be allowed to finish. As shown in
the example, the conventional way to ensure this is to use defer.
Other Packages
Many packages in the standard library provide iterator-based APIs. Here are some notable examples.
Mutation
Iterators only provide the values of the sequence, not any direct way
to modify it. If an iterator wishes to provide a mechanism for modifying
a sequence during iteration, the usual method is to define a position type
with the extra operations and then provide an iterator over positions.
For example, a tree implementation might provide:
And then a client could delete boring values from the tree using:
*/
package iter
Note: If and when generic type aliases are implemented (#46477), we might also want to add
type Yield[V any] = func(V) bool
and type Yield2[K, V any] = func(K, V) bool. That way, code writing a function signature to implement Seq or Seq2 can write the argument as yield iter.Yield[V].