iter: new package for iterators #61897
Comments
I think that's supposed to say |
|
In the "Pulling Values" example, I think |
|
Wouldn't this kind of API be preferable? Consider a case where you need to pass the next and stop functions separately to a function or struct; that might be annoying.
type PullIter[T any] struct{}
func (*PullIter[T]) Stop()
func (*PullIter[T]) Next() (T, bool)
func Pull[V any](seq Seq[V]) PullIter[V] |
|
Based on: the range in the |
|
I believe
// Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
	...
	next, stop := iter.Pull(it)
	...
}
should be
// Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
	...
	next, stop := iter.Pull(seq)
	...
} |
|
Meta: Most of the first post is about iterators in general. It's not clear on a first reading what is actually going into the iter package. Also, the formatting is a little rough. |
|
Given that
one way to avoid having |
|
I know having a separate function/method to call to get the error can lead to people forgetting to do so, but I really don't see how It's easy to ignore with It's not easy to not ignore without being awkward. Neither
var k Key
var err error
for k, err = range f {
	if err != nil {
		break
	}
	proc(k)
}
if err != nil {
	handle(err)
}
nor
for k, err := range f {
	if err != nil {
		handle(err)
		break
	}
	proc(k)
}
look especially readable to me. |
|
@jimmyfrasche |
I think the other way around, you should have a Seq2 → Seq[Pair[T1, T2]] mapping and make Seq the default most places. Edit: I guess you should have both mappings 1 → 2 and 2 → 1, but I also think 1 will end up being the default for things like Filter and TakeWhile etc. |
I think in most cases pull iters will be used in place, not passed around, so a pair of funcs is better than an object with methods. |
|
This proposal has been added to the active column of the proposals project |
|
xiter isn't addressing error-flavored 2-ary forms. There has also been a significant amount of prior discussion about e.g. If I thought an |
|
I think that method name for iterating through sequence shouldn't be |
|
@djordje200179 IMO |
|
Although the proposal is attractive, but to be honest, the Introduce tuple type for What do you think about this potential problem, @rsc. |
|
Re: errors, the proposal says:
In practice, the .Err() pattern has led to a lot of bugs where .Err() is omitted. The most glaring example is at https://github.com/features/copilot/ (click on write_sql.go and note the lack of .Err()). I think the existing .Err() iterators in the std library should probably just stick around because they already exist, but we want a new pattern moving forward. Re: SQL, see #61637. |
|
Cosmetic suggestion: I like |
|
@Splizard Apologies. You asked whether the idea of using channels for iteration has been considered. That gave me the impression that you are unaware of the prior discussions going back more than a decade, which came to the conclusion that the approach is not viable. |
|
I would never use a channel as an iterator unless I want some kind of concurrency / parallelism. A channel is a concurrency primitive in Go. It would never be my first thought saying, ah channels are perfect for this. No, they are not. They look nice, but that's not their purpose unless I really want to do some parallel processing of some data somewhere. I would definitely use it then. For me, an iterator is something to loop over some container's values conveniently and in an optimized way where it can lazy load some values or stream them. This is my 2cents. |
|
This is not the right issue for discussing adding iterators to Go. This is about adding a package in support of iterators. The right issue for discussing adding iterators to Go was #61405, which has now been closed as accepted. I understand wanting to weigh in on what is a big change to the language, but #61405 literally opened 11 months ago and built on prior discussions, so it's quite late to chime in now. The idea of improving channels to optimize into working as coroutines when separate scheduling isn't needed has been discussed already and even sort of implemented in the internal coro package. |
In addition to what @Merovius said above, there is also a good summary by Russ in #61898 (comment) that runs through problems with some alternatives to Regarding whether or not Separately, regarding just eliminating
and in #61405 (comment):
In any event, in addition to #61405 ("range over func") already being accepted, this |
Yes, this is why I am commenting here, that @Skarlso and that's fine, you would be thinking about using a
No, that was a spec change to add range over int and range over func. All I'm really saying, is that range-over-func seems redundant if Go efficiently supports channels as shown in my examples. It only appears to have been added to support this package (which seeks to clearly define the standard representation for iterators in Go).
The previously discussed ideas you are referring to are valuable but completely different from the optimisations I have been suggesting. I'm starting to believe what I have raised may very well be "new information" @thepudds. @robaho |
It's not, the final comment period on this proposal closed on Feb 15. |
|
Including errors in the type is awkward in my opinion. Even union types are messy. I think that is why it was proposed here, in #61405 (comment), to add an error return function.
Java has exceptions. Go needs to figure out a way to return the error during iterator failure.
The alternative is to expose and use the Iterator instance, like:
itr := service.Iterator(… some params...)
for v := range itr {
}
if err := itr.error(); err != nil {
	…
}
I think most of this has been discussed before. I confess that I haven’t been following it too closely, but I re-read #6104 and I am not sure how errors encountered by the generating function can be handled in the user, or even detected (simply anyway).
edit: I see that it was proposed to use the Seq2 interface to return value & error for iterators that can fail. This seems reasonable to me.
|
With respect, citation needed. On the consumer side we see a simple for/range statement. On the producer side, even for relatively simple containers like binary trees, we see a goroutine that calls a deeply recursive set of interleaved function calls. The compiler would have to be able to prove that the channel is used in a way that is amenable to a transformation. It would have to ensure that the right thing happens when there is a In general, such an optimization would be subject to unfortunate performance cliffs: a small change in the code, or a new minor release of the compiler, might drastically change the performance of a loop, for better or for worse. That is an unintuitive model for programmers where performance matters, and leads to cargo culting and mental complexity that Go strives to avoid where possible. |
|
@ianlancetaylor I appreciate your comment. There are theoretical approaches to this and then there are practical ones, I don't think it is necessary for Considering Go's goals for stable predictable performance, I think a sensible implementation of this optimization for My assumption, would be that the same mechanism being used to handle panics appropriately for range-over-func would be applicable here. I also anticipate any other existing work towards the implementation of range-over-func should be applicable here, as this means re-using |
That is so absolutely not how any of this works. You are fully aware that #61405 was about how we want canonical iteration to work, that the discussion happened under awareness of the need of a supporting package and that you are now trying to revert the decision of how canonical iteration would work. For example, here is a quote from you:
You are arguing in bad faith, trying to derail the process with word games. Please stop that.
It is not, and you've been told so repeatedly. Take the hint. |
|
@thepudds is absolutely correct in #61897 (comment) that we need to prepare more documentation about this feature than we have today. I have a pending CL 591096 to add most of the top text in this issue to the package iter doc comment, and we can add links to blog posts and other introductory docs as they are written. To restate what others have, this issue is about package iter, not the range-over-func feature, and it is not the place to discuss range-over-func. As others have noted, range-over-func was discussed at length here on GitHub twice:
There are differences of opinion, absolutely, but we made a decision, and there does not appear to be new information here that would prompt reconsideration of that decision. That is, the arguments being made were all made before and incorporated into that decision. (See the "Reconsideration" section of John Ousterhout's Open Decision-Making.) I apologize for not engaging in the details here. As much as I enjoy discussing the tradeoffs here (really, I do), that decision is done, and we need to move forward. I have been focused on @gabyhelp and haven't gotten a chance to finish CL 591096, but I will do that now, and when that's submitted, this issue will be closed. |
|
I'd like to comment on one effect of For example, I am currently experimenting with an iterator-focused API for bezier paths and transformations on them. Consider the current API: which leads to usage like or An alternative API I've considered would use a named iterator type and methods, as follows: for the following usage: But a significant downside of this approach is that it requires explicit conversion from I'm not sure, however, if this pattern is a good idea to begin with. It works well for this specific use-case, but would fall apart the moment we'd need generic methods, e.g. for a |
I tend to agree that |
That would not be a compatible change: https://go.dev/play/p/x9MR1A1suez I too am curious as to the pros and cons of Seq being named, not merely an alias. I suppose it means we can add dozens of convenience methods to it later, but I'm not sure whether that's a pro or a con. ;-) |
|
@adonovan Sorry but your example seems to demonstrate the wrong thing? It seems to demonstrate that it's breaking moving from an alias to a named type, but not vice-versa, no? |
|
I've tried to explain why some people are "angry" over this design being too complicated in an article, but I think the short of it is that it feels like it goes against the apparent philosophy of Go that many people believe in, coupled with it being a very functional way of doing things rather than imperative. And because of those two reasons, I think that is why people don't like the iterator stuff, even if I completely understand the design choices made. It doesn't "feel" like what Go originally was to many people. If it was me, I would just not have allowed custom iterators into Go whatsoever, but I am not on the Go team (nor do I want to be). |
|
@gingerBill If you want to grasp at the value of push vs. pull iterators you should try to rewrite something like this as a pull iterator. I don't see the range-over-func statement as leaning functional, in the imperative-vs-functional paradigm (unlike the issue we are commenting on, which is not about range-over-func and is, in fact, about functional programming). If anything it enables more imperative style programming, if you wanted to call functions you'd call functions, the syntactic sugar is there to let you call break, continue and early-return, which are hardly functional programming constructs. |
|
My goal of asking this issue to be re-opened for comments wasn't to re-start the discussion on the merits of adding range-over-func. It was to be able to discuss the actual design issues as related to this package, like the above mentioned question about aliases. Please respect that, before the issue has to be locked again. |
Ah, I thought you were proposing to land an emergency change from named to alias before the imminent go1.23 release so that the change could be reverted later if desired. My example was evidence that that later change would be breaking. But it's a breaking change either way. For example, |
|
The Go 1 guarantee isn't a legally binding contract, and e.g. the text/template/parse has broken it. Could we just caveat the 1.23 release notes with a big asterisk saying we reserve the right to switch to an alias in 1.24? Generics came with an asterisk, although the asterisk was never used. |
|
The interesting direction for backwards compatibility is to think about is Type assertions like Type switches additional could stop compiling if someone tried to do both: These risks are not 0. They do not seem enormous either. I don't know why one would mix We could write a vet check to discourage usage that might not be backwards compatible in 1.23, but I suspect it would not help all that often or beyond 1.24. (The vet check might require an asterisk in the 1.23 spec to be justified too.) A not very ergonomic solution is to ship iter without Seq and Seq2 and to expand the types until type parameterized aliases are available. It solves backwards compatibility, but would probably make this package much harder to learn and more annoying to use. |
|
@timothy-king 1. Note that you don't have to do both in a type-switch for the breakage to happen:
package main

import "maps"

func F(x any) {
	// This assertion fails while iter.Seq is a named type and succeeds once it is an alias.
	if _, ok := x.(func(func(int) bool)); ok {
		panic("b0rk")
	}
}

func main() {
	F(maps.Keys(map[int]int{}))
}
FWIW I'm not convinced this really needs a fix. I don't, ultimately, see more reason to make So I don't really see any real argument to even make |
The advantage of making iter.Seq/2 an alias is pretty clear: if package A defines |
|
@earthboundkid My point is that nothing of what you say seems specific to And I'll note that I did provide a specific reason for why methods on
I'll note that 1. this argument cuts both ways: Making it an alias will prevent us from adding methods to
It is, in my opinion, okay that if you want to define a different type to give it other methods, you need to convert back-and-forth. |
We propose to add a new package iter that defines helpful types for iterating over sequences. We expect these types will be used by other APIs to signal that they return iterable functions.
This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted.
See also:
Note regarding push vs pull iterator types: The vast majority of the time, push iterators are more convenient to implement and to use, because setup and teardown can be done around the yield calls rather than having to implement those as separate operations and then expose them to the caller. Direct use (including with a range loop) of the push iterator requires giving up storing any data in control flow, so individual clients may occasionally want a pull iterator instead. Any such code can trivially call Pull and defer stop.
I’m unaware of any significant evidence in favor of a parallel set of pull-based APIs: instead, iterators can be defined in push form and preserved in that form by any general combination functions and then only converted to pull form as needed at call sites, once all adapters and other transformations have been applied. This avoids the need for any APIs that make cleanup (stop functions) explicit, other than Pull. Adapters that need to convert to pull form inside an iterator function can defer stop and hide that conversion from callers. See the implementation of the adapters in #TODO for examples.
Note regarding standard library: There are a few important reasons to convert the standard library:
Note that os.ReadDir and filepath.Glob do not get iterator treatment, since the sorted results imply they must collect the full slice before returning any elements of the sequence. filepath.SplitList could add an iterator form, but PATH lists are short enough that it doesn’t seem worth adding new API. A few other packages, like bufio, archive/tar, and database/sql might benefit from iterators as well, but they are not used as much, so they seem okay to leave out from the first round of changes.
The iter package would be:
/*
Package iter provides basic definitions and operations related to
iterators over sequences.
Iterators
An iterator is a function that passes successive elements of a
sequence to a callback function, conventionally named yield, stopping
either when the sequence is finished or when yield breaks the sequence
by returning false. This package defines [Seq] and [Seq2]
(pronounced like seek - the first syllable of sequence)
as shorthands for iterators that pass 1 or 2 values per sequence element
to yield:
Seq2 represents a sequence of paired values, conventionally key-value,
index-value, or value-error pairs.
Yield returns true when the iterator should continue with the next
element in the sequence, false if it should stop. The iterator returns
true if it finished the sequence, false if it stopped early at yield's
request. The iterator function's result is used when composing
iterators, such as in [Concat]:
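To make the role of the iterator's result concrete, here is a minimal, self-contained sketch. It declares a local Seq and Concat that follow the description above (an iterator function with a bool result); these are illustrative stand-ins rather than the package source, and fromSlice and collect are hypothetical helpers.

```go
package main

import "fmt"

// Seq is a local stand-in matching the description above: the iterator
// returns true if it finished, false if yield asked it to stop early.
type Seq[V any] func(yield func(V) bool) bool

// fromSlice is a hypothetical helper that iterates over the elements of s.
func fromSlice[V any](s []V) Seq[V] {
	return func(yield func(V) bool) bool {
		for _, v := range s {
			if !yield(v) {
				return false // stopped early at yield's request
			}
		}
		return true // finished the sequence
	}
}

// Concat returns an iterator over the concatenation of seqs. It uses each
// iterator's result to decide whether to move on to the next one.
func Concat[V any](seqs ...Seq[V]) Seq[V] {
	return func(yield func(V) bool) bool {
		for _, seq := range seqs {
			if !seq(yield) {
				return false
			}
		}
		return true
	}
}

// collect calls seq directly and gathers at most limit values.
func collect[V any](seq Seq[V], limit int) (out []V, finished bool) {
	finished = seq(func(v V) bool {
		out = append(out, v)
		return len(out) < limit
	})
	return out, finished
}

func main() {
	seq := Concat(fromSlice([]int{1, 2}), fromSlice([]int{3, 4}))
	fmt.Println(collect(seq, 10)) // walks all four values
	fmt.Println(collect(seq, 3))  // stops early inside the second sequence
}
```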
Iterator functions are most often called by a range loop, as in:
Naming
Iterator functions and methods are named for the sequence being walked:
The iterator method on a collection type is conventionally named All,
as in the second example, because it iterates a sequence of all the
values in the collection.
When there are multiple possible iteration orders, the method name may
indicate that order:
If an iterator requires additional configuration, the constructor function
can take additional configuration arguments:
Single-Use Iterators
Most iterators provide the ability to walk an entire sequence:
when called, the iterator does any setup necessary to start the
sequence, then calls yield on successive elements of the sequence,
and then cleans up before returning. Calling the iterator again
walks the sequence again.
Some iterators break that convention, providing the ability to walk a
sequence only once. These “single-use iterators” typically report values
from a data stream that cannot be rewound to start over.
Calling the iterator again after stopping early may continue the
stream, but calling it again after the sequence is finished will yield
no values at all, immediately returning true. Doc comments for
functions or methods that return single-use iterators should document
this fact:
Errors
If iteration can fail, it is conventional to iterate value, error pairs:
Pulling Values
Functions and methods that are iterators or accept or return iterators
should use the standard yield-based function signature, to ensure
compatibility with range loops and with other iterator adapters.
The standard iterators can be thought of as “push iterators”, which
push values to the yield function.
Sometimes a range loop is not the most natural way to consume values
of the sequence. In this case, [Pull] converts a standard push iterator
to a “pull iterator”, which can be called to pull one value at a time
from the sequence. [Pull] starts an iterator and returns a pair
of functions next and stop, which return the next value from the iterator
and stop it, respectively.
For example:
Clients must call stop if they do not read the sequence to completion,
so that the iterator function can be allowed to finish. As shown in
the example, the conventional way to ensure this is to use defer.
Other Packages
Many packages in the standard library provide iterator-based APIs. Here are some notable examples.
Mutation
Iterators only provide the values of the sequence, not any direct way
to modify it. If an iterator wishes to provide a mechanism for modifying
a sequence during iteration, the usual method is to define a position type
with the extra operations and then provide an iterator over positions.
For example, a tree implementation might provide:
And then a client could delete boring values from the tree using:
*/
package iter
Note: If and when generic type aliases are implemented (#46477), we might also want to add type Yield[V any] = func(V) bool and type Yield2[K, V any] = func(K, V) bool. That way, code writing a function signature to implement Seq or Seq2 can write the argument as yield iter.Yield[V].