proposal: x/exp/xiter: new package with iterator adapters #61898
Comments
The duplication of each function is the first thing that catches the eye. Are there thoughts on why this is acceptable?
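For context, the "duplication" here is that each adapter needs both an iter.Seq form and an iter.Seq2 form. A rough sketch of that shape, with signatures paraphrased for illustration rather than quoted from the proposal:

```go
package xitersketch

import "iter"

// Map returns a sequence of f applied to each value of seq.
func Map[In, Out any](f func(In) Out, seq iter.Seq[In]) iter.Seq[Out] {
    return func(yield func(Out) bool) {
        for v := range seq {
            if !yield(f(v)) {
                return
            }
        }
    }
}

// Map2 is the same adapter written out again for two-value sequences,
// since Go cannot abstract over the arity of iter.Seq vs. iter.Seq2.
func Map2[KIn, VIn, KOut, VOut any](f func(KIn, VIn) (KOut, VOut), seq iter.Seq2[KIn, VIn]) iter.Seq2[KOut, VOut] {
    return func(yield func(KOut, VOut) bool) {
        for k, v := range seq {
            if !yield(f(k, v)) {
                return
            }
        }
    }
}
```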
What about an adapter that converts an

Some typos: EqualFunc2, Map2, Merge2, and MergeFunc2 lack the 2 suffixes on their actual names. They're all correct in the corresponding documentation.

May I humbly suggest that the name "iterutils" is less susceptible to, uh, unfortunate mispronunciation.

For

I'd actually prefer
Edit: I just realized that if

This proposal has been added to the active column of the proposals project
The more I think about it, the more I think that API design for this should wait until after a decision is made on #49085. Multiple other languages have proven over and over that a left-to-right chained syntax is vastly superior ergonomically to simple top-level functions for iterators. For example, compare

```go
nonNegative := xiter.Filter(
    xiter.Map(
        bufio.Lines(r),
        parseLine,
    ),
    func(v int) bool { return v >= 0 },
)
```

vs.

```go
nonNegative := bufio.Lines(r).
    Map(parseLine).
    Filter(func(v int) bool { return v >= 0 })
```

Go's a little weird because of the need to put the

```go
lines := bufio.Lines(r)
intlines := xiter.Map(lines, parseLine)
nonNegative := xiter.Filter(intlines, func(v int) bool { return v >= 0 })
```

That works, but it clutters up the local namespace and it's significantly harder to edit. For example, if you decide you need to add a new step in the chain, you have to make sure that all of the variables for each iterator match up in the previous and succeeding calls.
What type does

You would probably have to wrap the base iterator like:

Sorry. I should have stuck a comment in. I was just coming up with some hypothetical function that would give an

Not necessarily. The transformative and sink functions on iterators could just be defined as methods on

I was wrong, it’s not an interface.
Why do some functions take the

```go
names := xiter.Map(func (p Person) string {
    return p.Name
}, people) // "people" gets lost

// vs

names := xiter.Map(people, func (p Person) string {
    return p.Name
})
```
@DeedleFake There won't be a "decision" on #49085 anytime soon. There are good reasons not to do it yet, but we also don't want to say it never happens. The issue exists to reflect that state. What it comes down to is, would you rather have no iterators (for the foreseeable future) or ones which can't be "chained"?

No iterators, definitely. I've done fine without them for over a decade. I can wait a bit longer. If a bad implementation goes in, I'll never get a good version. Plus, I can just write my own implementation of whatever iterator functions I need as long as

Neither chaining nor functional programming has ever been a defining or recommended technique in Go. Instead, iteration—specifically, procedural 'for' loops—has always been a core technique since the language's inception. The iterator proposals aim to enhance this core approach. While I don't know what the overall plans are, if you're hoping for Go to follow the path of Java Streams or C# LINQ, you might be in for disappointment.

I think "a bit" is misleading. We are talking years - if at all. And I don't believe the second part of that sentence is true either; we could always release a v2 of the relevant packages, if we ever manage to do #49085 in a decade or so.
Is that not the intention of these proposals? To build a standardized iterator system that works similarly to those? Why else is there a proposal here for

Edit: The way this proposal is phrased does actually imply that they may be heavily reevaluated enough in

That issue has only been open for 2 years. I think assuming that it'll take a decade to solve is a bit unfair. Yes, a

One of my favorite things about Go is how slow and methodical it (usually) is in introducing new features. I think that the fact that it took over a decade to add generics is a good thing, and I really wanted generics. One of the purposes of that approach is to try to avoid having to fix it later. Adding those functions in the proposed manner will almost definitely necessitate that later fix, and I very much would like to avoid that if at all possible.
Java Streams and .NET LINQ build on a standardized iterator system, but they are more than that. Both languages had a generic iterator system before. Iterators are useful without chaining or functional programming.
That would be this very proposal, and it comes with a caveat: "... or perhaps not. There are concerns about how these would affect idiomatic Go code." This means that not everyone who has read these proposals in advance believes that this part is a good idea.

Maybe chaining leads to too much of a good thing. It becomes more tempting to write long, hard-to-read chains of functions. You're less likely to do that if you have to nest calls. As an analogy, Go has
Re #49085, generic methods either require (A) dynamic code generation or (B) terrible speed or (C) hiding those methods from dynamic interface checks or (D) not doing them at all. We have chosen option (D). The issue remains open like so many suggestions people have made, but I don't see a realistic path forward where we choose A, B, or C, nor do I see a fifth option. So it makes sense to assume generic methods are not going to happen and do our work accordingly.

@DeedleFake The issue is not a lack of understanding of what a lack of parameterized methods means. It's just that, as @rsc said, wanting them doesn't make them feasible. The issue only being 2 years old is deceptive. The underlying problem is actually as old as Go and one of the main reasons we didn't have generics for most of that time. Which you should consider, when you say
We got generics by committing to keep implementation strategies open, thus avoiding the generics dilemma. Not having parametric methods is a pretty direct consequence of that decision. |
Well, I tried. If that's the decision then that's the decision. I'm disappointed, but I guess I'll just be satisfied with what I do like about the current proposal, even if it has, in my opinion, some fairly major problems. Sorry for dragging this a bit off-topic there.

Hope that it's not noise: I wondered if naming it the
Those nonstandard Zip definitions look like they would occasionally be useful, but I think I'd want the ordinary zip/zipLongest definitions most of the time. Those can be recovered from the proposed with some postprocessing, but I'd hate to have to always do that.

These should be considered along with Limit:

- LimitFunc - stop iterating after a predicate matches (often called TakeWhile in other languages)
- Skip, SkipFunc - drop the first n items (or until the predicate matches) before yielding (opposite of Limit/LimitFunc, often called drop/dropWhile)
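A rough sketch of what the LimitFunc and Skip adapters mentioned above might look like on top of the standard iter.Seq; the names and signatures follow the comment, not the proposal:

```go
package xitersketch

import "iter"

// LimitFunc yields values from seq until f reports a match, then stops.
func LimitFunc[V any](seq iter.Seq[V], f func(V) bool) iter.Seq[V] {
    return func(yield func(V) bool) {
        for v := range seq {
            if f(v) {
                return
            }
            if !yield(v) {
                return
            }
        }
    }
}

// Skip drops the first n values of seq and yields the rest.
func Skip[V any](seq iter.Seq[V], n int) iter.Seq[V] {
    return func(yield func(V) bool) {
        skipped := 0
        for v := range seq {
            if skipped < n {
                skipped++
                continue
            }
            if !yield(v) {
                return
            }
        }
    }
}
```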
Can you explain the difference? Is it just that

zip stops after the shorter sequence. zipLongest pads out the missing values of the shorter sequence with a specified value. The provided ones are more general and can be used to build those, but I can't really think of any time I've used zip where I needed to know that. I've always either known the lengths were equal by construction, so it didn't matter, or couldn't do anything other than drop the excess, so it didn't matter. Maybe that's peculiar to me and the situations in which I reach for zip, but they've been defined like that in every language I can think of that I've used, which has to be some kind of indicator that I'm not alone in this. I'm not arguing for them to be replaced with the less general, more common versions: I want those versions here too so I can use them directly without having to write a shim to the standard definition.
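For illustration, a minimal sketch of the ordinary "stop at the shorter sequence" zip described above, built with iter.Pull (assuming the standard iter package is imported; ZipShortest is an illustrative name, not something from the proposal):

```go
// ZipShortest pairs up values from a and b and stops as soon as either
// sequence runs out, matching the conventional zip semantics.
func ZipShortest[A, B any](a iter.Seq[A], b iter.Seq[B]) iter.Seq2[A, B] {
    return func(yield func(A, B) bool) {
        nextB, stop := iter.Pull(b)
        defer stop()
        for va := range a {
            vb, ok := nextB()
            if !ok {
                return
            }
            if !yield(va, vb) {
                return
            }
        }
    }
}
```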
Not offhand, no, though I'll observe that named types defined in terms of function signatures are rare but not unheard-of; e.g., WalkDirFunc.

(Tangent.) Still trying to develop some intuition for when a type parameter is needed vs. when it's OK to require a specific type in terms of other type parameters. Example, from the slices package:

```go
func Equal[S ~[]E, E comparable](s1, s2 S) bool
```

Why is this not:

```go
func Equal[E comparable](s1, s2 []E) bool
```

? For that matter: is there some convention that dictates the order of the type parameters? For something like

I should have added: there is also the example of WalkDir, a function defined in terms of that function type.
@bobg Yes,
That's why I mentioned two conditions that have to happen at the same time:

1. using a type defined based on that signature, and
2. trying to use that function type as an argument in a higher-order function.

2 is where type identity matters, not just assignability.
Since 1.23 came out, I've been going through a number of my projects and inserting iterators where it makes sense to. I've been able to eliminate quite a few intermediary slice allocations, which is quite nice, and I've also come to realize a few things about the iterator adapters. I have my own iterator adapter package that I built back when the first prototype came out to play around with, and now that I'm using them more in real code I've found that they're significantly less useful than I expected them to be, simply because the syntax is too clunky. For certain small things, they can be useful, such as something like

```go
newtitles := xiter.Map(
    func(data Data) string { return data.Title },
    xiter.Filter(
        func(data Data) bool { return data.Time.After(threshold) },
        dataSource,
    ),
)
```

Instead, what I've found is that the far better approach in most situations is to take advantage of the fact that iterators are just functions and write an intermediary one as a closure. That reduces the type signature overhead significantly:

```go
newtitles := func(yield func(string) bool) {
    for data := range dataSource {
        if data.Time.Before(threshold) {
            continue
        }
        if !yield(data.Title) {
            return
        }
    }
}
```

It's a few extra lines, but it's a lot more flexible and a lot more readable. I think improving the ergonomics of the adapter functions would require at the very least #21498, and possibly some variant of #49085 or some other way to write the adapter pipeline top-to-bottom instead of inside out.
@DeedleFake It makes a big difference when you use intermediate variables instead of nesting expressions the way you have. It also improves things when any callback is the last argument to the function that needs it. I also have a version of the xiter proposal implemented, here, but with callbacks last (plus other useful additions). Using it, your first example could be written as:

```go
var (
    newItems  = seqs.Filter(dataSource, func(data Data) bool { return data.Time.After(threshold) })
    newTitles = seqs.Map(newItems, func(data Data) string { return data.Title })
)
```

which seems plenty readable to me.
My implementation actually does have callbacks last, as that's the way that I tend to think of the flow going: the thing you're operating on and then the config for the action. It's like a receiver, in a way. But the upside to putting them first is that when nesting them, the function is next to the iterator operation. For example, if you try to nest the calls in your example, it'll wind up being

```go
newtitles := xiter.Map(
    xiter.Filter(
        dataSource,
        func(data Data) bool { return data.Time.After(threshold) }, // Filter
    ),
    func(data Data) string { return data.Title }, // Map
)
```

You wind up having to read them in a spiral. By putting them the other way around, you can just read it backwards. It's weird, but it keeps things in the right order.

However, all of that being said, even if you store each step in an intermediate variable, there's still a lot of clutter from the types in the function signatures, as well as needing to come up with names for the intermediate variables. For that particular example it's not a big deal, but with a more complicated chain you could have four or five steps to it. Having to name all of those is a lot of cognitive overhead and local namespace pollution that's completely avoided by using the closure approach.
Haha, that's really well-put. I see your point about putting the callback first, and about the overhead of naming intermediate steps. But that overhead is mainly on the program author, not the reader, and Go favors the reader at the expense of a little more work for the author. As a reader I'd certainly rather encounter something like this:

```go
var (
    ints             = seqs.Ints(1, 1)
    primes           = seqs.Filter(ints, isPrime)
    enumeratedPrimes = seqs.Enumerate(primes)
    superPrimePairs  = seqs.Filter2(enumeratedPrimes, func(pos, _ int) bool { return isPrime(pos+1) })
    superPrimes      = seqs.Right(superPrimePairs)
)
```

than whatever the nested version of that would be (which I concede might not be quite so horrifying if we had lightweight anonymous functions, as you pointed out above).
The iter package could use an efficient way to answer questions about the length of a sequence. A

I propose:

```go
package iter

// Empty reports whether the sequence is empty.
func Empty[T any](seq Seq[T]) bool {
    return !LongerThan(seq, 0)
}

// LongerThan reports whether the sequence seq has more than n elements.
func LongerThan[T any](seq Seq[T], n int) bool {
    i := 0
    for range seq {
        i++
        if i > n {
            return true
        }
    }
    return false
}
```
What I've been doing when someone wants a size is to just pass it alongside:

```go
func Example[T any](values iter.Seq[T], numvalues int)
```

For emptiness checking, it might make sense to do
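A short, hedged illustration of that pass-the-size-alongside pattern; Example here stands in for the hypothetical function above, and slices.Values is the standard library helper that adapts a slice to an iter.Seq:

```go
package main

import (
    "fmt"
    "iter"
    "slices"
)

// Example consumes a sequence whose element count the caller already knows
// and passes along, rather than asking the sequence for its length.
func Example[T any](values iter.Seq[T], numvalues int) {
    fmt.Println("expecting", numvalues, "values")
    for v := range values {
        fmt.Println(v)
    }
}

func main() {
    s := []string{"a", "b", "c"}
    Example(slices.Values(s), len(s))
}
```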
@adonovan An O(n)
While playing around with iterators I found the following helpful for constructing higher-order iterators, which often need to prime the pump:

```go
func Head[T any](seq iter.Seq[T]) (head T, ok bool, tail iter.Seq[T])
```

To get that to work you need to Pull the seq but then undo that to convert it back to a push iter. You'd need the same for replaying the sequence like @DeedleFake and @josharian mentioned. So maybe in addition to Pull there needs to be a:

```go
func Push[T any](next func() (T, bool), stop func()) iter.Seq[T]
```
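A minimal sketch of what such a Push could look like, assuming a next/stop pair as returned by iter.Pull; this illustrates the idea and is not code from the proposal:

```go
// Push converts a pull-style iterator (next/stop) back into a push-style
// iter.Seq. stop is deferred inside the returned sequence, so it only runs
// if the sequence is actually consumed.
func Push[T any](next func() (T, bool), stop func()) iter.Seq[T] {
    return func(yield func(T) bool) {
        defer stop()
        for {
            v, ok := next()
            if !ok || !yield(v) {
                return
            }
        }
    }
}
```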
I hadn't written
The issue with something like

```go
func Empty[T any](seq iter.Seq[T]) (iter.Seq[T], bool) {
    next, stop := iter.Pull(seq)
    first, ok := next()
    if !ok {
        return func(func(T) bool) {}, false
    }

    return func(yield func(T) bool) {
        defer stop() // This won't happen unless this returned iterator is used.

        if !yield(first) {
            return
        }

        for {
            v, ok := next()
            if !ok || !yield(v) {
                return
            }
        }
    }, true
}
```

I'm also not sure that a
It would leak if unused and the basic implementation is simple. The leaking issue might be ameliorated with a finalizer/cleanup, though that makes the implementation more complex. It might also be possible for the runtime and/or compiler to do some tricks to avoid the coroutine when possible, though that may require properties of the iterator being pull/push'd that may not be easy to know.
I wonder if there isn't some way to thread through the yield functions so you don't need to use

```go
func Head[T any](seq iter.Seq[T]) (iter.Seq[T], func(), T, bool) {
    next, stop := iter.Pull(seq)
    first, ok := next()
    if !ok {
        return func(func(T) bool) {}, stop, first, false
    }

    return func(yield func(T) bool) {
        for {
            v, ok := next()
            if !ok || !yield(v) {
                return
            }
        }
    }, stop, first, true
}
```
IMHO the adaptors shouldn't return all those values, they should:

Otherwise we should return the sequences also for

Still, some adaptors I think might be useful are:

```go
// Or All, but in Go All is used to create a sequence over all elements
func Every[V any](iter.Seq[V], func(V) bool) bool

// it covers more cases than `First` or `Head`
// the use as `First`: Find(seq, func(_ V) bool { return true })
func Find[V any](iter.Seq[V], func(V) bool) (V, bool)

// Or Any, but in Go any is an interface
// use as `Empty`: !Some(seq, func(_ V) bool { return true })
func Some[V any](iter.Seq[V], func(V) bool) bool

// the following I have used rarely (or never), but might be useful
func Count[V any](iter.Seq[V]) int // better than Len I think

// handy if you consume the seq and want to keep track where you are
func Enumerate[V any](iter.Seq[V]) iter.Seq2[int, V]
```
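For concreteness, a hedged sketch of how the Every, Find, and Some shapes above could be implemented; the signatures follow the comment (with parameter names added), not the proposal:

```go
// Every reports whether f returns true for every value of seq.
func Every[V any](seq iter.Seq[V], f func(V) bool) bool {
    for v := range seq {
        if !f(v) {
            return false
        }
    }
    return true
}

// Find returns the first value of seq for which f returns true.
func Find[V any](seq iter.Seq[V], f func(V) bool) (V, bool) {
    for v := range seq {
        if f(v) {
            return v, true
        }
    }
    var zero V
    return zero, false
}

// Some reports whether f returns true for at least one value of seq.
func Some[V any](seq iter.Seq[V], f func(V) bool) bool {
    for v := range seq {
        if f(v) {
            return true
        }
    }
    return false
}
```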
@isgj I don't understand what you mean. Limit and Map don't return values, they return sequences. Can you demonstrate the changes you have in mind using the Map function as a simple example?

@isgj Sorry, I guess your sentence was a response to @earthboundkid's comment, not to the proposed design.
I agree that something like this would be useful, but naming it

Instead, I would suggest the following:

```go
func Head[E any](seq iter.Seq[E]) (E, bool) {
    for e := range seq {
        return e, true
    }
    var zero E
    return zero, false
}

func Tail[E any](seq iter.Seq[E]) (iter.Seq[E], bool) {
    next, stop := iter.Pull(seq)
    if _, ok := next(); !ok {
        return nil, false
    }
    f := func(yield func(E) bool) {
        defer stop()
        for {
            e, ok := next()
            if !ok {
                return
            }
            if !yield(e) {
                return
            }
        }
    }
    return f, true
}
```

and perhaps also something like

```go
func Uncons[E any](seq iter.Seq[E]) (E, iter.Seq[E], bool)
```
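A possible shape for that Uncons, sketched here only to make the suggestion concrete: it pulls once, hands back the head, and wraps the remainder. It shares the "leaks if the tail is never consumed" caveat raised elsewhere in this thread.

```go
func Uncons[E any](seq iter.Seq[E]) (E, iter.Seq[E], bool) {
    next, stop := iter.Pull(seq)
    head, ok := next()
    if !ok {
        stop()
        var zero E
        return zero, nil, false
    }
    tail := func(yield func(E) bool) {
        // stop only runs if the tail is actually consumed.
        defer stop()
        for {
            e, ok := next()
            if !ok || !yield(e) {
                return
            }
        }
    }
    return head, tail, true
}
```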
@jub0bs That makes assumptions about iterators that don't hold in general. They do not need to return the same sequence each time. Consider an iterator that reads from a file and returns a single record. If you used your Head and Tail back to back, you'd skip the second record.

Two problems with your suggestions, @jub0bs:

It's bikeshedding, but my $.02 is that I'd rather have the

The more I think about this, the less I think that an operation that splits the iterator in this way is a good idea. It's very easy to do manually already with

And almost every problem that this tries to solve can be done just as easily with an
@jimmyfrasche
Very good point, which I had missed. I'll have to think about this a bit more, it seems.
My take (playground):

```go
func Cut[E any](s iter.Seq[E]) (head E, tail iter.Seq[E], ok bool) {
    for v := range s {
        head, ok = v, true
        break
    }
    tail = func(yield func(E) bool) {
        if !ok {
            return
        }
        first := true
        for v := range s {
            if first {
                first = false
                continue
            }
            if !yield(v) {
                return
            }
        }
    }
    return head, tail, ok
}
```

Though, for the record, I'm against including something like this, for reasons already mentioned by others.
@DeedleFake @Merovius After thinking about this a bit more, I have to agree: I can't think of a way for functions like
Every case I've had for Head has been to simplify a pattern I kept coming across in higher-order iterators:

```go
first, once := true, false
for v := range seq {
    if first {
        first = false
        // prime the pump
    } else {
        once = true
        // actual loop code
    }
}
if !first && !once {
    // special case for one value seq
}
```

In terms of this thread, though, I only mentioned Head as another thing that could be implemented with Push.
We propose to add a new package golang.org/x/exp/xiter that defines adapters on iterators. Perhaps these would one day be moved to the iter package or perhaps not. There are concerns about how these would affect idiomatic Go code. It seems worth defining them in x/exp to help that discussion along, and then we can decide whether they move anywhere else when we have more experience with them.
The package is called xiter to avoid a collision with the standard library iter (see proposal #61897). An alternative would be to have xiter define wrappers and type aliases for all the functions and types in the standard iter package, but the type aliases would depend on #46477, which is not yet implemented.
This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted. See #61897 for a list of related proposals.
Edit, 2024-05-15: Added some missing 2s in function names, and also changed Reduce to take the function first, instead of between sum and seq.
Edit, 2024-07-17: Updated code to match the final Go 1.23 language change. Corrected various typos.
```go
/*
Package xiter implements basic adapters for composing iterator sequences:
*/
```