
proposal: x/exp/xiter: new package with iterator adapters #61898

Open
rsc opened this issue Aug 9, 2023 · 193 comments
@rsc
Contributor

rsc commented Aug 9, 2023

We propose to add a new package golang.org/x/exp/xiter that defines adapters on iterators. Perhaps these would one day be moved to the iter package or perhaps not. There are concerns about how these would affect idiomatic Go code. It seems worth defining them in x/exp to help that discussion along, and then we can decide whether they move anywhere else when we have more experience with them.

The package is called xiter to avoid a collision with the standard library iter (see proposal #61897). An alternative would be to have xiter define wrappers and type aliases for all the functions and types in the standard iter package, but the type aliases would depend on #46477, which is not yet implemented.

This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted. See #61897 for a list of related proposals.

Edit, 2024-05-15: Added some missing 2s in function names, and also changed Reduce to take the function first, instead of between sum and seq.

Edit, 2024-07-17: Updated code to match the final Go 1.23 language change. Corrected various typos.


/*
Package xiter implements basic adapters for composing iterator sequences:

  • [Concat] and [Concat2] concatenate sequences.
  • [Equal], [Equal2], [EqualFunc], and [EqualFunc2] check whether two sequences contain equal values.
  • [Filter] and [Filter2] filter a sequence according to a function f.
  • [Limit] and [Limit2] truncate a sequence after n items.
  • [Map] and [Map2] apply a function f to a sequence.
  • [Merge], [Merge2], [MergeFunc], and [MergeFunc2] merge two ordered sequences.
  • [Reduce] and [Reduce2] combine the values in a sequence.
  • [Zip] and [Zip2] iterate over two sequences in parallel.

*/
package xiter

import (
	"cmp"
	"iter"
)

// Concat returns an iterator over the concatenation of the sequences.
func Concat[V any](seqs ...iter.Seq[V]) iter.Seq[V] {
	return func(yield func(V) bool) {
		for _, seq := range seqs {
			for e := range seq {
				if !yield(e) {
					return
				}
			}
		}
	}
}

// Concat2 returns an iterator over the concatenation of the sequences.
func Concat2[K, V any](seqs ...iter.Seq2[K, V]) iter.Seq2[K, V] {
	return func(yield func(K, V) bool) {
		for _, seq := range seqs {
			for k, v := range seq {
				if !yield(k, v) {
					return
				}
			}
		}
	}
}

// Equal reports whether the two sequences are equal.
func Equal[V comparable](x, y iter.Seq[V]) bool {
	for z := range Zip(x, y) {
		if z.Ok1 != z.Ok2 || z.V1 != z.V2 {
			return false
		}
	}
	return true
}

// Equal2 reports whether the two sequences are equal.
func Equal2[K, V comparable](x, y iter.Seq2[K, V]) bool {
	for z := range Zip2(x, y) {
		if z.Ok1 != z.Ok2 || z.K1 != z.K2 || z.V1 != z.V2 {
			return false
		}
	}
	return true
}

// EqualFunc reports whether the two sequences are equal according to the function f.
func EqualFunc[V1, V2 any](x iter.Seq[V1], y iter.Seq[V2], f func(V1, V2) bool) bool {
	for z := range Zip(x, y) {
		if z.Ok1 != z.Ok2 || !f(z.V1, z.V2) {
			return false
		}
	}
	return true
}

// EqualFunc2 reports whether the two sequences are equal according to the function f.
func EqualFunc2[K1, V1, K2, V2 any](x iter.Seq2[K1, V1], y iter.Seq2[K2, V2], f func(K1, V1, K2, V2) bool) bool {
	for z := range Zip2(x, y) {
		if z.Ok1 != z.Ok2 || !f(z.K1, z.V1, z.K2, z.V2) {
			return false
		}
	}
	return true
}

// Filter returns an iterator over seq that only includes
// the values v for which f(v) is true.
func Filter[V any](f func(V) bool, seq iter.Seq[V]) iter.Seq[V] {
	return func(yield func(V) bool) {
		for v := range seq {
			if f(v) && !yield(v) {
				return
			}
		}
	}
}

// Filter2 returns an iterator over seq that only includes
// the pairs k, v for which f(k, v) is true.
func Filter2[K, V any](f func(K, V) bool, seq iter.Seq2[K, V]) iter.Seq2[K, V] {
	return func(yield func(K, V) bool) {
		for k, v := range seq {
			if f(k, v) && !yield(k, v) {
				return
			}
		}
	}
}

// Limit returns an iterator over seq that stops after n values.
func Limit[V any](seq iter.Seq[V], n int) iter.Seq[V] {
	return func(yield func(V) bool) {
		if n <= 0 {
			return
		}
		for v := range seq {
			if !yield(v) {
				return
			}
			if n--; n <= 0 {
				break
			}
		}
	}
}

// Limit2 returns an iterator over seq that stops after n key-value pairs.
func Limit2[K, V any](seq iter.Seq2[K, V], n int) iter.Seq2[K, V] {
	return func(yield func(K, V) bool) {
		if n <= 0 {
			return
		}
		for k, v := range seq {
			if !yield(k, v) {
				return
			}
			if n--; n <= 0 {
				break
			}
		}
	}
}

// Map returns an iterator over f applied to seq.
func Map[In, Out any](f func(In) Out, seq iter.Seq[In]) iter.Seq[Out] {
	return func(yield func(Out) bool) {
		for in := range seq {
			if !yield(f(in)) {
				return
			}
		}
	}
}

// Map2 returns an iterator over f applied to seq.
func Map2[KIn, VIn, KOut, VOut any](f func(KIn, VIn) (KOut, VOut), seq iter.Seq2[KIn, VIn]) iter.Seq2[KOut, VOut] {
	return func(yield func(KOut, VOut) bool) {
		for k, v := range seq {
			if !yield(f(k, v)) {
				return
			}
		}
	}
}

// Merge merges two sequences of ordered values.
// Values appear in the output once for each time they appear in x
// and once for each time they appear in y.
// If the two input sequences are not ordered,
// the output sequence will not be ordered,
// but it will still contain every value from x and y exactly once.
//
// Merge is equivalent to calling MergeFunc with cmp.Compare[V]
// as the ordering function.
func Merge[V cmp.Ordered](x, y iter.Seq[V]) iter.Seq[V] {
	return MergeFunc(x, y, cmp.Compare[V])
}

// MergeFunc merges two sequences of values ordered by the function f.
// Values appear in the output once for each time they appear in x
// and once for each time they appear in y.
// When equal values appear in both sequences,
// the output contains the values from x before the values from y.
// If the two input sequences are not ordered by f,
// the output sequence will not be ordered by f,
// but it will still contain every value from x and y exactly once.
func MergeFunc[V any](x, y iter.Seq[V], f func(V, V) int) iter.Seq[V] {
	return func(yield func(V) bool) {
		next, stop := iter.Pull(y)
		defer stop()
		v2, ok2 := next()
		for v1 := range x {
			for ok2 && f(v1, v2) > 0 {
				if !yield(v2) {
					return
				}
				v2, ok2 = next()
			}
			if !yield(v1) {
				return
			}
		}
		for ok2 {
			if !yield(v2) {
				return
			}
			v2, ok2 = next()
		}
	}
}

// Merge2 merges two sequences of key-value pairs ordered by their keys.
// Pairs appear in the output once for each time they appear in x
// and once for each time they appear in y.
// If the two input sequences are not ordered by their keys,
// the output sequence will not be ordered by its keys,
// but it will still contain every pair from x and y exactly once.
//
// Merge2 is equivalent to calling MergeFunc2 with cmp.Compare[K]
// as the ordering function.
func Merge2[K cmp.Ordered, V any](x, y iter.Seq2[K, V]) iter.Seq2[K, V] {
	return MergeFunc2(x, y, cmp.Compare[K])
}

// MergeFunc2 merges two sequences of key-value pairs ordered by the function f.
// Pairs appear in the output once for each time they appear in x
// and once for each time they appear in y.
// When pairs with equal keys appear in both sequences,
// the output contains the pairs from x before the pairs from y.
// If the two input sequences are not ordered by f,
// the output sequence will not be ordered by f,
// but it will still contain every pair from x and y exactly once.
func MergeFunc2[K, V any](x, y iter.Seq2[K, V], f func(K, K) int) iter.Seq2[K, V] {
	return func(yield func(K, V) bool) {
		next, stop := iter.Pull2(y)
		defer stop()
		k2, v2, ok2 := next()
		for k1, v1 := range x {
			for ok2 && f(k1, k2) > 0 {
				if !yield(k2, v2) {
					return
				}
				k2, v2, ok2 = next()
			}
			if !yield(k1, v1) {
				return
			}
		}
		for ok2 {
			if !yield(k2, v2) {
				return
			}
			k2, v2, ok2 = next()
		}
	}
}

// Reduce combines the values in seq using f.
// For each value v in seq, it updates sum = f(sum, v)
// and then returns the final sum.
// For example, if iterating over seq yields v1, v2, v3,
// Reduce returns f(f(f(sum, v1), v2), v3).
func Reduce[Sum, V any](f func(Sum, V) Sum, sum Sum, seq iter.Seq[V]) Sum {
	for v := range seq {
		sum = f(sum, v)
	}
	return sum
}

// Reduce2 combines the values in seq using f.
// For each pair k, v in seq, it updates sum = f(sum, k, v)
// and then returns the final sum.
// For example, if iterating over seq yields (k1, v1), (k2, v2), (k3, v3),
// Reduce2 returns f(f(f(sum, k1, v1), k2, v2), k3, v3).
func Reduce2[Sum, K, V any](f func(Sum, K, V) Sum, sum Sum, seq iter.Seq2[K, V]) Sum {
	for k, v := range seq {
		sum = f(sum, k, v)
	}
	return sum
}

// A Zipped is a pair of zipped values, one of which may be missing,
// drawn from two different sequences.
type Zipped[V1, V2 any] struct {
	V1  V1
	Ok1 bool // whether V1 is present (if not, it will be zero)
	V2  V2
	Ok2 bool // whether V2 is present (if not, it will be zero)
}

// Zip returns an iterator that iterates x and y in parallel,
// yielding Zipped values of successive elements of x and y.
// If one sequence ends before the other, the iteration continues
// with Zipped values in which either Ok1 or Ok2 is false,
// depending on which sequence ended first.
//
// Zip is a useful building block for adapters that process
// pairs of sequences. For example, Equal can be defined as:
//
//	func Equal[V comparable](x, y iter.Seq[V]) bool {
//		for z := range Zip(x, y) {
//			if z.Ok1 != z.Ok2 || z.V1 != z.V2 {
//				return false
//			}
//		}
//		return true
//	}
func Zip[V1, V2 any](x iter.Seq[V1], y iter.Seq[V2]) iter.Seq[Zipped[V1, V2]] {
	return func(yield func(z Zipped[V1, V2]) bool) {
		next, stop := iter.Pull(y)
		defer stop()
		v2, ok2 := next()
		for v1 := range x {
			if !yield(Zipped[V1, V2]{v1, true, v2, ok2}) {
				return
			}
			v2, ok2 = next()
		}
		var zv1 V1
		for ok2 {
			if !yield(Zipped[V1, V2]{zv1, false, v2, ok2}) {
				return
			}
			v2, ok2 = next()
		}
	}
}

// A Zipped2 is a pair of zipped key-value pairs,
// one of which may be missing, drawn from two different sequences.
type Zipped2[K1, V1, K2, V2 any] struct {
	K1  K1
	V1  V1
	Ok1 bool // whether K1, V1 are present (if not, they will be zero)
	K2  K2
	V2  V2
	Ok2 bool // whether K2, V2 are present (if not, they will be zero)
}

// Zip2 returns an iterator that iterates x and y in parallel,
// yielding Zipped2 values of successive elements of x and y.
// If one sequence ends before the other, the iteration continues
// with Zipped2 values in which either Ok1 or Ok2 is false,
// depending on which sequence ended first.
//
// Zip2 is a useful building block for adapters that process
// pairs of sequences. For example, Equal2 can be defined as:
//
//	func Equal2[K, V comparable](x, y iter.Seq2[K, V]) bool {
//		for z := range Zip2(x, y) {
//			if z.Ok1 != z.Ok2 || z.K1 != z.K2 || z.V1 != z.V2 {
//				return false
//			}
//		}
//		return true
//	}
func Zip2[K1, V1, K2, V2 any](x iter.Seq2[K1, V1], y iter.Seq2[K2, V2]) iter.Seq[Zipped2[K1, V1, K2, V2]] {
	return func(yield func(z Zipped2[K1, V1, K2, V2]) bool) {
		next, stop := iter.Pull2(y)
		defer stop()
		k2, v2, ok2 := next()
		for k1, v1 := range x {
			if !yield(Zipped2[K1, V1, K2, V2]{k1, v1, true, k2, v2, ok2}) {
				return
			}
			k2, v2, ok2 = next()
		}
		var zk1 K1
		var zv1 V1
		for ok2 {
			if !yield(Zipped2[K1, V1, K2, V2]{zk1, zv1, false, k2, v2, ok2}) {
				return
			}
			k2, v2, ok2 = next()
		}
	}
}
rsc added the Proposal label Aug 9, 2023
gopherbot added this to the Proposal milestone Aug 9, 2023
@gophun

gophun commented Aug 9, 2023

The duplication of each function is the first thing that catches the eye. Are there thoughts on why this is acceptable?

@gophun

gophun commented Aug 9, 2023

What about an adapter that converts an iter.Seq[V] to an iter.Seq2[int, V] and an adapter that converts an iter.Seq2[K, V] to an iter.Seq[V]?

@zephyrtronium
Contributor

Some typos: EqualFunc2, Map2, Merge2, and MergeFunc2 lack the 2 suffixes on their actual names. They're all correct in the corresponding documentation.

@earthboundkid
Contributor

May I humbly suggest that the name "iterutils" is less susceptible to, uh, unfortunate mispronunciation.

@earthboundkid
Contributor

For Reduce, the callback should go last: func Reduce[Sum, V any](sum Sum, seq Seq[V], f func(Sum, V) Sum) Sum.

@DeedleFake

DeedleFake commented Aug 9, 2023

For Reduce, the callback should go last: func Reduce[Sum, V any](sum Sum, seq Seq[V], f func(Sum, V) Sum) Sum.

I'd actually prefer func Reduce[Sum, V any](seq Seq[V], sum Sum, f func(Sum, V) Sum) Sum.

Edit: I just realized that if Reduce() is being used to build an array, putting sum first puts everything in the same order as Append() and other functions that put the destination first. I'm not sure if that's worth it or not.

@rsc
Contributor Author

rsc commented Aug 9, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@DeedleFake

The more I think about it, the more that I think that API design for this should wait until after a decision is made on #49085. Multiple other languages have proven over and over that a left-to-right chained syntax is vastly superior ergonomically to simple top-level functions for iterators. For example, compare

nonNegative := xiter.Filter(
  xiter.Map(
    bufio.Lines(r),
    parseLine,
  ),
  func(v int) bool { return v >= 0 },
)

vs.

nonNegative := bufio.Lines(r).
  Map(parseLine).
  Filter(func(v int) bool { return v >= 0 })

Go's a little weird because of the need to put the `.` on the previous line, but other than that one oddity, which I could get used to, the second is better in every way. It reads in the order that actions actually happen, it's less repetitive, etc. The only real way to emulate it currently is something like

lines := bufio.Lines(r)
intlines := xiter.Map(lines, parseLine)
nonNegative := xiter.Filter(intlines, func(v int) bool { return v >= 0 })

That works, but it clutters up the local namespace and it's significantly harder to edit. For example, if you decide you need to add a new step in the chain, you have to make sure that all of the variables for each iterator match up in the previous and succeeding calls.

@ianlancetaylor
Contributor

What type does bufio.Lines return to make that work in Go? What methods does that type support? What is the type of nonNegative? I mean these as honest questions. Can we write this kind of code in Go today, or would we need new language features?

@hherman1

You would probably have to wrap the base iterator like:

stream.New(bufio.Lines).
    Filter(…).
    …

@DeedleFake

DeedleFake commented Aug 10, 2023

@ianlancetaylor

Sorry. I should have stuck a comment in. I was just coming up with some hypothetical function that would give an iter.Seq[string]. In this case, the idea was that it would internally use a bufio.Scanner to yield lines from an io.Reader or something. My original code had an anonymous func(string) int instead of the vague parseLine but I removed it because it was clogging up the example with irrelevant code and I didn't clarify when I did.

@hherman1

Not necessarily. The transformative and sink functions on iterators could just be defined as methods on iter.Seq.

@hherman1

hherman1 commented Aug 10, 2023

But iter.Seq is an interface type, no? Are you saying it should be a struct type?

I was wrong, it’s not an interface.

@benhoyt
Contributor

benhoyt commented Aug 10, 2023

Why do some functions take the f func as the last parameter, but Filter and Map take it as the first, and Reduce in the middle? Most other functions in the stdlib take funcs as the last parameter, such as sort.Slice, slices.*Func, ServeMux.HandleFunc, and so on. This makes code that uses them with inline function literals more readable:

names := xiter.Map(func(p Person) string {
	return p.Name
}, people) // "people" gets lost

// vs

names := xiter.Map(people, func(p Person) string {
	return p.Name
})

@Merovius
Contributor

Merovius commented Aug 10, 2023

@DeedleFake There won't be a "decision" on #49085 anytime soon. There are good reasons not to do it yet, but we also don't want to say it never happens. The issue exists to reflect that state. What it comes down to is, would you rather have no iterators (for the foreseeable future) or ones which can't be "chained"?

@DeedleFake

DeedleFake commented Aug 10, 2023

What it comes down to is, would you rather have no iterators (for the foreseeable future) or ones which can't be "chained"?

No iterators, definitely. I've done fine without them for over a decade. I can wait a bit longer. If a bad implementation goes in, I'll never get a good version. Plus, I can just write my own implementation of whatever iterator functions I need as long as range-over-func exists while I wait.

@gophun

gophun commented Aug 10, 2023

Neither chaining nor functional programming has ever been a decisive or recommended technique in Go. Instead, iteration—specifically, procedural 'for' loops—has always been a core technique since the language's inception. The iterator proposals aim to enhance this core approach. While I don't know what the overall plans are, if you're hoping for Go to follow the path of Java Streams or C# LINQ, you might be in for disappointment.

@Merovius
Contributor

I can wait a bit longer. If a bad implementation goes in, I'll never get a good version.

I think "a bit" is misleading. We are talking years - if at all. And I don't believe the second part of that sentence is true either, we could always release a v2 of the relevant packages, if we ever manage to do #49085 in a decade or so.

@DeedleFake

DeedleFake commented Aug 10, 2023

While I don't know what the overall plans are, if you're hoping for Go to follow the path of Java Streams or C# LINQ, you might be in for disappointment.

Is that not the intention of these proposals? To build a standardized iterator system that works similarly to those? Why else is there a proposal here for Map(), Filter(), and Reduce(), among others? I have no problem with functions like slices.Backwards() and other source function proposals. My only problem is the transformative and sink functions.

I think "a bit" is misleading. We are talking years - if at all. And I don't believe the second part of that sentence is true either, we could always release a v2 of the relevant packages, if we ever manage to do #49085 in a decade or so.

Edit: The way this proposal is phrased does actually imply that they may be heavily reevaluated enough in x/exp that they may not go into the standard library at all, so maybe my point is moot after all. I still think that this is a valid issue with the API design to bring up, but maybe it's a bit off-topic for this particular proposal and should wait until after they're in x/exp and it can be more easily demonstrated how awkward they are to use. I don't like the idea that existing code will be broken when some variant of them does potentially go into the standard library, but it's less of a problem than I was worried about. Never mind. Please ignore my rambling below.

That issue has only been open for 2 years. I think assuming that it'll take a decade to solve is a bit unfair. Yes, a v2 is an option, especially if #61716 is accepted, but that was created out of necessity to deal with problems in an existing package, while this would essentially be purposefully putting problems into a new package. It's not like I'm saying that iterators are unacceptable to me in this state, just that features have been delayed or cut before because of possible changes coming later and that I think that it's prudent to discuss the possibility here. That just happened in the last few weeks in the maps package because of the possibility of the acceptance of #61405. I think the same should be done with the transformative and sink functions for now, or at the very least those functions should be planned to stay in x/exp until some way to clean up the API is decided on, that's all.

One of my favorite things about Go is how slow and methodical it (usually) is in introducing new features. I think that the fact that it took over a decade to add generics is a good thing, and I really wanted generics. One of the purposes of that approach is to try avoid having to fix it later. Adding those functions in the proposed manner will almost definitely necessitate that later fix, and I very much would like to avoid that if at all possible.

@gophun

gophun commented Aug 10, 2023

Is that not the intention of these proposals? To build a standardized iterator system that works similarly to those?

Java Streams and .NET LINQ build on a standardized iterator system, but they are more than that. Both languages had a generic iterator system before. Iterators are useful without chaining or functional programming.

Why else is there a proposal here for Map(), Filter(), and Reduce(), among others?

That would be this very proposal, and it comes with a caveat: "... or perhaps not. There are concerns about how these would affect idiomatic Go code. "

This means that not everyone who has read these proposals in advance believes that this part is a good idea.

@jba
Contributor

jba commented Aug 10, 2023

While I don't know what the overall plans are, if you're hoping for Go to follow the path of Java Streams or C# LINQ, you might be in for disappointment.

Is that not the intention of these proposals? To build a standardized iterator system that works similarly to those? Why else is there a proposal here for Map(), Filter(), and Reduce(), among others?

Maybe chaining leads to too much of a good thing. It becomes more tempting to write long, hard-to-read chains of functions. You're less likely to do that if you have to nest calls.

As an analogy, Go has if. Isn't the intention of if to allow conditional execution? Why then shouldn't Go have the ternary operator ?:? Because it often leads to hard-to-read code.

@rsc
Contributor Author

rsc commented Aug 10, 2023

Re #49085, generic methods either require (A) dynamic code generation or (B) terrible speed or (C) hiding those methods from dynamic interface checks or (D) not doing them at all. We have chosen option (D). The issue remains open like so many suggestions people have made, but I don't see a realistic path forward where we choose A, B, or C, nor do I see a fifth option. So it makes sense to assume generic methods are not going to happen and do our work accordingly.

@Merovius
Contributor

Merovius commented Aug 10, 2023

@DeedleFake The issue is not lack of understanding what a lack of parameterized methods means. It's just that, as @rsc said, wanting them doesn't make them feasible. The issue only being 2 years old is deceptive. The underlying problem is actually as old as Go and one of the main reasons we didn't have generics for most of that. Which you should consider, when you say

I think that the fact that it took over a decade to add generics is a good thing, and I really wanted generics.

We got generics by committing to keep implementation strategies open, thus avoiding the generics dilemma. Not having parametric methods is a pretty direct consequence of that decision.

@DeedleFake

Well, I tried. If that's the decision then that's the decision. I'm disappointed, but I guess I'll just be satisfied with what I do like about the current proposal, even if it has, in my opinion, some fairly major problems. Sorry for dragging this a bit off-topic there.

@thediveo

Hope that it's not noise: I wondered if naming the parameter sum might imply to a less experienced dev that reduce only does summation, so I looked at JavaScript's array reduce: it uses accumulator. I don't know if that's much better; I just wanted to point it out. If anything, let's have a good laugh.

@jimmyfrasche
Member

Those nonstandard Zip definitions look like they would occasionally be useful, but I think I'd want the ordinary zip/zipLongest definitions most of the time. Those can be recovered from the proposed versions with some postprocessing, but I'd hate to have to always do that.

These should be considered along with Limit:

LimitFunc - stop iterating after a predicate matches (often called TakeWhile in other languages)

Skip, SkipFunc - drop the first n items (or until the predicate matches) before yielding (opposite of Limit/LimitFunc, often called drop/dropWhile)

@jba
Contributor

jba commented Aug 10, 2023

Those nonstandard Zip definitions look like they would occasionally be useful but I think I'd want the ordinary zip/zipLongest definitions most of the time.

Can you explain the difference? Is it just that zip typically stops at the end of the shorter sequence? That is definitely less useful as a building block, and easy to write given these functions. What are some examples where stopping at the shortest is better?

@jimmyfrasche
Member

zip stops after the shorter sequence. zipLongest pads out the missing values of the shorter sequence with a specified value.

The provided ones are more general and can be used to build those, but I can't really think of any time I've used zip where I needed to know that. I've always either known the lengths were equal by construction, so it didn't matter, or couldn't do anything other than drop the excess, so it didn't matter. Maybe that's peculiar to me and the situations in which I reach for zip, but they've been defined like that in every language I can think of that I've used, which has to be some kind of indicator that I'm not alone in this.

I'm not arguing for them to be replaced with the less general more common versions: I want those versions here too so I can use them directly without having to write a shim to the standard definition.

@bobg

bobg commented Aug 8, 2024

Do you have a concrete use case that would benefit from it being a type parameter?

Not offhand, no, though I'll observe that named types defined in terms of function signatures are rare but not unheard-of; e.g., WalkDirFunc.

(Tangent.)

Still trying to develop some intuition for when a type parameter is needed vs. when it's OK to require a specific type in terms of other type parameters. Example, from the slices package:

func Equal[S ~[]E, E comparable](s1, s2 S) bool

Why is this not:

func Equal[E comparable](s1, s2 []E) bool

?

For that matter: is there some convention that dictates the order of the type parameters? For something like slices.Equal I would naively expect the type parameters to be reversed, so that it can be partially instantiated with a simple slices.Equal[int].

@bobg

bobg commented Aug 8, 2024

I'll observe that named types defined in terms of function signatures are rare but not unheard-of; e.g., WalkDirFunc.

I should have added: there is also the example of WalkDir, a function defined in terms of that function type.

@Merovius
Contributor

Merovius commented Aug 8, 2024

@bobg Yes, slices.Equal is exactly a case where there is an argument in favor of using type-parameters, because you might use that as a higher-level function with a custom slice type. E.g. type Ints []int; var s []Ints; slices.CompactFunc(s, slices.Equal) (compare this snippet and that snippet). Many of the other cases in the slices package are purely for consistency, though.

WalkDirFunc exists mainly for documentation purposes and it doesn't get in the way, because while it is indeed used in WalkDir, you are unlikely to try to assign WalkDir to a func(fs.FS, string, func(string, fs.DirEntry, error) error) error.

That's why I mentioned two conditions that have to happen at the same time: 1. using a type defined based on that signature and 2. trying to use that function type as an argument in a higher-order function. 2 is where type identity matters, not just assignability.

@DeedleFake

DeedleFake commented Aug 25, 2024

Since 1.23 came out, I've been going through a number of my projects and inserting iterators where it makes sense to. I've been able to eliminate quite a few intermediary slice allocations, which is quite nice, and I've also come to realize a few things about the iterator adapters. I have my own iterator adapter package that I built back when the first prototype came out to play around with, and now that I'm using them more in real code I've found that they're significantly less useful than I expected them to be simply because the syntax is too clunky.

For certain small things, they can be useful, such as something like xiter.Filter(times, time.Time.IsDST), but anything more complicated gets very awkward. For example, chaining even a single map and filter together with custom anonymous functions for each results in about half the code being type signatures for the functions, all of them types simply repeated from elsewhere, as well as being written in a kind of awkward, inside-out order:

newtitles := xiter.Map(
  func(data Data) string { return data.Title },
  xiter.Filter(
    func(data Data) bool { return data.Time.After(threshold) },
    dataSource,
  ),
)

Instead, what I've found is that the far better approach in most situations is to take advantage of the fact that iterators are just functions and write an intermediary one as a closure. That reduces the type signature overhead significantly:

newtitles := func(yield func(string) bool) {
  for data := range dataSource {
    if data.Time.Before(threshold) {
      continue
    }
    if !yield(data.Title) {
      return
    }
  }
}

It's a few extra lines, but it's a lot more flexible and a lot more readable.

I think improving the ergonomics of the adapter functions would require at the very least #21498, and possibly some variant of #49085 or some other way to write the adapter pipeline top-to-bottom instead of inside out.

@bobg

bobg commented Aug 25, 2024

@DeedleFake It makes a big difference when you use intermediate variables instead of nesting expressions the way you have. It also improves things when any callback is the last argument to the function that needs it.

I also have a version of the xiter proposal implemented, here, but with callbacks last (plus other useful additions). Using it, your first example could be written as:

var (
  newItems  = seqs.Filter(dataSource, func(data Data) bool { return data.Time.After(threshold) })
  newTitles = seqs.Map(newItems, func(data Data) string { return data.Title })
)

which seems plenty readable to me.

@DeedleFake

DeedleFake commented Aug 25, 2024

My implementation actually does have callbacks last, as that's the way that I tend to think of the flow going: Thing you're operating on and then config for the action. It's like a receiver, in a way. But the upside to putting them first is that when nesting them, the function is next to the iterator operation. For example, if you try to nest the calls in your example, it'll wind up being

newtitles := xiter.Map(
  xiter.Filter(
    dataSource,
    func(data Data) bool { return data.Time.After(threshold) }, // Filter
  ),
  func(data Data) string { return data.Title }, // Map
)

You wind up having to read them in a spiral. By putting them the other way around, you can just read it backwards. It's weird, but it keeps things in the right order.

However, all of that being said, even if you store each step in an intermediate variable, there's still a lot of clutter from the types in the function signatures, as well as needing to come up with names for the intermediate variables. For that particular example it's not a big deal, but a more complicated chain could have four or five steps. Having to name all of those is a lot of cognitive overhead and local namespace pollution that's completely avoided by using the closure approach.

@bobg
Copy link

bobg commented Aug 25, 2024

You wind up having to read them in a spiral.

Haha, that's really well-put.

I see your point about putting the callback first, and about the overhead of naming intermediate steps. But that overhead is mainly on the program author, not the reader, and Go favors the reader at the expense of a little more work for the author. As a reader I'd certainly rather encounter something like this:

var (
  ints             = seqs.Ints(1, 1)
  primes           = seqs.Filter(ints, isPrime)
  enumeratedPrimes = seqs.Enumerate(primes)
  superPrimePairs  = seqs.Filter2(enumeratedPrimes, func(pos, _ int) bool { return isPrime(pos+1) })
  superPrimes      = seqs.Right(superPrimePairs)
)

than whatever the nested version of that would be (which I concede might not be quite so horrifying if we had lightweight anonymous functions, as you pointed out above).

@adonovan
Copy link
Member

adonovan commented Sep 2, 2024

The iter package could use an efficient way to answer questions about the length of a sequence. A Len operator is an obvious solution, but it must consume the entire (possibly infinite!) sequence even though the most common questions are typically "is it empty?" and "does it have more than one element?".

I propose:

package iter

// Empty reports whether the sequence is empty.
func Empty[T any](seq iter.Seq[T]) bool {
   return !LongerThan(seq, 0)
}

// LongerThan reports whether the sequence seq has more than n elements.
func LongerThan[T any](seq iter.Seq[T], n int) bool {
    i := 0
    for range seq {
        i++
        if i > n {
           return true
        }
    }
    return false
}

@DeedleFake
Copy link

What I've been doing when someone wants a size is to just pass it alongside:

func Example[T any](values iter.Seq[T], numvalues int)

For emptiness checking, it might make sense to do func Empty[T any](seq iter.Seq[T]) (iter.Seq[T], bool) where the boolean indicates if it was empty or not and the returned iterator yields the one value that it had to get and then continues the underlying iterator.

@Merovius
Copy link
Contributor

Merovius commented Sep 2, 2024

@adonovan An O(n) Len operator seems pretty problematic to me. I believe it would frequently lead to accidentally quadratic behavior. It also assumes that the iter.Seq is restartable, which is not always the case. And I honestly can't imagine a lot of places where I really need to ask questions about the length of a sequence that wouldn't be better served by ranging over it and counting as I go. So, at the very least, I think a few examples for LongerThan would be helpful.

@josharian

This comment was marked as duplicate.

@jimmyfrasche
Copy link
Member

while playing around with iterators I found the following helpful for constructing higher-order iterators which often needed to prime the pump:

func Head[T any](seq iter.Seq[T]) (head T, ok bool, tail iter.Seq[T])

To get that to work you need to Pull the seq but then undo that to convert it back to a push iter. You'd need the same for replaying the sequence like @DeedleFake and @josharian mentioned. So maybe in addition to Pull maybe there needs to be a:

func Push[T any](next func() (T, bool), stop func()) iter.Seq[T]

@earthboundkid
Copy link
Contributor

I hadn't written Head, but I love the API design and will steal it. I have written and frequently use First(iter.Seq[T]) T, which just returns a zero value for empty sequences.

@DeedleFake
Copy link

DeedleFake commented Sep 2, 2024

@jimmyfrasche

The issue with something like Head() or Empty() is that the coroutine leaks if the user doesn't actually use the returned iterator, unless you've got some other way to implement it. It looks something like the following in my head:

func Empty[T any](seq iter.Seq[T]) (iter.Seq[T], bool) {
  next, stop := iter.Pull(seq)
  first, ok := next()
  if !ok {
    return func(func(T) bool) {}, false
  }

  return func(yield func(T) bool) {
    defer stop() // This won't happen unless this returned iterator is used.

    if !yield(first) {
      return
    }
    for {
      v, ok := next()
      if !ok || !yield(v) {
        return
      }
    }
  }, true
}

I'm also not sure that a Push() converter is necessary. It's very easy to do manually, as I did above. Pull() is necessary because doing it without coroutines has a lot more overhead, though coroutines themselves have quite a bit, too.

@jimmyfrasche
Copy link
Member

It would leak if unused and the basic implementation is simple. The leaking issue might be ameliorated with a finalizer/cleanup, though that makes the implementation more complex. It might also be possible for the runtime and/or compiler to do some tricks to avoid the coroutine when possible, though that may require properties of the iterator being pull/push'd that may not be easy to know.

@earthboundkid
Copy link
Contributor

I wonder if there isn't some way to thread through the yield functions so you don't need to use iter.Pull, but I will admit I haven't figured it out yet. The simplest solution is just to return the stop function:

func Head[T any](seq iter.Seq[T]) (iter.Seq[T], func(), T, bool) {
  next, stop := iter.Pull(seq)
  first, ok := next()
  if !ok {
    return func(func(T) bool) {}, stop, first, false
  }

  return func(yield func(T) bool) {
    for {
      v, ok := next()
      if !ok || !yield(v) {
        return
      }
    }
  }, stop, first, true
}

@isgj
Copy link

isgj commented Sep 6, 2024

IMHO the adaptors shouldn't return all those values, they should :

  • wrap sequences, meaning their code will run when the seq is being consumed (Limit, Map, ...)
  • or consume the seq and return a result (Equal, Reduce, ...)

Otherwise we would have to return the sequences from Equal, Reduce, etc. as well. But maybe (and this is my opinion) it's easier to just iterate over the sequence again, or use an if inside the loop if we want to check for emptiness while looping (much more performant than iter.Pull). Or maybe the sequences are multi-use, as mentioned in the iter documentation (they can continue where they left off, or start from the beginning again). It depends on the use case.

Still some adaptors I think might be useful are:

// Or All, but in Go All is used to create a sequence over all elements
func Every[V any](iter.Seq[V], func(V) bool) bool

// it covers more cases than `First` or `Head`
// the use as `First`: Find(seq, func(_ V) bool { return true })
func Find[V any](iter.Seq[V], func(V) bool) (V, bool)

// Or Any, but in Go any is an interface
// use as `Empty`: !Some(seq, func(_ V) bool { return true })
func Some[V any](iter.Seq[V], func(V) bool) bool


// the following I have used rarely (or never), but might be useful

func Count[V any](iter.Seq[V]) int     // better than Len I think

// handy if you consume the seq and want to keep track where you are
func Enumerate[V any](iter.Seq[V]) iter.Seq2[int, V]

@gophun
Copy link

gophun commented Sep 6, 2024

IMHO the adaptors shouldn't return all those values, they should :

wrap sequences, meaning their code will run when the seq is being consumed (Limit, Map, ...)

@isgj I don't understand what you mean. Limit and Map don't return values, they return sequences. Can you demonstrate the changes you have in mind using the Map function as a simple example?

@gophun
Copy link

gophun commented Sep 6, 2024

@isgj Sorry, I guess your sentence was a response to @earthboundkid's comment, not to the proposed design.

@jub0bs
Copy link

jub0bs commented Sep 17, 2024

@jimmyfrasche

func Head[T any](seq iter.Seq[T]) (head T, ok bool, tail iter.Seq[T])

I agree that something like this would be useful, but naming it Head even though it also returns the tail isn't great. In other languages like Haskell, such a function is typically named "uncons"; not very Go-like, I'll give you that. Also, wouldn't a bool result coming last be more idiomatic?

Instead, I would suggest the following:

func Head[E any](seq iter.Seq[E]) (E, bool) {
	for e := range seq {
		return e, true
	}
	var zero E
	return zero, false
}

func Tail[E any](seq iter.Seq[E]) (iter.Seq[E], bool) {
	next, stop := iter.Pull(seq)
	if _, ok := next(); !ok {
		return nil, false
	}
	f := func(yield func(E) bool) {
		defer stop()
		for {
			e, ok := next()
			if !ok {
				return
			}
			if !yield(e) {
				return
			}
		}
	}
	return f, true
}

and perhaps also something like

func Uncons[E any](seq iter.Seq[E]) (E, iter.Seq[E], bool)

@jimmyfrasche
Copy link
Member

@jub0bs That makes assumptions about iterators that don't hold in general. They do not need to return the same sequence each time. Consider an iterator that reads from a file and returns a single record. If you used your Head and Tail back to back you'd skip the second record.

@bobg
Copy link

bobg commented Sep 17, 2024

Two problems with your suggestions, @jub0bs: Head unrecoverably discards the rest of the iterator (which isn't resumable), and Tail requires the caller to use the returned iterator, otherwise the defer stop() never gets called and resources can leak from the input iterator.

wouldn't a bool result coming last be more idiomatic?

It's bikeshedding, but my $.02 is that I'd rather have the bool paired with the value whose ok-ness it's annotating; i.e. Uncons(...) (E, bool, iter.Seq[E]).

@DeedleFake
Copy link

DeedleFake commented Sep 17, 2024

The more I think about this, the less I think that an operation that splits the iterator in this way is a good idea. It's very easy to do manually already with iter.Pull(), and every implementation in here, including my own, is error-prone because of the strange side effect of requiring that the returned tail iterator be used in order to ensure that the underlying pull iterator gets stopped. On top of that, if someone uses it repeatedly to get elements one at a time, something that they should definitely use iter.Pull() for instead, it'll wind up stacking pull iterators on top of each other and destroying performance.

And almost every problem that this tries to solve can be done just as easily with an iter.Push() convenience function to convert an iterator back from the return of an iter.Pull() call.

@jub0bs
Copy link

jub0bs commented Sep 17, 2024

@jimmyfrasche Maybe I'm missing something, but what you're describing is a single-use iterator, which doesn't seem specific to the functions I suggest. Can you link to a Playground that illustrates your point? Point taken.

@bobg

Head unrecoverably discards the rest of the iterator

I don't perceive this as problematic. For a single-use iterator, restarting the loop would yield the second element and so on; for a multiple-use iterator, the loop would yield the first one again. Point taken.

Tail requires the caller to use the returned iterator, otherwise the defer stop() never gets called and resources can leak from the input iterator.

Very good point, which I had missed. I'll have to think about this a bit more, it seems.

@Merovius
Copy link
Contributor

Merovius commented Sep 17, 2024

My take (playground):

func Cut[E any](s iter.Seq[E]) (head E, tail iter.Seq[E], ok bool) {
	for v := range s {
		head, ok = v, true
		break
	}
	tail = func(yield func(E) bool) {
		if !ok {
			return
		}
		first := true
		for v := range s {
			if first {
				first = false
				continue
			}
			if !yield(v) {
				return
			}
		}
	}
	return head, tail, ok
}

Though, for the record, I'm against including something like this, for reasons already mentioned by others.

@jub0bs
Copy link

jub0bs commented Sep 17, 2024

@DeedleFake @Merovius After thinking about this a bit more, I have to agree: I can't think of a way for functions like Head, Tail, Uncons to work with single-use iterators. In that case, leaving them out is preferable. The desired "logic" (treating the first element, if any, in a special way) is easy enough to implement in the loop of a push-based iterator.

@jimmyfrasche
Copy link
Member

Every case I've had for Head has been to simplify pattern I kept coming across in higher order iterators:

first, once := true, false
for v := range seq {
  if first {
    first = false
    // prime the pump
  } else {
    once = true
    // actual loop code
  }
}
if !first && !once {
  // special case for a one-value seq
}

In terms of this thread, though, I only mentioned Head as another thing that could be implemented with Push.
