Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iter: new package for iterators #61897

Closed
rsc opened this issue Aug 9, 2023 · 235 comments
Closed

iter: new package for iterators #61897

rsc opened this issue Aug 9, 2023 · 235 comments
Labels
FixPending Issues that have a fix which has not yet been reviewed or submitted. Proposal Proposal-Accepted release-blocker
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Aug 9, 2023

We propose to add a new package iter that defines helpful types for iterating over sequences. We expect these types will be used by other APIs to signal that they return iterable functions.

This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted.

See also:

Note regarding push vs pull iterator types: The vast majority of the time, push iterators are more convenient to implement and to use, because setup and teardown can be done around the yield calls rather than having to implement those as separate operations and then expose them to the caller. Direct use (including with a range loop) of the push iterator requires giving up storing any data in control flow, so individual clients may occasionally want a pull iterator instead. Any such code can trivially call Pull and defer stop.

I’m unaware of any significant evidence in favor of a parallel set of pull-based APIs: instead, iterators can be defined in push form and preserved in that form by any general combination functions and then only converted to pull form as needed at call sites, once all adapters and other transformations have been applied. This avoids the need for any APIs that make cleanup (stop functions) explicit, other than Pull. Adapters that need to convert to pull form inside an iterator function can defer stop and hide that conversion from callers. See the implementation of the adapters in #TODO for examples.

Note regarding standard library: There a few important reasons to convert the standard library:

  • Ship a complete solution. We should not release a package we don’t use in obvious places where it should be used. This applies especially to new interfaces.
  • Put functionality in the right places. Ian’s earlier draft included FromSlice and FromMap, but these are more appropriately slices.All and maps.All.
  • Find problems or rough edges in the iter package itself that we want to find before its release. I have already changed a few definitions from what I started with as a result of working through the standard library changes.
  • Set an example (hopefully a good one) for others to follow.

Note that os.ReadDir and filepath.Glob do not get iterator treatment, since the sorted results imply they must collect the full slice before returning any elements of the sequence. filepath.SplitList could add an iterator form, but PATH lists are short enough that it doesn’t seem worth adding new API. A few other packages, like bufio, archive/tar, and database/sql might benefit from iterators as well, but they are not used as much, so they seem okay to leave out from the first round of changes.


The iter package would be:

/*
Package iter provides basic definitions and operations related to
iterators over sequences.

Iterators

An iterator is a function that passes successive elements of a
sequence to a callback function, conventionally named yield, stopping
either when the sequence is finished or when yield breaks the sequence
by returning false. This package defines [Seq] and [Seq2]
(pronounced like seek - the first syllable of sequence)
as shorthands for iterators that pass 1 or 2 values per sequence element
to yield:

type (
	Seq[V any]     func(yield func(V) bool) bool
	Seq2[K, V any] func(yield func(K, V) bool) bool
)

Seq2 represents a sequence of paired values, conventionally key-value,
index-value, or value-error pairs.

Yield returns true when the iterator should continue with the next
element in the sequence, false if it should stop. The iterator returns
true if it finished the sequence, false if it stopped early at yield's
request. The iterator function's result is used when composing
iterators, such as in [Concat]:

func Concat[V any](seqs ...Seq[V]) Seq[V] {
	return func(yield func(V) bool) bool {
		for _, seq := range seqs {
			if !seq(yield) {
				return false
			}
		}
		return true
	}
}

Iterator functions are most often called by a range loop, as in:

func PrintAll[V any](seq iter.Seq[V]) {
	for _, v := range seq {
		fmt.Println(v)
	}
}

Naming

Iterator functions and methods are named for the sequence being walked:

// All returns an iterator over elements in s.
func (s *Set[V]) All() iter.Seq[V]

The iterator method on a collection type is conventionally named All,
as in the second example, because it iterates a sequence of all the
values in the collection.

When there are multiple possible iteration orders, the method name may
indicate that order:

// All iterates through the list from head to tail.
func (l *List[V]) All() iter.Seq[V]

// Backward iterates backward through the list from tail to head.
func (l *List[V]) Backward() iter.Seq[V]

If an iterator requires additional configuration, the constructor function
can take additional configuration arguments:

// Bytes iterates through the indexes and bytes in the string s.
func Bytes(s string) iter.Seq2[int, byte]

// Split iterates through the (possibly-empty) substrings of s
// separated by sep.
func Split(s, sep string) iter.Seq[string]

Single-Use Iterators

Most iterators provide the ability to walk an entire sequence:
when called, the iterator does any setup necessary to start the
sequence, then calls yield on successive elements of the sequence,
and then cleans up before returning. Calling the iterator again
walks the sequence again.

Some iterators break that convention, providing the ability to walk a
sequence only once. These “single-use iterators” typically report values
from a data stream that cannot be rewound to start over.
Calling the iterator again after stopping early may continue the
stream, but calling it again after the sequence is finished will yield
no values at all, immediately returning true. Doc comments for
functions or methods that return single-use iterators should document
this fact:

// Lines iterates through lines read from r.
// It returns a single-use iterator.
func (r *Reader) Lines() iter.Seq[string]

Errors

If iteration can fail, it is conventional to iterate value, error pairs:

// Lines iterates through the lines of the named file.
// Each line in the sequence is paired with a nil error.
// If an error is encountered, the final element of the
// sequence is an empty string paired with the error.
func Lines(file string) iter.Seq2[string, error]

Pulling Values

Functions and methods that are iterators or accept or return iterators
should use the standard yield-based function signature, to ensure
compatibility with range loops and with other iterator adapters.
The standard iterators can be thought of as “push iterator”, which
push values to the yield function.

Sometimes a range loop is not the most natural way to consume values
of the sequence. In this case, [Pull] converts a standard push iterator
to a “pull iterator”, which can be called to pull one value at a time
from the sequence. [Pull] starts an iterator and returns a pair
of functions next and stop, which return the next value from the iterator
and stop it, respectively.

For example:

// Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
	return func(yield func(V, V) bool) bool {
		next, stop := iter.Pull(it)
		defer stop()
		v1, ok1 := next()
		v2, ok2 := next()
		for ok1 || ok2 {
			if !yield(v1, v2) {
				return false
			}
		}
		return true
	}
}

Clients must call stop if they do not read the sequence to completion,
so that the iterator function can be allowed to finish. As shown in
the example, the conventional way to ensure this is to use defer.

Other Packages

Many packages in the standard library provide iterator-based APIs. Here are some notable examples.

TODO FILL THIS IN AS OTHER PACKAGES ARE UPDATED

Mutation

Iterators only provide the values of the sequence, not any direct way
to modify it. If an iterator wishes to provide a mechanism for modifying
a sequence during iteration, the usual method is to define a position type
with the extra operations and then provide an iterator over positions.

For example, a tree implementation might provide:

// Positions iterates through positions in the sequence.
func (t *Tree[V]) Positions() iter.Seq[*Pos]

// A Pos represents a position in the sequence.
// It is only valid during the yield call it is passed to.
type Pos[V any] struct { ... }

// Pos returns the value at the cursor.
func (p *Pos[V]) Value() V

// Delete deletes the value at this point in the iteration.
func (p *Pos[V]) Delete()

// Set changes the value v at the cursor.
func (p *Pos[V]) Set(v V)

And then a client could delete boring values from the tree using:

for p := range t.Positions() {
	if boring(p.Value()) {
		p.Delete()
	}
}

*/
package iter

// Seq is an iterator over sequences of individual values.
// See the [iter] package documentation for details.
type Seq[V any] func(yield func(V) bool) bool
// Seq2 is an iterator over pairs of values, conventionally
// key-value or value-error pairs.
// See the [iter] package documentation for details.
type Seq2[K, V any] func(yield func(K, V) bool) bool

Note: If and when generic type aliases are implemented (#46477), we might also want to add type Yield[V any] = func(V bool) and type Yield2[K, V any] = func(K, V) bool. That way, code writing a function signature to implement Seq or Seq2 can write the argument as yield iter.Yield[V].

// Pull starts the iterator in its own coroutine, returning accessor functions.
// Calling next returns the next value from the sequence. When there is
// such a value v, next returns v, true. When the sequence is over, next
// returns zero, false. Stop ends the iteration, allowing the iterator function
// to return. Callers that do not iterate to the end must call stop to let the
// function return. It is safe to call stop after next has returned false,
// and it is safe to call stop multiple times. Typically callers should defer stop().
func Pull[V any](seq Seq[V]) (next func() (V, bool), stop func())
// Pull2 starts the iterator in its own coroutine, returning accessor functions.
// Calling next returns the next key-value pair from the sequence. When there is
// such a pair k, v, next returns k, v, true. When the sequence is over, next
// returns zero, zero, false. Stop ends the iteration, allowing the iterator function
// to return. Callers that do not iterate to the end must call stop to let the
// function return. It is safe to call stop after next has returned false,
// and it is safe to call stop multiple times. Typically callers should defer stop().
func Pull2[K, V any](seq Seq[K, V]) (next func() (K, V, bool), stop func())
@szabba
Copy link

szabba commented Aug 9, 2023

Note: If and when generic type aliases are implemented (#46477), we might also want to add type Yield[V any] = func(V bool) and type Yield2[K, V any] = func(K, V) bool. That way, code writing a function signature to implement Seq or Seq2 can write the argument as yield iter.Yield[V].

I think that's supposed to say type Yield[V any] = func(V) bool?

@icholy
Copy link

icholy commented Aug 9, 2023

In the "Pulling Values" example, I think next, stop := iter.Pull(it) should be next, stop := iter.Pull(seq)

@mateusz834
Copy link
Member

mateusz834 commented Aug 9, 2023

Isn't this kind of API more preferable? Consider a case when you need to pass the next and stop functions separately to an function, struct, it might be annoying.

type PullIter[T any] struct{}

func (*PullIter[T]) Stop()
func (*PullIter[T]) Next() (T, bool)

func Pull[V any](seq Seq[V]) PullIter[V]

@gazerro
Copy link
Contributor

gazerro commented Aug 9, 2023

Based on:

Range expression                                   1st value          2nd value
function, 1 value   f  func(func(V)bool) bool      value    v  V

the range in the Concat and PrintAll functions should have only one value?

@DmitriyMV
Copy link
Contributor

I believe

// Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
		...
		next, stop := iter.Pull(it)
		...
}

should be

// Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
		...
		next, stop := iter.Pull(seq)
		...
}

@jimmyfrasche
Copy link
Member

Meta: Most of the first post is about iterators in general. It's not obviously clear on a first reading what is actually going into the iter package. Also the formatting is a little rough.

@jimmyfrasche
Copy link
Member

Given that

  • every Seq2 can be converted to a Seq by a left or right projection
  • every Seq can be extended to a Seq2 by padding it like Seq[K]Seq2[K, struct{}] or an operation like python's enumerate

one way to avoid having F{,2} for each F would be to provide the extension/projection helpers in this package and let all the general operations be on Seq2 (possibly even naming that Seq and the other Seq1).

@jimmyfrasche
Copy link
Member

I know having a separate function/method to call to get the error can lead to people forgetting to do so but I really don't see how Seq2[K, error] makes sense except in the case where each K could have an associated error. I get why it seems like it would be appealing but I don't think it's going to be nice in practice:

It's easy to ignore with k := range f.

It's not easy to not ignore without being awkward. Neither

var k Key
var err error
for k, err = range f {
  if err != nil {
    break
  }
  proc(k)
}
if err != nil {
  handle(err)
}

nor

for k, err := range f {
  if err != nil {
    handle(err)
    break
  }
  proc(k)
}

look especially readable to me.

@DmitriyMV
Copy link
Contributor

@jimmyfrasche Map fits quite nicely for this for k, err := ... pattern. Working with buffers too. Parallel execution.

@earthboundkid
Copy link
Contributor

earthboundkid commented Aug 9, 2023

one way to avoid having F{,2} for each F would be to provide the extension/projection helpers in this package and let all the general operations be on Seq2 (possibly even naming that Seq and the other Seq1).

I think the other way around, you should have a Seq2 → Seq[Pair[T1, T2]] mapping and make Seq the default most places.

Edit: I guess you should have both mappings 1 → 2 and 2 → 1, but I also think 1 will end up being the default for things like Filter and TakeWhile etc.

@earthboundkid
Copy link
Contributor

Isn't this kind of API more preferable? …

I think in most cases pull iters will be used in place, not passed around, so a pair of funcs is better than an object with methods.

@rsc
Copy link
Contributor Author

rsc commented Aug 9, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@AndrewHarrisSPU
Copy link

xiter isn't addressing error-flavored 2-ary forms. There has also been a significant amount of prior discussion about e.g. Next() (T, error) (Maybe this is the TLDR? #61405 (comment))

If I thought an iter.Stream type or an iter/stream package would be worth exploring, I'm wondering if we'd have a preference or any guidance about where to explore that - in the current set of proposals, in a separate proposal, or leave that unresolved for now (maybe forever), etc.?

@djordje200179
Copy link

I think that method name for iterating through sequence shouldn't be All. Because my first guess when seeing that is that it is Python-like and C# LINQ-like method for checking if all elements meet the criteria.
Also, its non-consistent to have methods All and Backwards. In that case name Forwards would be more appropriate.

@Merovius
Copy link
Contributor

@djordje200179 IMO for _, v := range it.All() makes it clear, that this is not a predicate, but an iteration. And Forwards does not work as a conventional name, as not all data structures have a dedicated direction - see e.g. map.

@leaxoy
Copy link

leaxoy commented Aug 10, 2023

Although the proposal is attractive, but to be honest, the Seq2[K, V] looks ugly and very similar to the Seq[V], every time when add iter-func, we must consider there are two to version of Seq, it's a bad thing. In the long run, two versions of iterators can quickly bloat the code.

Introduce tuple type for Seq2 maybe another choice, but the final decision rests with the go official team.

What do you think about this potential problem, @rsc.

@kardianos

This comment was marked as resolved.

@earthboundkid
Copy link
Contributor

Re: errors, the proposal says:

If iteration can fail, it is conventional to iterate value, error pairs:

In practice, the .Err() pattern has lead to a lot of bugs where .Err() is omitted. The most glaring example is at https://github.com/features/copilot/ (click on write_sql.go and note the lack of .Err()). I think the existing .Err() iterators in the std library should probably just stick around because they already exist, but we want a new pattern moving forward.

Re: SQL, see #61637.

@bobg
Copy link

bobg commented Aug 10, 2023

Cosmetic suggestion: I like Of better than Seq as the type name. IMO Of is a good name for generic Go container types in general. Qualified with the package name and the element type it reads naturally: iter.Of[string] is an iterator of strings.

@Merovius
Copy link
Contributor

@Splizard Apologies. You asked whether the idea of using channels for iteration has been considered. That gave me the impression that you are unaware of the prior discussions going back more than a decade, which came to the conclusion that the approach is not viable.

@Skarlso
Copy link
Contributor

Skarlso commented Jun 13, 2024

I would never use a channel as an iterator unless I want some kind of concurrency / parallelism. A channel is a concurrency primitive in Go. It would never be my first thought saying, ah channels are perfect for this. No, they are not. They look nice, but that's not their purpose unless I really want to do some parallel processing of some data somewhere. I would definitely use it then.

For me, an iterator is something to loop over some container's values conveniently and in an optimized way where it can lazy load some values or stream them.

This is my 2cents.

@earthboundkid
Copy link
Contributor

This is not the right issue for discussing adding iterators to Go. This is about adding a package in support of iterators. The right issue for discussing adding iterators to Go was #61405, which has now been closed as accepted. I understand wanting to weigh in on what is a big change to the language, but #61405 literally opened 11 months ago and built on prior discussions, so it's quite late to chime in now. The idea of improving channels to optimize into working as coroutines when separate scheduling isn't needed has been discussed already and even sort of implemented in the internal coro package.

@thepudds
Copy link
Contributor

The alternative is to do nothing (either for now, or forever). Seq2 can always be added in the future. "No is temporary, yes is forever" apparently doesn't apply to iterators, and the Go Team hasn't explained why, as far as I've seen.

I don't know of any other language that has a Seq2. I've yet to see an explanation for why Go needs one, but other languages do not. As the Go Team has said in tuple proposals, you can always declare a struct type if you want to pass around multiple values as a unit.

In addition to what @Merovius said above, there is also a good summary by Russ in #61898 (comment) that runs through problems with some alternatives to Seq2, including tuples, variadic type parameters, a generic Pair struct, observations about overuse of std::pair in C++, and so on.

Regarding whether or not Seq2 is a forever solution, that comment also suggests that if for example variadic generics or tuples do happen later, then likely Seq could be updated to take advantage (and at that point, Seq2 would become legacy only).

Separately, regarding just eliminating Seq2 (and/or leaving it up to every package to define their own struct type), I recall members of the core Go team commenting on that as well, including with concerns around ergonomics, how easy it is to swap between different implementations, what types might leak into public APIs, and so on. I don't immediately recall a perfect quote that encompasses all of that, but for example Russ wrote in the #61405 FAQ:

We clearly need to support two values to simulate existing range usages.

and in #61405 (comment):

A few comments about only allowing a single loop variable. This is untenable to me, because it means you can't write something like slices.Backward (#61899) that makes iterating backward over a slice as convenient as iterating forward over a slice.

In any event, in addition to #61405 ("range over func") already being accepted, this iter package proposal was also already debated and accepted, and it is extremely rare to reverse a just accepted proposal, especially in the absence of new information.

@Splizard
Copy link

@earthboundkid

This is about adding a package in support of iterators

Yes, this is why I am commenting here, that iter.Seq should be a receive-only channel as the iter package is still open for discussion.

@Skarlso and that's fine, you would be thinking about using a iter.Seq regardless of whether it ends up being a function type or a channel type.

The right issue for discussing adding iterators to Go was #61405

No, that was a spec change to add range over int and range over func. All I'm really saying, is that range-over-func seems redundant if Go efficiently supports channels as shown in my examples. It only appears to have been added to support this package (which seeks to clearly define the standard representation for iterators in Go).

The idea of improving channels to optimize into working as coroutines when separate scheduling isn't needed has been discussed already and even sort of implemented in the internal coro package.

The previously discussed ideas you are referring to are valuable but completely different from the optimisations I have been suggesting. I'm starting to believe what I have raised may very well be "new information" @thepudds.

@robaho
I'm not sure I understand your comment in relationship to <-chan T versus func(func(T) bool)bool, I've shown how that they are behaviouraly equivalent in the iterator use-case. Everything you've outlined applies to both types no? In either case you will need to signal errors, most likely by including them within T.

@gophun
Copy link

gophun commented Jun 13, 2024

as the iter package is still open for discussion.

It's not, the final comment period on this proposal closed on Feb 15.

@robaho
Copy link

robaho commented Jun 13, 2024 via email

@ianlancetaylor
Copy link
Contributor

@Splizard

Single producer 'read-only' channels that don't escape can be transformed by the compiler into a simple for loop when ranged over.

With respect, citation needed.

On the consumer side we see a simple for/range statement. On the producer side, even for relatively simple containers like binary trees, we see a goroutine that calls a deeply recursive set of interleaved function calls. The compiler would have to be able to prove that the channel is used in a way that is amenable to a transformation. It would have to ensure that the right thing happens when there is a break statement or panic call in the loop--and of course the panic might be hidden from the compiler in an indirect call in the loop body. (I'm not even 100% sure we are handling panic completely correctly in range-over-func.)

In general, such an optimization would be subject to unfortunate performance cliffs: a small change in the code, or a new minor release of the compiler, might drastically change the performance of a loop, for better or for worse. That is an unintuitive model for programmers where performance matters, and leads to cargo culting and mental complexity that Go strives to avoid where possible.

@Splizard
Copy link

Splizard commented Jun 13, 2024

@ianlancetaylor I appreciate your comment. There are theoretical approaches to this and then there are practical ones, I don't think it is necessary for gc to eliminate the channel structure entirely, nor for it to generate an additional completely rewritten function for each iterator, nor for it to attempt to inline the entire goroutine and loop body together.

Considering Go's goals for stable predictable performance, I think a sensible implementation of this optimization for gc would be to include a func(T) bool inside runtime.hchan, which, when it isn't nil, replaces the behavior of sending to the channel. The compiler can set this function to be the loop body in the case where a function F returns a <-chan T such that the sending half of the channel does not escape the inner goroutine that is sending values to it (proven via existing escape analysis techniques). Execution of the inner goroutine would then replace the for loop (without scheduling). If func(T) bool returns false, then the 'goroutine' exits, without running any defers (in an abstract sense, it is still running, it just will never progress).

My assumption, would be that the same mechanism being used to handle panics appropriately for range-over-func would be applicable here. I also anticipate any other existing work towards the implementation of range-over-func should be applicable here, as this means re-using func(T) bool as an optimization-level detail for channels, rather than being reliant on changes to the spec.

@Merovius
Copy link
Contributor

Merovius commented Jun 14, 2024

@Splizard

Yes, this is why I am commenting here, that iter.Seq should be a receive-only channel as the iter package is still open for discussion. […] No, that was a spec change to add range over int and range over func. All I'm really saying, is that range-over-func seems redundant if Go efficiently supports channels as shown in my examples. It only appears to have been added to support this package (which seeks to clearly define the standard representation for iterators in Go).

That is so absolutely not how any of this works. You are fully aware that #61405 was about how we want canonical iteration to work, that the discussion happened under awareness of the need of a supporting package and that you are now trying to revert the decision of how canonical iteration would work. For example, here is a quote from you:

If this is the only reason, this is not a very convincing argument to justify the significant costs of changing the language spec just to add an additional way to represent iterators.

You are arguing in bad faith, trying to derail the process with word games. Please stop that.

I'm starting to believe what I have raised may very well be "new information"

It is not, and you've been told so repeatedly. Take the hint.

@rsc
Copy link
Contributor Author

rsc commented Jun 14, 2024

@thepudds is absolutely correct in #61897 (comment) that we need to prepare more documentation about this feature than we have today. I have a pending CL 591096 to add most of the top text in this issue to the package iter doc comment, and we can add links to blog posts and other introductory docs as they are written.

To restate what others have, this issue is about package iter, not the range-over-func feature, and it is not the place to discuss range-over-func. As others have noted, range-over-func was discussed at length here on GitHub twice:

There are differences of opinon, absolutely, but we made a decision, and there does not appear to be new information here that would prompt reconsideration of that decision. That is, the arguments being made were all made before and incorporated into that decision. (See the "Reconsideration" section of John Ousterhout's Open Decision-Making.)

I apologize for not engaging in the details here. As much as I enjoy discussing the tradeoffs here (really, I do), that decision is done, and we need to move forward.

I have been focused on @gabyhelp and haven't gotten a chance to finish CL 591096, but I will do that now, and when that's submitted, this issue will be closed.

@golang golang locked as too heated and limited conversation to collaborators Jun 14, 2024
@dominikh
Copy link
Member

I'd like to comment on one effect of iter.Seq being a named type instead of a type alias: it discourages developers from defining their own named iterator types, which would be useful to be able to define methods on them.

For example, I am currently experimenting with an iterator-focused API for bezier paths and transformations on them. Consider the current API:

type PathElement { /* some representation of line segments, quadratic and cubic beziers */ }

type Path []PathElement

func (p Path) All() iter.Seq[PathElement] { return slices.Values(p) }

func Translate(seq iter.Seq[PathElement], trans Vec2) iter.Seq[PathElement]
func Scale(seq iter.Seq[PathElement], trans Vec2) iter.Seq[PathElement]
func Flatten(seq iter.Seq[PathElement], accuracy float64) iter.Seq[PathElement]
func ToSVG(seq iter.Seq[PathElement]) string

which leads to usage like

var p Path
println(ToSVG(Flatten(Scale(Translate(p.All(), v), v), 1)))

or

it = Translate(p.All(), v)
it = Scale(it, v)
it = ...

An alternative API I've considered would use a named iterator type and methods, as follows:

type PathElements iter.Seq[PathElement]

func (p Path) All() PathElements { return PathElements(slices.Values(p)) }

func (PathElements) Translate(trans Vec2) PathElements
func (PathElements) Scale(trans Vec2) PathElements
// etc

for the following usage:

println(p.All().Translate(v).Scale(v).Flatten(1).ToSVG())

But a significant downside of this approach is that it requires explicit conversion from PathElements to iter.Seq[PathElements] whenever we want to pass iterators to functions like slices.Collect or from xiter. This wouldn't be the case if iter.Seq were a type alias, as my named type would implicitly convert to the unnamed type func (...)

I'm not sure, however, if this pattern is a good idea to begin with. It works well for this specific use-case, but would fall apart the moment we'd need generic methods, e.g. for a Map method. It also hides the "seq"ness of the returned value behind another name. So maybe discouraging this pattern from the get-go is desired, not unintended?

@rogpeppe
Copy link
Contributor

I'd like to comment on one effect of iter.Seq being a named type instead of a type alias

I tend to agree that iter.Seq should be an alias rather than a named type.
Generic type aliases aren't quite there yet, but is there any reason we can't move to using an alias
when they've landed without breaking compatiblity?

@golang golang unlocked this conversation Jun 17, 2024
@adonovan
Copy link
Member

Generic type aliases aren't quite there yet, but is there any reason we can't move to using an alias
when they've landed without breaking compatibility?

That would not be a compatible change: https://go.dev/play/p/x9MR1A1suez

I too am curious as to the pros and cons of Seq being named, not merely an alias. I suppose it means we can add dozens of convenience methods to it later, but I'm not sure whether that's a pro or a con. ;-)

@Merovius
Copy link
Contributor

@adonovan Sorry but your example seems to demonstrate the wrong thing? It seems to demonstrate that it's breaking moving from an alias to a named type, but not vice-versa, no?

@gingerBill
Copy link

I've tried to explain why some people are "angry" over this design being too complicated in an article, but I think the short of it is that it feels like it goes against the apparent philosophy of Go that many people believe, coupled with it being a very functional way of doing things rather than imperative.

And because of those two reasons, I think that is why people don't like the iterator stuff, even if I completely understand the design choices made. It doesn't "feel" like what Go original was to many people.

If it was me, I would just not have allowed custom iterators into Go whatsoever, but I am not on the Go team (nor do I want to be).

@aarzilli
Copy link
Contributor

@gingerBill If you want to grasp at the value of push vs. pull iterators you should try to rewrite something like this as a pull iterator.
The recursive version is straightforwardly correct (more or less), but when you go and turn it into a state machine suddenly you have to keep your own stack, and each stack entry needs to have its own queue or, if you don't want to overallocate, you have to make a stack of state machines... suddenly correctness becomes a lot less obvious.

I don't see the range-over-func statement as leaning functional, in the imperative-vs-functional paradigm (unlike the issue we are commenting on, which is not about range-over-func and is, in fact, about functional programming). If anything it enables more imperative style programming, if you wanted to call functions you'd call functions, the syntactic sugar is there to let you call break, continue and early-return, which are hardly functional programming constructs.

@Merovius
Copy link
Contributor

Merovius commented Jun 17, 2024

My goal of asking this issue to be re-opened for comments wasn't to re-start the discussion on the merits of adding range-over-func. It was to be able to discuss the actual design issues as related to this package, like the above mentioned question about aliases. Please respect that, before the issue has to be locked again.

@adonovan
Copy link
Member

[@Merovius] Sorry but your example seems to demonstrate the wrong thing? It seems to demonstrate that it's breaking moving from an alias to a named type, but not vice-versa, no?

Ah, I thought you were proposing to land an emergency change from named to alias before the imminent go1.23 release so that the change be reverted later if desired. My example was evidence that that later change would be breaking.

But it's a breaking change either way. For example, switch x.(type) { case func(...): } will no longer match if the dynamic type of x changes from an alias to a named type.

@earthboundkid
Copy link
Contributor

earthboundkid commented Jun 17, 2024

The Go 1 guarantee isn't a legally binding contract, and e.g. the text/template/parse has broken it. Could we just caveat the 1.23 release notes with a big asterisk saying we reserve the right to switch to an alias in 1.24? Generics came with an asterisk, although the asterisk was never used.

@timothy-king
Copy link
Contributor

The interesting direction for backwards compatibility is to think about is iter.Seq[V] going from named to alias.

Type assertions like x.(iter.Seq[V]) would continue to work as intended. What would change though is x.(func(yield func(V) bool) bool). The named version would not match and the alias would match.

Type switches additional could stop compiling if someone tried to do both:

	switch x.(type) {
	case iter.Seq[int]:
		...
	case func(yield func(int) bool) bool:
		...
	}

These risks are not 0. They do not seem enormous either. I don't know why one would mix func(yield func(int) bool) bool and iter.Seq[int] in an interface value and check the runtime type. (I don't want to over index on my lack of imagination though.)

We could write a vet check to discourage usage that might not be backwards compatible in 1.23, but I suspect it would not help all that often or beyond 1.24. (The vet check might require an asterisk in the 1.23 spec to be justified too.)

A not very ergonomic solution is to ship iter without Seq and Seq2 and to expand the types until type parameterized aliases are available. It solves backwards compatibility, but would probably make this package much harder to learn and more annoying to use.

@Merovius
Copy link
Contributor

Merovius commented Jun 18, 2024

@timothy-king 1. Note that you don't have to do both in a type-switch for the breakage to happen:

func F(x any) {
    if x.(func(func(int) bool)) {
        panic("b0rk")
    }
}
func main() {
    F(maps.Keys(map[int]int{}))
}
  1. Note that code currently could contain such a type-assertion so making it illegal would break that code.
  2. The idea of special-case otherwise perfectly normal code to not compile - and including that special case in the spec - just as a preparation for a potential future switch is abhorrent to me.

FWIW I'm not convinced this really need a fix. I don't, ultimately, see more reason to make iter.Seq an alias than, say, bytes.Buffer. On the contrary, I'd argue that there is less reason, because while you can define methods on a function type, those methods can't really do anything except call the function or compare it to nil. So for iter.Seq (and similar types), literally anything a method can do, you can do in a function. Or with a wrapper type in a third-party package even. There is no unexported state that could be hidden.

So I don't really see any real argument to even make iter.Seq an alias. There isn't a strong reason to make it a named type either, true, but aliases should still be the exception, not the rule, lest we forget they where introduced for a specific problem.

@earthboundkid
Copy link
Contributor

FWIW I'm not convinced this really need a fix. I don't, ultimately, see more reason to make iter.Seq an alias than, say, bytes.Buffer. On the contrary, I'd argue that there is less reason, because while you can define methods on a function type, those methods can't really do anything except call the function or compare it to nil. So for iter.Seq (and similar types), literally anything a method can do, you can do in a function. Or with a wrapper type in a third-party package even. There is no unexported state that could be hidden.

bytes.Buffer has methods, so I don't see the analogy here. Passing it as an aliased struct to a series of functions would make using io.Copy impossible, for example.

The advantage of making iter.Seq/2 an alias is pretty clear: if package A defines myseq.WithCoolChainingMethods and package B defines func DoSomething[T any](seq iter.Seq[T]) iter.Seq[T], you can't pass a myseq to DoSomething without writing DoSomething(iter.Seq[foo](seq)). If it were an alias, you could skip that needless wrapping. Is that a good enough reason to have iter.Seq be an alias? I'm not sure, but I think all things equal, it would probably be better if it were, just to leave the door open for other definitions of sequences. In the standard library, there are some bootstrapping issues with not wanting to import iter from one package to another for fear of someday creating an import cycle. Again, it's not a huge issue, but using an alias would make it more clear that iter.Seq[T] just exists to save typing out func(func(T) bool) which is sort of tedious to do.

@Merovius
Copy link
Contributor

Merovius commented Jun 18, 2024

@earthboundkid My point is that nothing of what you say seems specific to iter.Seq. I agree that bytes.Buffer is a poor comparison to make that point. But, for example, fs.WalkDirFunc post-dates aliases and we still didn't make it an alias. Or e.g. these types in the ACME package.

And I'll note that I did provide a specific reason for why methods on func types are less useful, so there is a specific reason why this concern is less important in this case, than in other cases where you'd define a type to have a name to refer to.

I think all things equal, it would probably be better if it were, just to leave the door open for other definitions of sequences.

I'll note that 1. this argument cuts both ways: Making it an alias will prevent us from adding methods to iter.Seq ourselves and 2. all things are not equal: Making it an alias requires delaying this until aliases can be parameterized.

using an alias would make it more clear that iter.Seq[T] just exists to save typing out func(func(T) bool) which is sort of tedious to do.

iter.Seq[T] as a type serves a purpose beyond func(func(T) bool): It signals that you specifically intend this to be something that can be iterated over. Just as if I where to define type Path string, it serves a purpose beyond not typing string, even if I don't add methods. It says that this is its own type, with its own distinct identity and its own intended semantics.

It is, in my opinion, okay that if you want to define a different type to give it other methods, you need to convert back-and-forth.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FixPending Issues that have a fix which has not yet been reviewed or submitted. Proposal Proposal-Accepted release-blocker
Projects
Status: Accepted
Status: Done
Development

No branches or pull requests