Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: add range over int, range over func #61405

Open
rsc opened this issue Jul 17, 2023 · 324 comments
Open

proposal: spec: add range over int, range over func #61405

rsc opened this issue Jul 17, 2023 · 324 comments
Labels
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Jul 17, 2023

Following discussion on #56413, I propose to add two new types that a for-range statement can range over: integers and functions.

In the spec, the table that begins the section would have a few more rows added:

Range expression                                   1st value          2nd value

array or slice      a  [n]E, *[n]E, or []E         index    i  int    a[i]       E
string              s  string type                 index    i  int    see below  rune
map                 m  map[K]V                     key      k  K      m[k]       V
channel             c  chan E, <-chan E            element  e  E
integer             n  integer type                index    i int
function, 0 values  f  func(func()bool) bool
function, 1 value   f  func(func(V)bool) bool      value    v  V
function, 2 values  f  func(func(K, V)bool) bool   key      k  K      v          V

Range over integer

If n is an integer type, then for x := range n { ... } would be completely equivalent to for x := T(0); x < n; x++ { ... }, where T is the type of n (assuming x is not modified in the loop body).

The additional spec text for range over integer is:

For an integer n, the index iteration values 0 through n-1 are produced, with the same type as n. If n <= 0, the loop does not run any iterations.

In the Go code I have examined from the main repo, approximately half of existing 3-clause for loops can be converted to range over integer. Loops that don't care about the index variable become even shorter. For example, the canonical benchmark iteration becomes:

for range b.N {
	do the thing being benchmarked
}

3-clause for loops certainly have their place in Go programs, but something as simple as counting to n should be easier. The combination of range over integer and range over functions should make range the for loop form of choice for almost all iteration. When you see a 3-clause loop, you'll know it is doing something unusual and know to examine it more carefully.

Range over function

If f is a function type of the form func(yield func(T1, T2)bool) bool, then for x, y := range f { ... } is similar to f(func(x T1, y T2) bool { ... }), where the loop body has been moved into the function literal, which is passed to f as yield. The boolean result from yield indicates to f whether to keep iterating. The boolean result from f itself is ignored in this usage but present to allow easier composition of iterators.

I say "similar to" and not "completely equivalent to" because all the control flow statements in the loop body continue to have their original meaning. In particular, break, continue, defer, goto, and return all do exactly what they would do in range over a non-function.

Both T1 and T2 are optional: func(yield func(T1)bool) bool and func(yield func()bool) bool can both be iterated over as well. Just like any other range loops, it is permitted to omit unwanted variables. For example if f has type func(yield func(T1, T2) bool) bool any of these are valid:

for x, y := range f { ... }
for x, _ := range f { ... }
for _, y := range f { ... }
for x := range f { ... }
for range f { ... }

The additional spec text for range over function is:

For a function f, the iteration proceeds by calling f with a synthesized yield function that invokes the body of the loop. The values produced correspond to the arguments in successive calls to yield. As with range over other types, it is permitted to declare fewer iteration variables
than there are iteration values. The return value from the yield function reports whether f should continue iterating. For example, if the loop body executes a break statement, the corresponding call to yield returns false. The return value from f is ignored by range but conventionally returns false when yield has returned false, to allow composing a sequence of such functions. The use of a synthesized yield function does not change the semantics of the loop body. In particular, break, continue, defer, goto, and return statements all behave in a loop body ranging over a function exactly as they do in a loop body ranging over other types.

Being able to range over a function enables writing range over user-specified iteration logic,
which should simplify many programs. It should also help make use of custom collection types nicer.

Not all iteration a program wants to do fits into a single for loop. To convert one of these functions to a "next-based" or "pull-based" iterator, we plan to add a function to the standard library that runs the yield-based ("push-based") iterator in a coroutine, along the lines of the coro.Pull function in my recent blog post. The specific name of those functions will be decided in a future proposal; this proposal is limited to language changes, not library changes.

One of the problems identified in the discussion of #56413 was having too many different possible functions to range over. For that reason, this proposal drops the possibility of funcs that do not return bool as well as funcs of the form func() (T, bool) ("pull-iterators"). Push iterators are the standard form with language support and that packages should provide, and something like coro.Pull will allow conversion to pull iterators in code that needs that form.

Implementation

I have implemented this proposal so that we can explore it and understand it better. To run that prototype implementation:

go install golang.org/dl/gotip@latest
gotip download 510541   # download and build CL 510541              
gotip version  # should say "(w/ rangefunc)"                             
gotip run foo.go       

and then use the Go toolchain you just built. If you already have Go checked out and use the codereview plugin, you can:

git change 510541
cd src
./make.bash

(Or git codereview change if you don't have the standard aliases set up.)

@rsc rsc added the Proposal label Jul 17, 2023
@rsc rsc added this to the Proposal milestone Jul 17, 2023
@rsc
Copy link
Contributor Author

rsc commented Jul 17, 2023

Discussion Summary / FAQ

This comment summarizes the discussion so far and answers frequently asked questions.

Last update: July 23, 2023

Can I try running my own programs?

Absolutely. The easiest way right now is to use gotip, like this:

go install golang.org/dl/gotip@latest
gotip download 510541   # download and build CL 510541
gotip version  # should say "(w/ rangefunc)"
gotip run foo.go

We will look into making range-over-int and range-over-func usable in a special mode on the Go playground as well.

Can you provide more motivation for range over functions?

The most recent motivation is the addition of generics, which we expect will lead to custom containers such as ordered maps, and it would be good for those custom containers to work well with range loops.

Another equally good motivation is to provide a better answer for the many functions in the standard library that collect a sequence of results and return the whole thing as a slice. If the results can be generated one at a time, then a representation that allows iterating over them scales better than returning an entire slice. We do not have a standard signature for functions that represent this iteration. Adding support for functions in range would both define a standard signature and provide a real benefit that would encourage its use.

For example, here are a few functions from the standard library that return slices but probably merit forms that return iterators instead:

  • strings.Split
  • strings.Fields
  • bytes variants of the above
  • regexp.Regexp.FindAll and friends

There are also functions we were reluctant to provide in slices form that probably deserve to be added in iterator form. For example, there should be a strings.Lines(text) that iterates over the lines in a text.

Similarly, iteration over lines in a bufio.Reader or bufio.Scanner is possible, but you have to know the pattern, and the pattern is different for those two and tends to be different for each type. Establishing a standard way to express iteration will help converge the many different approaches that exist today.

For additional motivation for iterators, see #54245. For additional motivation specifically for range over functions, see #56413.

Can you provide more motivation for range over integers?

3-clause for loops have a long and venerable history, and they are very flexible, but the majority of the time, they are used to count from 0 through N-1 with no complications. When you stop and think about that, a 3-clause for loop requires a lot of machinery and ceremony just to count from 0 through N-1.

As @seebs noted, there are many ways to complicate a 3-clause for loop, some of them not terribly obvious. Here are a few:

for i := 0; i < N; i++ { /* use i */ }
for i := 1; i < N; i++ { /* use i */ }
for i := 0; i <= N; i++ { /* use i */ }
for i := 1; i <= N; i++ { /* use i */ }
for i := 0; i < len(s); i++ { /* use i */ s := s[1:] }
for i := 0; i < fn(x); i++ { /* use i */ }
for i := 0; i < len(s); s = s[1:] { /* use i */ }

It helps readability of programs if the trivial counting form is conventionally written for i := range N instead of having to examine the loop header (and body!) carefully to make sure it really is the trivial counting form.

Later, @danderson made the point perfectly:

So far I've seen a bunch of mild opposition to range-over-int here, so I'll offer a voice in support instead: since reading this proposal, my fingers have wanted to type for i := range N in code I've been writing at least a half dozen times, and I've felt a passing sadness every time as I write out for i := 0; i < N; i++.

This is not in code cherrypicked for this kind of loop, they just came up in code I was writing (an ACL evaluator, and a database layer). This to me is an "emotional" confirmation of what the proposal's data suggests, that this loop form is extremely common. And while the cognitive load of the status quo is small, it's non-zero and accumulates as code grows: the expanded form is shift-heavy to type on qwerty, and it takes a beat for eyes to parse it and confirm that it's just a standard int loop that's not doing something weird with the bounds, increment terms, or messing with the counter in the loop body.

I would like to be able to use range N, somewhat for the easier typing but moreso because it's quicker to read: range N tells me very quickly that there are no surprising semantics in the loop I'm about to read.

I believe this also argues for keeping range-over-int separate from range-over-func, even if range N could be expressed as a stdlib function iterator. Keeping very common code concise is (IMO) more important than the conceptual cleanliness of cramming both into a single new syntax.

The non-trivial counting forms that are common enough can also be abstracted away using range-over-func. Combined, range-over-func and range-over-int should make the vast majority of loops in Go use the range form, which should lead to more regular, easier to read code.

Why not use a library function for range over ints?

That would be another approach, but counting from 0 through N-1 is incredibly common. If it required importing a separate package to get the range form, probably most code would keep using the 3-clause form. We want to do better. See also the quoted comment from @danderson in the previous answer.

Why not add a new syntax for more sophisticated integer ranges?

More sophisticated integer ranges are much less common than 0 through N-1. For those, the balance tips more easily toward custom functions that can be imported and called when necessary. For example, a very common 3-clause loop in some code bases is counting from N-1 down through 0, to iterate backward over a slice. We could provide that countdown as a function, or we could specialize to slices, with something like

package slices

func Backward(s []E) func(func(int, E) bool) bool {
	return func(yield func(int, E) bool) bool {
		for i := len(s)-1; i >= 0; i-- {
			if !yield(i, s[i]) {
				return false
			}
		}
		return true
	}
}

The ability to do this customization for these cases makes custom functions much more attractive. But lots of code just needs to count up, and that should be as easy as possible.

Why not use range over an interface instead of range over functions?

Today, range makes the decision of how to iterate based only on the underlying type of the range variable, not its methods. In fact, no syntax in Go ever makes use of specific methods. The closest exception is the predefined error interface, but even that never has its Error method called by any Go syntax. People often ask for operator methods, and we have avoided doing that for the same reason: we don't want to privilege specific methods. It is of course possible to create a language structured around such methods (Python is one example), but Go is not designed that way. Nothing in the Go language invokes specific method names.

A second design reason for the decision is that it's more difficult to construct implementations of an interface than it is to just write a function literal. A function like slices.Backward above would have to define a special backwardIterator object holding no state just to have a special method. Or some package would need to define a func-based implementation like http.HandlerFunc. It's much easier just to use functions directly and not worry about needing a different type for each kind of iterator.

There is also a technical incompatibility in using interfaces. If a custom map type (say) defined the special iteration method, then ranging over it in current Go would do a regular map iteration, while ranging over it in a future Go would invoke the special iteration method. That would be a breaking change. Perhaps a desired change, but still a breaking one. That concern is secondary to the basic design principles though.

There is more discussion about the downsides of range over interfaces in #54245. The switch to range over functions was meant to address that directly.

What is a simple example of how range over function runs?

Sure. Consider the Backwards function in the previous answer. It can be invoked as:

s := []string{"hello", "world"}
for i, x := range slices.Backward(s) {
	fmt.Println(i, x)
}

This program would translate inside the compiler to a program more like:

slices.Backward(s)(func(i int, x string) bool {
	fmt.Println(i, x)
	return true
})

The "return true" at the end of the body is the implicit "continue" at the end of the loop body. An explicit continue would translate to "return true" as well. A break statement would translate to "return false" instead. Other control structures are more complicated but still possible.

How are more complicated loops implemented?

Beyond simple break and continue, other control flow (labeled break, continue, goto out of the loop, return) requires setting a variable that the code outside the loop can consult when the loop breaks. For example a "return" might turn into something like "doReturn = true; return false" where the "return false" is the "break" implementation, and then when the loop finishes, other generated code would do "if(doReturn) return".

The full rewrite is explained at the top of cmd/compile/internal/rangefunc/rewrite.go in the implementation.

What if the iterator function ignores yield returning false?

We'll get there. Keep scrolling down.

Why are yield functions limited to at most two arguments?

There has to be a limit; otherwise people file bugs against the compiler when it rejects ridiculous programs. If we were designing in a vacuum, perhaps we would say it was unlimited but that implementations only had to allow up to 1000, or something like that.

However, we are not designing in a vacuum: go/ast and go/parser exist, and they can only represent and parse up to two range values. We clearly need to support two values to simulate existing range usages. If it were important to support three or more values, we could change those packages, but there does not appear to be a terribly strong reason to support three or more, so the simplest choice is to stop at two and leave those packages unchanged. If we find a strong reason for more in the future, we can revisit that limit.

Another reason to stop at two is to have a more limited number of function signatures for general code to define. Standard library changes are explicitly out of scope for this proposal, but having only three signatures means a package can easily define names for all three, like perhaps:

package iter

type Seq0 func(yield func() bool) bool
type Seq[V any] func(yield func(V) bool) bool
type Seq2[K, V any] func(yield func(K, V) bool) bool

Why are standard library changes out of scope for this proposal?

This proposal is limited to the language change. If we add standard library changes to the scope, it will be that much harder to converge. There will be other proposals for the standard library. Probably they will include some kind of useful iterator definitions, as well as iterator-returning forms of the functions mentioned earlier, like strings.Split and regexp.Regexp.FindAll. For now, though, please limit discussion to the language change.

We will comment on this issue with pointers to library change proposals when they are posted. We are still working through some details in the draft.

What will idiomatic APIs with range functions look like?

We don't know yet, and that's really part of the eventual standard library proposal. However, you could imagine a container like a binary tree implementing an All method that returns an iterator:

func (t *Tree[V]) All() iter.Seq[V]

We would like to establish a name like that, probably All, as the default "all items" iterator. Specific containers might provide others as well. Maybe a list would provide backward iteration too:

func (l *List[V]) All() iter.Seq[V]
func (l *List[V]) Backward() iter.Seq[V]

These examples are meant to show that the library can be written in a way that should make these kinds of functions readable and understandable. Please don't focus on the exact details. Again, that will be the focus of other proposals, which we will link here when they are ready.

Will Go programs using range over functions be readable?

We think they can be. For example using slices.Backward instead of the explicit count-down loop should be easier to understand, especially for developers who don't see count-down loops every day and have to think carefully through the boundary conditions to make sure they are correct.

It is true that the possibility of range over a function means that when you see range x, if you don't know what x is, you don't know exactly what code it will run or how efficient it will be. But slice and map iteration are already fairly different as far as the code that runs and how fast it is, not to mention channels. Ordinary function calls have this problem too - in general we have no idea what the called function will do - and yet we find ways to write readable, understandable code, and even to build an intuition for performance.

The same will certainly happen with range over functions. We will build up useful patterns over time, and people will recognize the most common iterators and know what they do.

What is the bool result from the iterator function for?

(TL;DR: It is a mistake that hasn't been deleted from the proposal yet.)

The bool result from the iterator function indicates whether the iterator finished. If yield returns false, the iterator should stop and return false. Otherwise it should return true. The intent was that the bool result would make iterator composition easier, but it looks like the only operation it helps is concatenating iterators, as in:

func Concat[V any](seqs ...iter.Seq[V]) iter.Seq[V] {
	return func(yield func(V) bool) bool {
		for _, seq := range seqs {
			if !seq(yield) {
				return false
			}
		}
		return true
	}
}

But this function could just as easily be written:

func Concat[V any](seqs ...iter.Seq[V]) iter.Seq[V] {
	return func(yield func(V) bool) {
		for _, seq := range seqs {
			for _, v := range seq {
				if !yield(v) {
					return
				}
			}
		}
	}
}

So the bool result does not seen entirely justified.

A second example of composition is iterating a binary tree:

func (t *Tree[V]) All(yield func(V) bool) bool {
	return t == nil ||
		t.left.All(yield) && yield(t.value) && t.right.All(yield)
}

Having the boolean result avoids writing a second helper function, because All is the iterator (as opposed to returning the iterator). However, writing programs using these functions, it quickly becomes confusing that sometimes parens are needed to obtain the iterator (range slices.Backward(x)) and sometimes they are not (range t.All). It seems less confusing if every method used in a range statement returns iterators instead of being iterators. It's also more readable, since these methods can explicitly return iter.Seq[V] in their public API instead of spelling out the iterator function signature. That means t.All should return an iterator, not be an iterator, which requires the helper function anyway, and the helper can use a bool even if the standard signature does not:

func (t *Tree[V]) All() iter.Seq[V] {
	return func(yield func(V) bool) { t.all(yield) }
}

func (t *Tree[V]) all(yield func(V) bool) bool {
	return t == nil ||
		t.left.all(yield) && yield(t.value) && t.right.all(yield)
}

When All was itself an iterator (as opposed to returning an iterator), there was a useful pun on the name All in that it could equivalently be defined as:

// All reports whether f returns true for all values in v.
func (t *Tree[V]) All(f func(V) bool) bool

This is like Python's All. However, if All returns an iterator, the opportunity for the useful pun goes away, and so does whatever small nudge it provided for a bool result.

Since all three reasons for a bool result from the iterator don't look terribly compelling, we should probably remove it. The proposal is not yet updated to reflect that.

What if the iterator function ignores yield returning false?

The current prototype does not notice, because the added check was causing problems with inlining, and we wanted to demonstrate that the performance of these range-over-func loops could be competitive with hand-written loops. For the real implementation, we will make sure the check happens, with a panic on misuse, without noticeably hurting performance.

What if the iterator function saves yield and calls it after returning?

The current prototype does not notice. Again, the real implementation will.

What if the iterator function calls yield on a different goroutine?

This is an open question that is not yet decided.

We have gone back and forth about whether this should be allowed. In general Go tries very hard to avoid exposing any notion of goroutine identity, and it would be strange to define that yield must be called on same goroutine that the function iterator was invoked on. It would be stranger still to notice and panic, since that would provide a way to create functions that only run on a single goroutine. Even so, maybe it is worth doing.

What if the iterator function calls yield on multiple goroutines simultaneously?

This is an open question that is not yet decided.

In general that would cause a race, and running the race detector would catch the problem. However, if the loop body does not write to any shared state and does not execute any special control flow other than continue (no break, no return, no goto), that code probably would not have any races. It is unclear whether we should disallow that kind of use (probably) and whether we can catch it efficiently (perhaps not). If we did decide that yield should only be called on its original goroutine, that would also eliminate the possibility of use on multiple goroutines simultaneously.

Can we write a vet check for misuse of a yield function?

We have not looked into this, but almost certainly yes. With the runtime checks, however, it may not be necessary. The initial version probably will not include this check. But maybe.

What do stack traces look like in the loop body?

The loop body is called from the iterator function, which is called from the function in which the loop body appears. The stack trace will show that reality. This will be important for debugging iterators, aligning with stack traces in debuggers, and so on.

Why are the semantics not exactly like if the iterator function ran in a coroutine or goroutine?

Running the iterator in a separate coroutine or goroutine is more expensive and harder to debug than having everything on one stack. Since we're going to have everything on one stack, that fact will change certain visible details. We just saw the first: stack traces show the calling function and the iterator function interleaved, as well as showing the explicit yield function that does not exist on the page in the program.

It can be helpful to think about running the iterator function in its own coroutine or gorotine as an analogy or mental model, but in some cases the mental model doesn't give the best answer, because it uses two stacks, and the real implementation is defined to use one.

What happens if the loop body defers a call? Or if the iterator function defers a call?

If a range-over-func loop body defers a call, it runs when the outer function containing the loop returns, just as it would for any other kind of range loop. That is, the semantics of defer do not depend on what kind of value is being ranged over. It would be quite confusing if they did. That kind of dependence seems unworkable from a design perspective. Some people have proposed disallowing defer in a range-over-func loop body, but this would be a semantic change based on the kind of value being ranged over and seems similarly unworkable.

The loop body's defer runs exactly when it looks like it would if you didn't know anything special was happening in range-over-func.

If an iterator function defers a call, the call runs when the iterator function returns. The iterator function returns when it runs out of values or is told to stop by the loop body (because the loop body hit a "break" statement which translated to "return false"). This is exactly what you want for most iterator functions. For example an iterator that returns lines from a file can open the file, defer closing the file, and then yield lines.

The iterator function's defer runs exactly when it looks like it would if you didn't know the function was being used in a range loop at all.

This pair of answers can mean the calls run in a different time order than the defer statements executed, and here the goroutine analogy is useful. Think of the main function running in one goroutine and the iterator running in another, sending values over a channel. In that case, the defers can run in a different order than they were created because the iterator returns before the outer function does, even if the outer function loop body defers a call after the iterator does.

What happens if the loop body panics? Or if the iterator function panics?

The deferred calls run in the same order for panic that they would in an ordinary return: first the calls deferred by the iterator, then the calls deferred by the loop body and attached to the outer function. It would be very surprising if ordinary returns and panics ran deferred calls in different orders.

Again there is an analogy to having the iterator in its own goroutine. If before the loop started the main function deferred a cleanup of the iterator, then a panic in the loop body would run the deferred cleanup call, which would switch over to the iterator, run its deferred calls, and then switch back to continue the panic on the main goroutine. That's the same order the deferred calls run in an ordinary iterator, even without the extra goroutine.

See this comment for a more detailed rationale for these defer and panic semantics.

What happens if the iterator function recovers a panic in the loop body?

This is an open question that is not yet decided.

If an iterator recovers a panic from the loop body, the current prototype allows it to invoke yield again and have the loop keep executing. This is a difference from the goroutine analogy. Perhaps it is a mistake, but if so it is a difficult one to correct efficiently. We continue to look into this.

Can range over a function perform as well as hand-written loops?

Yes. Consider the slices.Backward example again, which first translates to:

slices.Backward(s)(func(i int, x string) bool {
	fmt.Println(i, x)
	return true
})

The compiler can recognize that slices.Backward is trivial and inline it, producing:

func(yield func(int, E) bool) bool {
	for i := len(s)-1; i >= 0; i-- {
		if !yield(i, s[i]) {
			return false
		}
	}
	return true
}(func(i int, x string) bool {
	fmt.Println(i, x)
	return true
})

Then it can recognize a function literal being immediately called and inline that:

{
	yield := func(i int, x string) bool {
		fmt.Println(i, x)
		return true
	}
	for i := len(s)-1; i >= 0; i-- {
		if !yield(i, s[i]) {
			goto End
		}
	}
End:
}

Then it can devirtualize yield:

{
	for i := len(s)-1; i >= 0; i-- {
		if !(func(i int, x string) bool {
			fmt.Println(i, x)
			return true
		})(i, s[i]) {
			goto End
		}
	}
End:
}

Then it can inline that func literal:

{
	for i := len(s)-1; i >= 0; i-- {
		var ret bool
		{
			i := i
			x := s[i]
			fmt.Println(i, x)
			ret = true
		}
		if !ret {
			goto End
		}
	}
End:
}

From that point the SSA backend can see through all the unnecessary variables and treats that code the same as

for i := len(s)-1; i >= 0; i-- {
	fmt.Println(i, s[i])
}

This looks like a fair amount of work, but it only runs for simple bodies and simple iterators, below the inlining threshold, so the work involved is small. For more complex bodies or iterators, the overhead of the function calls is insignificant.

@gopherbot
Copy link

Change https://go.dev/cl/510540 mentions this issue: runtime: add support for range-over-func

@gopherbot
Copy link

Change https://go.dev/cl/510541 mentions this issue: cmd/compile: implement range over func

@gopherbot
Copy link

Change https://go.dev/cl/510539 mentions this issue: cmd/compile, runtime: make room for rangefunc defers

@gopherbot
Copy link

Change https://go.dev/cl/510538 mentions this issue: cmd/compile: implement range over integer

@gopherbot
Copy link

Change https://go.dev/cl/510536 mentions this issue: cmd/compile: trim range typechecking

@gopherbot
Copy link

Change https://go.dev/cl/510537 mentions this issue: cmd/compile, go/types: typechecking of range over int, func

@gopherbot
Copy link

Change https://go.dev/cl/510535 mentions this issue: spec: changes for range over int, func

@seebs
Copy link
Contributor

seebs commented Jul 17, 2023

The func part seems like it'd be easier to reason about with a usage example showing the body of a function that would work here.

If I've understood this correctly:

func rangeTen(yield func(int) bool) {
    for i := range 10 {
        if !yield(i) { return }
    }
}

...

for i := range rangeTen {
    if i == 6 { break }
    fmt.Println(i)
}

would print 0, 1, 2, 3, 4, and 5, then the call to yield(6) would return false and the loop would terminate?

Is there a particular reason to cap functions at two arguments, rather than just taking arbitrary functions which return bool?

@rsc
Copy link
Contributor Author

rsc commented Jul 17, 2023

@seebs, yes, you're right about the rangeTen example.

If we cap the number of variables at two, then no changes are required in go/ast or go/parser, and libraries that want to support iteration functions know the 3 signatures they need to support.

@seebs
Copy link
Contributor

seebs commented Jul 17, 2023

Hmm. So, basically, break makes yield() return false, and continue makes yield() return true immediately. But now I'm thinking:

outer:
for a := range f1 {
    for b := range f2 {
        ...
        continue outer
    }
}

So... does the yield call in f2 return at all when we reach continue outer? Does anything enforce that the function then does the right thing? My concern is that a function could conceivably have cleanup code which needs to run after the loop, for instance. Or is that prohibited?

Basically, what happens if the user is holding it wrong?

Thinking about it more:

func rangeTen(yield func(int) bool) {
    for i := range 10 {
        if !yield(i) { break }
    }
    // this is a stand-in for "cleanup code we have to do after the loop"
    fmt.Println("got here")
}

...

for i := range rangeTen {
    if i == 6 { break }
    fmt.Println(i)
}

Does this print got here? If it doesn't, the control flow is a little surprising. If it does, this feels noticably unlike what I initially envisioned as "the break, continue, defer, goto, and return statements behave exactly as they do in a loop body ranging over other types". Because on the one hand, I expect the break to just instantly transfer control to just past the loop, but on the other hand, I expect the rangeTen function to actually finish its execution.

@danderson
Copy link
Contributor

The proposed spec change says As with range over other types, it is permitted to declare fewer iteration variables than there are iteration values.

I read that to mean: if I'm writing a library that provides a map-ish data structure, and I want to allow the consumer to do all of for range ..., for k := range ... and for k, v := range ..., I would only have to implement one iterator function, Iter(yield func(K, V) bool), and unnecessary yielded values will be discarded to fit the caller's desires.

Is that correct and the way you'd expect library authors to work with this language feature? Or are we going to end up having to provide IterFoo0, IterFoo1 and IterFoo2 variants for everything?

@cespare
Copy link
Contributor

cespare commented Jul 17, 2023

I, too, would find it easier to understand the proposal with more examples. Perhaps you can give some examples of the code you're referring to with:

Being able to range over a function enables writing range over user-specified iteration logic, which should simplify many programs.

and

It should also help make use of custom collection types nicer.


Do you think that the implementation of this feature will be able to be faster than 1 full function call per iteration?

@rittneje
Copy link

@seebs In your example:

func rangeTen(yield func(int) bool) {
    for i := range 10 {
        if !yield(i) { return }
    }
    fmt.Println("got here")
}

...

for i := range rangeTen {
    if i == 6 { break }
    fmt.Println(i)
}

Did you mean to use break in rangeTen instead of return?

@seebs
Copy link
Contributor

seebs commented Jul 17, 2023

... yes.

@apparentlymart
Copy link

apparentlymart commented Jul 17, 2023

I was just writing some custom collection types last week and had no good solutions for dealing with iteration over members, so this proposal is timely!

The ability to use a function with range does seem like it would fill the gap, but I'd like to see an example in the proposal of what you'd consider to be an idiomatic use of range-over-function with a hypothetical collection type. Specifically, I'm interested to see your viewpoint on:

  • Should the function used with range typically be a package-level function which takes the collection as an argument, or a method where the receiver argument is therefore implied?
  • What would you consider to be a good name for such a function?

Or, more concretely, what exactly would the for ... range loop for this collection type look like at the call site, including a realistic hypothetical name?


For the sake of example I'm thinking about a hypothetical "set of T" type; let's call it type Set[T any]. If I have a Set[string] value in a variable s, what might that look like at the callsite?

As a method func (s Set[T]) Range(func yield func(T) bool):

for name := range s.Range {
    // ...
}

...or as a function in package sets, func Range[T comparable](s Set[T])() func yield func(T) bool:

for name := range sets.Range(s) {
    // ...
}

Is Range a good name to use? I'm not sure!

Edit: when I first wrote this I had a thinko and for some reason wrote Rust-style for loops. I've edited it to now actually use Go for ... range loops. Sorry for the misfire!


Edit edit: A later rsc comment implicitly answers this question without directly answering it. 😀

From that comment I infer that:

  • Methods are the most likely way to offer "iterator functions" for a collection type, rather than package-level functions that take a collection and return another function. In practice that seems to mean using a method value as the range operand, so that it's automatically bound to the receiver.
  • The method should be named after what set of values the iterator will expose and, where relevant, what order it will do that in. For example, s.All for "all elements of the set".
for name := range s.All {
    // ...
}

That does seem pretty reasonable to me, though intentionally not including anything in the name representing "iterator" or "range" does mean that someone looking at the godoc for a package will presumably just need to be familiar with the pattern that any function which takes a yield function as its only argument is intended for use with for ... range. That doesn't seem like a show-stopper though, and presumably package authors will write doc comments explaining what these methods are intended to be used for.

@neild
Copy link
Contributor

neild commented Jul 17, 2023

... yes.

In that case, this prints 0, 1, 2, 3, 4, 5, "got here".

(You can test this with the draft implementation. Although that seems to require rangeTen to return bool, which is not in the proposal.)

The "got here" will execute if the range rangeTen exits due to finishing the range, return, break, or goto. It won't execute if it exits due to a panic, but a defer in rangeTen will still execute in that case.

@jimmyfrasche
Copy link
Member

It would be great if there were a playground for this. Building a custom tool chain isn't too hard but it's fairly involved if it's not something you've done before and even if you have it's a bit much if you want to just try 2 or 3 small things.

@seebs
Copy link
Contributor

seebs commented Jul 17, 2023

This seems like a possibly-surprising thing, because I think my current intuition is that if I'm doing a loop, and i execute break, i will just unconditionally and immediately be outside the loop. The category of "code which can be executed after a break but before execution resumes past the loop" is new, I think? I think it's necessary, but it's slightly surprising. I think less surprising than the alternative of not returning from yield would be.

But then I'm uncertain what I expect to happen in the continue outer case. Because that feels very much like it should just be causing the outer loop's yield to return true... So it's sort of like an implicit break from any loops between the current execution and the loop we're continuing, but nothing else between those happens. So we sort of go up the line causing yield to return false, and as soon as the range-iteration function that called it exits, we jump on to the next one and cause its yield to return false also?

@thepudds
Copy link
Contributor

thepudds commented Jul 18, 2023

It would be great if there were a playground for this. Building a custom tool chain isn't too hard but it's fairly involved if it's not something you've done before

Agreed a playground would be nice, but FWIW, it is pretty straightforward to try out a CL just by using the gotip wrapper utility, which will install a parallel Go toolchain in $HOME/sdk/gotip (without impacting your main Go toolchain).

To try out these changes with the CL Russ suggested:

$ go install golang.org/dl/gotip@latest
$ gotip download 510541            # download and build CL 510541              
$ gotip version                                       
$ gotip run foo.go                 # use 'gotip' in place of the 'go' command

@ianlancetaylor
Copy link
Contributor

func rangeTen(yield func(int) bool) {
    for i := range 10 {
        if !yield(i) { break }
    }
    // this is a stand-in for "cleanup code we have to do after the loop"
    fmt.Println("got here")
}

for i := range rangeTen {
    if i == 6 { break }
    fmt.Println(i)
}

Does this print got here?

Yes. The break statement in the range rangeTen loop will cause the yield function called by rangeTen to return false.

@ianlancetaylor
Copy link
Contributor

@danderson

I read that to mean: if I'm writing a library that provides a map-ish data structure, and I want to allow the consumer to do all of for range ..., for k := range ... and for k, v := range ..., I would only have to implement one iterator function, Iter(yield func(K, V) bool), and unnecessary yielded values will be discarded to fit the caller's desires.

Yes.

@ianlancetaylor
Copy link
Contributor

This seems like a possibly-surprising thing, because I think my current intuition is that if I'm doing a loop, and i execute break, i will just unconditionally and immediately be outside the loop. The category of "code which can be executed after a break but before execution resumes past the loop" is new, I think? I think it's necessary, but it's slightly surprising. I think less surprising than the alternative of not returning from yield would be.

There are two different loops here, and they should be considered separately.

rangeTen is just a function that calls yield up to ten times until it returns false. If yield returns false (and doesn't panic or call runtime.Goexit) then the loop in rangeTen just keeps going as usual. The fact that rangeTen happens to be invoked by another for range loop is irrelevant. rangeTen acts like ordinary Go code.

What is new here is that if the for range loop over rangeTen exits early, then yield in rangeTen returns false. But other than that detail, rangeTen is just ordinary Go code.

And, yes, if we break out of multiple for range loops over push functions at once, then the yield function for each of those push functions will return false, and when those functions have returned we will continue with the break in the outer loop. If those functions fail to return, that would be weird, but the program will just carry on until they eventually do return.

@ianlancetaylor
Copy link
Contributor

Here is a possibly illustrative (but untested) example.

// Tree is a binary tree.
type Tree[E any] struct {
	val         E
	left, right *Tree
}

// All may be used in a for/range loop to iterate
// through all the values of the tree.
// This implementation does an in-order traversal.
func (t *Tree[E]) All(yield func(E) bool) {
	t.doAll(yield)
}

// doAll is a helper for All, to make it easier
// to know when the iteration stopped in a subtree.
func (t *Tree[E]) doAll(yield func(E) bool) bool {
	if t == nil {
		return true
	}
	return t.left.doAll(yield) && yield(t.val) && t.right.doAll(yield)
}

Here is an example of using that method to sum all the values in a tree.

// SumTree returns the sum of all the elements in t.
func SumTree(t *Tree[int]) int {
	s := 0
	for v := range t.All {
		s += v
	}
	return s
}

@seebs
Copy link
Contributor

seebs commented Jul 18, 2023

Hmm. Okay, so, the actual implementation of break foo or continue foo might be more complex, but this is expressed as "if a loop is supposed to terminate, its yield function returns false". But you may then be in a state where, once the function returns, you immediately cause the next yield function in the sequence to return. But that is implementation-difficulty, it's invisible to the user; the break/continue just work as expected.

Perhaps it should be specified what, if anything, happens if you call yield again after it returned false; I think I'd expect a runtime panic?

@jba
Copy link
Contributor

jba commented Jul 18, 2023

The proposal should specify what happens if the yield function escapes and then is called.

@rittneje
Copy link

@rsc I built the toolchain from your branch as per your instructions (using make.bash) but it doesn't work.

$ ./go/bin/go version
go version devel go1.21-3bf145578f Mon Jul 17 17:18:06 2023 -0400 linux/amd64
$ ./go/bin/go run range.go 
# command-line-arguments
./range.go:8:17: cannot range over 5 (untyped int constant)

range.go

package main

import (
	"fmt"
)

func main() {
	for i := range 5 {
		fmt.Println(i)
 	}
}

What am I missing?

@qiulaidongfeng
Copy link
Contributor

I successfully compiled using gotip download 510541 and output the expected results.
Code copy for @rittneje

package main

import (
	"fmt"
)

func main() {
	for i := range 5 {
		fmt.Println(i)
 	}
}

My go version output go version devel go1.21-3bf145578f Mon Jul 17 17:18:06 2023 -0400 windows/amd64.

@rsc
Copy link
Contributor Author

rsc commented Aug 30, 2023

I believe the open issues of discussion remaining here are:

  1. Whether to allow writing fewer range variables than there are yield function arguments. The current text says it is allowed; I believe we should disallow it, to enable the pattern of iterating with an error.
  2. What if the iterator function calls yield on a different goroutine? If properly sequenced this is probably OK. If we disallow it, we create functions that are locked to specific goroutines, which would be new and unfortunate.
  3. What if the iterator function calls yield on multiple goroutines simultaneously? This should probably be disallowed, and compilation with -race should probably diagnose it.
  4. What if the iterator function recovers a panic in the loop body? This should probably be disallowed, but we need to figure out how to do it efficiently enough.

Are there any other open issues that need to be resolved?

@bcmills
Copy link
Member

bcmills commented Aug 30, 2023

What if the iterator function recovers a panic in the loop body?

Ideally I think we should define that question away, by having yield return false instead of making the panic visible to the iterator. Then the panic sequence would be:

  1. Loop body begins panicking.
  2. Control transfers to the iterator and yield returns false.
  3. The iterator returns normally.
  4. Control transfers back to the caller, which then propagates the panic.

(For me that's the ideal semantics, but I realize that implementation concerns could reasonably lead to a different definition, and I think other options could be reasonable too.)

@DeedleFake
Copy link

DeedleFake commented Aug 30, 2023

A few comments about only allowing a single loop variable. This is untenable to me, because it means you can't write something like slices.Backward (#61899) that makes iterating backward over a slice as convenient as iterating forward over a slice.

My biggest issue is having two versions of iterators supported. If the decision is that two-value iterators are absolutely necessary, I would prefer that the single-value version be removed instead.

Edit: Actually, I'd be fine with some form of variadic generics capable of making the point moot, but that seems unlikely to happen before iterstors do.

@carlmjohnson
Copy link
Contributor

My biggest issue is having two versions of iterators supported.

Don't you think it will be enough to have helper functions to turn a two value iterator into a one value iterator and vice versa? How much duplication will there be in practice? It's not like writing a pre-generic data structure where it's a pain because there are arbitrarily many different types you might want to plug in. There are three sizes—0, 1, and 2—and it doesn't really make sense to chain the 0 value form. The 2 and 1 value forms can be bridged with generic helpers. Other than the ugliness of some mild duplication and helper calls, is it untenable?

@DeedleFake
Copy link

Other than the ugliness of some mild duplication and helper calls, is it untenable?

No, it's not untenable, but it has turned out to be quite a bit more annoying than I expected it to be as I've been implementing a ton of iterator-related functionality to play around with. It's not a huge problem by any means, but it has led to a fair bit of awkwardness, despite the helper functions.

@gazerro
Copy link
Contributor

gazerro commented Aug 30, 2023

@rsc has finally been decided whether the bool value returned by the iterator function will be removed?

@gazerro
Copy link
Contributor

gazerro commented Aug 30, 2023

Iterators over two variables, where the first value is the index, could be written as iterators over a single variable if the use of range with two variables were allowed in this case.

So, if f has the type func(yield func(T) bool) bool, in addition to writing:

for v := range f

it would be allowed to write:

for i, v := range f

where i would be automatically incremented for each iteration.

@rsc
Copy link
Contributor Author

rsc commented Aug 30, 2023

@gazerro, yes, the bool value returned by the iterator function is gone.

@fzipp
Copy link
Contributor

fzipp commented Aug 30, 2023

where i would be automatically incremented for each iteration

@gazerro
For slices.Backward the index is supposed to count backwards as well.

@gazerro
Copy link
Contributor

gazerro commented Aug 30, 2023

For slices.Backward the index is supposed to count backwards as well.

Yes, slices.Backward will remain a two value iterator.

@jimmyfrasche
Copy link
Member

if I had some unreleased code I was refactoring

range x := range f() {

where f return a map that's legal but if I change it to return a 2 item iterator now it's a compiler error? That seems a bit harsh, especially if in this case I can ignore the second value. I could rewrite it to

range x, _ := range f() {

but it's weird that I have to do that when it's not how it works for the built in types.

I made this comment earlier somewhere (can't find it now between github hiding comments and there being multiple iterator-related issues) but I don't really see having the second value being an error helps unless you plan on immediately returning mid loop like

for v, err := range f() {
  if err != nil {
    return err
  }
  // loop code

if you actually need to do anything else it becomes quite awkward:

var v T
var err error
for v, err = range f() {
  if err != nil {
    break
  }
  // loop code
}
if err != nil {
  // handle the failure in the loop

(it does make sense to have the second value be an error when individual items in the sequence can have errors that do not imply that the iteration itself failed)

Given that, I don't think the language should do anything special to encourage something that may become an idiom or may become one of those things that everyone tried but it just didn't really pan out in practice so we do something else now.

@Merovius
Copy link
Contributor

Merovius commented Aug 30, 2023

@gazerro Not every iteration key is an integer. For a map-like datastructure (maps iterating in insertion order are an often requested feature for Go and one of the most anticipated use-cases for generics), the key-type can be ~anything.

[edit] I now see what you are suggesting - not to remove two-variable iterators, but to expand two-variable iteration to one-variable iterators. TBQH I don't see the value in that - surely, it is trivial to provide a helper to do that. [/edit]

@gazerro
Copy link
Contributor

gazerro commented Aug 30, 2023

@Merovius For those implementing an iterator that iterates over individual values, a question arises regarding whether to implement it with a single-value iterator or with a two-value iterator. Implementing it with two values would allow, in usage, to have a variable, often useful, with the iteration index starting from zero.

My proposal aims to avoid this uncertainty, simplifying the iterator's implementation and leaving it to the user to decide whether to declare the index variable or not.

@AndrewHarrisSPU
Copy link

@gazerro

For those implementing an iterator that iterates over individual values, a question arises regarding whether to implement it with a single-value iterator or with a two-value iterator.

When order of appearance is not deterministic, the language has had a clear answer. Emphatically, it works to eliminate inadvertent dependencies on order of appearance. I think that suggests not including implicit behavior to generate an order of appearance index from a 1-ary iterator.

@Merovius
Copy link
Contributor

@gazerro I still don't understand. If I write

func WithIndex[T any](s Seq[T]) Seq2[int, T] {
    return func(yield (int, T) bool) {
        i := 0
        for v := range s {
            if !yield(i, v) { return }
            i++
        }
    }
}

Then how is for i, v := range WithIndex(s) { … } different from your proposed for i, v := range s { … }? So given that we can easily write WithIndex (and provide it as a library function), why do we need to add language-level behavior for this?

@gazerro
Copy link
Contributor

gazerro commented Aug 31, 2023

@Merovius In summary, give s a one-value iterator, with this proposal, we have two options

i := 0
for v := range s {
    i++
    ...
}
for i, v := range xiter.WithIndex(s) {
    ...
}

With my additional proposal, we have

for i, v := range s {
    ...
}

So, the question is whether this situation comes up often enough to add a language-level behavior to this, also considering that it might be added later based on the experience.

The other question I still have is whether those who implement an iterator would be inclined to create a two-value iterator, instead of a one-value iterator, solely to enable writing for i, v := range s in its usage.

@willfaught
Copy link
Contributor

Lots of comments about library decisions. I disagree with the people who say we should tangle these discussions all together. Separating the language change from the library change lets us consider the simple idea of range over function separately from the idioms that we will build around it.

@rsc If you mean feedback about this proposal's design based on how it plays out in library proposals, then I don't think that's a good idea. You'd be ignoring valuable feedback about how well this design works in practice.

A few comments about only allowing a single loop variable. This is untenable to me, because it means you can't write something like slices.Backward (#61899) that makes iterating backward over a slice as convenient as iterating forward over a slice.

I don't think that's true, as I've demonstrated above or elsewhere (can't remember). You can use a Pair type for equivalent functionality. It would be

func Backward[Slice ~[]Elem, Elem any](s Slice) iter.Seq[Pair[int, Elem]]

@DeedleFake
Copy link

You can use a Pair type for equivalent functionality.

Giving the Pair type a method to help could even make splitting it fairly straightforward:

// Not exporting the fields forces checking errors the same as the two-value iterators do.
type Pair[T1, T2 any] struct {
  v1 T1
  v2 T2
}

func Wrap[T1, T2 any](v1 T1, v2 T2) Pair[T1, T2] {
  return Pair[T1, T2]{v1: v1, v2: v2}
}

func (p Pair[T1, T2]) Split() (T1, T2) { return p.v1, p.v2 }

And then elsewhere:

for cur := range SeqWithErrors() {
  v, err := cur.Split()
  // ...
}

Or, actually, as long as iterator functions are being added, maybe just also add a rule that an iterator that yielda structs of certain formats can be automatically split by range.

Either way, using a second level of type to denote two-value iteration completely removes all of the problems with the need to support two different types of iterators in all other APIs and/or call conversion methods to go back and forth all the time.

@gazerro
Copy link
Contributor

gazerro commented Aug 31, 2023

Allow me to make an additional proposal regarding the use of one-value iterators with ranges.

A slice s can be used with range in three ways

for range s { ... }

for i := range s { ... }

for i, v := range s { ... }

Why not adopt the exact same syntax for one-value iterators as for slices?

In various use cases, one-value iterators will be used instead of slices. So, why have two different syntaxes for range?

For example, for functions like strings.Split and strings.SplitSeq (as proposed), it would be more intuitive if they were interchangeable in a range.

For example, we can write

for _, v := range strings.Split(s) { ... }

for i, v := range strings.Split(s) { ... }

but with the current proposal, for the strings.SplitSeq function, we would instead have to write

for v := range strings.SplitSeq(s) { ... }

i := 0
for v := range strings.SplitSeq(s) {
  i++
  ... 
}

@DeedleFake
Copy link

DeedleFake commented Aug 31, 2023

@gazerro

Or you could write

seq := iters.Enumerate(strings.SplitSeq(s, sep))
for i, v := range seq {
  // ...
}

@meling
Copy link

meling commented Aug 31, 2023

(Edit: I retract this suggestion... I now think it is a bad idea.)

If the language construct were to provide the index for a single-value iterator function, then one idea is to adopt the current slice-behavior:

for range strings.SplitSeq(s) { ... } // no way to access index or value
for i := range strings.SplitSeq(s) { ... } // no way to access value *)
for i, v := range strings.SplitSeq(s) { ... }
for _, v := range strings.SplitSeq(s) { ... } // only interested in value

*) This is less useful than iterating over a slice because with the slice you can still access the value at index i.

This would be consistent with what we have for slices now. I'm not so sure I like it though... just putting it out there.

@Merovius
Copy link
Contributor

@meling Apart from the fact that the proposal probably not allow to implicitly ignore values (i.e. the first two forms from that list are disallowed), that seems to be exactly the same as what the WithIndex/Enumerate helper allows. And if we don't want to allow ignoring values in general, I don't see why we'd want to allow it in this case. So if anything, that speaks against this change - it makes it far more confusing in which situations you have to assign to _ and in which you don't.

To be clear

  1. I don't think this facility is super helpful in any case. If the underlying data structure allows retrieval by integer-index, it should provide a two-variable iterator - otherwise, I don't think the index is very often useful.
  2. If we do deem it super helpful, we can trivially support it using a library function.
  3. If we deem it so helpful that it deserves language-level support, it would be backwards-compatible to do that later. So we should still start with a library function, to assess the need.

I don't think there is a reasonable case to modify this proposal with that behavior right now.

@DeedleFake
Copy link

This would be consistent with what we have for slices now.

There's no particular reason why this has to be consistent with the indexing behavior of slices. range already works with not only slices but also maps, arrays, and channels. Of those, arrays are the only ones that behave like slices and they do so not for consistency but because they are a lot like slices anyways and the same behavior makes sense. Maps iterate similarly to slices at a glance, but with keys instead of indices, so they're only consistent by analogy because 'index' would make no sense with a map.

Channels, on the other hand, don't even have a two-value range because there's no equivalent of an index at all for channels and it would make zero sense to introduce it simply because a completely different data structure has it. range-over-func is closest to the existing usage with channels, so if it should be consistent with anything it should probably be that one, though that doesn't entirely preclude the proposed two-value iterators by any means. But introducing an optional index for a sequence that isn't indexable seems very random.

@meling
Copy link

meling commented Aug 31, 2023

Yeah. I agree with your excellent points @Merovius and @DeedleFake ...

@gazerro
Copy link
Contributor

gazerro commented Aug 31, 2023

@Merovius

if we don't want to allow ignoring values in general, I don't see why we'd want to allow it in this case

We are avoiding the ability to ignore values because two-value iterators can return an error as second value. Consequently, we have extended this behavior to all iterators because it is consistent with the rest of the specification on iterators.

Making one-value iterators, for ranges, more similar to slices, this similarity among iterators is no longer present.

@Merovius
Copy link
Contributor

Merovius commented Aug 31, 2023

@gazerro It would be great if you could, if you insist on continuing this, answer the question of why we wouldn't just start out with a library function, if we actually thought this is a good idea.

I don't understand what you are trying to say, but I genuinely don't think it matters - because we shouldn't add a language-level mechanism for something we can evaluate with a library function first. There's just no way this is a good idea to do this right now, before at least trying out if people want it.

@gazerro
Copy link
Contributor

gazerro commented Aug 31, 2023

@Merovius Since I made two different proposals regarding the use of one-value iterators, I apologize if this may have caused some confusion. I'm ok to start with a library function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Active
Development

No branches or pull requests