Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec: allow range-over-func to omit iteration variables #65236

Closed
rsc opened this issue Jan 23, 2024 · 70 comments
Closed

spec: allow range-over-func to omit iteration variables #65236

rsc opened this issue Jan 23, 2024 · 70 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Proposal Proposal-Accepted
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Jan 23, 2024

In discussion during the implementation of #61405 we changed range-over-func to require mentioning iteration variables. The idea is that if we do end up with idioms like:

for line, err := range FileLines("/etc/passwd") {
	...
}

Then we want to diagnose:

for line := range FileLines("/etc/passwd") {
	...
}

as an error. However, this is inconsistent with other range loops and also not what the #61405 text said. Probably we should change it. Starting a separate proposal for that discussion.

@earthboundkid
Copy link
Contributor

What if it's just a vet check that looks for errors not silenced with , _? Then you could still write for key := range mapish.All().

@timothy-king
Copy link
Contributor

I don't think I understand what is being proposed. Doesn't the spec text of #61405 already "allow range-over-func to omit iteration variables"?

The example :

For example if f has type func(yield func(T1, T2) bool) bool any of these are valid:

for x, y := range f { ... }
for x, _ := range f { ... }
for _, y := range f { ... }
for x := range f { ... }
for range f { ... }

The proposed text:

As with range over other types, it is permitted to declare fewer iteration variables
than there are iteration values.

Anyways I am confused about what is being proposed here.

for line, err := range FileLines("/etc/passwd") {

If we think there is a subset of functions like FileSet that need to not ignore one of the values in RangeStmts, this seems like it could fits in well with the unusedresult checker in vet. I am not sure it makes sense to enforce usage at the language level.

@jimmyfrasche
Copy link
Member

I don't see how this buys anything other than ceremony.

@jba
Copy link
Contributor

jba commented Jan 23, 2024

We know that the design of bufio.Scanner, which requires a call to the Err method after iteration, is flawed. We have empirical evidence that people forget the call.

We can guess that range-over-func will become the dominant way to iterate over all sorts of sequences where failure is possible: file lines, DB rows, RPCs that list resources, and so on.

I'm not saying that we should require the iteration variables, but I am saying that if we don't, we need a good story for how to avoid introducing many bugs into Go code. Like a vet check for when the second return type is error that is enabled for testing.

@meling

This comment was marked as outdated.

@jimmyfrasche
Copy link
Member

I agree that not handling the error is a real problem. I do not think that iterators yielding errors solves that problem very well, especially if that means having to immediately buttress that in a way that forces everyone to use range-over-func differently than native iterators. Even if I'm wrong and yielding errors is a boon, then static analysis is more than capable of linting it only in the case when an error is iterated over.

@jba here's https://pkg.go.dev/bufio#example-Scanner.Bytes rewritten to use a hypothetical iterator that returns ([]byte, error): https://go.dev/play/p/G-Wv80AbfqF I don't think having the error handling in the loop is an improvement. One of the alternatives I saw posted was for the iterator factory to take a *error. I didn't much like the idea initially but it has grown on me: https://go.dev/play/p/DdqILPDt9B7

@meling
Copy link

meling commented Jan 24, 2024

The only benefit of passing an *error to the iterator as mentioned @jimmyfrasche is that you must declare and pass an error to the iterator, but you can still forget to check it after the loop. More problematic though is that the pattern prevents you from logging/handling individual errors.

I see a few possible patterns related to errors:

for line, err := range FileLines("/etc/passwd") {
    if err != nil {
        // log or handle error
	break
    }
    ...
}
for line, err := range FileLines("/etc/passwd") {
    if err != nil {
	return err
    }
    ...
}

Both of these are IMO better solutions than handling the error after the loop as with the bufio.Scanner example, as the error handling happens immediately after the call producing the error.

The other possible pattern (if-err-continue) is:

for line, err := range FileLines("/etc/passwd") {
    if err != nil {
        // ignore errors
	continue
    }
    ...
}

If the intent of the developer is to ignore errors, then the above could instead be written as (same semantic meaning):

for line, _ := range FileLines("/etc/passwd") {
    ...
}

The if-err-continue pattern above could be detected by a tool (gopls) and a rewrite to the simple form could be suggested (similar to some other patterns).

However, I would advice against allowing the following pattern if the iterator returns an error, because the _ is a more explicit signal that ignoring the error is intentional:

for line := range FileLines("/etc/passwd") {
    ...
}

If you want to handle/log individual errors and continue, you would still need to use this pattern:

for line, err := range FileLines("/etc/passwd") {
    if err != nil {
        // handle or log error
	continue
    }
    ...
}

Handling and logging individual errors is not possible with the handling after the loop approach.

@earthboundkid
Copy link
Contributor

We have empirical evidence that people forget the call.

Looks like they finally fixed it, but for months the marketing page for Copilot showed code with a missing call to .Err().

@jimmyfrasche
Copy link
Member

@meling To clarify:

I fully support yielding (T, error)—when the error is per iteration.

I oppose an error that signals an abnormal end to (or failure to start) the iteration being handled within the loop.

In other words, I think if you can continue on err != nil it's fine, but if you MUST break/return it's not. (Presumably in the latter case the code would still accidentally work correctly if you continued as the iteration would stop but that's not a great argument in the pattern's favor).

@seh
Copy link
Contributor

seh commented Jan 25, 2024

What about an iterator that's using a context.Context internally to govern its fetching of additional values? If the Context is done, do you think that the iterator should yield the result of Context.Err? If so, should it respect that call to yield returning true and attempt to continue, or should it ignore yield's return value?

I've been writing such an iterator, and I've changed my mind about the answers to these questions several times now.

@AndrewHarrisSPU
Copy link

What about an iterator that's using a context.Context internally to govern its fetching of additional values?

This is a really interesting question - to me a practically useful solution has been to delay passing context.Context far enough that it isn't deeply internal. In other words, don't pass around an iterator with an internal context, pass around a func(context.Context) -> iter.Seq and a context.Context as far as possible, so that the base context.Context is in scope where it should be checked. If a computation using iterated values needs to branch, the context should be in scope.

@seh
Copy link
Contributor

seh commented Jan 26, 2024

In other words, don't pass around an iterator with an internal context, pass around a func(context.Context) -> iter.Seq and a context.Context as far as possible, so that the base context.Context is in scope where it should be checked.

That's already the case for my approach, but it doesn't obviate the question of whether the iterator should yield Context.Err when it detects that the Context is done.

@thediveo
Copy link

That's already the case for my approach, but it doesn't obviate the question of whether the iterator should yield Context.Err when it detects that the Context is done.

What would be a reason for the iterator to not return the ctx.Err()?

@Merovius
Copy link
Contributor

This seems certainly an interesting API design question, but I don't think it's something that needs to be answered here. If it doesn't yield an error, then there is no question. If it does, that doesn't seem different from any other error, as far as this issue is concerned.

Personally, I'd just leave that up to the third party package, in any case, if there is no clearly correct answer.

@seh
Copy link
Contributor

seh commented Jan 27, 2024

What would be a reason for the iterator to not return the ctx.Err()?

It could just stop the iteration upon detecting that the Context is done, and leave it to the caller to discern that may have been the cause. Supplying the Context as a governor for how long to try acquiring more items means that the caller expects that the effort may need to cease prematurely.

@stevedalton
Copy link

stevedalton commented Feb 1, 2024

Maybe pure blasphemy, but

for err, line := range FileLines("/etc/passwd") {...}

helps force the issue for error handling.

@ianlancetaylor
Copy link
Contributor

It's a moderately common mistake to write for v := range s when the author really meant for _, v := range s. This mistake is undetected at compile time if s is []int. Perhaps the mistake was in permitting range variables to be omitted at all. We can't change that for the existing types, but we don't need to repeat the mistake for the new function types.

@szabba
Copy link

szabba commented Feb 10, 2024

It probably could be changed the way loop iteration variable semantics were changed?

Different types being iterated over having different rules for omitting loop variables feels very surprising. I think if it gets addressed, it should be addressed for all types that can be iterated over in for ... range.

@isgj
Copy link

isgj commented Feb 11, 2024

We can't change that for the existing types, but we don't need to repeat the mistake for the new function types.

Then it was just re-introduced with range-over-int

for range 5 {
        ...
}

(errors are simple values, nothing special)

+1 for the vet rule though

@jba
Copy link
Contributor

jba commented Feb 20, 2024

Here is an experience report that bears on this issue.

Previously, I advocated that iterators that can fail should return iter.Seq2[T, error], so loops would look like

for t, err := range seq {...}

Over the weekend I converted a hobby project to use the experimental range-over-func feature. Part of the project is a persistent map: Map[K, V any]. Since items are read from disk, there is always the chance of an error.

I started by defining

func (*Map[K, V]) Keys() iter.Seq2[K, error]
func (*Map[K, V]) Values() iter.Seq2[V, error]
func (*Map[K, V]) Items() iter.Seq2[Item[K, V], error]

As I anticipated, to iterate over keys-value pairs I would need a type for those pairs, since there is no 3-variable range statement. That was always going to be a little awkward. In my mind I was imagining loops like

for item, err := range productsByID.Items() {
    if err != nil {...}
    fmt.Printf("id=%d, product=%+v\n", item.ID, item.Product)
}

where I could at least use good names for the keys and values. But these maps are generic, which means Item[K, V] is generic, which means the fields are never going to be any better than Key and Value. At this point we've gone from

for id, product := productsByID  // normal map iteration

to

for item, err := range productsByID.Items() {
   if err ...
   ... item.Key ... item.Value
}

And that started to really feel ugly. So I started looking for another way. First I used pointer-to-error arguments, as mentioned above:

func (*Map[K, V]) Items(errp *error) iter.Seq2[K, V]

That's an improvement, but it still made me uncomfortable. For one thing, although you do have to pass an argument, the compiler doesn't help you check it—it considers passing the pointer to be using the variable, so it doesn't fail with "err declared and not used" if you don't check err.

Another problem is that you might check the error too soon:

var err error
seq := productsByID.Items(&err)
for id, product := range seq {
   if err != nil {...}
  ...
}

That's definitely a mistake—err will always be nil inside the loop—but nothing catches it.

Where I ended up was something that is obvious in hindsight:

func (Map[K, V]) Items() (iter.Seq2[K, V], func() error)

You use it like this:

seq, errf := productsByID.Items()
for id, product := range seq {
    ...
}
if err := errf(); err != nil {
   return err
}

(Like the pointer-to-error approach, this assumes that the first error ends the loop.)

This might look like it still has the problems of bufio.Scanner. But it doesn't, not quite. It's unfortunate that the error check is far away from the source. But unlike Scanner.Err, which doesn't appear in your code unless you write it, here we have errf, which the compiler will complain about if you don't use it. (Two caveats to that: if this is your second loop in a function and you've already defined errf, you won't get an error if you don't use the second one; and the compiler will only insist that you use errf, not call it.)

Having the compiler complain about an unused errf is a fix for the first problem of error pointers: no notice if you don't check them. And we can fix the second problem, checking the error too early: since errf is a function, we can make it panic if it's called before the end of the loop.

Why is this obvious in hindsight? Remember that iter.Seq is a function type. Just as we move from sequences:

itemSlice := m.ItemsSlice() // returns a slice (a sequence)

to functions over sequences:

itemSeq := m.Items() //  returns a function over a sequence

so we move from errors:

itemSlice, err := m.ItemsSlice()

to functions over errors:

itemSeq, errf := m.Items()

A Haskell programmer would say that we've "lifted" our values into the space of functions. (I think.)

For every utility function that "collapses" a sequence to a value like a slice, we can have a variant that takes an error function:

func ToSlice[T any](iter.Seq[T]) []T
func ToSliceErrf[T any](iter.Seq[T], func() error) ([]T, error)

So simple uses of iterators, ones where you just want the get the slice (or map or whatever), still can be simple:

itemSlice, err := ToSliceErrf(productsByID.Items())

For many other functions on iterators, like Map, Filter and Reduce, getting the error out of the iter.Seq means that we don't need error-aware variants of those. If there is an error, they will just observe a shorter sequence and stop early.

Unfortunately functions that take multiple sequences will need special attention. For example, consider the function that Python calls Chain, which takes a list of sequences and returns their concatenation. If the first sequence fails, Chain will just move on to the next. In general, there is no safe time to call the error function for the initial sequence until the whole chain finishes. I haven't thought carefully about the best approach for those kinds of functions. But this problem isn't severe enough to make me want to give up on error functions.

Writing an iterator that returns an error function involves some boilerplate that we can encapsulate in a type:

type ErrorState ...

func (es *ErrorState) Done() // indicates the iteration is done
func (es *ErrorState) Set(err error)  // sets the error if err != nil
func (es *ErrorState) Func() func() error // returns an error function

Here is how you would use it to write an iterator over the lines of a file:

func Lines(filename string) (iter.Seq[string], func() error) {
	var es ErrorState
	return func(yield func(string) bool) {
		defer es.Done()
		f, err := os.Open(filename)
		if err != nil {
			es.Set(err)
			return
		}
		defer f.Close()
		s := bufio.NewScanner(f)
		for s.Scan() {
			if !yield(s.Text()) {
				break
			}
		}
		es.Set(s.Err())
	}, es.Func()
}

@Merovius
Copy link
Contributor

@jba IMO that is still problematic. Note that an iter.Seq can be called (iterated over) multiple times. To me, your "error-function return" doesn't really tell the user how errf calls correspond to iterations. That is, if I iterate over the returned iter.Seq twice, is the errf re-used? I'll note that it doesn't get reset in your ErrorState abstraction, though that can be fixed by adding a es.Set(nil) at the top of the function-literal.

Not that this isn't also a problem with taking a pointer (which I also don't like, FWIW). I just don't feel the same level of "this solves everything" optimism I read in your post.

@meling
Copy link

meling commented Feb 20, 2024

@jba Thank you for the report. Comparing this:

for item, err := range productsByID.Items() {
   if err != nil {
      // handle error or return or continue
   }
   id, product := item.Key, item.Value
   ...
}

to this:

seq, errf := productsByID.Items()
for id, product := range seq {
    ...
}
if err := errf(); err != nil {
   return err
}

There are no extra lines. For the drawbacks of the latter, I'm still very much in favor of the former approach; it is simple and easy to understand.

Regarding the functional part of your post; the boilerplate and complexity doesn't seem worth it. I think a simpler approach is necessary, if I'm going to like it. But I confess that I have not thought carefully about how iterator functions can be used and combined when there are errors involved.

edit: added missing range keyword in first example.

@jba
Copy link
Contributor

jba commented Feb 20, 2024

@Merovius, good point about using the sequence multiple times. The ErrorState should be reset at the start of the iteration function, as you suggest. I don't feel that "this solves everything," it just seems to offer better trade-offs than the other approaches.

Regarding the functional part of your post; the boilerplate and complexity doesn't seem worth it.

@meling, I'm not sure what this refers to. What boilerplate?

@atdiar
Copy link

atdiar commented Feb 20, 2024

I am a bit confused by the examples.
Are there missing "range" keywords?

Isn't this an issue of nested iterations?

I thought the error returning function could perhaps be a good idea until I saw how it would be used and it made me think.

I guess my question is whether the error is about the iterator failing (?) , which should not be an error but a fault/panic probably, or the value returned by the iterator being wrong?
(the latter which is similar in spirit to the traditional comma, ok idiom with maps retrieval)

Said otherwise, is it possible to consider that iterators never fail while it is possible that some of them just return erroneous values (the error value being used as a sentinel)?

For some errors, as soon one such error is returned, while designing the iterator, one might decide that this error is always returned so as to indicate that the object of the iteration is in an error state, or one might decide to make the iterator fault-tolerant?

In which case the initial error returning code would be fine? Just check for the error within the loop and either continue or break?

re. Item

Just thinking that maybe, you might want to compose your generic constructors to have specific maps with specific types that implement your desired interface.
If Item[K, V] is too generic, maybe that it should be a type parameter that implements a given interface instead?

Just an idea, not sure it works.

@earthboundkid
Copy link
Contributor

Interesting. Let me be the first to say, welcome to errf. 😆

@haraldrudell
Copy link

haraldrudell commented Jun 10, 2024

I have just such type which I wrote in 2022 or earlier

There is certainly smarter people than me on this thread, but what I can provide is years of experience with trying to get the logic right when it comes to iterators. I wanted them because ECMAScript had them and Go didn’t. I knew how great iterators are

https://pkg.go.dev/github.com/haraldrudell/parl@v0.4.183/pfs#ResultEntry

It is returned as a single tuple-value-type value like suggested above and has value-receiver methods to avoid allocations

It is used to allow the consuming code to decide whether iteration should continue or not, no matter how bad it went. It provides two-way communication with the iterator internals

What you are suggesting is possible and I am already doing it. 99% if the time it isn’t worth it, but I coded it up. The value it has is that you can decide you want to look into the red ones but not the blue ones

That is a different problem from the iterator unilaterally deciding it is a fail. The error pointer makes implementing such decision dead-simple, reliable and cost-free. You can’t beat free

@atdiar
Copy link

atdiar commented Jun 10, 2024

@haraldrudell not bad. Still unsure about the error value pointer.

How is it simpler than assigning to an error variable in the loop body and breaking?

Assuming that the loop iteration returns a pair v, err := range...

e.g.

func checkEverything() (err error) {
    for v, Err := range select.Star() { 
        if Err != nil{
            err = Err
            break
        }
        // ... 
    } 

    return err
} 

Is there a problem that I'm not seeing?

@rsc
Copy link
Contributor Author

rsc commented Jun 12, 2024

As I keep using these, it feels very very wrong that range over some types has one rule and range over other types has a different rule. The Go 1.23 release candidate is upon us. I think it seems clear we should remove the special case for range-over-func.

@rsc rsc modified the milestones: Proposal, Go1.23 Jun 12, 2024
@cherrymui cherrymui added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jun 12, 2024
@griesemer griesemer self-assigned this Jun 12, 2024
@rsc
Copy link
Contributor Author

rsc commented Jun 12, 2024

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

The proposal is to make range-over-func the same as range-over-anything else as far as omitting range variables is concerned. So if you have an iter.Seq2[X, Y] you can write

for x, y := range seq2 { ... }
for x := range seq2 { ... }
for range seq2 { ... }

just like if seq2 were a map. (Today because seq2 is a func the language insists you write for x, _ = range seq2 or for _, _ = range seq2. That’s what’s changing.)

Same for iter.Seq[X] and for range seq instead of for _ = range seq.

(It is okay to submit this change for the release candidate even while the issue is only “likely accept”.)

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/592176 mentions this issue: go/types, types2: allow range-over-func to omit iteration variables

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/592295 mentions this issue: spec: allow range-over-func to omit iteration variables

gopherbot pushed a commit that referenced this issue Jun 13, 2024
For #65236.

Change-Id: I63e57c1d8e9765979e9e58b45948008964b32384
Reviewed-on: https://go-review.googlesource.com/c/go/+/592176
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Robert Griesemer <gri@google.com>
gopherbot pushed a commit that referenced this issue Jun 13, 2024
For #65236.

Change-Id: I5a11811cc52467ea4446db29c3f86b119f9b2409
Reviewed-on: https://go-review.googlesource.com/c/go/+/592295
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
TryBot-Bypass: Robert Griesemer <gri@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
@dominikh
Copy link
Member

A nice side-effect of this is that we won't have to fix gofmt -s, which doesn't currently know it can't simplify blank iteration variables away 😅

@griesemer
Copy link
Contributor

Not a release blocker at this point. The feature has been implemented and documented but still needs approval (likely to come on Thu 6/20). Leaving open until then.

@rsc
Copy link
Contributor Author

rsc commented Jun 24, 2024

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

The proposal is to make range-over-func the same as range-over-anything else as far as omitting range variables is concerned. So if you have an iter.Seq2[X, Y] you can write

for x, y := range seq2 { ... }
for x := range seq2 { ... }
for range seq2 { ... }

just like if seq2 were a map. (Today because seq2 is a func the language insists you write for x, _ = range seq2 or for _, _ = range seq2. That’s what’s changing.)

Same for iter.Seq[X] and for range seq instead of for _ = range seq.

(It is okay to submit this change for the release candidate even while the issue is only “likely accept”.)

@rsc rsc changed the title proposal: spec: allow range-over-func to omit iteration variables spec: allow range-over-func to omit iteration variables Jun 24, 2024
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/594555 mentions this issue: gopls/internal/analysis/simplifyrange: reenable on range-over-func

gopherbot pushed a commit to golang/tools that referenced this issue Jun 24, 2024
This reverts the functional change of commit 3629652 (CL 588056) that caused
range-over-func to differ from other range operands; but we keep some tests
and tidy up the logic.

It was decided at the 11th hour to permit redundant blanks in range-over-func
statements in the go1.23 spec; see
golang/go#65236 (comment).

Fixes golang/go#67239
Updates golang/go#65236

Change-Id: Ib3c1c535a1107a05f18732e07d7c8844bbac4d1e
Reviewed-on: https://go-review.googlesource.com/c/tools/+/594555
Reviewed-by: Robert Findley <rfindley@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/596095 mentions this issue: gopls/internal/analysis/simplifyrange: reenable on range-over-func

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/596135 mentions this issue: cmd/compile: verify that rangefunc assigning to no vars works

@griesemer
Copy link
Contributor

This proposal has been implemented and documented. Closing.

gopherbot pushed a commit that referenced this issue Jul 30, 2024
This adds a test for
   for range seq2rangefunc { ... }
and
   for onevar := range seq2rangefunc { ... }

For #65236.

Change-Id: I083f8e4c19eb4ba0d6024d5314ac29d941141778
Reviewed-on: https://go-review.googlesource.com/c/go/+/596135
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Proposal Proposal-Accepted
Projects
Status: Accepted
Status: Done
Development

No branches or pull requests