Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go 2: spec: for range with defined types #47707

Open
leighmcculloch opened this issue Aug 15, 2021 · 18 comments
Open

proposal: Go 2: spec: for range with defined types #47707

leighmcculloch opened this issue Aug 15, 2021 · 18 comments
Labels
Go2 A change that can only be done in Go 2 LanguageChange Proposal
Milestone

Comments

@leighmcculloch
Copy link
Contributor

leighmcculloch commented Aug 15, 2021

The proposal is to allow any defined type to be iterated on using the for with range clause if that type loosely implements the following interface. Its Do function is called with the body of the for block as the argument of the function parameter, where breaks and returns inside the for block are translated into returning false inside the function, and continues are translated into returning true.

type Rangeable interface {
    Do(f func(...) bool)
}

The parameters of f are not defined because they can be zero or more parameters of any types, and those parameters would map to being iteration variables of the for block.

This would allow user-defined types to be iterated with a for block. The most relevant and useful example is the container/set type currently being discussed in #47331:

package sets

type Set[Elem] struct {
    m map[Elem]struct{}
}

func (s Set) Do(f func(e Elem) bool) {
    for k := range s.m {
        stop := f(k)
        if stop {
            return
        }
    }
}
func main() {
    s := sets.Set{}
    // ...
    for e := range s {
        // ...
    }
}

Another trivial example is types such as a numerical range or interval:

package rng

type Interval struct {
    Start int
    Finish int
    Step int
}

func (i Interval) Do(f func(v int) bool) {
    for j := i.Start; j < i.Finish; j+=i.Step {
        stop := f(j)
        if stop {
            return
        }
    }
}
func main() {
    r := &rng.Interval{Start: 0, Finish: 100; Step: 2}
    for i := range r {
        // ...
    }
}

Why

User-defined types cannot be iterated in a way that retains control of the current scope using the return, break, continue keywords. This results in two approaches to iteration being used for built-ins vs defined types. This is already evident in discussions about how the containers/set package may be used and will become more relevant as folks write more collection types once generics are released.

For example, to search a Set type for an element and exit early, code like the below would need to be written:

func _() (Elem, bool) {
    set := sets.Set{}
    // ...
    found := false
    var foundE Elem
    set.Do(func(e Elem) bool {
        if condition {
            found = true
            foundE = e
            return false
        }
        return true
    })
    return foundE, found
}

That code is significantly harder to understand than if the Set type supported iteration using this proposal:

func _() (Elem, bool) {
    set := sets.Set{}
    // ...
    for e := range set {
        if condition {
            return e, true
        }
    }
    return Elem{}, false
}

Prior Discussion

I can't take credit for the ideas that form this proposal. As far as I know the ideas were first suggested by @rsc and further elaborated on by @Merovius. The ideas were shared in response to the problem of how do we make iteration of new generic data structure types, like container/set, as simple as built-in types that can use for range.

That said, if we were going to allow the use of range syntax on user-defined types, the best way I can see to do it would be to compile the syntax into a call to the Do function.

-- @rsc #47331 (reply in thread)

AUI such an implementation would rewrite

for v := range x {
    if cond1 { break }
    if cond2 { continue }
    if cond3 { return a }
}

into something akin to

var (
    _tmp_ret bool
    _tmp_a A
)
x.Do(func(v B) bool {
    if cond1 { return false }
    if cond2 { return true }
    if cond3 {
        _tmp_a, _tmp_ret = a, true
        return false
    }
})
if _tmp_ret {
    return _tmp_a
}

I think you'll find that this construct could do everything a normal range loop could do.

-- @Merovius #47331 (reply in thread)

Further Details

This proposal changes the Go spec's description of the range expression so that in addition to the range expression being allowed to be an array, pointer to an array, slice, string, map, or channel permitting receive operations, it may also be any defined type that has a Do function with one func parameter discussed below, and a single bool return value.

If a value that is a defined type is provided as a range expression and does not have a Do function, it is a compiler error.

If a value that is a defined type is provided as a range expression and it has a Do function, the one func argument of the Do function is called for each iteration. The body of the for with range block becomes the func argument given for the one func parameter of the Do function.

In the same way that a for with range may have zero or more iteration variables up to the number of iteration values for the built-in types, a for with range of a defined type may also have zero or more iteration variables up to the number of parameters that the Do function's signature defines in its one func parameter.

In the same way that iterations are defined by the built-in types to mean different things for slices, maps, channels, etc, a defined type also defines an iteration for itself the spec doesn't limit or constrain how a defined type might iterate. For example, a container/set-like type may iterate for each element of the set. Or, a type representing a numerical range or interval, may iterate for each interval in the range with some defined step.

The bool return value of the Do function signals if iteration should continue. breaks or returns in a for with range block are translated into returning false inside the function passed to the Do function and setting necessary temporary variables to communicate back the intention to return. See @Merovius's example above.

Proposal template
  • Would you consider yourself a novice, intermediate, or experienced Go programmer?
    Experienced

  • What other languages do you have experience with?
    Java, Ruby, C#, C, JavaScript

  • Would this change make Go easier or harder to learn, and why?
    A little harder since there's one more thing to learn, that implementing the interface supports iteration.

  • Has this idea, or one like it, been proposed before? If so, how does this proposal differ?
    I couldn't find a proposal issue that was identical. There are some proposals that attempt to do similar things:

    • proposal: Go 2: iterators #40605 - Provides a Next function that can be called repeatedly, but it is less adaptable to different types of iterables since it is limited to a single element/item being returned on each iteration.
    • proposal: Go 2: function values as iterators #43557 - Allow functions to be a range expression and used for iteration, however if a type is iterable, it seems clearer and easier to understand if the type itself can be passed to the range expression rather than a function generated by the type.
  • Who does this proposal help, and why?
    Anyone building data structures that can be iterated, and anyone using them so that their iteration code is simple and much the same to iterating any built-in type.

  • What is the proposed change?
    See above.

    • Please describe as precisely as possible the change to the language.
      See above.

    • What would change in the language spec?
      See above.

    • Please also describe the change informally, as in a class teaching Go.
      See above.

  • Is this change backward compatible?
    Yes

  • Show example code before and after the change.
    See above.

  • What is the cost of this proposal? (Every language change has a cost).

    • How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected?
      I'm not sure.
    • What is the compile time cost?
      Little.
    • What is the run time cost?
      None.
  • Can you describe a possible implementation?
    See above.

    • Do you have a prototype? (This is not required.)
      No.
  • How would the language spec change?
    See above.

  • Orthogonality: how does this change interact or overlap with existing features?
    It builds on the existing for range semantics without changing existing semantics. It supports new data structures such as container/set such that they may be iterated in the same way as built in types, maintaining consistency for use.

  • Is the goal of this change a performance improvement?
    No.

    • If so, what quantifiable improvement should we expect?
    • How would we measure it?
  • Does this affect error handling?
    No.

  • Is this about generics?
    No. But related, because generics appear to be driving general data structures, and that's raising the issue of iteration.

@gopherbot gopherbot added this to the Proposal milestone Aug 15, 2021
@leighmcculloch
Copy link
Contributor Author

leighmcculloch commented Aug 15, 2021

(defer would be more challenging 🤔 Anyway, the point isn't really to come up with a full proposal for this, but just to show general feasibility)

-- #47331 (reply in thread)

@Merovius Building on your example, could defers work something like this? Assuming the ?'s are replaced with code for the internal types are for defers.

for v := range x {
    if cond1 { break }
    if cond2 { continue }
    if cond3 { return a }
    if cond4 { defer z.Push(v) }
}
var (
    _tmp_ret bool
    _tmp_a A
    _tmp_defers []?
)
x.Do(func(v B) bool {
    if cond1 { return false }
    if cond2 { return true }
    if cond3 {
        _tmp_a, _tmp_ret = a, true
        return false
    }
    if cond4 {
        _tmp_defers = append(_tmp_defers, ?z.Push(v)?)
    }
})
if _tmp_ret {
    return _tmp_a
}
for i := len(_tmp_defers)-1; i >= 0; i-- {
    defer ?_tmp_defers[I]?
}

@Merovius
Copy link
Contributor

Merovius commented Aug 15, 2021

Personally, I don't really see the need to make range work with defined types. I think iterators (like sql.Rows) produce well-readable, efficient code, without a language change (whereas the Do convention is really inconvenient without this language change). And they are far more flexible - e.g. it is easy to add error-handling to them, support key/value iteration or add other, more special data (like (*sql.Rows).Columns).

So, personally, I would rather if we focus on using iterators for a generic iteration interface, gain some experience with what methods are common and what we do need from them. And then, maybe, make a subset of them usable with range - though, again, the need to do that isn't very large with iterator-types, as they work well on their own.

That being said, I'm not strongly opposed to this proposal.

One thing the proposal text doesn't answer is, what would happen in this code:

type Ranger struct {
    f func() bool
}

func (r *Ranger) Do(f func(v int) bool) {
    r.f = f
    return
}

func F() func() bool {
    r := new(Ranger)
    for _ = range r {
        defer func() { fmt.Println("Hello world") } ()
    }
    return r.f
}

func main() {
    f := F()
    f() // ???
}

This uses defer because it creates a pretty immediate problem that needs answering - but IMO this is an issue with the general Do API. By using an autogenerated closure, you have to deal with the question of what happens if the function you call retains that func. I think that's a novel problem, because we don't have any language features which call user-defined code so far, so we have far more flexibility in implementation options. Note, for example, that we specifically don't allow user-code to run between fork(2) and exec(2), because of the issues of how that would interact with the runtime.

Assuming the ?'s are replaced with code for the internal types are for defers.

I think for illustrative purposes, that is just func().

@ianlancetaylor ianlancetaylor changed the title proposal: spec: for range with defined types proposal: Go 2: spec: for range with defined types Aug 15, 2021
@ianlancetaylor ianlancetaylor added Go2 A change that can only be done in Go 2 LanguageChange labels Aug 15, 2021
@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Aug 15, 2021

We need to consider how to handle goto statements in the range loop.

@deanveloper
Copy link

deanveloper commented Aug 15, 2021

This doesn’t show what any iterators might look like, which is where the fault comes in. Here is an example of what an iterator would look like:

func (s SomeType) Do(forEach func (i int) bool) {
    var i int
    for {
        ok := forEach(i)
        if !ok {
            break
        }
    }
}

The issue with the Do method is that an improperly defined iterator (ie, one that forgets to check the ok variable) will continue calling the forEach function until iteration completes. In my example, the iteration would never complete because it’s building a sequence. This means that if you try to return inside of of one of these improperly-defined iterators, it simply won’t work because control flow is entirely in control of the Do function. This is not a problem that other iterator patterns have, as typically the caller of an iterator is in charge of control flow.

Also, as I mentioned in #43557 (comment) it would be much better to have a github discussion for this instead of multiple competing proposals.

@deanveloper
Copy link

deanveloper commented Aug 15, 2021

This also only supports iterating structures, whereas I may want an iterator which is defined at the top-level, like:

func Fibonacci(forEach func(i int) bool) {
    …
}

It is also worth mentioning that @DeedleFake had already proposed (and I supported) a similar concept in #43557 (comment), which was a more-flexible version of this proposal, but hadn’t thought of an ok bool return.

The main issue that I highlighted earlier had already been brought up in the other proposal (however I translated “issue with the proposal” to “issue when we forget to check the ok variable”)

@leighmcculloch
Copy link
Contributor Author

leighmcculloch commented Aug 15, 2021

Personally, I don't really see the need to make range work with defined types. I think iterators (like sql.Rows) produce well-readable, efficient code, without a language change (whereas the Do convention is really inconvenient without this language change). And they are far more flexible - e.g. it is easy to add error-handling to them, support key/value iteration or add other, more special data (like (*sql.Rows).Columns).

@Merovius I think this is a good point. Iterators are very flexible and they address the problem of how to control the current scope well during iteration. The sql.Rows example is a little different in that it is a scanning pattern which is not quite the same as iterating over known types and values, but that is definitely evidence of that patterns flexibility.

The Do function has problems when used on its own as a closure, but I think it is easier to integrate Do into for rather than a two function Next and Scan.

If generics mean that there will be a shift towards defined types, such as container/set for collections rather than using built-in types I think there is value in considering how range can be used to iterate defined types.

@leighmcculloch
Copy link
Contributor Author

leighmcculloch commented Aug 15, 2021

By using an autogenerated closure, you have to deal with the question of what happens if the function you call retains that func.

@Merovius I think in this case the rewritten function is captured which captures the scope of the function containing the for loop like it would if the user referenced variables in that scope inside the closure. That would be odd, but storing a function inside an iterator like you demonstrated is likely to have strange side-effects in any situation. Would you expect a function stored in that case to work differently?

I think the main oddity is that defers could be lifted, and you could set state that affected gotos without necessarily exiting out if the caller of the rewritten function chose not to exit early, as in @deanveloper's example #47707 (comment).

If Go had a func type that didn't have its own scope, similar to blocks in Ruby, all of these problems would probably be alleviated because we could simple say that it is a block and not a closure, but I assume that would be a huge language change and I don't know how I feel about that to even suggest it.

@leighmcculloch
Copy link
Contributor Author

leighmcculloch commented Aug 15, 2021

We need to consider how to handle goto statements in the range loop.

@ianlancetaylor I think the go-to case is workable by setting a temp bool and exiting early. For example:

Label:
// ...
for v := range x {
    if cond1 { break }
    if cond2 { continue }
    if cond3 { return a }
    if cond4 { defer z.Push(v) }
    if cond5 { goto Label }
}
var (
    _tmp_ret bool
    _tmp_a A
    _tmp_defers []?
    _tmp_goto bool
)
x.Do(func(v B) bool {
    if cond1 { return false }
    if cond2 { return true }
    if cond3 {
        _tmp_a, _tmp_ret = a, true
        return false
    }
    if cond4 {
        _tmp_defers = append(_tmp_defers, ?z.Push(v)?)
    }
    if cond5 {
        _tmp_goto = true
        return false
    }
})
for i := len(_tmp_defers)-1; i >= 0; i-- {
    defer ?_tmp_defers[I]?
}
if _tmp_ret {
    return _tmp_a
}
if _tmp_goto {
    goto Label
}

@Merovius
Copy link
Contributor

Merovius commented Aug 15, 2021

@leighmcculloch

I think in this case the rewritten function is captured which captures the scope of the function containing the for loop like it would if the user referenced variables in that scope inside the closure.

You seem to be ignoring the defer. The problem isn't capturing over the scope, but that the control flow is ill-defined.
By definition, the defered function should be run when F returns. However, the loop-body is then re-executed, when the closure is called again in main. How is that supposed to work?

This wouldn't be a problem if defer would be scoped to the loop-body, as then it could just be translated into a defer in the generated function. But it isn't, it's scoped the encompassing function.

@ianlancetaylor brought up goto, which has very similar problems. If you do

func type Ranger struct {
    f func() bool
}

func (r *Ranger) Do(f func(v int) bool) {
    r.f = f
    return
}

func F() func() bool {
    r := new(Ranger)
    for _ = range r {
        goto Foo
    }
Foo:
    fmt.Println("Hello world")
    return r.f
}

func main() {
    f := F()
    f()
}

would the call to f jump back into F? After all, you are executing the loop-body, which contains a goto.

This isn't just an issue of closing over variables.

@scott-cotton
Copy link

scott-cotton commented Aug 15, 2021

From the first sentence of the first comment in this issue:

The proposal is to allow any defined type to be iterated on using the for with range clause if that type loosely implements the following interface.

There are many possible designs to make this work. But this issue only explores using "Do", which has some issues.

Would proposing alternatives be appropriate here or in another proposal? (For example, I'm thinking Ranger like this:


type Iter [Index, Element any] interface {
     Next() (Index, Element, bool) 
}
type Ranger [Index, Element any] interface {
     Range() Iter[Index, Element]
}

```) 

The above at least would fit the analysis ssa model.

If that or other alternatives have been discussed, where are they?

@deanveloper
Copy link

deanveloper commented Aug 15, 2021

@wsc0 A similar idea was mentioned in #43557 (and a new one was brought up as an alternative, which I personally like a lot better)

@leighmcculloch
Copy link
Contributor Author

leighmcculloch commented Aug 15, 2021

Would proposing alternatives be appropriate here or in another proposal?

I think it's easier to discuss different ideas in different issues. We can still cross reference them. Could you open a new issue if you have an alternative proposal?

type Iter [Index, Element any] interface {
Next() (Index, Element, bool)
}
type Ranger [Index, Element any] interface {
Range() Iter[Index, Element]
}

@wsc0 This interface only satisfies iterating with an index and an element. For supports iterating with other values, the spec refers to them as iteration variables and they can be different depending on the type being ranged over. For example, how would a type that operates like a channel work with it since it iterates with no index?

@scott-cotton
Copy link

scott-cotton commented Aug 15, 2021

@leighmcculloch

I just took a look at #43557, which has a nice overview of different ideas, and made me realise having a separate Iter type is more verbose. I was indeed thinking of something similar to #43557 , so I'll not open another issue.

Of course, the single value return would need to be treated differently -- that was just a sketch. Thanks for pointing it out nonetheless.

@deanveloper
Copy link

deanveloper commented Sep 1, 2021

Something else that this proposal seems to leave out is how to pass around iterators. For instance, how would I write the equivalent of Map[T, U any](Iterator[T], func(T) U) Iterator[U] under this proposal? It seems impossible without goroutines and channels. Unfortunately, goroutines and channels have very significant overhead (#43557 (comment)), being multiple orders of magnitude less efficient than function-based iterators. But if our solution is to fix these performance issues, then we may as well just use channels as our iterators instead of functions.

@Merovius
Copy link
Contributor

Merovius commented Sep 2, 2021

@deanveloper

type Collection[Elem any] interface {
    Do(func(Elem) bool)
}

type mapper[T, U any] struct {
    c Collection[T]
    f func(T) U
}

func (m mapper[T, U]) Do(f func(U) bool) {
    m.c.Do(func(t T) bool {
        return f(m.f(t))
    })
}

func Map[T, U any](c Collection[T], f func(T) U) Collection[U] {
    return mapper{c, f}
}

(I called it Collection instead of Iterator, as that seems a more appropriate name for the concept under this proposal)

@deanveloper
Copy link

deanveloper commented Sep 2, 2021

@Merovius My bad - the example I was thinking of where this would have trouble wasn't a map operation, but a zip operation. The issue wasn't being able to pass around the iterator, but being able to iterate over two things at the same time. I think that it would not be possible without goroutines.

(I called it Collection instead of Iterator, as that seems a more appropriate name for the concept under this proposal)

Collection is still a bit weird considering that a mapper isn't really a Collection... Same thing for infinite "rangers" like sequences. In Java this concept is named Iterable.

@Merovius
Copy link
Contributor

Merovius commented Sep 2, 2021

@deanveloper

I agree that zipping seems hard without goroutines.

@Merovius
Copy link
Contributor

Merovius commented Sep 21, 2022

As this was just brought to my attention again:

@leighmcculloch

@Merovius Building on your example, could defers work something like this? Assuming the ?'s are replaced with code for the internal types are for defers.

for v := range x {
    if cond1 { break }
    if cond2 { continue }
    if cond3 { return a }
    if cond4 { defer z.Push(v) }
}
var (
    _tmp_ret bool
    _tmp_a A
    _tmp_defers []?
)
x.Do(func(v B) bool {
    if cond1 { return false }
    if cond2 { return true }
    if cond3 {
        _tmp_a, _tmp_ret = a, true
        return false
    }
    if cond4 {
        _tmp_defers = append(_tmp_defers, ?z.Push(v)?)
    }
})
if _tmp_ret {
    return _tmp_a
}
for i := len(_tmp_defers)-1; i >= 0; i-- {
    defer ?_tmp_defers[I]?
}

This doesn't work, no. Consider

for x := range c {
    defer fmt.Println("foo")
    panic(nil)
}

This should print "foo" for a non-empty container and then panic. But with that rewrite, it won't print anything, as it panics before the defer is pushed.

I know that I claimed that this kind of rewrite would work, but after some time and all this discussion, I really think it's at the very least extremely complex. I would go so far as to say I was mistaken to suggest it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Go2 A change that can only be done in Go 2 LanguageChange Proposal
Projects
None yet
Development

No branches or pull requests

6 participants