New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: redefine range loop variables in each iteration #20733

Open
halleknast opened this Issue Jun 19, 2017 · 24 comments

Comments

Projects
None yet
@halleknast

halleknast commented Jun 19, 2017

As an alternative to #20725 (which was originally about go vet), let the variables of range loops be implicitly redefined in each iteration like in Dart's for loops. That is,

for k, v := range vals {
  // ...
}

should be equivalent to

for k, v := range vals {
  k := k
  v := v
  // ...
}

This will make it "safe" to take the loop variable's addresses as well as capturing the loop variables in nested functions (see #16520).

The proposal could be expanded to vanilla for loops, although that would make it diverge semantically from other languages.

@gopherbot gopherbot added this to the Proposal milestone Jun 19, 2017

@gopherbot gopherbot added the Proposal label Jun 19, 2017

@bradfitz

This comment has been minimized.

Show comment
Hide comment
@bradfitz

bradfitz Jun 19, 2017

Member

We considered this prior to Go 1, but considered it too quickly. We'll definitely reconsider it for Go 2. I actually think there's a duplicate tracking bug for this.

Member

bradfitz commented Jun 19, 2017

We considered this prior to Go 1, but considered it too quickly. We'll definitely reconsider it for Go 2. I actually think there's a duplicate tracking bug for this.

@davecheney

This comment has been minimized.

Show comment
Hide comment
@davecheney

davecheney Jun 19, 2017

Contributor
Contributor

davecheney commented Jun 19, 2017

@dsnet

This comment has been minimized.

Show comment
Hide comment
@dsnet

dsnet Jun 19, 2017

Member

And I would argue even catches seasoned Go programmers occasionally. See https://golang.org/cl/40937. It took me a while staring at that logic trying to figure out why the code wasn't doing what it naturally read as.

Member

dsnet commented Jun 19, 2017

And I would argue even catches seasoned Go programmers occasionally. See https://golang.org/cl/40937. It took me a while staring at that logic trying to figure out why the code wasn't doing what it naturally read as.

@cznic

This comment has been minimized.

Show comment
Hide comment
@cznic

cznic Jun 20, 2017

Contributor

Adopting this proposal can make performance of perfectly valid code worse, sometimes dramatically. For example, when escape analysis cannot determine it's safe to keep the {k,v} value on stack while it's address is taken. Then we get O(n) heap allocations instead of the current O(1).

Contributor

cznic commented Jun 20, 2017

Adopting this proposal can make performance of perfectly valid code worse, sometimes dramatically. For example, when escape analysis cannot determine it's safe to keep the {k,v} value on stack while it's address is taken. Then we get O(n) heap allocations instead of the current O(1).

@davecheney

This comment has been minimized.

Show comment
Hide comment
@davecheney

davecheney Jun 20, 2017

Contributor

@cznic can you give some examples of this. I would expect that most of these cases would actually be bugs because taking the address of the same variable over and over again was the problem in the first place.

Contributor

davecheney commented Jun 20, 2017

@cznic can you give some examples of this. I would expect that most of these cases would actually be bugs because taking the address of the same variable over and over again was the problem in the first place.

@cznic

This comment has been minimized.

Show comment
Hide comment
@cznic

cznic Jun 20, 2017

Contributor

@davecheney Consider #20725 (comment) and imagine escape analysis is not able to prove foo only reads via the passed pointer.

Contributor

cznic commented Jun 20, 2017

@davecheney Consider #20725 (comment) and imagine escape analysis is not able to prove foo only reads via the passed pointer.

@davecheney

This comment has been minimized.

Show comment
Hide comment
@davecheney

davecheney Jun 20, 2017

Contributor

@cznic

Why would you want to take the address of a copy of the current range value? If i'm not mistaken, any modification to it would be lost.

Contributor

davecheney commented Jun 20, 2017

@cznic

Why would you want to take the address of a copy of the current range value? If i'm not mistaken, any modification to it would be lost.

@cznic

This comment has been minimized.

Show comment
Hide comment
@cznic

cznic Jun 20, 2017

Contributor

@davecheney

Why would you want to take the address of a copy of the current range value?

  • To avoid copying a big value when passing it as an argument.
  • To call a pointer receiver method on the value.

If i'm not mistaken, any modification to it would be lost.

Taking the address of a value does not imply intent to modify the pointee.

Contributor

cznic commented Jun 20, 2017

@davecheney

Why would you want to take the address of a copy of the current range value?

  • To avoid copying a big value when passing it as an argument.
  • To call a pointer receiver method on the value.

If i'm not mistaken, any modification to it would be lost.

Taking the address of a value does not imply intent to modify the pointee.

@davecheney

This comment has been minimized.

Show comment
Hide comment
@davecheney

davecheney Jun 20, 2017

Contributor

To avoid copying a big value when passing it as an argument.

But the value is already copied as part of the range iteration. Having a separate value escape on every iteration is sub optimal, but hopefully you'll agree it's reasonable to trade a potential performance problem for a very common correctness problem.

To call a pointer receiver method on the value.

True, that is asking a lot of escape analysis and I really don't know the ways that escape analysis and the implicit taking of address interact. But I will note that your example explicitly took the address of the copy of the current index value, so if the intention was to pass a pointer to the value down to a function, you'd want to make sure you were passing a pointer to the original slice element, not a copy. In the code that I have read that is more commonly written as

for i := range vals {
     foo(&vals[i])
}

Taking the address of a value does not imply intent to modify the pointee.

Respectfully, generally it does, as in your method example above.

Contributor

davecheney commented Jun 20, 2017

To avoid copying a big value when passing it as an argument.

But the value is already copied as part of the range iteration. Having a separate value escape on every iteration is sub optimal, but hopefully you'll agree it's reasonable to trade a potential performance problem for a very common correctness problem.

To call a pointer receiver method on the value.

True, that is asking a lot of escape analysis and I really don't know the ways that escape analysis and the implicit taking of address interact. But I will note that your example explicitly took the address of the copy of the current index value, so if the intention was to pass a pointer to the value down to a function, you'd want to make sure you were passing a pointer to the original slice element, not a copy. In the code that I have read that is more commonly written as

for i := range vals {
     foo(&vals[i])
}

Taking the address of a value does not imply intent to modify the pointee.

Respectfully, generally it does, as in your method example above.

@cznic

This comment has been minimized.

Show comment
Hide comment
@cznic

cznic Jun 20, 2017

Contributor

foo(&vals[i])

Iff vals is a slice/array. There are also maps and channels.

Respectfully, generally it does, as in your method example above.

Even the innocent looking fmt.Println(value) always takes the address of 'value'. As does every function taking an interface (not only interface{}, any interface) argument. Every assignment to an interface variable takes the address of the RHS (if not already a pointer), etc.

Edit: Added '(if not already a pointer)'.

Contributor

cznic commented Jun 20, 2017

foo(&vals[i])

Iff vals is a slice/array. There are also maps and channels.

Respectfully, generally it does, as in your method example above.

Even the innocent looking fmt.Println(value) always takes the address of 'value'. As does every function taking an interface (not only interface{}, any interface) argument. Every assignment to an interface variable takes the address of the RHS (if not already a pointer), etc.

Edit: Added '(if not already a pointer)'.

@davecheney

This comment has been minimized.

Show comment
Hide comment
@davecheney

davecheney Jun 20, 2017

Contributor

Iff vals is a slice/array. There are also maps and channels.

If it's an array, it's already copied during the evaluation of the range statement. If it's a channel, it's a copy of the value received. Again i'm falling back on arguments of correctness over performance.

Even the innocent looking fmt.Println(value) always takes the address of 'value'.

I believe a copy of value is passed to fmt.Println. That copy is almost always heap allocated due to a limitation introduced in the Go 1.5 garbage collector.

Contributor

davecheney commented Jun 20, 2017

Iff vals is a slice/array. There are also maps and channels.

If it's an array, it's already copied during the evaluation of the range statement. If it's a channel, it's a copy of the value received. Again i'm falling back on arguments of correctness over performance.

Even the innocent looking fmt.Println(value) always takes the address of 'value'.

I believe a copy of value is passed to fmt.Println. That copy is almost always heap allocated due to a limitation introduced in the Go 1.5 garbage collector.

@cznic

This comment has been minimized.

Show comment
Hide comment
@cznic

cznic Jun 20, 2017

Contributor

If it's an array, it's already copied during the evaluation of the range statement. If it's a channel, it's a copy of the value received.

Right. But with this proposal there will be, in certain situations, an additional allocation per iteration.

I believe a copy of value is passed to fmt.Println. That copy is almost always heap allocated due to a limitation introduced in the Go 1.5 garbage collector.

You're right. The interface runtime struct has the address, but that's not different with this proposal. I misspoke, sorry.

Contributor

cznic commented Jun 20, 2017

If it's an array, it's already copied during the evaluation of the range statement. If it's a channel, it's a copy of the value received.

Right. But with this proposal there will be, in certain situations, an additional allocation per iteration.

I believe a copy of value is passed to fmt.Println. That copy is almost always heap allocated due to a limitation introduced in the Go 1.5 garbage collector.

You're right. The interface runtime struct has the address, but that's not different with this proposal. I misspoke, sorry.

@bcmills

This comment has been minimized.

Show comment
Hide comment
@bcmills

bcmills Jun 21, 2017

Member

@cznic

with this proposal there will be, in certain situations, an additional allocation per iteration.

Sure, but if it actually shows up on a profile there should be an easy workaround:

var bigValue someType
for _, v := range bigValues {
        bigValue = v  // Compiler should be smart enough to elide this copy.
        foo(&bigValue)  // Only one copy of bigValue escapes.
}

It is possible to write correct, efficient code with either definition of range loops; the question is mainly which should be the "default" or "typical" case. I believe that, in real usage, the proposed change would fix far more issues — and remove more workarounds — than it would introduce.

Member

bcmills commented Jun 21, 2017

@cznic

with this proposal there will be, in certain situations, an additional allocation per iteration.

Sure, but if it actually shows up on a profile there should be an easy workaround:

var bigValue someType
for _, v := range bigValues {
        bigValue = v  // Compiler should be smart enough to elide this copy.
        foo(&bigValue)  // Only one copy of bigValue escapes.
}

It is possible to write correct, efficient code with either definition of range loops; the question is mainly which should be the "default" or "typical" case. I believe that, in real usage, the proposed change would fix far more issues — and remove more workarounds — than it would introduce.

@cznic

This comment has been minimized.

Show comment
Hide comment
@cznic

cznic Jun 21, 2017

Contributor

@bcmills

It is possible to write correct, efficient code with either definition of range loops;

True, but still the proposal has the potential to cripple existing correct and efficient code. Also, your comment does not consider v.foo(). We've agreed earlier with @davecheney that for escape analysis of pointer receivers it is sometimes, if not often, impossible to prove they are safe-for-stack. And, again, "Then we get O(n) heap allocations instead of the current O(1)."

Contributor

cznic commented Jun 21, 2017

@bcmills

It is possible to write correct, efficient code with either definition of range loops;

True, but still the proposal has the potential to cripple existing correct and efficient code. Also, your comment does not consider v.foo(). We've agreed earlier with @davecheney that for escape analysis of pointer receivers it is sometimes, if not often, impossible to prove they are safe-for-stack. And, again, "Then we get O(n) heap allocations instead of the current O(1)."

@bcmills

This comment has been minimized.

Show comment
Hide comment
@bcmills

bcmills Jun 21, 2017

Member

@cznic

still the proposal has the potential to cripple existing correct and efficient code.

For this proposal (and Go 2 proposals in general), if the proposal is accepted we should either compile existing Go 1 programs as-is or provide a tool that converts them preserving existing semantics. That is: no "existing" Go 1 code should be affected either way.

for escape analysis of pointer receivers it is sometimes, if not often, impossible to prove they are safe-for-stack

Escape analysis of pointer methods is mostly straightforward; you only really need to worry about arguments to interface methods. And note that iterating over a container of n values is already O(n), so doing O(n) memory-management work changes only the constant factors (not the asymptotic behavior).

Member

bcmills commented Jun 21, 2017

@cznic

still the proposal has the potential to cripple existing correct and efficient code.

For this proposal (and Go 2 proposals in general), if the proposal is accepted we should either compile existing Go 1 programs as-is or provide a tool that converts them preserving existing semantics. That is: no "existing" Go 1 code should be affected either way.

for escape analysis of pointer receivers it is sometimes, if not often, impossible to prove they are safe-for-stack

Escape analysis of pointer methods is mostly straightforward; you only really need to worry about arguments to interface methods. And note that iterating over a container of n values is already O(n), so doing O(n) memory-management work changes only the constant factors (not the asymptotic behavior).

@cznic

This comment has been minimized.

Show comment
Hide comment
@cznic

cznic Jun 21, 2017

Contributor

@bcmills

That is: no "existing" Go 1 code should be affected either way.

I don't think this can be achieved without solving the halting problem. But let's assume it can be done. Even then we would have to put the particular compiler optimization that you've mentioned earlier, directly in the specification of the language. That can be done, no doubt. Isn't it better to avoid such leaks of abstractions? I think so.

Contributor

cznic commented Jun 21, 2017

@bcmills

That is: no "existing" Go 1 code should be affected either way.

I don't think this can be achieved without solving the halting problem. But let's assume it can be done. Even then we would have to put the particular compiler optimization that you've mentioned earlier, directly in the specification of the language. That can be done, no doubt. Isn't it better to avoid such leaks of abstractions? I think so.

@bcmills

This comment has been minimized.

Show comment
Hide comment
@bcmills

bcmills Jun 21, 2017

Member

@cznic

we would have to put the particular compiler optimization that you've mentioned earlier, directly in the specification of the language.

Why? Generally, the only time an optimization needs to go into a language spec is if it affects the asymptotic behavior of the program, the canonical example being tail-call-elimination in functional languages (which changes memory usage from O(1) to O(N) with the depth of recursion). As I noted above, per-iteration memory-management overhead affects only constant factors: running time for loops is already O(N) with the number of iterations, and peak memory usage is O(1) either way (because the value can be garbage-collected at each iteration).

Member

bcmills commented Jun 21, 2017

@cznic

we would have to put the particular compiler optimization that you've mentioned earlier, directly in the specification of the language.

Why? Generally, the only time an optimization needs to go into a language spec is if it affects the asymptotic behavior of the program, the canonical example being tail-call-elimination in functional languages (which changes memory usage from O(1) to O(N) with the depth of recursion). As I noted above, per-iteration memory-management overhead affects only constant factors: running time for loops is already O(N) with the number of iterations, and peak memory usage is O(1) either way (because the value can be garbage-collected at each iteration).

@halleknast

This comment has been minimized.

Show comment
Hide comment
@halleknast

halleknast Jun 22, 2017

@cznic

the proposal has the potential to cripple existing correct and efficient code

If by "cripple" you mean introduce bugs, this will only happen in code that depends on a variable having been overwritten after its capture. I find it quite hard to imaging cases where this would be a requirement for correctness (would appreciate examples). But even if it is, it's easy to make a tool (see #20725; original formulation) that would flag all relevant code for scrutinization as part of a migration process.

As to the performance argument, I believe that if a good escape analysis can't figure out where a pointer is going, then neither can programmers (or it does in fact escape the loop). Thus, if such code is correct, it's probably by accident and not for performance reasons.

halleknast commented Jun 22, 2017

@cznic

the proposal has the potential to cripple existing correct and efficient code

If by "cripple" you mean introduce bugs, this will only happen in code that depends on a variable having been overwritten after its capture. I find it quite hard to imaging cases where this would be a requirement for correctness (would appreciate examples). But even if it is, it's easy to make a tool (see #20725; original formulation) that would flag all relevant code for scrutinization as part of a migration process.

As to the performance argument, I believe that if a good escape analysis can't figure out where a pointer is going, then neither can programmers (or it does in fact escape the loop). Thus, if such code is correct, it's probably by accident and not for performance reasons.

@martisch

This comment has been minimized.

Show comment
Hide comment
@martisch

martisch Jun 22, 2017

Member

If by "cripple" you mean introduce bugs, this will only happen in code that depends on a variable having been overwritten after its capture.

I thought maybe something like the below just with := can be constructed where this happens too:

	x := []int{1, 2, 3, 4, 5}
	var i int
	for x[i], i = range x {
		//...
		i = 2
	}

Edit: on closer look i guess currently := allows only simple variable names so this is not an issue.

Member

martisch commented Jun 22, 2017

If by "cripple" you mean introduce bugs, this will only happen in code that depends on a variable having been overwritten after its capture.

I thought maybe something like the below just with := can be constructed where this happens too:

	x := []int{1, 2, 3, 4, 5}
	var i int
	for x[i], i = range x {
		//...
		i = 2
	}

Edit: on closer look i guess currently := allows only simple variable names so this is not an issue.

@leonklingele

This comment has been minimized.

Show comment
Hide comment
@leonklingele

leonklingele Oct 3, 2017

Contributor

As this does not require changes to the for-range syntax, it is still valid Go 1 code.
What if one tries to run some Go 2 code using such a for-range loop with a Go 1 compiler? I can already see people complaining "this code works fine in Go 2 but misbehaves for whatever reason in Go 1".
This proposal also should add a way to break compilation with Go 1 (e.g. a // +build !go1 annotation) which no one will ever use.

Contributor

leonklingele commented Oct 3, 2017

As this does not require changes to the for-range syntax, it is still valid Go 1 code.
What if one tries to run some Go 2 code using such a for-range loop with a Go 1 compiler? I can already see people complaining "this code works fine in Go 2 but misbehaves for whatever reason in Go 1".
This proposal also should add a way to break compilation with Go 1 (e.g. a // +build !go1 annotation) which no one will ever use.

@bcmills

This comment has been minimized.

Show comment
Hide comment
@bcmills

bcmills Nov 29, 2017

Member

Ran across this interesting example today (https://play.golang.org/p/DUstLropfJ):

func (k *key) f(v string) {
	fmt.Printf("%#v.f(%#v)\n", k, v)
}

func main() {
	…
	for k, v := range m {
		defer k.f(v)
	}
}

The interaction between loop variables and implicit pointer receivers makes it possible to accidentally close over a loop variable without taking its address or writing a function literal.

Member

bcmills commented Nov 29, 2017

Ran across this interesting example today (https://play.golang.org/p/DUstLropfJ):

func (k *key) f(v string) {
	fmt.Printf("%#v.f(%#v)\n", k, v)
}

func main() {
	…
	for k, v := range m {
		defer k.f(v)
	}
}

The interaction between loop variables and implicit pointer receivers makes it possible to accidentally close over a loop variable without taking its address or writing a function literal.

@Merovius

This comment has been minimized.

Show comment
Hide comment
@Merovius

Merovius Jan 13, 2018

For posterity, today someone posted on golang-nuts about stumbling into this with (*testing.T).Parallel:
https://groups.google.com/d/msg/golang-nuts/SAZ6wCSLXU0/F-TtRwDCAAAJ
As a further datapoint to fix this in Go2.

With regards to the performance-concerns: I'd expect the difference, in general, to be optimized out. Even if there are still edge-cases: Given the subtlety and prevalence of this problem, I'd strongly favor correctness over performance. If your code is too slow, you can always benchmark and hand-optimize. If your code is subtly incorrect, that's a lot harder to remedy.

Merovius commented Jan 13, 2018

For posterity, today someone posted on golang-nuts about stumbling into this with (*testing.T).Parallel:
https://groups.google.com/d/msg/golang-nuts/SAZ6wCSLXU0/F-TtRwDCAAAJ
As a further datapoint to fix this in Go2.

With regards to the performance-concerns: I'd expect the difference, in general, to be optimized out. Even if there are still edge-cases: Given the subtlety and prevalence of this problem, I'd strongly favor correctness over performance. If your code is too slow, you can always benchmark and hand-optimize. If your code is subtly incorrect, that's a lot harder to remedy.

@davecb

This comment has been minimized.

Show comment
Hide comment
@davecb

davecb May 9, 2018

On 08/05/18 06:13 PM, Bryan C. Mills wrote:

So I see the cost/benefit tradeoff of a breaking change as:
Costs
Update code generators.
Reprogram users' muscle memory.
go fix existing code, which may or may not be necessary anyway (depending on what else goes into Go 2).

Benefits
Prompt users to omit redundant workarounds.
(This proposal.) Make the common / default cases more concise.

It's up to the proposal committee to weigh those costs and benefits; the balance is not obvious to me. 
The point of this proposal is to lay out some benefits on the “breaking change” side that might not be obvious otherwise.

From previous experience with managed change (which dates back to Multics and dinosaurs roaming the earth (:-)) one of the largest problems of trying to maintain compatibility is that it can lead to persons depending on a bug, and thereby preventing it from ever being fixed.

I would argue that when this kind of deranged dependency occurs, that in order one

  1. introduces the new capability,
  2. provides a means of migration from old to new,
    
  3. marks the old as deprecated, and
    
  4. withdraws the old
    

In the special case of linked-library changes, the caller may have lost the source code they need to change, so we instead made the last step

  1. require positive action to continue to use the old (in our case that was a linker option)

Go's case is not as hard: if you can do steps 1-4, then I strongly recommend you do so.

--dave

davecb commented May 9, 2018

On 08/05/18 06:13 PM, Bryan C. Mills wrote:

So I see the cost/benefit tradeoff of a breaking change as:
Costs
Update code generators.
Reprogram users' muscle memory.
go fix existing code, which may or may not be necessary anyway (depending on what else goes into Go 2).

Benefits
Prompt users to omit redundant workarounds.
(This proposal.) Make the common / default cases more concise.

It's up to the proposal committee to weigh those costs and benefits; the balance is not obvious to me. 
The point of this proposal is to lay out some benefits on the “breaking change” side that might not be obvious otherwise.

From previous experience with managed change (which dates back to Multics and dinosaurs roaming the earth (:-)) one of the largest problems of trying to maintain compatibility is that it can lead to persons depending on a bug, and thereby preventing it from ever being fixed.

I would argue that when this kind of deranged dependency occurs, that in order one

  1. introduces the new capability,
  2. provides a means of migration from old to new,
    
  3. marks the old as deprecated, and
    
  4. withdraws the old
    

In the special case of linked-library changes, the caller may have lost the source code they need to change, so we instead made the last step

  1. require positive action to continue to use the old (in our case that was a linker option)

Go's case is not as hard: if you can do steps 1-4, then I strongly recommend you do so.

--dave

@iWdGo

This comment has been minimized.

Show comment
Hide comment
@iWdGo

iWdGo Jul 17, 2018

Contributor

About the performance loss, I have optimized in various ways the usual Reverse Polish Calculator (https://github.com/iWdGo/postfixcalculator). The best solution is obtained by emulating a Turing machine. The README explains the summary.

To avoid the loop overhead, the counter is directly accessed. This would be criticized by many programming standards. In my view, such a possibility is very useful as the compiler creates very efficient for loops.

The suggestion is to add to the above proposal to evolve the for loop, another case which would be a Turing machine-like loop where re-definition never occurs..

Contributor

iWdGo commented Jul 17, 2018

About the performance loss, I have optimized in various ways the usual Reverse Polish Calculator (https://github.com/iWdGo/postfixcalculator). The best solution is obtained by emulating a Turing machine. The README explains the summary.

To avoid the loop overhead, the counter is directly accessed. This would be criticized by many programming standards. In my view, such a possibility is very useful as the compiler creates very efficient for loops.

The suggestion is to add to the above proposal to evolve the for loop, another case which would be a Turing machine-like loop where re-definition never occurs..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment