redefining for loop variable semantics #56010
Replies: 50 comments 241 replies
-
I work on the C# team and can offer perspective here. The C# 5 rollout unconditionally changed the This change was not taken lightly. It had been discussed internally for several years, blogs were written about it, there was lots of analysis of customer code, upper management buy-off, etc. In the end, though, the change was rather anticlimactic. Yes, it did break a small number of customers, but fewer than expected. The customers who were impacted responded positively to our justifications and accepted the proposed code fixes to move forward.
I'm one of the main people who does customer feedback triage, as well as someone who helps customers who stumble onto unexpected behavior changes while migrating to newer versions of the compiler. That gives me a good sense of what pain points exist for tooling migration. This was a small blip when it was first introduced but quickly faded. Even as recently as a few years ago I was helping large code bases upgrade from C# 4. While they do hit other breaking changes we've had, they rarely hit this one. I'm honestly struggling to remember the last time I worked with a customer hitting this.
It's been ~10 years since this change was taken to the language and a lot has changed in that time. Projects have a property
This separation has been very successful for us and allowed us to make changes that would not have been possible in the past. If we were doing this change today we'd almost certainly tie the break to a |
-
IIRC, I think the meaning of an absent go.mod or absent
A somewhat minor detail, but as part of this discussion, it might be nice to explicitly state either that assumed version won’t change again, or that if this range variable change was hypothetically in go 1.30, then a missing go.mod or missing |
-
We actually had an in-depth discussion about precisely this problem this morning in the Go Brasil Telegram group (we are now joking that you guys are spying on us, hehe). One thing to notice in that discussion is that even after the problem had been explained multiple times by different people, several developers still had trouble understanding exactly what causes it and how to avoid it, and even the people who understood the problem in one context often failed to notice that it also affects other types of for loops. So we could argue that the current semantics make the learning curve for Go steeper.
PS: I have also had problems with this multiple times, once in production, so I am strongly in favor of this change, even considering its breaking aspect.
-
I'm totally on board with changing The case for changing the 3-clause for loop is less clear to me. I'm worried about code which intentionally modifies the loop variable in the loop body. E.g.

```go
for i := 0; i < len(a); i++ {
	if a[i] == "" {
		a = append(a[:i], a[i+1:]...)
		i--
	}
}
```

Maybe not the best example -- but the point is that such code, while rare, almost certainly exists and could be silently broken by this change. Note that this argument does not apply to I suppose the fix in this case is to move the variable declaration out of the for loop, though that seems almost as inscrutable a change as adding

```diff
-for i := 0; i < len(a); i++ {
+i := 0
+for ; i < len(a); i++ {
```

In general, 3-clause for loops are what you reach for when doing something "tricky". The exception, of course, is when you want a simple loop over a range of integers, in which case the 3-clause form is the only option. If there were a dedicated |
-
vendored dependencies
How would vendored dependencies be handled? My concern is that since |
-
If we do this, I would like us to be extra-loud about it. That is, if I'd also like a tool that removes all my |
-
For what it's worth, I really like this change. I can easily teach to this and it strengthens the value semantic aspects of that loop. The ideas of how to use go.mod seem reasonable and valid.
-
This is a good change and would make using table tests a lot easier to handle due to the per-instance rather than per-loop expectations. As a further enhancement, perhaps better discussed in a different topic, would it be possible for the Go compiler to flag when such upgrades will change the way the code works and inform the users about it? In your example:
What do you think? Edit: |
-
There are two aspects of this that I'd like more details on, either as part of this discussion or in a future proposal:
|
-
I am glad to see this. Currently the However, for this transition, this won't be the case. Am I understanding this correctly? More specifically, in the example that hypothetically assumes the change is made in go 1.30, I think any attempt to compile or import the work module (with And non-module based build systems (e.g. bazel) will also need a plan to move forward. |
-
The proposal for the 3-clause form, with implied copies at the start and end of the iteration, seems inherently racy in the presence of goroutines that outlive the iteration. Is there any remedy for this, or do we live with the raciness of possibly changing the variable before the iteration proceeds?
-
If there's a desire to de-risk this against breakage to existing code, perhaps one option would be to make capturing a loop variable a compilation error for a version or two?
|
-
I like this change and think it would eliminate one of the biggest footguns in Go. Is it possible to mechanically rewrite old code so that it compiles, under the new semantics, to have exactly the pre-change behavior? Granted, this tool would need to be conservative in the cases where a static analysis tool can't be sure whether the change in For example,
is probably broken code. But suppose that we wanted to compile this under the proposed new
(assuming that
If such a rewrite tool existed, then a conservative workflow for transitioning a project to the new
Even better would be for the rewrite tool to delete any now-provably-unnecessary "foo := foo" lines, so that old code can benefit from this simplification. Such a tool would make it easy for people to find the places in their code that might be affected by the new semantics. However, I also see some danger that many people might just commit the rewritten code, thus perpetuating any pre-existing bugs that might otherwise have been fixed by the transition to the new semantics. Therefore, another option might be for the "rewrite" tool to insert commented-out code, or just add comments that highlight the places where the behavior might change under the new |
-
I'm not sure, but I worry about whether this change will work correctly when "go generate" is used across mixed versions, for example when we generate Go code for an older version with this change in effect. Of course, it is the generator's responsibility to take older versions into account, but I believe this case must be considered.
-
It could be helpful to provide a link to this related prior work around determining the scope of the problem on GitHub.
-
Update 2023-06-06: Go 1.21 is expected to support `GOEXPERIMENT=loopvar` as a way of trying out these new semantics. See #57969 and https://go.dev/wiki/LoopvarExperiment.
We have been looking at what to do about the for loop variable problem (#20733), gathering data about what a change would mean and how we might deploy it. This discussion aims to gather early feedback about this idea, to understand concerns and aspects we have not yet considered. Thanks for keeping this discussion respectful and productive!
To recap #20733 briefly, the problem is that loops like this one don’t do what they look like they do:
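A minimal sketch of the kind of loop in question, assuming a hypothetical `Item` struct with a single `Name` field (the post does not define `Item`) and hypothetical function names. So that the result does not depend on which loop semantics the toolchain applies, `collectShared` declares `item` once outside the loop, which spells out the per-loop behavior explicitly; `collectCopied` adds the `item := item` copy that people write by hand today.

```go
package main

import "fmt"

// Item is a hypothetical element type; the discussion does not define it.
type Item struct{ Name string }

// collectShared spells out per-loop semantics: item is declared once,
// so &item is the same address on every iteration.
func collectShared(items []Item) []*Item {
	var all []*Item
	var item Item
	for _, item = range items {
		all = append(all, &item)
	}
	return all
}

// collectCopied makes a fresh copy on each iteration with item := item,
// so each appended pointer refers to a distinct variable.
func collectCopied(items []Item) []*Item {
	var all []*Item
	var item Item
	for _, item = range items {
		item := item
		all = append(all, &item)
	}
	return all
}

func main() {
	items := []Item{{"a"}, {"b"}, {"c"}}
	for _, p := range collectShared(items) {
		fmt.Println("shared:", p.Name) // every pointer sees the last item
	}
	for _, p := range collectCopied(items) {
		fmt.Println("copied:", p.Name) // each pointer sees its own item
	}
}
```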
That is, this code has a bug. After this loop executes, `all` contains `len(items)` identical pointers, each pointing at the same `Item`, holding the last value iterated over. This happens because the item variable is per-loop, not per-iteration: `&item` is the same on every iteration, and `item` is overwritten on each iteration. The usual fix is to add `item := item` at the top of the loop body, so that each iteration gets its own copy.
This bug also often happens in code with closures that capture the address of item implicitly, like:
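A sketch of such a closure-capturing loop, under the assumption that the closures return `v` instead of printing it so the result can be inspected; `sharedClosures` is a hypothetical name. As an illustration device, `v` is declared outside the loop so that the per-loop behavior is explicit regardless of the toolchain's loop semantics.

```go
package main

import "fmt"

// sharedClosures spells out per-loop semantics: v is declared once, and every
// closure captures that same variable without any explicit &v in the source.
func sharedClosures(values []int) []func() int {
	var fns []func() int
	var v int
	for _, v = range values {
		fns = append(fns, func() int { return v })
	}
	return fns
}

func main() {
	for _, f := range sharedClosures([]int{1, 2, 3}) {
		fmt.Println(f()) // 3 each time: all closures read the same v
	}
}
```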
This code prints 3, 3, 3, because all the closures print the same v, and at the end of the loop, v is set to 3. Note that there is no explicit &v to signal a potential problem. Again the fix is the same: add v := v.
Goroutines are also often involved, although as these examples show, they need not be. See also the Go FAQ entry.
We have talked for a long time about redefining these semantics, to make loop variables per-iteration instead of per-loop. That is, the change would effectively be to add an implicit “x := x” at the start of every loop body for each iteration variable x, just like people do manually today. Making this change would remove the bugs from the programs above.
In the Go 2 transitions document we gave the general rule that language redefinitions like what I just described are not permitted. I believe that is the right general rule, but I have come to also believe that the for loop variable case is strong enough to motivate a one-time exception to that rule. Loop variables being per-loop instead of per-iteration is the only design decision I know of in Go that makes programs incorrect more often than it makes them correct. Since it is the only such design decision, I do not see any plausible candidates for additional exceptions.
To make the breakage completely user controlled, the way the rollout would work is to change the semantics based on the go line in each package’s go.mod file, the same line we already use for enabling language features (you can only use generics in packages whose go.mod says “go 1.18” or later). Just this once, we would use the line for changing semantics instead of for adding a feature or removing a feature.
If we hypothetically made the change in go 1.30, then modules that say “go 1.30” or later get the per-iteration variables, while modules with earlier versions get the per-loop variables:
In a given code base, the change would be “gradual” in the sense that each module can update to the new semantics independently, avoiding a bifurcation of the ecosystem.
The specific semantics of the redefinition would be that both range loops and three-clause for loops get per-iteration variables. So in addition to the program above being fixed, this one would be fixed too:
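A sketch of a three-clause loop with the same capture problem, with a hypothetical function name. The variable is declared before the `for` statement so that it is unambiguously per-loop, whichever semantics the toolchain applies to variables declared in the loop header.

```go
package main

import "fmt"

// perLoopClosures uses a single i for the whole loop, so every closure
// observes the value i has after the loop exits.
func perLoopClosures() []func() int {
	var fns []func() int
	i := 1
	for ; i <= 3; i++ {
		fns = append(fns, func() int { return i })
	}
	return fns
}

func main() {
	for _, f := range perLoopClosures() {
		fmt.Println(f()) // 4 each time: i was incremented past the bound
	}
}
```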
In the 3-clause form, the start of the iteration body copies the per-loop `i` into a per-iteration `i`, and then the end of the body (or any continue statement) copies the current value of the per-iteration `i` back to the per-loop `i`. Unless a variable is captured like in the above example, nothing changes about how the loop executes.
Adjusting the 3-clause form may seem strange to C programmers, but the same capture problems that happen in range loops also happen in three-clause for loops. Changing both forms eliminates that bug from the entire language, not just one place, and it keeps the loops consistent in their variable semantics. That consistency means that if you change a loop from using range to using a 3-clause form or vice versa, you only have to think about whether the iteration visits the same items, not whether a subtle change in variable semantics will break your code. It is also worth noting that JavaScript uses per-iteration semantics for 3-clause for loops declared with let, with no problems.
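The copy-in/copy-out rule for the 3-clause form can be written out by hand, which may make it easier to reason about. This is only a sketch of the described behavior, not the compiler's implementation; `loopI`, `perIterationClosures`, and `removeEmpty` are hypothetical names. The second function has a body that modifies the loop variable, to illustrate that the copy-back step keeps such a loop executing as before.

```go
package main

import "fmt"

// perIterationClosures spells out the scheme: copy the per-loop variable into
// a per-iteration one at the start of the body, copy it back at the end.
func perIterationClosures() []func() int {
	var fns []func() int
	loopI := 1 // per-loop variable used by the condition and post statement
	for ; loopI <= 3; loopI++ {
		i := loopI // start of body: fresh per-iteration copy
		fns = append(fns, func() int { return i })
		loopI = i // end of body: copy the current value back
	}
	return fns
}

// removeEmpty deletes empty strings in place, decrementing i after a removal.
// The copy-back step carries that modification into the next iteration.
func removeEmpty(a []string) []string {
	loopI := 0
	for ; loopI < len(a); loopI++ {
		i := loopI
		if a[i] == "" {
			a = append(a[:i], a[i+1:]...)
			i--
		}
		loopI = i
	}
	return a
}

func main() {
	for _, f := range perIterationClosures() {
		fmt.Println(f()) // 1, then 2, then 3
	}
	fmt.Println(removeEmpty([]string{"x", "", "", "y"})) // [x y]
}
```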
I think the semantics are a smaller issue than the idea of making this one-time gradual breaking change. I’ve posted this discussion to gather early feedback on the idea of making a change here at all, because that’s something we’ve previously treated as off the table.
I’ve outlined the reasons I believe this case merits an exception below. I’m hoping this discussion can surface concerns, good ideas, and other feedback about the idea of making the change at all (not as much the semantics).
I know that C# 5 made this change as well, but I’ve been unable to find any retrospectives about how it was rolled out or how it went. If anyone knows more about how the C# transition went or has links to that information, please post that too. Thanks!
The case for making the change:
A decade of experience shows the cost of the current semantics
I talked at Gophercon once about how we need agreement about the existence of a problem before we move on to solutions. When we examined this issue in the run up to Go 1, it did not seem like enough of a problem. The general consensus was that it was annoying but not worth changing.
Since then, I suspect every Go programmer in the world has made this mistake in one program or another. I certainly have done it repeatedly over the past decade, despite being the one who argued for the current semantics and then implemented them. (Sorry!)
The current cures for this problem are worse than the disease.
I ran a program to process the git logs of the top 14k modules, from about 12k git repos and looked for commits with diff hunks that were entirely “x := x” lines being added. I found about 600 such commits. On close inspection, approximately half of the changes were unnecessary, done probably either at the insistence of inaccurate static analysis, confusion about the semantics, or an abundance of caution. Perhaps the most striking was this pair of changes from different projects:
One of these two changes is unnecessary and the other is a real bug fix, but you can’t tell which is which without more context. (In one, the loop variable is an interface value, and copying it has no effect; in the other, the loop variable is a struct, and the method takes a pointer receiver, so copying it ensures that the receiver is a different pointer on each iteration.)
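The two situations in the parenthetical can be sketched as follows. This is not the code from either commit, which is not reproduced here; `counter`, `getter`, `structLoop`, and `interfaceLoop` are hypothetical names, and the loop variables are declared outside the loops so that per-loop behavior is explicit.

```go
package main

import "fmt"

// counter is a hypothetical struct whose method takes a pointer receiver.
type counter struct{ n int }

func (c *counter) Get() int { return c.n }

// getter is a hypothetical interface satisfied by *counter.
type getter interface{ Get() int }

// structLoop: c.Get is shorthand for (&c).Get, so every method value binds
// the address of the one shared loop variable. A copy per iteration matters.
func structLoop(cs []counter) []func() int {
	var fns []func() int
	var c counter
	for _, c = range cs {
		fns = append(fns, c.Get)
	}
	return fns
}

// interfaceLoop: g.Get saves a copy of the interface value when the method
// value is created, so sharing g across iterations does no harm here.
func interfaceLoop(gs []getter) []func() int {
	var fns []func() int
	var g getter
	for _, g = range gs {
		fns = append(fns, g.Get)
	}
	return fns
}

func main() {
	cs := []counter{{1}, {2}, {3}}
	for _, f := range structLoop(cs) {
		fmt.Println(f()) // 3 each time
	}
	gs := []getter{&cs[0], &cs[1], &cs[2]}
	for _, f := range interfaceLoop(gs) {
		fmt.Println(f()) // 1, then 2, then 3
	}
}
```

Both loop bodies look the same at the call site, which is why the two changes cannot be told apart without the surrounding type information.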
And then there are changes like this one, which is unnecessary regardless of context (there is no opportunity for hidden address-taking):
This kind of confusion and ambiguity is the exact opposite of the readability we are aiming for in Go.
People are clearly having enough trouble with the current semantics that they are choosing overly conservative tools and adding “x := x” lines by rote in situations not flagged by tools, preferring that to debugging actual problems. This is an entirely rational choice, but it is also an indictment of the current semantics.
We’ve also seen production problems caused in part by these semantics, both inside Google and at other companies (for example, this problem at Let’s Encrypt). It seems likely to me that, world-wide, the current semantics have easily cost many millions of dollars in wasted developer time and production outages.
Old code is unaffected, compiling exactly as before
The go lines in go.mod give us a way to guarantee that all old code is unaffected, even in a build that also contains new code. Only when you change your go.mod line do the packages in that module get the new semantics, and you control that. In general this one reason is not sufficient, as laid out in the Go 2 transitions document. But it is a key property that contributes to the overall rationale, with all the other reasons added in.
Changing the semantics is usually a no-op, and when it’s not, it fixes buggy code far more often than it breaks correct code
We built a toolchain with the change and tested a subset of Google’s Go tests and analyzed the resulting failures. The rate of new test failures was approximately 1 in 2,000, but nearly all were previously undiagnosed actual bugs. The rate of spurious test failures (correct code actually broken by the change) was 1 in 50,000.
To start, there were only 58 failures out of approximately 100,000 tests executed, covering approximately 1.3M for loops. Of the failures, 36 (62%) were tests not testing what they looked like they tested because of bad interactions with t.Parallel: the new semantics made the tests actually run correctly, and then the tests failed because they found actual latent bugs in the code under test. The next most common mistake was appending &v on each iteration to a slice, which makes a slice of N identical pointers. The rest were other kinds of bugs canceling out to make tests pass incorrectly.
We found only 2 instances out of the 58 where code correctly depended on per-loop semantics and was actually broken by the change. One involved a handler registered using once.Do that needed access to the current iteration’s values on each invocation. The other involved low-level code running in a context where allocation is disallowed, and the variable escaped the loop (but not the function), so that the old semantics did not allocate while the new semantics did. Both were easily adjusted.
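The t.Parallel interaction can be imitated without package testing. Parallel subtests are paused until the enclosing test function has finished its loop, so by the time their bodies run, a per-loop variable already holds the last test case. The sketch below is a simplified simulation under that assumption: goroutines wait on a channel in place of paused subtests, and `runDeferred`, `release`, and `seen` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// runDeferred starts one goroutine per case, but each goroutine reads the
// shared loop variable tc only after the loop has finished.
func runDeferred(cases []string) []string {
	seen := make([]string, len(cases))
	release := make(chan struct{})
	var wg sync.WaitGroup
	var k int
	var tc string // declared outside the loop to spell out per-loop semantics
	for k, tc = range cases {
		wg.Add(1)
		go func(slot int) {
			defer wg.Done()
			<-release       // paused until the loop is over
			seen[slot] = tc // tc now holds the last case
		}(k)
	}
	close(release) // the loop is done; let the deferred bodies run
	wg.Wait()
	return seen
}

func main() {
	fmt.Println(runDeferred([]string{"a", "b", "c"})) // [c c c]
}
```

Every "subtest" ends up exercising the last case, so the earlier cases are never tested at all, which matches the description of tests that pass without testing what they appear to test.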
Of course, there is always the possibility that Google’s tests may not be representative of the overall ecosystem’s tests in various ways, and perhaps this is one of them. But there is no indication from this analysis of any common idiom at all where per-loop semantics are required. The git log analysis points in the same direction: parts of the ecosystem are adopting tools with very high false positive rates and doing what the tools say, with no apparent problems.
There is also the possibility that while there’s no semantic change, existing loops would, when updated to the new Go version, allocate one variable per iteration instead of once per loop. This problem would show up in memory profiles and is far easier to track down than the silent corruption we get when things go wrong with today’s semantics. Benchmarking of the public “bent” bench suite showed no statistically significant performance difference overall, so we expect most programs to be unaffected.
Good tooling can help users identify exactly the loops that need the most scrutiny during the transition
Our experience analyzing the failures in Google’s Go tests shows that we can use compiler instrumentation (adjusted -m output) to identify loops that may be compiling differently, because the compiler thinks the loop variables escape. Almost all the time, this identifies a very small number of loops, and one of those loops is right next to the failure. That experience can be wrapped up into a good tool for directing any debugging sessions.
Another possibility is a compilation mode where the compiled code consults an array of bits to decide during execution whether each loop gets old or new semantics. Package testing could provide a mode that implements binary search on that array to identify exactly which loops cause a test to fail. So if a test fails, you run the “loop finding mode” and then it tells you: “applying the semantic change to these specific loops causes the failure”. All the others are fine.
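The binary search over the array of bits could look roughly like the sketch below. It is not the actual tool: it assumes that exactly one loop is responsible for the failure, and `findCulprit` and the `fails` callback (which stands in for rerunning the test with new semantics enabled only for the loops whose bits are set) are hypothetical.

```go
package main

import "fmt"

// findCulprit returns the index of the single loop whose switch to the new
// semantics makes the test fail. n is the number of loops (n >= 1).
func findCulprit(n int, fails func(newSemantics []bool) bool) int {
	lo, hi := 0, n // invariant: the culprit index lies in [lo, hi)
	for hi-lo > 1 {
		mid := lo + (hi-lo)/2
		mask := make([]bool, n)
		for j := lo; j < mid; j++ {
			mask[j] = true // new semantics only for loops in [lo, mid)
		}
		if fails(mask) {
			hi = mid // culprit is in the lower half
		} else {
			lo = mid // culprit is in the upper half
		}
	}
	return lo
}

func main() {
	// Stand-in for a real test run: loop 37 is the one that matters.
	fails := func(mask []bool) bool { return mask[37] }
	fmt.Println(findCulprit(100, fails)) // 37
}
```

With n loops this needs about log2(n) test runs. Handling several interacting loops would need a more careful search than this single-culprit version.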
Static analysis is not a viable alternative
Whether a particular loop is “buggy” due to the current behavior depends on whether the address of an iteration value is taken and then that pointer is used after the next iteration begins. It is impossible in general for analyzers to see where the pointer lands and what will happen to it. In particular, analyzers cannot see clearly through interface method calls or indirect function calls. Different tools have made different approximations. Vet recognizes a few definitely bad patterns, and we are adding a new one checking for mistakes using t.Parallel in Go 1.20. To avoid false positives, it also has many false negatives. Other checkers in the ecosystem err in the other direction. The commit log analysis showed some checkers were producing over 90% false positive rates in real code bases. (That is, when the checker was added to the code base, the “corrections” submitted at the same time were not fixing actual problems over 90% of the time in some commits.)
There is no perfect way to catch these bugs statically. Changing the semantics, on the other hand, eliminates them all.
Changing loop syntax entirely would cause unnecessary churn
We have talked in the past about introducing a different syntax for loops (for example, #24282), and then giving the new syntax the new semantics while deprecating the current syntax. Ultimately this would cause a very significant amount of churn disproportionate to the benefit: the vast majority of existing loops are correct and do not need any fixes. It seems like an extreme response to force an edit of every for loop that exists today while invalidating all existing documentation and then having two different for loops that Go programmers need to understand, especially compared to changing the semantics to match what people overwhelmingly expect when they write the code.
My goal for this discussion is to gather early feedback on the idea of making a change here at all, because that’s something we’ve previously treated as off the table, as well as any feedback on expected impact and what would help users most in a roll-out strategy. Thanks!