Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/vet: check for failure to use result of a 'yield' call #65795

Open
adonovan opened this issue Feb 19, 2024 · 12 comments
Open

cmd/vet: check for failure to use result of a 'yield' call #65795

adonovan opened this issue Feb 19, 2024 · 12 comments
Labels
Analysis Issues related to static analysis (vet, x/tools/go/analysis) NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@adonovan
Copy link
Member

adonovan commented Feb 19, 2024

Go 1.23 introduces "push" iterators, also known as "range over func", in which the compiler transforms the loop body (consumer) into a stackless coroutine yield func(T) bool for some element type T, and then passes it to the push iterator (producer). The producer calls yield for each element, and must stop if any call to yield returns false, indicating that the desired continuation of the loop body is not continue (e.g. break, outer continue, return, panic, goto).

It is a programmer error to fail to honor the result of any call to yield. The runtime reports a dynamic error in this case, but it would be nice to catch it earlier with a static check.

We should add a new analyzer (or perhaps augment the existing unusedresult analyzer) to report when the result of a dynamic call to a function named yield (with type func(T) bool for some T) is ignored.

(Update: for a second yield mistake that vet could check for, see #65795 (comment).)

@adonovan adonovan added the Analysis Issues related to static analysis (vet, x/tools/go/analysis) label Feb 19, 2024
@robpike
Copy link
Contributor

robpike commented Feb 19, 2024

There are three conditions for vet checks, as listed in its README. We don't know yet whether this is a large enough problem in the wild to satisfy the frequency criterion.

@adonovan
Copy link
Member Author

adonovan commented Feb 19, 2024

There are three conditions for vet checks, as listed in its README. We don't know yet whether this is a large enough problem in the wild to satisfy the frequency criterion.

It is certainly not a large problem in the wild because no-one has merged code with a push iterator... yet; but I've already made this mistake several times in my experiments. Nonetheless your point is taken.

@thanm thanm added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 20, 2024
@timothy-king
Copy link
Contributor

We should add a new analyzer (or perhaps augment the existing unusedresult analyzer) to report when the result of a dynamic call to a function named yield (with type func(T) bool for some T) is ignored.

I am not sure this will be sufficiently precise. For GoVersion<=1.21, one could already have written such a function and ignore the return value of the function named 'yield'. Thatcould never have been used in a "range over func" statement, and would have been fine.

I think we will be able to get sufficiently good precision by switching this to looking at instances of for range f where f is a function type, f(yield) is a static call, and f ignores a call to its yield function. A big disadvantage of this approach though is where to report the diagnostic. Within the body of f is more actionable, but this will not always be doable if we are looking at the package containing the for range f statement.

@adonovan
Copy link
Member Author

I am not sure this will be sufficiently precise. For GoVersion<=1.21, one could already have written such a function and ignore the return value of the function named 'yield'. Thatcould never have been used in a "range over func" statement, and would have been fine.

You're absolutely right, it's possible it is just a coincidence, but I suspect the combination of calling a function whose operand is a local variable named 'yield', with the appropriate type, is pretty rare today, and we could (with a new convention) practically carve out the name as an informally reserved word for the new loop semantics.

@timothy-king
Copy link
Contributor

I suspect the combination of calling a function whose operand is a local variable named 'yield', with the appropriate type, is pretty rare today

Fair enough.

we could (with a new convention) practically carve out the name as an informally reserved word for the new loop semantics.

It makes me a bit nervous to only check when this is called 'yield'. This is new and conventions are somewhat weak. What is special about 'yield'? Why not 'y' or 'f'? How about 'yield' and 'yield2' if there is more than one? I am personally very prone to typing 'yeild' incorrectly.

I am somewhat more comfortable about treating 'yield' specially if we have a backup plan for warning the other cases too. This falls under "a check that misses too many of the cases it's looking for will give a false sense of security" IMO. Perhaps report 'yield' instances at the missed check regardless of being called, report intra-package calls at the missed check, and report at the call site for inter-package instances (#65795 (comment)). This is a bit complicated, but still doable.

@seankhliao seankhliao added this to the Unplanned milestone Jul 13, 2024
@Deleplace
Copy link
Contributor

I 100% agree that ignoring the result of yield() is too easy to do accidentally, in good faith. I've been bitten myself.

Part of the problem is that the user never writes the body of yield, as yield is automagically provided as an argument. In many use cases the termination cause we have in mind is when the iter.Seq or iter.Seq2 decides to stop producing values, and we're not thinking about the other cause "the consumer doesn't want any more values". It was a mild surprise for me to discover that the keyword break in the consuming loop would cause yield() to return false (as admittedly stated in the spec).

Ideally the vet check would trigger only for iterator functions that are actually ranged over in at least 1 place in the codebase, but that may be too costly to implement.

@adonovan
Copy link
Member Author

Another yield-related mistake is to write:

for x := range seq {
   yield(x)
}

when seq(yield) is more efficient, since it avoids back-to-back dynamic calls and the complex range desugaring. Perhaps vet could check for this too.

@Deleplace
Copy link
Contributor

for x := range seq {
   yield(x)
}

The worst problem here is that it is incorrect, right? And indeed, not very efficient.

@adonovan
Copy link
Member Author

The worst problem here is that it is incorrect, right? And indeed, not very efficient.

This is mainly an efficiency problem. But if this code appears within an iterator of a recursive data structure (e.g. a tree) and this is the inductive case (seq is a subtree), then it could turn a linear-time iteration into an O(n log n) operation because every element would make a chain of calls proportional to its depth (log n).

@Deleplace
Copy link
Contributor

I meant that it would lead to a panic, because of ignoring the result of yield(x)

@adonovan
Copy link
Member Author

I meant that it would lead to a panic, because of ignoring the result of yield(x)

Sorry, I completely failed to notice that my for x := range seq { yield(x) } example committed two mistakes, the first being the title of this issue! Which perhaps shows how easy a mistake it is to make.

I meant to write:

for x := range seq {
  if !yield(x) { break }
}

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/609617 mentions this issue: gopls/internal/analysis/yield: analyzer for iterator (yield) mistakes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Analysis Issues related to static analysis (vet, x/tools/go/analysis) NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

7 participants