Proposal Details
The bufio.Scanner type breaks an input stream into a series of tokens. It is convenient to use but has a flaw: when an error occurs, it stores the error internally, and makes it available via the Err method. This is fine in principle but in practice people routinely fail to call the Err method.
A particular error that can occur is ErrTooLong which means that the next token is too long for the buffer. On balance it's a good idea that the scanner does not permit arbitrary size tokens. Scanners are routinely used to parse input that is not under the program's control, and arbitrarily large tokens would open up a door to allocating large amounts of memory in the program. But since people routinely don't check the Err method, it's easy for an overly long token to be silently discarded.
Now that we have iterators, we can make things slightly better by using an iterator that returns a token and an error. It's still possible to ignore the error, but it's harder: the mistake becomes comparable to forgetting an err != nil check, rather than to simply looping over scan.Scan and never calling Err.
The proposal is to add this method:

    // Tokens returns an iterator over a pair of the next token and an error.
    // At most one of the token and error will be non-nil.
    // If the error is not nil, the iterator will not yield any further values.
    // This is like a loop that calls the Scan method,
    // except that the error is explicitly returned (as well as being stored for the Err method).
    func (s *Scanner) Tokens() iter.Seq2[[]byte, error]

To use it, instead of writing

    for scan.Scan() {
    	tok := scan.Bytes()
    	// process tok
    }
    if err := scan.Err(); err != nil {
    	// handle error
    }

people will write

    for tok, err := range scan.Tokens() {
    	if err != nil {
    		// handle error
    		break // or return
    	}
    	// process tok
    }