proposal: bufio: add Scanner.Tokens iterator #76802

@ianlancetaylor

Proposal Details

The bufio.Scanner type breaks an input stream into a series of tokens. It is convenient to use but has a flaw: when an error occurs, it stores the error internally and makes it available only via the Err method. This is fine in principle, but in practice people routinely fail to call Err.

A particular error that can occur is ErrTooLong, which means that the next token is too long for the buffer. On balance it's a good idea that the scanner does not permit arbitrarily large tokens. Scanners are routinely used to parse input that is not under the program's control, and arbitrarily large tokens would open the door to allocating large amounts of memory in the program. But since people routinely don't check the Err method, it's easy for an overly long token to be silently discarded: Scan returns false, the loop ends, and nothing reports the failure.

Now that we have iterators, we can make things slightly better with an iterator that yields a token and an error. It's still possible to ignore the error, but it's harder: it's comparable to forgetting to check err != nil, rather than to writing the scan.Scan loop and simply never calling scan.Err afterwards.

The proposal is to add this method:

    // Tokens returns an iterator over a pair of the next token and an error.
    // At most one of the token and error will be non-nil.
    // If the error is not nil, the iterator will not yield any further values.
    // This is like a loop that calls the Scan method,
    // except that the error is explicitly returned
    // (as well as being stored for the Err method).
    func (s *Scanner) Tokens() iter.Seq2[[]byte, error]

To use it, instead of writing

    for scan.Scan() {
        tok := scan.Bytes()
        // process tok
    }
    if err := scan.Err(); err != nil {
        // handle error
    }

people will write

    for tok, err := range scan.Tokens() {
        if err != nil {
            // handle error
            break // or return
        }
        // process tok
    }

Labels: LibraryProposal, Proposal