bufio: allow terminating Scanner early cleanly without a final token or an error #56381

@favonia

Why

Currently, the clean way to end scanning early is for the SplitFunc to return the error ErrFinalToken. However, when it does, the Scanner.Scan method unconditionally returns true even if the token is nil, which produces an extra empty token (as seen via Scanner.Text). (Please note that this is different from the splitting function returning an empty slice. This proposal only covers the case where the second return value is exactly nil; every other case, including the case where the second return value is an empty slice, should remain the same.) In other words, a SplitFunc cannot return

len(data), nil, ErrFinalToken

as a way to cleanly terminate the scanning without adding an extra token. If I understand the logic correctly, a user must therefore either use a custom error or explicitly check whether Scanner.Bytes returns nil to achieve token-free early termination, which is a bit awkward. Would it make sense for Scanner to skip the nil token in this very special case? Whether or not this proposal is accepted, I hope this tricky case can be mentioned more explicitly in the documentation of SplitFunc and/or Scanner.Scan.

Proposed code change

In Scanner.Scan,

 if err == ErrFinalToken {
 	s.token = token
 	s.done = true
-	return true
+	return token != nil
 }
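Because the proposed check is token != nil, only an exactly-nil token would be skipped; empty but non-nil slices would still be delivered as tokens. A small sketch of that distinction follows, where skipsToken is a hypothetical helper standing in for the negation of the proposed check:

```go
package main

import "fmt"

// skipsToken is a hypothetical helper: it reports whether the proposed
// `return token != nil` would suppress the given final token.
func skipsToken(token []byte) bool {
	return token == nil
}

func main() {
	data := []byte("a,b")

	var nilToken []byte       // exactly nil: would be skipped
	emptyLiteral := []byte{}  // empty but non-nil: still a token
	emptySubslice := data[:0] // empty slice of a buffer: still a token

	fmt.Println(skipsToken(nilToken), skipsToken(emptyLiteral), skipsToken(emptySubslice))
	// prints: true false false
}
```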

Example exploiting the proposed API

(This is adapted from the existing example ExampleScanner_emptyFinalToken.)

func ExampleScanner_earlyStop() {
	const input = "1,2,STOP,4,"
	scanner := bufio.NewScanner(strings.NewReader(input))
	onComma := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		for i := 0; i < len(data); i++ {
			if data[i] == ',' {
				// if the token is "STOP", ignore the rest
				if string(data[:i]) == "STOP" {
					return i + 1, nil, bufio.ErrFinalToken
				}

				return i + 1, data[:i], nil
			}
		}
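		if !atEOF {
			// no comma yet and more input may follow: request more data
			return 0, nil, nil
		}
		// at EOF: deliver the remaining data as the final token
		return 0, data, bufio.ErrFinalToken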
	}
	scanner.Split(onComma)

	for scanner.Scan() {
		fmt.Printf("Got a token %q\n", scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading input:", err)
	}
}

Output before the change:

Got a token "1"
Got a token "2"
Got a token ""

Output after the change:

Got a token "1"
Got a token "2"

Edits

  • 10/23: Added example code.
  • 10/23: I noticed that it's easy to confuse nil tokens with empty tokens. I updated the text with more clarification. I also fixed the typo in the code.
