Description
Why
Currently, the clean way to end scanning early is to return the error `ErrFinalToken`. However, when it is used, the `Scanner.Scan` method unconditionally returns `true` even if the token is `nil`, which produces an extra empty token (visible via `Scanner.Text`). (Please note that this is different from the split function returning an empty slice. This proposal only covers the case where the second return value is exactly `nil`; every other case, including the case where the second return value is an empty slice, should remain the same.) In other words, a `SplitFunc` cannot return `len(data), nil, ErrFinalToken` as a way to cleanly terminate the scan without adding an extra token. If I understand the logic correctly, a user must therefore either use a custom error or specifically check whether `Scanner.Bytes` returns `nil` to achieve token-free early termination, which is a bit awkward. I wonder whether it makes sense for `Scanner` to skip the `nil` token in this very special case. Whether or not this proposal is accepted, I hope this tricky case can be mentioned more explicitly in the documentation of `SplitFunc` and/or `Scanner.Scan`.
Proposed code change
In `Scanner.Scan`,
     if err == ErrFinalToken {
         s.token = token
         s.done = true
    -    return true
    +    return token != nil
     }
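Because the proposed condition is `token != nil`, only a token that is exactly `nil` would be skipped, while an empty but non-nil slice would still be reported. The sketch below illustrates that distinction in plain Go; `wouldEmit` is a hypothetical helper that simply mirrors the proposed condition and is not part of `bufio`.

```go
package main

import "fmt"

// wouldEmit mirrors the proposed condition `token != nil`.
// It is a hypothetical helper for illustration only.
func wouldEmit(token []byte) bool {
	return token != nil
}

func main() {
	data := []byte(",rest")

	fmt.Println(wouldEmit(nil))      // nil token: skipped under the proposal
	fmt.Println(wouldEmit([]byte{})) // empty, non-nil slice: still a token
	fmt.Println(wouldEmit(data[:0])) // zero-length subslice of data: still a token
}
```

In particular, a split function that returns `data[:0]` for an empty field keeps its current behavior, since slicing a non-nil slice never yields `nil`.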
Example exploiting the proposed API
(This is adapted from the existing example `ExampleScanner_emptyFinalToken`.)
    func ExampleScanner_earlyStop() {
        const input = "1,2,STOP,4,"
        scanner := bufio.NewScanner(strings.NewReader(input))
        onComma := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
            for i := 0; i < len(data); i++ {
                if data[i] == ',' {
                    // If the token is "STOP", ignore the rest of the input.
                    if string(data[:i]) == "STOP" {
                        return i + 1, nil, bufio.ErrFinalToken
                    }
                    return i + 1, data[:i], nil
                }
            }
            // No comma yet: request more data unless the input is exhausted.
            if !atEOF {
                return 0, nil, nil
            }
            // Deliver the remaining data as the final token.
            return 0, data, bufio.ErrFinalToken
        }
        scanner.Split(onComma)
        for scanner.Scan() {
            fmt.Printf("Got a token %q\n", scanner.Text())
        }
        if err := scanner.Err(); err != nil {
            fmt.Fprintln(os.Stderr, "reading input:", err)
        }
    }
Output before the change:
Got a token "1"
Got a token "2"
Got a token ""
Output after the change:
Got a token "1"
Got a token "2"
Edits
- 10/23: Added example code.
- 10/23: I noticed that it's easy to confuse `nil` tokens with empty tokens. I updated the text with more clarification. I also fixed the typo in the code.