Description
I have confirmed this on:
go version devel +fcff3ba Mon Jan 12 02:09:50 2015 +0000 linux/amd64
go version go1.4 linux/amd64
I know that bufio.Scanner has a 64 KB limit on its buffer. My issue concerns successive calls to Scan() on a Scanner that has encountered a token that is too long. When a Scanner encounters a token that exceeds 64 KB, the call to Scan() returns false and the Scanner's token field is empty. However, if Scan() is called a second time on the same Scanner, that call populates the token field with the first 64 KB of the input and returns true. On a third, fourth, ..., Nth call to Scan(), the token field is empty again and Scan() returns false.
Here is an example:
...
file, _ := os.Open("line.txt") // file has a single line that exceeds the 64kb limit
scanner := bufio.NewScanner(file)
var ret bool
ret = scanner.Scan() // ret is false, scanner.Text() is an empty string, scanner.Err() reports that the line is too long
ret = scanner.Scan() // ret is true, scanner.Text() holds the first 64 KB of the line, scanner.Err() still reports that the line is too long
ret = scanner.Scan() // ret is false, scanner.Text() is empty again, scanner.Err() still reports that the line is too long
...
I would argue that successive calls to Scan() in these situations should give a consistent result, or else the documentation should state that a second call to Scan() returns the first 64 KB of the token. I would also argue for documenting the 64 KB token limit of bufio.Scanner clearly, as it may save some headaches.
I know this issue is low priority, so I would be happy to take it on.