bufio: Scanner and empty final token #8672

@gwenn

What does 'go version' print?
go version devel +87b93aeb1822 Fri Sep 05 15:01:09 2014 -0700 darwin/amd64

What steps reproduce the problem?
Here is a minimal tab-delimited file reader, run on input whose final token is empty:
http://play.golang.org/p/jl1x-SRQ7S

What happened?

ScanField is never called at EOF with an empty data slice, so the final empty token
cannot be parsed:
'a' 'b' 'c'
'd' 'e' 

What should have happened instead?

ScanField should have been called at EOF with an empty data slice:
'a' 'b' 'c'
'd' 'e' ''

Looking at the code of bufio.Scanner.Scan:
http://golang.org/src/pkg/bufio/scan.go?s=4436:4465#L114
            if s.end > s.start {
                advance, token, err := s.split(s.buf[s.start:s.end], s.err != nil)
the split function cannot be called with an empty data slice.

But all the split functions in the package perform the following test:
http://golang.org/src/pkg/bufio/scan.go?s=6903:6981#L214
http://golang.org/src/pkg/bufio/scan.go?s=6903:6981#L229
http://golang.org/src/pkg/bufio/scan.go?s=6903:6981#L275
http://golang.org/src/pkg/bufio/scan.go?s=6903:6981#L329
        if atEOF && len(data) == 0 {
            return 0, nil, nil
        }

Given the condition in Scan, the test 'len(data) == 0' appears to be always false.

Because the split function is never called with an empty data slice,
a final empty token cannot be parsed correctly.

A possible (but ugly) workaround is to keep the tab unconsumed.

Another workaround is to use the encoding/csv package, but it does not support
tab-delimited files containing double quotes:
See https://golang.org/issue/3150

I also tried changing the line:
http://golang.org/src/pkg/bufio/scan.go?s=4436:4465#L114
            if s.end > s.start {
to
            if s.end >= s.start {
But it breaks all Split implementations.

Thanks and regards.
