-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Description
What version of Go are you using (go version
)?
$ go version go version go1.19.2 darwin/amd64
Does this issue reproduce with the latest release?
Yes. I also saw the same issue on 1.18.
What operating system and processor architecture are you using (go env
)?
go env
Output
$ go env GO111MODULE="" GOARCH="amd64" GOBIN="" GOCACHE="/Users/julianedwards/Library/Caches/go-build" GOENV="/Users/julianedwards/Library/Application Support/go/env" GOEXE="" GOEXPERIMENT="" GOFLAGS="" GOHOSTARCH="amd64" GOHOSTOS="darwin" GOINSECURE="" GOMODCACHE="/Users/julianedwards/gocode/pkg/mod" GONOPROXY="" GONOSUMDB="" GOOS="darwin" GOPATH="/Users/julianedwards/gocode" GOPRIVATE="" GOPROXY="https://proxy.golang.org,direct" GOROOT="/usr/local/go" GOSUMDB="sum.golang.org" GOTMPDIR="" GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64" GOVCS="" GOVERSION="go1.19.2" GCCGO="gccgo" GOAMD64="v1" AR="ar" CC="clang" CXX="clang++" CGO_ENABLED="1" GOMOD="/dev/null" GOWORK="" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/gq/_pcdqwn544g188vb6mhyffhm0000gn/T/go-build1771569906=/tmp/go-build -gno-record-gcc-switches -fno-common"
What did you do?
package main import ( "bytes" "encoding/csv" "fmt" "log" ) const example1 = ` c0,c1,c2 abc,123, "abc",123, "a"b"c",123, ""ab"c",123, ""abc"",123, "a"b"c",123, ` const example2 = ` c0,c1,c2 abc,123, "a"bc"",123, a"b"c,123, ` func main() { // Example 1 csvReader := csv.NewReader(bytes.NewReader([]byte(example1))) // If I do not set this to a negative number, the reader can fail since // it fails to split by the comma delimiter and combines fields from // the rest of the row and the subsequent lines. csvReader.FieldsPerRecord = -1 csvReader.LazyQuotes = true records, err := csvReader.ReadAll() if err != nil { log.Fatal(err) } fmt.Println("Example 1") for i, record := range records { fmt.Println("ROW ", i) fmt.Println(record) } // Example 2 csvReader = csv.NewReader(bytes.NewReader([]byte(example2))) csvReader.FieldsPerRecord = -1 csvReader.LazyQuotes = true records, err = csvReader.ReadAll() if err != nil { log.Fatal(err) } fmt.Println("\nExample 2") for i, record := range records { fmt.Println("ROW ", i) fmt.Println(record) } }
What did you expect to see?
Example 1 ROW 0 [c0 c1 c2] ROW 1 [abc 123 ] ROW 2 [abc 123 ] ROW 3 [a"b"c 123 ] ROW 4 ["ab"c 123 ] ROW 5 ["abc" 123 ] ROW 6 [a"b"c 123 ] Example 2 ROW 0 [c0 c1 c2] ROW 1 [abc 123 ] ROW 2 [a"bc" 123 ] ROW 3 [a"b"c 123 ]
What did you see instead?
Example 1 ROW 0 [c0 c1 c2] ROW 1 [abc 123 ] ROW 2 [abc 123 ] ROW 3 [a"b"c 123 ] ROW 4 ["ab"c 123 ] ROW 5 ["abc",123, "a"b"c 123 ] Example 2 ROW 0 [c0 c1 c2] ROW 1 [abc 123 ] ROW 2 [a"bc",123, a"b"c,123, ]
Notes
- Quoted fields with bare quotes starting and ending in the middle are okay (
"*"*"*",
) - Quoted fields starting with a bare quote that ends before the last character are okay (
""*"*",
) - Quoted fields that end with a bare quote are not parsed properly (
"*"",
) - Quoted fields that start and end with bare quotes are not parsed properly (
""*"",
)–this is really a subset of the previous pattern.
I looked into the csv library code and the culprit seems to be these two lines. When finding a closing "
, the line position should not increment—similar to how the the lazy quotes case does not increment the line position when it finds an opening "
in a quoted field. The removal of quotes should always be handled by this block of code, never in the subsequent switch statement. The premature incrementing of the line causes (1) the next iteration of the loop to hit this clause in the if/else statement because there are no more quotes in the line and it assumes the end of the line is hit (causing any subsequent fields in the line and, potentially, following lines to be squashed) or (2) read until the next quote in the line squashing all of the fields in the line together (which can lead to case (1) if the next quote is the last rune of the field).