Skip to content

encoding/csv: Lazy quotes reader does not properly parse quoted fields that end in a bare quote #56329

@julianedwards

Description

@julianedwards

What version of Go are you using (go version)?

$ go version
go version go1.19.2 darwin/amd64

Does this issue reproduce with the latest release?

Yes. I also saw the same issue on 1.18.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/julianedwards/Library/Caches/go-build"
GOENV="/Users/julianedwards/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/julianedwards/gocode/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/julianedwards/gocode"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.19.2"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/gq/_pcdqwn544g188vb6mhyffhm0000gn/T/go-build1771569906=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"log"
)

const example1 = `
c0,c1,c2
abc,123,
"abc",123,
"a"b"c",123,
""ab"c",123,
""abc"",123,
"a"b"c",123,
`
const example2 = `
c0,c1,c2
abc,123,
"a"bc"",123,
a"b"c,123,
`

func main() {
	// Example 1
	csvReader := csv.NewReader(bytes.NewReader([]byte(example1)))
	// If I do not set this to a negative number, the reader can fail since
	// it fails to split by the comma delimiter and combines fields from
	// the rest of the row and the subsequent lines.
	csvReader.FieldsPerRecord = -1
	csvReader.LazyQuotes = true

	records, err := csvReader.ReadAll()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("Example 1")
	for i, record := range records {
		fmt.Println("ROW ", i)
		fmt.Println(record)
	}

	// Example 2
	csvReader = csv.NewReader(bytes.NewReader([]byte(example2)))
	csvReader.FieldsPerRecord = -1
	csvReader.LazyQuotes = true

	records, err = csvReader.ReadAll()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("\nExample 2")
	for i, record := range records {
		fmt.Println("ROW ", i)
		fmt.Println(record)
	}
}

What did you expect to see?

Example 1
ROW 0
[c0 c1 c2]
ROW 1
[abc 123 ]
ROW 2
[abc 123 ]
ROW 3
[a"b"c 123 ]
ROW 4
["ab"c 123 ]
ROW 5
["abc" 123 ]
ROW 6
[a"b"c 123 ]

Example 2
ROW 0
[c0 c1 c2]
ROW 1
[abc 123 ]
ROW 2
[a"bc" 123 ]
ROW 3
[a"b"c 123 ]

What did you see instead?

Example 1
ROW  0
[c0 c1 c2]
ROW  1
[abc 123 ]
ROW  2
[abc 123 ]
ROW  3
[a"b"c 123 ]
ROW  4
["ab"c 123 ]
ROW  5
["abc",123,
"a"b"c 123 ]

Example 2
ROW  0
[c0 c1 c2]
ROW  1
[abc 123 ]
ROW  2
[a"bc",123,
a"b"c,123,
]

Notes

  • Quoted fields with bare quotes starting and ending in the middle are okay ("*"*"*",)
  • Quoted fields starting with a bare quote that ends before the last character are okay (""*"*",)
  • Quoted fields that end with a bare quote are not parsed properly ("*"",)
  • Quoted fields that start and end with bare quotes are not parsed properly (""*"",)–this is really a subset of the previous pattern.

I looked into the csv library code and the culprit seems to be these two lines. When finding a closing ", the line position should not increment—similar to how the the lazy quotes case does not increment the line position when it finds an opening " in a quoted field. The removal of quotes should always be handled by this block of code, never in the subsequent switch statement. The premature incrementing of the line causes (1) the next iteration of the loop to hit this clause in the if/else statement because there are no more quotes in the line and it assumes the end of the line is hit (causing any subsequent fields in the line and, potentially, following lines to be squashed) or (2) read until the next quote in the line squashing all of the fields in the line together (which can lead to case (1) if the next quote is the last rune of the field).

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeWaitingForInfoIssue is not actionable because of missing required information, which needs to be provided.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions