encoding/csv: writer.UseCRLF will change \n to \r\n in data field #36445

Open
bkkgbkjb opened this issue Jan 8, 2020 · 6 comments

@bkkgbkjb bkkgbkjb commented Jan 8, 2020

What version of Go are you using (go version)?

$ go version
go version go1.13.5 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/secret/.cache/go-build"
GOENV="/home/secret/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/secret/Dropbox/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go-1.13"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.13/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build843557951=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I am trying to write

"col1","col2"
"asd\njk", "2g9"

into a CSV file,

but the newline in asd\njk is changed, giving asd\r\njk.

playground

What did you expect to see?

A \n inside a data field should not be changed by writer.UseCRLF:

"col1,col2\r\n\"asd\njk\",2g9\r\n"

What did you see instead?

"col1,col2\r\n\"asd\r\njk\",2g9\r\n"

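A minimal sketch that should reproduce the report. The helper name writeCSV is made up for illustration and is not part of encoding/csv; the records are the ones from this issue. On the Go version reported above, the UseCRLF=true case is expected to print the "saw instead" string.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// writeCSV is a hypothetical helper: it encodes records with encoding/csv
// and returns the raw output so that the line endings can be inspected.
func writeCSV(records [][]string, useCRLF bool) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	w.UseCRLF = useCRLF
	// WriteAll writes every record and then flushes the writer.
	if err := w.WriteAll(records); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	records := [][]string{
		{"col1", "col2"},
		{"asd\njk", "2g9"},
	}
	for _, useCRLF := range []bool{false, true} {
		out, err := writeCSV(records, useCRLF)
		if err != nil {
			panic(err)
		}
		// %q makes \r and \n visible in the printed output.
		fmt.Printf("UseCRLF=%v: %q\n", useCRLF, out)
	}
}
```

Comparing the two printed lines shows whether the \n inside the quoted field was rewritten along with the record terminators.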

@bkkgbkjb bkkgbkjb commented Jan 8, 2020

After a further comparison with the Python 3.x csv library,
I found the following table:

Python:
new_line: \r\n
\r -> quote
\n -> quote
\r\n -> quote


new_line: \n
\n -> quote
\r -> no_quote
\r\n -> quote


Go:

new_line: \r\n
\n -> changed to \r\n, then quote                         (1)
\r -> removed \r, then quote remaining                    (2)
\r\n -> quote


new_line: \n
\n -> quote
\r -> quote
\r\n -> quote

Though there seems to be no good standard for the CSV format, I still think touching the actual data is a bad idea.

My suggestion is simply to fix (1) and (2) so that they only quote;
then every occurrence of \r?\n? would be quoted, which never harms.
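
A rough sketch of what "only quote" could look like. Both quoteVerbatim and joinRecord are hypothetical names, not part of encoding/csv, and the sketch ignores the other quoting rules of the real Writer (custom Comma, leading spaces, and so on). It only illustrates that \r and \n can be kept byte for byte while the record still ends with \r\n.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteVerbatim is a hypothetical helper sketching the suggested behavior:
// a field containing a comma, a quote, \r, or \n is wrapped in quotes, and
// only embedded quotes are escaped (doubled). \r and \n pass through unchanged.
func quoteVerbatim(field string) string {
	if !strings.ContainsAny(field, ",\"\r\n") {
		return field
	}
	return `"` + strings.ReplaceAll(field, `"`, `""`) + `"`
}

// joinRecord is also hypothetical: it joins the fields with commas and
// terminates the record with \r\n, touching no bytes inside the fields.
func joinRecord(fields []string) string {
	quoted := make([]string, len(fields))
	for i, f := range fields {
		quoted[i] = quoteVerbatim(f)
	}
	return strings.Join(quoted, ",") + "\r\n"
}

func main() {
	for _, f := range []string{"a\nb", "a\rb", "a\r\nb"} {
		fmt.Printf("%q -> %q\n", f, quoteVerbatim(f))
	}
	fmt.Printf("%q\n", joinRecord([]string{"asd\njk", "2g9"}))
}
```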

@toothrot toothrot added this to the Backlog milestone Jan 8, 2020

@toothrot toothrot commented Jan 8, 2020

/cc @dsnet @bradfitz

The issue reported seems like surprising behavior to me. I wouldn't expect data to be changed either.

@dsnet dsnet commented Jan 11, 2020

The godoc currently documents the behavior:

The Reader converts all \r\n sequences in its input to plain \n

Given that this is specified behavior, we can't change it. At best, we can add a Reader option to preserve newlines without mangling.

@bkkgbkjb bkkgbkjb commented Jan 12, 2020

Well, but I think we are talking about csv.Writer.UseCRLF here.

Its only documentation is:

If UseCRLF is true, the Writer ends each output line with \r\n instead of \n.

I suggest we add a StrictMode bool field to

type Writer struct {
    ...
    StrictMode bool
}

so that, when it is enabled, the Writer does not change anything in our data.

@bkkgbkjb bkkgbkjb commented Jan 12, 2020

So the problem here is that, with csv.Writer.UseCRLF enabled,

csv.Writer also changes the data inside quoted fields:
remove all \r
change \n to \r\n

This is shown in the following code:

			// Encode the special character.
			if len(field) > 0 {
				var err error
				switch field[0] {
				case '"':
					_, err = w.w.WriteString(`""`)
				case '\r':
					if !w.UseCRLF {
						err = w.w.WriteByte('\r')
					}
				case '\n':
					if w.UseCRLF {
						_, err = w.w.WriteString("\r\n")
					} else {
						err = w.w.WriteByte('\n')
					}
				}
				field = field[1:]
				if err != nil {
					return err
				}
			}

src
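
To see the effect of that switch in isolation, here is a standalone sketch. encodeQuoted is a hypothetical function, not the library's API; the switch mirrors the excerpt above, while the surrounding scanning loop and the use of strings.Builder are assumptions added to make it runnable.

```go
package main

import (
	"fmt"
	"strings"
)

// encodeQuoted is a hypothetical standalone re-creation of the quoted-field
// path around the excerpt above. The scanning loop is an assumption;
// only the switch mirrors the quoted source.
func encodeQuoted(field string, useCRLF bool) string {
	var b strings.Builder
	b.WriteByte('"')
	for len(field) > 0 {
		// Copy everything before the next special character verbatim.
		i := strings.IndexAny(field, "\"\r\n")
		if i < 0 {
			i = len(field)
		}
		b.WriteString(field[:i])
		field = field[i:]

		// Encode the special character, as in the excerpt.
		if len(field) > 0 {
			switch field[0] {
			case '"':
				b.WriteString(`""`)
			case '\r':
				if !useCRLF {
					b.WriteByte('\r') // with useCRLF, the \r is dropped
				}
			case '\n':
				if useCRLF {
					b.WriteString("\r\n") // with useCRLF, the \n is expanded
				} else {
					b.WriteByte('\n')
				}
			}
			field = field[1:]
		}
	}
	b.WriteByte('"')
	return b.String()
}

func main() {
	for _, f := range []string{"a\nb", "a\rb", "a\r\nb"} {
		fmt.Printf("%q -> %q (UseCRLF) / %q (default)\n",
			f, encodeQuoted(f, true), encodeQuoted(f, false))
	}
}
```

The three inputs correspond to the \n, \r, and \r\n rows of the Go table in the earlier comment.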

@lrita lrita commented Apr 2, 2020

MS Excel will interpret the \r in fields as . And we must set UseCRLF=true for MS Excel. What a pity.
