encoding/xml: line endings in data get replaced #24426

Closed
tehsphinx opened this Issue Mar 16, 2018 · 3 comments

tehsphinx commented Mar 16, 2018

What version of Go are you using (go version)?

go version go1.9.3 darwin/amd64

Does this issue reproduce with the latest release?

I created a copy of the xml package and applied the code changes of https://go-review.googlesource.com/c/go/+/46433, which is supposed to fix issue #20614, but the issue persists.

What did you do?

I receive XML from another service, with data that contains line endings.
After parsing, all line endings in the data are normalized to \n.

I found the following on the subject, but wonder whether it is even supposed to touch line endings inside CDATA. If it is, just let me know and I can move on to finding a workaround.
https://www.w3.org/TR/REC-xml/#sec-line-ends

Here is a reproducible sample with and without CDATA escaping:
https://play.golang.org/p/PdnIyRD6Qsv

What did you expect to see?

My data with \r\n line endings intact.

What did you see instead?

My data with \n line endings.
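
The playground sample linked above is not reproduced here. A minimal sketch of the same kind of check might look like the following; the `Payload` type, the `decode` helper, and the element names are made up for illustration and are not taken from the original sample.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Payload is a hypothetical document shape used only for this sketch.
type Payload struct {
	Data string `xml:"data"`
}

// decode unmarshals doc and returns the character data of <data>.
func decode(doc string) (string, error) {
	var p Payload
	if err := xml.Unmarshal([]byte(doc), &p); err != nil {
		return "", err
	}
	return p.Data, nil
}

func main() {
	docs := []string{
		// CRLF in ordinary character data.
		"<payload><data>line1\r\nline2</data></payload>",
		// CRLF inside a CDATA section.
		"<payload><data><![CDATA[line1\r\nline2]]></data></payload>",
	}
	for _, doc := range docs {
		got, err := decode(doc)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		// Both documents print "line1\nline2": the \r is gone.
		fmt.Printf("%q\n", got)
	}
}
```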

andybons added this to the Unplanned milestone Mar 16, 2018

andybons (Member) commented Mar 16, 2018

Hm. The section on CDATA makes no note of normalizing line endings; however, CDATA sections are unparsed entities, and the line endings section seems to apply only to parsed entities.

This could go either way. Do you know how other parsers handle it?

@ianlancetaylor @rsc?

tehsphinx (Author) commented Mar 17, 2018

Did some more research on the topic:

On Stack Overflow, somebody argues that because line-ending normalization has to be done before parsing, the XML parser does not yet know whether a line ending is part of a CDATA section.

MSDN library states:

XML processors treat the character sequence Carriage Return-Line Feed (CRLF) like single CR or LF characters. All are reported as a single LF character. Applications can save documents using the appropriate line-ending convention.
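
To make the quoted rule concrete, here is a rough sketch of that normalization as a standalone step. It is not the code used inside encoding/xml; `normalizeLineEndings` is a hypothetical helper written only to illustrate the behavior described above.

```go
package main

import (
	"fmt"
	"strings"
)

// lineEndings maps CRLF and lone CR to LF. strings.NewReplacer tries the
// pairs in argument order at each position, so "\r\n" is matched before
// a lone "\r" and becomes a single "\n".
var lineEndings = strings.NewReplacer("\r\n", "\n", "\r", "\n")

// normalizeLineEndings is a hypothetical helper that applies the
// "CRLF, CR and LF are all reported as LF" rule to raw input text.
func normalizeLineEndings(s string) string {
	return lineEndings.Replace(s)
}

func main() {
	in := "a\r\nb\rc\nd"
	fmt.Printf("%q\n", normalizeLineEndings(in)) // "a\nb\nc\nd"
}
```

Because a step like this runs over the raw input, it cannot distinguish text inside a CDATA section from text outside one, which matches the argument summarized above.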

So I guess the Go XML parser is correctly implemented, and I should use base64 encoding to get line endings across.
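
A minimal sketch of that base64 workaround, assuming both the sending and the receiving side can be changed. The `Message` type and the `encodeMessage` and `decodeMessage` helpers are hypothetical names used for illustration only.

```go
package main

import (
	"encoding/base64"
	"encoding/xml"
	"fmt"
)

// Message is a hypothetical envelope; Data holds base64 text, which
// contains no line endings for the XML parser to normalize.
type Message struct {
	XMLName xml.Name `xml:"message"`
	Data    string   `xml:"data"`
}

// encodeMessage base64-encodes raw and wraps it in XML.
func encodeMessage(raw string) ([]byte, error) {
	m := Message{Data: base64.StdEncoding.EncodeToString([]byte(raw))}
	return xml.Marshal(m)
}

// decodeMessage parses the XML and decodes the base64 payload.
func decodeMessage(doc []byte) (string, error) {
	var m Message
	if err := xml.Unmarshal(doc, &m); err != nil {
		return "", err
	}
	raw, err := base64.StdEncoding.DecodeString(m.Data)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	doc, err := encodeMessage("line1\r\nline2")
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	got, err := decodeMessage(doc)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("%q\n", got) // "line1\r\nline2"
}
```

The trade-off is that the payload is no longer human-readable in the XML and grows by roughly a third.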

andybons (Member) commented Mar 18, 2018

OK. Closing for now. Let us know if you have any other concerns and feel free to re-open if you like.

andybons closed this Mar 18, 2018

golang locked and limited conversation to collaborators Mar 18, 2019
