Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/xml: Marshal/Escape allows invalid characters #4235

Closed
anacrolix opened this Issue Oct 11, 2012 · 9 comments

Comments

Projects
None yet
5 participants
@anacrolix
Copy link
Contributor

anacrolix commented Oct 11, 2012

http://play.golang.org/p/ac4WDMTt5t

What is the expected output?
xml error: Invalid character

What do you see instead?
"<string>\0</string>"

Please provide any additional information below.

Invalid XML characters pass xml.Marshal. In the example I've given, the null byte is
passed through. This causes many XML decoders to barf.

http://en.wikipedia.org/wiki/Valid_characters_in_XML
@minux

This comment has been minimized.

Copy link
Member

minux commented Oct 11, 2012

Comment 1:

xml.Escape also has this problem.

Labels changed: added priority-later, removed priority-triage.

Status changed to Accepted.

@robpike

This comment has been minimized.

Copy link
Contributor

robpike commented Oct 11, 2012

Comment 2:

This isn't easy. NUL bytes are forbidden, as are surrogates and some other code points.
It's easy to detect them; the problem is what to do?  Escape, for instance, has no error
return. The only solution I can see is to panic, which is heavy-handed.
@minux

This comment has been minimized.

Copy link
Member

minux commented Oct 12, 2012

Comment 3:

yeah, that's why i didn't fix it.
Marshal can return error, but the Escape case is difficult.
If we fix Escape by allowing it to panic, does it break the Go 1 compatibility promise?
@anacrolix

This comment has been minimized.

Copy link
Contributor Author

anacrolix commented Oct 12, 2012

Comment 4:

I agree, it's heavy handed. However keep in mind this is actually a pretty serious bug,
as many heavy-hitting XML framework/super-libraries are pedantic about checking that
characters are valid. So the result of Marshal and Escape will cause errors in some
other process down the line. The case I have is in a SAX parser in some Java process I
don't control which abrubtly explodes when reading an attribute with a '\0' in it.
At least Marshal could return an error, and a CheckedEscape or something to that effect
could be used that allows returning an error.
Another idea is to add a function that will check a string for valid codepoints, which
could be used either before or after Escape, or whenever user code feels like it.
@rsc

This comment has been minimized.

Copy link
Contributor

rsc commented Dec 9, 2012

Comment 5:

1. Internally we will need the routine to know whether it is being called from something
that can return errors.
2. Bad characters result in an error if we're returning errors; otherwise they turn into
U+FFFD (best we can do).
3. Add EscapeText, a version of Escape that returns an error (issue #4112).
@rsc

This comment has been minimized.

Copy link
Contributor

rsc commented Dec 10, 2012

Comment 6:

Labels changed: added size-m.

@rsc

This comment has been minimized.

Copy link
Contributor

rsc commented Dec 10, 2012

Comment 7:

Labels changed: added suggested.

@gopherbot

This comment has been minimized.

Copy link

gopherbot commented Feb 28, 2013

Comment 8 by osaingre:

Hi, I'm working on this.
@rsc

This comment has been minimized.

Copy link
Contributor

rsc commented Mar 14, 2013

Comment 9:

This issue was closed by revision f74eb6d.

Status changed to Fixed.

@rsc rsc added this to the Go1.1 milestone Apr 14, 2015

@rsc rsc removed the go1.1 label Apr 14, 2015

@golang golang locked and limited conversation to collaborators Jun 24, 2016

This issue was closed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.