Character causes invalid character error #147
Comments
Unfortunately the null char is not valid in XML.
In that case, would it make sense to strip it out of the string before attempting to serialise it?
Yes, probably. If you are serializing binary data there are more control characters that need to be stripped. You may want to take a look at:
Thanks, I've had a read and didn't realise that some characters were simply invalid in XML! However, shouldn't this library handle this? According to the Stack Overflow posts you shared, the allowed characters are:
Do you think it makes sense for this package to strip any other characters when building an XML document? Or do you think this is up to each consumer to handle?
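For readers weighing the "up to each consumer" option, here is a minimal sketch of a consumer-side filter. It assumes the XML 1.0 `Char` production (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF); the function names `isLegalXml10CodePoint` and `stripIllegalXmlChars` are hypothetical and are not part of xmlbuilder.

```javascript
// Hypothetical helper, not part of xmlbuilder: true if the code point is
// allowed by the XML 1.0 Char production.
function isLegalXml10CodePoint(cp) {
  return (
    cp === 0x9 || cp === 0xa || cp === 0xd ||
    (cp >= 0x20 && cp <= 0xd7ff) ||
    (cp >= 0xe000 && cp <= 0xfffd) ||
    (cp >= 0x10000 && cp <= 0x10ffff)
  );
}

// Hypothetical helper: replaces every illegal character with `replacement`
// (empty string by default). Iterating with for...of walks the string by
// code point, so a valid surrogate pair (e.g. an emoji) is kept whole, while
// a lone surrogate shows up as a single illegal code point and is replaced.
function stripIllegalXmlChars(input, replacement = "") {
  let out = "";
  for (const ch of input) {
    out += isLegalXml10CodePoint(ch.codePointAt(0)) ? ch : replacement;
  }
  return out;
}
```

Calling `stripIllegalXmlChars(userInput)` before handing the string to the builder removes the null character and other control characters while leaving tabs, newlines, and supplementary-plane characters intact. Note that this changes the data, which is the trade-off discussed in this thread.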
I may add a utility function (to be explicitly called or configured by the user) to remove illegal characters, since I don't think it would be a good idea to silently destroy people's hard-earned characters. The Woodstox XML API, for example, throws on invalid characters by default, but it also has a replacer function that can be configured by the user.
I ran into this issue with a production app; we discovered our users were entering emoji characters into description fields and we would end up getting an invalid character error from the xmlbuilder library as we converted the input to XML to be routed into an enterprise messaging/eventing system. From the XML Specification: https://www.w3.org/TR/xml11/#charsets
And above that:
XMLBuilder throws an exception for valid characters within the range specified above, in particular #x10000 and above; this includes emoji. Providing a utility to strip the characters out would be an even worse option, as this would change the meaning of the data. I can't determine whether character code points #x10000 and above are valid in XML markup, but at minimum they should be encoded; for example:
Did you accidentally implement the XML 1.0 specification instead of XML 1.1? In the changelist for XML 1.1, they state:
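The encoding idea raised in this comment (emitting supplementary-plane characters as numeric character references rather than dropping them) could be sketched as follows. This is an illustration only: `encodeSupplementaryChars` is a hypothetical name, not an xmlbuilder function, and the sketch deliberately does not escape `&`, `<`, or `>`, which a real serializer must also handle.

```javascript
// Hypothetical helper: rewrites every code point above U+FFFF as a
// hexadecimal numeric character reference such as &#x1F4A9; and leaves
// all other characters untouched.
function encodeSupplementaryChars(input) {
  let out = "";
  // for...of iterates by code point, so a surrogate pair arrives as one item.
  for (const ch of input) {
    const cp = ch.codePointAt(0);
    out += cp > 0xffff ? "&#x" + cp.toString(16).toUpperCase() + ";" : ch;
  }
  return out;
}
```

For instance, `encodeSupplementaryChars("ok \u{1F4A9}")` yields `ok &#x1F4A9;`, which keeps the document in the Basic Multilingual Plane without changing its meaning once parsed.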
A bit more info on this. There are two separate issues:
ES6 fixes this issue by adding Unicode support to regular expressions. The /u flag enables Unicode support for regular expressions; without it, JavaScript's regex functionality incorrectly breaks with any supplementary plane character from Unicode. https://mathiasbynens.be/notes/javascript-unicode

If the library only supports ES6, the solution is easy: add the "/u" flag at the end of the regular expressions in XMLStringifier.coffee and you're good to go. To adhere to the XML 1.1 specification, you'll also need to escape the valid control characters that are also being incorrectly prohibited by the library.

If you need backwards compatibility, it's trickier. One option might be to use a 'legacy' regular expression that detects valid surrogate characters instead of the "allowSurrogateChars" boolean (remember, an XML encoder MUST support the character specification by default). This would require some tricky regular expressions that detect valid surrogate pairs but block invalid lone surrogates.
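The two approaches described above can be sketched side by side. This is an illustration under assumptions, not the code in XMLStringifier.coffee: both patterns encode the XML 1.0 `Char` production rather than XML 1.1, and the names `LEGAL_ES6`, `LEGAL_ES5`, `isLegalEs6`, and `isLegalEs5` are made up for this example.

```javascript
// ES6 variant: with the /u flag the pattern is matched code point by code
// point, so \u{10000}-\u{10FFFF} covers the supplementary planes directly.
const LEGAL_ES6 =
  /^[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]*$/u;

// ES5-compatible variant: without /u the engine sees UTF-16 code units, so a
// supplementary character has to be spelled out as a high surrogate followed
// by a low surrogate. A lone surrogate matches neither alternative.
const LEGAL_ES5 =
  /^(?:[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]|[\uD800-\uDBFF][\uDC00-\uDFFF])*$/;

// Hypothetical wrappers: true when every character in `str` is legal.
function isLegalEs6(str) {
  return LEGAL_ES6.test(str);
}

function isLegalEs5(str) {
  return LEGAL_ES5.test(str);
}
```

Both functions accept a string containing an emoji and reject a string containing a null character or an unpaired surrogate, so the ES5 form gives the same answers without requiring /u support.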
Ran into this today, our pptx export was crashed by a poop emoji, actual error log below. I suppose it's "cleaner" to remove these particular characters, but they're also arguably relevant content as it was a text open-end response to a survey of gastroenterologists.
Could you publish an npm release with commit 1f9b41a applied?
Use an ES5 compatible Regexp to assert legal characters. Closes #147.
xmlbuilder has fixed some bugs which may crash karma-html-detailed-reporter, like oozcitak/xmlbuilder-js#147 (if there are invalid characters in the report)
@oozcitak, did you ever add that utility function? I ran into a problem in which I’m trying to generate XML based on user input, and I just want to ignore characters that aren’t valid, for example, control characters. (For what it’s worth, I’m using xmlbuilder-js through node-xml2js, so I don’t think xmlbuilder2 is an option).
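Just added the
const builder = require('xmlbuilder');

// With a string, every invalid character is replaced by that string
// (here: removed).
const obj = {
  'node\x00': 'text\x08content'
};
const xmlStr = builder.create(obj, { invalidCharReplacement: '' }).end({ pretty: true });
// <?xml version="1.0"?>
// <node>textcontent</node>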

// With a function, the replacement is chosen per character:
// the null char is dropped, anything else becomes '_'.
const obj = {
  'node\x00': 'text\x08content'
};
const options = {
  invalidCharReplacement: (c) => c === '\x00' ? '' : '_'
};
const xmlStr = builder.create(obj, options).end({ pretty: true });
// <?xml version="1.0"?>
// <node>text_content</node>
Awesome. Thank you very much 😃
I guess I won’t be able to use this in xml2js, which is several versions behind on xmlbuilder (11). Should I just migrate to use xmlbuilder2 directly? It comes with a builder & a parser, so it should work fine, right? Is
Upon further investigation, the situation on
const xmlbuilder2 = require("xmlbuilder2");
const fs = require("fs");

// Serialize an object whose text contains a backspace (U+0008) control character.
fs.writeFileSync(
  "example.xml",
  xmlbuilder2.convert(
    { root: "A control character (backspace): \b" },
    { format: "xml", prettyPrint: true }
  )
);
Then I put the generated
Should I open an issue on https://github.com/oozcitak/xmlbuilder2?
In summary, I don’t know of a good solution for folks like me who need to parse & generate XML and support user input with strange characters:
@leafac I will be adding
To make sure you get well-formed XML on serialization,
const xmlbuilder2 = require("xmlbuilder2");
const fs = require("fs");

// Same document as before, now with the wellFormed option set.
fs.writeFileSync(
  "example.xml",
  xmlbuilder2.convert(
    { root: "A control character (backspace): \b" },
    { format: "xml", prettyPrint: true, wellFormed: true }
  )
);

// Browser DOM API (XMLSerializer), applied to the same text:
new XMLSerializer().serializeToString(document.createTextNode("A control character (backspace): \b"));
Thanks for all your help in navigating this space. I’m trying to use
Thanks for taking the time to investigate. I opened these two issues in
Same
I had an issue with a module consuming this one, and I found that the problem boils down to this test, which throws the following error. Is this a bug or is it intentional?
/lib/XMLStringifier.js:148
throw new Error("Invalid character (" + chr + ") in string: " + str + " at index " + chr.index);
^
Error: Invalid character (