handle encoding="..." of xml-processing instruction #56

Open
hgoebl opened this Issue Dec 20, 2011 · 5 comments

hgoebl commented Dec 20, 2011

Hi,

if I understand the code correctly, the encoding declared in the first line of an XML document is not respected.
This could be a problem for XML files that are not UTF-8 encoded.

I tried a small patch, but it is a very crude solution and doesn't actually work: https://gist.github.com/1503453

IMO the problem is that the parser consumes chunks, and by that point the buffer has already been converted to a String.
Changing the encoding would only take effect once new chunks arrive, and only in the streaming parser.
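
One way the declared encoding could be respected, sketched under the assumption that the whole document is available as a Buffer before parsing (so this does not solve the streaming case): sniff the declaration from the raw bytes first, then decode. The helper names `sniffXmlEncoding` and `decodeXml` are hypothetical and not part of sax-js, and only UTF-8 and ISO-8859-1 are mapped here.

```javascript
// Hypothetical helper, not part of sax-js: read the encoding named in the
// XML declaration. The declaration itself is plain ASCII in both UTF-8 and
// ISO-8859-1, so decoding the first bytes as latin1 is safe for sniffing.
function sniffXmlEncoding(buf) {
  const head = buf.slice(0, 200).toString('latin1');
  const match = /^<\?xml[^>]*\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']/.exec(head);
  // Assumption: fall back to UTF-8 when no encoding is declared.
  return match ? match[1].toLowerCase() : 'utf-8';
}

// Hypothetical helper: decode the buffer with the declared encoding.
// Only two encodings are mapped in this sketch.
function decodeXml(buf) {
  const declared = sniffXmlEncoding(buf);
  const nodeEncoding = { 'utf-8': 'utf8', 'iso-8859-1': 'latin1' }[declared];
  if (!nodeEncoding) {
    throw new Error('Unsupported encoding: ' + declared);
  }
  return buf.toString(nodeEncoding);
}

// Example input: an ISO-8859-1 document with German characters.
const sample = Buffer.from(
  '<?xml version="1.0" encoding="ISO-8859-1"?><a>Gr\u00fc\u00dfe</a>',
  'latin1'
);
console.log(decodeXml(sample));
```

The decoded string could then be handed to the parser as usual.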

I'm afraid I lack deep knowledge of streaming, piping, and the architecture of the parser.
But I could do some testing with strange German XML documents ;-)

--Heinrich

mborho commented Dec 29, 2011

Same problem here. Unfortunately, the parser is not usable for ISO-8859-1 encoded XML.

Update:

If you use the request module to load the XML, you can use "request({uri:url, encoding:'binary'}...", which works for ISO-8859-1.

ghost commented Jan 21, 2012

Same problem here. @mborho, I'm not sure I understand your workaround.

Thanks

mborho commented Jan 21, 2012

@kalise if you load your ISO-8859-1 encoded XML with the request module ( https://github.com/mikeal/request ), you can pass encoding:'binary' as an option to "request", and the loaded XML string can then be parsed with sax-js.
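
To illustrate why the 'binary' option helps, here is a small sketch that needs no network access. It assumes the response bytes are ISO-8859-1. In Node, 'binary' is an alias for 'latin1', so each byte maps to the code point with the same value instead of being misread as UTF-8.

```javascript
// "<a>ü</a>" encoded as ISO-8859-1: ü is the single byte 0xFC.
const latin1Bytes = Buffer.from([0x3c, 0x61, 0x3e, 0xfc, 0x3c, 0x2f, 0x61, 0x3e]);

// Decoding as UTF-8: 0xFC is not valid UTF-8, so it is replaced by the
// replacement character U+FFFD and the original character is lost.
const decodedAsUtf8 = latin1Bytes.toString('utf8');

// Decoding as 'binary' (latin1): each byte becomes the code point of the
// same value, which matches ISO-8859-1.
const decodedAsBinary = latin1Bytes.toString('binary');

console.log(JSON.stringify(decodedAsUtf8));
console.log(JSON.stringify(decodedAsBinary));
```

Passing encoding:'binary' to request applies the second kind of decoding to the response body before it reaches the parser.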

Is there any other workaround? I have the same problem with ISO-8859-1 and can't use the request module.

jeromew commented Dec 5, 2016

Since the encoding= attribute is not used by sax-js, you can always try to pre-transcode your file using iconv -f ISO-8859-1 -t UTF-8. I don't know if this has some
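
For cases where running iconv is not convenient, a rough Node equivalent of the pre-transcoding step might look like the sketch below. It assumes the input really is ISO-8859-1. The helper names `latin1ToUtf8` and `transcodeFile` and their path arguments are hypothetical.

```javascript
const fs = require('fs');

// Hypothetical helper: re-encode ISO-8859-1 bytes as UTF-8 bytes.
function latin1ToUtf8(buf) {
  // Decode each byte as its latin1 code point, then encode as UTF-8.
  return Buffer.from(buf.toString('latin1'), 'utf8');
}

// Hypothetical helper: transcode a whole file, similar in spirit to
// running iconv -f ISO-8859-1 -t UTF-8 on it. Paths are placeholders.
function transcodeFile(srcPath, dstPath) {
  fs.writeFileSync(dstPath, latin1ToUtf8(fs.readFileSync(srcPath)));
}
```

Note that the XML declaration in the output still names ISO-8859-1. That does not matter to a parser that ignores encoding=, but other tools may read it.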
