
handle encoding="..." of xml-processing instruction #56

Open
hgoebl opened this issue Dec 20, 2011 · 5 comments

Comments

@hgoebl

hgoebl commented Dec 20, 2011

Hi,

If I understand the code correctly, the encoding declared in the first line of an XML document (the XML declaration) is not respected.
For XML files that are not UTF-8 encoded, this can be a problem.

I tried a small patch, but it is a very naive solution and doesn't actually work: https://gist.github.com/1503453

IMO the problem is that the parser consumes chunks, and by the time it reads the declaration the buffer has already been converted to a String.
Changing the encoding at that point would only take effect once new chunks arrive, and only in the streaming parser.
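
A minimal sketch of one way around the chunk problem, assuming Node.js and a document in an ASCII-compatible encoding. The idea is to hold back the raw bytes until the XML declaration has arrived, pick the decoder from its encoding label, and only then turn bytes into text. `XmlChunkDecoder` is a hypothetical helper, not part of sax-js, and it only handles UTF-8 and ISO-8859-1.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical helper, not part of sax-js: converts raw byte chunks to text, but
// only after the encoding named in the XML declaration is known.
// Assumes an ASCII-compatible document; only UTF-8 and ISO-8859-1 are handled.
class XmlChunkDecoder {
  constructor() {
    this.pending = Buffer.alloc(0); // raw bytes held back until the encoding is known
    this.decoder = null;            // StringDecoder chosen from the declaration
  }

  // Map an XML encoding label onto a Node.js encoding name.
  static nodeEncoding(label) {
    const l = label.toLowerCase();
    if (l === 'utf-8' || l === 'utf8') return 'utf8';
    if (l === 'iso-8859-1' || l === 'latin1') return 'latin1'; // byte value == code point
    throw new Error('Unsupported XML encoding: ' + label);
  }

  // Returns the text decoded so far ('' while still waiting for the declaration).
  write(chunk) {
    if (this.decoder) return this.decoder.write(chunk);
    this.pending = Buffer.concat([this.pending, chunk]);
    // The first '>' closes the XML declaration (or the first tag if there is none).
    if (this.pending.indexOf(0x3e) === -1 && this.pending.length < 1024) return '';
    // The declaration itself is ASCII, so a latin1 view of the bytes is safe to inspect.
    const head = this.pending.toString('latin1');
    const m = /^(?:\xEF\xBB\xBF)?<\?xml[^>]*?\bencoding\s*=\s*["']([^"']+)["']/.exec(head);
    // XML 1.0 falls back to UTF-8 when no encoding is declared (UTF-16 is ignored here).
    this.decoder = new StringDecoder(XmlChunkDecoder.nodeEncoding(m ? m[1] : 'utf-8'));
    const text = this.decoder.write(this.pending);
    this.pending = null;
    return text;
  }

  // Flush any incomplete multi-byte sequence at the end of the input.
  end() {
    if (!this.decoder) {
      // Input ended before the first '>': fall back to UTF-8.
      this.decoder = new StringDecoder('utf8');
      return this.decoder.write(this.pending) + this.decoder.end();
    }
    return this.decoder.end();
  }
}

// Usage: an ISO-8859-1 document arriving in arbitrary 7-byte chunks.
const bytes = Buffer.concat([
  Buffer.from('<?xml version="1.0" encoding="ISO-8859-1"?><name>', 'latin1'),
  Buffer.from([0x4d, 0xfc, 0x6c, 0x6c, 0x65, 0x72]), // "Müller" in ISO-8859-1
  Buffer.from('</name>', 'latin1'),
]);
const dec = new XmlChunkDecoder();
let out = '';
for (let i = 0; i < bytes.length; i += 7) out += dec.write(bytes.subarray(i, i + 7));
out += dec.end();
console.log(out); // <?xml version="1.0" encoding="ISO-8859-1"?><name>Müller</name>
```

Each string returned by `write()` could then be handed to a sax-js parser's `write()`, so the parser never sees wrongly decoded text. Using `StringDecoder` rather than `buf.toString()` per chunk also keeps UTF-8 characters intact when a multi-byte sequence is split across two chunks.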

I'm afraid I lack deep knowledge of streaming, piping and the architecture of the parser.
But I could help by testing with some unusual German XML documents ;-)

--Heinrich

@mborho

mborho commented Dec 29, 2011

Same problem here. Unfortunately, it is not usable for ISO-8859-1-encoded XML.

Update:

If you use the request module to load the XML, you can call "request({uri:url, encoding:'binary'}...", which works for ISO-8859-1.

@ghost

ghost commented Jan 21, 2012

Same problem here. @mborho, I'm not sure I understand your workaround.

Thanks

@mborho

mborho commented Jan 21, 2012

@Kalise If you load your ISO-8859-1-encoded XML with the request module (https://github.com/mikeal/request), you can pass encoding="binary" as an option to "request", and the loaded XML string will then be parsed correctly by sax-js.
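
Why encoding="binary" helps, sketched with plain Node.js Buffers (no request module involved). In Node.js, 'binary' is a legacy alias for 'latin1', which maps every byte to the code point with the same value; that is exactly ISO-8859-1. Decoding the same bytes as UTF-8, which is what happens when no encoding is given, turns the non-ASCII bytes into U+FFFD replacement characters. The sample word is made up for illustration.

```javascript
// Bytes of "Grüße" as stored in an ISO-8859-1 file (ü = 0xFC, ß = 0xDF).
const latin1Bytes = Buffer.from([0x47, 0x72, 0xfc, 0xdf, 0x65]);

// Decoding as UTF-8: 0xFC and 0xDF are not valid UTF-8 here, so they become U+FFFD.
const asUtf8 = latin1Bytes.toString('utf8');

// 'binary' (alias of 'latin1'): each byte becomes the code point of the same value.
const asBinary = latin1Bytes.toString('binary');

console.log(asUtf8);   // contains U+FFFD replacement characters
console.log(asBinary); // Grüße
```

This only works because ISO-8859-1 happens to coincide with Node's byte-to-code-point mapping; for other single-byte encodings (e.g. windows-1252 in the 0x80-0x9F range) 'binary' would give wrong characters.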

@DoomyTheFroomy

Is there any other workaround? I have the same problem with ISO-8859-1 and can't use the request module.

@jeromew

jeromew commented Dec 5, 2016

Since sax-js does not use the encoding= attribute, you can always pre-transcode your file with iconv -f ISO-8859-1 -t UTF-8. I don't know if this has some
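
For anyone who cannot use the request module, the same pre-transcoding can also be done in plain Node.js. This is a sketch that assumes the input really is ISO-8859-1; `transcodeLatin1FileToUtf8` is a hypothetical helper, and the temporary file paths are only for the demo. Unlike iconv, which only converts bytes, it also rewrites the encoding pseudo-attribute so the declaration still matches the file contents.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: a pure-Node stand-in for `iconv -f ISO-8859-1 -t UTF-8`.
function transcodeLatin1FileToUtf8(src, dest) {
  // 'latin1' maps every byte 0x00-0xFF to U+0000-U+00FF, i.e. ISO-8859-1.
  const text = fs.readFileSync(src).toString('latin1');
  // Keep the XML declaration truthful after transcoding (assumption: only the
  // encoding pseudo-attribute needs updating).
  const fixed = text.replace(/(<\?xml[^>]*?\bencoding\s*=\s*["'])[^"']*(["'])/, '$1UTF-8$2');
  fs.writeFileSync(dest, fixed, 'utf8');
  return fixed;
}

// Usage with temporary files.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'xml-'));
const src = path.join(dir, 'in.xml');
const dest = path.join(dir, 'out.xml');
fs.writeFileSync(src, Buffer.concat([
  Buffer.from('<?xml version="1.0" encoding="ISO-8859-1"?><city>', 'latin1'),
  Buffer.from([0x4d, 0xfc, 0x6e, 0x63, 0x68, 0x65, 0x6e]), // "München" in ISO-8859-1
  Buffer.from('</city>', 'latin1'),
]));
transcodeLatin1FileToUtf8(src, dest);
console.log(fs.readFileSync(dest, 'utf8')); // <?xml version="1.0" encoding="UTF-8"?><city>München</city>
```

The resulting file is plain UTF-8, which is what sax-js effectively assumes, so it can be parsed without any encoding handling on the sax-js side.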

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants