UTF-8 as default for file sources #10

Merged
merged 4 commits into from
Oct 6, 2014

Conversation

@jlous jlous commented Sep 26, 2014

XML.read() uses the system default encoding (ignoring what the file itself says).

The correct approach would be to look for an encoding declaration in the XML header, fall back to any BOM-specified encoding if there is none, and finally default to UTF-8 if there is no BOM either.
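The lookup order described above could be sketched roughly as follows. This is only an illustration, not code from this pull request: the class `XmlCharsetSniffer` and its `detect` method are hypothetical names, and the sketch assumes the XML declaration can only be read when the document uses an ASCII-compatible encoding (UTF-16 input is recognized by its BOM alone).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of eaxy.
final class XmlCharsetSniffer {
    // Matches the encoding pseudo-attribute of a leading <?xml ...?> declaration.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml[^>]*\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Guesses the charset from the first bytes of an XML document. */
    static Charset detect(byte[] head) {
        boolean utf8Bom = startsWith(head, 0xEF, 0xBB, 0xBF);
        int offset = utf8Bom ? 3 : 0;

        // 1. Encoding named in the XML declaration. The declaration itself is
        //    ASCII, so ISO-8859-1 decodes it byte for byte.
        String prefix = new String(head, offset,
                Math.min(head.length - offset, 200), StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(prefix);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }

        // 2. Encoding implied by a byte order mark.
        if (utf8Bom) return StandardCharsets.UTF_8;
        if (startsWith(head, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(head, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;

        // 3. Nothing found: default to UTF-8.
        return StandardCharsets.UTF_8;
    }

    private static boolean startsWith(byte[] data, int... expected) {
        if (data.length < expected.length) return false;
        for (int i = 0; i < expected.length; i++) {
            if ((data[i] & 0xFF) != expected[i]) return false;
        }
        return true;
    }
}
```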

This quick fix still ignores what the file says, but defaults straight to UTF-8 and is at least slightly less likely to cause problems.
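For readers unfamiliar with the difference, a minimal sketch of reading a file with an explicit UTF-8 charset instead of the platform default is shown here. It uses only JDK classes; `Utf8FileSource` and `readUtf8` are hypothetical names and this is not the actual change made to `XML.read()`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of eaxy.
final class Utf8FileSource {
    /** Reads the whole file as UTF-8, regardless of the system default charset. */
    static String readUtf8(Path file) throws IOException {
        // new String(bytes) without a charset argument would use the platform
        // default; naming the charset makes the result machine-independent.
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```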

@jhannes
Owner

jhannes commented Sep 29, 2014

If I understand the tests correctly, only one of these test methods is relevant to eaxy as used in this context. If you agree, remove the other tests and I'll merge the pull request.

@jlous
Author

jlous commented Oct 1, 2014

The challenge was to make a test suite that would fail if either
a) the fix was gone but the host stack happened to map UTF-8 correctly, or
b) the hack for altering the system default charset in the test proved ineffective.

I think I managed to wrap it all into a single sane test case now, albeit with two asserts.
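The two-assert shape could look something like the sketch below. It is not the test from this pull request: it does not alter the system default charset, and instead uses ISO-8859-1 as a stand-in for a non-UTF-8 default. The names `Utf8DefaultCheck`, `decode` and `check` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical check, plain Java so it runs without a test framework.
final class Utf8DefaultCheck {
    // Contains non-ASCII characters so that a wrong charset is detectable.
    static final String TEXT = "<name>bl\u00e5b\u00e6r</name>";

    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    static void check() {
        byte[] utf8 = TEXT.getBytes(StandardCharsets.UTF_8);

        // Assert 1 (guard): a non-UTF-8 charset must garble the text,
        // otherwise the second assert could pass by accident.
        if (decode(utf8, StandardCharsets.ISO_8859_1).equals(TEXT)) {
            throw new AssertionError("guard charset did not garble the text");
        }

        // Assert 2: decoding as UTF-8 restores the original text.
        if (!decode(utf8, StandardCharsets.UTF_8).equals(TEXT)) {
            throw new AssertionError("UTF-8 decoding did not round-trip");
        }
    }
}
```

The first assert plays the role of case b) above: if the simulated wrong charset happened to decode the sample correctly, the check would prove nothing.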

jhannes added a commit that referenced this pull request Oct 6, 2014
UTF-8 as default for file sources
@jhannes jhannes merged commit 887c6f7 into jhannes:master Oct 6, 2014
2 participants