Odd conversion from UTF-8 to ISO-8859-1 #33
Comments
Test out some solutions for conversion UTF8/Latin1. Possible solution(s) for gggeek#33
I've started playing with some possible solutions (I referenced the commit to this issue). @gggeek Any thoughts or concerns on my WIP stuff?
Thanks for providing test cases. Just to be sure that I understand the situation correctly and can reproduce the problem:
I ask because a few fixes were indeed made to the code in the area of charset handling, to make sure that some corner cases are properly handled, and I would expect the lib to actually fare better...
The problem is when you have a UTF-8 file with
Well, the charset declaration or the BOM is not the issue, as there are no "garbled" characters. I ran into the issue when trying to send XML with the Client. It seems to HTML-entity-encode characters that do not need encoding. The CharsetFixTest file is my attempt at writing a fix (with the code in the test file) and should be removed before being merged.
Hi Guys,
Sorry, I have been too busy to work on this. I will do my best to find enough time the upcoming weekend.
@DianonForce the simplest workaround seems to me to set
Fixed in version 4.0.1.
Thanks @gggeek I'll check my use case on Tuesday. :)
Compatible ISO-8859-1 characters like åäö are converted into numeric HTML entities when they don't need to be...
In my opinion, only characters that are unsupported by the target charset or that clash with XML syntax, e.g. <, > and others, should be converted into numeric HTML entities.
If I remember correctly, this was not the case in version 3.0.1 of this library.
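To make the requested behaviour concrete, here is a minimal sketch in Python (not the library's own PHP, and not its actual code). The helper name `escape_for_xml` and its parameters are hypothetical. It assumes the rule described above: escape only XML-reserved characters, and fall back to numeric entities only for characters the target charset cannot represent.

```python
def escape_for_xml(text: str, target_charset: str = "iso-8859-1") -> bytes:
    """Encode text for an XML payload in target_charset (hypothetical helper).

    Only XML-reserved characters and characters that the target charset
    cannot represent are replaced with entities.
    """
    # Escape XML-reserved characters. '&' must be handled first so that
    # the entities produced here are not escaped a second time.
    escaped = (
        text.replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
    )
    # 'xmlcharrefreplace' emits &#NNN; only for characters that cannot be
    # encoded in target_charset; everything else is encoded directly.
    return escaped.encode(target_charset, errors="xmlcharrefreplace")


print(escape_for_xml("åäö"))        # å, ä, ö fit in ISO-8859-1, so no entities
print(escape_for_xml("a < b"))      # '<' clashes with XML, so it is escaped
print(escape_for_xml("price: €"))   # '€' is not in ISO-8859-1, so it becomes an entity
```

With this approach, åäö pass through as single ISO-8859-1 bytes, while the same input encoded to a narrower charset such as ASCII would fall back to numeric entities.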