Odd conversion from UTF-8 to ISO-8859-1 #33

Closed
JoakimLofgren opened this issue Feb 25, 2016 · 8 comments

Comments

@JoakimLofgren

Compatible ISO-8859-1 characters like åäö are converted into numeric HTML entities when they don't need to be...

In my opinion, only characters that are unsupported by the target charset or that clash with XML syntax, e.g. < and >, should be converted into numeric HTML entities.

If I remember correctly, this was not the case in version 3.0.1 of this library.
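
A minimal sketch of the behaviour requested above, assuming valid UTF-8 input and the mbstring extension. The function name `utf8ToLatin1Xml` is hypothetical and is not part of phpxmlrpc; this is an illustration of the idea, not the library's implementation.

```php
<?php
// Hypothetical helper (not a phpxmlrpc API): encode a UTF-8 string for use
// inside an ISO-8859-1 XML payload.
// - XML-significant characters are escaped.
// - Characters that exist in ISO-8859-1 (code points up to 0xFF) are emitted
//   as single raw bytes.
// - Everything else becomes a numeric character reference.
// Assumes $utf8 is valid UTF-8; preg_split() returns false otherwise.
function utf8ToLatin1Xml(string $utf8): string
{
    $escapes = ['&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;'];
    $out = '';
    foreach (preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $code = mb_ord($char, 'UTF-8');
        if (isset($escapes[$char])) {
            $out .= $escapes[$char];        // clashes with XML syntax
        } elseif ($code <= 0xFF) {
            $out .= chr($code);             // representable in ISO-8859-1
        } else {
            $out .= '&#' . $code . ';';     // not representable: use an entity
        }
    }
    return $out;
}

echo bin2hex(utf8ToLatin1Xml('ö')), "\n"; // prints "f6"
```

With this approach 'ö' goes out as the single byte 0xF6 rather than as `&#246;`, while a character such as '€', which ISO-8859-1 lacks, still becomes an entity.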

JoakimLofgren added a commit to JoakimLofgren/phpxmlrpc that referenced this issue Feb 25, 2016
Test out some solutions for convertion UTF8/Latin1

Possible solution(s) for gggeek#33
@JoakimLofgren
Author

I've started playing with some possible solutions (I referenced the commit to this issue).

@gggeek Any thoughts or concerns on my WIP stuff?

@gggeek
Owner

gggeek commented Feb 28, 2016

Thanks for providing test cases.

Just to be sure that I understand correctly the situation and can reproduce the problem:

  • does the problem arise when the lib receives latin-1 chars from external sources and passes utf8 to the app, or vice versa, or both ways? Is there a charset declaration in the xml prolog and/or a BOM involved?
  • you are providing test code; did you investigate the root of the problem?
  • are the 2 test classes in your code alternatives to do the exact same thing?

I ask because a few fixes were indeed made to the code in the area of charset handling, to make sure that some corner cases are properly handled, and I would expect the lib to actually fare better...

@JoakimLofgren
Author

The problem is when you have a UTF-8 file with å and then you try to send it as ISO-8859-1 (e.g. it goes into this case in the Charset file).

Well the charset declaration or the BOM is not the issue as there are no "garbled" characters.

I ran into the issue when trying to send XML with the Client. It seems that it HTML-entity-encodes characters that do not need to be encoded.

The CharsetFixTest file is my attempt at writing a fix (with the code in the test file) and should be removed before being merged.

@DianonForce

Hi Guys,
is there a possible solution or a workaround for this problem?
My problem is that I have to connect to an xmlrpc-server which needs to get user information from me. Some idiot decided that my username contains 'ö' instead of 'oe', so the 'ö' is converted into '&#246;', which gives me a 'no such user' error as the response from the xmlrpc-server.

@gggeek
Owner

gggeek commented Mar 16, 2016

Sorry, I have been too busy to work on this. I will do my best to find enough time the upcoming weekend

@gggeek
Owner

gggeek commented Mar 26, 2016

@DianonForce the simplest workaround seems to me to set
$client->request_charset_encoding = 'UTF-8';
Did you try it?

gggeek added a commit that referenced this issue Mar 27, 2016
@gggeek
Owner

gggeek commented Mar 27, 2016

Fixed in version 4.0.1.
Thanks @JoakimLofgren, I incorporated your test case, albeit slightly reduced.
As for version 3.0.1, I checked the code for character encoding conversion and it was the same as it is in version 4, so if this was not happening before, it must have been a long time ago.

@gggeek gggeek closed this as completed Mar 27, 2016
@JoakimLofgren
Author

Thanks @gggeek

I'll check my use case on Tuesday. :)
