[HTTP Binding] Encoding bug with umlauts in website #5514
Here's my item:
And this is the input HTML:
In regard to the version, 1.11.0 is the same as the current snapshot. No changes have been made. This is beginning to smell like a UI problem. What browser are you using? Also, if you check the events.log, you should see the value update:
Here's the byte view:
I tried it with the iOS OpenHAB App, Safari on macOS, and Chromium and Firefox on Windows. The behavior is the same in every browser. In the events.log I get this message:
Most unixes (unii?) allow something like
Also, I went back and checked your original posting of the website code you're working with, and the character does appear to be encoded in UTF-8, not ISO-8859-1:
So this should be the dump of the Anne item:
So this is part of the website. It should say "Wetterübersicht".
GitHub could also change the encoding to UTF-8 when you paste them.
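For reference, this is how the two candidate encodings represent the same character, which is what a byte view or hex dump lets you tell apart: in ISO-8859-1 "ü" is the single byte 0xFC, while UTF-8 uses the two-byte sequence 0xC3 0xBC. The class name `UmlautBytes` and its `hex` helper are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class UmlautBytes {
    // Formats bytes as space-separated, two-digit lowercase hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String u = "\u00fc"; // LATIN SMALL LETTER U WITH DIAERESIS
        System.out.println("UTF-8:      " + hex(u.getBytes(StandardCharsets.UTF_8)));      // c3 bc
        System.out.println("ISO-8859-1: " + hex(u.getBytes(StandardCharsets.ISO_8859_1))); // fc
    }
}
```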
This:
actually matches the output you had posted in the forum:
0xef = LATIN SMALL LETTER I WITH DIAERESIS
So it looks to me like your input file has the u diaeresis correctly encoded, but not the a diaeresis.
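One possible (unconfirmed) way a stray 0xEF can appear: if ISO-8859-1 bytes are decoded as UTF-8, the lone byte 0xE4 for "ä" is invalid UTF-8 and gets replaced by U+FFFD, and U+FFFD re-encoded as UTF-8 is EF BF BD, whose first byte read as ISO-8859-1 is "ï". The minimal sketch below (hypothetical class name, JDK standard charsets only) shows that round trip; it does not prove this is what happens inside the binding.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    // Encodes text as ISO-8859-1, then (wrongly) decodes those bytes as UTF-8.
    static String misdecode(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        // new String(byte[], Charset) replaces malformed input with U+FFFD.
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // In ISO-8859-1 the word is 4d e4 72 7a. 0xE4 announces a 3-byte UTF-8
        // sequence, but 0x72 ('r') is not a continuation byte, so 0xE4 becomes U+FFFD.
        String garbled = misdecode("M\u00e4rz");
        System.out.println(garbled.equals("M\uFFFDrz")); // true

        // U+FFFD encoded as UTF-8 is ef bf bd; 0xEF alone, read as ISO-8859-1, is 'i' with diaeresis.
        byte[] utf8 = garbled.getBytes(StandardCharsets.UTF_8);
        System.out.printf("%02x %02x %02x%n", utf8[1] & 0xff, utf8[2] & 0xff, utf8[3] & 0xff); // ef bf bd
    }
}
```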
But I also tried your website, which works for you.
The only other thing I can think of for you to try at this point would be to get a DEBUG log or a network sniff and see if you can get a look at the data as it comes in off the wire.
The debug log isn't really helpful:
This is the packet that comes from the site www.staor.net (IP: 45.55.67.26) when fetching the item Anne.
So now I also captured the traffic from my website with the date I parse in OpenHAB. Charset table:
Here's an idea. I created a new file that contains only the name of the month and nothing else.
If the output is different, then the problem is in the transform. If the output is still the same, my next suggestion will be to put the HTTP transport (org.openhab.io.net.http) into trace mode, rerun, and recheck the logs.
This Burgundy item won't work because the HTTP binding needs a transformation.
This is the log when I set the http binding and the http transport to trace mode:
Well, that's unfortunate. I mean, it makes sense, because in almost all cases you wouldn't want the entire contents of the URL you're pulling. You could try a regex transform like:
Maybe I can find a way to write an identity transform that just passes the content through unchanged...
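A minimal sketch of the identity idea, assuming the REGEX transformation returns the first capture group of a match over the whole input (an assumption, not checked against the transformation's source): with DOTALL turned on via `(?s)`, the pattern `(?s)(.*)` captures everything, newlines included. In an item definition this would presumably be written as `REGEX((?s)(.*))`, which is untested. The class name `IdentityRegex` is hypothetical; the code uses plain `java.util.regex`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdentityRegex {
    // (?s) enables DOTALL so '.' also matches line terminators; the single
    // capture group therefore spans the whole input, newlines included.
    private static final Pattern PASS_THROUGH = Pattern.compile("(?s)(.*)");

    // Returns capture group 1 of a whole-input match, or the input unchanged
    // if the pattern does not match (it always matches for this pattern).
    static String apply(String content) {
        Matcher m = PASS_THROUGH.matcher(content);
        return m.matches() ? m.group(1) : content;
    }

    public static void main(String[] args) {
        String page = "<html>\n<body>Wetter\u00fcbersicht</body>\n</html>";
        System.out.println(apply(page).equals(page)); // true
    }
}
```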
So the output of
But I don't think the issue is in the transform, because when I enabled the trace log for the HTTP transport, the umlauts were already displayed wrong. That means the problem occurs before the transformation rule is applied.
Please post your trace log (note: attach the file; don't copy and paste it).
So now I have changed the org.ops4j.pax.logging.cfg file to output the http logs in a separate file.
Here we go:
Unfortunately, this didn't provide any additional info. It shows, as you said, that the binding receives the data incorrectly:
You could try going a level deeper and getting the traces from the Apache HTTP class.
So I enabled trace logging for this class, but there isn't any additional output. I only get some messages for other bindings, which aren't relevant here. In any case, what comes off the wire should already be clear, because I already made a Wireshark sniff.
One last idea, and then I'm out of ideas. You could try forcing the system's file encoding as described in the last post in this thread. FWIW, if the Apache logging had been set correctly, there would have been a huge amount of additional output, along these lines:
So I tried to force OpenHAB to use the encoding with this command:
I had a lot of these debugging lines, but they were all from the Fritzbox Binding, not from the HTTP Binding.
@kaikreuzer Could you give us a hint on how to solve this issue?
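To confirm whether a `-Dfile.encoding=...` flag actually took effect, a tiny check like the one below (hypothetical class name) prints what the JVM ended up with. `Charset.defaultCharset()` is what charset-less calls such as `new String(byte[])` fall back to. Running it once as `java -Dfile.encoding=UTF-8 ShowDefaultCharset` and once without the flag shows the difference; note that from Java 18 (JEP 400) the default is UTF-8 unless configured otherwise.

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {
    public static void main(String[] args) {
        // The file.encoding system property: whatever -Dfile.encoding=... set,
        // otherwise the value the JVM chose at startup.
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        // The charset that charset-less APIs such as new String(byte[]) use.
        System.out.println("defaultCharset = " + Charset.defaultCharset());
    }
}
```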
I don't have any other hints than @9037568 already provided.
What can we do now?
I did some additional digging. It appears that this line in the HttpUtil class, which is the class that the HTTP Binding uses to execute its fetches, converts its inputs using the platform encoding:
I'll see if I can make some changes to the HttpUtil class and give you a test jar...
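To illustrate the kind of change being discussed (this is not the actual HttpUtil patch, whose details aren't shown here): decoding response bytes with the platform default charset gives different results on different machines, whereas decoding with the charset the server declares does not. The helper `charsetFrom` and the class `BodyDecoding` are hypothetical. The helper reads the `charset` parameter of a Content-Type header and falls back to ISO-8859-1, the default for `text/*` types in RFC 2616 (later dropped by RFC 7231). It ignores charsets declared only in an HTML `<meta>` tag.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class BodyDecoding {
    // Hypothetical helper: returns the charset named in a Content-Type header
    // (e.g. "text/html; charset=ISO-8859-1"), or ISO-8859-1 if none is usable.
    static Charset charsetFrom(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        if (Charset.isSupported(name)) {
                            return Charset.forName(name);
                        }
                    } catch (IllegalCharsetNameException e) {
                        // Malformed name: fall through to the default below.
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        // Simulated response body of an ISO-8859-1 page.
        byte[] body = "11. M\u00e4rz 2018".getBytes(StandardCharsets.ISO_8859_1);

        // Platform-dependent: correct only if the default charset maps 0xE4 to 'a' with
        // diaeresis (e.g. ISO-8859-1 or windows-1252); on a UTF-8 JVM it yields U+FFFD.
        String viaPlatform = new String(body);
        // Platform-independent: uses the charset the server declared.
        String viaHeader = new String(body, charsetFrom("text/html; charset=ISO-8859-1"));

        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("platform decode correct? " + viaPlatform.equals("11. M\u00e4rz 2018"));
        System.out.println("header decode correct?   " + viaHeader.equals("11. M\u00e4rz 2018")); // true
    }
}
```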
Update: I've made changes, but I'm having trouble getting them installed into a running version of OH for testing, apparently due to the recent QuantityType changes. I fear I may be forced to install 2.2 now...
If you need help, I have OpenHAB 2.2 and could install a second instance for testing purposes.
If you're feeling adventurous, you can try installing
Note that this is a .jar file that has been renamed to .zip to force stupid GitHub to accept it. If you can get it installed in a working state, and set the logging up correctly, you should see the following debug output:
Note that the package for debug logging here should be
The uninstallation hasn't really worked:
After placing the jar file into the folder /usr/share/openhab2/addons, I got the following OpenHAB log entry:
Now it produces a lot of log entries:
This is the error I was getting as well, which would appear to mean we're stuck until there's a 2.3 release.
Doesn't it work on the latest snapshots, which would presumably be 2.3? Is there already a planned release date for 2.3?
Indeed, you're correct. There's a 2.3 snapshot distro on CloudBees.
So now I tried the latest snapshot, but I get the following error message:
Oops. Fixed that, I think. Here's a new jar with a little more debugging added.
Now I only get this log message:
But the items are shown in the sitemap as "M�rz".
The binding does not support umlauts on an HTML site with the encoding ISO-8859-1.
When I print out the value saved in a item, I get the following output:
11. M�rz 2018
This is my item:
String Temp_Date "Datum [%s]" <calendar> { http="<[datenlogger:3000:REGEX((?s).*Datum.*([1-3 ][0-9]. [A-zöäü]+ [0-9]{4})..font.*)]" }
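As a sanity check that the item's REGEX is fine once the text is decoded correctly, the sketch below runs the same pattern through `java.util.regex` on a hypothetical HTML fragment shaped like the page (a date followed by a closing `</font>` tag) and takes capture group 1, which we assume mirrors what the REGEX transformation returns. The class name is made up, and the umlauts are written as Unicode escapes so the example itself doesn't depend on source-file encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DatePatternCheck {
    // The REGEX from the item definition above, with o, a and u with diaeresis
    // written as Unicode escapes.
    static final Pattern DATE = Pattern.compile(
            "(?s).*Datum.*([1-3 ][0-9]. [A-z\u00f6\u00e4\u00fc]+ [0-9]{4})..font.*");

    // Returns capture group 1 if the whole input matches, otherwise null.
    static String extract(String html) {
        Matcher m = DATE.matcher(html);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical fragment: the date is followed by "</font>", so ".." matches "</".
        String html = "<b>Datum</b><font>11. M\u00e4rz 2018</font>";
        System.out.println("11. M\u00e4rz 2018".equals(extract(html))); // true
    }
}
```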
Website code:
Meta tags from the website:
We already addressed this issue in this OpenHAB Community thread: https://community.openhab.org/t/http-binding-problem-with-umlauts/41355
I'm using OpenHAB 2.2 Release on a Debian 8 environment.