Saved a Word 2016 document as filtered html. Saved as UTF-8 also tried Latin 1.
Using the following tidy config:
anchor-as-name: yes
clean: YES
bare: YES
drop-proprietary-attributes: YES
word-2000: No
wrap: 144
vertical-space: yes
input-encoding: latin1 (tried 1252 and utf8 as well)
output-encoding: utf8
In every case, it would not convert the Word html ö, ñ, and various other accented characters typically used in German and Spanish into anything other than garbage or a boxed question mark. I've tried variations of input-encoding, output-encoding, clean, and bare without improvement.
Please advise
@pdo2641 thanks for the issue, but I am not sure exactly what you want... for sure I am no expert on character encoding issues, but I have picked up a few things along the way...
You have used an o umlaut, ö, 0xf6, 246, and ñ, 0xf1, 241, and if I use a config --input-encoding latin1 --output-encoding utf8, those two will be converted to utf-8, namely 0xc3 0xb6 and 0xc3 0xb1, resp., and in a browser they are again correctly displayed as ö and ñ, so where is the problem?
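To make those byte values easy to check, here is a small Python sketch of the same latin1 to utf-8 conversion. It is not tidy's code, just an illustration; the helper name `latin1_to_utf8` is made up for this example.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: read the bytes as Latin-1 text, then encode as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


def as_hex(data: bytes) -> str:
    """Format bytes like '0xc3 0xb6' so they can be compared with the values above."""
    return " ".join(f"0x{b:02x}" for b in data)


# 0xf6 is o umlaut and 0xf1 is n tilde in Latin-1.
for byte in (0xF6, 0xF1):
    converted = latin1_to_utf8(bytes([byte]))
    print(as_hex(bytes([byte])), "->", as_hex(converted))
```

Each single latin1 byte becomes a two-byte utf-8 sequence, which is what a browser or editor reading the output as utf-8 expects.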
Sure, in my code page 437 console, they are only shown as ?, or sort of garbage - sequence of high bit characters - since my console does not support utf-8, even if I run chcp 65001. But they are correctly displayed in good editors, and browsers, even very dumb notepad, as the character they are...
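The console effect can be reproduced with a short Python sketch, assuming the viewer simply decodes the utf-8 bytes with the wrong code page. The function name `seen_through` is hypothetical, and real consoles may behave differently in detail.

```python
def seen_through(text: str, display_encoding: str) -> str:
    """Hypothetical helper: encode text as UTF-8, then decode it the way a
    viewer using display_encoding would interpret those same bytes."""
    utf8_bytes = text.encode("utf-8")
    return utf8_bytes.decode(display_encoding, errors="replace")


# U+00F6 is o umlaut, U+00F1 is n tilde.
for ch in ("\u00f6", "\u00f1"):
    garbled = seen_through(ch, "cp437")
    print(f"U+{ord(ch):04X}: 1 character stored, {len(garbled)} shown by a cp437 viewer")
    print("  intact when read as utf-8:", seen_through(ch, "utf-8") == ch)
```

The bytes in the file are the same in both cases; only the viewer's assumption about the encoding changes what appears on screen.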
This seems to have nothing to do with a Word 2016 document, or Word filtered html... especially since your config shows word-2000: No... This would seem true for any html containing latin1 characters... like the following French accented characters -
Processing this with the above config, --input-encoding latin1 --output-encoding utf8, the output document will be displayed the same in a browser, but each character has been converted to utf-8. The latin1 Ç, 0xC7, has been converted to utf-8, 0xC3 0x87, and so on for each of the others...
Maybe I am misunderstanding something here... please explain more... thanks...