Umlauts/special characters not converted to correct html entities #936
Let's say I have a very easy HTML file with this content:
When saving this sample HTML code as an ANSI-encoded file with Notepad++, my browser also displays these characters correctly.
But how do I tell Tidy to convert that to
Within the browser, the characters then are displayed like this:
Do I have to use win1252 character encoding for all my HTML files?
@Feathered-Serpent, thank you for the issue... but what is it exactly?
It seems what we have here is a mixture of -
For the first 128 characters, that is, code points
Some brief history...
Sorry to bore you, if you already know all this...
Computers, being bit and byte machines, adopted latin1, sometimes referred to as extended ASCII, early in their development, and it got widespread use on the fledgling internet... which meant basically only western European languages could be displayed... just 256 chars... 1-byte... ugh!
With the advent of the so called
History over... back to the issue at hand ;=))
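To make the single-byte versus multi-byte difference concrete, here is a minimal Python sketch. It is an illustration only: the string "äöü" is a stand-in sample rather than the text from the original report, and `try_decode` is a hypothetical helper, not part of Tidy.

```python
# Illustration only: "äöü" is a stand-in sample, not the text from the issue.
sample = "äöü"

cp1252_bytes = sample.encode("cp1252")  # one byte per character
utf8_bytes = sample.encode("utf-8")     # two bytes per character for these three


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short note if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"<invalid {encoding}: {exc.reason}>"


print(cp1252_bytes.hex(" "))  # e4 f6 fc
print(utf8_bytes.hex(" "))    # c3 a4 c3 b6 c3 bc

# Hi-bit single bytes are not a valid UTF-8 sequence on their own.
print(try_decode(cp1252_bytes, "utf-8"))

# UTF-8 bytes read as win1252 decode without error, but as the wrong characters.
print(try_decode(utf8_bytes, "cp1252"))  # Ã¤Ã¶Ã¼
```

The last two lines show both failure modes: a decoder that expects UTF-8 rejects the lone hi-bit bytes, while a decoder that expects win1252 accepts UTF-8 bytes but shows two wrong characters for each intended one.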
When you saved the files as ASCII, it converted the first character,
So, to answer your last question,
Tidy does not have an option to convert characters to known
Tidy will preserve valid
Due to the unknowns introduced by
Have I missed some point here? If yes, please explain... thanks...
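If named entities are wanted in the output regardless, one possible approach is a post-processing step outside Tidy. The sketch below uses only Python's standard `html.entities` table. The function name `to_named_entities` is made up for this illustration, and it assumes the text has already been decoded correctly into a Python string.

```python
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Replace non-ASCII characters with HTML entities.

    Uses a named entity where the HTML 4 table has one, and falls back to a
    numeric reference otherwise. ASCII characters, including & and <, are
    passed through untouched, so this is meant for already valid markup.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


print(to_named_entities("Größe: 5 €"))  # Gr&ouml;&szlig;e: 5 &euro;
```

This only works once the input encoding is known, which is the same question as above: the bytes have to be decoded as win1252 or as UTF-8 before any character can be mapped to its entity.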
At this moment I cannot see a problem in
Thank you for your detailed explanation. Though one thing is wrong, as when I use the option --char-encoding win1252, then the example in my starter is converted to this:
@Feathered-Serpent, thank you for the further testing, and feedback...
Wonders never cease! ;=))
Using my in_936-1.html sample... with one paragraph of 9 entities, and then a paragraph with 9 hi-bit, single byte, chars...
As you point out, with
I think the UTF-8 encoding, reported by
Loading the output in Notepad++, it cannot display these 9 chars - it displays an open square instead, and suggests the file is