ISO-8859-1 Files and Umlauts #60

Closed
macrauder opened this Issue Sep 21, 2011 · 2 comments

Hi,

I have ISO-8859-1 HTML files that I am trying to scrape with getHtml.
The scraping works perfectly, but the umlauts do not come back in a usable form.
All umlauts are broken.
Is there a way to convert ISO-8859-1 HTML files to UTF-8 before the parsing step?
For testing you can use http://liveticker.toyota-handball-bundesliga.de/spiel_001512000000000000000000000000000002021.html
The string "Südmeier, Sören" is interpreted as "S�dmeier, S�ren".

Thx

chriso commented Sep 24, 2011

Currently there's no support for encodings other than those supported by Node (UTF8 or Binary). You can try setting the encoding option to binary (see this line) and then use the iconv library to interpret the ISO-8859-1 responses.
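
For later readers, here is a minimal sketch of the decoding step, independent of node.io. It assumes the raw response bytes are available as a Buffer and uses Node's built-in `latin1` Buffer encoding (available in newer Node versions, and an alias of `binary`) rather than iconv. The helper name `decodeLatin1` is made up for illustration and is not part of node.io. For charsets other than ISO-8859-1, a conversion library such as iconv would still be needed.

```javascript
// Bytes of "Südmeier, Sören" as an ISO-8859-1 server would send them.
// In ISO-8859-1, 0xFC is "ü" and 0xF6 is "ö" (one byte each).
const raw = Buffer.from([
  0x53, 0xfc, 0x64, 0x6d, 0x65, 0x69, 0x65, 0x72, 0x2c, 0x20,
  0x53, 0xf6, 0x72, 0x65, 0x6e,
]);

// Hypothetical helper (not a node.io API): map each byte to the
// Unicode code point with the same value, which is exactly what
// ISO-8859-1 decoding means.
function decodeLatin1(buf) {
  return buf.toString('latin1');
}

// Decoding the same bytes as UTF-8 is what breaks the umlauts:
// 0xFC and 0xF6 are not valid UTF-8, so each becomes U+FFFD.
const broken = raw.toString('utf8');
const fixed = decodeLatin1(raw);

console.log(broken);
console.log(fixed);
```

Once the text is a correctly decoded JavaScript string, it can be written out as UTF-8 with the usual APIs. The damage only happens when the ISO-8859-1 bytes are interpreted as UTF-8 on the way in.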

chriso closed this Sep 24, 2011

Hmm, not sure I get it. Are we supposed to get the correct characters with node.io + v0.8.8?

I am scraping ISO-8859-1 HTML and the characters are still not decoded correctly.

Thanks!
