ISO-8859-1 Files and Umlauts #60

macrauder opened this Issue Sep 21, 2011 · 2 comments


3 participants


I have ISO-8859-1 HTML files that I am trying to scrape with getHtml.
The scraping itself works perfectly, but the umlauts do not come back in a usable form.
All umlauts are broken.
Is there a way to convert ISO-8859-1 HTML files to UTF-8 before the parsing step?
For test you can use
The string Südmeier, Sören is interpreted as S�dmeier, S�ren.



chriso commented Sep 24, 2011

Currently there's no support for encodings other than those supported by Node (UTF8 or Binary). You can try setting the encoding option to binary (see this line) and then use the iconv library to interpret the ISO-8859-1 responses.
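As a rough illustration of the suggestion above, here is a minimal sketch of decoding ISO-8859-1 bytes in Node. It assumes a reasonably recent Node version and uses the built-in `Buffer` `latin1` encoding instead of iconv, which is enough for ISO-8859-1 only, because each of its bytes maps directly to the code point of the same value. `decodeLatin1` is a hypothetical helper, not part of node.io, and how the raw response is obtained from node.io is not shown here.

```javascript
// Hypothetical helper (not part of node.io): decode an ISO-8859-1 payload.
// Accepts either a Buffer of raw bytes or a "binary" string, in which each
// character holds one byte value (0-255).
function decodeLatin1(input) {
  const bytes = Buffer.isBuffer(input) ? input : Buffer.from(input, 'binary');
  // 'latin1' maps every byte 0x00-0xFF to the code point of the same value,
  // which matches the ISO-8859-1 table.
  return bytes.toString('latin1');
}

// "Südmeier, Sören" as ISO-8859-1 bytes: ü = 0xFC, ö = 0xF6.
const raw = Buffer.from([
  0x53, 0xfc, 0x64, 0x6d, 0x65, 0x69, 0x65, 0x72, 0x2c, 0x20,
  0x53, 0xf6, 0x72, 0x65, 0x6e,
]);

// Decoding the same bytes as UTF-8 reproduces the broken output from the
// report: 0xFC and 0xF6 are not valid UTF-8 and become U+FFFD.
const broken = raw.toString('utf8');
const fixed = decodeLatin1(raw);

console.log(broken);
console.log(fixed);
```

For other encodings (for example windows-1252 or ISO-8859-15) this shortcut does not apply, and a conversion library such as iconv would still be needed.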

chriso closed this Sep 24, 2011

Hmm, not sure I get it. Are we supposed to get the correct characters with v0.8.8+?

I am scraping ISO-8859-1 HTML and it fails to pick up the correct character set.

Thanks !
