Unknown encoding! #122

Closed
williamvivier opened this Issue Nov 29, 2011 · 8 comments

@williamvivier

My app needs to scrape pages from many different websites, so it isn't possible to know the charset before fetching the page and inspecting the header and/or the meta tag that declares the charset.
I use the option "encoding: null" and then response.setEncoding('binary'); or response.setEncoding('utf-8'); depending on the page, but it doesn't work.
How can I make this work?

Thanks,

@deoxxa
deoxxa commented Dec 17, 2011

Somewhat unrelated, but I just happen to have done this exact thing in a different project - check out unicoder.

@mikeal
Member
mikeal commented Feb 18, 2012

don't use either, let request handle this for you (request will read the response header to determine the encoding).

request(url, function (err, resp, body) { console.error(body) })

@mikeal mikeal closed this Feb 18, 2012
@deoxxa
deoxxa commented Feb 18, 2012

That's not always possible - some sites (I'm looking at you, China and Japan) don't supply encoding information in their headers or markup. Even worse, some do - but they supply the wrong encoding!

@mikeal
Member
mikeal commented Feb 18, 2012

how are their sites viewable? i've messed up those headers before and browsers got very pissed at me :)

@deoxxa
deoxxa commented Feb 18, 2012

I'm not sure, but my theory is that every installation of Firefox and Chrome is actually imbued with 100% genuine voodoo magic that makes stuff like that work. That or heuristics. I find the issue crops up a lot in router software.

@williamvivier

But what about when the charset is defined in the HTML meta tag?

@mikeal
Member
mikeal commented Feb 21, 2012

this is what buffer encoding should be used for, or you can set the encoding yourself if you know what it's gonna be.
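
A minimal sketch of the buffer approach, assuming the body arrives as a Buffer (which is what request returns when the `encoding: null` option mentioned at the top of the thread is set). The helpers `charsetFromContentType` and `decodeBody` are hypothetical names, not part of request, and the decoding uses Node's built-in `TextDecoder` as a stand-in for whatever converter you prefer. Which legacy charsets `TextDecoder` accepts depends on how Node was built, so treat this as an illustration only.

```javascript
// Hypothetical helper (not part of request): extract the charset label
// from a Content-Type header value such as "text/html; charset=Shift_JIS".
function charsetFromContentType(contentType) {
  const match = /charset\s*=\s*"?([^\s;"]+)/i.exec(contentType || '');
  return match ? match[1].toLowerCase() : null;
}

// Hypothetical helper: decode raw bytes using the charset from the header,
// falling back to UTF-8 when the header is missing or the label is unknown.
function decodeBody(buffer, contentType) {
  const charset = charsetFromContentType(contentType) || 'utf-8';
  try {
    // TextDecoder is a Node global; support for legacy encodings
    // depends on the ICU data the Node binary was built with.
    return new TextDecoder(charset).decode(buffer);
  } catch (err) {
    // Unknown or unsupported label: fall back to UTF-8.
    return new TextDecoder('utf-8').decode(buffer);
  }
}

// Possible usage with request (assumes `url` is defined):
// request({ url: url, encoding: null }, function (err, resp, body) {
//   if (err) throw err;
//   console.log(decodeBody(body, resp.headers['content-type']));
// });
```

This only helps when the header is present and correct; as noted earlier in the thread, some sites omit it or get it wrong.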

@glasser glasser referenced this issue in meteor/meteor Dec 3, 2012
Closed

Method.http allow non-UTF8 encoding #451

@peakji
peakji commented May 20, 2013

+1 for using iconv. I'm using RegExps to read the meta tags whenever the response MIME type is text/html. Hopefully someday request can do this inside the lib.
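
A rough sketch of the meta-tag approach described above, not the commenter's actual code. The function name `sniffMetaCharset` and the 1024-byte scan window are assumptions made for illustration. It reads the start of the raw body as latin1 (one character per byte, so ASCII markup survives regardless of the real encoding) and looks for a charset declaration with a RegExp.

```javascript
// Hypothetical helper: look for a charset declared in a <meta> tag near
// the start of a raw HTML body. Returns a lowercased label or null.
function sniffMetaCharset(buffer) {
  // latin1 maps each byte to exactly one character, so this never fails
  // and leaves ASCII markup intact for matching.
  const head = buffer.subarray(0, 1024).toString('latin1');
  // Covers both <meta charset="x"> and
  // <meta http-equiv="Content-Type" content="text/html; charset=x">.
  const match = /<meta[^>]*?charset\s*=\s*["']?\s*([\w.:-]+)/i.exec(head);
  return match ? match[1].toLowerCase() : null;
}
```

The returned label could then be handed to iconv or a similar converter to decode the full buffer. This will not find declarations in ASCII-incompatible encodings such as UTF-16, and it does nothing about pages that declare the wrong charset.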
