Garbled content when accessing non-utf8 website (Shift_JIS) #1080

Closed
sharmalalit opened this issue Sep 19, 2014 · 4 comments

Comments

@sharmalalit

I am trying to access a non-UTF-8 website using the request module, and the response body is garbled.
Even after setting the encoding option to Shift_JIS, I still see garbled Japanese text.

var request = require('request');
request('http://www.alc.co.jp/', function (error, response, body) {
    if (!error && response.statusCode === 200) {
        console.log(body); // Print the web page.
    }
});
@nylen
Member

nylen commented Sep 19, 2014

Here are the encodings that node supports natively; Shift_JIS is not one of them:

http://nodejs.org/api/buffer.html#buffer_buffer
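
To illustrate why the body comes out garbled, here is a minimal sketch of what happens when Shift_JIS bytes are decoded as utf8, which is what request does by default when it turns the body into a string. It assumes a node version that provides Buffer.from; the variable names are only for illustration.

```javascript
// Bytes for the Japanese word "日本" encoded as Shift_JIS.
var sjisBytes = Buffer.from([0x93, 0xfa, 0x96, 0x7b]);

// These bytes are not valid utf8, so decoding them as utf8 substitutes
// the replacement character U+FFFD for each invalid sequence.
var garbled = sjisBytes.toString('utf8');

// The original text is lost: the result contains U+FFFD, not "日本".
var containsReplacementChar = garbled.indexOf('\uFFFD') !== -1;
console.log(containsReplacementChar);
```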

@nylen
Member

nylen commented Sep 19, 2014

This seems to work:

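var concat  = require('concat-stream'),
    Iconv   = require('iconv').Iconv,
    request = require('request');

// Convert the raw Shift_JIS bytes to utf8 as they stream in.
var conv = new Iconv('Shift_JIS', 'utf8'),
    req  = request('http://www.alc.co.jp/');

req.pipe(conv);

// Network or HTTP-level failures.
req.on('error', function(err) {
    console.log('an error occurred', err);
});

// Conversion failures, e.g. bytes that are not valid Shift_JIS.
conv.on('error', function(err) {
    console.log('a conversion error occurred', err);
});

// Collect the converted chunks and print the page as a utf8 string.
conv.pipe(concat(function(body) {
    console.log(body.toString());
}));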
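
As a possible alternative that avoids the native iconv dependency, newer node releases include a WHATWG TextDecoder. The sketch below assumes a node build with full ICU data, since the shift_jis label is not available otherwise, and decodeShiftJis is a hypothetical helper name, not part of request or iconv.

```javascript
var TextDecoder = require('util').TextDecoder;

// Hypothetical helper: decode a Buffer of Shift_JIS bytes into a JS string.
// Assumes node was built with full ICU; otherwise the constructor throws.
function decodeShiftJis(buffer) {
    return new TextDecoder('shift_jis').decode(buffer);
}

// Shift_JIS bytes for the Japanese word "日本".
var sample = Buffer.from([0x93, 0xfa, 0x96, 0x7b]);
var decoded = decodeShiftJis(sample);
console.log(decoded);
```

With request, passing the option encoding: null makes the callback receive the body as a Buffer instead of a utf8 string, so it could be handed to a helper like this one.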

nylen closed this as completed Sep 19, 2014
@sharmalalit
Author

@nylen Thanks a lot for looking into this. Is there a way to detect mid-stream that the encoding is Shift_JIS and that I should use Iconv?
I can detect it using the 'Content-Type' header, but some websites do not define the charset in the header and declare it in an HTML meta tag instead.

Edit: Buffer.isEncoding can serve the purpose here. http://nodejs.org/api/buffer.html#buffer_class_method_buffer_isencoding_encoding
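
The header-then-meta-tag fallback described in this comment could be sketched roughly as follows. detectCharset is a hypothetical helper, not something request provides, and the 1024-byte scan window and the regular expressions are assumptions chosen for illustration rather than a complete HTML parser.

```javascript
// Hypothetical helper: guess a page's charset from the Content-Type header,
// falling back to a <meta> declaration near the start of the raw body.
function detectCharset(contentType, bodyBuffer) {
    // 1. Prefer the charset parameter of the Content-Type header.
    var headerMatch = /charset\s*=\s*["']?([^\s;"']+)/i.exec(contentType || '');
    if (headerMatch) {
        return headerMatch[1].toLowerCase();
    }

    // 2. Otherwise scan the first bytes of the body for a <meta> declaration.
    // latin1 maps each byte to one character, so ASCII markup survives intact
    // whatever the real encoding is. The 1024-byte window is an assumption.
    var head = bodyBuffer.subarray(0, 1024).toString('latin1');
    var metaMatch = /<meta[^>]*charset\s*=\s*["']?([^\s"'\/>;]+)/i.exec(head);
    return metaMatch ? metaMatch[1].toLowerCase() : null;
}

var page = Buffer.from(
    '<html><head><meta http-equiv="Content-Type" ' +
    'content="text/html; charset=Shift_JIS"></head></html>'
);
console.log(detectCharset('text/html', page));
```

The returned label could then be used to decide whether the raw bytes need to go through a converter such as Iconv before being treated as text.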

@nylen
Member

nylen commented Sep 19, 2014

I think that is pretty far outside the scope of the request library. curl and wget don't have this functionality either. The easiest way I can think of is to try phantomjs, which is a full-featured headless browser.

Buffer.isEncoding won't help; it only tells you whether a given encoding string like utf8 or Shift_JIS is recognized by node, not which encoding a response actually uses.
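
A short sketch of what Buffer.isEncoding reports for the two names mentioned above. It checks only whether node's Buffer supports the name and never inspects any response data; the variable names are just for illustration.

```javascript
// Buffer.isEncoding answers "does node's Buffer support this encoding name?"
var utf8Supported = Buffer.isEncoding('utf8');
var sjisSupported = Buffer.isEncoding('Shift_JIS');

console.log(utf8Supported); // true
console.log(sjisSupported); // false
```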
