Fail to detect charset from certain ShiftJIS page #199
Comments
The code used // lib/fallback.ts
ogObject.charset = chardet.detect(Buffer.from(body)) || '';
I've never seen the
Another detail I found during debugging: I guess there is some corner case where we cannot just read text with At least for this specific webpage This is a gist to show the difference of bytes and
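To illustrate why reading a non-UTF-8 body as text can lose information, here is a minimal sketch. It is not the code from the gist; the sample bytes and the `toHex` helper are assumptions chosen for illustration. It decodes Shift_JIS bytes as UTF-8, re-encodes the resulting string, and compares the two byte sequences.

```typescript
// Shift_JIS bytes for the two characters "日本" (0x93 0xFA and 0x96 0x7B).
const sjisBytes = new Uint8Array([0x93, 0xfa, 0x96, 0x7b]);

// Decoding as UTF-8 (non-fatal mode, the default) replaces every invalid
// sequence with U+FFFD, so the original byte values are lost.
const asUtf8Text = new TextDecoder("utf-8").decode(sjisBytes);

// Re-encoding the string therefore does not give back the original bytes.
const roundTripped = new TextEncoder().encode(asUtf8Text);

// Hypothetical helper: format bytes as space-separated hex pairs.
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => (b < 16 ? "0" : "") + b.toString(16)).join(" ");
}

console.log("original    :", toHex(sjisBytes));
console.log("round trip  :", toHex(roundTripped));
```

If the same thing happens to a response body before charset detection runs, the detector only sees the already-damaged bytes, which may explain a failed detection.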
In my use case I managed to detect the encoding, convert the bytes, and use Considering how tricky encoding problems are, I guess it's hard to do a perfect fix. The API was flexible enough to allow my workaround 👍🏽.
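A workaround of this general shape (detect the encoding, then convert the raw bytes yourself) could look like the sketch below. This is an assumption about the approach, not the commenter's actual code: `sniffMetaCharset` and `decodeHtml` are hypothetical names, the charset is read from a `<meta>` declaration rather than detected statistically, and it relies on the runtime's `TextDecoder` supporting the `shift_jis` label (true for Node.js builds with full ICU).

```typescript
// Hypothetical helper: look for a charset declaration in the first bytes of
// the page. The markup itself is ASCII, so decoding as latin1 is safe here.
function sniffMetaCharset(bytes: Uint8Array): string | null {
  const head = new TextDecoder("latin1").decode(bytes.subarray(0, 1024));
  const match = /<meta[^>]+charset\s*=\s*["']?([\w-]+)/i.exec(head);
  return match?.[1] ?? null;
}

// Hypothetical helper: decode the raw bytes with the declared charset,
// falling back to UTF-8 when none is declared or the label is unsupported.
function decodeHtml(bytes: Uint8Array): { charset: string; text: string } {
  const label = sniffMetaCharset(bytes) ?? "utf-8";
  try {
    return { charset: label, text: new TextDecoder(label).decode(bytes) };
  } catch {
    // TextDecoder throws a RangeError for labels it does not know.
    return { charset: "utf-8", text: new TextDecoder("utf-8").decode(bytes) };
  }
}

// Example page: ASCII markup followed by the Shift_JIS bytes for "日本".
const markup =
  '<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS"><title>';
const asciiPart = Array.from(markup, (ch) => ch.charCodeAt(0));
const page = new Uint8Array([...asciiPart, 0x93, 0xfa, 0x96, 0x7b]);

const result = decodeHtml(page);
console.log(result.charset, result.text);
```

The important point is that the bytes are fetched and kept as bytes (for example via `arrayBuffer()` on a fetch response) until the charset is known; the decoded string can then be handed to whatever parser or scraper accepts pre-fetched HTML.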
I've updated the charset fallback in
Sorry, I don't have other similar cases at hand. Thanks for the fix, it should make this library more complete 👍🏽
I had another look at the "corrupted" ShiftJIS text in the gist. In the suspicious
@jokester @cm-dyoshikawa the fix is live in
With this change, users of openGraphScraper should no longer need to be aware of character encodings. This will be very useful since I am in a Japanese-speaking country and still have Shift_JIS sites. Thank you.
Describe the bug
OpenGraphScraper v6.3.0 couldn't detect the charset of a webpage I came across.
The page had
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
and maybe no other clue.
To Reproduce
Expected behavior
Actual behavior