Wrong encoding used if charset only specified in content #1588

Closed
jdennis opened this issue Sep 10, 2013 · 1 comment

jdennis commented Sep 10, 2013

HTML pages which declared their charset encoding only in their content
had the wrong encoding applied because the content was never checked.

Currently only the response headers are checked for a charset
declaration; if none is found, the apparent_encoding heuristic is
applied. But the W3.org doc says one should check the header first for
a charset declaration, then, if that's absent, check the meta tags in
the content. It also says that if no charset declaration is found one
should assume UTF-8, not ISO-8859-1 (a bad recommendation from the
early days of the web).
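
To make the lookup order concrete, here is a minimal sketch using only the standard library. It is not the code from the pull request: the helper names (`charset_from_content_type`, `charset_from_meta`, `detect_encoding`), the regular expression, and the 2048-byte scan window are all assumptions chosen for illustration.

```python
import re
from email.message import Message

# Hypothetical pattern: matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
META_CHARSET_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, None if absent


def charset_from_meta(body):
    """Return the charset declared in a meta tag of the raw body, or None."""
    # Only the start of the document is scanned; 2048 bytes is an
    # arbitrary window picked for this sketch.
    match = META_CHARSET_RE.search(body[:2048])
    return match.group(1).decode("ascii").lower() if match else None


def detect_encoding(content_type, body):
    """Header first, then meta tags in the content, then assume UTF-8."""
    return (
        charset_from_content_type(content_type)
        or charset_from_meta(body)
        or "utf-8"
    )
```

With this ordering a charset in the header always wins, a page that declares its charset only in a meta tag gets that charset, and a page with no declaration at all is treated as UTF-8.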

I have a patch (pull request); more details are in the commit comment.

Member

Lukasa commented Sep 10, 2013

All relevant discussion is in #1589, so I'm closing this to centralise the discussion there. =)
