Reports errors for unicode #11

Open
petdance opened this Issue Dec 5, 2011 · 0 comments

petdance commented Dec 5, 2011

Reported by sciurus, Sep 26, 2008

Try to parse a string containing characters such as “directional quotation
marks”. HTML::Lint 2.04 reports invalid character errors.

Comment 1 by gavin.brock, Dec 8, 2009

To add to this: if the file is declared as some flavor of Unicode (e.g. <meta http-equiv="content-type" content="text/html; charset=utf-8">), shouldn't the "Invalid character \x65E5 should be written as" errors be absent altogether?

Comment 2 by bishopw, Feb 13, 2011

I run into this when trying to run HTML::Lint on a set of pages with Japanese text.

If my name in Japanese, ビショップ, appears on the page, HTML::Lint tells me:

#  (122:21) Invalid character \x30D3 should be written as 
#  (122:21) Invalid character \x30B7 should be written as 
#  (122:21) Invalid character \x30E7 should be written as 
#  (122:21) Invalid character \x30C3 should be written as 
#  (122:21) Invalid character \x30D7 should be written as 

(The line ends where it looks like a different encoding suggestion should be.)

My pages do all include the meta tag declaring charset=utf-8.

Is it possible to increase the priority on this, since it makes HTML::Lint unusable for sites using utf-8, which is becoming the Internet default encoding?
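
For illustration only, the behavior suggested in Comment 1 (suppress the "Invalid character" errors when the page declares a Unicode charset) could be sketched roughly like this. It is a standalone Python sketch, not HTML::Lint's actual code or API: the names `UNICODE_CHARSETS`, `declared_charset`, and `invalid_character_errors` are hypothetical, the charset detection is a simple regex rather than a real HTML parse, and the line:column positions are counted per character, which may not match how HTML::Lint reports them.

```python
import re

# Hypothetical set of charset labels treated as "some flavor of Unicode".
UNICODE_CHARSETS = {"utf-8", "utf8", "utf-16", "utf-32"}


def declared_charset(html):
    """Return the lowercased charset named in a meta tag, or None if absent."""
    match = re.search(
        r'<meta[^>]*charset\s*=\s*["\']?([\w-]+)', html, re.IGNORECASE
    )
    return match.group(1).lower() if match else None


def invalid_character_errors(html):
    """List non-ASCII characters, but only if no Unicode charset is declared."""
    if declared_charset(html) in UNICODE_CHARSETS:
        return []  # Page says it is Unicode, so raw non-ASCII text is fine.
    errors = []
    for line_no, line in enumerate(html.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                errors.append(
                    f"({line_no}:{col}) Invalid character \\x{ord(ch):04X}"
                )
    return errors
```

With this sketch, a page containing ビショップ and the utf-8 meta tag would yield no errors, while the same text without any charset declaration would still be flagged character by character.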