Reports errors for unicode #11
Comments
We're also getting hit by this for one of our projects. We're probably going to remove |
What would you do to fix it? You say you would fork it, so you must have something in mind that would alleviate the problem. Please let me know what that would be. |
It's one of our devs who was considering that. If anything, we'd simply remove the check for our code since we declare our HTML to be valid UTF-8. I would check the declared A simpler (but less correct) fix is in
That's what's causing the error. We thought that we could limit ourselves to |
I don't know squat about encodings and how to deal with them, so I could use some direction on this. I'd love to hear what your dev comes up with. I'm also wondering if it might make sense to add the ability to exclude a given error, not just a class of errors. |
Fair enough. Simply changing this:
To this:
Would be enough for right now. I suspect many people already ignore the fluff as more and more tools are requiring new attributes to be added to tags and the check for known attributes doesn't seem to make a lot of sense now. Thus, "fluff" doesn't help for a modern web app. However, as you pointed out, the ability to exclude a given error would be much more flexible. |
Seems to me that encoding would be a STRUCTURE thing. In the original post, it says "if the file is declared as some flavor of Unicode". How can I detect that? |
Look for the
I would argue that encoding isn't really a structure thing because if the page contains 日本国, that's the content causing the error from |
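For anyone wondering what "detecting the declared encoding" could look like in practice, here is a minimal sketch. It is written in Python rather than Perl purely for illustration, and it assumes the declaration lives in a `<meta>` tag, as in the original report further down. The names `CharsetSniffer` and `declared_charset` are made up for this example and are not part of HTML::Lint.

```python
from html.parser import HTMLParser


class CharsetSniffer(HTMLParser):
    """Remembers the first charset declared in a <meta> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        # Short form: <meta charset="utf-8">
        if attrs.get("charset"):
            self.charset = attrs["charset"].strip().lower()
            return
        # Long form: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
        if (attrs.get("http-equiv") or "").lower() == "content-type":
            for part in (attrs.get("content") or "").split(";"):
                name, _, value = part.strip().partition("=")
                if name.strip().lower() == "charset" and value:
                    self.charset = value.strip().strip("\"'").lower()


def declared_charset(html_text):
    """Return the charset named in a <meta> tag, or None if nothing is declared."""
    sniffer = CharsetSniffer()
    sniffer.feed(html_text)
    return sniffer.charset
```

This only looks at the markup itself. It ignores HTTP headers and byte order marks, which a real implementation would probably also want to consider.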
It's not a matter of "allowed on the page", but how it's represented. The point of the rule is that if you have left curly double quotes, that should be Maybe it should be "If you have a character that can be represented as an HTML entity but isn't"? |
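The rule proposed above ("a character that can be represented as an HTML entity but isn't") can be sketched roughly as follows. This is a Python illustration, not HTML::Lint's implementation; it relies on the standard library's `html.entities.codepoint2name` table, which covers the HTML 4 named entities, and `entity_suggestion` is a hypothetical name.

```python
from html.entities import codepoint2name


def entity_suggestion(ch):
    """Return the named entity for a single character, or None if it has none."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else None
```

Under this rule a left curly double quote (U+201C) gets the suggestion `&ldquo;`, while a character such as 日 has no named entity and would not be flagged at all.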
Then again, if you want to check the |
That would not work for us. For example, we're writing a game and one of the items in the game is named |
It's valid to my job, because we want to encode everything we can. We get stuff from other departments that are just cut & pasted from Word and want to make sure we know what we're putting out there. But HTML::Lint was created back in 2005 and so it sounds like I'm also hearing that in modern web dev that kind of dropping in |
Then offering a way to exclude errors that are useful for some, but detrimental to others, sounds like the best compromise. |
Agreed. I created a ticket at #54. |
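To make the proposal concrete, here is one way excluding individual errors (rather than a whole class) might work. It is a Python sketch under stated assumptions: the `LintError` record, its `code` field, and the example code strings are all hypothetical and are not taken from HTML::Lint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LintError:
    code: str      # hypothetical per-error identifier
    category: str  # error class, e.g. "STRUCTURE" or "FLUFF"
    message: str


def filter_errors(errors, skip_categories=(), skip_codes=()):
    """Drop errors whose class or individual code has been excluded."""
    skip_categories = set(skip_categories)
    skip_codes = set(skip_codes)
    return [
        e for e in errors
        if e.category not in skip_categories and e.code not in skip_codes
    ]
```

With something like this, a project that declares UTF-8 could silence only the entity check and still see every other error in the same class.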
IO::HTML will determine the encoding for you. With that, it should be easier to figure out if a character is valid or not. If it's declared as ISO-8859-1, then yes, you want HTML encoding. Otherwise, you probably won't need it (so long as you have valid byte sequences, but even then you'd have to check that the encoded byte sequences are valid for the declared encoding). |
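Once the declared encoding is known, the decision described above could look something like this. It is sketched in Python rather than Perl and does not use IO::HTML. The function names are hypothetical, and it assumes "needs an entity" means "cannot be encoded in the declared charset".

```python
import codecs


def needs_entity(ch, declared_encoding):
    """True if ch cannot be represented directly in the declared encoding."""
    try:
        codecs.lookup(declared_encoding)
    except LookupError:
        # Unknown encoding name: be conservative and ask for an entity.
        return True
    try:
        ch.encode(declared_encoding)
    except UnicodeEncodeError:
        return True
    return False


def is_valid_bytes(raw, declared_encoding):
    """True if the raw bytes are a valid sequence in the declared encoding."""
    try:
        raw.decode(declared_encoding, errors="strict")
    except (UnicodeDecodeError, LookupError):
        return False
    return True
```

With ISO-8859-1 declared, 日 cannot be encoded and so would still call for an entity; with UTF-8 declared, it passes, and only malformed byte sequences would be worth reporting.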
@petdance: yes, if this change is available, we'll be able to keep using this module. Thank you for your help. |
I agree with @Ovid here. Currently we're using HTML::Lint::Pluggable to override/ignore certain error messages in the meantime. |
Reported by sciurus, Sep 26, 2008
Try to parse a string containing characters such as “directional quotation
marks”. HTML::Lint 2.04 reports invalid character errors.
Comment 1 by gavin.brock, Dec 8, 2009
To add to this, I think that if the file is declared as some flavor of Unicode (e.g. ), shouldn't the "Invalid character \x65E5 should be written as" errors not be there at all?
Comment 2 by bishopw, Feb 13, 2011
I run into this when trying to run HTML::Lint on a set of pages with Japanese text.
If my name in Japanese, ビショップ, appears on the page, HTML::Lint tells me:
(The line ends where it looks like a different encoding suggestion should be.)
My pages do all include the meta tag declaring charset=utf-8.
Is it possible to increase the priority on this, since it makes HTML::Lint unusable for sites using utf-8, which is becoming the Internet default encoding?