
Missing semicolon after html entity sometimes returns generic 'unknown entity' warning instead of specific 'missing semicolon' #862

Closed
TZubiri opened this issue Feb 19, 2020 · 2 comments

Comments

@TZubiri

TZubiri commented Feb 19, 2020

The following html:

```html
<!DOCTYPE html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="utf-8">
    <title>Missing semicolon</title>
  </head>
  <body>
    <p>S&ampP500</p>
    <p>&amp</p>
  </body>
</html>
```

Produces 2 warnings, one for each of the <p> lines:

```
line 8 column 9 - Warning: unescaped & or unknown entity "&ampP500"
line 9 column 8 - Warning: entity "&amp" doesn't end in ';'
```

Browsers correctly render an ampersand in both cases. This behaviour is specified in the standard below:

https://html.spec.whatwg.org/multipage/parsing.html#parse-error-missing-semicolon-after-character-reference

missing-semicolon-after-character-reference: "This error occurs if the parser encounters a character reference that is not terminated by a U+003B (;) code point. Usually the parser behaves as if character reference is terminated by the U+003B (;) code point; however, there are some ambiguous cases in which the parser includes subsequent code points in the character reference."
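The browser behaviour can be checked without a browser. Python's standard-library html.unescape decodes named references using the HTML5 table and falls back to the longest known prefix, so legacy names such as amp work without a trailing semicolon. This is a quick stand-in, not Tidy code. It also ignores the spec's special case for attribute values.

```python
import html

# html.unescape resolves named character references with the HTML5 table,
# falling back to the longest known name at the start ("amp" in "ampP500").
for text in ["S&ampP500", "&amp", "S&amp;P500"]:
    print(repr(text), "->", repr(html.unescape(text)))

# 'S&ampP500' -> 'S&P500'
# '&amp' -> '&'
# 'S&amp;P500' -> 'S&P500'
```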

So tidy should output the 'entity doesn't end in ;' warning in both cases.

Regards.

@geoffmcl
Contributor

@TZubiri in general, I agree that it might be nice to have more explicit, detailed, or multiple warning messages, but...

But in this case I think the two messages output for your sample are quite sufficient.

The first is an unknown entity: the simple lookup of ampP500 failed, so it seems unimportant to also mention that it does not end in a ;. It could, or not...

I hope you are not suggesting that Tidy scrabble around and somehow notice that it begins with amp, a well-known entity. That might be possible, but not easy to do, and to what avail?
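Here is a minimal Python sketch of what that prefix check could look like. It is not Tidy's C code. It follows the spec's named-character-reference rule for text content: take the longest prefix of the characters after & that is a known name. The lookup uses the standard library's html.entities.html5 table, whose keys include the trailing ; and which lists legacy names such as amp a second time without it. The function classify_entity and its category labels are hypothetical, chosen only to mirror Tidy's two warnings. Attribute values are not handled; there the spec leaves &ampP500 undecoded.

```python
from __future__ import annotations

from html.entities import html5


def classify_entity(ref: str) -> tuple[str, str]:
    """Classify the characters that follow '&' in text content.

    Hypothetical helper: returns (category, decoded_text), where category
    is "ok", "missing-semicolon" or "unknown-entity".
    """
    # Longest-prefix match against the HTML5 named character reference
    # table. Keys carry their trailing ';' ("amp;"); legacy names that
    # may omit it ("amp", "not") are also listed without it.
    for end in range(len(ref), 0, -1):
        name = ref[:end]
        if name in html5:
            decoded = html5[name] + ref[end:]
            if name.endswith(";"):
                return "ok", decoded
            # A known name matched but was not terminated by ';'.
            # "&amp" and "&ampP500" both end up here.
            return "missing-semicolon", decoded
    # No known name is a prefix of the reference at all.
    return "unknown-entity", "&" + ref


if __name__ == "__main__":
    for ref in ["ampP500", "amp", "amp;", "notit;", "notin;", "foo;"]:
        print("&" + ref, "->", classify_entity(ref))
```

Under this rule, &ampP500 and &amp fall into the same missing-semicolon category. A name with no known prefix, such as foo;, stays an unknown entity. The notit; case shows why the longest match matters: not is a legacy name, so it decodes to ¬ and it; is left as plain text. By contrast, notin; matches a full name exactly.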

So at the moment, I see no open issue here.

I look forward to further feedback, comments, code, patches, etc. to move this issue beyond discussion. Thanks.

@geoffmcl
Contributor

20210411: Review...

@TZubiri, as advised, I do not presently see an open issue here, and there has been no further feedback in over 6 months, so I am closing this for now.

I look forward to further feedback, comments, code, patches, etc. to move this issue forward, or not. Thanks.
