Missing semicolon after html entity sometimes returns generic 'unknown entity' warning instead of specific 'missing semicolon' #862

Closed
@TZubiri

The following HTML:

<!DOCTYPE html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="utf-8">
    <title>Missing semicolon</title>
  </head>
  <body>
    <p>S&ampP500</p>
    <p>&amp</p>
  </body>
</html>

Produces two warnings, one for each of the <p> lines:

line 8 column 9 - Warning: unescaped & or unknown entity "&ampP500"
line 9 column 8 - Warning: entity "&amp" doesn't end in ';'

Browsers output an ampersand correctly in both cases. This behaviour is specified in the standard below:

https://html.spec.whatwg.org/multipage/parsing.html#parse-error-missing-semicolon-after-character-reference

missing-semicolon-after-character-reference | This error occurs if the parser encounters a character reference that is not terminated by a U+003B (;) code point. Usually the parser behaves as if character reference is terminated by the U+003B (;) code point; however, there are some ambiguous cases in which the parser includes subsequent code points in the character reference.
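
As a quick way to check the decoding described in the quoted paragraph without opening a browser, here is a minimal sketch using Python's standard `html.unescape`, which is documented as applying the HTML 5 rules to both valid and invalid character references. It only illustrates the decoding result for text content; it says nothing about how tidy parses entities internally.

```python
import html

# The two paragraph bodies from the test document above.
samples = ["S&ampP500", "&amp"]

# html.unescape resolves legacy named references such as "&amp"
# even when the trailing semicolon is missing.
decoded = [html.unescape(s) for s in samples]
print(decoded)
```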

So tidy should output the 'entity doesn't end in ;' warning in both cases.
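
To make the requested behaviour concrete, here is a rough sketch of the classification I have in mind. It is not tidy's code: `classify_reference` is a hypothetical helper, and it relies on Python's `html.entities.html5` table, where the keys without a trailing `;` are the legacy names that may appear unterminated. It also ignores the special case for references inside attribute values.

```python
from html.entities import html5


def classify_reference(name: str) -> str:
    """Classify the text that follows '&', e.g. 'ampP500' or 'amp;'.

    Hypothetical helper for illustration only, not part of tidy.
    """
    if name.endswith(";"):
        return "ok" if name in html5 else "unknown entity"
    # Look for the longest prefix that is a legacy entity name.
    # Slices of an unterminated name never contain ';', so only
    # legacy (semicolon-optional) keys can match here.
    for end in range(len(name), 0, -1):
        if name[:end] in html5:
            return "missing semicolon"
    return "unknown entity"


for candidate in ("ampP500", "amp", "amp;", "foo"):
    print(candidate, "->", classify_reference(candidate))
```

With this prefix check, both `&ampP500` and `&amp` would fall into the "missing semicolon" case, while something like `&foo` would still be reported as unknown.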

Regards.
