Unexpected parsing with uppercase DOCTYPE #815

@da2x

From my blog post detailing the issue: both of the following test documents trigger the "Info: Document content looks like HTML5" message, but they produce different results (see below).

Test case 1 (lower case doctype):

echo '<!DOCTYPE html><a href="#"><p>Text</p></a>' | tidy

Test case 1 output (works as expected):

<!DOCTYPE html>
<html>
<body>
<a href="#">
<p>Text</p>
</a>
</body>
</html>

Test case 2 (upper case doctype):

echo '<!DOCTYPE HTML><a href="#"><p>Text</p></a>' | tidy

Test case 2 output (broken; falls back to legacy pre-HTML5 parsing and moves the paragraph out of the link):

<!DOCTYPE html>
<html>
<body>
<a href="#"></a>
<p>Text</p>
</body>
</html>

The HTML specification states that the DOCTYPE is matched ASCII case-insensitively, so <!DOCTYPE HTML> and <!DOCTYPE html> should select the same parsing behavior.

(Output in both examples has been trimmed to focus on the differences.)
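
To illustrate the expected behavior, here is a minimal Python sketch of an ASCII case-insensitive check for the short HTML5 doctype. It is not how tidy detects the doctype internally; the helper name looks_like_html5 and the regular expression are assumptions made for illustration, and the legacy "about:legacy-compat" form is deliberately not handled.

```python
import re

# Hypothetical helper, not part of HTML Tidy. It matches only the short
# HTML5 doctype, comparing letters ASCII case-insensitively.
_HTML5_DOCTYPE = re.compile(r"\s*<!DOCTYPE\s+html\s*>", re.IGNORECASE | re.ASCII)


def looks_like_html5(document: str) -> bool:
    """Return True if the document starts with <!DOCTYPE html> in any letter case."""
    return _HTML5_DOCTYPE.match(document) is not None


# The two test cases from this issue should be classified identically.
for doc in (
    '<!DOCTYPE html><a href="#"><p>Text</p></a>',
    '<!DOCTYPE HTML><a href="#"><p>Text</p></a>',
):
    print(looks_like_html5(doc))
```

Under this check both test documents are treated as HTML5, which is the result I would expect from tidy as well. An HTML 4 doctype with a PUBLIC identifier does not match, because the pattern requires the closing ">" directly after the name.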
