"Malformed" Word 2000 sequence may cause Tidy to skip document content #462

Closed
ralfjunker opened this issue Oct 23, 2016 · 10 comments

ralfjunker commented Oct 23, 2016

The following "malformed" Word 2000 sequence causes Tidy to skip document content (notice the extra characters):

<![endif]extra>

The reason is that when Tidy sees <![ not followed by CDATA[, it expects a Word 2000 sequence like this:

<![endif]>

In particular, Tidy expects the above sequence to terminate in ]> or ]-->, a requirement found in no HTML specification and not implemented by modern browsers: per the HTML parsing algorithm, <![ not followed by CDATA[ opens a "bogus comment", which simply ends at the first >.

As a result, Tidy skips content while it scans for ]>, possibly until the end of the document.
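
Here is a minimal, self-contained sketch of the scan described above (not Tidy's actual code, and ignoring the ]--> variant for brevity); it compresses the LEX_SECTION logic down to "consume everything until ] followed by >" and shows the trailing content being swallowed:

    #include <stdio.h>

    /* Scan for the "]>" terminator the way the lexer does: every
       character that is not ']' is consumed, including '>' and '<'. */
    static const char *scan_section(const char *p)
    {
        while (*p)
        {
            if (*p++ != ']')
                continue;       /* swallow anything that is not ']' */
            if (*p == '>')
                return p + 1;   /* found "]>": the section ends here */
        }
        return p;               /* EOF: the rest of the input is lost */
    }

    int main(void)
    {
        /* the lexer has already consumed the leading "<![" here */
        const char *rest = scan_section("endif] EXTRA ><p>Content</p>");
        printf("lexing resumes at: \"%s\"\n", rest);  /* prints "" */
        return 0;
    }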

Without testing, the code in lexer.c suggests that similar "malformed" ASP, JSTE, and PHP sequences might likewise throw Tidy off track.

AFAIK, none of the four sequences has ever been covered by any of the HTML specs. I strongly recommend adding options to disable parsing them. Suggestions (see the sketch after the list):

  • TidyParseWord2000
  • TidyParseASP
  • TidyParseJSTE
  • TidyParsePHP
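
For illustration only, here is a sketch of how these suggested options might be declared in the TidyOptionId enum in include/tidyenum.h, following the doc-comment style used there; the names are just the suggestions above and exist in no Tidy release:

    /* Hypothetical additions to the TidyOptionId enum (include/tidyenum.h) */
    TidyParseWord2000,  /**< Parse <![if ...]> ... <![endif]> Word 2000 sections */
    TidyParseASP,       /**< Parse <% ... %> ASP pseudo-sections */
    TidyParseJSTE,      /**< Parse <# ... #> JSTE pseudo-sections */
    TidyParsePHP,       /**< Parse <?php ... ?> PHP sections */
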
@ralfjunker ralfjunker changed the title "Malformed" Word 2000 sequence may cause Tidy to skip rest of document "Malformed" Word 2000 sequence may cause Tidy to skip document content Oct 23, 2016
@geoffmcl
Contributor

@ralfjunker, thanks for the report...

Dealing with Word 2000 was a very specific set of clutter to remove, and I think it could easily be encouraged to skip any chars between <![endif] and >, if desired...

If possible, please give a minimal sample html, tidy version, config, etc...

Not sure why you include the other Tidy parsers, like ASP, JSTE, PHP, ... advise more...

But more specific information would also be helpful, like where, how, was this Word 2000 html document generated... thanks...

@geoffmcl geoffmcl added this to the 5.3 milestone Oct 31, 2016
Author

ralfjunker commented Oct 31, 2016

Here is a minimal HTML document example:

<!-- Silence warnings. Doctype and title do not matter to the problem.  -->
<!DOCTYPE html>
<title>Word 2000 Problem</title>

<![endif] EXTRA >

<p>Content</p>

I run Tidy 5.2.0 on Windows without options, but Git trunk behaves the same:

tidy.exe file.htm

This is the output I receive:

Info: Document content looks like HTML5
No warnings or errors were found.

<!-- Silence warnings. Doctype and title do not matter to the problem.  -->
<!DOCTYPE html>
<html>
<head>
<meta name="generator" content="HTML Tidy for HTML5 for Windows version 5.2.0">
<title>Word 2000 Problem</title>
</head>
<body>
</body>
</html>

Notice that the <p>Content</p> block is completely removed, despite Tidy explicitly reporting no warnings or errors. The size and content of the block do not matter; Tidy removes it regardless.

The actual problem is in lexer.c, here (tidy-html5/src/lexer.c, lines 3196 to 3236 at fd0ccb2):

if (c != ']')
    continue;

/* now look for '>' */
c = TY_(ReadChar)(doc->docIn);
lexdump = 1;
if (c != '>')
{
    /* Issue #153 - can also be ]'-->' */
    if (c == '-')
    {
        c = TY_(ReadChar)(doc->docIn);
        if (c == '-')
        {
            c = TY_(ReadChar)(doc->docIn);
            if (c != '>')
            {
                TY_(UngetChar)(c, doc->docIn);
                TY_(UngetChar)('-', doc->docIn);
                TY_(UngetChar)('-', doc->docIn);
                continue;
            }
            /* this failed!
               TY_(AddCharToLexer)(lexer, '-'); TY_(AddCharToLexer)(lexer, '-'); lexdump = 0;
               got output <![endif]--]> - needs further fix in pprint section output
            */
        }
        else
        {
            TY_(UngetChar)(c, doc->docIn);
            TY_(UngetChar)('-', doc->docIn);
            continue;
        }
    }
    else
    {
        TY_(UngetChar)(c, doc->docIn);
        continue;
    }
}

I was able to work around the problem by introducing the previously mentioned new TidyParseWord2000 option. When disabled, it causes the lexer to interpret Word 2000 sequences as processing instructions instead. Note that the TidyXmlPIs option still differentiates between ?> and > as the terminator. Here are my changes to lexer.c:

@@ -3195,6 +3195,16 @@
                     }
                 }

+                if (!cfgBool(doc, TidyParseWord2000)) {
+                    lexer->lexsize--;
+                    TY_(AddCharToLexer)(lexer, '<');
+                    TY_(AddCharToLexer)(lexer, '!');
+                    TY_(AddCharToLexer)(lexer, '[');
+                    TY_(AddCharToLexer)(lexer, c);
+                    lexer->state = LEX_CONTENT;
+                    continue;
+                }
+
                 if (c != ']')
                     continue;

Changes to include/tidyenum.h and src/config.c are obvious, so they are excluded here for brevity.
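
Since Tidy accepts any configuration option as an --option value pair on the command line, the new option could then be disabled like any other; the config name parse-word-2000 below is only my guess, mirroring the existing word-2000 option:

    tidy --parse-word-2000 no file.htm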

PS: The problem was presented to me by a customer as part of a larger document. I have no idea how it was generated. However, the document's origin should not matter: such documents do (or eventually will) exist in the wild, so we should be prepared to handle them well.

@geoffmcl
Contributor

@ralfjunker just noted that this topic has more or less been raised again in #487...

As suggested there, one or the other should be closed while we seek a suitable solution for this extra data before the closing ]> of an endif.

And note there may also be extra data in the opening <!xxx[...

Seeking ideas, comments, patches or a PR... thanks...

@geoffmcl
Contributor

As we are about to release 5.4, moving this out to 5.5... and as stated, maybe we should close one of these two issues, which refer to the same problem...

@geoffmcl geoffmcl modified the milestones: 5.5, 5.3 Feb 27, 2017
@geoffmcl
Contributor

Although there have been no further comments in a long time, this appears still open, although it also seems a duplicate of #487, so moving out the milestone again...

@geoffmcl geoffmcl modified the milestones: 5.5, 5.7 Nov 29, 2017
Contributor

geoffmcl commented Oct 6, 2020

@ralfjunker took another look at this... and have another idea...

Now I do not think there is any need for a Feature to be added...

I was struck by the comment in the code, after receiving <! from the stream, suggesting there may be more to do here, and by the if tree that follows - code omitted, some code relined for clarity -

    /*
       look out for comments, doctype or marked sections
       this isn't quite right, but its getting there ...
    */
    if (c == '!') {
        c = TY_(ReadChar)(doc->docIn);
        if (c == '-') {
            /* deal with a `<!-` comment */
            lexer->state = LEX_COMMENT;  /* comment */
            /* ... code omitted ... */
        } else if (c == 'd' || c == 'D') {
            /* todo: check for complete "<!DOCTYPE" not just <!D */
            lexer->state = LEX_DOCTYPE; /* doctype */
            /* ... code omitted ... */
        } else if (c == '[') {
            /* Word 2000 embeds <![if ...]> ... <![endif]> sequences */
            lexer->state = LEX_SECTION;
            /* ... code omitted ... */
        } else {
            /* else swallow characters up to and including next '>' */
            while ((c = TY_(ReadChar)(doc->docIn)) != '>') {
                /* or eof */
            }
            lexer->state = LEX_CONTENT;
        }
    }

Now the issue here is a malformed embedded Word 2000 section (LEX_SECTION), like <![endif] EXTRA >, which gets locked in searching for another ] character, until EOF if need be... as happens with the given example code...

In the process it can pass over multiple > chars, and <, and in fact everything that is NOT a ]! In the example above, that means swallowing the > that should close the section, the whole <p>Content</p> block, and everything else to the end of the file.

So on re-reading the this isn't quite right comment, I decided maybe this is a BUG, and looked for a fix...

Came up with a patch for lexer.c -

diff --git a/src/lexer.c b/src/lexer.c
index ef70e13..61c28eb 100644
--- a/src/lexer.c
+++ b/src/lexer.c
@@ -3340,7 +3340,11 @@ static Node* GetTokenFromStream( TidyDocImpl* doc, GetTokenMode mode )
                     }
                 }
 
-                if (c != ']')
+                if (c == '>')
+                {
+                    /* Is. #462 - reached '>' before ']' */
+                    TY_(UngetChar)(c, doc->docIn);
+                } else if (c != ']')
                     continue;
 
                 /* now look for '>' */

Simply, if we reach a >, let that close this section, store the section, go back to lexer->state = LEX_CONTENT, and continue to complete the document...

With that patch the output changes, and, as can be seen, the trailing content of the example is no longer lost...

<!-- Silence warnings. Doctype and title do not matter to the problem.  -->
<!DOCTYPE html>
<html>
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Windows version 5.7.35.I896">
<title>Word 2000 Problem</title>
<![endif] EXTRA ]>
</head>
<body>
<p>Content</p>
</body>
</html>

Please ignore the tidy version number. I piggybacked this patch onto the issue-896 branch...

So, I am removing the Feature Request label, replacing it with Bug, which I now think it is...

After some more testing, I will try to get around to setting up a PR for this, to get it into next... unless someone beats me to it...

Meantime, look forward to further feedback, comments, patches, PR, testing, regressions, etc, etc... thanks...

@geoffmcl geoffmcl removed this from the 5.7 milestone Oct 6, 2020
Contributor

geoffmcl commented Oct 7, 2020

@ralfjunker see the fuller, more complete patch in #487, which solves a 2nd problem as well...

Look forward to a PR to put all this in next... thanks...

@ralfjunker
Author

Thanks, @geoffmcl, for catching up on this issue again!

As of today, my original issue report dates back over 4 years. In these 4 years, just two people, @geoffmcl and I, have commented on it.

The issue is about code initially intended to clean up non-standard MS Word 2000 sections. Word 2000 is now over 21 years old and was superseded by Word 2002 in 2001. Without knowing when the issue entered the Tidy code, it might have gone unnoticed for up to 17 years.

These are very long time spans for a specification which has officially been a "living standard" since 2019.

Given these facts, I dare to assume that interest in MS Word 2000 sections in Tidy has dropped to practically zero by now. As MS Word 2000 sections were never part of the HTML standard, we can only guess how to handle them correctly.

I am sure you are aware that your proposed patch in #462 (comment) adds an extra ]. This is certainly better than skipping document content, but is it also "correct"? Or would the malformed section better be parsed as text, without the extra ]? Lacking a standard, we cannot know and must simply guess.

To cut a long story short: MS Word 2000 sections in Tidy are (a) non-standardized, (b) buggy, (c) cluttering the code, (d) complicating further development, and (e) probably not needed any more.

I suggest we (1) implement a simple fix, (2) declare MS Word 2000 section support deprecated, and (3) remove the related code sooner rather than later.

HTML as a living standard is developing too fast to justify keeping oneself busy with outdated MS Word 2000 sections. Time is better spent implementing new and more useful features.

Contributor

geoffmcl commented Oct 8, 2020

@ralfjunker thank you for the feedback, most of which I fully agree with... except perhaps the conclusions...

I treat your snippet just as another block of HTML code, and take the view that tidy simply cannot be allowed to consume the balance of the document searching for a ]... that is a parsing bug that has to be addressed...

And like other malformed HTML code, it tries to fix it as best it can... maybe not always 100% correctly... i.e. an additional ], in this case... and maybe there should be a warning issued... but that can be left for now, as the patches are an incremental improvement...

So this is not only about MS Word 2000 section support, but about a parser that should never get locked in a loop...

And the patch addition, to output a warning message, is again part of being a robust HTML parser... warn on confusion... so the user can locate, and fix...

If you are suggesting the word-2000 option be deprecated, let alone removing the code, that seems a moot point... maybe I misunderstand you here...

I assume there are still MS Word users out there... have seen (unverified) numbers like 1.2 billion users worldwide - also see the recent #885, #896, #898, and others - so I think it should stay... it's been there nigh on 20 years... doing its job... not much maintenance...

Time is better spent implementing new and more useful features.

Could not agree more... curious what new and more useful features you are thinking of...

I must get around to adding these small patches, and will probably piggyback them onto #898, just for convenience... seems too small to create a new branch, etc. ... and it is related to the word-2000 option... time will tell...

Feels good to get some issues closed... thanks...

geoffmcl added a commit that referenced this issue Oct 9, 2020
The warning message could perhaps be better worded, and maybe there
should be another msg when a '>' is encountered while looking for a ']'
in a MS Word section, and perhaps the section should be discarded...

And perhaps it should be an error, to force the user to fix...

But the fix is good as it is, and these issues can be dealt with
later...

And this fix is piggy backed on this PR, but it is likewise related to
'word-2000' option...
geoffmcl added a commit that referenced this issue Nov 21, 2020
* Is. #896 - make 'bear' docs match code

* Is. #487 #462 add warn msg and do not get stuck until eof

@geoffmcl
Contributor

@ralfjunker fix now merged, so closing this... thanks...
