Skip to content
This repository

Allow HTML4 style comments #389

Closed
jberger opened this Issue October 04, 2012 · 13 comments

6 participants

Joel Berger Sebastian Riedel Marcus Ramberg Abhijit Menon-Sen Glen Hinkle Douglas Christopher Wilson
Joel Berger
Collaborator

Thanks to a blog comment from Tom Molesworth. Apparently the HTML4 spec allowed for comments of the form

<!--Comment Text--\s*> 

though HTML5 has removed this (mis)feature. His example is available here.

Sebastian Riedel
Owner

-1 from me. While this is one of the less awkward HTML4 features and the patch in #390 looks pretty good, i'm generally against supporting anything the HTML5 authors considered too esoteric to be included in the spec.

Marcus Ramberg
Collaborator

+1 for me, it's pragmatic to support what browsers can render.

Sebastian Riedel
Owner

@marcusramberg Disagree, HTML4 inherited many very esoteric SGML features that would be a total nightmare to support. http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.3.7

Abhijit Menon-Sen
Collaborator
Joel Berger
Collaborator

Truthfully I was back and forth on even filing this, but it was so easy to write the patch (and Tom had done all the legwork on the documentation) that I did anyway. I'm totally neutral on inclusion here.

Glen Hinkle
Collaborator

I am completely apathetic about this. Given that the submitter is neutral, I'm defaulting to -1.

Joel Berger
Collaborator

Let me say, the biggest reason I went ahead with this report/patch is that Tom's example shows a really bad result of a real-world bit of markup. Since the fix was easy I thought that it went with the spirit of

Mojo::DOM is a minimalistic and relaxed HTML/XML DOM parser with CSS selector support. It will even try to interpret broken XML, so you should not use it for validation.

taken from the Mojo::DOM docs.

Joel Berger
Collaborator

Further, as seen in this gist when HTML4 comments are encountered, they are seen as a tag, each word of the text is parsed as individual attributes to that tag, and any subsequent HTML is seen being a child of that tag. Therefore if Mojo::DOM does happen to encounter such a tag in the real world, the parse is rather broken; given such an easy fix, I would lean to say that its inclusion is worth another thought. Note that this use need not be encouraged nor even mentioned.

Douglas Christopher Wilson

Thanks, @jberger I was just about to comment on the incorrect handling of that comment according to HTML5 parsing rules. According to the HTML5 parsing rules, that example document should actually have a comment with the content

 This is a valid comment -- >
  <p>Here's a paragraph</p>
 </body>
</html>

So i.m.o. if the argument against parsing the HTML4-style comment is that the parser is to be HTML5, then the bug would be that the comment is still not parsed correctly. The bug here would be that the comment regular expression requires a comment end with -->, though the comment can also be terminated by EOF.

Glen Hinkle
Collaborator

Joel's response seems reasonable. I change my vote to +1 accordingly, though I realize I get no credit for this because the pull request has already been merged.

Marcus Ramberg
Collaborator
Sebastian Riedel
Owner

Thanks all, the patch has now been merged. Supporting all HTML4 quirks will never be a goal for Mojo::DOM, but being a little more relaxed is a good thing and perfectly in line with its philosophy.

Sebastian Riedel kraih closed this October 05, 2012
Joel Berger
Collaborator

@tempire I gladly acknowledge your noble support :-)

Thanks for the pull! I'm glad to help when I can!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.