Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

Allow HTML4 style comments #389

Closed
jberger opened this Issue · 13 comments

6 participants

@jberger
Collaborator

Thanks to a blog comment from Tom Molesworth. Apparently the HTML4 spec allowed for comments of the form

<!--Comment Text--\s*> 

though HTML5 has removed this (mis)feature. His example is available here.

@kraih
Owner

-1 from me. While this is one of the less awkward HTML4 features and the patch in #390 looks pretty good, i'm generally against supporting anything the HTML5 authors considered too esoteric to be included in the spec.

@marcusramberg
Collaborator

+1 for me, it's pragmatic to support what browsers can render.

@kraih
Owner

@marcusramberg Disagree, HTML4 inherited many very esoteric SGML features that would be a total nightmare to support. http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.3.7

@amenonsen
Collaborator
@jberger
Collaborator

Truthfully I was back and forth on even filing this, but it was so easy to write the patch (and Tom had done all the legwork on the documentation) that I did anyway. I'm totally neutral on inclusion here.

@tempire
Collaborator

I am completely apathetic about this. Given that the submitter is neutral, I'm defaulting to -1.

@jberger
Collaborator

Let me say, the biggest reason I went ahead with this report/patch is that Tom's example shows a really bad result of a real-world bit of markup. Since the fix was easy I thought that it went with the spirit of

Mojo::DOM is a minimalistic and relaxed HTML/XML DOM parser with CSS selector support. It will even try to interpret broken XML, so you should not use it for validation.

taken from the Mojo::DOM docs.

@jberger
Collaborator

Further, as seen in this gist when HTML4 comments are encountered, they are seen as a tag, each word of the text is parsed as individual attributes to that tag, and any subsequent HTML is seen being a child of that tag. Therefore if Mojo::DOM does happen to encounter such a tag in the real world, the parse is rather broken; given such an easy fix, I would lean to say that its inclusion is worth another thought. Note that this use need not be encouraged nor even mentioned.

@dougwilson

Thanks, @jberger I was just about to comment on the incorrect handling of that comment according to HTML5 parsing rules. According to the HTML5 parsing rules, that example document should actually have a comment with the content

 This is a valid comment -- >
  <p>Here's a paragraph</p>
 </body>
</html>

So i.m.o. if the argument against parsing the HTML4-style comment is that the parser is to be HTML5, then the bug would be that the comment is still not parsed correctly. The bug here would be that the comment regular expression requires a comment end with -->, though the comment can also be terminated by EOF.

@tempire
Collaborator

Joel's response seems reasonable. I change my vote to +1 accordingly, though I realize I get no credit for this because the pull request has already been merged.

@marcusramberg
Collaborator
@kraih
Owner

Thanks all, the patch has now been merged. Supporting all HTML4 quirks will never be a goal for Mojo::DOM, but being a little more relaxed is a good thing and perfectly in line with its philosophy.

@kraih kraih closed this
@jberger
Collaborator

@tempire I gladly acknowledge your noble support :-)

Thanks for the pull! I'm glad to help when I can!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.