"HTML blocks" section vague, doesn't match parsers #177
If I recall, my thinking was that someone might start a block with an open tag with lots of attributes. If this gets hard-wrapped, the first line will be an incomplete tag. Since I was going for a parsing strategy that could be handled line-by-line (without lookahead), that motivated allowing HTML blocks to start with incomplete tags. |
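As a rough illustration of the line-by-line strategy described above, here is a minimal sketch of a start check that looks only at the current line and accepts a tag that has been cut off by hard wrapping. The function name, the regular expression, and the abbreviated tag list are all assumptions made for this example; none of them is taken from the spec or from any parser.

```python
import re

# Hypothetical, abbreviated set of block-level tag names (not the spec's list).
BLOCK_TAGS = {"div", "p", "table", "pre", "blockquote"}

# Up to three spaces of indentation, '<' or '</', a tag name, and then
# whitespace, '>', '/>' or the end of the line. Allowing whitespace or
# end of line is what lets an incomplete, hard-wrapped tag qualify.
_START = re.compile(r"^ {0,3}</?([A-Za-z][A-Za-z0-9]*)(?:\s|/?>|$)")


def may_start_html_block(line: str) -> bool:
    """Decide from this line alone whether it could open an HTML block."""
    m = _START.match(line)
    return bool(m) and m.group(1).lower() in BLOCK_TAGS


print(may_start_html_block('<div class="note"'))  # tag continues on the next line
print(may_start_html_block("text <div>"))         # tag is not at the start of the line
```

Because the decision never depends on a later line, the parser does not need lookahead. The cost is that a line which never turns into a well-formed tag is still treated as the start of raw HTML.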
Hmm... this isn't very consistent, because in order to parse this HTML block:
You need lookahead, since incomplete comments aren't allowed. Same with CDATAs, etc, etc. |
@jmendeth, for consistency I think we need to say that This will allow some things that aren't valid HTML to be passed through as raw HTML. But I think that's okay -- Markdown has always let users pass through invalid HTML. |
I'm fine with this. The section needs to be rewritten then, I suppose? And the same with CDATA and the other types? |
Yes, it needs to be rewritten -- and also CDATA, etc. |
I would also like a stricter definition of what counts as an incomplete tag. At the moment,
doesn't count as an HTML block, but
does (notice the trailing space). That's not very intuitive. (Tested with commonmark.js.) |
There are more serious problems. According to the spec, this:
is not an HTML block, but both CMark and CommonMark will accept it as such. |
On reflection, I think it would make sense to make an exception to the usual rule for HTML blocks (they end at a blank line) for the following elements: |
While this exception would of course be useful, I'm not sure it's okay to complicate the HTML block syntax even more. I'd leave it as it is for now, get the spec and the parsers to match, and if the community agrees we can add it later (it's backwards compatible, so no problem). |
The suggestion was in the service of fixing comments. I don't think we want this to count as a link reference definition inside a comment:
Nothing inside a comment should be parsed as Markdown. My original suggestion, treating |
Okay, I see what you mean now. Let me make a suggestion that will hopefully keep the syntax simple while still providing these features:
Raw text elements (
Therefore we should not make an exception for |
+++ Xavier Mendez [Feb 02 15 03:11 ]:
The problem with this, as explained above, is that it requires potentially indefinite backtracking in the parsers. We want to avoid that. I think that allowing partially formed tags is a reasonable price for avoiding backtracking.
I suppose we could make an exception for pre. (It seems a bit odd to me to include the caption in the pre element, though.) |
I misunderstood it then, sorry. When you said earlier: "I think to deal with this properly we must keep the HTML block open til we hit an appropriate closer", what should we do if we don't hit an appropriate closer?
Semantically it makes sense. |
+++ Xavier Mendez [Feb 02 15 08:04 ]:
No. If I forgot to mention it before, it would work like unclosed fenced code blocks currently work: the entire remainder of the document would be considered part of the HTML block.
Okay. |
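To make the "runs to the end of the document" behaviour concrete, here is a minimal sketch for the comment case, assuming the block opens on a line containing `<!--` and closes on the first line containing `-->`. The function name and the list-of-lines representation are hypothetical and chosen only for this illustration; this is not how any existing parser is implemented.

```python
def html_comment_block(lines: list[str], start: int) -> list[str]:
    """Return the lines of the comment block that opens at lines[start].

    The block stays open until a line contains the closer '-->'. If no
    closer is ever found, the rest of the document belongs to the block,
    in the same way an unclosed fenced code block behaves.
    """
    # On the opening line, only look for the closer after the opener.
    offset = lines[start].index("<!--") + len("<!--")
    for i in range(start, len(lines)):
        text = lines[i][offset:] if i == start else lines[i]
        if "-->" in text:
            return lines[start : i + 1]
    return lines[start:]


closed = ["<!-- note", "", "[foo]: /url", "-->", "paragraph"]
unclosed = ["<!-- never closed", "", "paragraph"]
print(len(html_comment_block(closed, 0)))
print(len(html_comment_block(unclosed, 0)))
```

The scan only moves forward, so no backtracking is needed: once the block is open, each line is either the closer or raw content, including blank lines and anything that would otherwise look like a link reference definition.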
I've been fiddling around with the idea, and it looks awkward when you write it. But I'm curious, how would you define HTML blocks in the spec then? |
+++ Xavier Mendez [Feb 02 15 09:45 ]:
It's not easy to see how to do it, exactly, since the matching closing The more I think about raw HTML, the more I think we should just require
context, with these tags on lines by themselves. This would give |
Just define HTML blocks in general. For instance, how is a comment parsed? |
+++ Xavier Mendez [Feb 02 15 10:03 ]:
Same problem here as with script, style. Defining the start of the Note that Gruber's original Markdown syntax guide says that in HTML |
Ah, you read my mind. Browsers do backtrack, so they would never parse this:
as a single comment. But let's go on, how would you define a generic tag HTML block? |
+++ Xavier Mendez [Feb 02 15 10:17 ]:
I think I'm okay with that - garbage in, garbage out.
Same idea. Block starts with a line beginning with a (possibly partial) Block ends with:
So, yes, this would lead to things being parsed as raw HTML blocks that |
Imagine this:
According to that rule, the above is an HTML block of four lines. |
+++ Xavier Mendez [Feb 02 15 10:35 ]:
Well, I'm trying to keep things simple. I don't think it's a priority to allow authors to do things like this (blank lines in attributes). Keeping the rule simple and making the blocks easier to parse seems to me something that needs to be weighed against the importance of fringe cases like this. |
Let's do it, then. I personally would like simple rules but if you want to avoid backtracking at all costs... |
Hmm, I like that proposal! As long as it's well defined, this issue can be closed when / if it's implemented in the spec. |
Great! 🎉 I'll try to implement the new section ASAP. |
The HTML blocks section provides the definition of an HTML block tag.
Later on, the following example is presented:
The example header says: "An incomplete HTML block tag may also start an HTML block"
The problem with this is: what's an incomplete HTML block tag? There's no definition that allows me to parse the syntax for such a tag. Does a single < count as an incomplete tag? According to the [current] rules of HTML block parsing, the above would render as <p><div class\nfoo</p>.
But even if we do provide a definition, what's the point in allowing incomplete tags? Could that be useful sometimes? And, more importantly, if an incomplete block tag can start a block, why can't an incomplete comment, or an incomplete CDATA section, or an incomplete processing instruction start a block?