Definition lists: HTML comment edge case #7778

Open
xrat opened this issue Dec 27, 2021 · 9 comments
@xrat

xrat commented Dec 27, 2021

In the following minimal example, the HTML comment <!-- (…) --> is not recognized as a comment:

Term
: Def
<!--
: comment def
-->

Pandoc v2.16.2 with pandoc --from markdown --to html5 produces

<dl>
<dt>Term</dt>
<dd>Def &lt;!–
</dd>
<dd>comment def –&gt;
</dd>
</dl>

Running pandoc with --strip-comments produces the same output as above. A workaround is to put any character in front of the : inside the comment. In other words, a : (or a ~) at the start of a line inside the comment is what triggers the bug.
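
To locate the lines that trigger this in a larger document, a small check along the following lines may help. It is only a sketch: `find_risky_comment_lines` is a hypothetical helper, not part of pandoc, and it assumes that a marker is a `:` or `~` after at most two spaces and that comments are delimited by a plain `<!--` and `-->`.

```python
import re

# Assumption: a definition marker is ':' or '~' after at most two spaces,
# followed by whitespace. This simplifies pandoc's real rule.
MARKER = re.compile(r"^ {0,2}[:~]\s")

def find_risky_comment_lines(text):
    """Return 1-based numbers of lines inside <!-- --> that start with a marker."""
    risky = []
    in_comment = False
    for number, line in enumerate(text.splitlines(), start=1):
        # A marker line counts only if the comment was opened on an earlier line.
        if in_comment and MARKER.match(line):
            risky.append(number)
        if not in_comment and "<!--" in line:
            # Look for "-->" only after the opener on the same line.
            in_comment = "-->" not in line.split("<!--", 1)[1]
        elif in_comment and "-->" in line:
            in_comment = False
    return risky

sample = "Term\n: Def\n<!--\n: comment def\n-->\n"
print(find_risky_comment_lines(sample))
```

For the minimal example above, the only flagged line is the `: comment def` line inside the comment.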

@xrat xrat added the bug label Dec 27, 2021
@mb21
Collaborator

mb21 commented Dec 28, 2021

for what it's worth, the following works:

Term
: Def
<!-- : comment def
-->

Not sure this is a bug... just an edge case in pandoc's Markdown syntax...

@mb21 mb21 removed the bug label Dec 28, 2021
@xrat
Author

xrat commented Dec 28, 2021

@mb21 your suggested workaround is not feasible for larger comments because any line starting with : triggers the bug. IMHO this warrants the bug label.

@tarleb
Collaborator

tarleb commented Dec 28, 2021

FWIW, multimarkdown and kramdown give the same result as pandoc.

Pandoc's CommonMark parser, with the definition_lists extension enabled, behaves more like you expect; try with --from=commonmark_x.

@jgm
Owner

jgm commented Dec 28, 2021

This happens because the way the def list parser works is by gobbling up raw lines comprising the definition, and then parsing after the fact. This method isn't sophisticated enough to skip a multiline HTML comment. Here's another case worth considering.

Term
: Def
test <!--
: comment def
and -->

Note that this case is parsed by the commonmark+definition_lists parser as

<dl>
<dt>Term</dt>
<dd>Def test &lt;!–
</dd>
<dd>comment def and –&gt;
</dd>
</dl>

which is correct given the commonmark principle that block-level structure takes precedence over inline-level structure.
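
The line-gobbling approach described above can be illustrated with a rough sketch. This is not pandoc's actual implementation (which is written in Haskell); `gobble_definitions` and the `MARKER` pattern are hypothetical and assume only that a line starting with `:` or `~` opens a new definition, with no knowledge of inline constructs such as comments.

```python
import re

# Assumption: a line opens a new definition when it starts with ':' or '~'
# (after at most two spaces) followed by whitespace. Pandoc's real rule is richer.
MARKER = re.compile(r"^ {0,2}[:~]\s+")

def gobble_definitions(lines):
    """Split the lines after a term into raw definition chunks, line by line."""
    chunks = []
    for line in lines:
        if MARKER.match(line):
            # A marker always starts a new chunk, even inside an open comment.
            chunks.append([MARKER.sub("", line, count=1)])
        elif chunks:
            # Any other line continues the current definition.
            chunks[-1].append(line.strip())
    return [" ".join(chunk) for chunk in chunks]

body = [": Def", "test <!--", ": comment def", "and -->"]
print(gobble_definitions(body))
# ['Def test <!--', 'comment def and -->']
```

Because the split happens before any inline parsing, each chunk holds only half of the comment, so neither `<!--` nor `-->` can be matched with its partner afterwards.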

@gpoore

gpoore commented Dec 29, 2021

This doesn't just happen with HTML comments. It can also occur with other inline elements like inline code. The output makes sense based on the parsing algorithm, but it is somewhat surprising from a user perspective to discover that the validity of inline code depends on line break locations.

Here's an example with inline code:

Term
: Def
`code
: comment def
more code`

Pandoc Markdown produces this:

<dl>
<dt>Term</dt>
<dd>Def `code
</dd>
<dd>comment def more code`
</dd>
</dl>

And CommonMark (-f commonmark+definition_lists) gives the same thing. Simply removing the line break before : comment results in valid inline code.

@jgm
Owner

jgm commented Dec 29, 2021

If you indent your definitions properly, you're less likely to run into problems like this:

Term
:   Def
    `code
:   comment def
    more code`

versus

Term
:   Def
    `code
    :   comment def
    more code`
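
The difference between the two variants comes down to whether the `:` sits at the left margin or at the continuation indent. A minimal sketch, assuming (as a simplification, not pandoc's actual code) that only a `:` or `~` within the first two columns counts as a marker; `classify` is a hypothetical helper:

```python
import re

# Assumption: only ':' or '~' after at most two spaces (and followed by
# whitespace) counts as a marker; deeper indentation means continuation.
MARKER = re.compile(r"^ {0,2}[:~]\s")

def classify(lines):
    """Label each line after the term as a new definition or a continuation."""
    return ["definition" if MARKER.match(line) else "continuation" for line in lines]

flush = [":   Def", "    `code", ":   comment def", "    more code`"]
indented = [":   Def", "    `code", "    :   comment def", "    more code`"]

print(classify(flush))
print(classify(indented))
```

Under that assumption the first variant contains two definitions, splitting the code span, while the second contains one definition whose continuation lines keep the code span intact.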

@mb21 mb21 closed this as completed Dec 30, 2021
@xrat
Author

xrat commented Dec 30, 2021

I am very thankful for Pandoc and its contributors, so please excuse me for asking for clarification: why is it acceptable in this case that an HTML comment of the form <!-- (...) --> is not parsed as a comment by Pandoc's Markdown? I do not know of any other such case.

@jgm jgm reopened this Dec 30, 2021
@jgm
Owner

jgm commented Dec 30, 2021

I don't think the issue should have been closed.

@fumiyas

fumiyas commented Feb 1, 2022

Same here with a bullet list:

* foo
<!--
* bar
-->
* baz
   * baz
<!--
* qux
-->
* end

$ pandoc --from markdown --to html5 list-with-comment.md
<ul>
<li>foo <!--
* bar
--></li>
<li>baz
<ul>
<li>baz &lt;!–</li>
</ul></li>
<li>qux –&gt;</li>
<li>end</li>
</ul>
