UNSAFE mode inserts <p> tags inside <pre> tag #102

Closed
ashmaroli opened this issue Aug 15, 2019 · 4 comments

@ashmaroli

Sample script

require 'commonmarker'

TEXT = <<~HTML
  <div class="highlight"><pre class="language-text" data-lang="text">    [1,] [2,]


  [1,] ALPHA BETA
  [2,] BETA ALPHA
  </pre></div>
HTML

puts
puts CommonMarker.render_html(TEXT, :UNSAFE)

Result

<div class="highlight"><pre class="language-text" data-lang="text">    [1,] [2,]
<p>[1,] ALPHA BETA
[2,] BETA ALPHA
</pre></div></p>

@kivikakk
Collaborator

This behaviour is per-spec, and isn't related to the safe/unsafe mode — see the CommonMark dingus (be sure to choose HTML on the right; you may have to refresh the page to load it properly).

Read CommonMark §4.6. In short, your <div … triggers an HTML block under start condition 6 (<div> is one of the block-level tag names listed there), and that kind of block ends at the next blank line. The line [1,] ALPHA BETA is therefore treated as the start of a new paragraph, and a <p> is inserted accordingly.

To hit start condition 1, you'll need to separate the <div … and <pre … with a blank line, like this:

<div class="highlight">

<pre class="language-text" data-lang="text">    [1,] [2,]

[1,] ALPHA BETA
[2,] BETA ALPHA
</pre></div>

This will cause a HTML block to span the entire <pre…</pre> section.
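
If the HTML is generated or minified and cannot easily be edited by hand, the same separation could be applied as a preprocessing step. The following is only a sketch: `separate_wrapper_from_pre` is a hypothetical helper name, not part of commonmarker, and it assumes the wrapper is a <div …> immediately followed by <pre on the same line.

```ruby
# Hypothetical helper (not part of commonmarker): puts a blank line between a
# <div ...> opening tag and a <pre that follows it directly, so that the <pre
# begins its own line and can start its own HTML block.
def separate_wrapper_from_pre(text)
  text.gsub(/(<div\b[^>]*>)(<pre\b)/i) { "#{$1}\n\n#{$2}" }
end

minified = %(<div class="highlight"><pre class="language-text">    [1,] [2,]\n\n[1,] ALPHA BETA\n</pre></div>\n)
puts separate_wrapper_from_pre(minified)
```

The result could then be passed to CommonMarker.render_html(…, :UNSAFE) as in the sample script. The regular expression is deliberately naive and would also rewrite a matching sequence inside other content, so treat it as a starting point rather than a general solution.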

@ashmaroli
Author

Thank you for your detailed response @kivikakk with the links to the emulator and spec doc.
So, CommonMark is not (minified-HTML + Markdown) friendly. Sad.

However, spec or no spec, don't you think it's weird that:

  • an opening <p> is inserted within the <pre> block
  • and the closing </p> is outside it (after the closing </pre>)

I shall try to take this up with the author(s) of CommonMark.

@kivikakk
Collaborator

No problems!

It's weird, but one big rule for CommonMark is that it does not contain a full HTML parser, as that would significantly increase the complexity of implementing the CommonMark spec. In general, you can't paste in arbitrary HTML and hope that it'll be preserved as-is.

Hence why the <p> is added after: the parser starts a HTML block when it sees the <div> and completely ignores everything it sees (including the <pre>) until it hits a blank line, simply inserting it verbatim into the output. The next line the parser actually considers is regular text ([1,] ALPHA BETA) and so begins a paragraph, generating the <p> at that point.
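
To make the line-by-line behaviour concrete, here is a toy model in Ruby. It is not how cmark or commonmarker is implemented and it is nowhere near a CommonMark parser: `toy_blocks` is a hypothetical name, it only recognises a line starting with <div (block runs until a blank line) or <pre (block runs until a line containing </pre>), and it treats everything else as paragraph text.

```ruby
# Toy model only: it knows the two rules discussed in this thread and nothing else.
BLANK_LINE = /\A\s*\z/

def toy_blocks(text)
  blocks = []   # each entry is [kind, lines]
  mode = nil    # nil means "between blocks"

  text.each_line(chomp: true) do |line|
    case mode
    when :until_blank           # inside a <div ...> HTML block or a paragraph
      if line.match?(BLANK_LINE)
        mode = nil
      else
        blocks.last[1] << line  # no inspection of the content at all
      end
    when :until_pre_close       # inside a <pre ...> HTML block
      blocks.last[1] << line    # blank lines are kept here
      mode = nil if line.include?("</pre>")
    else
      next if line.match?(BLANK_LINE)
      if line.match?(/\A<pre\b/i)
        blocks << [:html, [line]]
        mode = :until_pre_close unless line.include?("</pre>")
      elsif line.match?(/\A<\/?div\b/i)
        blocks << [:html, [line]]
        mode = :until_blank
      else
        blocks << [:paragraph, [line]]
        mode = :until_blank
      end
    end
  end
  blocks
end

joined = %(<div class="highlight"><pre class="language-text">    [1,] [2,]\n\n\n[1,] ALPHA BETA\n[2,] BETA ALPHA\n</pre></div>\n)
toy_blocks(joined).each { |kind, lines| puts "#{kind}: #{lines.inspect}" }
```

For the input from the original report, the model yields one HTML block (the <div …><pre … line) followed by one paragraph that swallows [1,] ALPHA BETA, [2,] BETA ALPHA and the closing </pre></div> line. That mirrors where the <p> and </p> land in the rendered output.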

It's weird, but it's probably not going to change. The set of start conditions in §4.6 is a way to try to make it easier to embed some HTML without going all the way.

(fwiw, while I'm not an author of the CommonMark spec itself, I have given input in the past and maintain a few popular parsers for it.)

@ashmaroli
Author

> simply inserting it verbatim into the output

This is probably the very reason why the </p> is inserted after the closing </pre>.
Since this behavior is reproducible via both commonmark.js and cmark, I can see that it is not going to be "fixed". But if you ever plan on "fixing" the weirdness, feel free to use the snippet I posted here.

Thanks once again, Ashe. 😃
