Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken html detection #577

Closed
fabiospampinato opened this issue Jul 26, 2019 · 7 comments
Closed

Broken html detection #577

fabiospampinato opened this issue Jul 26, 2019 · 7 comments

Comments

@fabiospampinato
Copy link

fabiospampinato commented Jul 26, 2019

The CommonMark spec says that newlines are allowed inside script, pre and style tags, because their ending condition is only the presence of the their closed tag.

This is broken in markdown-it as the following:

<script>
foo

bar
</script>

<style>
foo

bar
</style>

<pre>
foo

bar
</pre>

Is being rendered (permalink) as such:

<p>&lt;script&gt;
foo</p>
<p>bar
&lt;/script&gt;</p>
<p>&lt;style&gt;
foo</p>
<p>bar
&lt;/style&gt;</p>
<p>&lt;pre&gt;
foo</p>
<p>bar
&lt;/pre&gt;</p>

While CommonMark renders (permalink) it as:

<script>
foo

bar
</script>
<style>
foo

bar
</style>
<pre>
foo

bar
</pre>
@puzrin
Copy link
Member

puzrin commented Jul 26, 2019

Mmmmm... can this be caused by disabled HTML in markdown-it permalink :) ?

This seems to work (see checkboxes on top).

HTML is disabled by default for security reasons.

@fabiospampinato
Copy link
Author

fabiospampinato commented Jul 26, 2019

That fixes it for the snippet I posted, but I actually have that option enabled already in my code and I'm still seeing this issue under some circumstances. For instance the following:

<div>
  <script>
    Foo
    Bar
  </script>
</div>

Is being rendered (permalink) as:

<div>
  <script>
    Foo
    Bar
  </script>
</div>

While this:

<div>
  <script>
    Foo
 
    Bar
  </script>
</div>

Is being rendered (permalink) as:

<div>
  <script>
    Foo
<pre><code>Bar
</code></pre>
  </script>
</div>

I get the same output from CommonMark's own implementation too, so I guess both implementations are buggy in this particular regard? Unless I'm misunderstanding something the spec seems to say pretty clearly that those snippets should be recognized as HTML.

@puzrin
Copy link
Member

puzrin commented Jul 26, 2019

I get the same output from CommonMark's own implementation too, so I guess both implementations are buggy in this particular regard?

In this case, more high probability is that you misunderstood spec about block HTML tags (it's really a bit cryptic).

As far as i remember, each HTML block can not contain empty lines inside (empty line terminates it).

<div>
  <script>
    Foo
 
    Bar
  </script>
</div>

This is considered as

  • 3 lines of html (line 4 empty => stop)
  • Bar as markdown (4 spaces => code block)
  • 2 lines of html (previous block terminated because indent too small)

See spec, it should have all this details and more examples.

@fabiospampinato
Copy link
Author

It seems that you're correct, the issue here is that since the html starts with <div> it will end as soon as an empty line is found. By removing the wrapper <div> then the script tag will be parsed correctly.

I'm closing this as it seems to actually be spec-compliant.

But I'm going to open an issue in CommonMark's own repo as this make absolutely no sense to me.

@fabiospampinato
Copy link
Author

Upstream issue: commonmark/commonmark-spec#597

@puzrin
Copy link
Member

puzrin commented Jul 26, 2019

I'd suggest to search forum first. Don't remember link, but current HTML rules were discussed very long time and very deep, with several iterations. It's current state is compromiss of hundreds edge cases.

In short - it's not possible to invent anything new, not known by spec authors :).

Probably, you may wish to use https://github.com/npm/marky-markdown instead. That's retuned markdown-it, to be more close to github (used to render readme-s on npm registry).

@fabiospampinato
Copy link
Author

In short - it's not possible to invent anything new, not known by spec authors :).

I don't agree this sentiment, if the spec were perfect there wouldn't have been multiple revisions to it in the first place.

I don't think the authors agree with that either, as for this specific issue one said:

We could talk about specific ways to change it (but please, nothing that requires unlimited backtracking).

Probably, you may wish to use npm/marky-markdown instead. That's retuned markdown-it, to be more close to github (used to render readme-s on npm registry).

That's interesting, I'll take a look at that, thanks!

I'd prefer to use something as portable as possible though, i.e. something that has a spec behind it. The minor issues I've found so far seem to arise because perhaps CommonMark was written focussing too much on machines rather than humans.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants