New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly handle info-string in fenced code blocks #503

Closed
ad-si opened this Issue Mar 29, 2018 · 14 comments

Comments

Projects
None yet
2 participants
@ad-si

ad-si commented Mar 29, 2018

According to the commonmark spec
an info-string may contain any character except backticks (tildes).

This means following syntax (used by pandoc) should be a valid fenced code block:

```{.literate .haskell}
main = putStr "Hello World!"
```

Kramdown, however, ignores it and handles it like normal plain text.

@gettalong gettalong self-assigned this Mar 29, 2018

@gettalong

This comment has been minimized.

Owner

gettalong commented Mar 29, 2018

Please see the kramdown specification for how kramdown handles this (essentially you can append language?optionspart to the starting line of a fenced codeblock to specify the language; whats usable depends on the used syntax highlighting engine).

If you want to specify arbitrary key-value pairs for a codeblock, use the standard block IAL syntax.

@gettalong gettalong closed this Mar 29, 2018

@ad-si

This comment has been minimized.

ad-si commented Mar 29, 2018

Yeah I read the specification. I didn't see anything about why

```{.literate .haskell}
main = putStr "Hello World!"
```

shouldn't be converted to proper <pre><code></code></pre> HTML.
Looks like a bug to me.

Correctly specifying the language and arbitrary key-value pairs would only be the next step ...

@gettalong

This comment has been minimized.

Owner

gettalong commented Mar 30, 2018

The spec says: "Fenced code blocks provide an easier way to specify the language, namely by appending the language of the code block to the end of the starting line".

If you append something that doesn't look like a language tag, the starting line is not identified as a fenced code block, so it will be interpreted as a paragraph. You might also wanna either change to the standard ~~~ syntax or use the GFM parser if you want to use ```.

So, basically, if you wanna use kramdown, use its syntax, not the syntax of some other Markdown converter since all flavours handle things differently.

What you probably want to use with a literal haskell codeblock and when using rouge:

~~~ literate_haskell
main = putStr "Hello World!"
~~~
{:.some_css_class_i_wanna_use}
@ad-si

This comment has been minimized.

ad-si commented Mar 30, 2018

This looks like a language tag to me: {.literate .haskell}
Sorry, but this spec seems really unintuitive.
I don't think that anyone wants to write ~~~{.literate .haskell} as a normal paragraph 🤨.
You could probably change the implementation to behave like commonmark and it wouldn't break anyone's code. (Btw commonmark supports ~~~ and ```, so that's not the problem)

I don't care about the markdown converter, but I want to use GitHub pages and pandoc
and if Kramdown would handle this more intuitively (like commonmark) everything would be fine.

I need exactly ~~~{.literate .haskell} so that pandoc can parse it.
(I also opened an issue for pandoc to also support ~~~lhs, which would fix the problem as well: jgm/pandoc#4510)

@gettalong

This comment has been minimized.

Owner

gettalong commented Mar 30, 2018

I don't think that {.literate .haskell} is easier to understand than literate_haskell:

  • What's up with the braces?
  • Why the dots? Are they part of the language name?
  • Why do you define two languages to parse, literate and haskell?

It looks more like an IAL with class definitions (but missing the colon).

Also:

  • If you want to assign CSS classes to any kramdown block element, just use the standard block IAL syntax. This works for fenced code blocks.

  • You can even use the IAL form to assign the syntax highlighting language, see the documentation.

  • If you want to parse something with pandoc, then just write your Markdown for the Pandoc Markdown variant and you won't have any compatibility problems. As far as I know you can use Pandoc with Jekyll. If you need to use multiple different Markdown variants, use the lowest common denominator syntax (i.e. mostly only the original Markdown syntax), everything else will just not work in all cases because there are too many differences...

@ad-si

This comment has been minimized.

ad-si commented Mar 31, 2018

My point is:
Why is ~~~!{${whatever-here-stands{${){$]{ not handled the same way as ~~~ruby?

This is inconsistent and there is no different use case for the first one!
What I want to use this tag for is completely irrelevant...

@gettalong

This comment has been minimized.

Owner

gettalong commented Mar 31, 2018

Hmm... using kramdown --syntax-highlighter nil this

~~~!{${whatever-here-stands{${){$]{
etst
~~~

results in

<pre><code class="language-!{${whatever-here-stands{${){$]{">etst
</code></pre>

and this

~~~ruby
etst
~~~

results in

<pre><code class="language-ruby">etst
</code></pre>

So both are handled in exactly the same way.

@ad-si

This comment has been minimized.

ad-si commented Apr 1, 2018

Ah, I see. We're getting there!
The problem is that kramdown handles whitespace in the tag incorrectly:

~~~{.literate.haskell}
etst
~~~

~~~{.literate .haskell}
etst
~~~

yields

<pre><code class="language-{.literate.haskell}">etst
</code></pre>

<p>~~~{.literate .haskell}
etst
~~~</p>

The matching regex should be changed to something like [^\r\n]*

@gettalong

This comment has been minimized.

Owner

gettalong commented Apr 2, 2018

You always assume that kramdown does something incorrectly - why?

I have yet to see a language tag with a whitespace in its name, see for example https://github.com/jneen/rouge/wiki/List-of-supported-languages-and-lexers. Furthermore, the whitespace would have to be removed from the parsed language tag name anyway because otherwise language-tagname would not work.

On another note: The language tag you want is literate_haskell, not {.literate.haskell} as that won't be recognized by syntax highlighter I know of.

@ad-si

This comment has been minimized.

ad-si commented Apr 2, 2018

You always assume that kramdown does something incorrectly - why?

Because Commonmark is the correct way as far as I'm concerned. I don't like every decision they made, but it's the only proper specification for Markdown and I think moving forward all Markdown parsers should be Commonmark compatible. (you still can put your on syntax extensions on top).

Ironically the GitHub comments we're just using are also Commonmark compatible.
(There is even an announcement blog post: https://githubengineering.com/a-formal-spec-for-github-markdown/). They convert it like this:

~~~{.literate .haskell}
etst
~~~

<pre lang="{.literate"><code>etst
</code></pre>

Furthermore, specification aside, why would I want it to be converted to

<p>~~~{.literate .haskell}
etst
~~~</p>

?
I don't think that anyone would ever write that markdown without assuming that it will be recognized as a fenced code block. (No matter what the language tag will be)

won't be recognized by syntax highlighter I know of

pandoc + skylighting recognize it

@ad-si

This comment has been minimized.

ad-si commented Apr 2, 2018

Btw: literate_haskell would be incorrect anyways, because it says the fenced code block contains literate Haskell, which is either LaTeX with special code blocks, or plain text with > indented code blocks.
But that's not what {.literate .haskell} means. Instead it means that this fenced code block contains normal Haskell, but when the markdown is converted to LaTeX, that this code block is a special literate Haskell code block.

@gettalong

This comment has been minimized.

Owner

gettalong commented Apr 2, 2018

Because Commonmark is the correct way as far as I'm concerned.

Please, if you want to use commonmark, then just use commonmark! There is no reason to downgrade kramdown to commonmark.

Furthermore, specification aside, why would I want it to be converted to [...]

You probably don't but you didn't follow the syntax rules so the library did what you told it to do. And when looking at the result it is obvious that you didn't follow the syntax rules.

pandoc + skylighting recognize it

So Pandoc does something special so that skylighting knows what to do. That's different to skylighting recognizing {.literate .haskell} as a language tag...

@gettalong

This comment has been minimized.

Owner

gettalong commented Apr 2, 2018

Since you are referring to pandoc, it seems that you want to assign classes to a fenced codeblock, as described at https://pandoc.org/MANUAL.html#fenced-code-blocks. And since pandoc doesn't have the general block/span IAL feature of kramdown, it provides a special syntax. However, this is not needed with kramdown.

So I think what you want is the following:

~~~ haskell
main = putStr "Hello World!"
~~~
{:.literate .haskell}

This would give the following HTML output:

<pre class="literate haskell"><code class="language-haskell">main = putStr "Hello World!"
</code></pre>
@ad-si

This comment has been minimized.

ad-si commented Apr 2, 2018

Please, if you want to use commonmark, then just use commonmark! There is no reason to downgrade kramdown to commonmark.

Actually, I thought github-pages only supports kramdown, but I just found out that there has been a commonmark plugin available for a few months: https://github.com/github/jekyll-commonmark-ghpages. So yeah, I'm going to use this.

you didn't follow the syntax rules

Well, the syntax rules are really unintuitive and they should be fixed. But that's just my 2 cents.

So I think what you want is the following

What I want is first class support for markdown + fenced-code blocks as literate Haskell.
Until this happens I'll have to use this workaround to execute the markdown:

cat document.md \
| sed 's/```haskell/```{.literate .haskell}/g' \
| pandoc \
  --from markdown \
  --to markdown+lhs \
  --output temp.lhs \
| stack runhaskell \
  --resolver lts-11.1 \
  -- \
  temp.lhs \
; rm -f temp.lhs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment