Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gfmExtensions parser drops out of list when | is encountered #95

Closed
andreasabel opened this issue Jul 19, 2022 · 15 comments · Fixed by #111
Closed

gfmExtensions parser drops out of list when | is encountered #95

andreasabel opened this issue Jul 19, 2022 · 15 comments · Fixed by #111

Comments

@andreasabel
Copy link
Contributor

( Lifted from

Rendering

- foo
- `a|b`
- bar

with gfmExtensions included in the parser gives this:

  • foo

- a|b

  • bar
rather than this:
  • foo
  • a|b
  • bar

Reproducer:

{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Commonmark
import Commonmark.Extensions

import Data.String.QQ (s)
import Data.Text.Lazy.IO as TLIO

main :: IO ()
main = do
  commonmarkWith (gfmExtensions <> defaultSyntaxSpec) "inline" input >>= \case
    Left e                  -> error (show e)
    Right (html :: Html ()) -> TLIO.putStr $ renderHtml html
  where
  input = [s|
- foo
- `a|b`
- bar
|]

{- Produces:

<ul>
<li>foo
</li>
</ul>
<p>- <code>a|b</code></p>
<ul>
<li>bar
</li>
</ul>

Seems like the gfmExtensions parser cannot deal with `|`.
-}
@jgm
Copy link
Owner

jgm commented Jul 19, 2022

I think what's happening is that the table extension is kicking in, because of the |.
I think the current behavior of this extension is to regress and parse the lines as paragraph content if a table heading line doesn't follow -- obviously, this isn't what is wanted in the current case. In fact, this case may interact badly with the commonmark parser's goal of parsing line by line with no backtracking.

@jgm
Copy link
Owner

jgm commented Jul 19, 2022

The relevant part of the source is in commonmark-extensions, src/Commonmark/Extensions/PipeTable.hs, lines 176-193.

@jgm
Copy link
Owner

jgm commented Jul 19, 2022

You may be able to work around this by moving the pipeTableSpec after defaultSyntaxSpec, i.e.

allTheGfmExtensionsExceptPipeTable <> defaultSyntaxSpec <> pipeTableSpec

@andreasabel
Copy link
Contributor Author

I think what's happening is that the table extension is kicking in, because of the |.
I think the current behavior of this extension is to regress and parse the lines as paragraph content if a table heading line doesn't follow

else
-- last line was first; check for separators
-- and if not found, convert to paragraph:
try (do aligns <- pDividers
guard $ length aligns ==
length (pipeTableHeaders tabledata)
let tabledata' = tabledata{ pipeTableAlignments = aligns }
return $! (pos, Node ndata{
blockLines = []
, blockData = toDyn tabledata'
} children))
<|> (return $! (pos, Node ndata{
blockSpec = paraSpec } children))

@andreasabel
Copy link
Contributor Author

Judging from the examples given with https://github.github.com/gfm/#table, and its present implementation on github.com, a table header needs to be followed by a delimiter row, otherwise it is not a table.

i|am|not|a|table

|i|am|also|not|a|table|

i am a table
i am also a table

In fact, this case may interact badly with the commonmark parser's goal of parsing line by line with no backtracking.

Apparently, you need two lines to start a table. So yes, noble goal, but reality is harsh... ;-)

@andreasabel
Copy link
Contributor Author

You may be able to work around this by moving the pipeTableSpec after defaultSyntaxSpec, i.e.

allTheGfmExtensionsExceptPipeTable <> defaultSyntaxSpec <> pipeTableSpec

Thanks for the hint!

I am a bit hesitant to follow this path, because I cannot judge the consequences. How do I know this isn't a whack-a-mole-game? We haven't secured desired behaviors of our markdown parser instance with a testsuite, so I might break something else.
I'd rather wait for a fix of the table parser...

@jgm
Copy link
Owner

jgm commented Jul 21, 2022

a table header needs to be followed by a delimiter row, otherwise it is not a table.

Yes, and that's what this parser expects. The difficulty is that this parser tries to do line-by-line block parsing with no backtracking, so it can't look ahead to the next line. The current approach is to retroactively change the block from a table to a paragraph if that second line isn't encountered. But that won't work for the kind of example you give. The fix, with the current parsing strategy, isn't obvious, unless it's the thing I mention above. Note that the monoidal composition of syntax specs is order-sensitive. So the problem here is that we're checking for a table before checking for a list item.

@andreasabel
Copy link
Contributor Author

Ok, I try the workaround then!

@andreasabel
Copy link
Contributor Author

The problem with the suggested workaround (pipeTableSpec after defaultSyntaxSpec) is that tables do not parse anymore:

{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Commonmark
import Commonmark.Extensions

import Data.Text.Lazy.IO as TLIO

main :: IO ()
main = do
  commonmarkWith spec "inline" input >>= \case
    Left e                  -> error (show e)
    Right (html :: Html ()) -> TLIO.putStr $ renderHtml html
  where
  spec = mconcat
    [ emojiSpec
    , strikethroughSpec
    , autolinkSpec
    , autoIdentifiersSpec
    , taskListSpec
    , footnoteSpec
    , defaultSyntaxSpec
    , pipeTableSpec
    ]
  input = table
  table = "|a|table|\n|---|---|"

Gives:

<p>|a|table|
|---|---|</p>

@jgm
Copy link
Owner

jgm commented Jul 22, 2022

OK, back to the drawing board.

Here's another case to keep in mind:

- `a|b`
- | -

This does parse -- as a table. Hm, I wonder how GitHub's parser renders it? Let's try:

  • a|b
  • | -

Let's also try your original case:

  • foo
  • a|b
  • bar

@jgm
Copy link
Owner

jgm commented Jul 22, 2022

Literal pipes in GitHub's pipe tables formerly needed to be escaped, even inside code backticks; let's see if that's still the case:

test table
` `

Answer: yes.

@jgm
Copy link
Owner

jgm commented Jul 22, 2022

Aha. You need a trailing newline in input in your program above. That's why it's not parsing as a table.

@jgm
Copy link
Owner

jgm commented Jul 22, 2022

Original case also works with the workaround; you just need to ensure that all lines are terminated with newline characters.

@jgm
Copy link
Owner

jgm commented Jul 22, 2022

I don't think the workaround will have any bad consequences: it just means that nothing will be interpreted as a table if it can be interpreted as any other kind of block-level element (aside from a paragraph), and I think that's desirable.

We can leave this issue open, though, because this is an awkward workaround and it makes it impossible to use gfmExtensions.

@andreasabel
Copy link
Contributor Author

Thanks for the further research, @jgm. I now applied your workaround to hackage-server:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants