Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem rendering pipe characters in code blocks within tables #85

Closed
huettenhain opened this issue Jun 10, 2019 · 9 comments
Closed

Problem rendering pipe characters in code blocks within tables #85

huettenhain opened this issue Jun 10, 2019 · 9 comments

Comments

@huettenhain
Copy link
Contributor

Hello, thank you heartily for this great library. It took me quite few months to encounter a bug, which I am reporting here.

Mistletoe seems unable to render pipe characters in code blocks within tables. Here is an example of observed behavior:

mistletoe [version 0.7.2] (interactive)
Type Ctrl-D to complete input, or Ctrl-C to exit.
>>> | Table | Header |
... |---    |---     |
... | `<|>` | `<|>`  |
...
... ^Z

<table>
<thead>
<tr>
<th align="left">Table</th>
<th align="left">Header</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">`&lt;</td>
<td align="left">&gt;`</td>
<td align="left">`&lt;</td>
<td align="left">&gt;`</td>
</tr>
</tbody>
</table>
>>>

The following was my expected rendering:

Table Header
<|> <|>

Interestingly, this behavior can not be expected in GFM, which requires escapes for pipes: github/markup#1078

However, escaping pipes is not working in mistletoe:

>>> | Table | Header |
... |---    |---     |
... |`<\|>` | `<\|>` |
... ^Z

<table>
<thead>
<tr>
<th align="left">Table</th>
<th align="left">Header</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">`&lt;\</td>
<td align="left">&gt;`</td>
<td align="left">`&lt;\</td>
<td align="left">&gt;`</td>
</tr>
</tbody>
</table>
@pbodnar
Copy link
Collaborator

pbodnar commented Sep 24, 2021

@huettenhain, thanks for the report. It looks like the GFM's approach seems to be even somewhat in-line with the CommonMark specification (https://spec.commonmark.org/0.30/#precedence). So I would go their way, i. e. to enable to escape | whenever it is inside a cell. I cannot quite imagine any other solution. That will mean the following changes in the code:

  1. Ignore escaped pipes when splitting a table row to cells. (This is quite easy to do, it's quite common in regex patterns used in this project.)
  2. Replace all escaped pipes with just pipes before parsing each cell's content. These escapes can be anywhere, not just within a code block. (I will have to get into the parsing logic a little bit more in here.)

What do you think? It would be great to have some feedback before starting with the implementation.

@huettenhain
Copy link
Contributor Author

That seems perfectly reasonable to me, and I should add that I am very happy and grateful to see this nice library brought to life again.

@pbodnar
Copy link
Collaborator

pbodnar commented Sep 26, 2021

That seems perfectly reasonable to me,

OK, thank you for your feedback, I think I will look at this next week then.

I should add that I am very happy and grateful to see this nice library brought to life again.

Yes, I was given write permissions by @miyuchina who seems to like my contributions, so hopefully I can bring the library a little bit further, even though it may have lost some users in the previous years. I take it as a challenge. :)

@pbodnar
Copy link
Collaborator

pbodnar commented Oct 2, 2021

Just looking at this. I have some interesting findings:

  1. According to https://www.markdownguide.org/extended-syntax, one can escape pipe by using its character reference (&#124;).
  2. Yet, according to https://github.github.com/gfm/#entity-and-numeric-character-references, this is not possible within code blocks and code spans, because character references are not recognized there.

So no change in the direction of this issue actually. I'm only re-classifying it: it is probably not a bug, but rather an enhancement.

@pbodnar pbodnar added enhancement and removed bug labels Oct 2, 2021
@pbodnar
Copy link
Collaborator

pbodnar commented Oct 2, 2021

Another finding: It looks like even GitHub is not handling this edge case:

|a|b|
|-|-|
|ending with slash: \\|other cell|

This results in just one cell in the row:

a b
ending with slash: |other cell

Of course, users will hopefully never do something like that. So I wouldn't implement handling of this edge case now. It would probably require using something more complicated than a simple row-string split (which just checks for any \ not being present right in front of splitting |, because a more precise regex look-behind cannot be used).

@pbodnar
Copy link
Collaborator

pbodnar commented Oct 2, 2021

@huettenhain, so I prepared the implementation in a dedicated branch. Can you please check it out, together with my notes above?

@pbodnar
Copy link
Collaborator

pbodnar commented Oct 9, 2021

close: I merged the branch, enjoy and feel free to reopen in case of any problems.

@pbodnar pbodnar closed this as completed Oct 9, 2021
@huettenhain
Copy link
Contributor Author

Ah, I am so sorry, I just couldn't get to this. I think this is a good way to proceed, I will let you know if any problems arise.

@pbodnar
Copy link
Collaborator

pbodnar commented Oct 9, 2021

No problem, thank you. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants