Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GFM tables not breaking on block-level structures #1598

Merged
merged 5 commits into from Mar 6, 2020

Conversation

calculuschild
Copy link
Contributor

@calculuschild calculuschild commented Feb 5, 2020

Marked version: 8.0

Markdown flavor: GitHub Flavored Markdown

Description

What was attempted

Edited rules.js to break GFM tables properly using similar syntax used to break paragraphs.

Contributor

  • Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR);

Committer

In most cases, this should be a different person than the contributor.

  • Draft GitHub release notes have been updated.
  • CI is green (no forced merge required).
  • Merge PR

Includes test cases for all block-level structures and some inline-level structures (which should still appear in the table)
@vercel
Copy link

vercel bot commented Feb 5, 2020

This pull request is being automatically deployed with ZEIT Now (learn more).
To see the status of your deployment, click on the icon next to each commit.

@vercel vercel bot requested a deployment to Preview February 5, 2020 04:53 Abandoned
@vercel
Copy link

vercel bot commented Feb 5, 2020

Deployment failed with the following error:

Configuring `regions` is not necessary for Static Files. Deploying Serverless Functions to multiple regions is restricted to paid plans. Upgrade here: https://zeit.co/teams/markedjs/settings/billing.

src/rules.js Outdated Show resolved Hide resolved
@UziTech
Copy link
Member

UziTech commented Feb 5, 2020

By including it all in the regex I'm a little worried about this introducing a new ReDos vulnerability.

I'll look into this but it might take some time.

/cc @davisjam

@vercel vercel bot requested a deployment to Preview February 5, 2020 15:58 Abandoned
@vercel
Copy link

vercel bot commented Feb 5, 2020

Deployment failed with the following error:

Configuring `regions` is not necessary for Static Files. Deploying Serverless Functions to multiple regions is restricted to paid plans. Upgrade here: https://zeit.co/teams/markedjs/settings/billing.

@calculuschild
Copy link
Contributor Author

calculuschild commented Feb 5, 2020

By including it all in the regex I'm a little worried about this introducing a new ReDos vulnerability.

If it helps, the only change was in the detection of table cells (the third captured regex group) where I replaced

 *[^>\n ]

with

(?!^|>|\\n| |hr|heading|lheading|code|fences|list|html)

and then apply the helper.js "edit" function in identical manner to the paragraph breaking logic earlier in the document. Note that this new bit of code does not apply repetition (“+”, “*”) to a complex subexpression

Given that, if the paragraph code has been deemed safe from REDOS attacks there shouldn't be anything different here except that I also include regex for code and lheading which were not included in the paragraph regex. Of course, I'm not a Regex expert so please do double check.

</tr>
</tbody>
</table>
<h1 id="title">title</h1>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub seems to render this one differently

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm.... The spec says "The table is broken at the first empty line, or beginning of another block-level structure", so that would seem to break their rule, as the setext headings fall under the "Leaf Block" category.

At that point do you follow the spec, or do you follow GitHub's implementation of the spec? Either way, it's a simple change of just removing the associated lines that check for lheading.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, your test seems correct. I also double checked to make sure our GFM spec tests included 201 [extension] Tables and it is passing.

Copy link
Member

@UziTech UziTech Feb 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the spec:

In general, a setext heading need not be preceded or followed by a blank line. However, it cannot interrupt a paragraph, so when a setext heading comes after a paragraph, a blank line is needed between them.

Maybe since it cannot interrupt a paragraph it also shouldn't interrupt a table? I'm leaning toward following the implementation and posing this as an issue to GitHub to see what they have to say. github/cmark-gfm#179

We could leave the regex in and just comment it out with a note so we can change it easily in the future.

Copy link
Member

@styfle styfle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks! 🎉

@davisjam
Copy link
Contributor

davisjam commented Feb 10, 2020 via email

Copy link
Member

@UziTech UziTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few edits to make the regex faster and less susceptible to a redos attack.

src/rules.js Outdated Show resolved Hide resolved
src/rules.js Outdated Show resolved Hide resolved
@UziTech
Copy link
Member

UziTech commented Feb 10, 2020

@davisjam yes that would be great. I kind of looked it over but I'm sure you are better at finding them.

src/rules.js Outdated Show resolved Hide resolved
Simplify `code` regex

Co-Authored-By: Tony Brix <tony@brix.ninja>
@calculuschild calculuschild mentioned this pull request Feb 10, 2020
4 tasks
Unneeded + on regex heading detection after tables and paragraphs
Fences after tables fixed in line with PR markedjs#1600.
@calculuschild
Copy link
Contributor Author

Is this stable enough for me to look at the regexes for ReDoS?

Has there been any progress on this step? Is there anything I can help with?

@UziTech
Copy link
Member

UziTech commented Mar 2, 2020

@davisjam have you had time to look at these?

@vercel vercel bot requested a deployment to Preview March 3, 2020 14:16 Abandoned
@vercel

This comment has been minimized.

This was referenced Apr 22, 2020
zhenalexfan pushed a commit to zhenalexfan/MarkdownHan that referenced this pull request Nov 8, 2021
Fix GFM tables not breaking on block-level structures
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Heading immediately following a table is being rendered in the table element
4 participants