Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extremely slow processing on malformed markdown (0.6.2) #1493

Closed
bartgrantham opened this issue May 28, 2019 · 5 comments
Closed

Extremely slow processing on malformed markdown (0.6.2) #1493

bartgrantham opened this issue May 28, 2019 · 5 comments

Comments

@bartgrantham
Copy link

@bartgrantham bartgrantham commented May 28, 2019

Describe the bug
Infinite loop Extremely slow processing on malformed markdown

Edit: I tested it under Chrome and it finished significantly faster than Firefox, but still quite slow (7 seconds).

I believe it is a combination of malformed input and an exponential-runtime regexp combining to hang the parser. I tried to strip the markdown to the exact syntax that breaks the parser, but it's difficult to narrow down. The reason I suspect it's a run-away regexp is that removing obvious non-syntax like the strings of plain alphabetic words allows it to finish, but only after 30+ seconds of processing.

(for anyone curious, the markdown is from some old notes I had on the AREXX programming language)

To Reproduce

Steps to reproduce the behavior:

Parse this (be careful):

INDEX(string, pattern[, start)` : searches for the first occurrence of pattern in string, starting from start: `INDEX("123123", "23", 3)` == `5`
`INSERT(new, old[, start][, length][, pad])` : inserts the new string into the old string after the specified position (default is 0), new string is truncated or padded (default is " ") to the specified length, if start is beyond the end of old old will be padded
`LASTPOS(pattern, string[, start])` : searches backwards for the last occurrence of pattern in string, starting from start: `LASTPOS("123123", "23", 4)` == `2`
`LINES(file)` : returns the number of lines typed ahead at the interactive stream: `push("a line"); push("second line"); lines(STDIN); /* == 2 */`
`MAX(number, number[, number,...])` : obvious
`MIN(number, number[, number,...])` : obvious
`OPEN(filehandle, filename[, "APPEND"|"READ"|"WRITE"])` : opens file, returns boolean for success: `OPEN("MyCon", "CON:160/50/320/100/MyCon/CDS")` == `1`
`OVERLAY(new, old[, start][, length][, pad])` : overlays new string onto old one at start for length chars padding with pad if necessary: `OVERLAY("4", "123", 5, 5)` == `"123-4----"`
`POS(pattern, string[, start])` : same as index

Test string at marked.js.org/demo

I'm running 0.6.2 downloaded from jsdelivr (the header says "/npm/marked@0.6.2/lib/marked.js"), however it's apparently also broken in the demo. I'm doing all this under Firefox 67 but it appears to hang Chrome, too.

Expected behavior
Totally screwy marked up text, but not exponential runtime.

@bartgrantham bartgrantham changed the title Infinite loop on malformed markdown (0.6.2) ~Infinite loop~ Extremely slow processing on malformed markdown (0.6.2) May 28, 2019
@bartgrantham bartgrantham changed the title ~Infinite loop~ Extremely slow processing on malformed markdown (0.6.2) ~~Infinite loop~~ Extremely slow processing on malformed markdown (0.6.2) May 28, 2019
@bartgrantham bartgrantham changed the title ~~Infinite loop~~ Extremely slow processing on malformed markdown (0.6.2) Extremely slow processing on malformed markdown (0.6.2) May 28, 2019
@UziTech
Copy link
Member

@UziTech UziTech commented May 29, 2019

looks like this has to do with marked checking for valid code blocks.

If you start deleting `'s the time will drop dramatically.

@Feder1co5oave
Copy link
Contributor

@Feder1co5oave Feder1co5oave commented May 30, 2019

This ReDOS was introduced in 47365c1, and is caused in fact by the link _label subrule:

inline._label = /(?:\[[^\[\]]*\]|\\[\[\]]?|`[^`]*`|`(?!`)|[^\[\]\\`])*?/;

The two branches allowing backticks:

  • `[^`]*`
  • `(?!`)
    are not mutually exclusive and cause exponential backoff with pathological inputs.

The solution would be to actually try to parse a code span whenever a ` is met inside a link text.

@davisjam
Copy link
Contributor

@davisjam davisjam commented Jul 2, 2019

For what it's worth, this patch causes introduce 4 test failures, and only affects fairly obscure situations (namely, a single unmatched backtick).

-inline._label = /(?:\[[^\[\]]*\]|\\[\[\]]?|`[^`]*`|`(?!`)|[^\[\]\\`])*?/;
+inline._label = /(?:\[[^\[\]]*\]|\\[\[\]]?|`[^`]*`|[^\[\]\\`])*?/;

Thoughts on the performance-correctness tradeoff?

@UziTech
Copy link
Member

@UziTech UziTech commented Jul 2, 2019

I'm ok with losing a few tests for security.

@UziTech UziTech mentioned this issue Jul 2, 2019
4 tasks
@UziTech
Copy link
Member

@UziTech UziTech commented Jul 2, 2019

I created PR #1515 with a patch for this vulnerability.

@UziTech UziTech closed this in #1515 Jul 4, 2019
@UziTech UziTech mentioned this issue Jul 5, 2019
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

4 participants