Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex backtracking issues #1312

Closed
joeyparrish opened this issue Feb 20, 2018 · 7 comments

Comments

Projects
None yet
5 participants
@joeyparrish
Copy link
Member

commented Feb 20, 2018

@davisjam reported some potential regex backtracking vulnerabilities to us via email. In such a vulnerability, extremely long inputs could cause a regex to block for a very long time while parsing.

We believe there is no significant risk to these particular issues. Four of six of them are in the jsdoc template, and therefore do not affect Shaka Player itself. The other two are in the TTML text parser.

Application developers generally have some control or trust in their content catalogs and are not subject to malicious TTML content. Such content, if encountered, would only cause individual browser tabs to lock up. Shaka Player does not run in nodejs or other such environments, and we do not expect this could be used for any kind of DOS attack.

The affected regex should be refactored to avoid this. @davisjam recommends these tools to assess progress:

Here are the details of the reported vulnerabilities:

Vuln 1:

  • file: "docs/jsdoc-template/static/scripts/prettify/lang-css.js"
  • pattern: "^(-?(?:[_a-z]|\\[\da-f]+ ?)(?:[\w-]|\\\\[\da-f]+ ?))\s:"

Vuln 2:

  • file: "docs/jsdoc-template/static/scripts/prettify/lang-css.js"
  • pattern: "^"(?:[^\n\f\r"\\]|\\(?:\r\n?|\n|\f)|\\[\S\s])*""

Vuln 3:

  • file: "docs/jsdoc-template/static/scripts/prettify/lang-css.js"
  • pattern: "^'(?:[^\n\f\r'\\]|\\(?:\r\n?|\n|\f)|\\[\S\s])*'"

Vuln 4:

  • file: "docs/jsdoc-template/static/scripts/prettify/prettify.js"
  • pattern: "^<(?:(?:(?:\.\.\/)|\/?)(?:[\w-]+(?:\/[\w-]+)+)?[\w-]+\.h|[a-z]\w)>"

Vuln 5:

  • file: "lib/text/ttml_text_parser.js"
  • pattern: "^(\d*\.?\d*)t$",

Vuln 6:

  • file: "lib/text/ttml_text_parser.js"
  • pattern: "^(?:(\d*\.?\d*)h)?(?:(\d*\.?\d*)m)?(?:(\d*\.?\d*)s)?(?:(\d*\.?\d*)ms)?$",

@joeyparrish joeyparrish added the bug label Feb 20, 2018

@joeyparrish joeyparrish added this to the Backlog milestone Feb 20, 2018

@TheModMaker

This comment has been minimized.

Copy link
Member

commented Feb 21, 2018

For those unfamiliar with catastrophic backtracking, here is a short explanation.

Consider the following regex: /a*a*b/

So if you have an input string of aaax, it obviously doesn't match. First it tries to match the first a* with the longest string of a's (since it's greedy). The 'x' doesn't match 'a', so it tries the next regex part; the second a* is optional, so it tries the next regex part; the 'x' doesn't match the expected 'b' so it fails to match.

But the * matches any number of something, so what if we chose less of them? So the regex engine backs up and selects one fewer 'a' from the first a* and moves forward. This time, the remaining string is 'ax', so the second a* can match the 'a'. But the 'x' still doesn't match 'b'.

But what if the second a* didn't use all the a's? See the problem? Here is a list of all the matches the regex engine will try on the above string:

a* a* b
aaa x
aa a x
aa ax
a aa x
a a ax
a aax
aaa x
aa ax
a aax
aaax

You see there are 10 attempts to match the given string, which is really just a series of a's. The regex /a*b/ will only make 4 matches as it tries less values for a*. This can become exponentially worse as there are more characters in the input string.

In the TTML parser we use the regex /\d*\.?\d*/ to match numbers. Since the . is optional, it is similar to the above case.

shaka-bot pushed a commit that referenced this issue Feb 21, 2018

Fix catastrophic backtracking in TTML text parser.
Issue #1312

Change-Id: I0aed14068776a800eee35f03d2f878db0dd565b6

joeyparrish added a commit that referenced this issue Feb 21, 2018

Fix catastrophic backtracking in TTML text parser.
Issue #1312

Change-Id: I0aed14068776a800eee35f03d2f878db0dd565b6
@joeyparrish

This comment has been minimized.

Copy link
Member Author

commented Feb 21, 2018

TTML parser fix cherry-picked for v2.3.3. The remaining issues are in the documentation template, not Shaka Player.

@joeyparrish

This comment has been minimized.

Copy link
Member Author

commented Feb 27, 2018

The vulnerable expressions in jsdoc are all from the "prettify" module, which enables syntax highlighting for CSS. https://github.com/jsdoc3/jsdoc/tree/master/templates/default/static/scripts/prettify

That code seems to have been forked in 2012 and never updated. The original code has been updated a few times since then: https://github.com/google/code-prettify/commits/master/src/lang-css.js

I will put together a bug report against jsdoc for this, and a pull request to update, if I can.

@davisjam

This comment has been minimized.

Copy link

commented Feb 27, 2018

Sounds good. Link to the bug report and I can also take a peek at a fix.

@davisjam

This comment has been minimized.

Copy link

commented Feb 28, 2018

@joeyparrish I have contacted the code-prettify devs by email. I CCd you in case you want to weigh in or point out the prettify fork.

@ismena

This comment has been minimized.

Copy link
Member

commented Mar 21, 2018

@joeyparrish Is there something else we should with regards to this issue or are we good to close it?

@joeyparrish

This comment has been minimized.

Copy link
Member Author

commented Mar 21, 2018

We have fixed the regex issues that affect our library, but there are others in our jsdoc template. We're leaving it open until we fix those, too.

Our template is forked from the jsdoc default template, and the relevant code in jsdoc was forked from prettify (and never updated). If we do fix these issues in the template ourselves before jsdoc and prettify do, we can upstream the fix.

@shaka-bot shaka-bot closed this in d16e1a4 May 15, 2019

@joeyparrish joeyparrish modified the milestones: Backlog, v2.6 May 16, 2019

@shaka-bot shaka-bot added the archived label Jul 14, 2019

@google google locked and limited conversation to collaborators Jul 14, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.