@esbena esbena commented Feb 20, 2020

Regular expressions with superlinear time-complexity on contemporary regular expression engines are the cause of several CVEs. We already flag the exponential cases with js/redos, regardless of how the regular expression is used. That is appropriate, because the exponential case is a problem even when no malicious user is involved.

It is possible to identify a large class of regular expression terms that multiply the time-complexity of the enclosing regular expression by a linear factor, so one such term results in quadratic time-complexity, and two such terms result in cubic time-complexity. (The 11 Class A CVEs in https://github.com/github/codeql-javascript-team/issues/63 contain patterns that are flagged by PolynomialBackTrackingTerm!)
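
To make the "linear factor" concrete, here is a minimal sketch of a step-count model. It assumes a naive backtracking engine with no optimizations; the function name `modelSteps` and the pattern `/\s+$/` are illustrative choices and do not come from this PR.

```javascript
// Model (an assumption, not a measurement): a naive backtracking engine
// matching /\s+$/ against n whitespace characters followed by "x".
// From each start offset i, \s+ greedily consumes the n - i remaining
// whitespace characters, then `$` is retried once per shorter length.
function modelSteps(n) {
  let steps = 0;
  for (let i = 0; i < n; i++) {
    steps += n - i; // one failed `$` check per candidate length of \s+
  }
  return steps; // equals n * (n + 1) / 2, i.e. quadratic in n
}

// The real engine agrees on the outcome: this input never matches.
const input = " ".repeat(1000) + "x";
console.log(/\s+$/.test(input)); // false

// Doubling the input length roughly quadruples the modeled work.
console.log(modelSteps(2000) / modelSteps(1000));
```

Real engines may apply optimizations that this model ignores, so it only illustrates why a single such term can turn a linear scan into quadratic work.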

These superlinear polynomial cases are extremely common and mostly benign in practice, so flagging all of them would be too noisy, even though many can easily be rewritten to have linear time complexity.

To flag only the interesting cases, this PR introduces a taint tracking query that requires remote input to flow to a match against the expensive regular expression. Note that the alert location is the expensive regular expression term, and that the path is for the remote flow.
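
As a rough illustration of the shape such a query looks for, the sketch below pairs a request-derived string with an expensive pattern. The handler names and the `q` parameter are hypothetical, and whether the query flags this exact code depends on the library models, which this thread does not spell out.

```javascript
// Hypothetical server-side handler; `handleSearch` and the `q` parameter
// are illustrative names, not code from this PR.
function handleSearch(req) {
  // Source: data taken from the HTTP request (remote, attacker-controlled).
  const q = new URL(req.url, "http://localhost").searchParams.get("q") || "";
  // Sink: the tainted string is matched against a regular expression whose
  // \s+ term can backtrack polynomially when the match fails.
  return q.replace(/\s+$/, "");
}

// A linear-time alternative that gives the same result for this use.
function handleSearchLinear(req) {
  const q = new URL(req.url, "http://localhost").searchParams.get("q") || "";
  return q.trimEnd();
}

console.log(handleSearch({ url: "/search?q=abc%20%20" })); // "abc"
```

In this sketch the alert would sit on the `\s+` term, while the reported path would run from the request access to the `replace` call.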

In practice, client-side ReDoS is uninteresting, so the query only considers HTTP::RequestInputAccess as a source. By default, NodeJS servers only allow 8KB of data outside the body of HTTP requests, so most sources in the query will be limited to a length of less than 8000 characters, which in practice translates to roughly 100ms evaluation time for a quadratic regular expression. This is not a lot, so I have set the severity of the query to warning.
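
The timing estimate can be checked locally with a small harness like the one below. This is a sketch under assumptions: the pattern `/\s+$/` and the helper name `timeMatchMs` are illustrative, and the measured time depends heavily on the engine version and hardware, so no particular figure is implied.

```javascript
// Rough timing harness for a single regular expression match (Node.js).
function timeMatchMs(regex, input) {
  const start = process.hrtime.bigint();
  const matched = regex.test(input);
  const elapsedNs = process.hrtime.bigint() - start;
  return { matched, ms: Number(elapsedNs) / 1e6 };
}

// 8000 whitespace characters followed by a non-whitespace character,
// so /\s+$/ can only fail after trying every start offset.
const worstCase = " ".repeat(8000) + "x";
const result = timeMatchMs(/\s+$/, worstCase);
console.log(result.matched, result.ms.toFixed(1) + " ms");
```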

There's still room for a few improvements in the query (for instance, handling of negated character classes), but that work starts encroaching on the js/redos query implementation, so that can be done later.

For a sneak-peek at the results, check out the link hidden at https://git.semmle.com/gist/esben/313a7bfdfc6a1f30383e6891a138a2a4.

@esbena esbena added the JS label Feb 20, 2020
@esbena esbena requested a review from a team as a code owner February 20, 2020 11:00
@esbena esbena added the "Awaiting evaluation" label (Do not merge yet, this PR is waiting for an evaluation to finish) Feb 20, 2020

@asgerf asgerf left a comment

Looks very reasonable. I like the decision to use taint tracking, and using RequestInputSource to restrict to server code.

I think we should remove the word "exploitable" from the name, though, as the exploitability of these results is no different from the other taint queries. "Polynomial ReDoS" sounds fine to me.

}

/**
* Holds if `t` matches at least an epsilon symbol.
Contributor

I think this is already available through RegExpTerm.isNullable

Contributor Author

Almost, but not exactly. The two anchors ^ and $ in particular are nullable, but they do not match "at least an epsilon symbol"; they match the implicit start and end symbols instead.

Contributor Author

this term does not restrict the language of the enclosing regular expression

I think this makes it clear that ^ and $ are not epsilon matchers.
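
The distinction between a nullable term and an epsilon matcher can be seen directly in JavaScript. The patterns below are illustrative and are not taken from the PR's tests.

```javascript
// An empty group matches the empty string and leaves the language of the
// enclosing expression unchanged: "xy" is still accepted.
const emptyGroup = /x(?:)y/.test("xy");

// An optional term may also match nothing, so "xy" is still accepted.
const optionalTerm = /x(?:a?)y/.test("xy");

// ^ and $ consume no characters either, but they only match at the start
// and end of the input, so placing them mid-pattern rejects "xy".
const startAnchor = /x^y/.test("xy");
const endAnchor = /x$y/.test("xy");

console.log(emptyGroup, optionalTerm, startAnchor, endAnchor);
// true true false false
```

The anchors are nullable in the sense that they match without consuming input, yet they do restrict what the enclosing expression accepts.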

}

/**
* Gets a term that matches the symbol immediately before `t` is done matching.
Contributor

Could you rephrase this? I don't understand what this means, and I can't figure it out from the implementation.

Contributor Author

I have tried that already several times; I think I am stuck in the wrong corner. Let me state the purpose of this predicate, and then you may view this problem from the right angle to help me get a better name/docstring.

The predicate is used to find a term t1 such that:

  1. t1 consumes the last symbol just before an infinitely repeating term t2.
    • for example: t1 is a, and t2 is b+ here: /...ab+.../ and here: /...a(foo)?b+.../. (this is what getAMatchPredecessor(this.getPredecessor()) implements)
  2. t1 is infinitely repeating (this is what InfiniteRepetitionQuantifier implements)
  3. t1 can consume at least one of the same symbols as t2 (this is what compatible implements)
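
A small JavaScript sketch of a pattern that meets all three conditions is given below. The pattern and the helper `splits` are hypothetical examples chosen to mirror the `/...a(foo)?b+.../` shape above; they are not from the PR.

```javascript
// t1 = b*, t2 = b+; the optional (foo)? between them can be skipped,
// so t1 may consume the last symbol just before t2 starts.
const pattern = /b*(foo)?b+c/;
console.log(pattern.test("bbbc"));   // true
console.log(pattern.test("bfoobc")); // true
console.log(pattern.test("bbbb"));   // false: there is no trailing c

// Because t1 and t2 can consume the same symbol, a run of b's can be
// divided between them in several ways. Each pair is [part for t1, part for t2].
function splits(run) {
  const out = [];
  for (let k = 0; k < run.length; k++) {
    out.push([run.slice(0, k), run.slice(k)]);
  }
  return out;
}
console.log(splits("bbb")); // [ [ '', 'bbb' ], [ 'b', 'bb' ], [ 'bb', 'b' ] ]
```

When the overall match fails, a backtracking engine may end up trying each of these divisions, which is where the extra linear factor comes from.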

Contributor

@asgerf asgerf Feb 20, 2020

Yeah I can see how it's not easy to capture in a short sentence.

It might be enough to just add some examples, like:

  • For b? in ab?c this gets a and b (in addition to b? itself).
  • For (ab|cd) this gets b and d (in addition to ab|cd and (ab|cd)).
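
The two examples above can be reproduced with a toy model. This is only a sketch: the AST constructors and `matchPredecessors` are hypothetical stand-ins written for illustration, not CodeQL's `RegExpTerm` API or the PR's QL implementation, and the toy deliberately leaves sequence nodes out of the result so that it mirrors the examples as stated.

```javascript
// Toy regular expression AST (hypothetical; not CodeQL's RegExpTerm API).
const chr = (c) => ({ kind: "char", src: c });
const seq = (...items) => ({ kind: "seq", items, src: items.map((i) => i.src).join("") });
const alt = (...items) => ({ kind: "alt", items, src: items.map((i) => i.src).join("|") });
const group = (child) => ({ kind: "group", child, src: "(" + child.src + ")" });
const opt = (child) => ({ kind: "opt", child, src: child.src + "?" });

// True if the term can match the empty string.
function isNullable(t) {
  switch (t.kind) {
    case "opt": return true;
    case "group": return isNullable(t.child);
    case "seq": return t.items.every(isNullable);
    case "alt": return t.items.some(isNullable);
    default: return false; // a single character always consumes input
  }
}

// Collects the source text of every term that may have consumed the last
// symbol by the time `t` is done matching. `preceding` holds the terms that
// come before `t` in its enclosing sequence, nearest one last.
function matchPredecessors(t, preceding = []) {
  const found = new Set();
  const visit = (term, before) => {
    if (term.kind === "seq") {
      // A sequence (assumed non-empty) finishes with its last item.
      const last = term.items[term.items.length - 1];
      visit(last, [...before, ...term.items.slice(0, -1)]);
      return;
    }
    found.add(term.src);
    if (term.kind === "alt") term.items.forEach((a) => visit(a, before));
    if (term.kind === "group" || term.kind === "opt") visit(term.child, before);
    // If the term may match nothing, the previous term may hold the last symbol.
    if (isNullable(term) && before.length > 0) {
      visit(before[before.length - 1], before.slice(0, -1));
    }
  };
  visit(t, preceding);
  return [...found].sort();
}

// b? in ab?c: the term before b? is a.
console.log(matchPredecessors(opt(chr("b")), [chr("a")]));
// [ 'a', 'b', 'b?' ]
console.log(matchPredecessors(group(alt(seq(chr("a"), chr("b")), seq(chr("c"), chr("d"))))));
// [ '(ab|cd)', 'ab|cd', 'b', 'd' ]
```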

esbena commented Feb 21, 2020

The performance is again surprisingly good.
The comparison is for js/redos vs js/redos+js/polynomial-redos, so the additional taint-tracking seems to come for free.

asgerf commented Feb 21, 2020

The performance is again surprisingly good.

Those wall clock timings are clearly biased in favor of the second run. Could you try running with --dpm?

esbena commented Feb 21, 2020

With --dpm, the timing now barely favors the run with only js/redos, so it should be good enough to land IMO.

Ping @mchammer01 for doc review.
NB:

  • the QHelp Preview is broken for this PR
  • the content in the two <include> qhelp files has simply been moved out of ReDoS.qhelp.

@esbena esbena removed the "Awaiting evaluation" label (Do not merge yet, this PR is waiting for an evaluation to finish) Feb 21, 2020
asgerf previously approved these changes Feb 25, 2020

@asgerf asgerf left a comment

LGTM - just waiting for doc review.

@mchammer01 mchammer01 left a comment

@esbena - apologies for the late review but I am currently in the US for a team mini-summit.
Unfortunately the preview failed so I wasn't able to see the example snippets in-situ, nor what the overview or the references look like.
I have made a few comments for your consideration. Feel free to ignore the ones you don't agree with.

@esbena esbena force-pushed the js/practically-exploitable-redos branch from 2d871c9 to 3e714d1 Compare February 26, 2020 10:08

esbena commented Feb 26, 2020

Thank you @mchammer01. Except for the capitalisation comment, I have addressed all of your comments with minor fixes.

asgerf previously approved these changes Feb 26, 2020
mchammer01 previously approved these changes Feb 26, 2020

@mchammer01 mchammer01 left a comment

Thanks for the doc updates @esbena - LGTM 👍

@esbena esbena dismissed stale reviews from mchammer01 and asgerf via 0187c73 February 26, 2020 12:55
@esbena esbena force-pushed the js/practically-exploitable-redos branch from 3e714d1 to 0187c73 Compare February 26, 2020 12:55
@erik-krogh
Contributor

Doesn't look like the qhelp preview likes your inline regexp:
PolynomialReDoS.qhelp:48:21: The content of elements must consist of well-formed character data or markup.

Line 48: (<code>/^\s+|(?<!\s)\s+$/g</code>), or just by using the built-in trim

asgerf commented Feb 26, 2020

I get a build error when building the qhelp offline:

Error on line 48 column 21 of PolynomialReDoS.qhelp:
  SXXP0003: Error reported by XML parser: The content of elements must consist of
  well-formed character data or markup.

It seems to be caused by the < in this regexp:

(<code>/^\s+|(?<!\s)\s+$/g</code>), or just by using the built-in trim

Edit: erik beat me to it
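
For reference, a raw `<` is not allowed in XML character data, which is why the lookbehind `(?<!\s)` breaks the qhelp build. One common remedy is to escape the character as `&lt;`. The thread does not record which fix the PR ended up using, so the helper below (`escapeXmlText`, a hypothetical name) is only a sketch of that general approach.

```javascript
// Escapes the characters that are special in XML text content.
// `&` must be replaced first so the other entities are not double-escaped.
function escapeXmlText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// The regular expression from line 48 of the qhelp file, as plain text.
const regexSource = String.raw`/^\s+|(?<!\s)\s+$/g`;
console.log(escapeXmlText(regexSource)); // /^\s+|(?&lt;!\s)\s+$/g
```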

@esbena esbena force-pushed the js/practically-exploitable-redos branch from 0187c73 to bc99954 Compare February 26, 2020 21:33
@esbena esbena force-pushed the js/practically-exploitable-redos branch from bc99954 to 1b73cee Compare February 27, 2020 07:43
@erik-krogh erik-krogh merged commit 9c06c48 into github:master Feb 27, 2020