JS: add query js/exploitable-polynomial-redos #2884

esbena · 2020-02-20T11:00:52Z

Regular expressions with superlinear time complexity on contemporary regular expression engines are the cause of several CVEs. We already flag the exponential cases with js/redos, regardless of how the regular expression is used. That is fine, since the exponential case is harmful even when no malicious user is involved.

It is possible to identify a large class of regular expression terms that each multiply the time complexity of the enclosing regular expression by a linear factor: one such term results in quadratic time complexity, and two such terms result in cubic time complexity. (The 11 Class A CVEs in https://github.com/github/codeql-javascript-team/issues/63 contain patterns that are flagged by PolynomialBackTrackingTerm!)
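To make the quadratic case concrete, here is a rough sketch. The pattern `/\s+$/`, the helper `timeMatch`, and the input sizes are my own illustration and are not taken from the PR's tests. On a backtracking engine, doubling the input should roughly quadruple the time, but the actual numbers depend on the machine and engine version.

```javascript
// Hypothetical helper: returns how many milliseconds `regex.test(input)` takes.
function timeMatch(regex, input) {
  const start = process.hrtime.bigint();
  regex.test(input);
  const end = process.hrtime.bigint();
  return Number(end - start) / 1e6;
}

// `\s+` can start at every whitespace position and scans to the end of the
// whitespace run before `$` fails on the trailing "x", so a backtracking
// engine does on the order of n + (n-1) + ... + 1 steps for n spaces.
const quadratic = /\s+$/;

// Builds an input that never matches: n spaces followed by a non-space.
function attackString(n) {
  return " ".repeat(n) + "x";
}

for (const n of [2000, 4000, 8000]) {
  console.log(n, timeMatch(quadratic, attackString(n)).toFixed(2) + " ms");
}
```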

These superlinear polynomial cases are extremely common, and mostly benign in practice, so flagging all of them in general would be too noisy, even though many can easily be rewritten to have linear time complexity.

To flag only the interesting cases, this PR introduces a taint tracking query that requires remote flow to be matched with the expensive regular expression. Note that the alert location is the expensive regular expression term, and that the path is for the remote flow.

In practice, client-side ReDoS is uninteresting, so the query only considers HTTP::RequestInputAccess as a source. By default, NodeJS servers only allow 8KB of data outside the body of HTTP requests, so most sources in the query will be limited to a length of less than 8000 characters, which in practice translates to roughly 100ms evaluation time for a quadratic regular expression. This is not a lot, so I have set the severity of the query to warning.
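As a sketch of the kind of flow described above: a request header (remote input) reaches a regular expression containing a quadratic term. The handler, the header choice, and the pattern are hypothetical, and whether this exact snippet is reported by the query is an assumption on my part.

```javascript
const http = require("http");

// Hypothetical handler: the header value is attacker-controlled, and Node
// limits its length together with the rest of the request head.
function handler(req, res) {
  const userAgent = req.headers["user-agent"] || "";
  // Remote input is matched against a regular expression with a quadratic term.
  const trimmed = userAgent.replace(/\s+$/, "");
  res.end(trimmed);
}

// The server is created but not started; call `server.listen(port)` to run it.
const server = http.createServer(handler);
```

The alert would then sit on the `\s+` term, with the path leading from the header access to the `replace` call.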

There's still room for a few improvements in the query (for instance, handling of negated character classes), but that work starts encroaching on the js/redos query implementation, so that can be done later.

For a sneak-peek at the results, check out the link hidden at https://git.semmle.com/gist/esben/313a7bfdfc6a1f30383e6891a138a2a4.

asgerf

Looks very reasonable. I like the decision to use taint tracking, and using RequestInputSource to restrict to server code.

I think we should remove the word "exploitable" from the name, though, as the exploitability of these results is no different from the other taint queries. "Polynomial ReDoS" sounds fine to me.

asgerf · 2020-02-20T12:04:31Z

javascript/ql/src/semmle/javascript/security/performance/SuperlinearBackTracking.qll

+}
+
+/**
+ * Holds if `t` matches at least an epsilon symbol.


I think this is already available through RegExpTerm.isNullable

Almost, but not exactly. The two ^$ anchors in particular are nullable, but they do not match "at least an epsilon symbol", they match the implicit start and end symbols instead.

this term does not restrict the language of the enclosing regular expression

I think this makes it clear that ^ and $ are not epsilon matchers.
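The distinction can be checked directly in JavaScript. This is only an illustration of nullable versus epsilon-matching terms; the patterns are made up for the purpose.

```javascript
// `b*` can match the empty string and does not restrict what surrounds it.
console.log(/^ab*c$/.test("ac")); // true

// `^` also consumes no characters, but without the `m` flag it only matches
// at the start of the input, so placing it between `a` and `c` makes the
// whole expression unmatchable.
console.log(/a^c/.test("ac")); // false

// An empty group is a pure epsilon matcher and changes nothing.
console.log(/a(?:)c/.test("ac")); // true
```

So `^` is nullable in the sense that it consumes nothing, yet it does restrict the language of the enclosing regular expression.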

asgerf · 2020-02-20T12:06:37Z

javascript/ql/src/semmle/javascript/security/performance/SuperlinearBackTracking.qll

+}
+
+/**
+ * Gets a term that matches the symbol immediately before `t` is done matching.


Could you rephrase this? I don't understand what this means, and I can't figure it out from the implementation.

I have tried that several times already; I think I am stuck in the wrong corner. Let me state the purpose of this predicate, and then you may be able to view the problem from the right angle and help me get a better name/docstring.

The predicate is used to find a term t1:

t1 consumes the last symbol just before an infinitely repeating term t2.

for example: t1 is a, and t2 is b+ here: /...ab+.../ and here: /...a(foo)?b+.../. (this is what getAMatchPredecessor(this.getPredecessor()) implements)

t1 is infinitely repeating (this is what InfiniteRepetitionQuantifier implements)

t1 can consume at least one of the same symbols as t2 (this is what compatible implements)

Yeah I can see how it's not easy to capture in a short sentence.

It might be enough to just add some examples, like:

For b? in ab?c this gets a and b (in addition to b? itself).

For (ab|cd) this gets b and d (in addition to ab|cd and (ab|cd)).

javascript/ql/src/semmle/javascript/security/performance/SuperlinearBackTracking.qll

esbena · 2020-02-21T08:23:23Z

The performance is again surprisingly good.
The comparison is for js/redos vs js/redos+js/polynomial-redos, so the additional taint-tracking seems to be for free.

asgerf · 2020-02-21T09:58:58Z

The performance is again surprisingly good.

Those wall clock timings are clearly biased in favor of the second run. Could you try running with --dpm?

esbena · 2020-02-21T20:13:21Z

With dpm, the timing now barely favors the run with only js/redos, so it should be good enough to land IMO.

Ping @mchammer01 for docreview.
NB:

the QHelp Preview is broken for this PR
the content in the two <include> qhelp files has simply been moved out of ReDoS.qhelp.

asgerf

LGTM - just waiting for doc review.

mchammer01

@esbena - apologies for the late review but I am currently in the US for a team mini-summit.
Unfortunately the preview failed so I wasn't able to see the example snippets in-situ, nor what the overview or the references look like.
I have made a few comments for your consideration. Feel free to ignore the ones you don't agree with.

javascript/ql/src/Performance/PolynomialReDoS.ql

javascript/ql/src/Performance/PolynomialReDoS.qhelp

esbena · 2020-02-26T10:11:09Z

Thank you @mchammer01. Except for the capitalisation comment, I have addressed all of your comments with minor fixes.

mchammer01

Thanks for the doc updates @esbena - LGTM 👍

erik-krogh · 2020-02-26T15:21:21Z

Doesn't look like the qhelp preview likes your inline regexp:
PolynomialReDoS.qhelp:48:21: The content of elements must consist of well-formed character data or markup.

Line 48: (<code>/^\s+|(?<!\s)\s+$/g</code>), or just by using the built-in trim

asgerf · 2020-02-26T15:22:27Z

I get a build error when building the qhelp offline:

Error on line 48 column 21 of PolynomialReDoS.qhelp:
  SXXP0003: Error reported by XML parser: The content of elements must consist of
  well-formed character data or markup.

It seems to be caused by the < in this regexp:

(<code>/^\s+|(?<!\s)\s+$/g</code>), or just by using the built-in trim

Edit: erik beat me to it
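For reference, the regexp itself behaves like the built-in trim on ordinary inputs; the issue is only that a raw `<` is not well-formed character data in XML, so inside the qhelp file it would need to be escaped as `&lt;`. A small sketch follows; the helper name `trimWithRegex` is hypothetical, and the lookbehind syntax assumes a reasonably recent Node.js.

```javascript
// The pattern quoted from line 48 of the qhelp file.
const trimRegex = /^\s+|(?<!\s)\s+$/g;

// Hypothetical helper: strips leading and trailing whitespace with the regexp.
function trimWithRegex(s) {
  return s.replace(trimRegex, "");
}

// Compare against the built-in on a few sample inputs.
for (const sample of ["  a b  ", "a", "", "\t x\n"]) {
  console.log(JSON.stringify(sample), trimWithRegex(sample) === sample.trim());
}
```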

esbena added the JS label Feb 20, 2020

esbena requested a review from a team as a code owner February 20, 2020 11:00

esbena added the "Awaiting evaluation" label (Do not merge yet, this PR is waiting for an evaluation to finish) Feb 20, 2020

asgerf reviewed Feb 20, 2020

esbena removed the "Awaiting evaluation" label (Do not merge yet, this PR is waiting for an evaluation to finish) Feb 21, 2020

asgerf previously approved these changes Feb 25, 2020

mchammer01 reviewed Feb 25, 2020

esbena dismissed asgerf’s stale review via 2d871c9 February 26, 2020 10:08

esbena force-pushed the js/practically-exploitable-redos branch from 2d871c9 to 3e714d1 Compare February 26, 2020 10:08

asgerf previously approved these changes Feb 26, 2020

mchammer01 previously approved these changes Feb 26, 2020

esbena dismissed stale reviews from mchammer01 and asgerf via 0187c73 February 26, 2020 12:55

esbena force-pushed the js/practically-exploitable-redos branch from 3e714d1 to 0187c73 Compare February 26, 2020 12:55

esbena force-pushed the js/practically-exploitable-redos branch from 0187c73 to bc99954 Compare February 26, 2020 21:33

JS: add js/exploitable-polynomial-redos

1b73cee

esbena force-pushed the js/practically-exploitable-redos branch from bc99954 to 1b73cee Compare February 27, 2020 07:43

asgerf approved these changes Feb 27, 2020

erik-krogh merged commit 9c06c48 into github:master Feb 27, 2020
