-
Notifications
You must be signed in to change notification settings - Fork 1.8k
JS: add query js/exploitable-polynomial-redos #2884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JS: add query js/exploitable-polynomial-redos #2884
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very reasonable. I like the decision to use taint tracking, and using RequestInputSource
to restrict to server code.
I think we should remove the word "exploitable" from the name, though, as the exploitability of these results are no different from the other taint queries. "Polynomial ReDoS" sounds fine to me.
} | ||
|
||
/** | ||
* Holds if `t` matches at least an epsilon symbol. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is already available through RegExpTerm.isNullable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost, but not exactly. The two ^$
anchors in particular are nullable, but they do not match "at least an epsilon symbol", they match the implicit start and end symbols instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this term does not restrict the language of the enclosing regular expression
I think this makes it clear that ^
and $
are not epsilon matchers.
} | ||
|
||
/** | ||
* Gets a term that matches the symbol immediately before `t` is done matching. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you rephrase this? I don't understand what this means, and I can't figure it out from the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried that already several times, I think I stuck in the wrong corner. Let me state the purpose of this predicate, and then you may view this problem from the right angle to help me get a better naming/docstring.
The predicate is used to find a term t1
:
t1
consumes the last symbol just before an infinitely repeating termt2
.- for example:
t1
isa
, andt2
isb+
here:/...ab+.../
and here:/...a(foo)?b+.../
. (this is whatgetAMatchPredecessor(this.getPredecessor())
implements)
- for example:
t1
is infinitely repeating (this is whatInfiniteRepetitionQuantifier
implements)t1
can consume at least one of the same symbols ast2
(this is whatcompatible
implements)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I can see how it's not easy to capture in a short sentence.
It might be enough to just add some examples, like:
- For
b?
inab?c
this getsa
andb
(in addition tob?
itself). - For
(ab|cd)
this getsb
andd
(in addition toab|cd
and(ab|cd)
).
javascript/ql/src/semmle/javascript/security/performance/SuperlinearBackTracking.qll
Outdated
Show resolved
Hide resolved
The performance is again surprisingly good. |
Those wall clock timings are clearly biased in favor of the second run. Could you try running with |
with dpm, now the timing barely favors the run with only Ping @mchammer01 for docreview.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - just waiting for doc review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esbena - apologies for the late review but I am currently in the US for a team mini-summit.
Unfortunately the preview failed so I wasn't able to see the example snippets in-situ, nor what the overview or the references look like.
I have made a few comments for your consideration. Feel free to ignore the ones you don't agree with.
2d871c9
to
3e714d1
Compare
Thank you @mchammer01. Except for the capitalisation comment, I have addressed all of your comments with minor fixes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the doc updates @esbena - LGTM 👍
3e714d1
to
0187c73
Compare
Doesn't look like the qhelp preview likes your inline regexp: Line 48: |
I get a build error when building the qhelp offline:
It seems to be caused by the
Edit: erik beat me to it |
0187c73
to
bc99954
Compare
bc99954
to
1b73cee
Compare
Regular expressions with superlinear time-complexity on contemporary regular expression engines are the cause of several CVEs. We already flag the exponential cases with
js/redos
, regardless of how the regular expression is used. This is fine as the exponential case is bad regardless of malicious users.It is possible to identify a large class of regular expression terms that multiplies the time-complexity of the enclosing regular expression by a linear time factor, so one such term results in quadratic time-complexity, and two such terms results in cubic time-complexity. (The 11
Class A
CVEs in https://github.com/github/codeql-javascript-team/issues/63 contain patterns that are flagged byPolynomialBackTrackingTerm
!)These superlinear polynomial cases are extremely common, and mostly benign in practice. So it will be too noisy to flag all of them in general, even though many can easily be rewritten to have a linear time complexity.
To flag only the interesting cases, this PR introduces a taint tracking query that requires remote flow to be matched with the expensive regular expression. Note that the alert location is the expensive regular expression term , and that the path is for the remote flow.
In practice, client-side ReDoS is uninteresting, so the query only considers
HTTP::RequestInputAccess
as a source. By default, NodeJS servers only allow 8KB of data outside the body of HTTP requests, so most sources in the query will be limited to a length of less than 8000 characters, which in practice translates to roughly 100ms evaluation time for a quadratic regular expression. This is not a lot, so I have set the severity of the query towarning
.There's still room for a few improvement in the query (for instance, handing of negated character classes), but that work starts encroaching on the
js/redos
query implementation, so that can be done later.For a sneak-peek at the results, check out the link hidden at https://git.semmle.com/gist/esben/313a7bfdfc6a1f30383e6891a138a2a4.