JS: Make ReDoS.ql understand lots of character classes #4585
Conversation
Could these range-reasoning improvements be used in codeql/javascript/ql/src/semmle/javascript/security/performance/SuperlinearBackTracking.qll (lines 98 to 110 in 9249444)?
Very impressive results! 👏
Thanks for breaking it up into mostly independent commits. It made it much easier to review, although I ended up deleting most of my comments again as they became obsolete.
The first commit removes a redundant check that had no influence on the final result of the query.
I'm not convinced that this was strictly true in the first commit, as `intersect` was heuristic and two compatible terms may have come up without an intersection. But with the more powerful `intersect` predicate in later commits I agree the check becomes unnecessary.
@@ -447,7 +371,7 @@ predicate isFork(State q, InputSymbol s1, InputSymbol s2, State r1, State r2) {
// Use pragma[noopt] to prevent compatible(s1,s2) from being the starting point of the join.
Outdated comment refers to the `compatible` predicate.
@@ -94,8 +94,17 @@ var good9 = '(a|aa?)*b';
// NOT GOOD
var bad18 = /(([^]|[^a])*)"/;

// NOT GOOD
// NOT GOOD - but not flagged
I don't understand why we lost this result. The change in this commit looks like a strict improvement.
I understand we win it back in a later commit, but I'd still like to understand why we lost it.
but I'd still like to understand why we lost it.
So would I.
I never figured out why it was happening.
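For context, the test case itself is a genuine true positive; a quick sketch (not part of the PR, timings vary by engine) of why `bad18` is labelled NOT GOOD:

```js
// Every non-"a" character can be consumed by either branch of ([^]|[^a]),
// so when the closing quote is missing the engine explores roughly 2^n splits.
const bad18 = /(([^]|[^a])*)"/;
const input = 'b'.repeat(28); // no closing quote anywhere

console.time('bad18');
bad18.test(input); // takes noticeably long; grows exponentially with the repeat count
console.timeEnd('bad18');
```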
// TODO: also supposed to match \f and vertical tab (\x0B).
char = [" ", "\t", "\r", "\n"]
(
char = [" ", "\t", "\r", "\n", "\\u000c", "\\u000b"]
Given the trick below (which lgtm), do we gain anything from having the strings `\u000c` and `\u000b` in this list?
We don't.
It was a mistake from my end to include those entries in that list.
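For reference, a quick check in Node (not part of the PR) confirming the TODO's claim that JavaScript's `\s` matches form feed and vertical tab:

```js
console.log(/\s/.test('\f'));   // true: form feed is matched by \s
console.log(/\s/.test('\x0B')); // true: vertical tab is matched by \s
```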
@@ -777,6 +783,7 @@ predicate isPumpable(State fork, string w) {
*/
State process(State fork, string w, int i) {
isPumpable(fork, w) and
min(string s | isPumpable(fork, s)).prefix(w.length()) = w and
This could do with a comment. So we're only processing strings that are prefixes of the minimum pumpable string. Why is that? And how is this different from requiring that `w` is the minimum pumpable string? (Given that we already use `i` to address a specific prefix of `w`, we're not at risk of losing the base case.)
That one got removed later, exactly because of the reasons you mention. (See the change in the `select`, where I moved some `not` out of the `min(..)`.)
We did in fact only look at the minimum pumpable string, so we did not lose any results when this change was made.
* Gets any char that could possibly be matched by a regular expression.
* Includes all printable ascii chars, all constants mentioned in a regexp, and all chars matches by the regexp `/\s|\d|\w/`.
*/
private string getAnyPossiblyMatchedChar() {
Not exactly true, as the internal values of long non-ascii intervals aren't included (nor should they be). Maybe just `getARelevantChar`?
@@ -873,7 +878,7 @@ from RegExpTerm t, string c, int i
where
c = min(string w | isPumpable(Match(t, i), w)) and
not isPumpable(epsilonSucc+(Match(t, i)), _) and
not epsilonSucc*(process(Match(t, i), c, c.length() - 1)) = Accept(_)
not epsilonSucc*(process(Match(t, i), c, [0 .. c.length() - 1])) = Accept(_)
Could `[0 .. c.length() - 1]` simply be replaced with `_` here?
Also I'm not sure I understand the change. The condition is that there must exist a suffix `x` that causes the string to be rejected after matching many repetitions of `w`. The commit message reads:
remove false positives where the analysis would wrongly conclude that the accept state could not be reached
The analysis isn't trying to prove that the accept state can't be reached. It's trying to prove that for some input, an accept state can't be reached. Unfortunately this is hard in an NFA due to non-determinism. (Simply checking if a rejecting state is reachable doesn't work, as for a given input string there may be another choice of edges that leads to an accept state.)
Originally we just checked that the empty string is rejected after pumping. That guarantees that such a suffix exists (the empty string fits the bill), so the check can't cause FPs but may have FNs.
With this change, we're now checking multiple strings (all the prefixes of `w`) and saying that they must all be rejected. That's not right. We only need one such rejecting suffix. It would be sufficient to check that one of them is rejected. The check now causes more FNs than before, and it didn't have any FPs to begin with.
The commit message talks about fixing FPs but modifies a check that couldn't cause FPs. I'd therefore argue that the root cause for these FPs is to be found elsewhere, and this change is hiding them for the wrong reason (due to its increased FN rate).
I think we should try to hunt down the remaining FPs (see the other comment regarding anchors) and then revisit this and see if it can be reverted once the root cause is fixed.
Could `[0 .. c.length() - 1]` simply be replaced with `_` here?
Yes.
and it didn't have any FPs to begin with.
It did have FPs.
And I don't think we got any FNs due to this change (we lost results in the test case, but those were FPs, even though some were marked as TPs).
(I might not have worded the commit message in the best way.)
What I'm doing is checking if the accepting state can be reached while processing the witness string (repeatedly).
Consider this regexp: `/(A*A*X)*/`.
If we change `[0 .. c.length() - 1]` back to `c.length() - 1`, then the analysis concludes that it can have exponential backtracking with the witness string `AXA`.
If the witness string is processed, then we end up in the state that is about to match `X`, which is not an accepting state. But while processing `AXA` we "go through" an accepting state.
The JavaScript regular expression evaluator appears to be greedy, because it will not do catastrophic backtracking if an accepting state has been reached at some point.
I cannot construct a string that will cause the above regexp to perform badly. (I can get quadratic runtime if the `g` flag is set.)
If we instead consider the regexp `/(A*A*X)*Y/` or `/(A*A*X)*$/` then it is trivial to construct a badly performing string.
I don't think I've missed anything, but I'm happy to be proved wrong if you can construct a string where performance is exponential in the length when matched against `/(A*A*X)*/`.
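A rough way to see the difference (a sketch, not from the PR; exact timings depend on the engine and machine):

```js
const input = 'AXA'.repeat(15); // repetitions of the witness string "AXA"

console.time('no suffix');
/(A*A*X)*/.test(input); // fast: the engine can stop at an accepting position
console.timeEnd('no suffix');

console.time('with suffix');
/(A*A*X)*Y/.test(input); // slow: no "Y" exists, so every way of splitting the A-runs
                         // between the two A*'s is explored before rejecting;
                         // increase the repeat count to make the blow-up dramatic
console.timeEnd('with suffix');
```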
If we change [0 .. c.length() - 1] back to c.length() - 1, then the analysis concludes that it can have exponential backtracking with the witness string AXA.
Yes, but I didn't think this check was the root cause of that FP. I saw the check and thought it was suspicious that it didn't match the theory laid out at the top of the query. It wasn't obvious why passing through an accept state mattered here. In such cases, it often pays off to get to the bottom of it, even if it's tempting to just use what appears to work and move on.
After giving it some more thought, I believe it has to do with how we model anchors (or lack of anchors). There's an ambiguity around what the accept state means. Because of our loose handling of anchors, being in the accept state could mean one of two things:
- I accept if the string ends here (the text book interpretation of the accept state), or
- I can just stop matching here if I want (the unanchored regexp interpretation).
The old check made sense assuming (1), but given how we map JS regexps to an NFA, (2) is closer to reality.
If we were to introduce proper handling of anchoring, I think the old check would need to be reinstated, but with the current NFA construction, we need to emulate the (2) interpretation. I see that the modified check here was a way to do that, and I'm happy if you want to leave it at that (but please explain in a comment). A more direct way to do the same might be to add an `Any()` transition from the accept state to itself.
As an aside, we could also try to implement better anchor support. That might be well out of scope for this PR, but just a few thoughts:
We could introduce a new state `MkAcceptAnySuffix()` which accepts an arbitrary suffix from that point on. It can be modelled via an epsilon transition to `MkAccept()` and an `Any()` edge to itself.
With that state in hand, we have a way to model anchors:
- At a `$` anchor, add a transition to `MkAccept()` (which only succeeds if the string ends here and now).
- At the end of the regexp, transition to `MkAcceptAnySuffix()`.
We might lose some TPs though, since only testing the empty string probably won't be enough anymore. Maybe the empty string check only works so well because $-anchored regexps have an unreachable accept state.
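The anchor effect is easy to reproduce with the textbook example (not from the PR; timings vary by engine):

```js
const s = 'a'.repeat(26) + '!';

/(a+)+/.test(s);  // fast: the engine can stop as soon as it has matched some a's
/(a+)+$/.test(s); // slow: the $ forces every way of splitting the run of a's to be
                  // tried and rejected, which grows exponentially with the repeat count
```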
Great writeup ❤️
Yes, I had to some extent overlooked how the accept state works compared to the theory.
I see that the modified check here was a way to do that, and I'm happy if you want to leave it at that (but please explain in a comment)
I think we should keep the implementation as is for now, and I'll put in a comment at the `process()` predicate about which assumptions it makes about the accept state.
I'll look into anchors again sometime after merging this PR.
Note: I'm somewhat sure we don't need to model the start anchor `^`.
A full run on LGTM revealed 12 projects that failed for some reason. Our search space would sometimes blow up with infeasible state pairs when there were cycles where the minimum length was long, for example with `/\s{0,3}(\s*<em class="gbifHl">\s*<\/em>\s*)+([^\s<]+\s){0,2}([^\s<]*)/`. Limiting the search space to only the state pairs from which it is feasible to reach the "goal" within the given length solved the issue.
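A generic sketch of that pruning idea (not the QL implementation; the names `shortestDistancesTo`, `reverseEdges`, and `feasible` are made up for illustration):

```js
// Precompute, for every state, the shortest distance to the goal state,
// then refuse to expand any state whose remaining length budget cannot cover it.
function shortestDistancesTo(goal, reverseEdges) {
  // BFS over reversed edges gives dist(state -> goal) for every state.
  const dist = new Map([[goal, 0]]);
  const queue = [goal];
  while (queue.length > 0) {
    const s = queue.shift();
    for (const pred of reverseEdges.get(s) ?? []) {
      if (!dist.has(pred)) {
        dist.set(pred, dist.get(s) + 1);
        queue.push(pred);
      }
    }
  }
  return dist;
}

function feasible(state, stepsLeft, distToGoal) {
  // Only keep states from which the goal is still reachable within the budget.
  const d = distToGoal.get(state);
  return d !== undefined && d <= stepsLeft;
}
```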
If you're looking for more ways to prune, perhaps we could disregard roots whose only repetition is the root itself, or is at the end of a sequence at the root. In this case, it can't possibly reject after matching (in terms of our NFA, it will always pass through the accept state when looping around).
I am. But after that last I've added your suggestion to things I'll look at for that followup.
A large evaluation came back fine.
I got prefix/suffix construction (and another round of performance improvements) ready on another branch. I can add them here, or wait and put them in another PR.
This seems ready to land, so let's merge this first and do the other improvements in a new PR.
Gets a TP/TN for CVE-2020-7662 and CVE-2018-3738.
Adds support for character classes in ReDoS (the existing support was very basic).
We can now flag a regexp like `/((\Sx|[\da-z])*)"/`, which will do exponential backtracking on many repetitions of a string like `"9x"`.
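For illustration, a small Node snippet (not part of the PR; exact timings depend on the engine):

```js
const evil = /((\Sx|[\da-z])*)"/;
const input = '9x'.repeat(26); // no closing quote, so the match must eventually fail

console.time('backtracking');
evil.test(input); // each "9x" can be consumed as one \Sx step or as two [\da-z] steps,
                  // so roughly 2^26 splits are explored before the engine gives up
console.timeEnd('backtracking');
```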
The first commit removes a redundant check that had no influence on the final result of the query. (Removing that check had no effect on test results or performance.)
Also does some performance improvements along the way. Without the performance improvements, the query would perform at least 15X worse on our test case.
Some of the commits have `// TODO:` in them; that is on purpose. (The final commit should have none.)
Each commit should be understandable on its own, and going through them in order should help understand the implementation.
Note: some of the code is refactored in later commits.
An evaluation looks good in terms of performance.
The total time taken is pretty close to being exactly the same, with some benchmarks being slower and some faster.
This commit made performance twice as bad on our test case, but apparently it is not bad in the real world.
There are many new results - overwhelmingly TPs.
And I haven't found a lost TP (some of them are still reported with a different witness string).