-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Refactorizations of the ReDoS libraries #9422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Ping @ginsbach : Parameterised modules in here. |
Thanks for pinging. The use of parameterised modules here looks good to me, no concerns! |
I've rebased on the latest I've also re-enabled the QL-for-QL consistency- and dead-code-query in this PR. |
…ttern for js/polynomial-redos
I did a merge with @github/codeql-python can I get a review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found 27 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really love this work (although it could arguably have been in several PRs).
The requested changes are all minor.
*/ | ||
pragma[noinline] | ||
private predicate isLikelyRejectable(State s) { | ||
s = stateInPumpableRegexp() and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we cannot define classes inside parameterised modules yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** | ||
* Holds if repetitions of `pump` at `t` will cause polynomial backtracking. | ||
*/ | ||
predicate polynimalReDoS(RegExpTerm t, string pump, string prefixMsg, RegExpTerm prev) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This predicate should probably be renamed to polynomialReDoS
everywhere. Although, while the code looks new here, I think this name was not introduced in this PR.
/** DEPRECATED: Alias for HtmlMatchingRegExp */ | ||
deprecated class HTMLMatchingRegExp = HtmlMatchingRegExp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we OK to delete this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I'll reintroduce it.
It's in a Query.qll
so it's likely fine, but I'll get it back anyway.
( | ||
test(reg, str, ignorePrefix) | ||
or | ||
testWithGroups(reg, str, ignorePrefix) | ||
) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
( | |
test(reg, str, ignorePrefix) | |
or | |
testWithGroups(reg, str, ignorePrefix) | |
) and | |
isCandidate(reg, str, ignorePrefix, _) | |
and |
* A module for determining if a regexp matches a given string, | ||
* and reasoning about which capture groups are filled by a given string. | ||
* | ||
* The module parameter `isRegexpMatchingCandidateSig` determines which string should be tested, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* The module parameter `isRegexpMatchingCandidateSig` determines which string should be tested, | |
* The module parameter `isCandidate` determines which strings should be tested, |
} | ||
|
||
/** | ||
* Holds if `reg` matches `str`, where `str` is either in the `test` or `testWithGroups` predicate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Holds if `reg` matches `str`, where `str` is either in the `test` or `testWithGroups` predicate. | |
* Holds if `reg` matches `str`, where `str` is in the `isCandidate` predicate. |
private predicate isRelevant(Node n) { | ||
isARelevantEnd(n) | ||
or | ||
exists(Node prev | isRelevant(prev) | n = getPrev(prev)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exists(Node prev | isRelevant(prev) | n = getPrev(prev)) | |
exists(Node succ | isRelevant(succ) | n = getPrev(succ)) |
/** Gets an ancestor of `end`, where `end` is a node that should have a result in `concretize`. */ | ||
private Node getANodeInLongChain(Node end) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this just be called getAnAncestor
? And could it be written via prev*
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it can be written like that, I don't remember if I wrote the recursion manually for a reason 🤷
I'll do another evaluation to check.
/** Gets the `i`th character on the path from the root to `n`. */ | ||
pragma[noinline] | ||
private string getPrefixChar(Node n, int i) { | ||
exists(Node prev | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename prev
to ancestor
(or an abbreviation thereof)
predicate reachesOnlyRejectableSuffixes(State fork, string w) { | ||
isReDoSCandidate(fork, w) and | ||
forex(State next | next = process(fork, w, w.length() - 1) | isLikelyRejectable(next)) and | ||
not epsilonSucc*(getProcessPrevious(fork, _, w)) = AcceptAnySuffix(_) // we stop `process(..)` early if we can, check here if it happened. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can rewrite it like this
not epsilonSucc*(getProcessPrevious(fork, _, w)) = AcceptAnySuffix(_) // we stop `process(..)` early if we can, check here if it happened. | |
not getProcessPrevious(fork, _, w) = acceptsAnySuffix() // we stop `process(..)` early if we can, check here if it happened. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found 27 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found 20 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found 27 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found 20 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found 27 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Yep, looks great :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found 27 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found 20 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found 20 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found 27 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
The evaluations are all looking good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found 20 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found 27 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found 20 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found 27 potential problems in the proposed changes. Check the Files changed tab for more details.
Many things are happening in this PR, all of which are outlined below.
Commit-by-commit review is highly recommended.
All the tests should pass on every commit.
1: Disable parts of QL-for-QLThe consistency andql/dead-code
queries are very noisy until we have proper support for parameterized modules.So I've disabled those.
No longer needed now that #9575 has been merged.
2: Introduce a
ReDoSPruning
parameterized moduleThe
class Config extends string
pattern used previously was ugly.This is just a straight-forward conversion of that pattern into a parameterized module.
3: Implement a more efficient algorithm for computing strings from chains
Based on @hvitved's work: #8589
But moved into a parameterized module in order to share more code between exponential and superlinear ReDoS.
It does the same thing as @hvitved's implementation, but in a slightly different way (to make a pretty signature module).
4: add further normalization of char classses
Inspired by @esbena: #4847 (comment)
Now
\d
and[\d]
and[0-9]
will literally be represented with a single value.(an arbitrary one of them is chosen as the canonical representative).
The extra normalization has a real, but limited, impact.
The largest impact is on the size of the
SuperlinearBackTracking::getAThreewayIntersect
predicate, the size of that predicate drops by ~15% on large JS benchmarks.I also drive-by fixed a bug related to case-normalization.
(The bug didn't affect anything before the changes in this PR).
5: make a parameterized module out of the RegexpMatching implementation
I didn't like the configuration pattern used previously, which was also why I hadn't made a real library out of it.
It can be better with parameterized modules, so I did that.
6: Rename
ReDoSUtil
toNfaUtils
, and rename theperformance
folder toregexp
.The
ReDoSUtil
file was used for more than just ReDoS: it's a general library for constructing and reasoning about NFAs.And the
performance
folder was also a bad name, so I renamed it toregexp
.7: Improve performance.
Lots of states in the
process()
predicate could be eliminated.Because we know that if a regexp ever "reaches" the accept-all state, then it can never be rejected, and thus we can skip them early.
I've also ported this exponential-redos fix from Shack to the superlinear library.
Evaluations.
All look good, and mostly show a significant speedup (except for Python, which has neutral performance).
JavaScript, Ruby, Java, Python.
Previous version of this PR, which got filled up with noise: #8522