Skip to content

Refactorizations of the ReDoS libraries #9422

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Aug 16, 2022
Merged

Conversation

erik-krogh
Copy link
Contributor

@erik-krogh erik-krogh commented Jun 2, 2022

Many things are happening in this PR, all of which are outlined below.

Commit-by-commit review is highly recommended.
All the tests should pass on every commit.

1: Disable parts of QL-for-QL

The consistency and ql/dead-code queries are very noisy until we have proper support for parameterized modules.
So I've disabled those.

No longer needed now that #9575 has been merged.

2: Introduce a ReDoSPruning parameterized module

The class Config extends string pattern used previously was ugly.
This is just a straight-forward conversion of that pattern into a parameterized module.

3: Implement a more efficient algorithm for computing strings from chains

Based on @hvitved's work: #8589

But moved into a parameterized module in order to share more code between exponential and superlinear ReDoS.
It does the same thing as @hvitved's implementation, but in a slightly different way (to make a pretty signature module).

4: add further normalization of char classses

Inspired by @esbena: #4847 (comment)

Now \d and [\d] and [0-9] will literally be represented with a single value.
(an arbitrary one of them is chosen as the canonical representative).

The extra normalization has a real, but limited, impact.
The largest impact is on the size of the SuperlinearBackTracking::getAThreewayIntersect predicate, the size of that predicate drops by ~15% on large JS benchmarks.

I also drive-by fixed a bug related to case-normalization.
(The bug didn't affect anything before the changes in this PR).

5: make a parameterized module out of the RegexpMatching implementation

I didn't like the configuration pattern used previously, which was also why I hadn't made a real library out of it.
It can be better with parameterized modules, so I did that.

6: Rename ReDoSUtil to NfaUtils, and rename the performance folder to regexp.

The ReDoSUtil file was used for more than just ReDoS: it's a general library for constructing and reasoning about NFAs.
And the performance folder was also a bad name, so I renamed it to regexp.

7: Improve performance.

Lots of states in the process() predicate could be eliminated.
Because we know that if a regexp ever "reaches" the accept-all state, then it can never be rejected, and thus we can skip them early.

I've also ported this exponential-redos fix from Shack to the superlinear library.


Evaluations.

All look good, and mostly show a significant speedup (except for Python, which has neutral performance).
JavaScript, Ruby, Java, Python.


Previous version of this PR, which got filled up with noise: #8522

@aschackmull
Copy link
Contributor

Ping @ginsbach : Parameterised modules in here.

@aschackmull aschackmull requested a review from ginsbach June 13, 2022 10:30
@ginsbach
Copy link
Contributor

Thanks for pinging. The use of parameterised modules here looks good to me, no concerns!

@erik-krogh
Copy link
Contributor Author

I've rebased on the latest main now that QL-for-QL supports parameterized modules: #9575

I've also re-enabled the QL-for-QL consistency- and dead-code-query in this PR.

@erik-krogh
Copy link
Contributor Author

I did a merge with main to fix a merge conflict.

@github/codeql-python can I get a review?

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 27 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really love this work (although it could arguably have been in several PRs).

The requested changes are all minor.

*/
pragma[noinline]
private predicate isLikelyRejectable(State s) {
s = stateInPumpableRegexp() and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we cannot define classes inside parameterised modules yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.
Screenshot 2022-08-12 at 19 53 56

I can only extend database types inside a parameterised module.
And that's not feasible here.

/**
* Holds if repetitions of `pump` at `t` will cause polynomial backtracking.
*/
predicate polynimalReDoS(RegExpTerm t, string pump, string prefixMsg, RegExpTerm prev) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This predicate should probably be renamed to polynomialReDoS everywhere. Although, while the code looks new here, I think this name was not introduced in this PR.

Comment on lines 207 to 208
/** DEPRECATED: Alias for HtmlMatchingRegExp */
deprecated class HTMLMatchingRegExp = HtmlMatchingRegExp;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we OK to delete this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'll reintroduce it.
It's in a Query.qll so it's likely fine, but I'll get it back anyway.

Comment on lines 53 to 57
(
test(reg, str, ignorePrefix)
or
testWithGroups(reg, str, ignorePrefix)
) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(
test(reg, str, ignorePrefix)
or
testWithGroups(reg, str, ignorePrefix)
) and
isCandidate(reg, str, ignorePrefix, _)
and

* A module for determining if a regexp matches a given string,
* and reasoning about which capture groups are filled by a given string.
*
* The module parameter `isRegexpMatchingCandidateSig` determines which string should be tested,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* The module parameter `isRegexpMatchingCandidateSig` determines which string should be tested,
* The module parameter `isCandidate` determines which strings should be tested,

}

/**
* Holds if `reg` matches `str`, where `str` is either in the `test` or `testWithGroups` predicate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Holds if `reg` matches `str`, where `str` is either in the `test` or `testWithGroups` predicate.
* Holds if `reg` matches `str`, where `str` is in the `isCandidate` predicate.

private predicate isRelevant(Node n) {
isARelevantEnd(n)
or
exists(Node prev | isRelevant(prev) | n = getPrev(prev))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
exists(Node prev | isRelevant(prev) | n = getPrev(prev))
exists(Node succ | isRelevant(succ) | n = getPrev(succ))

Comment on lines 1301 to 1302
/** Gets an ancestor of `end`, where `end` is a node that should have a result in `concretize`. */
private Node getANodeInLongChain(Node end) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be called getAnAncestor? And could it be written via prev*?

Copy link
Contributor Author

@erik-krogh erik-krogh Aug 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it can be written like that, I don't remember if I wrote the recursion manually for a reason 🤷

I'll do another evaluation to check.

/** Gets the `i`th character on the path from the root to `n`. */
pragma[noinline]
private string getPrefixChar(Node n, int i) {
exists(Node prev |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename prev to ancestor (or an abbreviation thereof)

predicate reachesOnlyRejectableSuffixes(State fork, string w) {
isReDoSCandidate(fork, w) and
forex(State next | next = process(fork, w, w.length() - 1) | isLikelyRejectable(next)) and
not epsilonSucc*(getProcessPrevious(fork, _, w)) = AcceptAnySuffix(_) // we stop `process(..)` early if we can, check here if it happened.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can rewrite it like this

Suggested change
not epsilonSucc*(getProcessPrevious(fork, _, w)) = AcceptAnySuffix(_) // we stop `process(..)` early if we can, check here if it happened.
not getProcessPrevious(fork, _, w) = acceptsAnySuffix() // we stop `process(..)` early if we can, check here if it happened.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 27 potential problems in the proposed changes. Check the Files changed tab for more details.

@erik-krogh erik-krogh requested a review from yoff August 12, 2022 18:39
@erik-krogh
Copy link
Contributor Author

@yoff I think I've addressed your review comments, and I also addressed some additional documentation issues identified by QL-for-QL.

The remaining QL-for-QL alerts are false positives that will be fixed by #9884

I've started new evaluations for JavaScript, Python, Java, and Ruby.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 27 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 27 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@yoff
Copy link
Contributor

yoff commented Aug 13, 2022

Yep, looks great :-)

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 27 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 27 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@erik-krogh
Copy link
Contributor Author

I've started new evaluations for JavaScript, Python, Java, and Ruby.

The evaluations are all looking good.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 27 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 27 potential problems in the proposed changes. Check the Files changed tab for more details.

@erik-krogh erik-krogh merged commit f106e06 into github:main Aug 16, 2022
@github github deleted a comment Sep 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants