-
Couldn't load subscription status.
- Fork 1.8k
C++: Range analysis measure bounds #20645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ab09ae5 to
4864e82
Compare
| /** | ||
| * Finds any expression that has a lower bound, but where `nrOfBounds` does | ||
| * not compute an estimate. | ||
| */ |
Check warning
Code scanning / CodeQL
Missing QLDoc for parameter Warning
Check warning
Code scanning / CodeQL
Missing QLDoc for parameter Warning
0ac00f8 to
ab836bb
Compare
ab836bb to
9502d83
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR addresses performance issues in the C++ simple range analysis library by implementing a pre-analysis to estimate bounds growth and applying widening when the estimate exceeds a threshold. The solution prevents combinatorial explosions that could cause analysis timeouts.
Key changes:
- Adds bounds estimation logic (
BoundsEstimatemodule) that estimates potential bounds count before running full analysis - Implements selective widening based on bounds estimates to prevent performance issues
- Updates test cases to reflect new analysis behavior with widening applied to expressions with many estimated bounds
Reviewed Changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| SimpleRangeAnalysis.qll | Core implementation adding bounds estimation module and widening logic |
| test.c | New test cases demonstrating combinatorial explosion scenarios |
| upperBound.expected | Updated expected results reflecting widening behavior |
| lowerBound.expected | Updated expected results reflecting widening behavior |
| ternaryUpper.expected | Updated expected results for ternary expressions |
| ternaryLower.expected | Updated expected results for ternary expressions |
| nrOfBounds.ql | New test query for bounds estimation debugging |
Comments suppressed due to low confidence (1)
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll:1
- Corrected spelling of 'anncuracies' to 'inaccuracies'.
/**
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strategy seems sensible to me (but @MathiasVP has spent much more time working with this library). I will review the (second) DCA run when it finishes.
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll
Outdated
Show resolved
Hide resolved
cpp/ql/test/library-tests/rangeanalysis/SimpleRangeAnalysis/lowerBound.expected
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll
Outdated
Show resolved
Hide resolved
|
Thanks for the review @geoffw0 with some great catches 👍. I've applied your suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments. Thanks a lot for this, Simon!
I think it would be good to add an inline expectations test which shows the number of bounds for a given expression. This would also make it easier to spot problems with non-functionality of nrOfBoundsExpr. Would you mind adding such an inline expectations test while you're here?
| float getBoundsLimit() { | ||
| // This limit is arbitrary, but low enough that it prevents timeouts on | ||
| // specific observed customer databases (and the in the tests). | ||
| result = 2.0.pow(40) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a perfectly fine threshold to start with. FWIW when I introduce an "arbitrary threshold" like this I like to do a small amount of investigation into the underlying distribution. See for example what I did in this PR from last year where I plotted "number of nested bitwise operations" for each bitwise operation in a database. It would be interesting to see a similar plot for "number of bounds" for each expression on a database or two.
... but as I said: I think this very high arbitrary threshold is perfectly fine as a start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. @andersfugmann also suggested that we could create a statistics query and potentially get telemetry for this to make a more quantified upper bound.
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll
Outdated
Show resolved
Hide resolved
| or | ||
| exists(ConditionalExpr condExpr | | ||
| e = condExpr and | ||
| result = nrOfBoundsExpr(condExpr.getThen()) * nrOfBoundsExpr(condExpr.getElse()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This likely doesn't give us the best possible join ordering since this recursion is non-linear, but we've never bothered to actually fix this in the main recursion of SimpleRangeAnalysis itself so it's probably all fine.
|
I've kicked off another DCA run for good measure, but assuming that one doesn't show anything then I think we're good to merge? There is the option of doing a QA run as well. But the change seems low enough of a risk that that's worth it? Thoughts? |
|
There's some failures in DCA now and retrying didn't make them go away. Looking at the errors, they don't really look like they're caused by the QL changes. |
Those errors were fixed late yesterday afternoon. Could you re-run? |
|
Thanks @jketema. I triggered retries 2 hours ago, should I start a new DCA run or just do the retry commands again? |
|
I don't think retries work in this case: you'll need to start a new experiment. |
|
I think this is ready to merge now. Please double-check that y'all agree with my assessment of the DCA report. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DCA LGTM.
@MathiasVP were all your questions addressed?
The only thing I am missing is this part of my review:
|
|
I agree that inline expectations would be nice (perhaps even more so for the lower/upper bounds themselves). But, is it ok if we don't do that for now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additional test can be done as follow-up.
This PR intends to address (some) performance issues in the simple range analysis library.
The basic idea is rather simpler:
e1 + e2is number the number of bounds fore1times the number of bounds fore2.Note to reviewers:
nrOfBoundsExpris a good place to start.