@paldepind
Contributor

@paldepind paldepind commented Oct 15, 2025

This PR intends to address (some) performance issues in the simple range analysis library.

The basic idea is rather simple:

  1. Do a simple pre-analysis that tries to estimate how many bounds the range analysis would produce for a given expression. For instance, the number of bounds for e1 + e2 is the number of bounds for e1 times the number of bounds for e2.
  2. If the estimate is large for a given expression then we turn on widening for that expression. This ensures that when range analysis runs the estimated blowup does not happen.
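
As a rough illustration of the two steps above (not the QL in this PR), here is a minimal Python sketch. The `Leaf` and `Add` classes, the function names, and the strict `>` comparison are assumptions made for the example; only the product rule for e1 + e2 and the 2^40 limit come from this thread.

```python
from dataclasses import dataclass
from typing import Union

# The limit quoted later in this thread (getBoundsLimit) is 2^40.
BOUNDS_LIMIT = 2.0 ** 40


@dataclass(frozen=True)
class Leaf:
    """Hypothetical leaf expression with a directly known number of bounds."""
    bounds: float


@dataclass(frozen=True)
class Add:
    """Hypothetical node for a binary addition e1 + e2."""
    left: "Expr"
    right: "Expr"


Expr = Union[Leaf, Add]


def estimate_bounds(e: Expr) -> float:
    """Step 1: estimate how many bounds range analysis would produce for e."""
    if isinstance(e, Leaf):
        return e.bounds
    # For e1 + e2 the estimate is the product of the operands' estimates.
    return estimate_bounds(e.left) * estimate_bounds(e.right)


def should_widen(e: Expr) -> bool:
    """Step 2: turn on widening when the estimate is large (assumed: > limit)."""
    return estimate_bounds(e) > BOUNDS_LIMIT


# 42 leaves with 2 bounds each, joined by 41 additions: estimate is 2^42.
expr: Expr = Leaf(2.0)
for _ in range(41):
    expr = Add(expr, Leaf(2.0))
```

With these definitions, `should_widen(expr)` is true for the long chain, while a small sum such as `Add(Leaf(2.0), Leaf(3.0))` has an estimate of 6 and stays below the limit.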

Note to reviewers:

  • Per commit review is probably easiest.
  • I've tried to thoroughly document stuff in comments (which I won't repeat here). The comment for nrOfBoundsExpr is a good place to start.
  • A good chunk of the code is quite straightforward. The tricky bits are around how phi nodes are handled. Things are a bit heuristic-y there, more optimal things might be possible, and there are a few details about the phi nodes here that I'm still a bit puzzled by. I've tried to add comments, but I think it's ok if a reviewer doesn't follow all the details there.
  • The limit beyond which widening is turned on is very large. This is to be conservative. We could lower it later.

@github-actions github-actions bot added the C++ label Oct 15, 2025
@paldepind paldepind force-pushed the cpp/range-analysis-measure branch 3 times, most recently from ab09ae5 to 4864e82 Compare October 16, 2025 10:44
/**
* Finds any expression that has a lower bound, but where `nrOfBounds` does
* not compute an estimate.
*/

Check warning (Code scanning / CodeQL): Missing QLDoc for parameter
The QLDoc has no documentation for e, but the QLDoc mentions nrOfBounds

Check warning (Code scanning / CodeQL): Missing QLDoc for parameter
The QLDoc has no documentation for n, but the QLDoc mentions nrOfBounds
@paldepind paldepind force-pushed the cpp/range-analysis-measure branch from 0ac00f8 to ab836bb Compare October 16, 2025 12:17
@paldepind paldepind force-pushed the cpp/range-analysis-measure branch from ab836bb to 9502d83 Compare October 16, 2025 13:07
@paldepind paldepind marked this pull request as ready for review October 16, 2025 13:47
Copilot AI review requested due to automatic review settings October 16, 2025 13:47
@paldepind paldepind requested a review from a team as a code owner October 16, 2025 13:47
Contributor

Copilot AI left a comment

Pull Request Overview

This PR addresses performance issues in the C++ simple range analysis library by implementing a pre-analysis to estimate bounds growth and applying widening when the estimate exceeds a threshold. The solution prevents combinatorial explosions that could cause analysis timeouts.

Key changes:

  • Adds bounds estimation logic (BoundsEstimate module) that estimates potential bounds count before running full analysis
  • Implements selective widening based on bounds estimates to prevent performance issues
  • Updates test cases to reflect new analysis behavior with widening applied to expressions with many estimated bounds

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.

File                      Description
SimpleRangeAnalysis.qll   Core implementation adding bounds estimation module and widening logic
test.c                    New test cases demonstrating combinatorial explosion scenarios
upperBound.expected       Updated expected results reflecting widening behavior
lowerBound.expected       Updated expected results reflecting widening behavior
ternaryUpper.expected     Updated expected results for ternary expressions
ternaryLower.expected     Updated expected results for ternary expressions
nrOfBounds.ql             New test query for bounds estimation debugging
Comments suppressed due to low confidence (1)

cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll:1

  • Corrected spelling of 'anncuracies' to 'inaccuracies'.
/**

Contributor

@geoffw0 geoffw0 left a comment

Strategy seems sensible to me (but @MathiasVP has spent much more time working with this library). I will review the (second) DCA run when it finishes.

@paldepind
Contributor Author

Thanks for the review @geoffw0 with some great catches 👍. I've applied your suggestions.

Contributor

@MathiasVP MathiasVP left a comment

A few minor comments. Thanks a lot for this, Simon!

I think it would be good to add an inline expectations test which shows the number of bounds for a given expression. This would also make it easier to spot problems with non-functionality of nrOfBoundsExpr. Would you mind adding such an inline expectations test while you're here?

float getBoundsLimit() {
// This limit is arbitrary, but low enough that it prevents timeouts on
// specific observed customer databases (and in the tests).
result = 2.0.pow(40)
Contributor

I think this is a perfectly fine threshold to start with. FWIW when I introduce an "arbitrary threshold" like this I like to do a small amount of investigation into the underlying distribution. See for example what I did in this PR from last year where I plotted "number of nested bitwise operations" for each bitwise operation in a database. It would be interesting to see a similar plot for "number of bounds" for each expression on a database or two.

... but as I said: I think this very high arbitrary threshold is perfectly fine as a start.
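
One way such a distribution could be summarized is to bucket the per-expression estimates by power of two. The following is only a sketch: the function name and the sample numbers are made up for illustration, and in practice the estimates would come from a query over a real database.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def log2_histogram(estimates: Iterable[float]) -> Dict[int, int]:
    """Count expressions per power-of-two bucket of their bounds estimate."""
    buckets: Counter = Counter()
    for n in estimates:
        # An estimate n >= 1 falls in bucket floor(log2(n)).
        buckets[math.floor(math.log2(n))] += 1
    return dict(sorted(buckets.items()))


# Made-up sample estimates, purely for illustration.
sample = [1.0, 1.0, 2.0, 3.0, 8.0, 1024.0, 2.0 ** 41]
hist = log2_histogram(sample)
```

Any bucket with a key of 40 or above corresponds to expressions that would exceed the 2^40 limit, so a plot of these counts would show how close typical expressions come to the threshold.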

Contributor Author

That's a good point. @andersfugmann also suggested that we could create a statistics query and potentially get telemetry for this to make a more quantified upper bound.

or
exists(ConditionalExpr condExpr |
e = condExpr and
result = nrOfBoundsExpr(condExpr.getThen()) * nrOfBoundsExpr(condExpr.getElse())
Contributor

This likely doesn't give us the best possible join ordering since this recursion is non-linear, but we've never bothered to actually fix this in the main recursion of SimpleRangeAnalysis itself so it's probably all fine.

@paldepind
Contributor Author

I've kicked off another DCA run for good measure, but assuming that one doesn't show anything then I think we're good to merge?

There is the option of doing a QA run as well. But the change seems low enough of a risk that I doubt that's worth it. Thoughts?

@paldepind
Contributor Author

There's some failures in DCA now and retrying didn't make them go away. Looking at the errors, they don't really look like they're caused by the QL changes.

@jketema
Contributor

jketema commented Oct 22, 2025

There's some failures in DCA now and retrying didn't make them go away. Looking at the errors, they don't really look like they're caused by the QL changes.

Those errors were fixed late yesterday afternoon. Could you re-run?

@paldepind
Contributor Author

Thanks @jketema. I triggered retries 2 hours ago, should I start a new DCA run or just do the retry commands again?

@jketema
Contributor

jketema commented Oct 22, 2025

I don't think retries work in this case: you'll need to start a new experiment.

@paldepind paldepind requested a review from geoffw0 October 23, 2025 12:10
@paldepind
Contributor Author

I think this is ready to merge now. Please double-check that y'all agree with my assessment of the DCA report.

Contributor

@geoffw0 geoffw0 left a comment

DCA LGTM.

@MathiasVP were all your questions addressed?

@MathiasVP
Contributor

@MathiasVP were all your questions addressed?

The only thing I am missing is this part of my review:

I think it would be good to add an inline expectations test which shows the number of bounds for a given expression. This would also make it easier to spot problems with non-functionality of nrOfBoundsExpr. Would you mind adding such an inline expectations test while you're here?

@paldepind
Contributor Author

I agree that inline expectations would be nice (perhaps even more so for the lower/upper bounds themselves). But, is it ok if we don't do that for now?

Contributor

@geoffw0 geoffw0 left a comment

The additional test can be done as follow-up.

@paldepind paldepind merged commit a0a6f28 into github:main Oct 24, 2025
16 checks passed
@paldepind paldepind deleted the cpp/range-analysis-measure branch October 24, 2025 13:30