C++: Add path-sensitivity to StackVariableReachability
#6004
07cbec3 to 4510083
Most of the LGTM false positives we lose seem to be cases of There are no new results. The CPP-Differences results show a lot of changes for a different query, Is there an explanation of the code changes anywhere?
Thanks for taking a look at the results! I'll go over the ones you mention. I did find a bug that caused some false negatives, and I have a fix for that locally that I'm testing now. That might be why we lose some results that you couldn't make sense of. I will add explanatory comments once I know that I can get a handle on performance. The main problem is the recursive CPP-Differences only shows results for
b984d8f to 866e5b2
…tyWithReassignment' by using the old StackVariableReachability predicates that don't care about paths.
…ively pull in 'semmle.code.cpp.Print' when including 'cpp'.
866e5b2 to 14a04ee
I gather you're happy with results and performance now? The results I've seen from this PR are great, but I've still little idea how to approach reviewing the QL changes. What kind of path sensitivity do we now support, and what are the limitations?
Is that necessary? It doesn't seem like a desirable change.
I'm running one last CPP-Differences job to make sure that I didn't mess up anything: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/2080/. It's been running for quite a while now, so maybe I did mess up something 🤔. It should still be fine to start reviewing it at this point, however. As the changes at some point in the past had good performance, I don't expect any performance fixes to involve a major rewrite.
On
For instance, let's say we're analyzing this piece of code:
1. source();
2. if(b)
3.   foo();
4. if(!b)
5.   sink();
and the first pass reports a path
I totally agree. I haven't been able to figure out how to avoid this, though. If you have any ideas on how to fix this I'd very much like to hear them. With that said, no code should really depend on the output of
I would happily jump on a Zoom call at some point so we can discuss the changes. Would that help?
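To make the five-line example earlier in this comment concrete, here is a minimal C++ sketch of the two-stage idea: first enumerate paths while ignoring branch conditions, then keep only the paths whose guards do not contradict each other. It is not the QL from this PR. `Edge`, `exampleCfg`, `walk`, `allPaths` and `feasiblePaths` are hypothetical names, node 6 is an invented exit node, and the sketch assumes that `foo()` does not change `b`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: node ids match the line numbers of the example.
// An edge may carry a guard, i.e. a variable and the value it must have.
struct Edge {
    int from;
    int to;
    std::string guardVar;  // empty means the edge is unconditional
    bool guardValue;
};

using Path = std::vector<int>;

// CFG for: 1. source(); 2. if(b) 3. foo(); 4. if(!b) 5. sink();
// Node 6 is an invented exit node. Assumes foo() does not modify b.
inline std::vector<Edge> exampleCfg() {
    return {
        {1, 2, "", false},
        {2, 3, "b", true},   // foo() runs only when b is true
        {2, 4, "b", false},  // foo() is skipped when b is false
        {3, 4, "", false},
        {4, 5, "b", false},  // sink() runs only when !b holds
        {4, 6, "b", true},
        {5, 6, "", false},
    };
}

// Depth-first enumeration of paths from `node` to `sink` in an acyclic graph.
// With `checkGuards` set, a path is abandoned as soon as it would need the
// same variable to be both true and false.
inline void walk(const std::vector<Edge>& cfg, int node, int sink,
                 bool checkGuards, std::map<std::string, bool> assumed,
                 Path current, std::vector<Path>& out) {
    current.push_back(node);
    assert(current.size() <= cfg.size() + 1);  // would fail on a cyclic graph
    if (node == sink) {
        out.push_back(current);
        return;
    }
    for (const Edge& e : cfg) {
        if (e.from != node) continue;
        std::map<std::string, bool> next = assumed;
        if (checkGuards && !e.guardVar.empty()) {
            auto it = next.find(e.guardVar);
            if (it != next.end() && it->second != e.guardValue) continue;
            next[e.guardVar] = e.guardValue;
        }
        walk(cfg, e.to, sink, checkGuards, next, current, out);
    }
}

// Stage 1: every path, branch conditions ignored.
inline std::vector<Path> allPaths(const std::vector<Edge>& cfg, int source, int sink) {
    std::vector<Path> out;
    walk(cfg, source, sink, false, {}, {}, out);
    return out;
}

// Stage 2: only the paths whose guards are mutually consistent.
inline std::vector<Path> feasiblePaths(const std::vector<Edge>& cfg, int source, int sink) {
    std::vector<Path> out;
    walk(cfg, source, sink, true, {}, {}, out);
    return out;
}
```

Under these assumptions, `allPaths(exampleCfg(), 1, 5)` yields both 1, 2, 3, 4, 5 and 1, 2, 4, 5, while `feasiblePaths(exampleCfg(), 1, 5)` keeps only 1, 2, 4, 5, because the path through `foo()` needs `b` to be true at line 2 and false at line 4.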
Not a review, just curiosity from my side:
int array[] = {0, 1, 0, 1};
char* data = nullptr;
for (int i = 0; i < 4; ++i) {
  int b = array[i];
  if (b) sink(data);
  if (!b) data = source();
}
If yes, how does it work? I tried coming up with a meaningful example that could be plugged into the use-after-free query, but I'm not so sure that works great:
int array[] = {0, 1, 0, 1};
char* data = static_cast<char*>(malloc(10 * sizeof(char)));
for (int i = 0; i < 4; ++i) {
  int b = array[i];
  if (b) use(data);
  if (!b) free(data);
}
However, as there is special loop handling, a test showcasing some of that could be nice?
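The first loop above can be run with stand-ins to see that the flow is real: `b` is reassigned on every iteration, so `!b` holding in one iteration does not rule out `b` holding in the next. The sketch below only executes the loop. It says nothing about how the QL library treats it. `Trace` and `runLoop` are hypothetical names, and a boolean flag stands in for `data` holding a `source()` result.

```cpp
#include <cassert>
#include <vector>

// Records what happens when the loop from the comment is executed.
struct Trace {
    std::vector<int> sourceIterations;       // iterations that call source()
    std::vector<int> taintedSinkIterations;  // iterations where sink() sees a sourced value
};

// Executes the loop body for each element of `array`.
// `dataIsFromSource` is a stand-in for `data` holding a source() result.
inline Trace runLoop(const std::vector<int>& array) {
    Trace t;
    bool dataIsFromSource = false;
    for (int i = 0; i < static_cast<int>(array.size()); ++i) {
        int b = array[i];  // b is redefined on every iteration
        if (b && dataIsFromSource) {
            t.taintedSinkIterations.push_back(i);  // sink(data) with sourced data
        }
        if (!b) {
            dataIsFromSource = true;  // data = source()
            t.sourceIterations.push_back(i);
        }
    }
    // A sink can only see sourced data if source() was called at some point.
    assert(t.taintedSinkIterations.empty() || !t.sourceIterations.empty());
    return t;
}
```

For the array `{0, 1, 0, 1}` from the comment, `runLoop` records `source()` in iterations 0 and 2 and a sourced value reaching `sink()` in iterations 1 and 3, so a feasibility check that carried `!b` from one iteration into the next would wrongly discard this flow.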
My idea was that this would work because we used GVN to compare conditions and so the (This goes back to an old discussion regarding GVN in loops I happened to come across on Slack.)
The code seems to be pulled in by
Which leads me to the likely practical fix - remove
Lots of questions and a few trivial corrections. Not a finished review.
cpp/ql/src/semmle/code/cpp/controlflow/StackVariableReachability.qll
/**
 * Gets a `Condition` that controls `b`.
 *
 * Note: The trivial `True` condition always controls `b`.
To check my understanding, this is a `Condition` that must hold to enter `b`?
Correct. I've spelled this out in the QLDoc in 656ff4a.
…ty.qll Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
…loop variant condition refutes itself across loop iterations.
How does the path sensitivity scale with a large number of paths, or with particularly long paths?
I'm guessing
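For context on why the number of paths matters, here is a small C++ sketch showing that `n` consecutive if/else blocks give 2^n distinct paths, so anything that enumerates paths one by one grows exponentially, while a single pass over the nodes does not. It does not describe how this PR handles the problem. `Dag`, `diamonds` and `countPaths` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Successor lists for a DAG whose nodes are numbered in topological order
// (every edge goes from a lower index to a higher one).
using Dag = std::vector<std::vector<int>>;

// Builds `n` consecutive if/else diamonds:
//   cond -> then -> join and cond -> else -> join,
// where each join node is the cond node of the next diamond.
inline Dag diamonds(int n) {
    assert(n >= 0);
    Dag succ(3 * n + 1);
    for (int k = 0; k < n; ++k) {
        int cond = 3 * k;
        succ[cond] = {cond + 1, cond + 2};  // then-branch, else-branch
        succ[cond + 1] = {cond + 3};        // then -> join
        succ[cond + 2] = {cond + 3};        // else -> join
    }
    return succ;
}

// Counts distinct paths from node 0 to the last node in one pass over the
// nodes, without enumerating the paths. Overflows for 64 or more diamonds.
inline std::uint64_t countPaths(const Dag& succ) {
    assert(!succ.empty());
    std::vector<std::uint64_t> count(succ.size(), 0);
    count[0] = 1;
    for (std::size_t node = 0; node < succ.size(); ++node) {
        for (int next : succ[node]) {
            assert(static_cast<std::size_t>(next) > node);  // topological order
            count[next] += count[node];
        }
    }
    return count.back();
}
```

`countPaths(diamonds(10))` is 1024 and each extra diamond doubles it, which is the usual reason path-sensitive analyses avoid materialising whole paths.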
…ty.qll Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
This is safe to ignore. See the conversation here: #6149 (comment)
There's a lot of code in this PR and some areas have received more attention than others - but I'm starting to feel overall happy with it and looking forward to getting the benefits. The test changes due to I'd like to check:
Thanks for moving this forward, @geoffw0.
I think the performance problems found in https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/2110/ have been fixed by 43bbd4f. But I see that my most recent CPP-Differences run to verify this (from before I went on vacation) failed. I've started a fresh one to remove any doubt: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/2146/.
Only two minor notes from my side that stood out when scrolling over the PR. I've not reviewed the code, it is more than I can grok 😺
My concern seems to have been addressed 👍
private predicate reaches0(ControlFlowNode source, SemanticStackVariable v, ControlFlowNode sink) {
  /*
   * Implementation detail: the predicates in this class are a generalization of
This comment talks about a class - should it be somewhere else?
It's present in the original file here: https://github.com/github/codeql/blob/main/cpp/ql/src/semmle/code/cpp/controlflow/StackVariableReachability.qll#L65. It's really a reminder to whoever is changing the implementation of the predicates. So I think it's in the right place.
One test failure
Ouch! Thanks for catching this. Fixed in 768b3c8.
CPP-Differences failed again, but only on
I'll spin up another one and hope for the best 🤞. Edit: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/2149/
Unfortunately I can't realistically do the
Thanks for trying out the LGTM changes. I can see that it's problematic to test it there.
I do notice the performance impact as well when I test things locally. It doesn't seem to be a big deal on CPP-Differences, though: I think we can conclude that performance is fine (at least on the projects we include in CPP-Differences - who knows what the next dist upgrade will show).
Yeah, most likely it's just that we aren't warming up the cache properly despite our efforts. It's probably fine. But we should keep an eye after this is merged.
I want to get this merged. I'm still a bit nervous about the volume of code, for both review quality (I've done my best) and maintaining it (we've discussed possibly moving the I plan to merge this tomorrow morning if there's no further discussion.
👍
…riablereachability C++: Revert #6004
This PR adds path-sensitivity to `StackVariableReachability`. There is a slight performance regression since we now do the whole forwards-backward path-finding algorithm, followed by a second stage that checks path feasibility. It seems to remove quite a lot of false positives on LGTM for the `cpp/use-after-free` query: https://lgtm.com/query/4215884668081020126/
It does require some join-order gymnastics, but I think I fixed all the bad joins. But let's see what Jenkins says. 🤞
CPP-Differences:
https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/2110/ <-- Bad join.
https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/2114/