C++: Fix more FPs on cpp/invalid-pointer-deref
#12971
cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll
I don't understand why this is correct. What cases are ruled out by switching to a strict dominator, and are we not accidentally excluding cases we do want to report?
For some reason, some phi nodes have phi edges that go to themselves. I checked whether this is a C/C++-specific issue by adding a consistency check here (which I haven't finished yet because I couldn't compile some of the other extractors 😭), and it looks like all languages have phi nodes that depend directly on themselves. Such phi instructions would be ruled out by using non-strict domination. I haven't seen any real-world example that's negatively impacted by the change.
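To make the strict vs. non-strict distinction concrete, here is a minimal C++ sketch (not the CodeQL implementation) that classifies back-edges on a small hypothetical graph. It assumes the usual definition that an edge u -> v is a back-edge when v dominates u. The graph has an entry (0), a loop header (1), a loop body (2) and an exit (3), plus an extra self-edge 1 -> 1 standing in for a phi input that refers to its own phi. The helper names `dominators` and `backEdges` are illustrative only.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Successor lists: succ[u] holds every v with an edge u -> v.
using Graph = std::vector<std::vector<int>>;

// Iterative dominator sets (hypothetical helper):
//   dom(entry) = {entry}
//   dom(v)     = {v} ∪ intersection of dom(p) over all predecessors p of v
std::vector<std::set<int>> dominators(const Graph& succ, int entry) {
  const int n = static_cast<int>(succ.size());
  assert(entry >= 0 && entry < n);

  std::vector<std::vector<int>> pred(n);
  for (int u = 0; u < n; ++u)
    for (int v : succ[u]) pred[v].push_back(u);

  std::set<int> all;
  for (int i = 0; i < n; ++i) all.insert(i);
  std::vector<std::set<int>> dom(n, all);  // start from "everything dominates"
  dom[entry] = {entry};

  bool changed = true;
  while (changed) {
    changed = false;
    for (int v = 0; v < n; ++v) {
      if (v == entry) continue;
      std::set<int> meet = all;
      for (int p : pred[v]) {
        std::set<int> kept;  // meet ∩ dom(p)
        for (int d : meet)
          if (dom[p].count(d)) kept.insert(d);
        meet = std::move(kept);
      }
      meet.insert(v);
      if (meet != dom[v]) {
        dom[v] = std::move(meet);
        changed = true;
      }
    }
  }
  return dom;
}

// u -> v is a back-edge if v dominates u. With strict = true we instead
// require v to strictly dominate u (v dominates u and v != u), so a
// self-edge v -> v is never reported as a back-edge.
std::vector<std::pair<int, int>> backEdges(const Graph& succ,
                                           const std::vector<std::set<int>>& dom,
                                           bool strict) {
  std::vector<std::pair<int, int>> result;
  for (int u = 0; u < static_cast<int>(succ.size()); ++u)
    for (int v : succ[u])
      if (dom[u].count(v) && !(strict && u == v)) result.emplace_back(u, v);
  return result;
}
```

For the graph `{{1}, {1, 2, 3}, {1}, {}}` with entry 0, the non-strict variant reports both the loop edge 2 -> 1 and the self-edge 1 -> 1, while the strict variant reports only 2 -> 1, since no node strictly dominates itself. The simple set-intersection fixpoint is chosen for readability; production analyses typically use faster dominator algorithms.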
So this seems like a more fundamental problem (bug?).
Mayybbbeeee. Or it shows that I'm not fully understanding phi edges 😂. There's a phi node in

void test(int i)
{
  while (i)
  {
    --i;
  }
}

and that seems weird to me, since I'd imagine the input of the phi to be from:
but it looks like the input is from:
So it's not like there's a redundant phi edge.
That's weird. It's also not the story the aliased SSA is telling me:
That's the IR-based SSA analysis (which is not what we use for dataflow). Dataflow uses the shared SSA library, which differs from the IR-based SSA in a few ways. Notably, the shared library introduces phi edges for SSA reads (in addition to SSA writes). And yeah, my expectation of what SSA is matches the output of the IR-based SSA, but not the shared SSA library's output 😄.
Because of course we have multiple ways to do SSA :tableflip:.
Yeah, the IR-based SSA is really good for must-analyses like the ones we need in range analysis and value numbering, but not at all what we want for dataflow 😭. For a while we wanted to improve the IR-based analysis to the point where it was usable for dataflow... but that never happened, and I pulled in the shared SSA library to do the heavy lifting instead.
Is there any easy way for me to see what the phi edges are for a particular example?
The easiest way of doing this is probably just to query for:

import semmle.code.cpp.ir.dataflow.internal.DataFlowUtil

from SsaPhiNode n, boolean fromBackEdge
select n, n.getAnInput(fromBackEdge), fromBackEdge

If we want to get a more visual representation, we could resurrect the old …
So will this change still be needed after #13059?
Yeah, it looks like it.
😕 Do we understand why?
Let's wait until the other PR is merged.
This indeed still seems needed. Looking at the back-edges of:

for (char* p = begin; p <= end; ++p) {
  *p = 0; // BAD
}

There's a back-edge from
There's more I don't quite understand, though. If I repeat the loop, there's also an edge from the
Keep in mind that the location for phi nodes isn't super helpful: they get their location from the enclosing basic block. So just because the location overlaps with a specific variable access doesn't mean it's a phi for that variable. On the snippet above, I get 3 back-edges:
I'm also seeing edges from
Oops. I was running with strict domination instead of domination for my above example. Yeah, I'm seeing some additional back-edges other than the ones I mentioned above. Let's discuss it in a sync later today 🤔 |
…ead-phi and use it to restrict flow in 'cpp/invalid-pointer-deref'.
@jketema I've pushed the changes from our sync discussion. For everyone else: it turned out that changing the definition of back-edge from using dominators to strict dominators was not the right way to fix these FPs. Instead, we noticed that the nodes being marked as barriers by the

end = ...
for (p = start; p < end; ++p) {
  *p = 0;
}

there's a phi-read node for end in the basic block for
Previously, this PR excluded this
Now you may ask (as @jketema rightly did): why do we even need to add a barrier in a configuration that just searches for a dereference of a pointer that's found to be out of bounds? The reason is super subtle. Consider this snippet:

void test19(unsigned len)
{
int *xs = new int[len];
int *end = xs + len;
for (int *x = xs; x <= end; x++)
{
int i = *x;
}
}

The idea is that:
That is all fine, but now consider this example:

void test19(unsigned len)
{
int *xs = new int[len];
int *end = xs + len;
for (int *x = xs; x < end; x++)
{
int i = *x;
}
}

which is totally fine (because we're now using a strict relational operator instead). But (up until this PR) we were generating an FP here because:
The problem is that, while
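The off-by-one that separates the two `test19` snippets can be checked without any dataflow machinery. A minimal sketch, assuming an index-based model of the two loops (the helper `outOfBoundsAccesses` is hypothetical and not part of the PR): iterating with `<=` visits index `len`, one past the last element, while `<` never does. Indices are used instead of pointers because the `<=` loop's final `x++` moves the pointer beyond one-past-the-end, which is itself undefined behavior.

```cpp
#include <cassert>
#include <limits>

// Index-based stand-in for the two test19 loops (hypothetical helper).
// Returns how many iterations would dereference an element outside the
// len-element allocation, i.e. an index >= len.
unsigned outOfBoundsAccesses(unsigned len, bool inclusive) {
  // With `i <= len` and len == UINT_MAX the loop would never terminate.
  assert(len < std::numeric_limits<unsigned>::max());
  unsigned bad = 0;
  // inclusive mirrors `x <= end`, otherwise `x < end`.
  for (unsigned i = 0; inclusive ? i <= len : i < len; ++i) {
    if (i >= len) ++bad;  // corresponds to `*x` with x == end
  }
  return bad;
}
```

With `len = 8`, the inclusive form performs exactly one out-of-bounds access and the exclusive form none, which is the distinction the query has to preserve: report the first snippet and stay silent on the second.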
I think DCA looks good. Performance looks unchanged, but we need to double-check that the lost results match the FPs we intended to remove 🤞.
Let me check those results (unless you're already doing it).
I haven't done so yet, no. Feel free to do that 🙂.
I'll have a look tomorrow morning. It seems more sensible that I do it, as I need to dive into the FPs of this query anyway.
As far as I can tell, the DCA alerts that have disappeared are all of the form we expect. The only one I have trouble with is wireshark__wireshark: epan/dissectors/file-elf.c:1099:13:1099:35. However, that one also appears to be an FP, so I'm not too concerned.
LGTM
@@ -229,6 +229,10 @@ module InvalidPointerToDerefConfig implements DataFlow::ConfigSig {

  pragma[inline]
  predicate isSink(DataFlow::Node sink) { isInvalidPointerDerefSink(sink, _, _) }

  predicate isBarrier(DataFlow::Node node) {
    node = any(DataFlow::SsaPhiNode phi | not phi.isPhiRead()).getAnInput(true)
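A toy model may help clarify what the new `isBarrier` predicate does. The sketch below is a hypothetical C++ re-expression of that single QL line, not the CodeQL DataFlow library: every back-edge input (`getAnInput(true)`) of a phi node that is not a phi-read becomes a barrier, and a breadth-first search stops at barrier nodes. The types `Phi` and `PhiInput` and the functions `barriers` and `reaches` are illustrative names only.

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <vector>

struct PhiInput {
  int node;           // the node flowing into the phi
  bool fromBackEdge;  // models getAnInput(true)
};

struct Phi {
  int node;                      // the phi node itself
  bool isPhiRead;                // models phi.isPhiRead()
  std::vector<PhiInput> inputs;  // all inputs of the phi
};

// Barrier nodes: back-edge inputs of phis that are not phi-reads.
std::set<int> barriers(const std::vector<Phi>& phis) {
  std::set<int> result;
  for (const Phi& phi : phis) {
    if (phi.isPhiRead) continue;  // phi-reads are exempt
    for (const PhiInput& in : phi.inputs)
      if (in.fromBackEdge) result.insert(in.node);
  }
  return result;
}

// Breadth-first reachability over flow steps; flow never enters a barrier.
bool reaches(const std::vector<std::vector<int>>& succ, int source, int sink,
             const std::set<int>& barrier) {
  assert(source >= 0 && source < static_cast<int>(succ.size()));
  if (barrier.count(source)) return false;
  std::vector<bool> seen(succ.size(), false);
  std::queue<int> work;
  work.push(source);
  seen[source] = true;
  while (!work.empty()) {
    int n = work.front();
    work.pop();
    if (n == sink) return true;
    for (int m : succ[n]) {
      if (!seen[m] && !barrier.count(m)) {
        seen[m] = true;
        work.push(m);
      }
    }
  }
  return false;
}
```

For a flow graph 0 -> 3 -> 1 -> 2 in which node 3 is the back-edge input of phi node 1, the source 0 no longer reaches the sink 2 when node 1 is an ordinary phi, but still does when it is a phi-read node.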
I wonder if we can introduce this restriction in other places?
Yeah, I wouldn't be surprised if we could construct some FN that would be removed if we did the same trick to the isBarrier2 in the product flow configuration.
Yeah, I guess there's some complex invariant about when
It turns out that the back-edge detection predicate we have on PhiNodes wasn't correct for the purpose it was introduced for. It was meant to block back-edge flow (see #10593), but it turns out that it's blocking too much flow, which we're seeing when we add the barrier to fix the test cases introduced in #12960.

Commit-by-commit review recommended: the first commit fixes the FPs by adding said barrier, and the second commit fixes the FNs by fixing what we consider to be a back-edge.