C++: Fix pointer/pointee conflation #13191

MathiasVP · 2023-05-16T16:53:02Z

This PR fixes the conflation identified in #13182.

It turns out the problem is something we've seen before. Consider this code:

void increment_and_call_sink(int** buf2) {
  *buf2 += 10;
  sink(buf2);
}

void call_increment_and_call_sink(int** buf1) {
  increment_and_call_sink(buf1);
}

void test_conflation_regression() {
  int* buf = source();
  call_increment_and_call_sink(&buf);
}

We got spurious flow from source() to sink(buf2). This shouldn't happen, since the tainted value is *buf2, not buf2 itself. However, we were getting flow because the post-update node corresponding to the value of buf1 after leaving increment_and_call_sink re-entered increment_and_call_sink and was dereferenced an additional time in the *buf2 += ... operation.
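To make the pointer/pointee distinction concrete, here is a compilable sketch of the test. The bodies of `source` and `sink` are hypothetical stand-ins (the dataflow tests only need declarations), and `storage` and `last_sunk` are invented names used to observe what reaches the sink. It is only meant to show that the sink receives the address of `buf`, while the value derived from `source()` is reachable only through one dereference.

```cpp
#include <cassert>

// Hypothetical stand-ins: invented bodies so the example can run.
static int storage[32];
static int** last_sunk = nullptr;

int* source() { return storage; }
void sink(int** p) { last_sunk = p; }

void increment_and_call_sink(int** buf2) {
  *buf2 += 10;  // modifies the pointee (the caller's `buf`), not buf2 itself
  sink(buf2);   // passes the pointer; the source-derived value is *buf2
}

void call_increment_and_call_sink(int** buf1) {
  increment_and_call_sink(buf1);
}

// Returns true if the sink saw &buf (the pointer) and the source-derived
// value was only reachable through one dereference of that pointer.
bool pointee_carries_source_value() {
  int* buf = source();
  call_increment_and_call_sink(&buf);
  return last_sunk == &buf && *last_sunk == storage + 10;
}
```

A result at `sink(buf2)` for the pointer itself would therefore conflate `buf2` with `*buf2`.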

This PR fixes the problem by excluding SSA flow from a PostUpdateNode to any node that is an argument of the same call as the PostUpdateNode's pre-update node.
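As a rough illustration of that exclusion, the following toy model uses invented types (`ToyNode`, `stepAllowed`, `kNone`) that are not part of the CodeQL library. It only sketches the shape of the check, under the assumption that each node records which call, if any, it is an argument of.

```cpp
#include <cassert>

// Toy model only: not the CodeQL Node/DataFlowCall classes.
const int kNone = -1;

struct ToyNode {
  int argumentOfCall;   // call id if this node is a call argument, else kNone
  int preUpdateOfCall;  // for a post-update node: call id of its pre-update
                        // argument node, else kNone
};

// A step out of a post-update node is rejected when the target is an argument
// of the same call that the post-update node's pre-update node belongs to.
bool stepAllowed(const ToyNode& from, const ToyNode& to) {
  if (from.preUpdateOfCall != kNone &&
      from.preUpdateOfCall == to.argumentOfCall) {
    return false;
  }
  return true;
}
```

Steps from the post-update node to ordinary uses, or to arguments of a different call, are unaffected in this model.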

Commit-by-commit review recommended. The first commit slightly changes our dataflow tests so that we can distinguish indirect sinks from non-indirect sinks, and the second commit fixes the conflation issue.

cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/SsaInternals.qll

-  exists(Node adjusted |
-    indirectConversionFlowStep*(adjusted, nodeFrom) and
-    nodeToDefOrUse(adjusted, defOrUse, uncertain) and
+private predicate adjustForPointerArith(PostUpdateNode pun, UseOrPhi use) {


jketema

At this point just some questions to further my understanding. A second pair of eyes would be useful.

It seems that before postUpdateFlow depended on ssaFlowImpl, while we have now completely disconnected the two, moving the relevant parts of ssaFlowImpl into postUpdateFlow. Is that correct? If so, why do we no longer need a restriction on PostUpdateNodes in ssaFlowImpl?

cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/SsaInternals.qll

MathiasVP · 2023-05-17T13:49:01Z

At this point just some questions to further my understanding. A second pair of eyes would be useful.

It seems that before postUpdateFlow depended on ssaFlowImpl, while we have now completely disconnected the two, moving the relevant parts of ssaFlowImpl into postUpdateFlow. Is that correct? If so, why do we no longer need a restriction on PostUpdateNodes in ssaFlowImpl?

Yep, that's correct. At the end, they both end up calling adjacentDefRead (which is the cached predicate from the SSA analysis), but the flow out of post-update nodes has an additional complication to handle flow out of arguments such as:

char* p = /* ... */;
write_to_argument(p + n);
// ...
use(p);

because p + n isn't a direct use of p, but rather a pointer operation whose left operand is a use of p. So the post-update node case needs to walk the control-flow graph to find p.
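A runnable version of that snippet, with hypothetical bodies for `write_to_argument` and `use` (the names come from the snippet, the bodies and the extra index parameter are assumptions), shows why the write through `p + n` still has to reach the later use of `p`:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical callee: writes one byte through its pointer argument.
void write_to_argument(char* out) { *out = 'X'; }

// Hypothetical reader standing in for use(p); the index is an added parameter.
char use(const char* p, std::size_t n) { return p[n]; }

char write_then_use() {
  char buffer[8] = {};
  char* p = buffer;
  const std::size_t n = 3;
  write_to_argument(p + n);  // the argument is `p + n`, not a direct use of `p`
  return use(p, n);          // yet the write is visible through `p`
}
```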

If so, why do we no longer need a restriction on PostUpdateNodes in ssaFlowImpl?

By "a restriction on PostUpdateNodes" do you mean the nodeFrom = any(PostUpdateNode pun).getPreUpdateNode() conjunct? This was there to distinguish whether or not we were in the post-update case. Because on main we implement flow out of post-update nodes simply as the use-use flow out of the post-update node's pre-update node, we didn't have the post-update node as a parameter in ssaFlowImpl, so ssaFlowImpl branched on whether nodeFrom = any(PostUpdateNode pun).getPreUpdateNode() held. If it didn't hold, we did exactly what is now the code in ssaFlowImpl, and if it held, we did what is now in postUpdateFlow (which now has the added restriction that we don't flow from the post-update node into an argument node).

Does that make sense?

jketema · 2023-05-17T14:13:39Z

By "a restriction on PostUpdateNodes" do you mean the nodeFrom = any(PostUpdateNode pun).getPreUpdateNode() conjunct?

Indeed.

This was there to distinguish whether or not we were in the post-update case. Because on main we implement flow out of post-update nodes simply as the use-use flow out of the post-update node's pre-update node, we didn't have the post-update node as a parameter in ssaFlowImpl, so ssaFlowImpl branched on whether nodeFrom = any(PostUpdateNode pun).getPreUpdateNode() held. If it didn't hold, we did exactly what is now the code in ssaFlowImpl, and if it held, we did what is now in postUpdateFlow (which now has the added restriction that we don't flow from the post-update node into an argument node).

Does that make sense?

Not quite. My assumption was that we needed not nodeFrom = any(PostUpdateNode pun).getPreUpdateNode(), otherwise we would end up in the bad case described in the QLDoc of adjustForPointerArith, but that is apparently an incorrect assumption?

MathiasVP · 2023-05-17T15:03:35Z

Not quite. My assumption was that we needed not nodeFrom = any(PostUpdateNode pun).getPreUpdateNode(), otherwise we would end up in the bad case described in the QLDoc of adjustForPointerArith, but that is apparently an incorrect assumption?

Ah, I see. No, we shouldn't be able to hit the bad case from the QLDoc in adjustForPointerArith exactly because we don't count PointerArithmeticInstructions (with an operand that is a use of some SSA variable x) as a use of x.

jketema · 2023-05-17T15:35:31Z

So was not nodeFrom = any(PostUpdateNode pun).getPreUpdateNode() redundant before?

MathiasVP · 2023-05-17T15:39:20Z

So was not nodeFrom = any(PostUpdateNode pun).getPreUpdateNode() redundant before?

IIRC, I included that negation because the case where nodeFrom = any(PostUpdateNode pun).getPreUpdateNode() was covered by the disjunction that did adjustForPointerArith, and so I added the extra guard to make sure we didn't hit both cases. But in retrospect I don't think this would be possible.

jketema · 2023-05-17T15:40:16Z

Ok, that clarifies things.

jketema · 2023-05-17T16:48:13Z

cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/SsaInternals.qll

-    then preUpdate = [nFrom, getAPriorDefinition(defOrUse)]
-    else preUpdate = nFrom
+    not exists(DataFlowCall call |
+      isArgumentOfCallable(call, preUpdate) and isArgumentOfCallable(call, nodeTo)


Final question from my side. I think I mostly follow along now, but would definitely appreciate @rdmarsh2's input.

What is the reason for ignoring the argument positions here? Assuming this is important, could we add a test for this?

It shouldn't be important, no. The main thing is that we want to disallow flow from the PostUpdateNode and back into the function as an argument (which violates the evaluation order). So it doesn't really matter what the argument order is since the PostUpdateNode always represents the value after we've returned from the function and the ArgumentNode always represents the value before we've entered the function.

With that said, I'd also be interested in knowing if I've missed something here (cc @rdmarsh2).

I think there would never be a correct step directly from the postupdate to a preupdate on the same call... That ought to only be possible in a loop, and there should be an intervening phi node in that case. If it does come up I think it's an IR inconsistency problem.

Yeah, I agree. The closest we can come to flow directly from a PostUpdateNode and back to the argument is something like:

int x = 0;
// ...
while(...) {
  write_to_arg(&x);
}

where we'd have flow from x's post-update node to a phi node at the loop entry and back to &x. But as Robert says, there should always be a PhiNode here.
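For reference, a concrete version of that loop is sketched here. The body of `write_to_arg` and the fixed iteration count are assumptions added to make it run. The value written by one call is read by the next call only after control has gone back through the loop head, which is where SSA construction places the phi node.

```cpp
#include <cassert>

// Hypothetical callee: reads the current value and writes a new one.
void write_to_arg(int* out) { *out += 1; }

int loop_writes() {
  int x = 0;
  int iterations = 0;
  while (iterations < 3) {
    // After the first iteration, the value read through &x is the one
    // written by the previous call, merged with the initial 0 at the
    // loop head.
    write_to_arg(&x);
    ++iterations;
  }
  return x;
}
```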

MathiasVP · 2023-05-17T16:55:26Z

I'm currently going through all the alert changes. Here are my notes so far:

The new vim/vim result on cpp/path-injection is a genuine TP (although it shows that we should probably add a barrier to block flow to "small" values, such as values of type char), and all 8 lost cpp/path-injection results on vim/vim are cases where we exit cmd_main at the PostUpdateNode for argv and then re-enter via argv again 🤦 in:

  result = cmd_main(argc, argv);

  trace2_cmd_exit(result);

  return result;
}

The 23 lost results for cpp/non-constant-format on nelson are all FPs such as this pattern:

margin_printf(outfile, length ? "/* %s */\n" : "\n", storage);

where we exit margin_printf through the second argument and into the same argument and claim that this is a non-constant argument to a formatting function 🤦. Thank God these will be fixed now!
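A small sketch of why this pattern is harmless: `margin_format` below is a hypothetical stand-in for nelson's `margin_printf` (it formats into a string rather than writing to a file, and `describe` is an invented wrapper). Both arms of the conditional are string literals, so the format argument is constant on every path.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical stand-in for margin_printf: formats into a string.
std::string margin_format(const char* format, ...) {
  char buf[128];
  va_list args;
  va_start(args, format);
  std::vsnprintf(buf, sizeof buf, format, args);
  va_end(args);
  return std::string(buf);
}

std::string describe(int length, const char* storage) {
  // The format is chosen between two literals; neither can be influenced
  // by anything the callee writes through its arguments.
  return margin_format(length ? "/* %s */\n" : "\n", storage);
}
```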

I'll continue looking at the remaining changes tomorrow.

MathiasVP · 2023-05-18T12:08:14Z

I've now gone through most of the lost results, and all of the ones I've looked at have been cases where we re-entered a function we just exited from through a PostUpdateNode. The new results also all look like genuine TPs. So I'm calling this a win 🎉.

MathiasVP added 3 commits May 16, 2023 17:39

C++: Introduce 'indirect_sink' in dataflow tests.

35e91ba

C++: Fix looping flow that goes from the output argument node and back into the function argument.

150d4f3

C++: Accept test changes.

c93a051

MathiasVP requested a review from a team as a code owner May 16, 2023 16:53

github-actions bot added the C++ label May 16, 2023

github-advanced-security bot found potential problems May 16, 2023

MathiasVP added the no-change-note-required This PR does not need a change note label May 16, 2023

C++: Accept query test changes.

402212b

jketema reviewed May 17, 2023

MathiasVP requested a review from rdmarsh2 May 17, 2023 15:03

jketema reviewed May 17, 2023

rdmarsh2 approved these changes May 17, 2023

MathiasVP merged commit 8cf25ba into github:main May 18, 2023
