Skip to content

C++: Fix pointer/pointee conflation #13191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 18, 2023

Conversation

MathiasVP
Copy link
Contributor

This PR fixes the conflation identified in #13182.

Turns out the problem was something we've actually seen before. Consider this code

void increment_and_call_sink(int** buf2) {
  *buf2 += 10;
  sink(buf2);
}

void call_increment_and_call_sink(int** buf1) {
  increment_and_call_sink(buf1);
}

void test_conflation_regression() {
  int* buf = source();
  call_increment_buf(&buf);
}

We got spurious flow from source() to sink(buf). This shouldn't happen since the tainted value is *buf. However, we were getting flow in this case because the post-update node corresponding to the value of buf1 after leaving increment_and_call_sink re-entered increment_and_call_sink and got dereferenced an additional time in the *buf2 += ... operation.

This PR fixes this problem by excluding SSA flow from a PostUpdateNodes to another node that's an argument to the same callable as the pre-update node's argument node's callable.

Commit-by-commit review recommended. The first commit slightly changes our dataflow tests so that we can distinguish indirect sinks from non-indirect sinks, and the second commit fixes the conflation issue.

@MathiasVP MathiasVP requested a review from a team as a code owner May 16, 2023 16:53
@github-actions github-actions bot added the C++ label May 16, 2023
exists(Node adjusted |
indirectConversionFlowStep*(adjusted, nodeFrom) and
nodeToDefOrUse(adjusted, defOrUse, uncertain) and
private predicate adjustForPointerArith(PostUpdateNode pun, UseOrPhi use) {

Check warning

Code scanning / CodeQL

Missing QLDoc for parameter

The QLDoc has no documentation for pun, but the QLDoc mentions source
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label May 16, 2023
Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point just some questions to further my understanding. A second pair of eyes would be useful.

It seems that before postUpdateFlow depended on ssaFlowImpl, while we have now completely disconnected the two, moving the relevant parts of ssaFlowImpl into postUpdateFlow. Is that correct? If so, why do we no longer need a restriction on PostUpdateNodes in ssaFlowImpl?

@MathiasVP
Copy link
Contributor Author

At this point just some questions to further my understanding. A second pair of eyes would be useful.

It seems that before postUpdateFlow depended on ssaFlowImpl, while we have now completely disconnected the two, moving the relevant parts of ssaFlowImpl into postUpdateFlow. Is that correct? If so, why do we no longer need a restriction on PostUpdateNodes in ssaFlowImpl?

Yep, that's correct. At the end, they both end up calling adjacentDefRead (which is the cached predicate from the SSA analysis), but the flow out of post-update nodes has an additional complication to handle flow out of arguments such as:

char* p = /* ... */;
write_to_argument(p + n);
// ...
use(p);

because p + n isn't a direct use of p, but rather a pointer operation whose left operand is a use of p. So the post-update node case needs walk the control-flow graph to find p.

If so, why do we no longer need a restriction on PostUpdateNodes in ssaFlowImpl?

By "a restriction on PostUpdateNodes" do you mean the nodeFrom = any(PostUpdateNode pun).getPreUpdateNode() conjunct? This was there to distinguish between whether or not we were in the post-update. Because we on main implement flow out of post-update nodes simply as the use-use flow out of the post-update node's pre-update node, we didn't have the post-update node as a parameter in ssaFlowImpl, so ssaFlowImpl branched on whether nodeFrom = any(PostUpdateNode pun).getPreUpdateNode() was true or not. If it was false we do what is now exactly the code in ssaFlowImpl, and if it's true we do what is now in postUpdateFlow (but with the added restriction that we don't flow from the post-update node and into an argument node).

Does that make sense?

@jketema
Copy link
Contributor

jketema commented May 17, 2023

By "a restriction on PostUpdateNodes" do you mean the nodeFrom = any(PostUpdateNode pun).getPreUpdateNode() conjunct?

Indeed.

This was there to distinguish between whether or not we were in the post-update. Because we on main implement flow out of post-update nodes simply as the use-use flow out of the post-update node's pre-update node, we didn't have the post-update node as a parameter in ssaFlowImpl, so ssaFlowImpl branched on whether nodeFrom = any(PostUpdateNode pun).getPreUpdateNode() was true or not. If it was false we do what is now exactly the code in ssaFlowImpl, and if it's true we do what is now in postUpdateFlow (but with the added restriction that we don't flow from the post-update node and into an argument node).

Does that make sense?

Not quite. My assumption was that we needed not nodeFrom = any(PostUpdateNode pun).getPreUpdateNode(), otherwise we end up in the bad case described in the QLDoc of adjustForPointerArith, but that apparently an incorrect assumption?

@MathiasVP
Copy link
Contributor Author

Not quite. My assumption was that we needed not nodeFrom = any(PostUpdateNode pun).getPreUpdateNode(), otherwise we end up in the bad case described in the QLDoc of adjustForPointerArith, but that apparently an incorrect assumption?

Ah, I see. No, we shouldn't be able to hit the bad case from the QLDoc in adjustForPointerArith exactly because we don't count PointerArithmeticInstructions (with an operand that is a use of some SSA variable x) as a use of x.

@MathiasVP MathiasVP requested a review from rdmarsh2 May 17, 2023 15:03
@jketema
Copy link
Contributor

jketema commented May 17, 2023

So was not nodeFrom = any(PostUpdateNode pun).getPreUpdateNode() redundant before?

@MathiasVP
Copy link
Contributor Author

So was not nodeFrom = any(PostUpdateNode pun).getPreUpdateNode() redundant before?

IIRC, I included that negation because the case where nodeFrom = any(PostUpdateNode pun).getPreUpdateNode() was covered by the disjunction that did adjustForPointerArith, and so I added the extra guard to make sure we didn't hit both cases. But in retrospect I don't think this would be possible.

@jketema
Copy link
Contributor

jketema commented May 17, 2023

Ok, that clarifies things.

then preUpdate = [nFrom, getAPriorDefinition(defOrUse)]
else preUpdate = nFrom
not exists(DataFlowCall call |
isArgumentOfCallable(call, preUpdate) and isArgumentOfCallable(call, nodeTo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final question from my side. I think I mostly follow along now, but would definitely appreciate @rdmarsh2's input.

What is the reason for ignoring the argument positions here? Assuming this is important, could we add a test for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be important, no. The main thing is that we want to disallow flow from the PostUpdateNode and back into the function as an argument (which violates the evaluation order). So it doesn't really matter what the argument order is since the PostUpdateNode always represents the value after we've returned from the function and the ArgumentNode always represents the value before we've entered the function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With that said, I'd also be interested in knowing if I've missed something here (cc @rdmarsh2).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there would never be a correct step directly from the postupdate to a preupdate on the same call... That ought to only be possible in a loop, and there should be an intervening phi node in that case. If it does come up I think it's an IR inconsistency problem.

Copy link
Contributor Author

@MathiasVP MathiasVP May 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. The closest we can come to flow directly from a PostUpdateNode and back to the argument is something like:

int x = 0;
// ...
while(...) {
  write_to_arg(&x);
}

where we'd have flow from x's post update note, to a phi node at the loop entry and back to &x. But as Robert says, there should always be a PhiNode here.

@MathiasVP
Copy link
Contributor Author

MathiasVP commented May 17, 2023

I'm currently going through all the alert changes. Here are my notes so far:

  • The new vim/vim result on cpp/path-injection is a genuine TP (although it shows that we should probably add a barrier to block flow to "small" values like values of type char), and all the 8 lost cpp/path-injection results on vim/vim are cases where we exit cmd_main's at the PostUpdateNode for argv and then re-enter via argv again 🤦 in:
  result = cmd_main(argc, argv);

  trace2_cmd_exit(result);

  return result;
}
  • The lost 23 lost results for cpp/non-constant-format on nelson is all FPs such as this pattern:
margin_printf(outfile, length ? "/* %s */\n" : "\n", storage);

where we exit margin_printf through the second argument and into the same argument and claim that this is a non-constant argument to a formatting function 🤦. Thank God these will be fixed now!

I'll continue looking at the remaining changes tomorrow.

@MathiasVP
Copy link
Contributor Author

MathiasVP commented May 18, 2023

I've now gone through most of the lost results, and all of the ones I've looked at have been cases where we re-entered a function we just exited from through a PostUpdateNode. The new results also all look like genuine TPs. So I'm calling this a win 🎉.

@MathiasVP MathiasVP merged commit 8cf25ba into github:main May 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants