Skip to content

C++: Fix spurious reference flow #11254

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

MathiasVP
Copy link
Contributor

@MathiasVP MathiasVP commented Nov 14, 2022

On the use-use-flow feature-branch we were getting spurious flow like:

void local_references(int &source) {
  source = 0;
  sink(source); // clean, but was marked as tainted
}

when the source dataflow node was defined as source.asParameter().hasName("source"). This turns out to be because source.asParameter() defaults to source.asParameter(0), which means that the value of source was marked as the source instead of the value of *source (if we pretend that the source is of a pointer type instead of a reference type).

Ideally, the fix in 5e9e7da should be enough to fix this. However, the first commit is a necessary precondition for this. This first commit fixes two things:

  • It removes the ad-hoc dataflow we added between reference addresses and their dereferences. That's pretty simple to remove ✔️.
  • The second issue is a subtle bug in adjacentDefRead: Previously, when we were flowing from one def/use to an adjacent use we did the following:
defOrUse.asDefOrUse().hasIndexInBlock(bb1, i1, v)
adjacentDefRead(_, bb1, i1, bb2, i2) and
use.asDefOrUse().(UseImpl).hasIndexInBlock(bb2, i2, v)

which amounts to saying:

  1. A def/use of variable v is at position (bb1, i1)
  2. There's an adjacent-def/use-use relationship between the locations (bb1, i1) and (bb2, i2)
  3. A use of variable v is at position (bb2, i2).

However, note that point 2 doesn't talk about which variable is participating in the adjacent-def/use-use relationship. So in our example above, we'd have a use-use relation between source (note: not *source) in source = 0 and source in sink(source). And since source and *source are located at the same index in the same block we'd get use-use flow between *source in source = 0 and *source in sink(source).

The fix is very simple: instead of ignoring the first column of adjacentDefRead we simply check that the variable in question is actually v.

@MathiasVP MathiasVP requested a review from a team as a code owner November 14, 2022 13:33
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Nov 14, 2022
@github-actions github-actions bot added the C++ label Nov 14, 2022
@MathiasVP MathiasVP marked this pull request as draft November 14, 2022 16:09
@MathiasVP
Copy link
Contributor Author

This PR is ready for reviewing. I was hoping to split up this work into three PRs, but they all annoyingly depend on each other 😭. However, the PR is easy to review commit-by-commit:

"PR one": Disable reference/value conflation in dataflow

This is done in 7408931. This is necessary for 235a069, but introduces missing/spurious flow that will be fixed in 3b1b8cc.

2cebd5c accepts the changes (where the missing/spurious flow is shown).

"PR two": Make asParameter work as expected for references

This is done in 235a069. It does we discussed in yesterday's meeting 🙂.

3e5c66e accepts the changes (there are no query changes - just a dataflow test change).

"PR three": Fix SSA for array writes

This is done in 3b1b8cc. The initialization of SSA variables of array types were broken: it was concluding that the initialization of arr (where arr has an array type) was assigning to *arr, but it really is assigning to the arr object (since there are no indirections that need to be followed!)

4f2c2e6 accepts the changes. This fixes the test changes from "PR one".

@MathiasVP MathiasVP marked this pull request as ready for review November 16, 2022 14:40
Comment on lines 426 to 431
// The IR results for this test _is_ equivalent to the AST ones.
// The IR annotation is just "ir" because the sink of the unitialized source at
// 428:7 is value of `local`, but the sink of the source from `intPointerSource`
// value of `*local` (i.e., the indirection node of `local`). So unlike AST dataflow,
// each of these two sinks correspond to a unique source, and thus we don't need to
// attach a location annotation to it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we still provide the location annotations to guarantee that if these get conflated we'll have a test failure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good idea. I think this requires adding another tag though, right (since I don't think we can have tags with optional values)?

Copy link
Contributor Author

@MathiasVP MathiasVP Nov 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to add this in another PR if that's okay with you, though (since this PR is already three PRs 😭).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the 428:7 on l. 428 is incorrect now, because this comment was added.

I do have some trouble wrapping my head around the reasoning here. My expectation for int local[1] would be that the array is initialised, but the elements of the array are not, but that's apparently not the way to think about this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see what you mean, @jketema. Given the following program:

void intPointerSource(int *ref_source);
void sink(int);
void sink(const int *);

void intPointerSourceCaller2() {
  int local[1];
  intPointerSource(local);
  sink(local);
  sink(*local);
}

You're saying that you'd expect:

  1. Flow from an uninitialized node (representing the elements of the array) to sink(*local), and
  2. Flow from the elements of the array after leaving intPointerSource to sink(*local).

Correct?

And what we're currently getting is:

  1. Flow from an initialized node (representing the array) to sink(local), and
  2. Flow from the elements of the array after leaving intPointerSource to sink(*local).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the AST library does:

  1. Flow from an initialized node to sink(local), and to sink(*local)
  2. Flow from the elements of the array after leaving intPointerSource to sink(local), and to sink(*local).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct?

Yes.

It looks like the AST library does:

I was assuming this is due to conflation problems the AST library has. Or does the AST dataflow library indicate that there may be usability reasons for doing this? I can imagine that it might make sense to consider int local[1] to be uninitialised for ease writing some of the sources/sinks that involve arrays.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that UninitializedNode should probably be handled differently to work for array variables. I think this should be solved in a separate PR, though.

@MathiasVP
Copy link
Contributor Author

I just force-pushed away a commit I added by mistake. Sorry about that!

Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two typos.

MathiasVP and others added 4 commits November 21, 2022 10:36
@MathiasVP MathiasVP merged commit 349c5cd into github:mathiasvp/replace-ast-with-ir-use-usedataflow Nov 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants