C++: Fix spurious reference flow #11254
5e9e7da to 972d243
972d243 to 4f2c2e6
This PR is ready for review. I was hoping to split up this work into three PRs, but they all annoyingly depend on each other 😭. However, the PR is easy to review commit-by-commit:
"PR one": Disable reference/value conflation in dataflow. This is done in 7408931. This is necessary for 235a069, but introduces missing/spurious flow that will be fixed in 3b1b8cc. 2cebd5c accepts the changes (where the missing/spurious flow is shown).
"PR two": Make
// The IR results for this test _is_ equivalent to the AST ones.
// The IR annotation is just "ir" because the sink of the unitialized source at
// 428:7 is value of `local`, but the sink of the source from `intPointerSource`
// value of `*local` (i.e., the indirection node of `local`). So unlike AST dataflow,
// each of these two sinks correspond to a unique source, and thus we don't need to
// attach a location annotation to it.
Should we still provide the location annotations to guarantee that if these get conflated we'll have a test failure?
I think that's a good idea. I think this requires adding another tag though, right (since I don't think we can have tags with optional values)?
I would prefer to add this in another PR if that's okay with you, though (since this PR is already three PRs 😭).
I think the `428:7` on l. 428 is incorrect now, because this comment was added.
I do have some trouble wrapping my head around the reasoning here. My expectation for `int local[1]` would be that the array is initialised, but the elements of the array are not, but that's apparently not the way to think about this?
I think I see what you mean, @jketema. Given the following program:
void intPointerSource(int *ref_source);
void sink(int);
void sink(const int *);

void intPointerSourceCaller2() {
  int local[1];
  intPointerSource(local);
  sink(local);
  sink(*local);
}
You're saying that you'd expect:
- Flow from an uninitialized node (representing the elements of the array) to `sink(*local)`, and
- Flow from the elements of the array after leaving `intPointerSource` to `sink(*local)`.

Correct?
And what we're currently getting is:
- Flow from an initialized node (representing the array) to `sink(local)`, and
- Flow from the elements of the array after leaving `intPointerSource` to `sink(*local)`.
It looks like the AST library does:
- Flow from an initialized node to `sink(local)`, and to `sink(*local)`
- Flow from the elements of the array after leaving `intPointerSource` to `sink(local)`, and to `sink(*local)`.
> Correct?

Yes.

> It looks like the AST library does:

I was assuming this is due to conflation problems the AST library has. Or does the AST dataflow library indicate that there may be usability reasons for doing this? I can imagine that it might make sense to consider `int local[1]` to be uninitialised for ease of writing some of the sources/sinks that involve arrays.
I agree that `UninitializedNode` should probably be handled differently to work for array variables. I think this should be solved in a separate PR, though.
ff5f991 to 29f4b26
I just force-pushed away a commit I added by mistake. Sorry about that!
Two typos.
Co-authored-by: Jeroen Ketema <93738568+jketema@users.noreply.github.com>
349c5cd into github:mathiasvp/replace-ast-with-ir-use-usedataflow
On the use-use-flow feature-branch we were getting spurious flow like:
when the `source` dataflow node was defined as `source.asParameter().hasName("source")`. This turns out to be because `source.asParameter()` defaults to `source.asParameter(0)`, which means that the value of `source` was marked as the source instead of the value of `*source` (if we pretend that the source is of a pointer type instead of a reference type).

Ideally, the fix in 5e9e7da should be enough to fix this. However, the first commit is a necessary precondition for this. This first commit fixes two things:
- `adjacentDefRead`: Previously, when we were flowing from one def/use to an adjacent use we did the following:
which amounts to saying:
1. `v` is at position `(bb1, i1)`
2. `(bb1, i1)` and `(bb2, i2)` are adjacent
3. `v` is at position `(bb2, i2)`.

However, note that point 2 doesn't talk about which variable is participating in the adjacent-def/use-use relationship. So in our example above, we'd have a use-use relation between `source` (note: not `*source`) in `source = 0` and `source` in `sink(source)`. And since `source` and `*source` are located at the same index in the same block we'd get use-use flow between `*source` in `source = 0` and `*source` in `sink(source)`.

The fix is very simple: instead of ignoring the first column of `adjacentDefRead` we simply check that the variable in question is actually `v`.
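
To make the join problem concrete, here is a minimal C++ sketch of the reasoning above. It is a toy model, not the QL implementation: the `Access` and `AdjacentDefRead` structs, `flowSteps`, `exampleFlow`, and the block/index numbers are all hypothetical and chosen only for illustration. The `checkVariable` flag switches between ignoring the variable column of the adjacency relation (the old behaviour as described) and checking it (the fix as described).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: these types and functions are hypothetical and do not
// mirror the actual QL predicates in the dataflow library.

// A def or use of variable `var` at index `i` of basic block `bb`.
struct Access {
  std::string var;
  int bb;
  int i;
};

// "Variable `var` has adjacent accesses at (bb1, i1) and (bb2, i2)."
struct AdjacentDefRead {
  std::string var;
  int bb1, i1, bb2, i2;
};

// Returns the variables for which a flow step from (bb1, i1) to (bb2, i2)
// is derived. With checkVariable == false the first column of the adjacency
// relation is ignored, which models the behaviour before the fix.
std::vector<std::string> flowSteps(const std::vector<Access>& accesses,
                                   const std::vector<AdjacentDefRead>& adjacent,
                                   bool checkVariable) {
  std::vector<std::string> result;
  for (const auto& adj : adjacent) {
    for (const auto& from : accesses) {
      if (from.bb != adj.bb1 || from.i != adj.i1) continue;
      if (checkVariable && from.var != adj.var) continue;
      for (const auto& to : accesses) {
        if (to.var == from.var && to.bb == adj.bb2 && to.i == adj.i2) {
          result.push_back(from.var);
        }
      }
    }
  }
  return result;
}

// The scenario from the description: `source` and `*source` share a position
// in `source = 0` and again in `sink(source)`, but only `source` is recorded
// as having adjacent accesses. Block 0, indices 0 and 1 are made-up positions.
std::vector<std::string> exampleFlow(bool checkVariable) {
  const std::vector<Access> accesses = {
      {"source", 0, 0}, {"*source", 0, 0},   // source = 0
      {"source", 0, 1}, {"*source", 0, 1}};  // sink(source)
  const std::vector<AdjacentDefRead> adjacent = {{"source", 0, 0, 0, 1}};
  return flowSteps(accesses, adjacent, checkVariable);
}
```

In this model, `exampleFlow(false)` derives a step for both `source` and `*source`, while `exampleFlow(true)` derives a step only for `source`, which is the variable the adjacency fact was actually about.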