C++: Use-use flow through global variables#11171
C++: Use-use flow through global variables#11171MathiasVP merged 11 commits intogithub:mathiasvp/replace-ast-with-ir-use-usedataflowfrom
Conversation
…e-usedataflow' into global-flow Resolved trivial conflicts.
88298e2 to
b53963a
Compare
jketema
left a comment
There was a problem hiding this comment.
So if I understand this correctly, instead of having jump steps at arbitrary places in functions - which is how I understood the old code - they now just occur at the beginning and end of functions.
If the above if correct, this is probably a good approximation for now. I guess in general this is probably not completely correct, as we might be analysing a multi-threaded application. Assuming there are no data races (which is a separate problem), I guess we could always handle those in the future by introducing additional jump steps at places where synchronisation takes place.
| int getIndirectionIndex() { result = indirectionIndex } | ||
|
|
||
| /** Holds if this definition or use has index `index` in block `block`. */ | ||
| final predicate hasIndexInBlock(IRBlock block, int index) { |
There was a problem hiding this comment.
Should this also have an override? And can the QLDoc be dropped (as it is an override)?
There was a problem hiding this comment.
It's actually not an override because this class doesn't extend DefImpl. If GlobalDef did extend DefImpl then your proposal is very much correct, but because a DefImpl currently has to have an Operand representing the address of the definition a GlobalDef cannot be a DefImpl (since a GlobalDef is a kind of synthesized definition whose address isn't present in the IR).
The Use side of this has been refactored in a previous PR to handle this situation: We have a UseImpl class, and we have a OperandBasedUse to handle the common case that a Use has an operand that represents the address directly in the IR. But (unlike a DefImpl) a UseImpl doesn't have to be OperandBasedUse, and that's why the GlobalUse can extend UseImpl without requiring an Operand.
We could do a similar refactoring for DefImpl (i.e.,
remove the requirement that DefImpl has to have an Operand, and create a OperandBasedDef class and let most of the SSA definitions extend that class. Then we could have GlobalDef just extend the DefImpl class). It shouldn't be a difficult refactoring to do. I just chose to not do it in this PR to keep the diff as small as possible. But I'd be happy to do that refactoring if it helps you review the PR.
There was a problem hiding this comment.
Oh, right, I was mistakenly under the impression that it was extending more than TGlobalDef.
There was a problem hiding this comment.
No need to do the refactoring here.
| variableWriteCand(bb, i, v) and | ||
| ( | ||
| variableWriteCand(bb, i, v) or | ||
| sourceVariableIsGlobal(v, _, _, _) |
There was a problem hiding this comment.
Oh, that's a good point. I should add a comment to this. The reason is as follows:
variableWriteCand restricts the set of writes in the second SSA phase to those writes that target variables that are determined to be live by the first SSA phase. However, the first SSA phase doesn't know about global variables (as you can see, nothing in this PR touches the ssa0/SsaInternals.qll file). So it may be the case that a write to a global variable is determined to not be live by the first SSA phase (because there's no synthetic "final use of the global variable" at the end of the function body).
Now, there are two ways of handling this situation:
- Teach
ssa0/SsaInternals.qllabout global variables (i.e., by doing the same transformation this PR does, but also do it in thessa0/SsaInternals.qllfile), or - Exclude global variables from the pruning by saying "either the variable isn't pruned away in the first SSA phase, or it's a global variable (like we do in this PR).
The reason I chose the second approach is that I don't expect the first phase to prune much (if anything) away in the case of global variables (since, if you write to a global variable then there's a very very high chance that you also read it somewhere in the database). So it wouldn't really prune anything away, and it would only increase the amount of data the first SSA phase is working with.
Does that make sense?
There was a problem hiding this comment.
Ok, so that I mostly got from other parts of the code. What confuses me here is that we both have this addition and the following just a bit lower:
or
exists(GlobalDef global |
global.hasIndexInBlock(bb, i, v) and
certain = true
)Why isn't it sufficient to just have this latter bit?
There was a problem hiding this comment.
Oh! I think you're right 😂. Let me double check that I can remove that sourceVariableIsGlobal disjunct.
There was a problem hiding this comment.
Oh, wait. No, I think it's actually needed. Basically, there are two kinds of writes to global variables:
- An actual write to the variable in the body of the function. That's this part:
exists(DefImpl def | def.hasIndexInBlock(bb, i, v) |
if def.isCertain() then certain = true else certain = false
)- A synthesizes initial definition of the global variable. That's this part:
exists(GlobalDef global |
global.hasIndexInBlock(bb, i, v) and
certain = true
)Consider a program like:
void write() {
global = 42;
}
void read() {
use(global);
}the write to global is an instance of DefImpl, and so hits the first case. But we don't want to prune away that write (since there is a read of the variable. Just not in the same callable).
There was a problem hiding this comment.
Thanks for the explanation. That clarifies things.
| not exists(unique( | | v.getLocation())) and | ||
| result instanceof UnknownDefaultLocation |
There was a problem hiding this comment.
Not for this PR (and I don't know how relevant this is for the user), but if this is about parameters of functions shouldn't we just pick the one associated with the actual definition of the function (and not of any of its declarations)?
There was a problem hiding this comment.
Yes, that's probably a much better approach. I'll create an issue for this.
Exactly. And the new way is really the right way to do it. If we do global flow that jumps to arbitrary places, we get FPs like: int global;
void foo() {
sanitizer(&global);
sink(global); // <-- flow jumps directly from source() to sink(global) without considering the sanitizer.
}
void bar() {
global = source();
}IIRC, we had to exclude global variables because of this in some query exactly because of this reason, but I can't seem to find the PR (nor the query) were we had to do this 🤔. |
That's an excellent point, yes. I imagine this would be a good approach to handling multi-threaded dataflow in the future. |
4317381
into
github:mathiasvp/replace-ast-with-ir-use-usedataflow
|
I hadn't approve this one yet, although I'm fine with it 😄 . I assume we'll have a small follow-up PR that fixes the "spurious" issue we discussed internally. |
|
Oh, I'm sorry. I had somehow convinced myself that you'd approved it 😀. Let's do it as a follow up, indeed. |
|
No worries, as I had already told you I was fine with it except for the test case you added. |
This PR fixes global variable flow on the use-use feature branch. For an example like:
we will model this like:
and we'll have flow like:
I also did a drive-by fix for
getLocationonVariableNodes since I was seeing new consistency failures because we now allocate moreVariableNodes