C++: Use-use flow through global variables by MathiasVP · Pull Request #11171 · github/codeql

MathiasVP · 2022-11-08T17:17:23Z

This PR fixes global variable flow on the use-use feature branch. For an example like:

int global;
void setGlobal() {
  global = source();
}

void readGlobal() {
  sink(global);
}

we will model this like:

int global;
void setGlobal() {
  global = source();
  // implicit use of `global` inserted here
}

void readGlobal() {
  // implicit def of `global` inserted here
  sink(global);
}

and we'll have flow like:

source() -> global -> implicit use of global -> VariableNode(global) -> implicit def of global -> global
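To make the chain above concrete, here is a minimal sketch that treats each step as an edge in a toy graph and checks reachability with a breadth-first search. The node labels, `buildGlobalFlowGraph`, and `reaches` are illustrative names made up for this sketch; they are not part of the CodeQL library.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Toy dataflow graph. Node names are illustrative labels, not CodeQL classes.
using Graph = std::map<std::string, std::vector<std::string>>;

// Builds the chain of steps from the PR description.
Graph buildGlobalFlowGraph() {
  Graph g;
  g["source()"] = {"global (write in setGlobal)"};
  g["global (write in setGlobal)"] = {"implicit use of global"};
  // Jump step out of setGlobal, at the end of the function.
  g["implicit use of global"] = {"VariableNode(global)"};
  // Jump step into readGlobal, at the start of the function.
  g["VariableNode(global)"] = {"implicit def of global"};
  g["implicit def of global"] = {"global (read in readGlobal)"};
  return g;
}

// Breadth-first search: is `to` reachable from `from`?
bool reaches(const Graph& g, const std::string& from, const std::string& to) {
  std::set<std::string> seen{from};
  std::queue<std::string> work;
  work.push(from);
  while (!work.empty()) {
    std::string cur = work.front();
    work.pop();
    if (cur == to) return true;
    auto it = g.find(cur);
    if (it == g.end()) continue;
    for (const auto& next : it->second) {
      if (seen.insert(next).second) work.push(next);
    }
  }
  return false;
}
```

In this toy model the only edges that cross a function boundary are the two that touch `VariableNode(global)`, which is the point of the modelling.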

I also did a drive-by fix for getLocation on VariableNodes, since I was seeing new consistency failures now that we allocate more VariableNodes.

jketema

So if I understand this correctly, instead of having jump steps at arbitrary places in functions - which is how I understood the old code - they now just occur at the beginning and end of functions.

If the above is correct, this is probably a good approximation for now. In general it is probably not completely correct, as we might be analysing a multi-threaded application. Assuming there are no data races (which is a separate problem), I guess we could always handle those in the future by introducing additional jump steps at places where synchronisation takes place.

jketema · 2023-02-03T10:34:24Z

+  int getIndirectionIndex() { result = indirectionIndex }
+
+  /** Holds if this definition or use has index `index` in block `block`. */
+  final predicate hasIndexInBlock(IRBlock block, int index) {


Should this also have an override? And can the QLDoc be dropped (as it is an override)?

It's actually not an override because this class doesn't extend DefImpl. If GlobalDef did extend DefImpl then your proposal is very much correct, but because a DefImpl currently has to have an Operand representing the address of the definition a GlobalDef cannot be a DefImpl (since a GlobalDef is a kind of synthesized definition whose address isn't present in the IR).

The Use side of this was refactored in a previous PR to handle this situation: we have a UseImpl class, and an OperandBasedUse class for the common case where a Use has an operand that represents the address directly in the IR. But (unlike a DefImpl) a UseImpl doesn't have to be an OperandBasedUse, and that's why GlobalUse can extend UseImpl without requiring an Operand.

We could do a similar refactoring for DefImpl (i.e., remove the requirement that a DefImpl has to have an Operand, create an OperandBasedDef class, and let most of the SSA definitions extend that class. Then GlobalDef could just extend DefImpl). It shouldn't be a difficult refactoring. I just chose not to do it in this PR to keep the diff as small as possible, but I'd be happy to do it if it helps you review the PR.
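As a rough picture of the shape that refactoring would give, here is a C++ analogy of the class hierarchy. The real classes are QL classes, so this is only a sketch: `Operand`, `hasAddressOperand`, and `describe` are hypothetical names that exist purely for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for an IR operand that holds an address.
struct Operand {
  std::string text;
};

// Base class with no Operand requirement (the proposed shape of DefImpl).
struct DefImpl {
  virtual ~DefImpl() = default;
  virtual bool hasAddressOperand() const = 0;
  virtual std::string describe() const = 0;
};

// Common case: the definition's address is present in the IR as an operand.
struct OperandBasedDef : DefImpl {
  explicit OperandBasedDef(Operand addr) : address(std::move(addr)) {}
  bool hasAddressOperand() const override { return true; }
  std::string describe() const override { return "def via " + address.text; }
  Operand address;
};

// Synthesized definition of a global: no address operand exists in the IR.
struct GlobalDef : DefImpl {
  explicit GlobalDef(std::string name) : global(std::move(name)) {}
  bool hasAddressOperand() const override { return false; }
  std::string describe() const override {
    return "synthesized def of " + global;
  }
  std::string global;
};
```

With this shape, code that only needs "some definition" can take a `DefImpl`, and only code that genuinely needs the address has to ask for an `OperandBasedDef`.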

Oh, right, I was mistakenly under the impression that it was extending more than TGlobalDef.

No need to do the refactoring here.

jketema · 2023-02-03T10:35:01Z

-    variableWriteCand(bb, i, v) and
+    (
+      variableWriteCand(bb, i, v) or
+      sourceVariableIsGlobal(v, _, _, _)


Why is this change needed?

Oh, that's a good point. I should add a comment to this. The reason is as follows:

variableWriteCand restricts the set of writes in the second SSA phase to those writes that target variables that are determined to be live by the first SSA phase. However, the first SSA phase doesn't know about global variables (as you can see, nothing in this PR touches the ssa0/SsaInternals.qll file). So it may be the case that a write to a global variable is determined to not be live by the first SSA phase (because there's no synthetic "final use of the global variable" at the end of the function body).

Now, there are two ways of handling this situation:

1. Teach ssa0/SsaInternals.qll about global variables (i.e., do the same transformation this PR does, but also in the ssa0/SsaInternals.qll file), or

2. Exclude global variables from the pruning by saying "either the variable isn't pruned away in the first SSA phase, or it's a global variable" (like we do in this PR).

The reason I chose the second approach is that I don't expect the first phase to prune much (if anything) away in the case of global variables (since, if you write to a global variable then there's a very very high chance that you also read it somewhere in the database). So it wouldn't really prune anything away, and it would only increase the amount of data the first SSA phase is working with.

Does that make sense?

Ok, so that I mostly got from other parts of the code. What confuses me here is that we both have this addition and the following just a bit lower:

or
exists(GlobalDef global |
  global.hasIndexInBlock(bb, i, v) and
  certain = true
)

Why isn't it sufficient to just have this latter bit?

Oh! I think you're right 😂. Let me double check that I can remove that sourceVariableIsGlobal disjunct.

Oh, wait. No, I think it's actually needed. Basically, there are two kinds of writes to global variables:

An actual write to the variable in the body of the function. That's this part:

exists(DefImpl def | def.hasIndexInBlock(bb, i, v) |
  if def.isCertain() then certain = true else certain = false
)

A synthesized initial definition of the global variable. That's this part:

exists(GlobalDef global |
  global.hasIndexInBlock(bb, i, v) and
  certain = true
)

Consider a program like:

void write() {
  global = 42;
}

void read() {
  use(global);
}

the write to global is an instance of DefImpl, and so hits the first case. But we don't want to prune away that write, since there is a read of the variable, just not in the same callable.
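The pruning argument can be sketched with a toy model of straight-line function bodies. This is not how the SSA library is implemented; `Access`, `writeIsLocallyLive`, and `keptWrites` are hypothetical names, and "liveness" here only looks at later accesses in the same body, which is the assumption that makes the write in write() look dead.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One access to a variable inside a straight-line function body.
struct Access {
  std::string var;
  bool isWrite;
};
using FunctionBody = std::vector<Access>;

// Function-local liveness: a write is live only if the next access to the
// same variable in the same body is a read.
bool writeIsLocallyLive(const FunctionBody& body, std::size_t writeIdx) {
  for (std::size_t i = writeIdx + 1; i < body.size(); ++i) {
    if (body[i].var == body[writeIdx].var) return !body[i].isWrite;
  }
  return false;  // No later access in this body.
}

// Returns the indices of the writes that survive pruning. With
// `exemptGlobals` set, writes to globals are kept even when they look dead
// locally (the analogue of the extra disjunct discussed above).
std::vector<std::size_t> keptWrites(const FunctionBody& body,
                                    const std::set<std::string>& globals,
                                    bool exemptGlobals) {
  std::vector<std::size_t> kept;
  for (std::size_t i = 0; i < body.size(); ++i) {
    if (!body[i].isWrite) continue;
    bool isGlobal = globals.count(body[i].var) > 0;
    if (writeIsLocallyLive(body, i) || (exemptGlobals && isGlobal)) {
      kept.push_back(i);
    }
  }
  return kept;
}
```

For a body consisting of a single write to `global`, `keptWrites` returns nothing without the exemption and returns that write with it, while a dead write to a local is pruned either way.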

Thanks for the explanation. That clarifies things.

jketema · 2023-02-03T10:37:28Z

+    not exists(unique( | | v.getLocation())) and
+    result instanceof UnknownDefaultLocation


Not for this PR (and I don't know how relevant this is for the user), but if this is about parameters of functions shouldn't we just pick the one associated with the actual definition of the function (and not of any of its declarations)?

Yes, that's probably a much better approach. I'll create an issue for this.

MathiasVP · 2023-02-03T11:31:21Z

So if I understand this correctly, instead of having jump steps at arbitrary places in functions - which is how I understood the old code - they now just occur at the beginning and end of functions.

Exactly. And the new way is really the right way to do it. If we do global flow that jumps to arbitrary places, we get FPs like:

int global;

void foo() {
  sanitizer(&global);
  sink(global); // <-- flow jumps directly from source() to sink(global) without considering the sanitizer.
}

void bar() {
  global = source();
}

IIRC, we had to exclude global variables in some query for exactly this reason, but I can't seem to find the PR (nor the query) where we had to do this 🤔.
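The difference between the two models can be sketched with a toy taint tracker over one global. This is only an illustration under simplifying assumptions (straight-line bodies, a single global, one writer and one reader); `Op`, `taintedAtExit`, `sinkTaintedEntryExit`, and `sinkTaintedArbitraryJump` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Operations on the single global in a straight-line function body.
enum class Op { WriteSource, Sanitize, Sink };
using Body = std::vector<Op>;

// Taint state of the global at the end of `body`, given its state at entry.
bool taintedAtExit(const Body& body, bool taintedAtEntry) {
  bool tainted = taintedAtEntry;
  for (Op op : body) {
    if (op == Op::WriteSource) tainted = true;
    if (op == Op::Sanitize) tainted = false;
  }
  return tainted;
}

// Entry/exit model: the reader's implicit def at function entry receives the
// writer's state at function exit, then the reader's body is walked in order.
bool sinkTaintedEntryExit(const Body& reader, const Body& writer) {
  bool tainted = taintedAtExit(writer, false);
  for (Op op : reader) {
    if (op == Op::WriteSource) tainted = true;
    if (op == Op::Sanitize) tainted = false;
    if (op == Op::Sink && tainted) return true;
  }
  return false;
}

// Arbitrary-jump model: any write of a source jumps straight to any sink,
// ignoring whatever happens earlier in the reader's body.
bool sinkTaintedArbitraryJump(const Body& reader, const Body& writer) {
  bool hasSource =
      std::find(writer.begin(), writer.end(), Op::WriteSource) != writer.end();
  bool hasSink =
      std::find(reader.begin(), reader.end(), Op::Sink) != reader.end();
  return hasSource && hasSink;
}
```

With foo modelled as `{Sanitize, Sink}` and bar as `{WriteSource}`, the arbitrary-jump model reports the sink as tainted (the FP above), while the entry/exit model does not.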

MathiasVP · 2023-02-03T11:32:15Z

If the above is correct, this is probably a good approximation for now. In general it is probably not completely correct, as we might be analysing a multi-threaded application. Assuming there are no data races (which is a separate problem), I guess we could always handle those in the future by introducing additional jump steps at places where synchronisation takes place.

That's an excellent point, yes. I imagine this would be a good approach to handling multi-threaded dataflow in the future.

jketema · 2023-02-03T19:29:16Z

I hadn't approved this one yet, although I'm fine with it 😄. I assume we'll have a small follow-up PR that fixes the "spurious" issue we discussed internally.

MathiasVP · 2023-02-03T19:49:38Z

Oh, I'm sorry. I had somehow convinced myself that you'd approved it 😀. Let's do it as a follow up, indeed.

jketema · 2023-02-03T20:03:03Z

No worries, as I had already told you I was fine with it except for the test case you added.

C++: Add use-use flow through global variables.

f19b381

github-actions Bot added the C++ label Nov 8, 2022

MathiasVP and others added 5 commits January 11, 2023 15:42

Merge branch 'mathiasvp/replace-ast-with-ir-use-usedataflow' into glo…

cd24405

…bal-flow

C++: Better 'getType' for global variable nodes.

8c22442

C++: Fix global variable exclusion in DTT.

ee62f2a

Merge remote-tracking branch 'origin/mathiasvp/replace-ast-with-ir-us…

3648f26

…e-usedataflow' into global-flow Resolved trivial conflicts.

C++: fix UseImpl after merge conflict

6a91e85

MathiasVP commented Jan 27, 2023

Comment thread cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/SsaInternals.qll Outdated

MathiasVP added 3 commits February 1, 2023 13:24

C++: Small cleanup by making 'GlobalUse' extend 'UseImpl'.

136b5d1

C++: Accept test changes. These all appear to be good changes.

0e1dcc8

Merge branch 'mathiasvp/replace-ast-with-ir-use-usedataflow' into glo…

702b10f

…bal-flow

MathiasVP marked this pull request as ready for review February 2, 2023 09:22

MathiasVP requested a review from a team as a code owner February 2, 2023 09:22

MathiasVP added the no-change-note-required This PR does not need a change note label Feb 2, 2023

C++: QLDoc.

b53963a

MathiasVP force-pushed the global-flow branch from 88298e2 to b53963a Compare February 2, 2023 11:49

jketema reviewed Feb 3, 2023

C++: Add a test with an indirect source.

ae774a6

MathiasVP merged commit 4317381 into github:mathiasvp/replace-ast-with-ir-use-usedataflow Feb 3, 2023

jketema mentioned this pull request Feb 10, 2023

C++: Do not mark global indirect flow as spurious in dataflow tests #12146

Merged
