Skip to content

Conversation

MathiasVP
Copy link
Contributor

@MathiasVP MathiasVP commented Mar 4, 2021

Copy/paste from my comment here:


While looking through the CPP-differences changes for #3364 I found another source of field -> object taint in simpleInstructionLocalFlowStep in DataFlowUtil:

private predicate simpleInstructionLocalFlowStep(Operand opFrom, Instruction iTo) {
  // ...
  exists(ChiInstruction chi | chi = iTo |
      opFrom.getAnyDef() instanceof WriteSideEffectInstruction and
      chi.getPartialOperand() = opFrom and
      not chi.isResultConflated()
    )
  // ..
}

This rule ensures that we get flow from the WriteSideEffect after calling set_value to the argument to sink in this example:

void set_value(int*);

void test() {
  int n;
  set_value(&n);
  sink(n);
}

however, it also transfers flow from the WriteSideEffect to the sink argument in the following case:

void test() {
  struct { int field1, field2; } obj;
  set_value(&obj->field1);
  sink(obj);
}

this pattern gave rise to some field conflation on cpp/uncontrolled-allocation-size on OpenJDK and on cpp/unbounded-write on vim.

The fix is to restrict the rule in simpleInstructionLocalFlowStep to only propagate flow from the WriteSideEffect to the Chi when we know that the Chi isn't only a partial update. We should be able to use the getUpdatedInterval predicate on the ChiInstruction for this.


I was right about everything except for the very last line. It's not so easy to use getUpdatedInterval to detect whether the entire memory location is updated since (AFAIK) it's not possible to get the type of the memory location without digging into SSA. So I added a new predicate on ChiInstruction that forwards to a new predicate in SSAConstruction.qll which then does the actual work.

Unfortunately, we lose a couple of results in dataflow. This is due to field flow not looking "deep enough" through the destination address on write side effects (i.e., we only look check whether the destination instruction is a FieldAddress or not, but we should traverse through the entire chain of instructions like in #5127). So simpleInstructionLocalFlowStep expects field flow to handle this case, but it currently doesn't.

@dbartol / @rdmarsh2 I'd appreciate your thoughts on the first commit (i.e. 200d947).

(Part of https://github.com/github/codeql-c-analysis-team/issues/103.)

@MathiasVP MathiasVP added the C++ label Mar 4, 2021
@MathiasVP MathiasVP requested review from a team as code owners March 4, 2021 15:07
@github-actions github-actions bot added the C# label Mar 4, 2021
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Mar 4, 2021
@MathiasVP MathiasVP changed the title C++: Remove more field to object flow C++: Remove more field-to-object flow Mar 4, 2021
Copy link
Contributor

@rdmarsh2 rdmarsh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test case where this improves the results, or is it only to make #3364 work correctly?

@@ -178,6 +178,22 @@ private module Cached {
)
}

/**
* Holds if the `ChiPartialOperand` totally, but not exactly, overlaps with the `ChiTotalOperand`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be "partially overlaps with", I believe. A "total" overlap overwrites all bits but could be a larger write or with a different type, an "exact" overlap writes with the same type and overwrites all bits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I've fixed this in c86fc22.

@MathiasVP
Copy link
Contributor Author

MathiasVP commented Mar 5, 2021

Do we have a test case where this improves the results, or is it only to make #3364 work correctly?

Fixed in bd84240. I'm surprised the flow to sink(s.x) isn't caught, but at least it's not a regression (I tested it on main and for some reason, we don't catch it there either).

…esLocation when Alias::getEndBitOffset doesn't have known value.
Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'll merge this to unblock further work. @dbartol, do come back with a review if you have any reservations -- there is still time to change the interface or the implementation.

@jbj jbj merged commit 32f1da7 into github:main Mar 5, 2021
@MathiasVP
Copy link
Contributor Author

I forgot to link to the CPP-differences results. Here they are in case we need to look at them again later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C# C++ no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants