
C++: Operands as IPA types #352

Merged

jbj merged 2 commits into github:master on Nov 1, 2018

Conversation

dave-bartolomeo
Contributor

@rdmarsh2 has been working on various queries and libraries on top of the IR, and has pointed out that having to always refer to an operand of an instruction by the pair of (instruction, operandTag) makes using the IR a bit clunky. This PR adds a new `Operand` IPA type that represents an operand of an instruction. `OperandTag` still exists, but is now an internal type used only in the IR implementation.

I'll post performance numbers when I have them.

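For readers less familiar with IPA types, here is a minimal sketch of the shape such an `Operand` type could take, wrapping the old (instruction, operandTag) pair. All names and member predicates are illustrative assumptions rather than the PR's actual implementation; only `Construction::getInstructionOperand` is taken from the discussion below.

```ql
// Sketch only: a possible shape for an `Operand` IPA type.
private newtype TOperand =
  MkOperand(Instruction useInstr, OperandTag tag) {
    // One operand per (consuming instruction, operand tag) pair.
    exists(Construction::getInstructionOperand(useInstr, tag))
  }

class Operand extends TOperand {
  Instruction useInstr;
  OperandTag tag;

  Operand() { this = MkOperand(useInstr, tag) }

  string toString() { result = tag.toString() }

  /** Gets the instruction that consumes this operand. */
  Instruction getUseInstruction() { result = useInstr }

  /** Gets the instruction whose result this operand consumes. */
  Instruction getDefinitionInstruction() {
    result = Construction::getInstructionOperand(useInstr, tag)
  }
}
```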
@jbj
Contributor

jbj commented Oct 24, 2018

I think this is an improvement overall, but it doesn't answer the question I have on #251, the port of the Java sign analysis library.

When we port existing libraries to the IR, we need an answer for what the IR equivalent of Expr is in almost every single predicate we translate. I thought initially that it was going to be Instruction, but #251 convinced me that Instruction is too different from Expr: it's possible to have more than one consumer of an Instruction, while an Expr has at most one consumer. This makes a difference in libraries like sign analysis, where there may be multiple uses of an SSA variable x under different guards (if (x > 0) f(x) else g(x)).

The new Operand type is close to being a replacement for Expr. As far as I can tell, it would correspond to Expr if it also included instructions whose results are not used as operands anywhere. In other words, TOperand would get a third constructor `TDiscardedValue(Instruction defInstr) { not defInstr = Construction::getInstructionOperand(_, _) }`. Then we should name the type something other than Operand, but I can't think of a good name. Maybe IRExpr, for consistency with IRVariable etc.
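Spelled out as a sketch, that suggestion would extend the newtype roughly as follows (the existing constructors are collapsed into a single placeholder branch; only the `TDiscardedValue` branch is quoted from the comment above):

```ql
private newtype TOperand =
  // Placeholder for the existing constructor(s) of TOperand.
  MkOperand(Instruction useInstr, OperandTag tag) {
    exists(Construction::getInstructionOperand(useInstr, tag))
  } or
  // Proposed extra branch: instructions whose result is never consumed
  // as an operand of any other instruction.
  TDiscardedValue(Instruction defInstr) {
    not defInstr = Construction::getInstructionOperand(_, _)
  }
```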

At the other end of the spectrum we have ValueNumber, where any value number can have multiple consumers, not just memory operands like for Instruction. That leaves me wondering whether an analysis should ever use Instruction, which sits in the middle and has all the complications of Operand and ValueNumber but none of their good properties. But the IR libraries are written such that everything goes through Instruction, so it's hard to avoid it in higher-level analyses. Could we go so far as to turn Instruction into an implementation detail and expose the IR API in terms of Operand and ValueNumber only? Equivalently, we could use global value numbering to implement common subexpression elimination for Instruction so there would be no ValueNumber type.

@dave-bartolomeo
Contributor Author

@jbj I think your question boils down to "how do we handle inference of new information about a value due to control flow". One approach that works for traditional compilers is to insert a new artificial definition of a value whenever you might learn something about that value due to control flow:

int* p = foo();
if (p != nullptr) {
  return *p;
}

becomes

int* p1 = foo();
if (p1 != nullptr) {
  int* p2 = infer(p1);
  return *p2;
}

The infer operator is semantically equivalent to assignment, but because it provides a new definition of p, it gives any analysis a place to attach information about what is known about the value of p at that point in the program (in this case, the fact that p is non-null).

Then, you want to handle cases like this:

int* p = foo();
int* q = p;
if (p != nullptr) {
  return *q;
}

Here, you'd like to infer the fact that q is non-null based on the test p != nullptr. Performing CSE before inserting inference operators gives you something like this:

int* temp1 = foo();
int* p = temp1;
int* q = temp1;
if (temp1 != nullptr) {
  int* temp2 = infer(temp1);
  return *temp2;
}

This makes the inference work correctly, but has the drawback that it becomes more challenging to get back to the original code in order to report a good alert message, because everything is in terms of the temps introduced by CSE.
Perhaps we can avoid this problem by leaving the original instructions intact, and instead overlay an alternate SSA graph where everything is in terms of ValueNumber rather than Instruction. The analysis would be done on this "value graph", but we'd still have the original underlying IR so we could report better messages.
It also may be that we can perform CSE directly on the IR, as long as we are able to preserve the mapping from each Instruction back to a unique AST node.

If we don't insert inference operators, we can associate state with individual Operands, to represent what we know about the value at the point of use. However, that requires the analysis to worry about figuring out which uses are guarded by which guards, rather than letting SSA construction figure it out.
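A rough sketch of that per-operand alternative, for the non-null example above (everything here is hypothetical: the two stub predicates stand in for guard reasoning the analysis itself would have to supply, and `getDefinitionInstruction()` is the hypothetical accessor from the sketch in the PR description):

```ql
// Stubs for whatever guard reasoning the analysis brings; not IR library API.
predicate guardChecksNonNull(Instruction guard, Instruction checkedValue) { none() }

predicate guardControlsUse(Instruction guard, Operand op) { none() }

// A fact attached to an individual use (an Operand) rather than to a new
// `infer` definition: the analysis must connect each guarded use to its guard.
predicate operandKnownNonNull(Operand op) {
  exists(Instruction guard |
    guardChecksNonNull(guard, op.getDefinitionInstruction()) and
    guardControlsUse(guard, op)
  )
}
```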

@rdmarsh2
Contributor

> Perhaps we can avoid this problem by leaving the original instructions intact, and instead overlay an alternate SSA graph where everything is in terms of ValueNumber rather than Instruction. The analysis would be done on this "value graph", but we'd still have the original underlying IR so we could report better messages.

I'm in favor of this approach - we'll want to work with ValueNumber for bounds checks anyway, so integrating the inference nodes there will simplify handling of implications between guards.

> It also may be that we can perform CSE directly on the IR, as long as we are able to preserve the mapping from each Instruction back to a unique AST node.

For the AST side of IR-based libraries, it's necessary that each Expression can be mapped to an Instruction that produces its result. I think it will be hard to preserve both properties while doing CSE on Instructions.

jonas-semmle previously approved these changes Oct 24, 2018
Contributor

@jonas-semmle left a comment

I think the overall approach is sound, and I'll leave the detailed reviewing/merging to @rdmarsh2. We can continue the discussion about CSE and Expr-correspondence over Hangouts.

Contributor

@rdmarsh2 left a comment

A few small things, but overall it looks good.

@jbj
Contributor

jbj commented Oct 29, 2018

> I'll post performance numbers when I have them.

How does the IR perform after these changes, @dave-bartolomeo?

@dave-bartolomeo
Contributor Author

@jbj Initial runs over @geoffw0's AQTF snapshot show performance unchanged, but I'll run on a bigger snapshot today.

@dave-bartolomeo
Contributor Author

@jbj Running constant-func.ql over comdb2 shows a slowdown of about 4%, which seems acceptable. The evaluator logs don't show any hot spots. We just have to create a few million Operand objects, and that takes a little time.

jbj merged commit ea601b2 into github:master on Nov 1, 2018
aibaars added a commit that referenced this pull request Oct 14, 2021
Add missing `DataFlowImpl2.qll` entry to `identical-files.json`
smowton pushed a commit to smowton/codeql that referenced this pull request Apr 16, 2022
Kotlin: Use -Xopt-in=kotlin.RequiresOptIn when compiling