
C++: Operands as IPA types #352

Merged

jbj merged 2 commits into github:master on Nov 1, 2018

Conversation

dave-bartolomeo
Contributor

@rdmarsh2 has been working on various queries and libraries on top of the IR, and has pointed out that having to always refer to an operand of an instruction by the pair of (instruction, operandTag) makes using the IR a bit clunky. This PR adds a new `Operand` IPA type that represents an operand of an instruction. `OperandTag` still exists, but is now an internal type used only in the IR implementation.

I'll post performance numbers when I have them.

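For readers less familiar with IPA types, here is a minimal sketch of the shape such an `Operand` type could take, wrapping the old (instruction, operandTag) pair. All names and member predicates are illustrative assumptions rather than the PR's actual implementation; only `Construction::getInstructionOperand` is taken from the discussion below.

```ql
// Sketch only: a possible shape for an `Operand` IPA type.
private newtype TOperand =
  MkOperand(Instruction useInstr, OperandTag tag) {
    // One operand per (consuming instruction, operand tag) pair.
    exists(Construction::getInstructionOperand(useInstr, tag))
  }

class Operand extends TOperand {
  Instruction useInstr;
  OperandTag tag;

  Operand() { this = MkOperand(useInstr, tag) }

  string toString() { result = tag.toString() }

  /** Gets the instruction that consumes this operand. */
  Instruction getUseInstruction() { result = useInstr }

  /** Gets the instruction whose result this operand consumes. */
  Instruction getDefinitionInstruction() {
    result = Construction::getInstructionOperand(useInstr, tag)
  }
}
```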
@jbj
Contributor

jbj commented Oct 24, 2018

I think this is an improvement overall, but it doesn't answer the question I have on #251, the port of the Java sign analysis library.

When we port existing libraries to the IR, we need an answer for what the IR equivalent of Expr is in almost every single predicate we translate. I thought initially that it was going to be Instruction, but #251 convinced me that Instruction is too different from Expr: it's possible to have more than one consumer of an Instruction, while an Expr has at most one consumer. This makes a difference in libraries like sign analysis, where there may be multiple uses of an SSA variable x under different guards (if (x > 0) f(x) else g(x)).

The new Operand type is close to being a replacement for Expr. As far as I can tell, it would correspond to Expr if it also included instructions whose results are not used as operands anywhere. In other words, TOperand would get a third constructor `TDiscardedValue(Instruction defInstr) { not defInstr = Construction::getInstructionOperand(_, _) }`. Then we should name the type something other than Operand, but I can't think of a good name. Maybe IRExpr, for consistency with IRVariable etc.
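Spelled out as a sketch, that suggestion would extend the newtype roughly as follows (the existing constructors are collapsed into a single placeholder branch; only the `TDiscardedValue` branch is quoted from the comment above):

```ql
private newtype TOperand =
  // Placeholder for the existing constructor(s) of TOperand.
  MkOperand(Instruction useInstr, OperandTag tag) {
    exists(Construction::getInstructionOperand(useInstr, tag))
  } or
  // Proposed extra branch: instructions whose result is never consumed
  // as an operand of any other instruction.
  TDiscardedValue(Instruction defInstr) {
    not defInstr = Construction::getInstructionOperand(_, _)
  }
```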

At the other end of the spectrum we have ValueNumber, where any value number can have multiple consumers, not just memory operands like for Instruction. That leaves me wondering whether an analysis should ever use Instruction, which sits in the middle and has all the complications of Operand and ValueNumber but none of their good properties. But the IR libraries are written such that everything goes through Instruction, so it's hard to avoid it in higher-level analyses. Could we go so far as to turn Instruction into an implementation detail and expose the IR API in terms of Operand and ValueNumber only? Equivalently, we could use global value numbering to implement common subexpression elimination for Instruction so there would be no ValueNumber type.

@dave-bartolomeo
Contributor Author

@jbj I think your question boils down to "how do we handle inference of new information about a value due to control flow". One approach that works for traditional compilers is to insert a new artificial definition of a value whenever you might learn something about that value due to control flow:

int* p = foo();
if (p != nullptr) {
  return *p;
}

becomes

int* p1 = foo();
if (p1 != nullptr) {
  int* p2 = infer(p1);
  return *p2;
}

The infer operator is semantically equivalent to assignment, but because it provides a new definition of p, it gives any analysis a place to attach information about what is known about the value of p at that point in the program (in this case, the fact that p is non-null).

Then, you want to handle cases like this:

int* p = foo();
int* q = p;
if (p != nullptr) {
  return *q;
}

Here, you'd like to infer the fact that q is non-null based on the test p != nullptr. Performing CSE before inserting inference operators gives you something like this:

int* temp1 = foo();
int* p = temp1;
int* q = temp1;
if (temp1 != nullptr) {
  int* temp2 = infer(temp1);
  return *temp2;
}

This makes the inference work correctly, but has the drawback that it becomes more challenging to get back to the original code in order to report a good alert message, because everything is in terms of the temps introduced by CSE.
Perhaps we can avoid this problem by leaving the original instructions intact, and instead overlay an alternate SSA graph where everything is in terms of ValueNumber rather than Instruction. The analysis would be done on this "value graph", but we'd still have the original underlying IR so we could report better messages.
It also may be that we can perform CSE directly on the IR, as long as we are able to preserve the mapping from each Instruction back to a unique AST node.

If we don't insert inference operators, we can associate state with individual Operands, to represent what we know about the value at the point of use. However, that requires the analysis to worry about figuring out which uses are guarded by which guards, rather than letting SSA construction figure it out.
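A rough sketch of that per-operand alternative, for the non-null example above (everything here is hypothetical: the two stub predicates stand in for guard reasoning the analysis itself would have to supply, and `getDefinitionInstruction()` is the hypothetical accessor from the sketch in the PR description):

```ql
// Stubs for whatever guard reasoning the analysis brings; not IR library API.
predicate guardChecksNonNull(Instruction guard, Instruction checkedValue) { none() }

predicate guardControlsUse(Instruction guard, Operand op) { none() }

// A fact attached to an individual use (an Operand) rather than to a new
// `infer` definition: the analysis must connect each guarded use to its guard.
predicate operandKnownNonNull(Operand op) {
  exists(Instruction guard |
    guardChecksNonNull(guard, op.getDefinitionInstruction()) and
    guardControlsUse(guard, op)
  )
}
```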

@rdmarsh2
Contributor

> Perhaps we can avoid this problem by leaving the original instructions intact, and instead overlay an alternate SSA graph where everything is in terms of ValueNumber rather than Instruction. The analysis would be done on this "value graph", but we'd still have the original underlying IR so we could report better messages.

I'm in favor of this approach - we'll want to work with ValueNumber for bounds checks anyway, so integrating the inference nodes there will simplify handling of implications between guards.

> It also may be that we can perform CSE directly on the IR, as long as we are able to preserve the mapping from each Instruction back to a unique AST node.

For the AST side of IR-based libraries, it's necessary that each Expression can be mapped to an Instruction that produces its result. I think it will be hard to preserve both properties while doing CSE on Instructions.

jonas-semmle previously approved these changes Oct 24, 2018
Contributor

@jonas-semmle left a comment

I think the overall approach is sound, and I'll leave the detailed reviewing/merging to @rdmarsh2. We can continue the discussion about CSE and Expr-correspondence over Hangouts.

Contributor

@rdmarsh2 left a comment

A few small things, but overall it looks good.

@jbj
Contributor

jbj commented Oct 29, 2018

> I'll post performance numbers when I have them.

How does the IR perform after these changes, @dave-bartolomeo?

@dave-bartolomeo
Contributor Author

@jbj Initial runs over @geoffw0's AQTF snapshot show performance unchanged, but I'll run on a bigger snapshot today.

@dave-bartolomeo
Contributor Author

@jbj Running constant-func.ql over comdb2 shows a slowdown of about 4%, which seems acceptable. The evaluator logs don't show any hot spots. We just have to create a few million Operand objects, and that takes a little time.

jbj merged commit ea601b2 into github:master on Nov 1, 2018
aibaars added a commit that referenced this pull request Oct 14, 2021
Add missing `DataFlowImpl2.qll` entry to `identical-files.json`
smowton pushed a commit to smowton/codeql that referenced this pull request Apr 16, 2022
Kotlin: Use -Xopt-in=kotlin.RequiresOptIn when compiling