Skip to content

Conversation

dbartol
Copy link

@dbartol dbartol commented Jun 3, 2020

Strongly suggest reviewing commit-by-commit
C# is currently broken. Will be fixed before I make this draft ready for review

Main changes

Each stage of the IR reuses the majority of the instructions from previous stages. Previously, we've been wrapping each reused old instruction in a branch of the TInstruction type for the next stage. This causes use to create roughly three times as many TInstruction objects as we actually need.

Now that IPA union types are supported in the compiler, we can share a single TInstruction IPA type across stages. We create a single TInstruction IPA type, with individual branches of this type for instructions created directly from the AST (TRawInstruction) and for instructions added by each stage of SSA construction (T*PhiInstruction, T*ChiInstruction, T*UnreachedInstruction). Each stage then defines a TStageInstruction type that is a union of all of the branches that can appear in that particular stage. The public Instruction class for each phase extends the TStageInstruction type for that stage.

The interface that each stage exposes to the pyrameterized modules in the IR is now split into three pieces:

  • The Raw module, exposed only by the original IR construction stage. This module identifies which functions have IR, which TRawInstructions exist, and which IRVariables exist.
  • The SSA module, exposed only by the two SSA construction stages. This identifiers which Phi, Chi, and Unreached instructions exist.
  • The global module, exposed by all three stages. This module has all of the predicates whose implementation is different for each stage, like gathering definitions of MemoryOperands.

Similarly, there is now a single TIRFunction IPA type that is shared across all three stages. There is a single IRFunctionBase class that exposes the stage-indepdendent predicates; the IRFunction class for each stage extends IRFunctionBase.

Most of the other changes are largely mechanical.

Subsequent changes

Now that TInstruction is shared between IR stages, several of the per-stage IR construction predicates can now be moved into the Raw interface exposed only by the initial construction of IR from the ASTs. This also removed a couple predicates that were not used previously at all.

Making these part of the IPA object identity changes the failure mode for cases where we assign multiple result types to an instruction. Previously, we would just have one instruction with two result types, but now we'd have two instructions, which breaks things worse. This change goes back to how things were before, to avoid any new surprises on real-world code with invalid ASTs or IR.

Performance tuning

The initial version of this PR slowed down RedundantNullCheckSimple.ql on Wireshark and ChakraCore by ~10%. Most of this slowdown was due to various large and expensive relations being recomputed in multiple stages (especially tuple numbering). I've forced the entire IR (all three phases) to be evaluated in the same phase, which gets back all of the lost performance plus a few percent more. I also made some small changes to ensure that the commonly-used predicates on Instruction are just trivial wrappers around cached predicates in the IR/SSA construction code.

Dave Bartolomeo added 3 commits June 1, 2020 11:15
Each stage of the IR reuses the majority of the instructions from previous stages. Previously, we've been wrapping each reused old instruction in a branch of the `TInstruction` type for the next stage. This causes use to create roughly three times as many `TInstruction` objects as we actually need.

Now that IPA union types are supported in the compiler, we can share a single `TInstruction` IPA type across stages. We create a single `TInstruction` IPA type, with individual branches of this type for instructions created directly from the AST (`TRawInstruction`) and for instructions added by each stage of SSA construction (`T*PhiInstruction`, `T*ChiInstruction`, `T*UnreachedInstruction`). Each stage then defines a `TStageInstruction` type that is a union of all of the branches that can appear in that particular stage. The public `Instruction` class for each phase extends the `TStageInstruction` type for that stage.

The interface that each stage exposes to the pyrameterized modules in the IR is now split into three pieces:
- The `Raw` module, exposed only by the original IR construction stage. This module identifies which functions have IR, which `TRawInstruction`s exist, and which `IRVariable`s exist.
- The `SSA` module, exposed only by the two SSA construction stages. This identifiers which `Phi`, `Chi`, and `Unreached` instructions exist.
- The global module, exposed by all three stages. This module has all of the predicates whose implementation is different for each stage, like gathering definitions of `MemoryOperand`s.

Similarly, there is now a single `TIRFunction` IPA type that is shared across all three stages. There is a single `IRFunctionBase` class that exposes the stage-indepdendent predicates; the `IRFunction` class for each stage extends `IRFunctionBase`.

Most of the other changes are largely mechanical.
Now that `TInstruction` is shared between IR stages, several of the per-stage IR construction predicates can now be moved into the `Raw` interface exposed only by the initial construction of IR from the ASTs. This also removed a couple predicates that were not used previously at all.
Making these part of the IPA object identity changes the failure mode for cases where we assign multiple result types to an instruction. Previously, we would just have one instruction with two result types, but now we'd have two instructions, which breaks things worse. This change goes back to how things were before, to avoid any new surprises on real-world code with invalid ASTs or IR.
@dbartol dbartol added the C++ label Jun 3, 2020
@dbartol dbartol changed the title GitHub/codeql c analysis team/69 union C++: Share TInstruction across IR stages Jun 3, 2020
Dave Bartolomeo added 3 commits June 3, 2020 13:49
This updates C#'s IR to share `TInstruction` across stages the same way C++ does. The only interesting part is that, since we have not yet ported full alias analysis to C#, I stubbed out the required parts of the aliased SSA interface in `AliasedSSAStub.qll`.
@dbartol dbartol added the C# label Jun 3, 2020
Dave Bartolomeo added 10 commits June 3, 2020 16:10
Before this change, evaluation of the IR was spread out across about 5 stages. This resulted in a lot of redundant evaluation, especially tuple numbering of large IPA types like `TInstruction`. This change makes two small changes that, when combined, ensure that the IR is evaluated all in one stage:

First, we mark `TInstruction` as `cached`. This collapses all of the work to create instructions, across all three IR phases, into a single phase.

Second, we make the `SSA` module in `SSAConstruction.qll` just contain aliases to `cached` predicates defined in the `Cached` module. This ensures that all of the `Operand`-related SSA computation happens in the same stage as all of the `Instruction`-related SSA computation.
There were a few places in the IR itself where we use `Instruction.getResultType()`, which returns the C++ `Type` of the result, instead of `Instruction.getResultIRType()`, which returns the language-neutral `IRType` of the result. By removing this usage, we can avoid evaluating `getResultType()` at all.

There are still other uses of `Instruction.getResultType()` in other libraries. We should switch those as well.
Most of the predicates on `Instruction` are thin wrappers around cached predicates in the `IRConstruction` or `SSAConstruction` modules. However, `getResultIRType()` has to join `Construction::getInstructionResultType()` with `LanguageType::getIRType()`. `getResultIRType()` is called frequently both within the IR code and by IR consumers, and that's a big join to have to repeat in multiple stages.

I looked at most of the other predicates in `Instruction.qll`, and didn't see any other predicates that met all of the criteria of "large, commonly called, and not already inline".
The four cached predicates used to access common properties of instructions took a `TStageInstruction` as a parameter. This requires the calling code, in `Instruction.qll`, to then join the results with `hasInstruction()` to filter out results for `TRawInstruction`s that were discarded as unreachable. By simply switching the parameter types to `Instruction`, we can force that join to happen in the cached predicate itself. This makes the various accessor predicates on `Instruction` trivially inlinable to the cached predicate, instead of being joins of two huge relations that might have to be recomputed in later stages.
The optimizer picked a terrible join order in `VirtualDispatch::DataSensitiveCall::flowsFrom()`. Telling it that `getAnOutNode()` has a unique result convinces it to join first on the `Callable`, rather than on the `ReturnKind`.
@dbartol
Copy link
Author

dbartol commented Jun 9, 2020

After fixing a big performance regression in virtual dispatch, I'm running what I hope will be my final CPP-Differences job. If that comes back without any regressions, this will be ready for review.

@dbartol dbartol marked this pull request as ready for review June 9, 2020 20:31
@dbartol dbartol requested review from a team as code owners June 9, 2020 20:31
@dbartol
Copy link
Author

dbartol commented Jun 9, 2020

CPP-Differences job came back with some pretty impressive (>15%) wins across the board in overall suite analysis time. A significant portion of that is likely to be from this PR accidentally getting rid of a caching issue that caused the IR to be reevaluated during dataflow analysis, but even the IR-only queries at least look no worse than they were before this change.

@jbj
Copy link
Contributor

jbj commented Jun 10, 2020

Lots of conflicts: the C# IR was moved to experimental.

@dbartol
Copy link
Author

dbartol commented Jun 11, 2020

Merge conflicts with the big C# move to experimental are now fixed.

@MathiasVP
Copy link
Contributor

MathiasVP commented Jun 12, 2020

I really like the changes! I think they all look good, the only thing I can't seem to understand is the small difference in instruction counts. I tried counting the number of instructions in git with this PR and with master:

Master:
PhiInstructions: 56452
ChiInstructions: 185212
Total:           1748874

This PR:
PhiInstructions: 56449
ChiInstructions: 185208
Total:           1748855

it's not a large difference, so I don't know if it's a sign of a bug somewhere. But do you know what the cause of this difference could be?

@dbartol
Copy link
Author

dbartol commented Jun 12, 2020

it's not a large difference, so I don't know if it's a sign of a bug somewhere. But do you know what the cause of this difference could be?

It could have something to do with how we handle multiple results for predicates that should always have a unique result. I'm not too worried, but it should be easy enough to investigate a few examples by comparing IR dumps for functions whose instruction count changed. I'll take a look.

Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are my comments after reviewing the first commit only. I've checked that all my comments apply to the PR as a whole, not just the first commit.

private import IRFunctionBaseInternal

private newtype TIRFunction =
MkIRFunction(Language::Function func) { IRConstruction::Raw::functionHasIR(func) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this isn't cached. But it wasn't cached before, so this PR is not the right time to experiment with its caching.

result = element.getInstructionResultSize(tag)
)
IRFunctionBase getInstructionEnclosingIRFunction(TStageInstruction instr) {
instr = TRawInstruction(result, _, _, _, _)
Copy link
Contributor

@jbj jbj Jun 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's helpful for these trivial predicates to be cached. But we can measure that separately (after this PR), and it also interacts with the change I'm proposing to remove columns from TRawInstruction.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are no longer cached, part of the overall removal of caching of raw IR construction.

@dbartol
Copy link
Author

dbartol commented Jun 12, 2020

@MathiasVP Can you share your query for those instruction counts? I'm getting numbers that are in the same general range, but far enough off that I'm not sure I counted exactly the same way you did.

@dbartol
Copy link
Author

dbartol commented Jun 12, 2020

@MathiasVP The only IR diff I found locally in git was newly detected unreachable code because IntegerConstantInstruction now includes enums, thanks to checking getResultIRType() instead of getResultType(). Again, my numbers are different from yours for some reason.

@MathiasVP
Copy link
Contributor

@MathiasVP Can you share your query for those instruction counts? I'm getting numbers that are in the same general range, but far enough off that I'm not sure I counted exactly the same way you did.

My query was:

import semmle.code.cpp.ir.IR
select count(Instruction i), select count(PhiInstruction i), select count(ChiInstruction i)

on the snapshot from https://jenkins.internal.semmle.com/job/Changes/job/Build-Snapshots/13817/.

Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now read all commits in this PR, and it LGTM apart from these comments!

) {
element = getInstructionTranslatedElement(instruction) and
tag = getInstructionTag(instruction)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need for this predicate now? I think instructionOrigin(i, e, t) can now be written as i = TRawInstruction(e, t). That results in a different column order, but at least the body of this predicate can be simplified if it's important to preserve the column order (and then pragma[noinline] should be unnecessary).

Perhaps it would be even better to inline getInstructionTranslatedElement and getInstructionTag (maybe the compiler does that already), and then callers of instructionOrigin can be changed to call those two inline predicates instead. If the compiler does everything right, two such calls should result in one join with TRawInstruction, and that should be just as performant as joining with instructionOrigin.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing instructionOrigin() cleaned up the RA a bit. At basically every usage, it had the wrong column order and needed a SCAN to reorder the columns. TRawInstruction() has the right join order already. Net improvement:

  • Eliminated four SCANs of instructionOrigin()
  • Added one new SCAN of TRawInstruction()
  • Eliminated a materialized join_rhs predicate for TranslatedElement::hasInstruction().
  • Eliminated instructionOrigin() itself.

Changing those TRawInstruction() calls to just use getInstructionTranslatedElement()/getInstructionTag() like we do everywhere else had essentially no effect on the RA, but made everything look more consistent.

Dave Bartolomeo added 5 commits June 16, 2020 16:37
See #2272. I've added code comments in all of the places that future me will be tempted to hoist these overrides.
These predicates are only used within the new single IR stage, so there's no need to cache them beyond that. RA diffs are trivial. Where previously many of the predicate on `Instruction` were inline wrappers around cached predicates from `IRConstruction`, now the predicates from `IRConstruction` get inlined into the `Instruction` predicates, and the `Instruction` predicates get materialized. The net amount of work is the same, but now it's not getting cached unnecessarily.
This noopt predicate is no longer necessary. It's equivalent to `instruction = TRawInstruction(element, tag)`, which is already materialized and has a more favorable column order anyway.
Replace most direct calls to `TRawInstruction()` with calls to `getInstructionTranslatedElement()` and `getInstructionTag()`, matching existing practice. One tiny RA diff in an inconsequential join order in `getInstructionVariable`.
@dbartol
Copy link
Author

dbartol commented Jun 17, 2020

@MathiasVP OK, I get the same numbers as you when running on that snapshot. I was using the one from lgtm.com. The diffs are due to two instances of unreachable code elimination due to constant propagation of enums. One instance is the same one I saw on the lgtm.com snapshot, and the other appears to be in code that only exists in the Jenkins snapshot.

@dbartol
Copy link
Author

dbartol commented Jun 17, 2020

@jbj I think I've addressed everything that can reasonably be addressed in this PR. Please take a last look and merge when you get a chance.

@jbj jbj merged commit 09d7ed0 into github:master Jun 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants