-
Notifications
You must be signed in to change notification settings - Fork 1.8k
C++: Share TInstruction
across IR stages
#3612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C++: Share TInstruction
across IR stages
#3612
Conversation
Each stage of the IR reuses the majority of the instructions from previous stages. Previously, we've been wrapping each reused old instruction in a branch of the `TInstruction` type for the next stage. This causes use to create roughly three times as many `TInstruction` objects as we actually need. Now that IPA union types are supported in the compiler, we can share a single `TInstruction` IPA type across stages. We create a single `TInstruction` IPA type, with individual branches of this type for instructions created directly from the AST (`TRawInstruction`) and for instructions added by each stage of SSA construction (`T*PhiInstruction`, `T*ChiInstruction`, `T*UnreachedInstruction`). Each stage then defines a `TStageInstruction` type that is a union of all of the branches that can appear in that particular stage. The public `Instruction` class for each phase extends the `TStageInstruction` type for that stage. The interface that each stage exposes to the pyrameterized modules in the IR is now split into three pieces: - The `Raw` module, exposed only by the original IR construction stage. This module identifies which functions have IR, which `TRawInstruction`s exist, and which `IRVariable`s exist. - The `SSA` module, exposed only by the two SSA construction stages. This identifiers which `Phi`, `Chi`, and `Unreached` instructions exist. - The global module, exposed by all three stages. This module has all of the predicates whose implementation is different for each stage, like gathering definitions of `MemoryOperand`s. Similarly, there is now a single `TIRFunction` IPA type that is shared across all three stages. There is a single `IRFunctionBase` class that exposes the stage-indepdendent predicates; the `IRFunction` class for each stage extends `IRFunctionBase`. Most of the other changes are largely mechanical.
Now that `TInstruction` is shared between IR stages, several of the per-stage IR construction predicates can now be moved into the `Raw` interface exposed only by the initial construction of IR from the ASTs. This also removed a couple predicates that were not used previously at all.
Making these part of the IPA object identity changes the failure mode for cases where we assign multiple result types to an instruction. Previously, we would just have one instruction with two result types, but now we'd have two instructions, which breaks things worse. This change goes back to how things were before, to avoid any new surprises on real-world code with invalid ASTs or IR.
TInstruction
across IR stages
This updates C#'s IR to share `TInstruction` across stages the same way C++ does. The only interesting part is that, since we have not yet ported full alias analysis to C#, I stubbed out the required parts of the aliased SSA interface in `AliasedSSAStub.qll`.
…lysis-team/69_union
Before this change, evaluation of the IR was spread out across about 5 stages. This resulted in a lot of redundant evaluation, especially tuple numbering of large IPA types like `TInstruction`. This change makes two small changes that, when combined, ensure that the IR is evaluated all in one stage: First, we mark `TInstruction` as `cached`. This collapses all of the work to create instructions, across all three IR phases, into a single phase. Second, we make the `SSA` module in `SSAConstruction.qll` just contain aliases to `cached` predicates defined in the `Cached` module. This ensures that all of the `Operand`-related SSA computation happens in the same stage as all of the `Instruction`-related SSA computation.
There were a few places in the IR itself where we use `Instruction.getResultType()`, which returns the C++ `Type` of the result, instead of `Instruction.getResultIRType()`, which returns the language-neutral `IRType` of the result. By removing this usage, we can avoid evaluating `getResultType()` at all. There are still other uses of `Instruction.getResultType()` in other libraries. We should switch those as well.
Most of the predicates on `Instruction` are thin wrappers around cached predicates in the `IRConstruction` or `SSAConstruction` modules. However, `getResultIRType()` has to join `Construction::getInstructionResultType()` with `LanguageType::getIRType()`. `getResultIRType()` is called frequently both within the IR code and by IR consumers, and that's a big join to have to repeat in multiple stages. I looked at most of the other predicates in `Instruction.qll`, and didn't see any other predicates that met all of the criteria of "large, commonly called, and not already inline".
The four cached predicates used to access common properties of instructions took a `TStageInstruction` as a parameter. This requires the calling code, in `Instruction.qll`, to then join the results with `hasInstruction()` to filter out results for `TRawInstruction`s that were discarded as unreachable. By simply switching the parameter types to `Instruction`, we can force that join to happen in the cached predicate itself. This makes the various accessor predicates on `Instruction` trivially inlinable to the cached predicate, instead of being joins of two huge relations that might have to be recomputed in later stages.
…lysis-team/69_union
The optimizer picked a terrible join order in `VirtualDispatch::DataSensitiveCall::flowsFrom()`. Telling it that `getAnOutNode()` has a unique result convinces it to join first on the `Callable`, rather than on the `ReturnKind`.
After fixing a big performance regression in virtual dispatch, I'm running what I hope will be my final CPP-Differences job. If that comes back without any regressions, this will be ready for review. |
CPP-Differences job came back with some pretty impressive (>15%) wins across the board in overall suite analysis time. A significant portion of that is likely to be from this PR accidentally getting rid of a caching issue that caused the IR to be reevaluated during dataflow analysis, but even the IR-only queries at least look no worse than they were before this change. |
Lots of conflicts: the C# IR was moved to |
Merge conflicts with the big C# move to |
cpp/ql/src/semmle/code/cpp/ir/implementation/unaliased_ssa/internal/SSAConstruction.qll
Outdated
Show resolved
Hide resolved
I really like the changes! I think they all look good, the only thing I can't seem to understand is the small difference in instruction counts. I tried counting the number of instructions in
it's not a large difference, so I don't know if it's a sign of a bug somewhere. But do you know what the cause of this difference could be? |
It could have something to do with how we handle multiple results for predicates that should always have a unique result. I'm not too worried, but it should be easy enough to investigate a few examples by comparing IR dumps for functions whose instruction count changed. I'll take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are my comments after reviewing the first commit only. I've checked that all my comments apply to the PR as a whole, not just the first commit.
cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/internal/SSAConstruction.qll
Outdated
Show resolved
Hide resolved
cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/internal/SSAConstruction.qll
Outdated
Show resolved
Hide resolved
private import IRFunctionBaseInternal | ||
|
||
private newtype TIRFunction = | ||
MkIRFunction(Language::Function func) { IRConstruction::Raw::functionHasIR(func) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised this isn't cached
. But it wasn't cached before, so this PR is not the right time to experiment with its caching.
cpp/ql/src/semmle/code/cpp/ir/implementation/raw/internal/IRConstruction.qll
Outdated
Show resolved
Hide resolved
result = element.getInstructionResultSize(tag) | ||
) | ||
IRFunctionBase getInstructionEnclosingIRFunction(TStageInstruction instr) { | ||
instr = TRawInstruction(result, _, _, _, _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's helpful for these trivial predicates to be cached. But we can measure that separately (after this PR), and it also interacts with the change I'm proposing to remove columns from TRawInstruction
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are no longer cached, part of the overall removal of caching of raw IR construction.
@MathiasVP Can you share your query for those instruction counts? I'm getting numbers that are in the same general range, but far enough off that I'm not sure I counted exactly the same way you did. |
@MathiasVP The only IR diff I found locally in |
My query was:
on the snapshot from https://jenkins.internal.semmle.com/job/Changes/job/Build-Snapshots/13817/. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now read all commits in this PR, and it LGTM apart from these comments!
) { | ||
element = getInstructionTranslatedElement(instruction) and | ||
tag = getInstructionTag(instruction) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a need for this predicate now? I think instructionOrigin(i, e, t)
can now be written as i = TRawInstruction(e, t)
. That results in a different column order, but at least the body of this predicate can be simplified if it's important to preserve the column order (and then pragma[noinline]
should be unnecessary).
Perhaps it would be even better to inline getInstructionTranslatedElement
and getInstructionTag
(maybe the compiler does that already), and then callers of instructionOrigin
can be changed to call those two inline predicates instead. If the compiler does everything right, two such calls should result in one join with TRawInstruction
, and that should be just as performant as joining with instructionOrigin
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing instructionOrigin()
cleaned up the RA a bit. At basically every usage, it had the wrong column order and needed a SCAN
to reorder the columns. TRawInstruction()
has the right join order already. Net improvement:
- Eliminated four
SCAN
s ofinstructionOrigin()
- Added one new
SCAN
ofTRawInstruction()
- Eliminated a materialized
join_rhs
predicate forTranslatedElement::hasInstruction()
. - Eliminated
instructionOrigin()
itself.
Changing those TRawInstruction()
calls to just use getInstructionTranslatedElement()
/getInstructionTag()
like we do everywhere else had essentially no effect on the RA, but made everything look more consistent.
cpp/ql/src/semmle/code/cpp/ir/implementation/raw/internal/IRConstruction.qll
Outdated
Show resolved
Hide resolved
cpp/ql/src/semmle/code/cpp/ir/implementation/raw/internal/IRConstruction.qll
Outdated
Show resolved
Hide resolved
See #2272. I've added code comments in all of the places that future me will be tempted to hoist these overrides.
These predicates are only used within the new single IR stage, so there's no need to cache them beyond that. RA diffs are trivial. Where previously many of the predicate on `Instruction` were inline wrappers around cached predicates from `IRConstruction`, now the predicates from `IRConstruction` get inlined into the `Instruction` predicates, and the `Instruction` predicates get materialized. The net amount of work is the same, but now it's not getting cached unnecessarily.
This noopt predicate is no longer necessary. It's equivalent to `instruction = TRawInstruction(element, tag)`, which is already materialized and has a more favorable column order anyway.
Replace most direct calls to `TRawInstruction()` with calls to `getInstructionTranslatedElement()` and `getInstructionTag()`, matching existing practice. One tiny RA diff in an inconsequential join order in `getInstructionVariable`.
@MathiasVP OK, I get the same numbers as you when running on that snapshot. I was using the one from lgtm.com. The diffs are due to two instances of unreachable code elimination due to constant propagation of enums. One instance is the same one I saw on the lgtm.com snapshot, and the other appears to be in code that only exists in the Jenkins snapshot. |
@jbj I think I've addressed everything that can reasonably be addressed in this PR. Please take a last look and merge when you get a chance. |
Strongly suggest reviewing commit-by-commit
C# is currently broken. Will be fixed before I make this draft ready for reviewMain changes
Each stage of the IR reuses the majority of the instructions from previous stages. Previously, we've been wrapping each reused old instruction in a branch of the
TInstruction
type for the next stage. This causes use to create roughly three times as manyTInstruction
objects as we actually need.Now that IPA union types are supported in the compiler, we can share a single
TInstruction
IPA type across stages. We create a singleTInstruction
IPA type, with individual branches of this type for instructions created directly from the AST (TRawInstruction
) and for instructions added by each stage of SSA construction (T*PhiInstruction
,T*ChiInstruction
,T*UnreachedInstruction
). Each stage then defines aTStageInstruction
type that is a union of all of the branches that can appear in that particular stage. The publicInstruction
class for each phase extends theTStageInstruction
type for that stage.The interface that each stage exposes to the pyrameterized modules in the IR is now split into three pieces:
Raw
module, exposed only by the original IR construction stage. This module identifies which functions have IR, whichTRawInstruction
s exist, and whichIRVariable
s exist.SSA
module, exposed only by the two SSA construction stages. This identifiers whichPhi
,Chi
, andUnreached
instructions exist.MemoryOperand
s.Similarly, there is now a single
TIRFunction
IPA type that is shared across all three stages. There is a singleIRFunctionBase
class that exposes the stage-indepdendent predicates; theIRFunction
class for each stage extendsIRFunctionBase
.Most of the other changes are largely mechanical.
Subsequent changes
Now that
TInstruction
is shared between IR stages, several of the per-stage IR construction predicates can now be moved into theRaw
interface exposed only by the initial construction of IR from the ASTs. This also removed a couple predicates that were not used previously at all.Making these part of the IPA object identity changes the failure mode for cases where we assign multiple result types to an instruction. Previously, we would just have one instruction with two result types, but now we'd have two instructions, which breaks things worse. This change goes back to how things were before, to avoid any new surprises on real-world code with invalid ASTs or IR.
Performance tuning
The initial version of this PR slowed down
RedundantNullCheckSimple.ql
on Wireshark and ChakraCore by ~10%. Most of this slowdown was due to various large and expensive relations being recomputed in multiple stages (especially tuple numbering). I've forced the entire IR (all three phases) to be evaluated in the same phase, which gets back all of the lost performance plus a few percent more. I also made some small changes to ensure that the commonly-used predicates onInstruction
are just trivial wrappers around cached predicates in the IR/SSA construction code.