Skip to content

Conversation

dbartol
Copy link

@dbartol dbartol commented May 13, 2020

This is the first of several steps toward implementing https://github.com/github/codeql-c-analysis-team/issues/69.

The UnmodeledUse instruction has one operand for each unmodeled memory result in the function. This will make it slightly annoying when trying to reuse Operand objects across IR phases, but this instruction has been problematic from day one. It exists in order to provide a dummy "use" for every unmodeled memory result, with the intention being that analyses that care about finding uses of a result will not think that the result is dead just because we didn't try to model that result in SSA. However, this means that it's the only instruction whose uses might not be dominated by their respective definitions. It's also a fair bit of work just to hook up all those unmodeled results when building the IR. Finally, it comes at the cost of adding one instruction and a potentially large number of operands to each function, for each stage of the IR. It's been on my hit list for a long time, but now I think removing it will actually make future work easier.

This PR removes the UnmodeledUse instruction and its associated operands. Most of the changes are just about removing all references to Opcode::UnmodeledUse and UnmodeledUseOperand, but there are a few more significant changes:

In C++ IR construction, TranslatedElement::getInstructionOperand() is renamed to getInstructionRegisterOperand(), and now applies only to register operands. Memory operands are identified by seeing if TranslatedElement::getMemoryOperandType() has a result for that OperandTag.

I added a small hack to the ordering of blocks in IR dumps so that the block containing the EnterFunction instruction always comes first. Otherwise, for functions with exception handling, we get a block that starts with AliasedUse but has the same location as the block containing EnterFunction. Since we sort blocks within the same location by the name of their instruction tag, we would wind up with the AliasedUse block preceding the EnterFunction block, making the IR diffs more complicated and making looking at such IR dumps confusing.

I had to add one term to AliasAnalysis.qll to treat an address as escaped if it flowed to an unmodeled result. This is the only place in all of our code that actually cared about whether a result actually had a use. Note that any code that consumes the aliased SSA IR can't possibly be affected by this change, because aliased SSA IR never has unmodeled results anyway. The problem that UnmodeledUse was trying to solve only affected code analyzing intermediate stages of the IR, which we don't even expose outside of the standard library.

Next up: Removing UnmodeledDefinition, which is a bit trickier because it will leave unmodeled memory operands with no definition instruction.

@dbartol dbartol requested review from a team as code owners May 13, 2020 05:36
jbj
jbj previously approved these changes May 13, 2020
Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can be merged when the failing test is accepted.

@dbartol dbartol merged commit ea2081c into github:master May 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants