C++: Construct a CFG with QL #724
This implements calculation of the control-flow graph in QL. The new code is not enabled yet as we'll need more extractor changes first. The `SyntheticDestructorCalls.qll` file is a temporary solution that can be removed when the extractor produces this information directly.
There was something specific you wanted me and/or @rdmarsh2 to look at in this PR, wasn't there?
* ("before" and "after") and collapse the edges around them, we are left with
* the correct CFG for `e`:
*
* e1 -> e2 -> e
Excellent explanation.
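The collapsing step described in that comment can be modelled outside QL. The following is a minimal Python sketch, not the PR's implementation: it assumes each AST node has three sub-nodes (`before`, `at`, `after`), where only `at` is a real CFG node, and the sub-edge layout in `sub_edges` is a hypothetical encoding of an expression `e` with operands `e1` and `e2`.

```python
BEFORE, AT, AFTER = "before", "at", "after"


def sub_edges():
    """Hypothetical sub-node edges for an expression `e` with operands `e1`, `e2`."""
    edges = []
    # Leaves: control enters, evaluates the node itself, then leaves.
    for leaf in ("e1", "e2"):
        edges.append(((leaf, BEFORE), (leaf, AT)))
        edges.append(((leaf, AT), (leaf, AFTER)))
    # Post-order evaluation of `e`: operands first, then `e` itself.
    edges += [
        (("e", BEFORE), ("e1", BEFORE)),
        (("e1", AFTER), ("e2", BEFORE)),
        (("e2", AFTER), ("e", AT)),
        (("e", AT), ("e", AFTER)),
    ]
    return edges


def collapse(edges):
    """Drop virtual (before/after) sub-nodes, keeping edges between real nodes."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, set()).add(dst)

    real_edges = set()
    for src in succ:
        if src[1] != AT:
            continue
        # Walk forward through virtual sub-nodes until a real node is reached.
        stack, seen = list(succ[src]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node[1] == AT:
                real_edges.add((src[0], node[0]))
            else:
                stack.extend(succ.get(node, ()))
    return real_edges
```

Calling `collapse(sub_edges())` leaves only the edges `e1 -> e2` and `e2 -> e`, matching the shape shown in the quoted comment.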
* `getExpr()` is an access to `c` (with possible casts), and `getVariable` is
* the variable `c`, which has an initializer `x < y`.
* `getVariableAccess()` is an access to `c` (with possible casts),
* `getVariable` is the variable `c`, which has an initializer `x < y`, and
Seeing as we're in a comma-separated list, I think it would be clearer to put `which has an initializer x < y` in parentheses.
* the variable `c`, which has an initializer `x < y`.
* `getVariableAccess()` is an access to `c` (with possible casts),
* `getVariable` is the variable `c`, which has an initializer `x < y`, and
* `getInitializingExpr` is `x < y`.
You have `()` after some predicates but not others.
// have multiple predecessors.
// - But after ReturnStmt, that may happen.
/**
* Describes a straight line of `SyntheticDestructorCall`s. Node that such
s/Node/Note/
or
this instanceof MicrosoftTryExceptStmt
or
// Detecting exception edges out of a MicrosoftTryExceptStmt is not
Do you mean `MicrosoftTryFinallyStmt`?
* specifies the shape of the CFG for all known language constructs. The case
* analysis is large but does _not_ contain recursion. Recursion is needed in
* the second stage in order to collapse virtual nodes, but that recursion is
* simply a transitive closure and can be fast.
"so is fast"?
* To produce all edges around each control-flow node without recursion, we
* need to pre-compute the targets of exception sources (throw, propagating
* handlers, ...) and short-circuiting operators (||, ? :, ...). This
* pre-computation involves recursion, but it's quick to compute because in
because it
* For many AST nodes, their control flow can be described in simpler terms
* than the full generality of describing each of their individual sub-edges.
* To add control flow for a new AST construct, one of the following predicates
* can be used, listed roughly in order of increasing generality.
The first sentence lost me at first. Perhaps something like: Many kinds of AST nodes share the same pattern of control flow. To add control flow for a new AST construct, it can often be easier to use one of the following predicates (listed roughly in order of increasing generality) than to define it directly.
private import semmle.code.cpp.controlflow.internal.SyntheticDestructorCalls

/** A control-flow node. */
private class Node extends ControlFlowNodeBase {
I think the qldoc should explain how this fits in with `ControlFlowNode`.
or
result = this.(Stmt).getParent()
or
// An Initializer under a ConditionDeclExpr is not part of the CFG.
Is this a bug?
Yes, but it's not as bad as it sounds. I'll clarify this comment.
* A constant that indicates the type of sub-node in a pair of `(Node, Pos)`.
* See the comment block at the top of this file.
*/
private class Pos extends int {
Would this be better as an ADT?
Possibly, but I liked having it as `int` during development because I'd get a compile error in any disjunct where I forgot to constrain it. I can check whether that can be done with `bindingset` for an IPA type.
I pushed a commit to turn `Pos` and `Spec` into IPA types.
This is cleaner than extending `int` and working with magic numbers. Performance appears to be unaffected.
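As a loose analogy only (QL IPA types are declared with `newtype` and behave differently from Python enums), the trade-off between magic integers and named constructors might be sketched like this. The position names `BEFORE`, `AT`, and `AFTER` and the helper `is_virtual` are assumptions made for illustration, not names taken from the PR.

```python
from enum import Enum


class Pos(Enum):
    """Hypothetical sub-node positions; the PR's real set of positions may differ."""

    BEFORE = "before"
    AT = "at"
    AFTER = "after"


def is_virtual(pos: Pos) -> bool:
    """Return True for positions that would be collapsed away in the final CFG."""
    # A bare integer such as 0 is rejected instead of being silently accepted.
    if not isinstance(pos, Pos):
        raise TypeError(f"expected a Pos, got {pos!r}")
    return pos is not Pos.AT
```

With named constructors, a stray `0` or `2` can no longer pass as a position, which is the readability gain the commit message refers to.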
Post-release preparation for codeql-cli-2.9.0
This PR is the first major step towards CPP-257: constructing the CFG in QL. A library is added to compute a CFG from QL, and tests are added to validate that it computes the same CFG as the extractor does. The extractor-based CFG is not replaced by the QL-based CFG in the `ControlFlowNode` class, so all analyses are unaffected for now.

When this is merged, I'll ask @ian-semmle to make the extractor produce the necessary db data to get rid of `SyntheticDestructorCalls.qll`, which means we'll no longer be relying on the `successors` relation for placing compiler-generated destructor calls. The data should be structured similarly to `getDestructorCallAfterNode`.

I've validated the implementation as follows.

- […] `*.c` and `*.cpp` files from our tests into the `library-tests/qlcfg/` directory, where `tellDifferent.ql` checks that their QL-produced CFG is identical to their extractor-produced CFG. I've kept the files that showed a difference at some point, so they serve as regression tests. As you can see from `tellDifferent.expected`, there are still some problematic cases left. They are caused by CPP-314 and by shortcomings of `SyntheticDestructorCalls.qll`.
- […] `tellDifferent.ql` on comdb2, linux, and Wireshark. Where there have been differences between the extractor CFG and the QL CFG, I've attempted to fix them and boil them down to regression tests in `library-tests/qlcfg`. There are some issues I haven't been able to work around, and those are filed as extractor bugs linked to CPP-257.
- […] `cfg-enable` branch, which contains these changes plus the changes needed to make `ControlFlowNode` use the QL-based CFG. […] (`before -> after`) are in this gist. The only significant regression is Linux, and I checked manually that the new CFG construction stage takes 156 seconds. Compare that to 187 seconds for the existing CFG pruning stage.
- […] `reachable` base case #721.
- […] `cfg-enable` branch and addressed the test differences that showed up. The remaining differences are benign and can be accepted when the time comes.
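The validation idea behind `tellDifferent.ql`, reporting every place where the two CFGs disagree, amounts to a symmetric difference over edge sets. The Python sketch below is only a model of that idea under assumed inputs: the function name `tell_different`, the edge representation as `(source, target)` string pairs, and the row labels are all hypothetical and not taken from the query itself.

```python
def tell_different(extractor_edges, ql_edges):
    """Return sorted rows describing edges present in only one of the two CFGs."""
    extractor_edges = set(extractor_edges)
    ql_edges = set(ql_edges)

    rows = []
    # Edges the extractor produced but the QL library did not.
    for src, dst in sorted(extractor_edges - ql_edges):
        rows.append((src, dst, "only in extractor CFG"))
    # Edges the QL library produced but the extractor did not.
    for src, dst in sorted(ql_edges - extractor_edges):
        rows.append((src, dst, "only in QL CFG"))
    return rows
```

An empty result means the two graphs agree on every edge; any remaining rows play the role of the entries kept in `tellDifferent.expected`.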