C++: Construct a CFG with QL #724

jbj · 2019-01-04T12:29:40Z

This PR is the first major step towards CPP-257: constructing the CFG in QL. A library is added to compute a CFG from QL, and tests are added to validate that it computes the same CFG as the extractor does. The extractor-based CFG is not replaced by the QL-based CFG in the ControlFlowNode class, so all analyses are unaffected for now.

When this is merged, I'll ask @ian-semmle to make the extractor produce the necessary db data to get rid of SyntheticDestructorCalls.qll, which means we'll no longer be relying on the successors relation for placing compiler-generated destructor calls. The data should be structured similarly to getDestructorCallAfterNode.

I've validated the implementation as follows.

I've copied most *.c and *.cpp files from our tests into the library-tests/qlcfg/ directory, where tellDifferent.ql checks that their QL-produced CFG is identical to their extractor-produced CFG. I've kept the files that showed a difference at some point, so they serve as regression tests. As you can see from tellDifferent.expected, there are still some problematic cases left. They are caused by CPP-314 and by shortcomings of SyntheticDestructorCalls.qll.
I've run tellDifferent.ql on comdb2, linux, and Wireshark. Where there have been differences between the extractor CFG and the QL CFG, I've attempted to fix them and boil them down to regression tests in library-tests/qlcfg. There are some issues I haven't been able to work around, and those are filed as extractor bugs linked to CPP-257.
I've run a CPP-Differences job on my cfg-enable branch, which contains these changes plus the changes needed to make ControlFlowNode use the QL-based CFG.
1. There were two changes, both in Boost: an FP in "Infinite loop with unsatisfiable exit condition" was removed, and an FP in "Missing return statement" was added. The latter FP is caused by ODASA-394.
2. Performance on Jenkins is hard to measure. The run time for the whole suite is within a factor of two on all seven sample projects. I think any slowdown due to CFG calculation is eclipsed by the differences due to hardware and concurrent jobs. The detailed timings (before -> after) are in this gist. The only significant regression is Linux, and I checked manually that the new CFG construction stage takes 156 seconds. Compare that to 187 seconds for the existing CFG pruning stage.
3. In my initial run of CPP-Differences, everything timed out. This was fixed by C++: Factor out reachable base case #721.
I ran Language-Test/CPP with my cfg-enable branch and addressed the test differences that showed up. The remaining differences are benign and can be accepted when the time comes.

This implements calculation of the control-flow graph in QL. The new code is not enabled yet as we'll need more extractor changes first. The `SyntheticDestructorCalls.qll` file is a temporary solution that can be removed when the extractor produces this information directly.
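
For readers unfamiliar with the compiler-generated destructor calls that `SyntheticDestructorCalls.qll` has to place, here is a minimal C++ sketch of where they arise. The names `Tracked`, `trace`, and `early_return` are hypothetical and do not come from the test suite; the sketch only shows standard C++ behaviour, namely that locals are destroyed in reverse order of construction on every path that leaves their scope, including an early `return`.

```cpp
#include <string>
#include <utility>
#include <vector>

// Records the order in which destructors run (illustrative helper).
static std::vector<std::string> trace;

struct Tracked {
    std::string name;
    explicit Tracked(std::string n) : name(std::move(n)) {}
    ~Tracked() { trace.push_back("~" + name); }
};

int early_return(bool flag) {
    Tracked a("a");
    if (flag) {
        Tracked b("b");
        return 1;  // implicit calls after this return: ~b, then ~a
    }
    Tracked c("c");
    return 2;      // implicit calls after this return: ~c, then ~a
}
```

After `early_return(true)`, `trace` holds `~b`, `~a`; after `early_return(false)` on a cleared trace, it holds `~c`, `~a`. Each `return` is followed by its own chain of implicit calls, and both chains end in `~a`.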

geoffw0

There was something specific you wanted me and/or @rdmarsh2 to look at in this PR, wasn't there?

geoffw0 · 2019-01-07T17:27:13Z

cpp/ql/src/semmle/code/cpp/controlflow/internal/CFG.qll

+ * ("before" and "after") and collapse the edges around them, we are left with
+ * the correct CFG for `e`:
+ *
+ *     e1 -> e2 -> e


Excellent explanation.
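
The collapsing step in that comment can be sketched outside QL. This is a minimal C++ illustration, not the code in `CFG.qll`: the node naming convention, the sub-node layout in `example_graph`, and the functions `is_virtual`, `real_successors`, and `collapse` are all assumptions made for the example. It connects each real node to the real nodes reachable from it through virtual ("before"/"after") nodes only.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Graph = std::map<std::string, std::vector<std::string>>;
using Edge = std::pair<std::string, std::string>;

// Assumed convention: virtual nodes are named "before(...)" or "after(...)".
bool is_virtual(const std::string& n) {
    return n.rfind("before(", 0) == 0 || n.rfind("after(", 0) == 0;
}

// Collect the first real node on every path from `n`, passing through
// virtual nodes only. `seen` guards against cycles of virtual nodes.
void real_successors(const Graph& g, const std::string& n,
                     std::set<std::string>& seen, std::set<std::string>& out) {
    auto it = g.find(n);
    if (it == g.end()) return;
    for (const auto& succ : it->second) {
        if (!is_virtual(succ)) {
            out.insert(succ);
        } else if (seen.insert(succ).second) {
            real_successors(g, succ, seen, out);
        }
    }
}

// Edges between real nodes after virtual nodes are collapsed away.
std::set<Edge> collapse(const Graph& g) {
    std::set<Edge> edges;
    for (const auto& entry : g) {
        if (is_virtual(entry.first)) continue;
        std::set<std::string> seen, out;
        real_successors(g, entry.first, seen, out);
        for (const auto& s : out) edges.insert(Edge(entry.first, s));
    }
    return edges;
}

// Assumed sub-node layout for an expression `e` with children `e1`, `e2`.
Graph example_graph() {
    return {
        {"before(e)", {"before(e1)"}},
        {"before(e1)", {"e1"}},
        {"e1", {"after(e1)"}},
        {"after(e1)", {"before(e2)"}},
        {"before(e2)", {"e2"}},
        {"e2", {"after(e2)"}},
        {"after(e2)", {"e"}},
        {"e", {"after(e)"}},
    };
}
```

Under these assumptions, `collapse(example_graph())` yields exactly the two edges `e1 -> e2` and `e2 -> e`, matching the chain shown in the comment.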

ian-semmle · 2019-01-07T18:53:45Z

cpp/ql/src/semmle/code/cpp/exprs/Assignment.qll

- * `getExpr()` is an access to `c` (with possible casts), and `getVariable` is
- * the variable `c`, which has an initializer `x < y`.
+ * `getVariableAccess()` is an access to `c` (with possible casts),
+ * `getVariable` is the variable `c`, which has an initializer `x < y`, and


Seeing as we're in a comma-separated list, I think it would be clearer to put "which has an initializer `x < y`" in parentheses.

ian-semmle · 2019-01-07T18:54:01Z

cpp/ql/src/semmle/code/cpp/exprs/Assignment.qll

- * the variable `c`, which has an initializer `x < y`.
+ * `getVariableAccess()` is an access to `c` (with possible casts),
+ * `getVariable` is the variable `c`, which has an initializer `x < y`, and
+ * `getInitializingExpr` is `x < y`.


You have () after some predicates but not others.

ian-semmle · 2019-01-07T19:10:21Z

cpp/ql/src/semmle/code/cpp/controlflow/internal/SyntheticDestructorCalls.qll

+//   have multiple predecessors.
+//   - But after ReturnStmt, that may happen.
+/**
+ * Describes a straight line of `SyntheticDestructorCall`s. Node that such


s/Node/Note/

ian-semmle · 2019-01-07T19:20:59Z

cpp/ql/src/semmle/code/cpp/controlflow/internal/SyntheticDestructorCalls.qll

+    or
+    this instanceof MicrosoftTryExceptStmt
+    or
+    // Detecting exception edges out of a MicrosoftTryExceptStmt is not


Do you mean MicrosoftTryFinallyStmt?

ian-semmle · 2019-01-07T20:26:40Z

cpp/ql/src/semmle/code/cpp/controlflow/internal/CFG.qll

+ * specifies the shape of the CFG for all known language constructs. The case
+ * analysis is large but does _not_ contain recursion. Recursion is needed in
+ * the second stage in order to collapse virtual nodes, but that recursion is
+ * simply a transitive closure and can be fast.


"so is fast"?

ian-semmle · 2019-01-07T20:38:14Z

cpp/ql/src/semmle/code/cpp/controlflow/internal/CFG.qll

+ * To produce all edges around each control-flow node without recursion, we
+ * need to pre-compute the targets of exception sources (throw, propagating
+ * handlers, ...) and short-circuiting operators (||, ? :, ...). This
+ * pre-computation involves recursion, but it's quick to compute because in


because it
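
As a reminder of why short-circuiting operators need their targets computed specially, here is a small C++ sketch; the functions `a`, `b`, and `evaluate_or` and the `calls` log are hypothetical. In `a() || b()`, control leaves `a()` along one of two edges: to `b()` when `a()` is false, or straight past the whole `||` expression when it is true.

```cpp
#include <string>
#include <vector>

// Log of which operands and branches were actually executed.
static std::vector<std::string> calls;

bool a(bool v) { calls.push_back("a"); return v; }
bool b(bool v) { calls.push_back("b"); return v; }

// Evaluates `a(av) || b(bv)` as an `if` condition and returns the log.
std::vector<std::string> evaluate_or(bool av, bool bv) {
    calls.clear();
    if (a(av) || b(bv)) {
        calls.push_back("then");
    } else {
        calls.push_back("else");
    }
    return calls;
}
```

`evaluate_or(true, false)` logs `a`, `then` without ever reaching `b`, while `evaluate_or(false, true)` logs `a`, `b`, `then`.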

ian-semmle · 2019-01-07T20:41:40Z

cpp/ql/src/semmle/code/cpp/controlflow/internal/CFG.qll

+ * For many AST nodes, their control flow can be described in simpler terms
+ * than the full generality of describing each of their individual sub-edges.
+ * To add control flow for a new AST construct, one of the following predicates
+ * can be used, listed roughly in order of increasing generality.


The first sentence lost me at first. Perhaps something like: "Many kinds of AST nodes share the same pattern of control flow. To add control flow for a new AST construct, it can often be easier to use one of the following predicates (listed roughly in order of increasing generality) than to define it directly."

ian-semmle · 2019-01-07T20:45:09Z

cpp/ql/src/semmle/code/cpp/controlflow/internal/CFG.qll

+private import semmle.code.cpp.controlflow.internal.SyntheticDestructorCalls
+
+/** A control-flow node. */
+private class Node extends ControlFlowNodeBase {


I think the QLDoc should explain how this fits in with `ControlFlowNode`.

ian-semmle · 2019-01-07T20:46:23Z

cpp/ql/src/semmle/code/cpp/controlflow/internal/CFG.qll

+    or
+    result = this.(Stmt).getParent()
+    or
+    // An Initializer under a ConditionDeclExpr is not part of the CFG.


Is this a bug?

jbj · 2019-01-08

Yes, but it's not as bad as it sounds. I'll clarify this comment.

ian-semmle · 2019-01-07T20:54:48Z

cpp/ql/src/semmle/code/cpp/controlflow/internal/CFG.qll

+ * A constant that indicates the type of sub-node in a pair of `(Node, Pos)`.
+ * See the comment block at the top of this file.
+ */
+private class Pos extends int {


Would this be better as an ADT?

jbj · 2019-01-08

Possibly, but I liked having it as `int` during development because I'd get a compile error in any disjunct where I forgot to constrain it. I can check whether that can be done with `bindingset` for an IPA type.

jbj · 2019-01-08

I pushed a commit to turn `Pos` and `Spec` into IPA types.

This is cleaner than extending `int` and working with magic numbers. Performance appears to be unaffected.

jbj added 3 commits January 4, 2019 10:24

C++: Add BuiltInIntAddr class for __INTADDR__

8f9849b

C++: LocalVariable docs

ca0517b

C++: Add ConditionDeclExpr convenience predicates

a47faa2

Also expand the QLDoc.

jbj added the C++ label Jan 4, 2019

jbj requested a review from a team as a code owner January 4, 2019 12:29

jbj force-pushed the cfg-pr branch from d963586 to 26f32f0 Compare January 4, 2019 12:34

pavgust changed the base branch from next to master January 7, 2019 09:55

geoffw0 reviewed Jan 7, 2019

ian-semmle reviewed Jan 7, 2019

jbj added 2 commits January 8, 2019 13:29

C++: Update comments based on PR feedback

dba3351

C++: Use IPA for Pos and Spec

1be91b5

This is cleaner than extending `int` and working with magic numbers. Performance appears to be unaffected.

ian-semmle approved these changes Jan 9, 2019

ian-semmle merged commit b3bcabf into github:master Jan 9, 2019

cklin pushed a commit that referenced this pull request Apr 26, 2022

Merge pull request #724 from github/post-release-prep/codeql-cli-2.9.0

b8165d4

Post-release preparation for codeql-cli-2.9.0
