C++: Speedup product dataflow #13139

MathiasVP · 2023-05-12T08:40:29Z

This PR speeds up the experimental product flow library in three steps:

The first commit implements Schack's suggestion from C++: Experimental product flow library #9997 (comment)
The second commit applies manual magic to the transitive closures of the localPathStep calls in the forward direction,
... and the last commit does the same in the reverse direction.

This seems to speed up the analysis quite a lot. For example, running cpp/invalid-pointer-deref on https://github.com/ptitSeb/box64 goes from ~30 min to ~5 min. There's still something weird on that project, but at least it's a lot more manageable now.

I haven't run DCA on it yet, but I'll do so once Actions feels better 😅.

aschackmull · 2023-05-12T09:23:44Z

cpp/ql/lib/experimental/semmle/code/cpp/dataflow/ProductFlow.qll

      )
    }

+    pragma[assume_small_delta]
+    pragma[nomagic]
+    private predicate revReachableInterprocEntry(Flow1::PathNode node1, Flow2::PathNode node2) {


You are making the reverse calculation much more complicated than it has to be. Instead of mirroring the forward calculation, you can make things much simpler by using fwdIsSuccessor as your step relation. That way there will be no need to do anything with revLocalPathStepXXX. You'll get slightly different node-pairs, but that's because the code as-is doesn't calculate revReachableInterprocEntry as advertised but rather the exits, so your revReachableInterprocEntry confusingly isn't a subset of fwdReachableInterprocEntry. With this simplification the reverse flow will simply be a subset of fwdReachableInterprocEntry and the node pairs from the forward and reverse calculations will be consistent.

Thanks for the comments. I've force-pushed a version that hopefully implements the simplifications you had in mind 🤞.

…f product flow.

aschackmull

It was almost what I meant. But you don't need to inline isSuccessor 3 places if you instead just make it fwdIsSuccessor as before.

cpp/ql/lib/experimental/semmle/code/cpp/dataflow/ProductFlow.qll

aschackmull · 2023-05-12T12:20:49Z

cpp/ql/lib/experimental/semmle/code/cpp/dataflow/ProductFlow.qll

+    }
+
+    private predicate localPathStep1SuccPlus(Flow1::PathNode n1, Flow1::PathNode n2) =
+      fastTC(prunedLocalPathStep1/2)(n1, n2)


Wait a minute, are you doing recursion through fastTC? That's going to be slow.

I was actually planning on asking on Slack how to judge whether or not a fastTC was appropriate here 😂. How do you know that this will be slow?

https://github.com/github/codeql-core/issues/3098

Ah. TIL! Is it safe to replace the fastTC with a transitive closure operator? Or should I write out the TC manually to prevent the compiler from generating a fastTC?

Is it safe to replace the fastTC with a transitive closure operator?

If that's the only change, then no, that just puts the choice in the compilers hands about whether to use fastTC or unrolling. Manually writing out the TC might also not be the best, but that's hard to tell. At least it wouldn't be the first option I'd try. The main issue here is a proliferation of recursive dependencies in the pruning of your TCs. If you overapproximate these slightly then you may are able to do source-sink-restricted materialisations of the TCs that are independent of the recursion. Alternatively, the "speed up isSuccessor" commit that you've felt compelled to add, might not be needed at all if you tweak the join-order - currently the join-order is likely to be somewhat restricted, since the original two applications of TC are unlikely to distribute over the disjunction, which will invariably lead to a poor join-order.

Yeah, the motivation for that commit was that I was seeing some very large blowups on a particular project:

Tuple counts for ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::fwdIsSuccessor#4#ffff/4@i10#848b8wcn after 12m10s: 2500 ~0% {4} r1 = SCAN ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::fwdReachableInterprocEntry#2#ff#prev_delta OUTPUT In.0 'pred1', In.1 'pred2', In.0 'pred1', In.1 'pred2' 1498000 ~0% {4} r2 = JOIN ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::fwdReachableInterprocEntry#2#ff#prev_delta WITH #ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::localPathStep1#2Plus#ff ON FIRST 1 OUTPUT Rhs.1 'succ1', Lhs.1 'pred2', Lhs.0 'pred1', Lhs.1 'pred2' 2500 ~3% {3} r3 = SCAN ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::fwdReachableInterprocEntry#2#ff#prev_delta OUTPUT In.1 'pred2', In.0 'pred1', In.0 'pred1' 1211000 ~0% {3} r4 = JOIN ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::fwdReachableInterprocEntry#2#ff#prev_delta WITH #ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::localPathStep1#2Plus#ff ON FIRST 1 OUTPUT Lhs.1 'pred2', Lhs.0 'pred1', Rhs.1 'succ1' 1213500 ~0% {3} r5 = r3 UNION r4 3829668500 ~2% {4} r6 = JOIN r5 WITH #ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::localPathStep2#2Plus#ff ON FIRST 1 OUTPUT Lhs.2 'succ1', Rhs.1 'succ2', Lhs.1 'pred1', Lhs.0 'pred2' 3831166500 ~2% {4} r7 = r2 UNION r6 3831169000 ~2% {4} r8 = r1 UNION r7 0 ~0% {4} r9 = JOIN r8 WITH ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::isSinkPair#2#ff ON FIRST 2 OUTPUT Lhs.2 'pred1', Lhs.3 'pred2', Lhs.0 'succ1', Lhs.1 'succ2' 2000 ~0% {4} r10 = SCAN ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::fwdReachableInterprocEntry#2#ff#prev_delta OUTPUT In.0 'pred1', In.0 'pred1', In.1 'pred2', In.1 'pred2' 69000 ~0% {4} r11 = JOIN ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::fwdReachableInterprocEntry#2#ff#prev_delta WITH #ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::localPathStep1#2Plus#ff ON FIRST 1 OUTPUT Rhs.1 'succ1', Lhs.0 'pred1', Lhs.1 'pred2', Lhs.1 'pred2' 3828324000 ~3% {4} r12 = JOIN r5 WITH #ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::localPathStep2#2Plus#ff ON FIRST 1 OUTPUT Lhs.2 'succ1', Lhs.1 'pred1', Lhs.0 'pred2', Rhs.1 'succ2' 3828393000 ~3% {4} r13 = r11 UNION r12 3828395000 ~3% {4} r14 = r10 UNION r13 765712148 ~2% {7} r15 = JOIN r14 WITH ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::interprocEdge1#5#fffff_20134#join_rhs ON FIRST 1 OUTPUT Rhs.1, Rhs.2, Lhs.3, Rhs.4, Lhs.1 'pred1', Lhs.2 'pred2', Rhs.3 'succ1' 241499 ~9311% {4} r16 = JOIN r15 WITH ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::interprocEdge2#5#fffff_01243#join_rhs ON FIRST 4 OUTPUT Lhs.4 'pred1', Lhs.5 'pred2', Lhs.6 'succ1', Rhs.4 'succ2' 241499 ~9311% {4} r17 = r9 UNION r16 3016 ~18% {4} r18 = r17 AND NOT ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::fwdIsSuccessor#4#ffff#prev(Lhs.0 'pred1', Lhs.1 'pred2', Lhs.2 'succ1', Lhs.3 'succ2') return r18

and it was slightly mitigated by 594da1a. However, the following join order tweaks removes that problem altogether, and I don't even need that commit anymore 🎉

Pipeline standard for ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::fwdIsSuccessor#4#ffff@3f482yf4 was evaluated in 7 iterations totaling 122ms (delta sizes total: 144364). 688418 ~12154% {4} r1 = JOIN ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::fwdIsSuccessor1#6#ffffff#prev_delta WITH ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::fwdIsSuccessor2#6#ffffff#prev ON FIRST 6 OUTPUT Lhs.0, Lhs.1, Lhs.4, Lhs.5 688418 ~12154% {4} r2 = JOIN ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::fwdIsSuccessor2#6#ffffff#prev_delta WITH ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::fwdIsSuccessor1#6#ffffff#prev ON FIRST 6 OUTPUT Lhs.0, Lhs.1, Lhs.4, Lhs.5 1376836 ~24407% {4} r3 = r1 UNION r2 160986 ~100% {4} r4 = r3 AND NOT ProductFlow#20473ca4::ProductFlow::GlobalWithState#InvalidPointerDeref#e13af12b::AllocToInvalidPointerConfig#::fwdIsSuccessor#4#ffff#prev(Lhs.0, Lhs.1, Lhs.2, Lhs.3) return r4

This reverts commit 594da1a.

aschackmull

LGTM now.

MathiasVP · 2023-05-17T12:40:33Z

DCA looks good. No result changes (as expected), and a 15% speedup on Linux 🚀

MathiasVP requested a review from a team as a code owner May 12, 2023 08:40

github-actions bot added the C++ label May 12, 2023

MathiasVP added the no-change-note-required This PR does not need a change note label May 12, 2023

aschackmull reviewed May 12, 2023

View reviewed changes

MathiasVP added 2 commits May 12, 2023 11:58

C++: Don't carry the sources around as columns during the main loop o…

e58b99d

…f product flow.

C++: Speedup 'isSuccessor'.

594da1a

MathiasVP force-pushed the faster-product-flow branch from 1947c88 to 594da1a Compare May 12, 2023 11:05

aschackmull reviewed May 12, 2023

View reviewed changes

C++: Push conjunct into 'isSuccessor' and rename it to 'fwdIsSuccessor'.

2458fa0

aschackmull reviewed May 12, 2023

View reviewed changes

Revert "C++: Speedup 'isSuccessor'."

3f01a21

This reverts commit 594da1a.

MathiasVP force-pushed the faster-product-flow branch from 1cf1d9c to 7b0afd1 Compare May 12, 2023 15:07

C++: Tweak join orders.

e1cc7dc

MathiasVP force-pushed the faster-product-flow branch from 7b0afd1 to e1cc7dc Compare May 12, 2023 15:38

aschackmull approved these changes May 15, 2023

View reviewed changes

MathiasVP merged commit 06d5a7f into github:main May 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

C++: Speedup product dataflow #13139

C++: Speedup product dataflow #13139

Uh oh!

MathiasVP commented May 12, 2023 •

edited

Loading

Uh oh!

aschackmull May 12, 2023 •

edited

Loading

Uh oh!

MathiasVP May 12, 2023

Uh oh!

aschackmull left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

aschackmull May 12, 2023

Uh oh!

MathiasVP May 12, 2023 •

edited

Loading

Uh oh!

aschackmull May 12, 2023

Uh oh!

MathiasVP May 12, 2023

Uh oh!

aschackmull May 12, 2023

Uh oh!

MathiasVP May 12, 2023 •

edited

Loading

Uh oh!

aschackmull left a comment

Uh oh!

MathiasVP commented May 17, 2023

Uh oh!

Uh oh!

C++: Speedup product dataflow #13139

C++: Speedup product dataflow #13139

Uh oh!

Conversation

MathiasVP commented May 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aschackmull May 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MathiasVP May 12, 2023

Choose a reason for hiding this comment

Uh oh!

aschackmull left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

aschackmull May 12, 2023

Choose a reason for hiding this comment

Uh oh!

MathiasVP May 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aschackmull May 12, 2023

Choose a reason for hiding this comment

Uh oh!

MathiasVP May 12, 2023

Choose a reason for hiding this comment

Uh oh!

aschackmull May 12, 2023

Choose a reason for hiding this comment

Uh oh!

MathiasVP May 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aschackmull left a comment

Choose a reason for hiding this comment

Uh oh!

MathiasVP commented May 17, 2023

Uh oh!

Uh oh!

MathiasVP commented May 12, 2023 •

edited

Loading

aschackmull May 12, 2023 •

edited

Loading

MathiasVP May 12, 2023 •

edited

Loading

MathiasVP May 12, 2023 •

edited

Loading