Dataflow: Performance fixes #9098

aschackmull · 2022-05-10T08:32:33Z

Two fixes:
The first commit fixes a bad join-order in nodeMayUseSummary:
Before:

Evaluated non-recursive predicate DataFlowImpl::nodeMayUseSummary#17aa0c36#fffb@cb08d6d9 in 25262ms (evaluation was halted due to CancellationException).
Tuple counts for DataFlowImpl::nodeMayUseSummary#17aa0c36#fffb@cb08d6d9:
           318500    ~0%    {4} r1 = SCAN project#DataFlowImpl::Stage4::revFlow#17aa0c36#2#ffffff#2 OUTPUT In.3, In.0, In.1, In.2
           318000    ~0%    {5} r2 = JOIN r1 WITH m#DataFlowImpl::countNodesUsingAccessPath#17aa0c36#fbf ON FIRST 1 OUTPUT Lhs.1, Rhs.0, Lhs.1, Lhs.2, Lhs.3
           317500    ~4%    {5} r3 = JOIN r2 WITH DataFlowImpl::NodeEx::getEnclosingCallable0#f0820431#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4
        250080500    ~2%    {5} r4 = JOIN r3 WITH project#DataFlowImpl::Stage4::parameterMayFlowThrough#17aa0c36#ffff ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4
        250079500    ~1%    {6} r5 = JOIN r4 WITH num#DataFlowImpl::TAccessPathApproxSome#17aa0c36#ff ON FIRST 1 OUTPUT Lhs.2, Lhs.3, Rhs.1, Lhs.4, Lhs.1, Lhs.0
           856500  ~452%    {5} r6 = JOIN r5 WITH DataFlowImpl::Stage4::fwdFlow#17aa0c36#ffffff_013452#join_rhs ON FIRST 5 OUTPUT Rhs.5, Lhs.4, Lhs.0, Lhs.1, Lhs.5
           166500    ~8%    {4} r7 = JOIN r6 WITH DataFlowImplCommon::CallContextCall#class#2411651e#f ON FIRST 1 OUTPUT Lhs.2, Lhs.3, Lhs.4, Lhs.1
                            return r7

After:

Evaluated non-recursive predicate DataFlowImpl::nodeMayUseSummary0#17aa0c36#fffff@2dedbbed in 1727ms (size: 2138162).
Tuple counts for DataFlowImpl::nodeMayUseSummary0#17aa0c36#fffff@2dedbbed:
        3831722    ~0%    {5} r1 = SCAN project#DataFlowImpl::Stage4::revFlow#17aa0c36#2#ffffff#2 OUTPUT In.0, In.0, In.1, In.2, In.3
        3831722    ~3%    {5} r2 = JOIN r1 WITH DataFlowImpl::NodeEx::getEnclosingCallable0#f0820431#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4
        3457633    ~3%    {5} r3 = JOIN r2 WITH project#DataFlowImpl::Stage4::parameterMayFlowThrough#17aa0c36#ffff ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.0
        7786136  ~245%    {6} r4 = JOIN r3 WITH DataFlowImpl::Stage4::fwdFlow#17aa0c36#ffffff_014523#join_rhs ON FIRST 4 OUTPUT Rhs.4, Lhs.0, Lhs.1, Lhs.3, Lhs.4, Rhs.5
        2282765    ~9%    {5} r5 = JOIN r4 WITH DataFlowImplCommon::CallContextCall#class#2411651e#f ON FIRST 1 OUTPUT Lhs.5, Lhs.1, Lhs.2, Lhs.3, Lhs.4
        2181543    ~0%    {5} r6 = JOIN r5 WITH num#DataFlowImpl::TAccessPathApproxSome#17aa0c36#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1, Lhs.4, Lhs.2, Rhs.1, Lhs.3
                          return r6

The second commit removes some tuples from consCand that belonged to unreachable access paths:

Consider the set of cons candidates accumulated during reverse flow in the predicate revFlowConsCand(Ap cons, Content c, Ap tail, Configuration config). These are the set of reads that are reached during reverse flow and are matched up with stores as those steps are subsequently reached. The subset that do reach a store is collected after the flow recursion in revConsCand (formerly consCand), but due to this filtering it now means that we can have cons candidates for access paths that are no longer possible to construct using just the remaining cons candidates in revConsCand, so we can do a bit of additional filtering (which is now done in this PR). In the example I looked at this filtered just 2 tuples in stage 4, but removing those two tuples allowed us to construct 2 million fewer AccessPaths for the final stage. This was because the expansion of AccessPathApprox to AccessPath assumed that the set of cons candidates didn't include such unreachable tuples and their existence invalidated the calculation of evalUnfold.

hvitved

LGTM, just one suggestion.

java/ql/lib/semmle/code/java/dataflow/internal/DataFlowImpl.qll

hvitved · 2022-05-11T09:06:07Z

Happy to report that this also appears to resolve a performance issue for C#, caused by the addition of new flow summaries (cc @michaelnebel). This was a similar situation for Java, right?

aschackmull added 3 commits May 9, 2022 16:21

Dataflow: Fix join order in nodeMayUseSummary.

1b0e9d5

Dataflow: Prune more cons-candidates.

135d7f6

Dataflow: Sync.

f85e06c

aschackmull added the no-change-note-required This PR does not need a change note label May 10, 2022

aschackmull requested review from a team as code owners May 10, 2022 08:32

github-actions bot added C# C++ Java Python Ruby labels May 10, 2022

hvitved reviewed May 11, 2022

View reviewed changes

java/ql/lib/semmle/code/java/dataflow/internal/DataFlowImpl.qll Outdated Show resolved Hide resolved

Dataflow: Review fix.

4884520

hvitved approved these changes May 11, 2022

View reviewed changes

MathiasVP approved these changes May 11, 2022

View reviewed changes

hvitved merged commit 46ab25b into github:main May 11, 2022

aschackmull deleted the dataflow/perf branch May 12, 2022 11:30

owen-mc added the DataFlow Library label Nov 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Dataflow: Performance fixes #9098

Dataflow: Performance fixes #9098

Uh oh!

aschackmull commented May 10, 2022

Uh oh!

hvitved left a comment

Uh oh!

Uh oh!

hvitved commented May 11, 2022

Uh oh!

Uh oh!

Dataflow: Performance fixes #9098

Dataflow: Performance fixes #9098

Uh oh!

Conversation

aschackmull commented May 10, 2022

Uh oh!

hvitved left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

hvitved commented May 11, 2022

Uh oh!

Uh oh!