Skip to content

Conversation

aschackmull
Copy link
Contributor

Two fixes:
The first commit fixes a bad join-order in nodeMayUseSummary:
Before:

Evaluated non-recursive predicate DataFlowImpl::nodeMayUseSummary#17aa0c36#fffb@cb08d6d9 in 25262ms (evaluation was halted due to CancellationException).
Tuple counts for DataFlowImpl::nodeMayUseSummary#17aa0c36#fffb@cb08d6d9:
           318500    ~0%    {4} r1 = SCAN project#DataFlowImpl::Stage4::revFlow#17aa0c36#2#ffffff#2 OUTPUT In.3, In.0, In.1, In.2
           318000    ~0%    {5} r2 = JOIN r1 WITH m#DataFlowImpl::countNodesUsingAccessPath#17aa0c36#fbf ON FIRST 1 OUTPUT Lhs.1, Rhs.0, Lhs.1, Lhs.2, Lhs.3
           317500    ~4%    {5} r3 = JOIN r2 WITH DataFlowImpl::NodeEx::getEnclosingCallable0#f0820431#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4
        250080500    ~2%    {5} r4 = JOIN r3 WITH project#DataFlowImpl::Stage4::parameterMayFlowThrough#17aa0c36#ffff ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4
        250079500    ~1%    {6} r5 = JOIN r4 WITH num#DataFlowImpl::TAccessPathApproxSome#17aa0c36#ff ON FIRST 1 OUTPUT Lhs.2, Lhs.3, Rhs.1, Lhs.4, Lhs.1, Lhs.0
           856500  ~452%    {5} r6 = JOIN r5 WITH DataFlowImpl::Stage4::fwdFlow#17aa0c36#ffffff_013452#join_rhs ON FIRST 5 OUTPUT Rhs.5, Lhs.4, Lhs.0, Lhs.1, Lhs.5
           166500    ~8%    {4} r7 = JOIN r6 WITH DataFlowImplCommon::CallContextCall#class#2411651e#f ON FIRST 1 OUTPUT Lhs.2, Lhs.3, Lhs.4, Lhs.1
                            return r7

After:

Evaluated non-recursive predicate DataFlowImpl::nodeMayUseSummary0#17aa0c36#fffff@2dedbbed in 1727ms (size: 2138162).
Tuple counts for DataFlowImpl::nodeMayUseSummary0#17aa0c36#fffff@2dedbbed:
        3831722    ~0%    {5} r1 = SCAN project#DataFlowImpl::Stage4::revFlow#17aa0c36#2#ffffff#2 OUTPUT In.0, In.0, In.1, In.2, In.3
        3831722    ~3%    {5} r2 = JOIN r1 WITH DataFlowImpl::NodeEx::getEnclosingCallable0#f0820431#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4
        3457633    ~3%    {5} r3 = JOIN r2 WITH project#DataFlowImpl::Stage4::parameterMayFlowThrough#17aa0c36#ffff ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.0
        7786136  ~245%    {6} r4 = JOIN r3 WITH DataFlowImpl::Stage4::fwdFlow#17aa0c36#ffffff_014523#join_rhs ON FIRST 4 OUTPUT Rhs.4, Lhs.0, Lhs.1, Lhs.3, Lhs.4, Rhs.5
        2282765    ~9%    {5} r5 = JOIN r4 WITH DataFlowImplCommon::CallContextCall#class#2411651e#f ON FIRST 1 OUTPUT Lhs.5, Lhs.1, Lhs.2, Lhs.3, Lhs.4
        2181543    ~0%    {5} r6 = JOIN r5 WITH num#DataFlowImpl::TAccessPathApproxSome#17aa0c36#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1, Lhs.4, Lhs.2, Rhs.1, Lhs.3
                          return r6

The second commit removes some tuples from consCand that belonged to unreachable access paths:

Consider the set of cons candidates accumulated during reverse flow in the predicate revFlowConsCand(Ap cons, Content c, Ap tail, Configuration config). These are the set of reads that are reached during reverse flow and are matched up with stores as those steps are subsequently reached. The subset that do reach a store is collected after the flow recursion in revConsCand (formerly consCand), but due to this filtering it now means that we can have cons candidates for access paths that are no longer possible to construct using just the remaining cons candidates in revConsCand, so we can do a bit of additional filtering (which is now done in this PR). In the example I looked at this filtered just 2 tuples in stage 4, but removing those two tuples allowed us to construct 2 million fewer AccessPaths for the final stage. This was because the expansion of AccessPathApprox to AccessPath assumed that the set of cons candidates didn't include such unreachable tuples and their existence invalidated the calculation of evalUnfold.

@aschackmull aschackmull added the no-change-note-required This PR does not need a change note label May 10, 2022
@aschackmull aschackmull requested review from a team as code owners May 10, 2022 08:32
Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one suggestion.

@hvitved
Copy link
Contributor

hvitved commented May 11, 2022

Happy to report that this also appears to resolve a performance issue for C#, caused by the addition of new flow summaries (cc @michaelnebel). This was a similar situation for Java, right?

@hvitved hvitved merged commit 46ab25b into github:main May 11, 2022
@aschackmull aschackmull deleted the dataflow/perf branch May 12, 2022 11:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants