Skip to content

Dataflow: Misc performance fixes #15822

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 7, 2024

Conversation

aschackmull
Copy link
Contributor

Two commits fixing two separate problems.

The first commit improves the join of returnFlowsThrough, flowIntoCallApaTaken, and fwdFlow. The join must happen in that order (digging into the git history shows prior join-order tweaks in this predicate documenting this), but it still has an unfortunate blow-up, which we can fortunately cut substantially down by including the argApa column in the first join.

Improves from

2024-03-04 13:29:20] Evaluated non-recursive predicate DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::flowThroughIntoCall/6#b44155c7@6dd478n9 in 126ms (size: 398332).
Evaluated relational algebra for predicate DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::flowThroughIntoCall/6#b44155c7@6dd478n9 with tuple counts:
              1  ~0%    {2} r1 = SCAN `DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::TAccessPathApproxNone#dom#04382804` OUTPUT _, _
              1  ~0%    {0}    | REWRITE WITH Tmp.0 := true, Tmp.1 := false, TEST Tmp.0 != Tmp.1 KEEPING 0
          83798  ~0%    {4}    | JOIN WITH `project#DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::returnFlowsThrough/8#ffafcf14` CARTESIAN PRODUCT OUTPUT Rhs.0, Rhs.3, Rhs.1, Rhs.2
        4044102  ~3%    {7}    | JOIN WITH `project#DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::flowIntoCallApaTaken/6#d989a8d1#cpe#12346_2013#join_rhs` ON FIRST 1 OUTPUT Rhs.2, Lhs.2, Lhs.3, Rhs.3, Lhs.1, Lhs.0, Rhs.1
         398332  ~3%    {6}    | JOIN WITH `project#DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::fwdFlow/9#00ae2fc8#2` ON FIRST 4 OUTPUT Lhs.6, Lhs.0, Lhs.5, _, Lhs.2, Lhs.4
         398332  ~1%    {6}    | REWRITE WITH Out.3 := true
                        return r1

to

[2024-03-04 15:20:26] Evaluated non-recursive predicate DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::flowThroughIntoCall/6#b44155c7@97bd358u in 35ms (size: 398332).
Evaluated relational algebra for predicate DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::flowThroughIntoCall/6#b44155c7@97bd358u with tuple counts:
         83798   ~0%    {7} r1 = SCAN `project#DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::returnFlowsThrough/9#53894c55` OUTPUT In.0, In.1, In.2, In.3, In.4, _, _
                        {5}    | REWRITE WITH Tmp.5 := true, Tmp.6 := false, TEST Tmp.5 != Tmp.6 KEEPING 5
         83798   ~3%    {5}    | SCAN OUTPUT In.0, In.3, In.4, In.1, In.2
        416847   ~2%    {7}    | JOIN WITH `project#DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::flowIntoCallApaTaken/6#d989a8d1#cpe#12346_2301#join_rhs` ON FIRST 2 OUTPUT Rhs.3, Lhs.3, Lhs.4, Lhs.1, Lhs.2, Lhs.0, Rhs.2
        398332   ~3%    {6}    | JOIN WITH `project#DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::fwdFlow/9#00ae2fc8#2` ON FIRST 4 OUTPUT Lhs.6, Lhs.0, Lhs.5, _, Lhs.2, Lhs.4
        398332   ~1%    {6}    | REWRITE WITH Out.3 := true
                        return r1

The second commit is a straight-forward join-order fix, where we need to join with the functional getApprox before filtering with revFlow, as we otherwise get a blowup.

Improves from
2024-03-04 13:29:20] Evaluated non-recursive predicate DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::flowThroughIntoCall/6#b44155c7@6dd478n9 in 126ms (size: 398332).
Evaluated relational algebra for predicate DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::flowThroughIntoCall/6#b44155c7@6dd478n9 with tuple counts:
              1  ~0%    {2} r1 = SCAN `DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::TAccessPathApproxNone#dom#04382804` OUTPUT _, _
              1  ~0%    {0}    | REWRITE WITH Tmp.0 := true, Tmp.1 := false, TEST Tmp.0 != Tmp.1 KEEPING 0
          83798  ~0%    {4}    | JOIN WITH `project#DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::returnFlowsThrough/8#ffafcf14` CARTESIAN PRODUCT OUTPUT Rhs.0, Rhs.3, Rhs.1, Rhs.2
        4044102  ~3%    {7}    | JOIN WITH `project#DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::flowIntoCallApaTaken/6#d989a8d1#cpe#12346_2013#join_rhs` ON FIRST 1 OUTPUT Rhs.2, Lhs.2, Lhs.3, Rhs.3, Lhs.1, Lhs.0, Rhs.1
         398332  ~3%    {6}    | JOIN WITH `project#DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::fwdFlow/9#00ae2fc8#2` ON FIRST 4 OUTPUT Lhs.6, Lhs.0, Lhs.5, _, Lhs.2, Lhs.4
         398332  ~1%    {6}    | REWRITE WITH Out.3 := true
                        return r1
to
[2024-03-04 15:20:26] Evaluated non-recursive predicate DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::flowThroughIntoCall/6#b44155c7@97bd358u in 35ms (size: 398332).
Evaluated relational algebra for predicate DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::flowThroughIntoCall/6#b44155c7@97bd358u with tuple counts:
         83798   ~0%    {7} r1 = SCAN `project#DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::returnFlowsThrough/9#53894c55` OUTPUT In.0, In.1, In.2, In.3, In.4, _, _
                        {5}    | REWRITE WITH Tmp.5 := true, Tmp.6 := false, TEST Tmp.5 != Tmp.6 KEEPING 5
         83798   ~3%    {5}    | SCAN OUTPUT In.0, In.3, In.4, In.1, In.2
        416847   ~2%    {7}    | JOIN WITH `project#DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::flowIntoCallApaTaken/6#d989a8d1#cpe#12346_2301#join_rhs` ON FIRST 2 OUTPUT Rhs.3, Lhs.3, Lhs.4, Lhs.1, Lhs.2, Lhs.0, Rhs.2
        398332   ~3%    {6}    | JOIN WITH `project#DataFlowImpl::Impl<TaintedPath::TaintedPath::C>::Stage5::fwdFlow/9#00ae2fc8#2` ON FIRST 4 OUTPUT Lhs.6, Lhs.0, Lhs.5, _, Lhs.2, Lhs.4
        398332   ~1%    {6}    | REWRITE WITH Out.3 := true
                        return r1
Join with the functional getApprox before filtering with revFlow as this
is always better.
@aschackmull aschackmull added the no-change-note-required This PR does not need a change note label Mar 6, 2024
@aschackmull
Copy link
Contributor Author

dca is uneventful

@aschackmull aschackmull merged commit f3a381f into github:main Mar 7, 2024
@aschackmull aschackmull deleted the dataflow/perf-fixes branch March 7, 2024 10:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DataFlow Library no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants