Skip to content

Conversation

tausbn
Copy link
Contributor

@tausbn tausbn commented Mar 28, 2022

Makes this

(21s) Tuple counts for DataFlowPublic::import_star_read#ff/2@fcd5e6nr after 8.5s:
9743      ~6%     {3} r1 = SCAN num#DataFlowPublic::TModuleVariableNode#fff OUTPUT In.1, In.0, In.2 'result'
9743      ~1%     {3} r2 = JOIN r1 WITH Variables::Variable::getId_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2 'result'
390808917 ~3%     {3} r3 = JOIN r2 WITH Flow::NameNode::getId_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2 'result'
307       ~0%     {2} r4 = JOIN r3 WITH ImportStar::ImportStar::importStarResolvesTo#ff ON FIRST 2 OUTPUT Lhs.0, Lhs.2 'result'
307       ~0%     {2} r5 = JOIN r4 WITH num#DataFlowPublic::TCfgNode#ff ON FIRST 1 OUTPUT Rhs.1 'n', Lhs.1 'result'
                  return r5

become this

(17s) Tuple counts for DataFlowPublic::resolved_import_star_module#fff/3@f5e84aic after 0ms:
307 ~0%     {3} r1 = JOIN ImportStar::ImportStar::importStarResolvesTo#ff WITH num#DataFlowPublic::TCfgNode#ff ON FIRST 1 OUTPUT Lhs.0, Lhs.1 'm', Rhs.1 'n'
307 ~0%     {3} r2 = JOIN r1 WITH Flow::NameNode::getId_dispred#ff ON FIRST 1 OUTPUT Lhs.1 'm', Rhs.1 'name', Lhs.2 'n'
            return r2
(17s) Registering DataFlowPublic::resolved_import_star_module#fff/3@f5e84aic +  with content f29281ig38r98icro4ege09mrva
(17s)  >>> Created relation DataFlowPublic::resolved_import_star_module#fff/3@f5e84aic with 307 rows.
(17s) Starting to evaluate predicate DataFlowPublic::import_star_read#ff/2@57b0c06e
(17s) Tuple counts for DataFlowPublic::import_star_read#ff/2@57b0c06e after 2ms:
9743 ~0%     {3} r1 = SCAN num#DataFlowPublic::TModuleVariableNode#fff OUTPUT In.1, In.0, In.2 'result'
9743 ~0%     {3} r2 = JOIN r1 WITH Variables::Variable::getId_dispred#ff ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.2 'result'
307  ~0%     {2} r3 = JOIN r2 WITH DataFlowPublic::resolved_import_star_module#fff ON FIRST 2 OUTPUT Rhs.2 'n', Lhs.2 'result'
             return r3

Needs performance evaluation. No change note required.

@tausbn tausbn added Awaiting evaluation Do not merge yet, this PR is waiting for an evaluation to finish no-change-note-required This PR does not need a change note labels Mar 28, 2022
@tausbn tausbn requested a review from a team as a code owner March 28, 2022 14:29
yoff
yoff previously approved these changes Mar 29, 2022
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solutions look reasonable, and numbers look very persuasive :-)

@@ -2125,7 +2125,7 @@ module Conditionals {
/** INTERNAL: Do not use. */
predicate declaredAttributeVar(PythonClassObjectInternal cls, string name, EssaVariable var) {
name = var.getName() and
var.getAUse() = cls.getScope().getANormalExit()
pragma[only_bind_into](pragma[only_bind_into](var).getAUse()) = cls.getScope().getANormalExit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting construction :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not the most elegant construction, but a single pragma just didn't do the trick.

@yoff
Copy link
Contributor

yoff commented Mar 29, 2022

Do you want to wait for a DCA run to see the full effect?

@tausbn
Copy link
Contributor Author

tausbn commented Mar 29, 2022

Do you want to wait for a DCA run to see the full effect?

Yeah. I don't expect these changes to have bad consequences, but there's no reason to chance it.

@tausbn
Copy link
Contributor Author

tausbn commented Apr 1, 2022

The performance data is in, and I didn't find any hidden surprises.

tausbn added 3 commits April 28, 2022 11:11
Makes this

```
(21s) Tuple counts for DataFlowPublic::import_star_read#ff/2@fcd5e6nr after 8.5s:
9743      ~6%     {3} r1 = SCAN num#DataFlowPublic::TModuleVariableNode#fff OUTPUT In.1, In.0, In.2 'result'
9743      ~1%     {3} r2 = JOIN r1 WITH Variables::Variable::getId_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2 'result'
390808917 ~3%     {3} r3 = JOIN r2 WITH Flow::NameNode::getId_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2 'result'
307       ~0%     {2} r4 = JOIN r3 WITH ImportStar::ImportStar::importStarResolvesTo#ff ON FIRST 2 OUTPUT Lhs.0, Lhs.2 'result'
307       ~0%     {2} r5 = JOIN r4 WITH num#DataFlowPublic::TCfgNode#ff ON FIRST 1 OUTPUT Rhs.1 'n', Lhs.1 'result'
                  return r5
```

become this

```
(17s) Tuple counts for DataFlowPublic::resolved_import_star_module#fff/3@f5e84aic after 0ms:
307 ~0%     {3} r1 = JOIN ImportStar::ImportStar::importStarResolvesTo#ff WITH num#DataFlowPublic::TCfgNode#ff ON FIRST 1 OUTPUT Lhs.0, Lhs.1 'm', Rhs.1 'n'
307 ~0%     {3} r2 = JOIN r1 WITH Flow::NameNode::getId_dispred#ff ON FIRST 1 OUTPUT Lhs.1 'm', Rhs.1 'name', Lhs.2 'n'
            return r2
(17s) Registering DataFlowPublic::resolved_import_star_module#fff/3@f5e84aic +  with content f29281ig38r98icro4ege09mrva
(17s)  >>> Created relation DataFlowPublic::resolved_import_star_module#fff/3@f5e84aic with 307 rows.
(17s) Starting to evaluate predicate DataFlowPublic::import_star_read#ff/2@57b0c06e
(17s) Tuple counts for DataFlowPublic::import_star_read#ff/2@57b0c06e after 2ms:
9743 ~0%     {3} r1 = SCAN num#DataFlowPublic::TModuleVariableNode#fff OUTPUT In.1, In.0, In.2 'result'
9743 ~0%     {3} r2 = JOIN r1 WITH Variables::Variable::getId_dispred#ff ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.2 'result'
307  ~0%     {2} r3 = JOIN r2 WITH DataFlowPublic::resolved_import_star_module#fff ON FIRST 2 OUTPUT Rhs.2 'n', Lhs.2 'result'
             return r3
```
Before:
```
Tuple counts for PointsTo::declaredAttributeVar#fbf/3@99d5aenq after 1.1s:
451054   ~7%     {2} r1 = SCAN variable OUTPUT In.0, In.2 'name'
1296149  ~0%     {2} r2 = JOIN r1 WITH Essa::EssaVariable::getSourceVariable_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'var', Lhs.1 'name'
12179900 ~4%     {3} r3 = JOIN r2 WITH Essa::EssaVariable::getAUse_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'name', Lhs.0 'var'
8028     ~2%     {3} r4 = JOIN r3 WITH Scope::Scope::getANormalExit_dispred#bf_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'name', Lhs.2 'var'
8028     ~2%     {3} r5 = JOIN r4 WITH Classes::PythonClassObjectInternal::getScope_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'cls', Lhs.1 'name', Lhs.2 'var'
                 return r5
```

After:
```
Tuple counts for PointsTo::declaredAttributeVar#fbf/3@cccf36hb after 4ms:
1450 ~0%     {2} r1 = SCAN Classes::PythonClassObjectInternal::getScope_dispred#ff OUTPUT In.1, In.0 'cls'
1450 ~7%     {2} r2 = JOIN r1 WITH Scope::Scope::getANormalExit_dispred#bf ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'cls'
8028 ~0%     {2} r3 = JOIN r2 WITH Essa::EssaVariable::getAUse_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'var', Lhs.1 'cls'
8028 ~0%     {3} r4 = JOIN r3 WITH Essa::EssaVariable::getSourceVariable_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'cls', Lhs.0 'var'
8028 ~2%     {3} r5 = JOIN r4 WITH variable ON FIRST 1 OUTPUT Lhs.1 'cls', Rhs.2 'name', Lhs.2 'var'
return r5
```
Before:

```
Tuple counts for Exprs::Call::getAKeyword_dispred#ff#antijoin_rhs/3@7bc202ij after 9s:
1        ~0%     {1} r1 = CONSTANT(unique int)[2]
4244385  ~2%     {1} r2 = JOIN r1 WITH py_dict_items_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0'
4244352  ~3%     {3} r3 = JOIN r2 WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg1', Lhs.0 'arg0', Rhs.2 'arg2'
66618690 ~3%     {5} r4 = JOIN r3 WITH AstGenerated::Call_::getNamedArg_dispred#ffb ON FIRST 1 OUTPUT Lhs.1 'arg0', Lhs.0 'arg1', Lhs.2 'arg2', Rhs.1, Rhs.2
31187133 ~0%     {5} r5 = SELECT r4 ON In.3 < In.2 'arg2'
31187133 ~1%     {5} r6 = SCAN r5 OUTPUT In.4, 0, In.0 'arg0', In.1 'arg1', In.2 'arg2'
0        ~0%     {3} r7 = JOIN r6 WITH py_dict_items ON FIRST 2 OUTPUT Lhs.2 'arg0', Lhs.3 'arg1', Lhs.4 'arg2'
                 return r7

Tuple counts for Exprs::Call::getAKeyword_dispred#ff/2@1dc9468b after 421ms:
1       ~0%     {1} r1 = CONSTANT(unique int)[2]
4244385 ~2%     {1} r2 = JOIN r1 WITH py_dict_items_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result'
4244352 ~0%     {3} r3 = JOIN r2 WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Lhs.0 'result', Rhs.1 'this', Rhs.2
4244352 ~0%     {3} r4 = r3 AND NOT Exprs::Call::getAKeyword_dispred#ff#antijoin_rhs(Lhs.0 'result', Lhs.1 'this', Lhs.2)
4244352 ~6%     {2} r5 = SCAN r4 OUTPUT In.1 'this', In.0 'result'
                return r5
```

Oof. All that work to produce zero tuples. Luckily we can improve
matters somewhat.

Basically, there's no reason to test _all_ dictionary unpackings, since
we're only interested in a lower bound. Thus, we can use `min` instead
which is much more efficient. For convenience I factored this into its
own (private) helper predicate.

Now the tuple counts look as follows:

```
Tuple counts for Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_range/2@39b0e9sm after 1ms:
246 ~0%     {2} r1 = JOIN Keywords::DictUnpackingOrKeyword#class#f#shared WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Rhs.2 'arg1'
            return r1
Registering Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_range/2@39b0e9sm +  with content 9ea2f123k8necpu015v6tpsc2t1
 >>> Created relation Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_range/2@39b0e9sm with 246 rows.
Starting to evaluate predicate Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_term/3@9f4ca5g8
Tuple counts for Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_term/3@9f4ca5g8 after 0ms:
246 ~2%     {3} r1 = JOIN Keywords::DictUnpackingOrKeyword#class#f#shared WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Rhs.2 'arg2', Rhs.2 'arg2'
            return r1

Tuple counts for Exprs::Call::getAKeyword_dispred#ff/2@000a0alb after 906ms:
1       ~0%     {1} r1 = CONSTANT(unique int)[2]
4244385 ~2%     {1} r2 = JOIN r1 WITH py_dict_items_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result'

4244352 ~0%     {3} r3 = JOIN r2 WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Lhs.0 'result', Rhs.1 'this', Rhs.2
4244280 ~0%     {3} r4 = r3 AND NOT Exprs::Call::getMinimumUnpackingIndex_dispred#ff_0#antijoin_rhs(Lhs.1 'this')
4244280 ~6%     {2} r5 = SCAN r4 OUTPUT In.1 'this', In.0 'result'

4244352 ~3%     {3} r6 = JOIN r2 WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'this', Lhs.0 'result', Rhs.2
72      ~4%     {4} r7 = JOIN r6 WITH Exprs::Call::getMinimumUnpackingIndex_dispred#ff ON FIRST 1 OUTPUT Lhs.1 'result', Lhs.0 'this', Lhs.2, Rhs.1
72      ~4%     {4} r8 = SELECT r7 ON In.2 <= In.3
72      ~0%     {2} r9 = SCAN r8 OUTPUT In.1 'this', In.0 'result'

4244352 ~6%     {2} r10 = r5 UNION r9
                return r10
```

This is not the perfect join order (note the similarity between `r3`
and `r6`) but overall it's a win.
@tausbn tausbn force-pushed the python-fix-bad-join-in-import_star_read branch from 979f47d to 95d2354 Compare April 28, 2022 11:23
@tausbn
Copy link
Contributor Author

tausbn commented Apr 28, 2022

This had gotten a bit stale, so I rebased and restarted the performance tests.

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New evaluation looks fine, so ready to merge this.

@yoff yoff merged commit 7efb4ab into github:main Apr 29, 2022
@tausbn tausbn deleted the python-fix-bad-join-in-import_star_read branch June 2, 2022 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Awaiting evaluation Do not merge yet, this PR is waiting for an evaluation to finish no-change-note-required This PR does not need a change note Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants