Python: Fix bad join in import_star_read
#8581
Solutions look reasonable, and numbers look very persuasive :-)
@@ -2125,7 +2125,7 @@ module Conditionals {
   /** INTERNAL: Do not use. */
   predicate declaredAttributeVar(PythonClassObjectInternal cls, string name, EssaVariable var) {
     name = var.getName() and
-    var.getAUse() = cls.getScope().getANormalExit()
+    pragma[only_bind_into](pragma[only_bind_into](var).getAUse()) = cls.getScope().getANormalExit()
Interesting construction :-)
Yeah, it's not the most elegant construction, but a single pragma
just didn't do the trick.
Do you want to wait for a DCA run to see the full effect?
Yeah. I don't expect these changes to have bad consequences, but there's no reason to chance it.
The performance data is in, and I didn't find any hidden surprises.
Makes this
```
(21s) Tuple counts for DataFlowPublic::import_star_read#ff/2@fcd5e6nr after 8.5s:
  9743      ~6% {3} r1 = SCAN num#DataFlowPublic::TModuleVariableNode#fff OUTPUT In.1, In.0, In.2 'result'
  9743      ~1% {3} r2 = JOIN r1 WITH Variables::Variable::getId_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2 'result'
  390808917 ~3% {3} r3 = JOIN r2 WITH Flow::NameNode::getId_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2 'result'
  307       ~0% {2} r4 = JOIN r3 WITH ImportStar::ImportStar::importStarResolvesTo#ff ON FIRST 2 OUTPUT Lhs.0, Lhs.2 'result'
  307       ~0% {2} r5 = JOIN r4 WITH num#DataFlowPublic::TCfgNode#ff ON FIRST 1 OUTPUT Rhs.1 'n', Lhs.1 'result'
  return r5
```
become this
```
(17s) Tuple counts for DataFlowPublic::resolved_import_star_module#fff/3@f5e84aic after 0ms:
  307 ~0% {3} r1 = JOIN ImportStar::ImportStar::importStarResolvesTo#ff WITH num#DataFlowPublic::TCfgNode#ff ON FIRST 1 OUTPUT Lhs.0, Lhs.1 'm', Rhs.1 'n'
  307 ~0% {3} r2 = JOIN r1 WITH Flow::NameNode::getId_dispred#ff ON FIRST 1 OUTPUT Lhs.1 'm', Rhs.1 'name', Lhs.2 'n'
  return r2
(17s) Registering DataFlowPublic::resolved_import_star_module#fff/3@f5e84aic + with content f29281ig38r98icro4ege09mrva
(17s) >>> Created relation DataFlowPublic::resolved_import_star_module#fff/3@f5e84aic with 307 rows.
(17s) Starting to evaluate predicate DataFlowPublic::import_star_read#ff/2@57b0c06e
(17s) Tuple counts for DataFlowPublic::import_star_read#ff/2@57b0c06e after 2ms:
  9743 ~0% {3} r1 = SCAN num#DataFlowPublic::TModuleVariableNode#fff OUTPUT In.1, In.0, In.2 'result'
  9743 ~0% {3} r2 = JOIN r1 WITH Variables::Variable::getId_dispred#ff ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.2 'result'
  307  ~0% {2} r3 = JOIN r2 WITH DataFlowPublic::resolved_import_star_module#fff ON FIRST 2 OUTPUT Rhs.2 'n', Lhs.2 'result'
  return r3
```
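To make the effect of the helper predicate concrete, here is a rough Python model of the two plans. It is only a sketch: the relations are toy lists and sets, and all function names and data (`join_on_name_first`, `join_via_helper`, `m0`, `n0`, ...) are made up for illustration, not taken from the CodeQL libraries. The first function joins on the variable name alone and filters by module afterwards, which is what produces the 390808917-tuple intermediate result above. The second builds a small `(module, name) -> node` helper first, in the spirit of `resolved_import_star_module`, and joins on both columns at once.

```python
from collections import defaultdict


def join_on_name_first(module_vars, name_nodes, resolves_to):
    """Mimic the slow plan: match on the name alone, filter by module later."""
    nodes_by_name = defaultdict(list)
    for node, name in name_nodes:
        nodes_by_name[name].append(node)
    # Every (module variable, same-named node) pair is materialised here.
    intermediate = [
        (node, module, var)
        for module, name, var in module_vars
        for node in nodes_by_name[name]
    ]
    result = {
        (node, var)
        for node, module, var in intermediate
        if (node, module) in resolves_to
    }
    return result, len(intermediate)


def join_via_helper(module_vars, name_nodes, resolves_to):
    """Mimic the fast plan: precompute (module, name) -> nodes, then join once."""
    name_of = dict(name_nodes)
    helper = defaultdict(list)
    for node, module in resolves_to:
        helper[(module, name_of[node])].append(node)
    result = {
        (node, var)
        for module, name, var in module_vars
        for node in helper.get((module, name), [])
    }
    helper_rows = sum(len(nodes) for nodes in helper.values())
    return result, helper_rows


# Hypothetical toy data: 50 modules each defining `x`, 200 name nodes called `x`,
# and only two import-star reads that actually resolve to a module.
module_vars = [(f"m{i}", "x", f"var{i}") for i in range(50)]
name_nodes = [(f"n{i}", "x") for i in range(200)]
resolves_to = {("n0", "m0"), ("n1", "m3")}

slow_result, slow_rows = join_on_name_first(module_vars, name_nodes, resolves_to)
fast_result, fast_rows = join_via_helper(module_vars, name_nodes, resolves_to)
print(slow_rows, fast_rows, slow_result == fast_result)
```

Both functions return the same pairs; only the size of the intermediate relation differs, which is the same pattern the tuple counts show.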
Before:
```
Tuple counts for PointsTo::declaredAttributeVar#fbf/3@99d5aenq after 1.1s:
  451054   ~7% {2} r1 = SCAN variable OUTPUT In.0, In.2 'name'
  1296149  ~0% {2} r2 = JOIN r1 WITH Essa::EssaVariable::getSourceVariable_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'var', Lhs.1 'name'
  12179900 ~4% {3} r3 = JOIN r2 WITH Essa::EssaVariable::getAUse_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'name', Lhs.0 'var'
  8028     ~2% {3} r4 = JOIN r3 WITH Scope::Scope::getANormalExit_dispred#bf_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'name', Lhs.2 'var'
  8028     ~2% {3} r5 = JOIN r4 WITH Classes::PythonClassObjectInternal::getScope_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'cls', Lhs.1 'name', Lhs.2 'var'
  return r5
```
After:
```
Tuple counts for PointsTo::declaredAttributeVar#fbf/3@cccf36hb after 4ms:
  1450 ~0% {2} r1 = SCAN Classes::PythonClassObjectInternal::getScope_dispred#ff OUTPUT In.1, In.0 'cls'
  1450 ~7% {2} r2 = JOIN r1 WITH Scope::Scope::getANormalExit_dispred#bf ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'cls'
  8028 ~0% {2} r3 = JOIN r2 WITH Essa::EssaVariable::getAUse_dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'var', Lhs.1 'cls'
  8028 ~0% {3} r4 = JOIN r3 WITH Essa::EssaVariable::getSourceVariable_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'cls', Lhs.0 'var'
  8028 ~2% {3} r5 = JOIN r4 WITH variable ON FIRST 1 OUTPUT Lhs.1 'cls', Rhs.2 'name', Lhs.2 'var'
  return r5
```
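For anyone comparing logs like these by hand, a small script can point at the step where a pipeline blows up. The following is a minimal sketch, assuming a log with one pipeline step per line in the `count ~dup% {arity} rN = ...` shape seen in this thread; the helper name `largest_step` and the regular expression are my own, not part of any CodeQL tooling.

```python
import re

# Matches lines such as: "12179900 ~4% {3} r3 = JOIN r2 WITH ..."
STEP = re.compile(r"^\s*(\d+)\s+~\d+%\s+\{\d+\}\s+(r\d+)\s+=\s+(.*)$")


def largest_step(log_text):
    """Return (tuple_count, step_name, operation) for the biggest step, or None."""
    steps = []
    for line in log_text.splitlines():
        match = STEP.match(line)
        if match:
            steps.append((int(match.group(1)), match.group(2), match.group(3)))
    return max(steps, key=lambda step: step[0]) if steps else None


# Three steps copied from the "before" log of declaredAttributeVar.
sample = """\
Tuple counts for PointsTo::declaredAttributeVar#fbf/3@99d5aenq after 1.1s:
  451054   ~7% {2} r1 = SCAN variable OUTPUT In.0, In.2 'name'
  12179900 ~4% {3} r3 = JOIN r2 WITH Essa::EssaVariable::getAUse_dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'name', Lhs.0 'var'
  8028     ~2% {3} r4 = JOIN r3 WITH Scope::Scope::getANormalExit_dispred#bf_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'name', Lhs.2 'var'
  return r5
"""

count, step, operation = largest_step(sample)
print(step, count)
```

On the sample, this singles out `r3`, the join on `getAUse` that the double `pragma[only_bind_into]` is meant to avoid.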
Before:
```
Tuple counts for Exprs::Call::getAKeyword_dispred#ff#antijoin_rhs/3@7bc202ij after 9s:
  1        ~0% {1} r1 = CONSTANT(unique int)[2]
  4244385  ~2% {1} r2 = JOIN r1 WITH py_dict_items_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0'
  4244352  ~3% {3} r3 = JOIN r2 WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg1', Lhs.0 'arg0', Rhs.2 'arg2'
  66618690 ~3% {5} r4 = JOIN r3 WITH AstGenerated::Call_::getNamedArg_dispred#ffb ON FIRST 1 OUTPUT Lhs.1 'arg0', Lhs.0 'arg1', Lhs.2 'arg2', Rhs.1, Rhs.2
  31187133 ~0% {5} r5 = SELECT r4 ON In.3 < In.2 'arg2'
  31187133 ~1% {5} r6 = SCAN r5 OUTPUT In.4, 0, In.0 'arg0', In.1 'arg1', In.2 'arg2'
  0        ~0% {3} r7 = JOIN r6 WITH py_dict_items ON FIRST 2 OUTPUT Lhs.2 'arg0', Lhs.3 'arg1', Lhs.4 'arg2'
  return r7

Tuple counts for Exprs::Call::getAKeyword_dispred#ff/2@1dc9468b after 421ms:
  1       ~0% {1} r1 = CONSTANT(unique int)[2]
  4244385 ~2% {1} r2 = JOIN r1 WITH py_dict_items_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result'
  4244352 ~0% {3} r3 = JOIN r2 WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Lhs.0 'result', Rhs.1 'this', Rhs.2
  4244352 ~0% {3} r4 = r3 AND NOT Exprs::Call::getAKeyword_dispred#ff#antijoin_rhs(Lhs.0 'result', Lhs.1 'this', Lhs.2)
  4244352 ~6% {2} r5 = SCAN r4 OUTPUT In.1 'this', In.0 'result'
  return r5
```
Oof. All that work to produce zero tuples. Luckily we can improve matters somewhat. Basically, there's no reason to test _all_ dictionary unpackings, since we're only interested in a lower bound. Thus, we can use `min` instead which is much more efficient. For convenience I factored this into its own (private) helper predicate.
Now the tuple counts look as follows:
```
Tuple counts for Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_range/2@39b0e9sm after 1ms:
  246 ~0% {2} r1 = JOIN Keywords::DictUnpackingOrKeyword#class#f#shared WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Rhs.2 'arg1'
  return r1
Registering Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_range/2@39b0e9sm + with content 9ea2f123k8necpu015v6tpsc2t1
>>> Created relation Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_range/2@39b0e9sm with 246 rows.
Starting to evaluate predicate Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_term/3@9f4ca5g8
Tuple counts for Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_term/3@9f4ca5g8 after 0ms:
  246 ~2% {3} r1 = JOIN Keywords::DictUnpackingOrKeyword#class#f#shared WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Rhs.2 'arg2', Rhs.2 'arg2'
  return r1

Tuple counts for Exprs::Call::getAKeyword_dispred#ff/2@000a0alb after 906ms:
  1       ~0% {1} r1 = CONSTANT(unique int)[2]
  4244385 ~2% {1} r2 = JOIN r1 WITH py_dict_items_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result'
  4244352 ~0% {3} r3 = JOIN r2 WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Lhs.0 'result', Rhs.1 'this', Rhs.2
  4244280 ~0% {3} r4 = r3 AND NOT Exprs::Call::getMinimumUnpackingIndex_dispred#ff_0#antijoin_rhs(Lhs.1 'this')
  4244280 ~6% {2} r5 = SCAN r4 OUTPUT In.1 'this', In.0 'result'
  4244352 ~3% {3} r6 = JOIN r2 WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'this', Lhs.0 'result', Rhs.2
  72      ~4% {4} r7 = JOIN r6 WITH Exprs::Call::getMinimumUnpackingIndex_dispred#ff ON FIRST 1 OUTPUT Lhs.1 'result', Lhs.0 'this', Lhs.2, Rhs.1
  72      ~4% {4} r8 = SELECT r7 ON In.2 <= In.3
  72      ~0% {2} r9 = SCAN r8 OUTPUT In.1 'this', In.0 'result'
  4244352 ~6% {2} r10 = r5 UNION r9
  return r10
```
This is not the perfect join order (note the similarity between `r3` and `r6`) but overall it's a win.
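The idea behind the `min` rewrite can be illustrated outside QL. The sketch below is a simplified Python model, not the actual predicate: a call is a list of argument kinds in source order, and the function names and the `"keyword"` / `"dict_unpacking"` labels are invented for the example. The first version compares each keyword against every dictionary unpacking in the same call; the second computes the smallest unpacking index once per call and compares against that. The `<=` mirrors the `In.2 <= In.3` step in the log, and a call without any unpacking keeps all its keywords, like the antijoin branch.

```python
def keywords_pairwise(named_args):
    """Keep keywords that have no dictionary unpacking at a smaller index."""
    return [
        i
        for i, kind in enumerate(named_args)
        if kind == "keyword"
        and not any(
            other == "dict_unpacking" and j < i
            for j, other in enumerate(named_args)
        )
    ]


def keywords_via_min(named_args):
    """Same answer, but only the smallest unpacking index is consulted."""
    unpacking_indices = [
        i for i, kind in enumerate(named_args) if kind == "dict_unpacking"
    ]
    first_unpacking = min(unpacking_indices, default=None)
    return [
        i
        for i, kind in enumerate(named_args)
        if kind == "keyword"
        and (first_unpacking is None or i <= first_unpacking)
    ]


# Hypothetical call shaped like: f(a=1, b=2, **d1, c=3, **d2)
call = ["keyword", "keyword", "dict_unpacking", "keyword", "dict_unpacking"]
print(keywords_pairwise(call), keywords_via_min(call))
```

The pairwise version does work proportional to keywords times unpackings per call, while the `min` version does one pass to find the bound and one pass to filter, which matches the drop from tens of millions of intermediate tuples to a few hundred in the helper.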
Force-pushed from 979f47d to 95d2354.
This had gotten a bit stale, so I rebased and restarted the performance tests.
New evaluation looks fine, so ready to merge this.
Needs performance evaluation. No change note required.