enable dynamic filtering by default #2793
7c796df to 43e6420
Benchmarks (please look at CPU rather than duration, as duration has rather high variability in the cloud environment): Benchmarks comparison-dynamic_filtering.pdf
Some notes:
@@ -405,7 +403,7 @@ public PlanOptimizers(
new CheckSubqueryNodesAreRewritten(),
new StatsRecordingPlanOptimizer(
optimizerStats,
new PredicatePushDown(metadata, typeAnalyzer, false)),
new StatsRecordingPlanOptimizer(optimizerStats, new PredicatePushDown(metadata, typeAnalyzer, false, false))),
Since we are using a new instance of StatsRecordingPlanOptimizer in each of the 3 places where the PredicatePushDown rule is applied, it looks like this will reset the stats for this rule each time (check OptimizerStatsRecorder#register). Is that OK? I think we would want either to know the combined time taken by the PredicatePushDown rule across all invocations, or to record each invocation separately.
It will report to the same optimizerStats every time, so it accumulates stats for PredicatePushDown.
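The accumulation described in this reply can be illustrated with a minimal sketch. It assumes a shared collector keyed by rule name; StatsCollector and RecordingWrapper are hypothetical stand-ins written for this example and are not the actual OptimizerStatsRecorder or StatsRecordingPlanOptimizer classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class SharedStatsSketch {
    // Hypothetical shared collector: stats are keyed by rule name, not by wrapper instance.
    static final class StatsCollector {
        private final Map<String, AtomicLong> invocations = new ConcurrentHashMap<>();
        private final Map<String, AtomicLong> totalNanos = new ConcurrentHashMap<>();

        void record(String ruleName, long nanos) {
            invocations.computeIfAbsent(ruleName, key -> new AtomicLong()).incrementAndGet();
            totalNanos.computeIfAbsent(ruleName, key -> new AtomicLong()).addAndGet(nanos);
        }

        long invocationCount(String ruleName) {
            AtomicLong count = invocations.get(ruleName);
            return count == null ? 0 : count.get();
        }

        long totalNanos(String ruleName) {
            AtomicLong total = totalNanos.get(ruleName);
            return total == null ? 0 : total.get();
        }
    }

    // Hypothetical wrapper: a new instance per call site, all writing to one collector.
    static final class RecordingWrapper {
        private final StatsCollector collector;
        private final String ruleName;
        private final Runnable delegate;

        RecordingWrapper(StatsCollector collector, String ruleName, Runnable delegate) {
            this.collector = collector;
            this.ruleName = ruleName;
            this.delegate = delegate;
        }

        void optimize() {
            long start = System.nanoTime();
            try {
                delegate.run();
            }
            finally {
                // Recorded even if the delegate throws, so failed runs are still counted.
                collector.record(ruleName, System.nanoTime() - start);
            }
        }
    }

    public static void main(String[] args) {
        StatsCollector shared = new StatsCollector();
        // Three separate wrapper instances, as in the three PredicatePushDown call sites.
        for (int i = 0; i < 3; i++) {
            new RecordingWrapper(shared, "PredicatePushDown", () -> {}).optimize();
        }
        System.out.println(shared.invocationCount("PredicatePushDown")); // prints 3
    }
}
```

Whether the stats reset or accumulate in this sketch depends only on where the counters live: keeping them in the shared collector makes the number of wrapper instances irrelevant.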
43e6420 to 63c145b
Just some minor comments.
@@ -279,8 +279,6 @@ public PlanOptimizers(
.add(new RemoveTrivialFilters())
Typo in commit message: "Prediacate"
@@ -407,7 +405,7 @@ public PlanOptimizers(
new CheckSubqueryNodesAreRewritten(),
new StatsRecordingPlanOptimizer(
optimizerStats,
new PredicatePushDown(metadata, typeAnalyzer, false)),
Hmm... this is going to have to change once we move PredicatePushDown to the iterative optimizer, as it will be interleaved with all other rules.
Yup. I think in this case it might be relatively easy to remove this dependency, so that dynamic filters don't change existing plans. I just didn't want to dig into this particular issue.
@@ -2042,7 +2042,7 @@ private PhysicalOperation createLookupJoin(
ImmutableList.Builder<OperatorFactory> factoriesBuilder = new ImmutableList.Builder<>();
Can you add some details to the commit message explaining the motivation? I.e., what about grouped execution makes it incompatible with dynamic filters?
@@ -13,15 +13,13 @@
*/
Can you add some details to the commit message explaining the motivation?
@@ -117,7 +117,7 @@ public ConnectorPageSource createPageSource(ConnectorTransactionHandle transacti
hiveSplit.getFileSize(),
Same here, can you add a comment in the commit message explaining the motivation?
Predicate pushdown pushes dynamic filters down the plan. At the end of planning, dynamic filters are removed from places where they are not valid (e.g. the build side, nested predicates). Because of this, join reordering should operate on a plan that does not contain dynamic filters, as they might cause stats and cost misestimates.
When isEnableDynamicFiltering is false, the planner does not add dynamic filters.
Dynamic filtering assumes that the join build side is evaluated a fixed number of times (task_concurrency). This constraint is not satisfied when grouped execution is enabled.
A dynamic filter might filter out a data source entirely. In that case, a DELETE operation might fail if the data source is not of UpdatablePageSource type.
Dynamic filter tuple domains might be relatively large and expensive to evaluate. They must be simplified so that dynamic filters do not incur additional CPU cost when they are not filtering much.
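One way to picture the simplification of a large domain is the sketch below. It assumes a domain made of discrete long values and collapses it to a single min/max range once it exceeds a size threshold. The simplify helper and the maxDiscreteValues parameter are hypothetical names invented for this illustration; this is not the engine's TupleDomain code, and the actual simplification in the commit may differ.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.LongPredicate;

class DomainSimplificationSketch {
    // Hypothetical helper: returns a predicate that accepts a superset of `values`.
    // Small domains are checked exactly; large ones collapse to one [min, max] range,
    // which is cheaper to evaluate but may let through extra rows.
    static LongPredicate simplify(SortedSet<Long> values, int maxDiscreteValues) {
        if (values.isEmpty()) {
            return value -> false;
        }
        if (values.size() <= maxDiscreteValues) {
            SortedSet<Long> copy = new TreeSet<>(values);
            return value -> copy.contains(value);
        }
        long min = values.first();
        long max = values.last();
        return value -> value >= min && value <= max;
    }

    public static void main(String[] args) {
        SortedSet<Long> buildSideKeys = new TreeSet<>(Arrays.asList(1L, 5L, 9L));

        LongPredicate exact = simplify(buildSideKeys, 4);
        System.out.println(exact.test(5L)); // prints true
        System.out.println(exact.test(6L)); // prints false

        LongPredicate coarse = simplify(buildSideKeys, 2);
        System.out.println(coarse.test(6L));  // prints true (inside the range, not a key)
        System.out.println(coarse.test(10L)); // prints false
    }
}
```

The simplified predicate never rejects a value that the exact domain accepts, so it stays safe to use as a pre-filter: the join itself still discards any extra rows that pass.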
63c145b to 3541ae2