Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable dynamic filtering by default #2793

Merged
merged 11 commits into from
Mar 2, 2020

Conversation

sopel39
Copy link
Member

@sopel39 sopel39 commented Feb 11, 2020

No description provided.

@sopel39 sopel39 added the WIP label Feb 11, 2020
@cla-bot cla-bot bot added the cla-signed label Feb 11, 2020
@sopel39 sopel39 force-pushed the ks/enable_dyn_filtering branch 10 times, most recently from 7c796df to 43e6420 Compare February 13, 2020 09:16
@sopel39 sopel39 changed the title [WIP] enable dynamic filtering by default enable dynamic filtering by default Feb 17, 2020
@sopel39 sopel39 removed the WIP label Feb 17, 2020
@sopel39
Copy link
Member Author

sopel39 commented Feb 17, 2020

Benchmarks (please take a look at CPU more as duration has rather high variability due to cloud env):

Benchmarks comparison-dynamic_filtering.pdf

Some notes:

  1. Queries that got considerable perf improvement: tpcds/q16, tpcds/q15, tpcds/q18, tpcds/q20, tpcds/q21, tpcds/q33, tpcds/q39_1, ... and more. Generated data is unstructured (non-partitioned, non-sorted) so there should be more improvements with structured data.
  2. dynamic filtering predicates also carry non null information for symbol. Therefore some outer joins were converted to inner joins (tpcds/q05, tpcds/q93). This explains speedup improvements to q93. We could potentially pushdown dynamic filters to all join branches, so that more outer joins get converted to inner joins. Dynamic filters are removed at the end of planning from places where they shouldn't be (e.b build side).
  3. Because of how plan optimizers work, some join symbols got unaliased differently (e.g right equi-join symbols got unaliased to left equi-join symbols). This isn't directly related to dynamic filters. PR proposal to make unaliasing more deterministic: Remove redundant remote exchanges due to missing unaliasing #2853
  4. There might be slight CPU degradation for tpcds/q23_1, tpcds/23_2. This could be due to different unaliasing in multi-join (using probe side equi symbols as outputs instead of build side equi-symbols). In this case join outputs dictionary blocks which might cause some additional overhead when processed by downstream operators. Preserve unnested dictionary in DictionaryBlock#unnest #2854 is a proposal to optimize dictionary blocks slightly.

@@ -405,7 +403,7 @@ public PlanOptimizers(
new CheckSubqueryNodesAreRewritten(),
new StatsRecordingPlanOptimizer(
optimizerStats,
new PredicatePushDown(metadata, typeAnalyzer, false)),
new StatsRecordingPlanOptimizer(optimizerStats, new PredicatePushDown(metadata, typeAnalyzer, false, false))),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are using a new instance of StatsRecordingPlanOptimizer for the 3 places where PredicatePushDown rule is applied, it looks like this will reset the stats each time for this rule (Check OptimizerStatsRecorder#register). Is that ok ? I think we would want to know the combined time taken of PredicatePushDown rule in all the invocations or record them all separately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will report to same optimizerStats every time, so it accumulates stats for PredicatePushDown

@sopel39 sopel39 assigned findepi and martint and unassigned findepi Feb 22, 2020
Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments.

@@ -279,8 +279,6 @@ public PlanOptimizers(
.add(new RemoveTrivialFilters())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in commit message: "Prediacate"

@@ -407,7 +405,7 @@ public PlanOptimizers(
new CheckSubqueryNodesAreRewritten(),
new StatsRecordingPlanOptimizer(
optimizerStats,
new PredicatePushDown(metadata, typeAnalyzer, false)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... this is going to have to change once we move PredicatePushdown to the iterative optimizer, as it will be interleaved with all other rules.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. I think in this case it might be relatively easy to remove this dependency, so that dynamic filters don't change existing plans. I just didn't want to dig into this particular issue

@@ -2042,7 +2042,7 @@ private PhysicalOperation createLookupJoin(
ImmutableList.Builder<OperatorFactory> factoriesBuilder = new ImmutableList.Builder<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some details to the commit message explaining the motivation? I..e, what about grouped execution makes it incompatible with dynamic filters?

@@ -13,15 +13,13 @@
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some details to the commit message explaining the motivation?

@@ -117,7 +117,7 @@ public ConnectorPageSource createPageSource(ConnectorTransactionHandle transacti
hiveSplit.getFileSize(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, can you add a comment in the commit message explaining the motivation?

Predicate pushdown pushes dynamic filters down
the plan. At the end of planning, dynamic filters
are removed from places where they are not valid (e.g build side,
nested predicates). Because of this reorder joins should
operate on plan that doesn't have dynamic filters as it might
cause stats and cost misestimates.
When isEnableDynamicFiltering is disabled, then
dynamic filters won't be added by planner.
Dynamic filtering assumes that join build side is evaulated fixed
(task_concurrency) amount of times. This constraint is not satisfied
when grouped execution is enabled.
Dynamic filter might filter data source entirely. In such case
DELETE operation might fail if data source is not of UpdatablePageSource
type.
Dynamic filter tuple domains might be relatively large and expensive
to evaluate. They must be simplified so that dynamic filters do not incur
additional CPU cost if they are not filtering much.
@sopel39 sopel39 merged commit ddcbe76 into trinodb:master Mar 2, 2020
@sopel39 sopel39 deleted the ks/enable_dyn_filtering branch March 2, 2020 11:54
@sopel39 sopel39 mentioned this pull request Mar 2, 2020
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants