Initial Support of Adaptive Optimization with Presto Unlimited #14675
Conversation
Force-pushed from a5947a8 to 1d5edc7.
Still reviewing the last three commits, but if the first two commits are ready sooner, we can merge them first.
presto-main/src/main/java/com/facebook/presto/cost/StatsUtil.java
Map<PlanFragment, PlanFragment> oldToNewFragment = stream(forTree(StreamingSubPlan::getChildren).depthFirstPreOrder(section.getPlan()))
        // filter leaf stages
        .filter(plan -> plan.getChildren().isEmpty())
Ideally all this logic to filter relevant plans should go into the optimizer rule.
Yeah, I thought about moving all the filtering logic inside the rule, but my concern is that it doesn't seem possible to easily tell from the iterative optimizer output whether the returned plan actually changed. If we cannot tell, then for each ready section we will waste time rebuilding the whole section, stage execution, scheduler, etc. even when the optimizer rule never fired at all.
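To illustrate the concern, here is a minimal, self-contained sketch (not Presto's actual IterativeOptimizer — the class and the string-based plan representation are hypothetical) of an iterative optimizer that returns the original object when no rule fired, so a caller can detect "no change" by reference equality and skip rebuilding the section:

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch: an iterative optimizer that returns the ORIGINAL object when no
// rule fired, so callers can detect "no change" cheaply by reference
// equality instead of re-deriving the plan from a memo structure.
public class IterativeOptimizerSketch
{
    public static String optimize(String plan, List<UnaryOperator<String>> rules)
    {
        String current = plan;
        boolean changed = false;
        boolean progress = true;
        while (progress) {
            progress = false;
            for (UnaryOperator<String> rule : rules) {
                String next = rule.apply(current);
                if (!next.equals(current)) {
                    current = next;
                    changed = true;
                    progress = true;
                }
            }
        }
        // Return the original reference when nothing fired; the caller can
        // then skip rebuilding stage executions with `result == plan`.
        return changed ? current : plan;
    }
}
```

A caller would compare `optimize(plan, rules) == plan` to decide whether any rebuild work is needed at all.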
@@ -91,6 +91,12 @@ public static HiveBasicStatistics reduce(HiveBasicStatistics first, HiveBasicSta

public static Map<String, HiveColumnStatistics> merge(Map<String, HiveColumnStatistics> first, Map<String, HiveColumnStatistics> second)
{
    // To correctly merge statistics during temporary table finish insertion: when "first" has exactly the same columns as "second" but all empty statistics,
    // "first" is the placeholder left at temporary table creation, and it is safe to directly return "second".
    if (first.values().stream().allMatch(statistics -> statistics.equals(HiveColumnStatistics.empty()))
@arhimondr can you check this logic?
How about instead of adding this logic we set all statistics to 0 when creating a temporary table?
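The check under discussion can be sketched in isolation. This is a simplified model (plain strings stand in for Presto's HiveColumnStatistics, and "empty" stands in for HiveColumnStatistics.empty()): when `first` covers exactly the same columns as `second` but holds only placeholder empty statistics, `second` is returned directly.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the merge logic in question. Strings stand in for
// HiveColumnStatistics; "empty" stands in for the placeholder statistics
// written at temporary-table creation.
public class StatsMergeSketch
{
    public static final String EMPTY = "empty";

    public static Map<String, String> merge(Map<String, String> first, Map<String, String> second)
    {
        // If "first" is the all-empty placeholder over the same columns,
        // "second" can be returned as-is.
        boolean firstIsPlaceholder = first.values().stream().allMatch(EMPTY::equals)
                && first.keySet().equals(second.keySet());
        if (firstIsPlaceholder) {
            return second;
        }
        // Otherwise merge column by column (real merging details elided;
        // concatenation here just marks that both sides contributed).
        Map<String, String> merged = new HashMap<>(second);
        first.forEach((column, stats) -> merged.merge(column, stats, (fromSecond, fromFirst) -> fromSecond + "+" + fromFirst));
        return merged;
    }
}
```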
First two commits:
Fix StatsUtil function ...
Enable statistics aggregation in temp
Some comments
Please make sure the commit message complies with our commit message style guidelines:
https://github.com/prestodb/presto/wiki/Review-and-Commit-guidelines#commit-formatting-and-pull-requests
https://chris.beams.io/posts/git-commit/
More specifically, the commit summary should be ~50 characters. You can have a more descriptive message in the body. For example:
Enable statistics collection for temporary tables
Enable column level statistics collection when writing intermediate data
to temporary table for materialized exchanges. Collected statistics will
be used to automatically change join order in runtime.
presto-main/src/main/java/com/facebook/presto/cost/ConnectorFilterStatsCalculatorService.java
presto-hive/src/main/java/com/facebook/presto/hive/HiveMetadata.java
presto-main/src/main/java/com/facebook/presto/sql/planner/StatisticsAggregationPlanner.java
presto-main/src/main/java/com/facebook/presto/sql/planner/PlanFragmenter.java
It feels like, at a high level, collecting the stats for each and every column is a little overkill. Given that the planner prunes all unnecessary columns, it is certain that all remaining columns will be used. Ideally we should simply collect the overall size of all the inputs and use it as the input to the CBO algorithm. That will not give the best results when there are more nodes before the JoinNode, but historically we've seen that the estimates are not very reliable anyway. Thus I'm not confident it makes sense to pay the much higher cost of collecting detailed stats: usually there will be no additional nodes before the join, and when there are, we are not very confident in the estimates anyway. Changing this goes beyond the scope of this project, but it feels like something we should consider doing in the future.
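The reviewer's suggestion boils down to a very small decision function. A hedged sketch (the class, names, and the swap ratio are illustrative, not Presto APIs): decide the build side from the total materialized input sizes alone, with a slack factor so borderline estimates don't trigger a costly runtime rewrite.

```java
// Sketch of deciding join sides from overall input sizes alone, as suggested
// above. Sizes would come from the temporary tables written by upstream
// stages; SWAP_RATIO is an illustrative guard, not a Presto constant.
public class TotalSizeJoinHeuristic
{
    // Only swap when the build side exceeds the probe side by this factor,
    // so near-equal estimates don't cause plan-rewrite churn.
    private static final double SWAP_RATIO = 1.1;

    public static boolean shouldSwap(long probeBytes, long buildBytes)
    {
        return buildBytes > probeBytes * SWAP_RATIO;
    }
}
```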
Force-pushed from 71d7387 to 369b6dd.
Enable statistics aggregation for temporary table.
LGTM
Force-pushed from b42a58c to 3c33361.
presto-main/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java
...in/src/main/java/com/facebook/presto/sql/planner/iterative/rule/RuntimeReorderJoinSides.java
presto-main/src/main/java/com/facebook/presto/execution/scheduler/LegacySqlQueryScheduler.java
Would this benefit from integration tests for SqlQueryScheduler and LegacySqlQueryScheduler?
I suppose an integration test could further verify the correctness of invoking the runtime CBO. Ideally, we might also want to use some currently ill-optimized production queries to see if the CBO can fix the join sides at runtime.
Some comments
presto-main/src/main/java/com/facebook/presto/SystemSessionProperties.java
presto-main/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java
presto-main/src/main/java/com/facebook/presto/sql/planner/PlanFragmenter.java
presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/IterativeOptimizer.java
...in/src/main/java/com/facebook/presto/sql/planner/iterative/rule/RuntimeReorderJoinSides.java
presto-main/src/main/java/com/facebook/presto/execution/scheduler/SqlQueryScheduler.java
@@ -591,7 +706,7 @@ public BasicStageExecutionStats getBasicStageStats()
public StageInfo getStageInfo()
{
    ListMultimap<StageId, SqlStageExecution> stageExecutions = getStageExecutions();
-   return buildStageInfo(plan, stageExecutions);
+   return buildStageInfo(plan.get(), stageExecutions);
This is likely to break statistics
CC: @rschlussel
@pguofb is looking at that next. Right now the info in the live plan will be incorrect, but the final plan in the QueryCompletedEvent and the stage info will be correct.
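The `plan` → `plan.get()` change reflects the plan now living behind an AtomicReference so the runtime optimizer can swap it. A hedged sketch of the pattern (PlanHolder and the string plan are hypothetical stand-ins, not the actual scheduler classes):

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch: once fragments can be replaced at runtime, the current plan lives
// in an AtomicReference, and every stats/info path must read through get()
// to observe the swap.
public class PlanHolder
{
    private final AtomicReference<String> plan = new AtomicReference<>();

    public PlanHolder(String initial)
    {
        plan.compareAndSet(null, initial);
    }

    // The runtime optimizer atomically swaps in the rewritten plan.
    public void replace(String expected, String updated)
    {
        plan.compareAndSet(expected, updated);
    }

    // Readers (stage info, live UI) see the latest plan via get().
    public String current()
    {
        return plan.get();
    }
}
```

A reader that captured the plan once at construction time would keep reporting the pre-swap plan, which is exactly the statistics-breakage risk flagged above.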
presto-main/src/main/java/com/facebook/presto/execution/scheduler/LegacySqlQueryScheduler.java
Force-pushed from e3b6cd8 to d03265a.
Nit about commit header: no period at the end :). See the seven rules: https://chris.beams.io/posts/git-commit/#seven-rules 😃
"Add a session property for runtime optimizer". LGTM. Note "adaptive optimizer" is probably a more sexy name than "runtime optimizer". 😃
"Invoke CBO at SqlQueryScheduler for Join Swapping". Generally looks good to me. I will take a separate look into RuntimeReorderJoinSides
later. But don't be blocked on my pass if you think it's ready to merge.
Just to confirm my understanding: when adaptive optimization kicks in under the legacy scheduler, the logic for creating new sections actually leverages what is implemented in the new scheduler. This makes sense, since the section execution abstraction is only introduced in the new scheduler, and it would be much more difficult to do adaptive execution in the legacy scheduler.
Now here is an interesting note: when the legacy scheduler works together with adaptive execution, the behavior will be a mix of the "legacy" and "new" schedulers, right? I.e. the code path follows the legacy scheduler until adaptive execution kicks in, but once it does, the logic is more like the section retry in the new scheduler. 😃
cc @rschlussel
presto-main/src/main/java/com/facebook/presto/execution/scheduler/LegacySqlQueryScheduler.java
presto-main/src/main/java/com/facebook/presto/execution/scheduler/LegacySqlQueryScheduler.java
    outputBuffers = createDiscardingOutputBuffers();
    locationsConsumer = (fragmentId, tasks, noMoreExchangeLocations) -> {};
}
SectionExecution sectionExecution = sectionExecutionFactory.createSectionExecutions(
@rschlussel: Is SectionExecutionFactory introduced for the new SQL query scheduler? So essentially we are calling something in the new scheduler from the legacy scheduler? ;)
SectionExecutionFactory is already used in the LegacySqlQueryScheduler (the logic was originally extracted from there).
        remoteTaskFactory,
        splitSourceFactory,
        0);
addStateChangeListeners(sectionExecution);
do we need to remove the state change listeners for old sections?
Yeah, I thought about this. When we create new stageExecutions and replace the old ones in this.stageExecutions, the old stageExecutions will have nothing referencing them and will eventually be garbage collected without being triggered. Therefore I'm not too worried about explicitly removing listeners. Besides, the StateMachine class does not provide methods to explicitly remove listeners, so I think it is OK to trust the GC to do its job :)
Map<StageId, StageExecutionAndScheduler> updatedStageExecutions = sectionExecution.getSectionStages().stream()
        .collect(toImmutableMap(execution -> execution.getStageExecution().getStageExecutionId().getStageId(), identity()));
synchronized (this) {
    stageExecutions.putAll(updatedStageExecutions);
ditto: do we need to remove old stage executions from stageExecutions?
Likewise, stageExecutions is a map from StageId -> StageExecutionAndScheduler. The rewritten stages have the same ids as the old ones (a bit different from retry, because we optimize them right before we schedule and execute them), so we actually replace the old entries rather than simply inserting a batch of new ones.
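The point rests on standard Map semantics: putAll with the same keys overwrites the old values in place. A minimal demonstration (strings stand in for StageId and StageExecutionAndScheduler):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal demonstration: because rewritten stages keep the SAME StageId keys,
// Map.putAll overwrites the old executions in place rather than accumulating
// stale entries alongside new ones.
public class PutAllReplaces
{
    public static Map<String, String> replaceStages(Map<String, String> executions, Map<String, String> rewritten)
    {
        executions.putAll(rewritten);  // same keys -> old values replaced
        return executions;
    }
}
```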
presto-main/src/main/java/com/facebook/presto/execution/scheduler/LegacySqlQueryScheduler.java
presto-main/src/main/java/com/facebook/presto/execution/SqlQueryExecution.java
two high-level questions:
Actually, according to my project plan, the next step (after enabling runtime join swapping) is to fix the statistics shown in the UI, and I'm currently starting to look at this part.
The latest commit fixes the broken statistics in the Live Plan. Now both the Query Completed Event and the Live Plan shown during runtime (essentially via periodically issuing an HTTP GET) are correct.
@@ -190,6 +196,7 @@ private LegacySqlQueryScheduler(
this.queryStateMachine = requireNonNull(queryStateMachine, "queryStateMachine is null");
this.plan.compareAndSet(null, requireNonNull(plan, "plan is null"));
this.session = requireNonNull(session, "session is null");
+this.metadata = requireNonNull(metadata, "metadata is null");
nit: just get the FunctionManager from the metadata, since that's all you need (and same below)
@wenleix @arhimondr did you have any other comments or concerns?
@pguofb: Sounds good. What about the plan JSON in the query completion event? :) Update: sorry, I just saw your latest comment (#14675 (comment)). Nice work!
thanks @rschlussel and @pguofb. I have no other comments :)
LGTM % nits
private final SectionExecutionFactory sectionExecutionFactory;
private final RemoteTaskFactory remoteTaskFactory;
private final SplitSourceFactory splitSourceFactory;
private static final Logger log = Logger.get(LegacySqlQueryScheduler.class);
nit: usually we define log as the very first field in the class and separate it from the other fields
public class RuntimeReorderJoinSides
        implements Rule<JoinNode>
{
    private final Logger log = Logger.get(RuntimeReorderJoinSides.class);
private static final
        swapped.getRightHashVariable(),
        swapped.getDistributionType());

log.debug("Probe size: " + leftOutputSizeInBytes + " is smaller than Build size: " + rightOutputSizeInBytes + " => invoke runtime join swapping on JoinNode ID: " + newJoinNode.getId());
Use a format pattern instead: Probe size: %s is smaller than build size: %s => ...
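The suggestion can be sketched with String.format, which uses the same printf-style placeholders as the project's logger (the helper class here is illustrative, not the actual rule code):

```java
// Sketch of the pattern-based message suggested above: placeholders keep the
// template readable and separate it from the runtime values, unlike chained
// string concatenation.
public class LogPattern
{
    public static String message(long probeBytes, long buildBytes, String joinNodeId)
    {
        return String.format(
                "Probe size: %s is smaller than build size: %s => invoke runtime join swapping on JoinNode ID: %s",
                probeBytes, buildBytes, joinNodeId);
    }
}
```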
Force-pushed from 45f9a57 to 8752111.
Looks great! Can you squash the last 4 commits together?
-SubPlan fragmentedPlan = planFragmenter.createSubPlans(stateMachine.getSession(), plan, false, idAllocator, stateMachine.getWarningCollector());
+// the variableAllocator is finally passed to SqlQueryScheduler for runtime cost-based optimizations
+variableAllocator.set(new PlanVariableAllocator(plan.getTypes().allVariables()));
+SubPlan fragmentedPlan = planFragmenter.createSubPlans(stateMachine.getSession(), plan, false, idAllocator, variableAllocator.get(), stateMachine.getWarningCollector());
No need to pass the variableAllocator here: createSubPlans creates a variableAllocator in the exact same way.
Yes, createSubPlans internally creates a variableAllocator this way, but we want this variableAllocator to be exposed after creating the subplans so it can then be used by the runtime optimizers. Therefore we moved the creation out here, pass it in as an argument, and can later feed it to SqlQueryScheduler.
Force-pushed from cad2c36 to e60d5a3.
- Pass in the CBO and make it invokable at [Legacy/]SqlQueryScheduler during runtime.
- Rename the get() method of PlanOptimizers to getPlanningTimeOptimizers() to distinguish it from getRuntimeOptimizers().
- Adjust the IterativeOptimizer optimize() function to return the original plan when the plan is not changed, instead of always calling memo.extract() to rebuild a new one.
- Create a join swapping rule (RuntimeReorderJoinSides) based on the probe and build side statistics, and adjust the local exchange when necessary (add at the build side, remove at the probe side).
- Rebuild the section, re-generate the stageExecutionAndSchedulers of the section, and adjust the overall subplan when the join is swapped, to reflect the correct statistics in the web UI and QueryCompletionEvent.
Background: Presto Unlimited materializes exchange outputs into temporary tables (see #12387), which creates an opportunity to invoke the CBO during runtime on later stages, based on temporary-table statistics generated by previous stages. This results in more reliable optimizations for complex queries, whose later stages often have less accurate estimated statistics.
Specifically, this PR achieves the following goals.
Basic testing:
== NO RELEASE NOTE ==