New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set optimizer.force-single-node-output to false by default #13217
Set optimizer.force-single-node-output to false by default #13217
Conversation
b1d550f
to
87056bb
Compare
Stage execution could remain in the same SCHEDULING state while already running some tasks and not transition the state further until more tasks are scheduled.
With this setting enabled the optimizer will insert an additional exchange to make sure the coordinator always consumes query results from a single task. The exchange client used by the coordinator can consume results from as many tasks and stages as needed. The setting was introduced in 2017 by trinodb@6143485 as a transitional setting to migrate from introducing an additional exchange but has never been set to false by default.
87056bb
to
32d1f29
Compare
Will it cause coordinator to perform actual computations (e.g. final aggregations?). If so, then IMO we should keep it as is. |
I agree. @arhimondr can you verify that after the change, no computations (aggregation, non-identity projection, filter, etc.) get scheduled on the coordinator? |
@sopel39 @findepi When the
However coordinator can consume query results from any number of nodes with no issues. Here's how the plan looks like for an identical query when
For queries with the top most stage of
Here is the plan when the
|
The |
@@ -1042,6 +1040,9 @@ public void schedule() | |||
ImmutableMultimap.of()); | |||
stageExecution.schedulingComplete(); | |||
remoteTask.ifPresent(task -> coordinatorTaskManager.addSourceTaskFailureListener(task.getTaskId(), failureReporter)); | |||
if (queryStateMachine.getQueryState() == STARTING && remoteTask.isPresent()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why just STARTING
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order of states is STARTING
-> RUNNING
-> FINSHING -> FINISHED | FAILED
. If it's already in a later stage there's no need to transition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically you do not need if (queryStateMachine.getQueryState() == STARTING)
at all as queryStateMachine.transitionToRunning
does the check (it does queryState.setIf(RUNNING, currentState -> currentState.ordinal() < RUNNING.ordinal());
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
second commit ok. I need some explanation on the first one.
@@ -1577,6 +1575,10 @@ public void schedule() | |||
ScheduleResult result = stageSchedulers.get(stageExecution.getStageId()) | |||
.schedule(); | |||
|
|||
if (stateMachine.getState() == DistributedStagesSchedulerState.PLANNED && stageExecution.getAllTasks().size() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here: if (stateMachine.getState() == DistributedStagesSchedulerState.PLANNED
is not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to avoid calling stageExecution.getAllTasks()
every time as it has to acquire a list of tasks under a lock and made the similar code in the coordinator scheduler consistent with this. Likely it won't be a problem, but at the same time I wonder if it introduces enough confusion to justify extra CPU cycles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Keep it then
Description
With this setting enabled the optimizer will insert an additional exchange
to make sure the coordinator always consumes query results from a single
task. The exchange client used by the coordinator can consume results from
as many tasks and stages as needed. The setting was introduced in 2017
by 6143485
as a transitional setting to migrate from introducing an additional
exchange but has never been set to false by default.
Improvement
Core engine
Non user visiblle
Related issues, pull requests, and links
6143485
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: