Support grouped execution for eligible table scans #12934

Merged
shixuan-fan merged 1 commit into prestodb:master from table_scan
Jun 17, 2019

@shixuan-fan shixuan-fan commented Jun 12, 2019

This is controlled by session property grouped_execution_for_all_eligible_table_scans.

Related to #12124

@rschlussel

So is this to say that we should always do grouped execution for eligible tablescans even if the nodes above it don't care about grouped execution? Can you give more context about why/when we want this behavior?

wenleix commented Jun 12, 2019

@rschlussel: This allows partial recovery for ScanFilterProject-only queries that read from a bucketed table.

Also, reading from a bucketed table with ungrouped execution is somewhat hacky today. For example, it requires (see https://github.com/prestodb/presto/wiki/HiveSplitSource-and-Grouped-Execution#old-days):

  • the underlying connector to return splits in a round-robin fashion
  • a special BucketedSplitPlacementPolicy to be used, which is still aware of the bucketed table.

So popularizing grouped execution seems to be the right direction in the long term, i.e. the stage scheduler knows that it is scheduling splits in a bucket-aware way.
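
To make the partial-recovery argument concrete, here is a minimal sketch of the idea, not Presto code. `GroupedSchedulingSketch`, `Split`, `groupByBucket`, and `splitsToRetry` are hypothetical names invented for illustration. The assumption is that once the scheduler itself groups splits by bucket, a failure in one bucket only requires rescheduling that bucket's splits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model. None of these types are Presto classes.
public class GroupedSchedulingSketch {
    // A split tagged with the bucket it belongs to.
    public static final class Split {
        public final int bucket;
        public final String path;

        public Split(int bucket, String path) {
            this.bucket = bucket;
            this.path = path;
        }
    }

    // Grouped execution: the scheduler groups splits by bucket itself,
    // instead of relying on the connector's round-robin split order.
    public static Map<Integer, List<String>> groupByBucket(List<Split> splits) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (Split split : splits) {
            groups.computeIfAbsent(split.bucket, b -> new ArrayList<>()).add(split.path);
        }
        return groups;
    }

    // Partial recovery: only the failed bucket's splits are rescheduled.
    public static List<String> splitsToRetry(Map<Integer, List<String>> groups, int failedBucket) {
        return groups.getOrDefault(failedBucket, Collections.emptyList());
    }

    public static void main(String[] args) {
        List<Split> splits = Arrays.asList(
                new Split(0, "part-0-a"),
                new Split(1, "part-1-a"),
                new Split(0, "part-0-b"));
        Map<Integer, List<String>> groups = groupByBucket(splits);
        System.out.println("retry for bucket 0: " + splitsToRetry(groups, 0));
    }
}
```

With ungrouped execution the scheduler has no such per-bucket view, which is why a failure there cannot be narrowed down to a single bucket in this model.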


@wenleix wenleix left a comment

LGTM. @rschlussel do you mind also taking a look?

Also, maybe explain the motivation (recoverable grouped execution) in the commit message.

  if (ImmutableList.of(NOT_PARTITIONED).equals(partitionHandles)) {
-     return new GroupedExecutionProperties(false, false, ImmutableList.of(), 1, recoveryEligible);
+     return new GroupedExecutionProperties(false, useful, ImmutableList.of(), 1, recoveryEligible);

I feel GroupedExecutionProperties#useful should be renamed, but that doesn't have to happen in this PR.

@wenleix wenleix assigned shixuan-fan and unassigned wenleix Jun 12, 2019

@rschlussel rschlussel left a comment

Please add more context to the commit message.

@shixuan-fan shixuan-fan force-pushed the table_scan branch 2 times, most recently from 325d20a to 68333b1 Compare June 13, 2019 20:14
@shixuan-fan shixuan-fan changed the title Support grouped execution for all eligible table scans Support grouped execution for eligible table scans Jun 13, 2019
This is controlled by session property grouped_execution_for_eligible_table_scans.
This could help enable recoverable grouped execution for ScanFilterProject-only
queries that read from partitioned table (for example, bucketed table in Hive).
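
As a rough illustration of the commit message above, the sketch below models how a session flag could feed the `useful` argument seen in the reviewed diff. It is not the Presto implementation: `TableScanGroupingSketch`, its `Properties` class, the field names, and the rule "grouped execution is used only when the scan is capable and marked useful" are all assumptions made for this example.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins. Field names and logic are guesses, not Presto's definitions.
public class TableScanGroupingSketch {
    public static final class Properties {
        public final boolean capable;
        public final boolean useful;
        public final List<String> capableScans;
        public final int totalLifespans;
        public final boolean recoveryEligible;

        public Properties(boolean capable, boolean useful, List<String> capableScans,
                int totalLifespans, boolean recoveryEligible) {
            this.capable = capable;
            this.useful = useful;
            this.capableScans = capableScans;
            this.totalLifespans = totalLifespans;
            this.recoveryEligible = recoveryEligible;
        }

        // Assumed rule: run grouped only if the scan can and it is considered useful.
        public boolean shouldUseGroupedExecution() {
            return capable && useful;
        }
    }

    // groupedForEligibleScans models the value of the session property
    // grouped_execution_for_eligible_table_scans.
    public static Properties forTableScan(String scanId, int bucketCount, boolean partitioned,
            boolean groupedForEligibleScans, boolean recoveryEligible) {
        boolean useful = groupedForEligibleScans;
        if (!partitioned) {
            // Mirrors the shape of the reviewed line: not capable, single lifespan.
            return new Properties(false, useful, Collections.emptyList(), 1, recoveryEligible);
        }
        return new Properties(true, useful, Collections.singletonList(scanId), bucketCount, recoveryEligible);
    }

    public static void main(String[] args) {
        Properties enabled = forTableScan("scan-1", 8, true, true, true);
        System.out.println("grouped: " + enabled.shouldUseGroupedExecution());
    }
}
```

In this model, a ScanFilterProject-only query over a bucketed table is grouped only when the flag is on, since no node above the scan would otherwise mark grouped execution as useful.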
@shixuan-fan shixuan-fan merged commit 40e4dd7 into prestodb:master Jun 17, 2019
@shixuan-fan shixuan-fan deleted the table_scan branch June 17, 2019 19:49