
Improve query plan when Hive tables have compatible bucket number #11749

Closed
wants to merge 4 commits

Conversation


@haozhun haozhun commented Oct 19, 2018

When two Hive tables have the same bucketing key but different bucket
counts, a remote exchange is not necessary to join them when the bucket
counts are compatible.
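As a minimal sketch of what "compatible" means here (hypothetical class and method names; the actual check lives in Presto's Hive connector), one bucket count must evenly divide the other, and per the review discussion the multiplier is further restricted to a power of two:

```java
// Hypothetical sketch of the compatibility rule described above; the real
// implementation in the Hive connector may differ in detail.
public final class BucketCompatibility
{
    private BucketCompatibility() {}

    public static boolean bucketCountsCompatible(int leftBucketCount, int rightBucketCount)
    {
        int smaller = Math.min(leftBucketCount, rightBucketCount);
        int larger = Math.max(leftBucketCount, rightBucketCount);
        if (larger % smaller != 0) {
            // must be evenly divisible
            return false;
        }
        // a power of two has exactly one bit set
        return Integer.bitCount(larger / smaller) == 1;
    }
}
```

For example, 32 and 64 buckets are compatible, while 32 and 48 are not (48 is not a multiple of 32).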

@wenleix wenleix left a comment

Some initial quick comments. I would like to understand the design together with the evolving bucket count feature (done in #10312).

// must be evenly divisible
return Optional.empty();
}
if (Integer.bitCount(largerBucketCount / smallerBucketCount) != 1) {
Contributor

This doesn't seem necessary.

Contributor Author

I agree that this isn't necessary. But it is a reasonable restriction, and the existing bucket count evolution has a restriction as well.

}

OptionalInt maxCompatibleBucketCount = min(leftHandle.getMaxCompatibleBucketCount(), rightHandle.getMaxCompatibleBucketCount());
if (maxCompatibleBucketCount.isPresent() && maxCompatibleBucketCount.getAsInt() < smallerBucketCount) {
Contributor

So, when maxCompatibleBucketCount is not provided, there is no max compatible bucket count...

Contributor Author

This is true. I suppose you are concerned about this.

//TODO! commented out to run travis. uncomment before merge
//return Optional.empty();
}
int largerBucketCount = Math.max(leftHandle.getBucketCount(), rightHandle.getBucketCount());
Contributor

Can we refactor to use the existing HiveSplitManager.isBucketCountCompatible ?

Contributor Author

From the abstraction perspective: even though both functions take two integers, the semantics are completely different. One takes a table bucket count and a partition bucket count; the other takes two table bucket counts (and order doesn't matter).

From the practical perspective: while the code looks similar, the logic is sufficiently different. Code reuse is still possible, but it wouldn't be particularly straightforward.

Overall, I think the two functions should be kept separate.

@@ -125,6 +125,11 @@ public SchemaTableName getSchemaTableName()
@Override
public String toString()
{
return schemaTableName.toString();
StringBuilder result = new StringBuilder();
Contributor

nit: Can MoreObjects.toStringHelper be used here?

Contributor Author

toString of ConnectorTableLayoutHandle is used for EXPLAIN. As a result, using toStringHelper would lead to too much boilerplate.

{
return nodePartitioning.isPresent() && nodePartitioning.get().equals(partitioning) && this.nullsAndAnyReplicated == nullsAndAnyReplicated;
return nodePartitioning.isPresent() && nodePartitioning.get().isCompatibleWith(partitioning, metadata, session) && this.nullsAndAnyReplicated == nullsAndAnyReplicated;
Contributor

Shall we consider renaming the variable (nodePartitioning), since it's referring to table partitioning here?

Contributor Author

I believe this would be a reasonable rename to make. I don't think it fits in this PR though.

@wenleix wenleix left a comment

Commit "Improve query plan when Hive tables have compatible bucket number" generally looks good to me. Some comments left.

@kokosing and @martint, would you mind also making a pass over

  • the first commit,
  • the changes to PlanFragmenter.java?

Thanks!

throw new PrestoException(
NOT_SUPPORTED,
"The bucket filter cannot be satisfied. There are restrictions on the bucket filter when all of the following are true: " +
"1. a table has a different bucket count than at least one of its partitions that is read in this query; " +
Contributor

This happens when tableBucket > max(readBucket, partitionBucket) and $bucket is used. So I can see why tableBucket != partitionBucket is a necessary condition.

However, this combination of conditions looks very complicated for users to understand and to reason about when they can use $bucket and when they cannot.

Another thought would be to simply disable compatible bucketing when $bucket is used. That seems easier for reasoning about the engine behavior.

for (int bucketNumber = 0; bucketNumber < Math.max(readBucketCount, partitionBucketCount); bucketNumber++) {
int partitionBucketNumber = bucketNumber % partitionBucketCount; // physical
int tableBucketNumber = bucketNumber % tableBucketCount; // logical
if (bucketSplitInfo.isBucketEnabled(tableBucketNumber)) {
Contributor

Depending on the relationship between readBucketCount and partitionBucketCount, the loop subject can be readBucketNumber or partitionBucketNumber.

One thought is to fix the loop subject to be readBucketNumber; it may then create one or multiple splits.

Contributor Author

There is a desire here to make sure that the splits are produced in an order such that they round-robin (one split at a time for each "read bucket id").
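A hypothetical sketch of the split enumeration order under discussion (illustrative names, not the real Presto classes): iterating bucket numbers up to the larger of the two counts naturally emits one split at a time per read bucket id, in round-robin order.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; real split enumeration lives in the Hive connector.
public final class BucketSplitOrder
{
    private BucketSplitOrder() {}

    // Returns (readBucketNumber, partitionBucketNumber) pairs in emission order.
    public static List<int[]> enumerate(int readBucketCount, int partitionBucketCount)
    {
        List<int[]> splits = new ArrayList<>();
        for (int bucketNumber = 0; bucketNumber < Math.max(readBucketCount, partitionBucketCount); bucketNumber++) {
            // logical bucket id seen by the engine, and physical bucket id on disk
            splits.add(new int[] {bucketNumber % readBucketCount, bucketNumber % partitionBucketCount});
        }
        return splits;
    }
}
```

With readBucketCount = 2 and partitionBucketCount = 4, the read bucket ids come out as 0, 1, 0, 1, i.e. one split at a time per read bucket.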


PartitioningHandleReassigner partitioningHandleReassigner = new PartitioningHandleReassigner(fragment.getPartitioning(), metadata, session);
PlanNode newRoot;
if (fragment.getPartitioning().isSingleNode()) {
Contributor

Should the condition here and the condition on line 153 be the same?

Contributor Author

I don't think so.

@wenleix wenleix requested a review from kokosing October 25, 2018 21:19
@wenleix wenleix assigned haozhun and unassigned wenleix Oct 25, 2018
@kokosing kokosing left a comment

Just skimmed so far.

@@ -314,13 +314,13 @@ private void invokeNoMoreSplitsIfNecessary()
if (partition.getPartition().isPresent()) {
Contributor

What does it mean for bucket numbers to be compatible?

@@ -98,6 +98,20 @@ default boolean schemaExists(ConnectorSession session, String schemaName)

ConnectorTableLayout getTableLayout(ConnectorSession session, ConnectorTableLayoutHandle handle);

default ConnectorTableLayoutHandle getAlternativeLayoutHandle(ConnectorSession session, ConnectorTableLayoutHandle tableLayoutHandle, ConnectorPartitioningHandle partitioningHandle)
Contributor

This is surely missing a comment.

Contributor

Maybe getPartitionedTableLayoutHandle? Maybe it could return Optional; then I suspect the method below would not be necessary.

Contributor Author

I added a comment.

I don't think returning an Optional would be helpful in removing the other method.

return reassignPartitioningHandleIfNecessaryHelper(session, metadata, subPlan, subPlan.getFragment().getPartitioning());
}

private SubPlan reassignPartitioningHandleIfNecessaryHelper(Session session, Metadata metadata, SubPlan subPlan, PartitioningHandle newOutputPartitioningHandle)
Contributor

Remove the Helper part from the method name.

Contributor Author

I prefer keeping the suffix, as it is a helper method that is necessary in order to do recursion.

@wenleix wenleix left a comment

Commit "Add assertion about plan for tests involving grouped execution" looks good except for a minor comment.

log.info("FINISHED in presto: %s", nanosSince(start));

if (planAssertion.isPresent()) {
Contributor

We can also write this as

        planAssertion.ifPresent(assertion -> assertion.accept(queryPlan));

Ditto for line 150.

I don't have a strong opinion here, though, given we have to have the if-statement on lines 64-72 anyway.

Contributor Author

I'll leave this as is.

@wenleix wenleix left a comment

"Rename and add comments to methods in BucketSplitInfo" looks good.

* A bucket predicate can be present in two cases:
* <ul>
* <li>Filter on "$bucket" column. e.g. {@code "$bucket" between 0 and 100}
* <li>Single-value equality filter on all bucket columns. e.g. for a table with two bucketing columns,
Contributor

This is good to learn! How is this done? (bucketFilter is created by seeking "$bucket" in effectivePredicate; does TupleDomain try to interpret bucketCol = xxx as $bucket = yyy?)

Contributor Author

See HiveBucketing.getHiveBucket.
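For illustration only, here is a simplified sketch of turning a single-value equality filter on the bucket columns into a bucket number. The class name and the hash function below are stand-ins, not Hive's actual per-type hashing; see HiveBucketing.getHiveBucket for the real logic. The key step is masking off the sign bit before taking the modulo, so the result is always a valid non-negative bucket number.

```java
// Stand-in sketch; Hive computes the hash per column type, which this does not replicate.
public final class BucketFromPredicate
{
    private BucketFromPredicate() {}

    public static int bucketForValues(Object[] bucketColumnValues, int bucketCount)
    {
        int hash = 0;
        for (Object value : bucketColumnValues) {
            hash = hash * 31 + (value == null ? 0 : value.hashCode());
        }
        // clear the sign bit so the modulo result is non-negative
        return (hash & Integer.MAX_VALUE) % bucketCount;
    }
}
```

Note that because the bucket number is the hash modulo the bucket count, a row's bucket in a 64-bucket table determines its bucket in a compatible 32-bucket table (it is the same value modulo 32), which is what makes the compatible-bucket optimization sound.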

@wenleix wenleix left a comment

Commit "Add plan partitioning sanity check for TableWriter in PlanFragmenter" looks good.

@wenleix wenleix left a comment

LGTM.

"The bucket filter cannot be satisfied. There are restrictions on the bucket filter when all of the following are true: " +
"1. a table has a different bucket count than at least one of its partitions that is read in this query; " +
"2. the table has a different but compatible bucket count with another table in the query; " +
"3. some buckets of the table are filtered out from the query, most likely using a filter on \"$bucket\". " +
Contributor

According to the comment in 9d3cd72#diff-20dee960e1c124aae6c511152416a860R547, filtering just on the bucket columns can have this issue, right?

Contributor Author

Yes, that is correct.

When two Hive tables have the same bucketing key but different bucket
counts, the tables are considered to have compatible bucketing.
A remote exchange is not necessary to join them when the bucket counts
are compatible.

Bucket counts are considered compatible if one is a multiple of the
other. In the current implementation, the multiplier is required to be
a power of two. This power-of-two constraint is not strictly necessary
and can be removed.

This change also applies when one is reading from a bucketed table and
writing into another. It also applies when GROUP BY is applied to the
UNION ALL of two or more tables.
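To illustrate why no remote exchange is needed (hypothetical helper and class names, not Presto code): with compatible counts, each bucket of the larger table maps to exactly one bucket of the smaller table via modulo, so matching rows are already co-located.

```java
// Illustrative sketch of the bucket mapping implied by compatible bucket counts.
public final class CompatibleBucketMapping
{
    private CompatibleBucketMapping() {}

    // Bucket of a row in the smaller table, given its bucket in the larger table.
    public static int toSmallerBucket(int largerBucketNumber, int smallerBucketCount)
    {
        return largerBucketNumber % smallerBucketCount;
    }

    // All buckets of the larger table whose rows co-locate with this smaller-table bucket.
    public static int[] largerBucketsFor(int smallerBucketNumber, int smallerBucketCount, int largerBucketCount)
    {
        int multiplier = largerBucketCount / smallerBucketCount;
        int[] buckets = new int[multiplier];
        for (int i = 0; i < multiplier; i++) {
            buckets[i] = smallerBucketNumber + i * smallerBucketCount;
        }
        return buckets;
    }
}
```

For example, with a 32-bucket and a 64-bucket table, bucket 5 of the smaller table joins only against buckets 5 and 37 of the larger one.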
" test_mismatch_bucketingN\n" +
"ON key16=keyN";

assertUpdate(withoutMismatchOptimization, writeToTableWithMoreBuckets, 15000, assertRemoteExchangesCount(4));
Contributor

What about adding a test case for writeToTableWithNoBuckets? :)

@@ -315,34 +361,44 @@ public FragmentProperties setSingleNodeDistribution()
return this;
}

public FragmentProperties setDistribution(PartitioningHandle distribution)
public FragmentProperties setDistribution(PartitioningHandle distribution, Metadata metadata, Session session)
Contributor

What about calling it coalesceDistribution? Or mention in the comment that it will now coalesce the fragment distribution to a common one.

Contributor Author

haozhun commented Nov 10, 2018

Merged #11749. Sorry that I didn't mark the PR earlier.

@haozhun haozhun closed this Nov 10, 2018
@@ -375,28 +431,39 @@ public FragmentProperties setCoordinatorOnlyDistribution()
return this;
}

public FragmentProperties addSourceDistribution(PlanNodeId source, PartitioningHandle distribution)
public FragmentProperties addSourceDistribution(PlanNodeId source, PartitioningHandle distribution, Metadata metadata, Session session)
Contributor

Now two methods will coalesce the fragment's distribution:

  • setDistribution
  • addSourceDistribution

Maybe consider having some common method to do the coalescing. This might help with reasoning about the code (e.g., when does fragment coalescing happen)?
