Retrieve the partition storage location from the metastore #11864

findinpath · 2022-04-08T05:13:44Z

Description

Follow the guideline outlined in Hive's source code:

Users should always use the MetaStore API to get the path
name for a partition.
Users should not directly take partition values and turn
it into a path name by themselves, because this internal
logic below may change in the future.

See org.apache.hadoop.hive.common.FileUtils.charToEscape for details.

Following this guideline helps when the partition path contains
characters that need escaping, either on the HMS side or on the
HDFS side.
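
To illustrate why hand-built partition paths can drift from what the metastore stores, here is a minimal sketch of percent-escaping a partition value. The class name, the helper methods, and the small character set are hypothetical and chosen only for illustration. Hive's actual list lives in `FileUtils.charToEscape`, may contain different characters, and may change, which is exactly the point of the guideline.

```java
import java.util.Set;

public class PartitionPathEscapingSketch {
    // Hypothetical, deliberately small set of characters to escape.
    // This is NOT Hive's list; see FileUtils.charToEscape for the real one.
    private static final Set<Character> ESCAPED = Set.of(':', '/', '%', '=', '#', '?');

    // Replace each character from the set above with %XX (uppercase hex).
    public static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (ESCAPED.contains(c)) {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Naive path building: what the guideline warns against.
    public static String naivePartitionDir(String column, String value) {
        return column + "=" + value;
    }

    // Path building that applies the (assumed) escaping rules.
    public static String escapedPartitionDir(String column, String value) {
        return escape(column) + "=" + escape(value);
    }

    public static void main(String[] args) {
        System.out.println(naivePartitionDir("ts", "2022-04-08 05:13"));   // ts=2022-04-08 05:13
        System.out.println(escapedPartitionDir("ts", "2022-04-08 05:13")); // ts=2022-04-08 05%3A13
    }
}
```

The two results differ as soon as a value contains an escaped character, so a comparison that mixes the two forms treats the same partition as two different ones.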

Is this change a fix, improvement, new feature, refactoring, or other?

Fix

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Hive connector

How would you describe this change to a non-technical end user or system administrator?

Fix the comparison mechanism for the partition names in the system.sync_partition_metadata stored procedure.

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Hive
* Fix the comparison mechanism for the partition names in the `system.sync_partition_metadata` stored procedure.

...trino-product-tests/src/main/java/io/trino/tests/product/hive/TestSyncPartitionMetadata.java

ebyhr · 2022-04-08T05:25:29Z

Does this PR resolve the same issue as #11719?

findepi · 2022-04-08T07:52:14Z

.../trino-hive/src/main/java/io/trino/plugin/hive/procedure/SyncPartitionMetadataProcedure.java

+            List<String> partitionsInMetastore = metastore.getPartitionsByNames(schemaName, tableName, partitionsNamesInMetastore).values().stream()
+                    .filter(Optional::isPresent).map(Optional::get)
+                    .map(partition -> new Path(partition.getStorage().getLocation()).toUri())
+                    .map(uri -> tableLocation.toUri().relativize(uri).getPath())


sync_partition_metadata assumes the partition names match the locations.
Is it still well-defined if they don't?

I'm not sure I understand the question.
The logic here is similar to what io.trino.plugin.hive.HiveSplitManager does: the partition names are retrieved first, and the corresponding partitions are then retrieved by name.

Yes, sync_partition_metadata relies on a certain naming convention.
Split manager shouldn't and doesn't rely on that convention.

What changes here? Were the partition names returned as escaped or unescaped? In the new test you added, the locations in HDFS are already escaped.
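
As a rough illustration of what the hunk above computes, the following sketch derives a partition's path relative to the table location using only `java.net.URI`. It is not the procedure's code: the class and helper names are hypothetical, the `hdfs://namenode` authority and the paths are made up, and plain `URI` objects stand in for Hadoop's `Path` so the example stays dependency-free.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RelativePartitionPathSketch {
    // Build a URI from an unencoded path. The multi-argument constructor
    // always quotes '%' as "%25", so a literal "%3A" in a directory name
    // is preserved rather than interpreted as an escape.
    public static URI location(String path) {
        try {
            return new URI("hdfs", "namenode", path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Partition location relative to the table location, with URI-level
    // quoting decoded again by getPath().
    public static String relativePath(URI tableLocation, URI partitionLocation) {
        return tableLocation.relativize(partitionLocation).getPath();
    }

    public static void main(String[] args) {
        URI table = location("/warehouse/orders");
        URI partition = location("/warehouse/orders/col_x=a%3Ab/col_y=1");
        System.out.println(relativePath(table, partition)); // col_x=a%3Ab/col_y=1
    }
}
```

The relative path keeps the escaped form stored in the location, so it can be compared directly against directory names that use the same escaped form. If the partition location is not under the table location, `relativize` returns the partition URI unchanged, which is the case the question about naming conventions is concerned with.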

findinpath · 2022-04-08T07:58:35Z

@nineinchnick I seem to have added a duplicate of #11719. Sorry for not checking before whether there is already a PR for this issue in progress. :(

Thank you @ebyhr for the heads up.

findepi · 2022-04-08T08:39:20Z

> @nineinchnick I seem to have added a duplicate of #11719. Sorry for not checking before whether there is already a PR for this issue in progress. :(

the two are significantly different.

nineinchnick

#11719 tries to address a different problem: the locations read from the file system are not escaped, but HMS returns them escaped, so sync_partition_metadata tries to add them again on the second run and fails with an AlreadyExists exception.
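
The failure mode described above can be sketched as a set difference over partition names. This is only an illustration with made-up partition names and a hypothetical helper class, not the procedure's implementation.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PartitionDiffSketch {
    // Elements of 'left' that are missing from 'right', in encounter order.
    public static Set<String> difference(List<String> left, List<String> right) {
        Set<String> result = new LinkedHashSet<>(left);
        result.removeAll(right);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical values: the same two partitions, seen from both sides.
        List<String> onFileSystem = List.of("col=a:b", "col=c");
        List<String> metastoreEscaped = List.of("col=a%3Ab", "col=c");
        List<String> metastoreSameForm = List.of("col=a:b", "col=c");

        // Mismatched forms: an already registered partition looks new.
        System.out.println(difference(onFileSystem, metastoreEscaped));  // [col=a:b]
        // Matching forms: nothing left to add.
        System.out.println(difference(onFileSystem, metastoreSameForm)); // []
    }
}
```

In the first case the sync would attempt to add `col=a:b` again even though the metastore already has it, which matches the repeated add and the resulting exception described in the comment.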

nineinchnick · 2022-04-08T12:45:01Z

...trino-product-tests/src/main/java/io/trino/tests/product/hive/TestSyncPartitionMetadata.java

@@ -64,6 +64,33 @@ public void testAddPartition()
        cleanup(tableName);
    }

+    @Test(groups = {HIVE_PARTITIONING, SMOKE, TRINO_JDBC})
+    @Flaky(issue = ERROR_COMMITTING_WRITE_TO_HIVE_ISSUE, match = ERROR_COMMITTING_WRITE_TO_HIVE_MATCH)
+    public void testAddPartitionContainingCharactersThatNeedUrlEncoding()


There are already tests for adding and dropping partitions, maybe it's enough to add new dirs in the prepare() method instead of adding more tests?

Indeed. There are already tests for adding/dropping partitions. However, the newly added tests deal with special characters which are getting encoded in HMS (and on HDFS).

nineinchnick · 2022-04-08T12:47:46Z

.../trino-hive/src/main/java/io/trino/plugin/hive/procedure/SyncPartitionMetadataProcedure.java

+            List<String> partitionsInMetastore = metastore.getPartitionsByNames(schemaName, tableName, partitionsNamesInMetastore).values().stream()
+                    .filter(Optional::isPresent).map(Optional::get)
+                    .map(partition -> new Path(partition.getStorage().getLocation()).toUri())
+                    .map(uri -> tableLocation.toUri().relativize(uri).getPath())


What changes here? Were the partition names returned as escaped or unescaped? In the new test you added, the locations in HDFS are already escaped.

Follow the guideline outlined in Hive's source code:

> Users should always use the MetaStore API to get the path
> name for a partition.
> Users should not directly take partition values and turn
> it into a path name by themselves, because this internal
> logic below may change in the future.

See `org.apache.hadoop.hive.common.FileUtils.charToEscape` for details.

Using the guideline outlined above comes in handy when dealing with characters that need escaping in the path of the partition either on HMS or on HDFS side.

cla-bot bot added the cla-signed label Apr 8, 2022

github-actions bot added the tests:hive label Apr 8, 2022

findinpath requested review from findepi, ebyhr, losipiuk and dain April 8, 2022 05:21

findinpath force-pushed the hive-sync-partition-metadata branch from 9209583 to c234aeb Compare April 8, 2022 08:18

findinpath force-pushed the hive-sync-partition-metadata branch from c234aeb to 5a03b22 Compare April 11, 2022 19:51

findinpath requested a review from findepi April 11, 2022 20:09

findinpath force-pushed the hive-sync-partition-metadata branch from 5a50c80 to 05cae88 Compare April 13, 2022 04:08

findepi merged commit fbde685 into trinodb:master Apr 13, 2022

github-actions bot added this to the 377 milestone Apr 13, 2022

findepi mentioned this pull request Apr 13, 2022

Release notes for 377 #11863

Closed

mosabua mentioned this pull request Apr 13, 2022

Add Trino 377 release notes #11859

Merged

guyco33 mentioned this pull request May 24, 2022

Hive sync_partition_metadata fails with timeout when table has many partitions #12525

Closed

findepi mentioned this pull request Jun 3, 2022

sync_partition_metadata procedure takes significant long time to run #12685

Open
