-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retrieve the partition storage location from the metastore #11864
Retrieve the partition storage location from the metastore #11864
Conversation
...trino-product-tests/src/main/java/io/trino/tests/product/hive/TestSyncPartitionMetadata.java
Show resolved
Hide resolved
Does this PR resolve the same issue of #11719? |
List<String> partitionsInMetastore = metastore.getPartitionsByNames(schemaName, tableName, partitionsNamesInMetastore).values().stream() | ||
.filter(Optional::isPresent).map(Optional::get) | ||
.map(partition -> new Path(partition.getStorage().getLocation()).toUri()) | ||
.map(uri -> tableLocation.toUri().relativize(uri).getPath()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sync_partition_metadata
assumes the partitions names match locations
is it still well-defined if they don't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the question.
The logic here is relatively similar to what is being done in io.trino.plugin.hive.HiveSplitManager
where the partition names are retrieved first and then the corresponding partitions are retrieved afterwards by names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sync_partition_metadata
relies on certain naming convention.
Split manager shouldn't and doesn't rely on that convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What changes here? Were the partition names returned as escaped or unescaped? In the new test you added, the locations in HDFS are already escaped.
@nineinchnick I seem to have added a duplicate for #11719 . Sorry for not checking before whether there is already a PR for this issue in progress. :( Thank you @ebyhr for the heads up. |
9209583
to
c234aeb
Compare
the two are significantly different. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#11719 tries to address a different problem, where the locations read from the file system are not escaped, but HMS is returning them escaped and sync_partition_metadata
tries to add them again on the second run and fails with an AlreadyExists
exception.
@@ -64,6 +64,33 @@ public void testAddPartition() | |||
cleanup(tableName); | |||
} | |||
|
|||
@Test(groups = {HIVE_PARTITIONING, SMOKE, TRINO_JDBC}) | |||
@Flaky(issue = ERROR_COMMITTING_WRITE_TO_HIVE_ISSUE, match = ERROR_COMMITTING_WRITE_TO_HIVE_MATCH) | |||
public void testAddPartitionContainingCharactersThatNeedUrlEncoding() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are already tests for adding and dropping partitions, maybe it's enough to add new dirs in the prepare()
method instead of adding more tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. There are already tests for adding/dropping partitions. However, the newly added tests deal with special characters which are getting encoded in HMS (and on HDFS).
List<String> partitionsInMetastore = metastore.getPartitionsByNames(schemaName, tableName, partitionsNamesInMetastore).values().stream() | ||
.filter(Optional::isPresent).map(Optional::get) | ||
.map(partition -> new Path(partition.getStorage().getLocation()).toUri()) | ||
.map(uri -> tableLocation.toUri().relativize(uri).getPath()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What changes here? Were the partition names returned as escaped or unescaped? In the new test you added, the locations in HDFS are already escaped.
Follow the guideline outlined in Hive's source code: > Users should always use the MetaStore API to get the path > name for a partition. > Users should not directly take partition values and turn > it into a path name by themselves, because this internal > logic below may change in the future. See `org.apache.hadoop.hive.common.FileUtils.charToEscape` for details. Using the guideline outlined above comes in handy when dealing with characters that need escaping in the path of the partition either on HMS or on HDFS side.
c234aeb
to
5a03b22
Compare
Follow the guideline outlined in Hive's source code: > Users should always use the MetaStore API to get the path > name for a partition. > Users should not directly take partition values and turn > it into a path name by themselves, because this internal > logic below may change in the future. See `org.apache.hadoop.hive.common.FileUtils.charToEscape` for details. Using the guideline outlined above comes in handy when dealing with characters that need escaping in the path of the partition either on HMS or on HDFS side.
5a50c80
to
05cae88
Compare
Description
Follow the guideline outlined in Hive's source code:
See
org.apache.hadoop.hive.common.FileUtils.charToEscape
for details.Using the guideline outlined above comes in handy when dealing with
characters that need escaping in the path of the partition either
on HMS or on HDFS side.
Fix
Hive connector
Fix the comparison mechanism for the partition names in the
system.sync_partition_metadata
stored procedure.Related issues, pull requests, and links
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: