Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Hadoop KMS #997

Merged
merged 9 commits into from Jul 6, 2019

Conversation

5 participants
@findepi
Copy link
Member

commented Jun 14, 2019

Based on #1002

@cla-bot cla-bot bot added the cla-signed label Jun 14, 2019

Show resolved Hide resolved pom.xml Outdated

@findepi findepi force-pushed the findepi:findepi/hive-compat branch from 4657c57 to b428fd5 Jun 14, 2019

@findepi findepi force-pushed the findepi:findepi/hive-compat branch 2 times, most recently from 0195cc8 to 1392bd6 Jun 15, 2019

@findepi

This comment has been minimized.

Copy link
Member Author

commented Jun 15, 2019

Ready for review.

@findepi findepi force-pushed the findepi:findepi/hive-compat branch 2 times, most recently from 4545f86 to c8ee4bd Jun 15, 2019

@findepi findepi requested a review from electrum Jul 1, 2019

@findepi findepi force-pushed the findepi:findepi/hive-compat branch from c8ee4bd to 4773424 Jul 1, 2019

@electrum

This comment has been minimized.

Copy link
Member

commented Jul 4, 2019

There are other open calls in HdfsParquetDataSource and BackgroundHiveSplitLoader.

Also, what about all the other calls for getFileSystem()? Maybe we need to replace them with a withFileSystem() call (like doAs but provides the FS)?

@electrum

This comment has been minimized.

Copy link
Member

commented Jul 4, 2019

This commit title sounds strange:

Update Hive-plugin configuration for KMS key provider cache TTL

How about

Allow configuring Hadoop KMS key provider cache TTL

<configuration>
<property>
<name>dfs.client.key.provider.cache.expiry</name>
<value>0</value><!-- disables cache -->

This comment has been minimized.

Copy link
@electrum

electrum Jul 4, 2019

Member

Why is this needed for the test? What happens in normal environments that have a cache?

This comment has been minimized.

Copy link
@findepi

findepi Jul 5, 2019

Author Member

Caches can have two problems:

  • it can contain a stale entry, which would prevent us from reaching KMS server (this is addressed by forcing cache expiration)
  • it can contain a valid entry which current thread would not be otherwise able to calculate (e.g. you can read from Parquet only after you read from TEXTFILE table) -- this is being addressed here

I will add comment in the code.

Show resolved Hide resolved ...c-catalogs/singlenode-kerberos-kms-hdfs-no-impersonation/hive.properties Outdated
Show resolved Hide resolved ...c-catalogs/singlenode-kerberos-kms-hdfs-no-impersonation/hive.properties Outdated
Show resolved Hide resolved ...-tests/src/main/java/io/prestosql/tests/hive/TestHiveStorageFormats.java Outdated
Show resolved Hide resolved ...-tests/src/main/java/io/prestosql/tests/hive/TestHiveStorageFormats.java Outdated
Show resolved Hide resolved ...src/main/java/io/prestosql/plugin/hive/HdfsConfigurationInitializer.java Outdated
Show resolved Hide resolved ...src/main/java/io/prestosql/plugin/hive/HdfsConfigurationInitializer.java Outdated
Show resolved Hide resolved presto-hive/src/main/java/io/prestosql/plugin/hive/HiveConfig.java Outdated
Show resolved Hide resolved presto-hive/src/main/java/io/prestosql/plugin/hive/HiveConfig.java Outdated
.orElse(false);
if (isAvroTableWithSchemaSet) {
List<FieldSchema> schema = delegate.getFields(databaseName, tableName).get();
fromMetastoreApiPartition = partition -> fromMetastoreApiPartition(partition, schema);

This comment has been minimized.

Copy link
@electrum

electrum Jul 4, 2019

Member

The static import here makes this confusing, since the name is the same as the variable. Let's remove the static import.

This comment has been minimized.

Copy link
@findepi

findepi Jul 5, 2019

Author Member

Yeah, this looks ambiguous on Github. Fortunately, in IDE this gets colored clearly.

@findepi

This comment has been minimized.

Copy link
Member Author

commented Jul 5, 2019

There are other open calls in HdfsParquetDataSource

The code is unused, I am removing it.

and BackgroundHiveSplitLoader.

This is code for symlinks. Do you know how can I test this?

@findepi findepi force-pushed the findepi:findepi/hive-compat branch from 4773424 to 89e0118 Jul 5, 2019

@findepi

This comment has been minimized.

Copy link
Member Author

commented Jul 5, 2019

AC. Hopefully everything still works 😉

@findepi findepi force-pushed the findepi:findepi/hive-compat branch from 89e0118 to e3d2243 Jul 5, 2019

@findepi findepi force-pushed the findepi:findepi/hive-compat branch from e3d2243 to b492632 Jul 5, 2019

@findepi findepi merged commit 16e7781 into prestosql:master Jul 6, 2019

2 checks passed

Travis CI - Pull Request Build Passed
Details
verification/cla-signed
Details

@findepi findepi deleted the findepi:findepi/hive-compat branch Jul 6, 2019

@findepi findepi added this to the 316 milestone Jul 6, 2019

@findepi findepi referenced this pull request Jul 6, 2019

Closed

Release notes for 316 #1000

5 of 6 tasks complete
@sajjoseph

This comment has been minimized.

Copy link

commented Jul 7, 2019

I was waiting for these changes to be merged. Hive tables with avro schema file hosted within a kerberized hdfs location was not working before and I can confirm that this PR is fixing that issue.

Thanks @findepi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.