Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Hive metastore impersonation #1441

Merged
merged 1 commit into from Sep 23, 2019
Merged

Conversation

@ebyhr
Copy link
Contributor

ebyhr commented Sep 4, 2019

Fix #43

@cla-bot cla-bot bot added the cla-signed label Sep 4, 2019
@ebyhr ebyhr force-pushed the ebyhr:hms-impersonation branch 2 times, most recently from a937203 to 1299349 Sep 4, 2019
@electrum

This comment has been minimized.

Copy link
Member

electrum commented Sep 6, 2019

Would it be sufficient to call set_ugi() after connecting to the metastore? The delegation token support says it is there for map/reduce tasks, where the tasks would run on a different machine (and thus wouldn't have access to the original credentials).

@ebyhr ebyhr force-pushed the ebyhr:hms-impersonation branch from 1299349 to 72a6bdc Sep 6, 2019
@ebyhr

This comment has been minimized.

Copy link
Contributor Author

ebyhr commented Sep 6, 2019

Thank you for the advise. I think it's sufficient too and replaced the delegation token logic with set_ugi().

@ebyhr ebyhr marked this pull request as ready for review Sep 7, 2019
@ebyhr ebyhr force-pushed the ebyhr:hms-impersonation branch 4 times, most recently from 75aa6e9 to d6623de Sep 7, 2019
@ebyhr ebyhr requested a review from electrum Sep 9, 2019
@AnupGS

This comment has been minimized.

Copy link

AnupGS commented Sep 13, 2019

Hi, impersonation support is missed out for get table operation. is there any plan to add support for that as well?

@electrum

This comment has been minimized.

Copy link
Member

electrum commented Sep 16, 2019

@AnupGS What is the use case for impersonation for read calls? We could certainly add those, but I'm not aware of any reason to do so. When using SQL standard authorization, Presto performs all the security checks using the security information provided by the metastore.

@findepi

This comment has been minimized.

Copy link
Member

findepi commented Sep 16, 2019

@electrum this is required because HMS may impose additional checks on get_table call.
It may be, for example, that Presto user is allowed to list tables only, but it's not allowed to read from any of them.
With storage-based auth enabled in HMS, Presto user won't be able to do get_table on any table.

@ebyhr @electrum I will share with you exact impersonation rules we have implemented.

@ebyhr thanks for working on this! And apologies for not opening our impersonation impl earlier, as I promised.
It was dependent on several other changes we made.

Copy link
Member

electrum left a comment

A few minor comments, otherwise looks good. Apologies for the delay in reviewing. I'd like to merge this for the next release.

@findepi or @kokosing do you want to look at the product tests? They seem fine to me, but I'm not an expert on them.

@electrum

This comment has been minimized.

Copy link
Member

electrum commented Sep 16, 2019

@findepi Thanks for explaining. @ebyhr It sounds like we need impersonation for the read calls. If you have time to update this PR to include those, great, otherwise we can merge this as-is and follow up with those later.

@ebyhr

This comment has been minimized.

Copy link
Contributor Author

ebyhr commented Sep 16, 2019

@findepi @electrum Thanks for your reviews and comments. I'm just adding impersonation for get_table and the dependent methods. I'll push it later.

@AnupGS

This comment has been minimized.

Copy link

AnupGS commented Sep 16, 2019

Hi @electrum, @findepi has explained it perfectly. presto user is unable to get table if storage based auth is enabled.

@ebyhr ebyhr force-pushed the ebyhr:hms-impersonation branch from d6623de to c808dee Sep 16, 2019
@findepi findepi self-requested a review Sep 16, 2019
Copy link
Member

findepi left a comment

A couple of comments. Well done!

set -xeuo pipefail

presto-product-tests/bin/run_on_docker.sh \
singlenode-hive-impersonation \

This comment has been minimized.

Copy link
@findepi

findepi Sep 16, 2019

Member

According to

this.impersonationEnabled = thriftConfig.isImpersonationEnabled() && hiveMetastoreAuthentication instanceof KerberosHiveMetastoreAuthentication;

this.impersonationEnabled = thriftConfig.isImpersonationEnabled() && hiveMetastoreAuthentication instanceof KerberosHiveMetastoreAuthentication;

the impersonation is supported only when kerberos is enabled
i think with setUGI, we should support impersonation with and without kerberos

-- but this also signals we don't have a direct test for an impersonation, something that would fail when impersonation does not work

This comment has been minimized.

Copy link
@findepi

findepi Sep 17, 2019

Member

I think this should be visible if you add authorization here. (It should be included here anyway, it contains TestRoles, this is important to run here)

@findepi

This comment has been minimized.

Copy link
Member

findepi commented Sep 16, 2019

When applying changes, please separate them from rebase (or delay rebase), so that the changes themselves can be viewed on GH ui.

Copy link
Member

electrum left a comment

Wow, adding impersonation for get calls was more complicated than I thought. Thanks for doing that.

set -xeuo pipefail

presto-product-tests/bin/run_on_docker.sh \
singlenode-hive-impersonation \

This comment has been minimized.

Copy link
@findepi

findepi Sep 17, 2019

Member

I think this should be visible if you add authorization here. (It should be included here anyway, it contains TestRoles, this is important to run here)


presto-product-tests/bin/run_on_docker.sh \
singlenode-kerberos-hive-impersonation \
-g storage_formats,cli,hdfs_impersonation,authorization,hive_file_header

This comment has been minimized.

Copy link
@findepi

findepi Sep 17, 2019

Member

Any particular reason to include cli and hive_file_header tests?
(@kokosing do you know why do we run cli tests so often?)

This comment has been minimized.

Copy link
@ebyhr

ebyhr Sep 17, 2019

Author Contributor

No particular reason to include those tests. I followed existing hdfs-impersonation tests.

@ebyhr ebyhr force-pushed the ebyhr:hms-impersonation branch from c808dee to 525e514 Sep 17, 2019
@ebyhr

This comment has been minimized.

Copy link
Contributor Author

ebyhr commented Sep 17, 2019

Let me push updated code once. Below comments are not yet applied.
#1441 (comment) & #1441 (comment)

@ebyhr

This comment has been minimized.

Copy link
Contributor Author

ebyhr commented Sep 19, 2019

Pushed so that we can share the current status.

@@ -86,6 +87,7 @@
implements HiveMetastore
{
protected final HiveMetastore delegate;
private final boolean impersonationEnabled;

This comment has been minimized.

Copy link
@electrum

electrum Sep 21, 2019

Member

We can remove this field if we add isImpersonationEnabled() to HiveMetastore. Then updateIdentity() can simply call delegate.isImpersonationEnabled().

This comment has been minimized.

Copy link
@ebyhr

ebyhr Sep 21, 2019

Author Contributor

Should we always return false in FileHiveMetastore and throw an exception in GlueHiveMetastore? At least, we can't inject ThriftHiveMetastoreConfig into FileHiveMetastore. We can do it in GlueHiveMetastore.

This comment has been minimized.

Copy link
@findepi

findepi Sep 22, 2019

Member

Yes, always false. we don't support impersonation there

@electrum

This comment has been minimized.

Copy link
Member

electrum commented Sep 21, 2019

I had a few comments, mostly around the usage of ThriftHiveMetastoreConfig, which is actually the cause of the Travis test failures. This should be easy to fix. After that, this PR should be ready to go. Thanks for all your hard work on this. It's much more involved than I imagined.

@ebyhr ebyhr force-pushed the ebyhr:hms-impersonation branch 2 times, most recently from 9736171 to ed32e58 Sep 22, 2019
Copy link
Member

findepi left a comment

Minor comments.
Thanks @ebyhr


private HiveIdentity()
{
this(new ConnectorIdentity("dummy_identity", Optional.empty(), Optional.empty()));

This comment has been minimized.

Copy link
@findepi

findepi Sep 22, 2019

Member

Simply:

Suggested change
this(new ConnectorIdentity("dummy_identity", Optional.empty(), Optional.empty()));
this.username = "dummy_identity";

This comment has been minimized.

Copy link
@findepi

findepi Sep 22, 2019

Member

Still, I'd prefer the username be Optional<String> in this class.
There is only one place where getUsername() is called, so it's not a big deal and improves clarify.

private HiveIdentity updateIdentity(HiveIdentity identity)
{
// remove identity if not doing impersonation
return delegate.isImpersonationEnabled() ? identity : HiveIdentity.none();

This comment has been minimized.

Copy link
@findepi

findepi Sep 22, 2019

Member

HiveIdentity.none() is going to be used as a cache key many times at the same time, so using a singleton would be nice.

@electrum

This comment has been minimized.

Copy link
Member

electrum commented Sep 22, 2019

One more thing: we can remove AVRO_SCHEMA_URL from TestGroups in product-tests

@ebyhr ebyhr force-pushed the ebyhr:hms-impersonation branch from ed32e58 to 2f64cbf Sep 23, 2019
@electrum electrum merged commit b5a6a8f into prestosql:master Sep 23, 2019
2 checks passed
2 checks passed
Travis CI - Pull Request Build Passed
Details
verification/cla-signed
Details
@martint martint added this to the 320 milestone Oct 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
5 participants
You can’t perform that action at this time.