-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assume AWS Role in PrestoS3FileSystem #2640
Conversation
@electrum can you please take a look? |
@electrum would you please take a look? |
@electrum rebased with the current master |
@electrum comments addressed |
This will cause the creation of many instances of Do we need our own cache (based on role?) to share clients between FS instances? |
@electrum yes, exactly, did see many instances of AmazonS3Client, and a large number of timeout or connection reset. We did cache of AmazonS3Client, based on the provider name, suffixed by role name. Was planning to submit as a separate PR. Now appended here. Your comments are appreciated. |
d1de28e
to
c43c39c
Compare
@electrum kindly ping |
This is a very useful feature for us too, it would be great this can be merged to trunk, would you please take a look @electrum? Thank you, guys. |
if (useInstanceCredentials) { | ||
return new AmazonS3Client(new InstanceProfileCredentialsProvider(), clientConfig, METRIC_COLLECTOR); | ||
if (useInstanceCredentials) { | ||
credentialsProvider = new InstanceProfileCredentialsProvider(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
credentialsProvider here overrides the assignment on line 576, so forth it breaks the test com.facebook.presto.hive.TestPrestoS3FileSystem.testStaticCredentials
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dayzzz thank you, it is a bug related to defaults config, get it fixed
Does AWS support Kerberos SSO? If so, a better approach will be to use the Kerberos authentication that @martint just checked in. Currently this is not passed down to connectors for authorization, but that this being worked on next. Generally, I don't think we should be passing credentials as generic session properties and instead attach the |
@dain @martint @electrum |
I think this does not conflict with Kerberos, it is just a AWS S3 feature support. What do you think? |
kindly ping |
I'm still a bit lost on how this works. This seems to be similar to Hive's storage-based security where instead of checking the Metastore for grants, you interrogate the storage (directory) permissions. In this case the user "principal" would be the IAM credentials and instead of an explicit authorization check, we would let the S3 client throw exceptions. It that how this is intended to work (at a high level)? |
@dain yes, as you said, instead of checking Metastore, we have all permission settings at S3 storage, and users need to specify their aws iam role to access the specified files. If permission denied, S3 client will throw exceptions for unauthorized access. |
I get it. I'm sure this is something we can support but I have a bunch of follow up questions. Hopefully, the AWS folks will jump in with their opinion on how this should work. How do we assure the IAM string is valid? How do we authenticate it? How do we validate the IAM string is allowed to run queries as the specified Presto username? Presto username is used for queue management and for functions, so it is important to restrict a person to only use one Presto username string. For example, can we verify that the IAM string ends with How should How will views work? A SQL view executes as the person that created the view. Where should we store the IAM string from the view owner? Or do we simply disallow view creation? Does this mean that when we add How do you envision this working for people using a mixed S3 and HDFS installation on AWS? For HDFS we do not support storage-based security. This would be confusing to users and maybe impossible in the code. |
We may want to model this as another authentication scheme, where the AWS IAM credentials are captured in a |
Thank you @dain @martint for the comments. Yes, this seems more like a hacking, did not consider the scenario to work with CREATE/INSERT, GRANT, or working with HDFS. |
@dain @martint
The AWS Credential Provider will throw exception if the IAM string is invalid, and will deny access if the role does not have permission to do so. Say, if a user assuming IamReadRole trying to write files(CreateTable), it will fail, and get permission denied exception.
If I understand the question correctly, different users could assume the same IAM role, and the same users could assume different IAM roles in his different sessions. The username is not part of IAM role. It is the role that user is accessing S3.
Create Table and Insert definitely needs to assume IamWriteRole, if not they will fail and get permission denied exceptions.
Our current usage is, if a view is created by a user assuming valid IamWriteRole in the sensitive database, then all users assuming IamReadRole could read from the view.
I am thinking, when Presto has GRANT, maybe could just have this S3 IAM role continue working with the GRANT, say, if the data warehouse is on S3, then GRANT does not guarantee permission to access the warehouse, also need to assume appropriate IAM roles to accessing it.
Yes, this is confusing. Seems like these are two models, the metastore GRANT model, which HDFS follows, and the storage based model, which S3 IAM role follows. I am thinking, maybe make HDFS follow the storage based model, or make S3 follow the metastore GRANT model, are confusing and difficult. Making the S3 IAM role a session property is kind of assuming metastore GRANT model is the ideal model to Presto, but just make it OK to work with S3 IAM role. Your comments and suggestions are appreciated. |
@zhenxiao any plans to finalize this PR? |
@martint @dain @zhenxiao Just want to check with you guys whether it's OK to build this feature with the following steps as another authentication scheme:
|
thanks for the update @nezihyigitbasi |
@nezihyigitbasi is this one still needed? |
Nope, one can get this functionality with a custom credentials provider (#5667), so closing. |
Users could use AWS role to do per-session access.
eg. use AWS role to query some limit-access tables:
./presto --server coordinator:8080 --catalog hive --debug --session hive.aws_iam_role="arn:aws:iam::example:role/access"