
Cannot create Hive+S3 managed table #9942

Closed
ktonga opened this issue Feb 12, 2018 · 12 comments

Comments


ktonga commented Feb 12, 2018

I have a single-node cluster with Presto version 0.187 (the version supported by EMR).

The Hive connector has the following configuration:

hive.s3.connect-timeout=2m
hive.s3.max-backoff-time=10m
hive.s3.max-error-retries=50
hive.metastore-refresh-interval=1m
hive.s3.max-connections=500
hive.s3.max-client-retries=50
connector.name=hive-hadoop2
hive.s3.socket-timeout=2m
hive.metastore.uri=thrift://...
hive.metastore-cache-ttl=20m
hive.s3.staging-directory=/mnt/tmp
hive.s3.use-instance-credentials=true
hive.non-managed-table-writes-enabled=true
hive.s3.sse.enabled=true

I'm trying to create a table in an S3 bucket to which the EC2 instance has r/w access. The bucket requires SSE on every upload (the equivalent of always passing --sse); as you can see, I've turned the flag on in the config.

Presto seems to be picking up the instance credentials correctly: I can create an external table with a location like s3a://..... and query its content. However, when I try to create a managed table, it fails with an S3 Access Denied exception. The only details I have about the error are in the stack trace returned by presto-cli with --debug:

presto> create table hive.test.blah (meh varchar);
Query 20180212_022814_00026_gmvrs failed: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ...; S3 Extended Request ID: csv/ott/N5m7o+/...), S3 Extended Request ID: csv/ott/N5m7o+/...
com.facebook.presto.spi.PrestoException: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ...; S3 Extended Request ID: csv/ott/N5m7o+/...), S3 Extended Request ID: csv/ott/N5m7o+/...
	at com.facebook.presto.hive.metastore.ThriftHiveMetastore.createTable(ThriftHiveMetastore.java:399)
	at com.facebook.presto.hive.metastore.BridgingHiveMetastore.createTable(BridgingHiveMetastore.java:146)
	at com.facebook.presto.hive.metastore.CachingHiveMetastore.createTable(CachingHiveMetastore.java:427)
	at com.facebook.presto.hive.metastore.CachingHiveMetastore.createTable(CachingHiveMetastore.java:427)
	at com.facebook.presto.hive.metastore.SemiTransactionalHiveMetastore$CreateTableOperation.run(SemiTransactionalHiveMetastore.java:2069)
	at com.facebook.presto.hive.metastore.SemiTransactionalHiveMetastore$Committer.executeAddTableOperations(SemiTransactionalHiveMetastore.java:1145)
	at com.facebook.presto.hive.metastore.SemiTransactionalHiveMetastore$Committer.access$1000(SemiTransactionalHiveMetastore.java:887)
	at com.facebook.presto.hive.metastore.SemiTransactionalHiveMetastore.commitShared(SemiTransactionalHiveMetastore.java:834)
	at com.facebook.presto.hive.metastore.SemiTransactionalHiveMetastore.commit(SemiTransactionalHiveMetastore.java:739)
	at com.facebook.presto.hive.HiveMetadata.commit(HiveMetadata.java:1523)
	at com.facebook.presto.hive.HiveConnector.commit(HiveConnector.java:177)
	at com.facebook.presto.transaction.TransactionManager$TransactionMetadata$ConnectorTransactionMetadata.commit(TransactionManager.java:577)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ...; S3 Extended Request ID: csv/ott/N5m7o+/...), S3 Extended Request ID: csv/ott/N5m7o+/...
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_result$create_table_resultStandardScheme.read(ThriftHiveMetastore.java:28833)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_result$create_table_resultStandardScheme.read(ThriftHiveMetastore.java:28801)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_result.read(ThriftHiveMetastore.java:28727)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table(ThriftHiveMetastore.java:1042)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table(ThriftHiveMetastore.java:1029)
	at com.facebook.presto.hive.ThriftHiveMetastoreClient.createTable(ThriftHiveMetastoreClient.java:117)
	at com.facebook.presto.hive.metastore.ThriftHiveMetastore.lambda$createTable$13(ThriftHiveMetastore.java:387)
	at com.facebook.presto.hive.metastore.HiveMetastoreApiStats.lambda$wrap$0(HiveMetastoreApiStats.java:42)
	at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:138)
	at com.facebook.presto.hive.metastore.ThriftHiveMetastore.createTable(ThriftHiveMetastore.java:385)
	... 19 more

presto> 

As you can see, there is no information about which S3 operation is being executed, yet the user should have all the required permissions.

As a side note, creating the test schema failed with a similar error, but I managed to sort that out by creating the S3 folder for its location in advance.

How can I debug or solve this problem? I've tried many things over the course of the last few days and nothing has worked. Honestly, I'm running out of ideas.

Thank you so much for such an awesome tool!
Cheers,

Gaston.


ktonga commented Feb 12, 2018

OK, here is some further information from a few experiments I performed.

If I run create schema and create table against a more permissive bucket (no encryption enforcement), everything works. But if the bucket policy contains the following (something we learned from here):

{
    "Version": "2012-10-17",
    "Id": "PutObjPolicy",
    "Statement": [
        {
            "Sid": "DenyUnEncryptedObjectUploads",
            "Effect": "Deny",
            "Principal": {
                "AWS": "*"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-bucket/*",
            "Condition": {
                "StringNotEquals": {
                    "s3:x-amz-server-side-encryption": "AES256"
                }
            }
        }
    ]
}
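To make the effect of this Deny statement concrete, here is a minimal sketch (not AWS SDK code, just a simulation of the StringNotEquals condition) showing why a PutObject without the x-amz-server-side-encryption: AES256 header gets rejected:

```python
def put_object_allowed(headers):
    """Simulate the Deny statement above: a PutObject is denied unless
    the request carries the x-amz-server-side-encryption: AES256 header.
    For a single-valued condition key, StringNotEquals matches (so the
    Deny fires) when the key is absent or its value differs from AES256.
    """
    return headers.get("x-amz-server-side-encryption") == "AES256"

print(put_object_allowed({}))                                           # denied: no SSE header at all
print(put_object_allowed({"x-amz-server-side-encryption": "AES256"}))   # allowed
print(put_object_allowed({"x-amz-server-side-encryption": "aws:kms"}))  # denied: different algorithm
```

This matches the behavior described below: any client (Presto, Hive, or the AWS CLI) that issues a PutObject without that header, including for zero-byte "folder" objects, is denied.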

Both create statements fail. I can work around the schema one by creating the S3 "folder" in advance, but if I try to do the same for the table, it then fails with a "folder already exists" error.

What I think is happening here is that the policy in our prod-like buckets requires SSE for EVERY S3 put action, and Presto is only setting it on file-like S3 paths but not on folder-like paths. Am I right?

We really need this working to proceed with integrating Presto into our data pipeline. I'd love to get this problem fixed, so I'm offering to help in whatever way I can.

Thanks.


findepi commented Feb 12, 2018

cc @sopel39

@kokosing
Contributor

This seems very similar to an issue we experienced recently. Can you try running CREATE TABLE AS SELECT and post the stack trace here? If we post a PR with a potential fix, would you be able to test it?


sopel39 commented Feb 12, 2018

It is similar to #9916, but not quite the same. Apparently the metastore is not able to create the table's destination directory. I would guess some configuration on the Hive/HDFS side is missing. I don't know which Hadoop distro you use, but you could try enabling SSE in Hive/HDFS as described here:
https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/encryption.html


sopel39 commented Feb 12, 2018

Can you create the table with Hive only?


ktonga commented Feb 12, 2018

@kokosing As I expected, create table as worked. I guess that's because no intermediate step is performed to create the empty "folder" representing the table; instead, only the file with the result of the select is created, using SSE, together with the whole "path".


ktonga commented Feb 12, 2018

@sopel39 Same thing with Hive: I can only create external tables from existing files in S3; trying to create a writable table using Hive directly failed with the same error. But I didn't bother configuring Hive, since I was counting on the Presto SSE flag to override the Hive config, which happens only when creating a non-empty table using create table as.

@kokosing
Contributor

Have you checked whether you are able to create an empty S3 directory with other tools?


ktonga commented Feb 13, 2018

@kokosing I can, using aws s3, if I include the --sse flag. The thing is that in S3 there are no folders; everything is just an object with a path-like identifier. Even when I create a "folder" using the AWS Console, I have to create it with encryption, otherwise it is rejected by the security policy I mentioned before.
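This "folder" creation boils down to an S3 PutObject of a zero-byte object with a slash-terminated key. Below is a hedged sketch of that request shape; the bucket and prefix names are placeholders, and the dict only approximates what a client such as boto3 would send:

```python
def folder_marker_put_request(bucket, prefix):
    """Build PutObject parameters for an S3 "folder" marker.

    S3 has no real directories: a "folder" is just a zero-byte object
    whose key ends with "/". Under a bucket policy that denies
    unencrypted uploads, even this empty marker must carry the SSE
    header, which is what the --sse flag adds for the aws CLI."""
    return {
        "Bucket": bucket,
        "Key": prefix.rstrip("/") + "/",   # folder markers are slash-terminated keys
        "Body": b"",                       # zero bytes of content
        "ServerSideEncryption": "AES256",  # satisfies the policy's Deny condition
    }

# Hypothetical usage with boto3 (requires write access to the bucket):
# boto3.client("s3").put_object(**folder_marker_put_request("my-bucket", "warehouse/test.db"))
```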

@kokosing
Contributor

As far as I know, Presto does not create any directory for the table during CREATE TABLE. Presto creates the table in the Hive metastore, and it looks like Hive is trying to create a directory for the table in S3. Try to investigate this problem on the Hive side.

Have you tried running CREATE TABLE ... AS SELECT ... WITH NO DATA?
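For reference, the WITH NO DATA variant looks like this in Presto SQL (the table and source names here are made up for illustration):

```sql
-- Creates an empty table whose schema matches the SELECT result,
-- without writing any rows.
CREATE TABLE hive.test.blah_empty AS
SELECT * FROM hive.test.some_existing_table
WITH NO DATA;
```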

@nezihyigitbasi
Contributor

Did you figure out the problem? This error is coming from the Hive metastore (ThriftHiveMetastore$Client.recv_create_table receives the result of the RPC call from the metastore service, and the result is the S3 error). You should find a way to configure the Hive side with SSE support.


ktonga commented Mar 1, 2018

Yeah, sorry. I started using create as select-style queries, so I'm not having this problem anymore. But I did some research before that, and I think just adding

<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>AES256</value>
</property>

to core-site.xml would solve the problem.

Closing the issue since it's not Presto's fault.

Thanks everyone for the support.

@ktonga ktonga closed this as completed Mar 1, 2018