Strict check of the s3 directory object content type. #20310

Closed
bkyryliuk opened this issue Jul 14, 2023 · 2 comments · Fixed by #20603 or #21027

Comments

@bkyryliuk

It would be great to relax the content-type requirement for directory objects here: https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/s3/PrestoS3FileSystem.java#L374C31-L374C31

As I understand it, Databricks Unity Catalog currently creates S3 directory objects with the content type application/octet-stream instead of application/x-directory. As a result, creating external Delta tables and accessing them through Presto fails.

Presto error message:
presto error: External location must be a directory

I've confirmed that this is the root cause by copying the directory object and changing its content type. After doing that and copying the other files into it, the external table is created successfully and can be queried.
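To illustrate the difference between the current strict check and a relaxed one, here is a minimal Java sketch. It is hypothetical: the class and method names (DirectoryObjectCheck, isDirectoryStrict, isDirectoryRelaxed) and the relaxed heuristic (accept application/x-directory regardless of casing or parameters, or treat any zero-byte object whose key ends in "/" as a directory) are assumptions for illustration, not the code at the linked line or the change merged in #20603 or #21027.

```java
import java.util.Locale;

// Hypothetical helper, not Presto's actual implementation.
final class DirectoryObjectCheck {
    static final String DIRECTORY_CONTENT_TYPE = "application/x-directory";

    private DirectoryObjectCheck() {}

    // Strict behavior described in this issue: only an exact content-type match counts.
    static boolean isDirectoryStrict(String contentType) {
        return DIRECTORY_CONTENT_TYPE.equals(contentType);
    }

    // Relaxed check (assumed heuristic): accept application/x-directory with any
    // casing or media-type parameters, or any zero-byte object whose key ends
    // in "/", whatever its content type (e.g. application/octet-stream).
    static boolean isDirectoryRelaxed(String key, String contentType, long contentLength) {
        if (contentType != null) {
            // Drop parameters such as "; charset=UTF-8" and compare case-insensitively.
            String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            if (DIRECTORY_CONTENT_TYPE.equals(mediaType)) {
                return true;
            }
        }
        return key.endsWith("/") && contentLength == 0;
    }

    public static void main(String[] args) {
        String octet = "application/octet-stream";
        // Hypothetical directory-marker key written with an octet-stream content type.
        String markerKey = "metastore/tables/example/";

        System.out.println(isDirectoryStrict(octet));                                 // false
        System.out.println(isDirectoryRelaxed(markerKey, octet, 0));                  // true
        System.out.println(isDirectoryRelaxed("data/part-0.parquet", octet, 1024));  // false
    }
}
```

The trailing-slash, zero-length rule reflects the common convention that S3 tools use zero-byte objects with keys ending in "/" as folder markers; whether that matches what Databricks or S3 Browser writes in every case is an assumption worth verifying against real objects.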

Example

CREATE TABLE bogdankyryliuk.presto_delta_table_from_prod (number_of_rows INT)
WITH (external_location = 's3a://{bucket}/metastore/f3cba8bc-0d7d-47b8-b16c-ee9ec0682680/tables/60652537-cf0a-4ddb-8f18-3146838a90d0');
-- fails with presto error: External location must be a directory
@huhlig

huhlig commented Jul 25, 2023

I was able to reproduce the above. When uploading folders of data, S3 Browser also creates directory objects with the content type application/octet-stream, and Presto fails with the same error. Changing the folder objects' content type to application/x-directory makes it work. S3 Browser also appears to use different metadata to distinguish files from folders, so this check likely needs to be relaxed for user ergonomics.

@aaneja
Contributor

aaneja commented Jul 27, 2023

cc: @imjalpreet @agrawalreetika for visibility

Projects
Archived in project
4 participants