Add option to list all objects under an S3 location prefix. #3221
Using hive.recursive-directories=true results in one S3 listObjects call per subdirectory under a partition location. These calls are made serially, which can add significant latency to short-duration queries. Setting
Rather than leaking details of S3 into the split loader, we should switch to the FileSystem call that handles recursion natively, and implement it in the S3 FS: https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html#listFiles-org.apache.hadoop.fs.Path-boolean-
With this call, we should be able to simplify the recursive code, since it’s all done by the file system.
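To make the difference in call counts concrete, here is a rough sketch using an in-memory stand-in for a bucket. It is not the S3 client or the Hadoop FileSystem API. The class `PrefixListingSketch` and its methods are hypothetical names made up for illustration, and pagination of list results is ignored. It compares a directory-by-directory walk (one simulated list call per subdirectory) with a single listing of every key under the prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical in-memory model of a bucket; NOT a real S3 or Hadoop class.
final class PrefixListingSketch {
    // Object keys kept sorted, so keys sharing a prefix are contiguous.
    private final TreeSet<String> keys = new TreeSet<>();
    // Number of simulated list requests issued so far.
    int listCalls = 0;

    PrefixListingSketch(List<String> objectKeys) {
        keys.addAll(objectKeys);
    }

    // One list request without a delimiter: returns every key under the prefix.
    List<String> listAllUnderPrefix(String prefix) {
        listCalls++;
        List<String> result = new ArrayList<>();
        for (String key : keys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break; // past the contiguous run of matching keys
            }
            result.add(key);
        }
        return result;
    }

    // One list request per directory, using "/" as the delimiter,
    // then a serial recursion into each subdirectory.
    List<String> listRecursively(String prefix) {
        listCalls++;
        List<String> files = new ArrayList<>();
        TreeSet<String> subdirs = new TreeSet<>();
        for (String key : keys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break;
            }
            int slash = key.indexOf('/', prefix.length());
            if (slash < 0) {
                files.add(key); // object directly under this prefix
            } else {
                subdirs.add(key.substring(0, slash + 1)); // common prefix
            }
        }
        for (String subdir : subdirs) {
            files.addAll(listRecursively(subdir));
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "t/p=1/a", "t/p=1/x/b", "t/p=1/x/y/c", "t/p=1/z/d", "t/p=2/e");

        PrefixListingSketch flat = new PrefixListingSketch(sample);
        flat.listAllUnderPrefix("t/p=1/");
        System.out.println("flat listing calls: " + flat.listCalls);

        PrefixListingSketch walk = new PrefixListingSketch(sample);
        walk.listRecursively("t/p=1/");
        System.out.println("recursive walk calls: " + walk.listCalls);
    }
}
```

For the sample keys, the flat listing issues one simulated call and the directory walk issues four (the partition location plus three subdirectories), while both return the same four objects. A file system implementation of listFiles(path, true) could presumably take the flat approach internally, which is what would let the split loader drop its own recursion.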