Add default value of Glue GetPartitions MaxResults #3024
Discussed in: #2996
presto-hive/src/main/java/io/prestosql/plugin/hive/metastore/glue/GlueHiveMetastore.java
Thanks a lot @l1x. We were personally hit hard by this issue: we want to move to Presto from Athena, but the planning performance was abysmal for tables with a significant number of partitions (>10k). Just wanted to let you know I appreciate this. PS: I came here while digging into this issue myself, and I reached the same conclusion that you did.
AWS doesn't recommend any value, but from a small script I made to test, I can see that in practice there are no downsides to setting this to the max of 1000. The difference in total time between a smaller value like 128 and the max value is appreciable (the smaller value was 3x slower for fetching almost 8k partitions). You can simulate this in a very crude way using the AWS CLI:
aws glue get-partitions --database-name db --table-name table --expression "dt>='2020-02-01'" --page-size 128
aws glue get-partitions --database-name db --table-name table --expression "dt>='2020-02-01'" --page-size 1000
The aws cli internally implements the
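To make the page-size effect concrete, here is a rough sketch that counts how many round trips a token-based listing needs at each page size. It does not call Glue: `GluePagingSketch`, `getPartitions`, and `countRequests` are hypothetical names, the in-memory list stands in for the service, and 8000 is an assumed round figure for "almost 8k partitions". It only illustrates why fewer, larger pages mean fewer sequential requests, not the measured timings above.

```java
import java.util.ArrayList;
import java.util.List;

public class GluePagingSketch {
    /** One page of results plus the offset token for the next page (null when done). */
    static final class Page {
        final List<String> partitions;
        final Integer nextToken;

        Page(List<String> partitions, Integer nextToken) {
            this.partitions = partitions;
            this.nextToken = nextToken;
        }
    }

    /** Hypothetical stand-in for the service: returns at most maxResults partitions per call. */
    static Page getPartitions(List<String> all, Integer token, int maxResults) {
        int start = token == null ? 0 : token;
        int end = Math.min(start + maxResults, all.size());
        Integer next = end < all.size() ? Integer.valueOf(end) : null;
        return new Page(all.subList(start, end), next);
    }

    /** Follows next tokens until the listing is exhausted and returns the number of calls made. */
    public static int countRequests(int totalPartitions, int maxResults) {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < totalPartitions; i++) {
            all.add("dt=" + i);
        }
        List<String> fetched = new ArrayList<>();
        int requests = 0;
        Integer token = null;
        do {
            Page page = getPartitions(all, token, maxResults);
            fetched.addAll(page.partitions);
            token = page.nextToken;
            requests++;
        } while (token != null);
        if (fetched.size() != totalPartitions) {
            throw new IllegalStateException("pagination lost partitions");
        }
        return requests;
    }

    public static void main(String[] args) {
        // Assumed figure: 8000 partitions, compared at the two page sizes from the comment.
        System.out.println("page size 128:  " + countRequests(8000, 128) + " requests");
        System.out.println("page size 1000: " + countRequests(8000, 1000) + " requests");
    }
}
```

Under these assumptions the listing takes 63 sequential requests at a page size of 128 and 8 at 1000. Each request has to wait for the previous token, so per-request latency adds up with the smaller page.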
@hashhar I am glad I could help! Let me know how it goes. We could bump the number even higher or make it configurable.
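If the value were made configurable, one possible shape is sketched below. Everything here is an assumption for illustration: the class name, the property key `example.glue.get-partitions.max-results`, and the idea of passing the default in as a parameter are all hypothetical and are not taken from the actual change. The only figure from this thread is the upper limit of 1000.

```java
import java.util.Properties;

public class GluePageSizeConfigSketch {
    // Hypothetical property key, used only for this illustration.
    static final String PROPERTY = "example.glue.get-partitions.max-results";
    // Upper limit mentioned in this thread for the page size.
    static final int UPPER_LIMIT = 1000;

    /** Reads the page size from props, falling back to defaultValue, and validates the range. */
    public static int resolveMaxResults(Properties props, int defaultValue) {
        String raw = props.getProperty(PROPERTY);
        int value = raw == null ? defaultValue : Integer.parseInt(raw.trim());
        if (value < 1 || value > UPPER_LIMIT) {
            throw new IllegalArgumentException(
                    PROPERTY + " must be between 1 and " + UPPER_LIMIT + ", got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(PROPERTY, "500");
        System.out.println(resolveMaxResults(props, 1000));
    }
}
```

Rejecting out-of-range values at startup keeps a typo in the configuration from surfacing later as a failed metastore call.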
@l1x We are now running PrestoSQL in production and haven't seen any practical issues with the smaller page size for queries scanning up to 50k partitions. Will keep an eye out.
Cherry pick of trinodb/trino#3024 and trinodb/trino#4938 Co-authored-by: Istvan <istvan@lambdainsight.com> Co-authored-by: Ashhar Hasan <hashhar_dev@outlook.com>
Fixes #2996
As per our discussion with @findepi and @ebyhr here is the fix for using batched requests with AWS Glue.