from slack:
Hi team, we recently upgraded from Facebook Presto 215 to PrestoSQL 327. After the migration, we found one of our queries scanning 2~3x more data. Any idea why?
martin 1 month ago
Can you provide the output of EXPLAIN for both queries?
david 1 month ago
And EXPLAIN ANALYZE as well, if possible, as that should show where it is scanning more data. It's possible this is an issue with how the accounting/reporting happens (but we should understand why)
martin 1 month ago
Also, what format are you using for your data? What’s your storage backend, S3, HDFS or other?
Jiawei Zhang 1 month ago
We are using Parquet data from S3; the metastore is Hive.
Jiawei Zhang 1 month ago
explain analyze 215
Jiawei Zhang 1 month ago
explain analyze 327
Jiawei Zhang 1 month ago
Cluster is using all default config
sopel39 1 month ago
In PrestoSQL, INPUT data is the data after decompression. You can check the query JSON to see how much physical data was actually read.
Jiawei Zhang 1 month ago
I checked the query JSON; the decompressed data size is also doubled in the 327 version:
Jiawei Zhang 1 month ago
JavaScript/JSON
327.json
327.json.txt
Jiawei Zhang 1 month ago
JavaScript/JSON
215.json
215.json.txt
Jiawei Zhang 1 month ago
We also noticed that some tasks scan rows at 1/4 or less of the speed of other tasks. This is not tied to any specific worker node.
sopel39 1 month ago
Indeed, 327 reads more data (409.45GB vs 243.32GB). Is your file format Parquet?
sopel39 1 month ago
what is the query duration?
Jiawei Zhang 1 month ago
Yes, the files are in Parquet format. The query took ~40s on FB v215 and ~10min on 327.
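For reference, the figures quoted in the thread can be compared with a few lines of Python. This is only a rough sketch: the helper names (`parse_data_size`, `input_size`) are made up for illustration, the `queryStats` / `rawInputDataSize` field names are assumptions about the query JSON layout and may differ between versions, and the binary unit multipliers are likewise an assumption (the ratio itself does not depend on them).

```python
import re

# Assumed multipliers for size strings such as "409.45GB" (powers of 1024).
_UNITS = {"B": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3,
          "TB": 1024**4, "PB": 1024**5}


def parse_data_size(text):
    """Convert a size string like '409.45GB' into a byte count (float)."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([kMGTP]?B)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized data size: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def input_size(query_json, field="rawInputDataSize"):
    """Read a size field from the (assumed) 'queryStats' section of a
    parsed query JSON dict. Returns None when the field is missing."""
    raw = query_json.get("queryStats", {}).get(field)
    return None if raw is None else parse_data_size(raw)


# Figures quoted in the thread: 327 vs 215.
data_ratio = parse_data_size("409.45GB") / parse_data_size("243.32GB")
time_ratio = (10 * 60) / 40  # ~10min vs ~40s

print(f"data read: {data_ratio:.2f}x, duration: {time_ratio:.0f}x")
```

With the numbers above, the input size grows by roughly 1.7x while the duration grows by roughly 15x, so the extra data alone may not account for the whole slowdown.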
martin 1 month ago
Can you try with Presto 324? I’d like to see if this might be related to some changes that went into version 325.
Jiawei Zhang 1 month ago
I could try it in the next few days.
Jiawei Zhang 17 days ago
Not yet. Still getting a new cluster.