Generate query digests to enable caching query results #2645
denizdemir commented Apr 7, 2015
- The query digest is generated from the optimized plan and returned as part of the query results. If two digests are the same, the corresponding queries must produce identical results when executed.
- If the generated digest matches the digest provided in X-Presto-Digest, the query state machine terminates in the DIGEST_MATCHED state, which is added to QueryState as a new terminal state.
- Connectors compute the digest for the partitions or the table. The Hive connector computes it from the names and last-modification timestamps of either the partitions or the table, depending on the number of partitions whose metadata it needs to fetch.
- Plan generation is randomized for certain types of queries, especially those with joins, so the same query can still produce different digests.
- Query digests are logged as part of the query completion event.
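To illustrate the idea in the list above, here is a minimal sketch in Python rather than the PR's Java. It is not the code in this PR: the function names, the choice of SHA-256, and the byte layout fed to the hash are all assumptions made for illustration. It shows a digest built from (name, last-modification time) pairs, a query digest that combines a plan's text with its input digests, and the comparison against a client-supplied digest.

```python
import hashlib
from typing import Iterable, Optional, Tuple


def table_digest(entries: Iterable[Tuple[str, int]]) -> str:
    """Digest over (name, last_modified_millis) pairs.

    Hypothetical helper: entries stand for partitions (or a single table).
    Sorting makes the result independent of listing order.
    """
    h = hashlib.sha256()  # assumption: the PR does not say which hash it uses
    for name, mtime in sorted(entries):
        h.update(name.encode("utf-8"))
        h.update(b"\0")  # separator so ("a", 12) and ("a1", 2) differ
        h.update(str(mtime).encode("ascii"))
        h.update(b"\n")
    return h.hexdigest()


def query_digest(plan_text: str, input_digests: Iterable[str]) -> str:
    """Combine a canonical plan rendering with the digests of its inputs."""
    h = hashlib.sha256()
    h.update(plan_text.encode("utf-8"))
    for d in sorted(input_digests):
        h.update(b"\0")
        h.update(d.encode("ascii"))
    return h.hexdigest()


def digest_matches(computed: str, client_digest: Optional[str]) -> bool:
    """True when the client already holds results for this digest.

    In the PR's terms, a match is where the query would end in DIGEST_MATCHED.
    """
    return client_digest is not None and client_digest == computed
```

Any change to a partition's modification time changes the table digest and therefore the query digest, so a cached result is only reused while the inputs are untouched. The sketch assumes `plan_text` is deterministic; as noted above, randomized plan generation breaks that assumption for some queries.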
The commit messages are too long. Please use the standard format: http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html
I tried this PR, but I cannot query any unpartitioned table in Hive.
@yuananf, do you have any log lines?
presto:orc> select * from lineitem limit 10;
Thanks @yuananf. I'll fix it.
This needs to be rebased and migrated to the new TableLayout API.
Closing. We'll probably tackle this once the new optimizer is in place. I'm keeping the code under https://github.com/martint/presto/tree/query-digest for reference.