feat: Use CTEs instead of UNIONs to batch queries. #638
Merged
We also now only batch structurally identical queries, which means all queries that are batched together will touch the same tables, have the same join conditions, etc.; they only differ in the parameters in the `WHERE` clauses. This capability is facilitated by the `ParsedFindQuery`, which gives us an easy-to-work-with AST that we can drop values from and then `JSON.stringify`, to group together same-structure-different-value queries.
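The grouping idea described above can be sketched roughly as follows. This is only an illustration: `SketchQuery`, `shapeKey`, and `groupByShape` are hypothetical names, and the type is a much simplified stand-in, not Joist's actual `ParsedFindQuery`.

```typescript
// Hypothetical, simplified stand-in for a parsed query AST (an assumption for
// this sketch); Joist's real ParsedFindQuery has a different, richer shape.
type Condition = { column: string; op: "=" | "<=" | ">="; value: unknown };
type SketchQuery = { table: string; joins: string[]; conditions: Condition[] };

// Drop the values so that only the structure (table, joins, columns, operators) remains.
function shapeKey(query: SketchQuery): string {
  const shape = {
    table: query.table,
    joins: query.joins,
    conditions: query.conditions.map(({ column, op }) => ({ column, op })),
  };
  return JSON.stringify(shape);
}

// Group queries that share a shape; each group could then be sent as one batch.
function groupByShape(queries: SketchQuery[]): Map<string, SketchQuery[]> {
  const groups = new Map<string, SketchQuery[]>();
  for (const query of queries) {
    const key = shapeKey(query);
    const group = groups.get(key) ?? [];
    group.push(query);
    groups.set(key, group);
  }
  return groups;
}

const groups = groupByShape([
  { table: "authors", joins: [], conditions: [{ column: "first_name", op: "=", value: "a1" }] },
  { table: "authors", joins: [], conditions: [{ column: "first_name", op: "=", value: "a2" }] },
  { table: "authors", joins: [], conditions: [{ column: "age", op: "<=", value: 30 }] },
]);
console.log(groups.size); // 2
```

The first two queries differ only in their values, so they share a key and land in one group; the third filters on a different column with a different operator, so it gets its own group.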
stephenh
changed the title
feat: Use CTE instead of UNIONs to batch queries.
feat: Use CTEs instead of UNIONs to batch queries.
May 13, 2023
stephenh
pushed a commit
that referenced
this pull request
May 16, 2023
# [1.77.0](v1.76.3...v1.77.0) (2023-05-16)

### Features

* Use CTEs instead of UNIONs to batch queries. ([#638](#638)) ([b37f61a](b37f61a))
🎉 This PR is included in version 1.77.0 🎉

Your semantic-release bot 📦🚀
This PR changes Joist's "auto-batch `em.find`/`SELECT` queries" from using `UNION ALL`s to a CTE of `VALUES`.

Previously Joist would batch two `em.find`s into a single SQL query, with the two potentially unrelated queries (other than being for the same entity) stitched together with a `UNION ALL`.

The `__tag` syntax was used to tell which `em.find` a returned row belonged to, i.e. if we had three `em.find`s in this batch, we'd want to know that the 2nd `union all`'s rows should go to the 2nd `em.find`'s result set.

This `UNION ALL` approach seemed fine in theory, but resulted in pretty gnarly queries, and indeed in practice we found it didn't scale once N got to ~50 or so. It makes sense that auto-batching 1,000 queries at once is probably a bad idea, but 50 seems reasonable.

One good aspect of the `UNION ALL` approach is that it let each query have its own order by, limit, and offset, because each of those would be applied within each `UNION ALL`.

In the new approach, we don't use `UNION`s, and instead leverage the recent `parseFindQuery` infra to batch queries not only by entity (as before) but by entity + shape of the query (new to this approach). With this constraint, that all queries in the batch have the same tables + joins + filters, we can more effectively make them "one query", but with the aggregated `WHERE` clause represented as a `JOIN`.

The reason for the CTE of `VALUES` + `JOIN`, instead of just `OR`-ing all of the parameters, is that, with `OR`, if a row matches multiple `em.find`s, it will only be returned once; but b/c we've added `tag` as a column to the `VALUES`, and each `em.find` caller gets its own tag, the same row will be returned multiple times (although we `array_agg` the tag to avoid actually returning the row twice), once per time it matched one of the `em.find`s.

Surprisingly enough, operators like `<=` even work in `JOIN` clauses, using a feature called "non-equi joins".

But, if callers want to use limits and offsets, they'll have to use the new `findPaginated`, which will not auto-batch and will N+1 if called in a loop. This should be fine b/c typically `findPaginated` calls would be at the top-most level of an API endpoint/graphql query.

Fixes #441
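
To make the two query shapes described above concrete, here is a rough sketch that builds both as SQL strings for a batch of same-shape finds. It is an illustration under assumptions, not the SQL Joist actually generates: the `Batch` type, the `buildUnionAll`/`buildCte` helpers, the `_find` CTE name, the `argN` column names, and the `id` primary key are all hypothetical.

```typescript
// Hypothetical sketch; the table, column, and CTE names are illustrative
// assumptions, not the SQL that Joist actually generates.
type Filter = { column: string; op: "=" | "<=" | ">=" };
type Batch = {
  table: string;
  filters: Filter[]; // shared shape: the same columns and operators for every find
  args: unknown[][]; // one row of values per em.find call, aligned with `filters`
};
type Built = { sql: string; params: unknown[] };

// Old style: one SELECT per find, tagged with its index and glued with UNION ALL.
function buildUnionAll(batch: Batch): Built {
  const params: unknown[] = [];
  const selects = batch.args.map((row, tag) => {
    const where = batch.filters.map((f, i) => {
      params.push(row[i]);
      return `${f.column} ${f.op} $${params.length}`;
    });
    return `(SELECT *, ${tag} AS __tag FROM ${batch.table} WHERE ${where.join(" AND ")})`;
  });
  return { sql: selects.join(" UNION ALL "), params };
}

// New style: every find's values become one row in a CTE of VALUES, and the
// WHERE conditions become JOIN conditions (including non-equi ones like <=).
// A real Postgres query would likely also need casts on the placeholders
// (e.g. $1::int); they are omitted here for brevity.
function buildCte(batch: Batch): Built {
  const params: unknown[] = [];
  const rows = batch.args.map((row, tag) => {
    const placeholders = [tag, ...row].map((value) => {
      params.push(value);
      return `$${params.length}`;
    });
    return `(${placeholders.join(", ")})`;
  });
  const argColumns = batch.filters.map((_, i) => `arg${i}`);
  const on = batch.filters.map((f, i) => `t.${f.column} ${f.op} _find.arg${i}`);
  const sql =
    `WITH _find (tag, ${argColumns.join(", ")}) AS (VALUES ${rows.join(", ")}) ` +
    `SELECT array_agg(_find.tag) AS _tags, t.* FROM ${batch.table} AS t ` +
    `JOIN _find ON ${on.join(" AND ")} GROUP BY t.id`; // assumes `id` is the primary key
  return { sql, params };
}

const batch: Batch = {
  table: "authors",
  filters: [{ column: "first_name", op: "=" }, { column: "age", op: "<=" }],
  args: [["a1", 30], ["a2", 40]],
};
const union = buildUnionAll(batch);
const cte = buildCte(batch);
console.log(union.sql);
console.log(cte.sql);
console.log(cte.params);
```

In the `UNION ALL` form the statement grows by a full `SELECT` per find, whereas in the CTE form only the `VALUES` list grows and the `SELECT`/`JOIN` text stays fixed. The `_tags` array on each returned row is what would let a caller route that row back to every find it matched.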