Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Use CTEs instead of UNIONs to batch queries. #638

Merged
merged 30 commits into from
May 16, 2023
Merged

Conversation

stephenh
Copy link
Collaborator

@stephenh stephenh commented May 10, 2023

This PR changes Joist's "auto-batch em.find/SELECT queries" from using UNION ALLs to a CTE of VALUES.

Previously Joist would batch two em.finds into a single SQL query that the two potentially random queries (other than being for the same entity) stitched together with a UNION ALL:

  (select *, -1 as __tag, -1 as __row from "authors" where "id" = $1)
  union all (select "a".*, 0 as __tag, row_number() over () as __row from "authors" as "a" where "a"."id" = $2 order by "a"."id" ASC, "a"."id" asc limit $3)
  union all (select "a".*, 1 as __tag, row_number() over () as __row from "authors" as "a" where "a"."id" = $4 order by "a"."age" DESC, "a"."id" asc limit $5) order by "__tag" asc",

The __tag syntax was used to tell which em.find a returned row belonged two, i.e. if we had three em.finds in this batch, we'd want to know the 2nd union all's rows should go to the 2nd em.finds result set.

This UNION ALL approach seemed fine in theory, but resulted in pretty gnarly queries, and indeed in practice we found it didn't scale once N got to ~50 or so. It makes sense that auto-batch 1,000 queries at once is probably a bad idea, but 50 seems reasonable.

One good aspect of the UNION ALL approach is that it let each query have its own order by, limit, and offset, because each of those would be applied within each UNION ALL.

In the new approach, we don't use UNIONs, and instead leverage the recent parseFindQuery infra to only batch queries by entity (as before) but also by entity + by shape of the query (new to this approach). With this constraint, that all queries in the batch have the same tables + joins + filters, we can more effectively make them "one query", but with the aggregated WHERE clause represented as a JOIN:

"WITH _find (tag, arg0) AS (VALUES ($1::int, $2::int), ($3, $4) )
SELECT array_agg(_find.tag) as _tags, "a".*
FROM authors as a
JOIN _find ON a.id = _find.arg0 GROUP BY "a".id
ORDER BY a.first_name DESC LIMIT 10000;",

The reason for the CTE of VALUES + JOIN, instead of just OR-ing all of the parameters is that, with OR, if a row matches multiple em.finds, it will only be returned once; but b/c we've added tag as a column to the VALUES, and each em.find caller gets its own tag, the same row will be returned multiples times (although we array_agg the tag to avoid actually returning the row twice), once per time it matched one of the em.finds.

Surprisingly enough, operators like <= even work in JOIN clauses, using a feature called "non-equi joins".

But, if callers want to use limits and offsets, they'll have to use the new findPaginated, which will not auto-batch and will N+1 if called in a loop. This should be fine b/c typically findPaginated calls would be at the top-most level of an API endpoint/graphql query.

Fixes #441

We also know only batch structurally exact queries, which means
all queries that batched together will touch the same tables, have
the came join conditions, etc., they only differ in the parameters
in the WHERE clauses.

This capability is faciliated by the ParsedFindQuery, which gives us
an easy-to-work with AST that we can drop values from, then
JSON.stringify it, to group together similar-structure-differnet-value
queries.
@stephenh stephenh changed the title wip feat: Use CTE instead of UNIONs to batch queries. May 13, 2023
@stephenh stephenh changed the title feat: Use CTE instead of UNIONs to batch queries. feat: Use CTEs instead of UNIONs to batch queries. May 13, 2023
@stephenh stephenh marked this pull request as ready for review May 15, 2023 01:42
@stephenh stephenh requested a review from zgavin May 15, 2023 19:42
@stephenh stephenh merged commit b37f61a into main May 16, 2023
@stephenh stephenh deleted the new-em-find branch May 16, 2023 00:06
stephenh pushed a commit that referenced this pull request May 16, 2023
# [1.77.0](v1.76.3...v1.77.0) (2023-05-16)

### Features

* Use CTEs instead of UNIONs to batch queries.  ([#638](#638)) ([b37f61a](b37f61a))
@stephenh
Copy link
Collaborator Author

🎉 This PR is included in version 1.77.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move away from UNION ALL for batch em.finds
1 participant