adding tuned indexes for query plan analysis #446
@@ -29,6 +29,7 @@ translations:
author:
name: Noelia Donato
link: https://twitter.com/vamoacodear
- language: Español
I was having trouble populating the blog post cache without this change. Maybe it was just me, though 🙂 It seemed consistent with the structure here, so I kept it for reference in case it is needed.
Thanks! That's definitely making the cache update fail.
Thanks a lot! I'm excited to see how this improves query performance.
Also, amazing analysis. Thanks for going through all the explanation!
Happy to help! I enjoyed diving in further. If you get the time, let me know how the indexes affect the timings you're logging. It's difficult to measure the real impact without the same data and infrastructure, so I'm curious how it turns out 😄
Hi Kent 👋🏻 Following our chat about indexes after Remix Conf, I took a closer look at the queries you pointed out. Here are some indexes I would recommend trying. I verified them using EXPLAIN QUERY PLAN, which I go into in detail below. That said, I don't think my local environment is representative of your production system, so I'd keep an eye on your timing numbers to see how things compare. The full write-up is quite long because I walked through how I tested each one.
Two things to keep in mind if you choose to implement these are extra disk usage in your SQLite database and additional write overhead. I don't expect either of these to be a concern for your workload, but I wanted to call out the trade-off.
Let me know if you have any questions and thanks for all the fantastic work!
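The plans discussed below come from EXPLAIN QUERY PLAN. A minimal way to script that check, sketched here with Python's built-in sqlite3 module, is to prefix a statement with EXPLAIN QUERY PLAN and read the detail column of each returned row. The trimmed-down User and PostRead tables, the index names, and the query_plan helper are illustrative assumptions, not the project's actual Prisma-generated schema.

```python
import sqlite3

# Hypothetical, trimmed-down tables loosely modeled on the User and PostRead
# models discussed here; the real Prisma-generated schema has more columns.
SCHEMA = """
CREATE TABLE "User" ("id" TEXT NOT NULL PRIMARY KEY, "team" TEXT NOT NULL);
CREATE TABLE "PostRead" (
  "id" TEXT NOT NULL PRIMARY KEY,
  "postSlug" TEXT NOT NULL,
  "userId" TEXT,
  "clientId" TEXT,
  "createdAt" TEXT NOT NULL
);
-- Single-column indexes standing in for the existing userId/clientId indexes.
CREATE INDEX "PostRead_userId_idx" ON "PostRead"("userId");
CREATE INDEX "PostRead_clientId_idx" ON "PostRead"("clientId");
"""


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the detail text of each EXPLAIN QUERY PLAN node, in order."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column in every SQLite version.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
plan = query_plan(
    conn, 'SELECT COUNT(*) FROM "PostRead" WHERE "createdAt" > ?', ("2022-05-01",)
)
for detail in plan:
    print(detail)  # e.g. "SCAN PostRead" ("SCAN TABLE PostRead" on older SQLite)
```

This only shows which access paths SQLite chooses on an empty database without statistics; as noted above, real timings on production data may differ.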
Analysis
getActiveMembers
Prisma query
Based on the Prisma operation, we can see that User.team and PostRead.createdAt are index candidates. A count in SQLite (and many other engines) will always involve some scanning, but our goal is to limit that to relevant data, so that every row scanned is part of the final count. In other words, we don't want to read any row that doesn't match the where filter here.
SQL query
The SQL aligns with our assessment above. Nothing additional to note here.
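For a local experiment, a hand-written simplification of this query (count team members who have at least one PostRead after a cutoff) can be run through EXPLAIN QUERY PLAN. The SQL below is an assumption about the query's shape, not the exact statement Prisma generates, and the schema and index names are illustrative.

```python
import sqlite3

# Hypothetical, trimmed-down schema with only the pre-existing PostRead indexes.
SCHEMA = """
CREATE TABLE "User" ("id" TEXT NOT NULL PRIMARY KEY, "team" TEXT NOT NULL);
CREATE TABLE "PostRead" (
  "id" TEXT NOT NULL PRIMARY KEY,
  "postSlug" TEXT NOT NULL,
  "userId" TEXT,
  "clientId" TEXT,
  "createdAt" TEXT NOT NULL
);
CREATE INDEX "PostRead_userId_idx" ON "PostRead"("userId");
CREATE INDEX "PostRead_clientId_idx" ON "PostRead"("clientId");
"""

# Simplified stand-in for getActiveMembers (not Prisma's generated SQL):
# count team members who have at least one read after a cutoff.
ACTIVE_MEMBERS_SQL = """
SELECT COUNT(*) FROM "User"
WHERE "team" = ?
  AND "id" IN (SELECT "userId" FROM "PostRead" WHERE "createdAt" > ?)
"""


def query_plan(conn, sql, params=()):
    """Detail text of each EXPLAIN QUERY PLAN node."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
before = query_plan(conn, ACTIVE_MEMBERS_SQL, ("BLUE", "2022-05-01"))
for detail in before:
    print(detail)
# No index covers team or createdAt, so neither appears as an index constraint:
# PostRead is scanned in full and team is checked only on rows already fetched.
```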
EXPLAIN QUERY PLAN before
The original query plan showed that PostRead (j0) was read in full with a SCAN, which implies that the createdAt filter was applied during the scan. PostRead outputs userId, which is then used to match User twice: once for the JOIN and again for the final COUNT. While User appears as a SEARCH (rather than a SCAN), the query plan omits the team filter, so we can infer that it is applied during the outer SEARCH.
Why does it show a SEARCH if we have yet to filter by team? Because the query loops over the userId values returned by the subquery and performs an index search for each one. A SEARCH does not always imply an optimal query: here the team check still runs against every row the search returns. In the SQL Server world we call this a residual scan.
Relevant Indexes
We'll add two indexes. One on User.team will improve the outer query's performance by limiting it to only the relevant records for the count. One on PostRead.createdAt will similarly improve the inner query, but we'll also include PostRead.userId in that index to make it a covering index. We need userId to match on the outer query, and including it lets the database engine read the value straight from the index without an additional read of the PostRead table data.
EXPLAIN QUERY PLAN after
Now our query plan has replaced the SCAN with a search on a targeted covering index, and uses a targeted index for the outer count as well. While SQLite doesn't report the number of read operations in its plans, we can deduce that we are now only touching records that contribute to the count 🙌🏻
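A sketch of the check after adding the two indexes, against a hypothetical trimmed-down schema and a simplified stand-in query rather than Prisma's generated SQL. In the Prisma schema these would presumably be declared as @@index([team]) on User and @@index([createdAt, userId]) on PostRead; here they are plain CREATE INDEX statements with made-up names. The thing to look for is the inner subquery switching from a full SCAN of PostRead to a SEARCH on the covering index.

```python
import sqlite3

# Hypothetical, trimmed-down schema including the pre-existing PostRead indexes.
SCHEMA = """
CREATE TABLE "User" ("id" TEXT NOT NULL PRIMARY KEY, "team" TEXT NOT NULL);
CREATE TABLE "PostRead" (
  "id" TEXT NOT NULL PRIMARY KEY,
  "postSlug" TEXT NOT NULL,
  "userId" TEXT,
  "clientId" TEXT,
  "createdAt" TEXT NOT NULL
);
CREATE INDEX "PostRead_userId_idx" ON "PostRead"("userId");
CREATE INDEX "PostRead_clientId_idx" ON "PostRead"("clientId");
"""

# The two indexes proposed for getActiveMembers (index names are made up).
NEW_INDEXES = """
CREATE INDEX "User_team_idx" ON "User"("team");
-- createdAt leads for the range filter; userId is included so the index covers the subquery.
CREATE INDEX "PostRead_createdAt_userId_idx" ON "PostRead"("createdAt", "userId");
"""

# Simplified stand-in for getActiveMembers (not Prisma's generated SQL).
ACTIVE_MEMBERS_SQL = """
SELECT COUNT(*) FROM "User"
WHERE "team" = ?
  AND "id" IN (SELECT "userId" FROM "PostRead" WHERE "createdAt" > ?)
"""


def query_plan(conn, sql, params=()):
    """Detail text of each EXPLAIN QUERY PLAN node."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript(NEW_INDEXES)
after = query_plan(conn, ACTIVE_MEMBERS_SQL, ("BLUE", "2022-05-01"))
for detail in after:
    print(detail)
```

Which User index the outer count picks can vary with SQLite's cost estimates, so only the subquery side is worth relying on in a sketch like this.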
getRecentReads
Prisma query
Based on the Prisma operation, we can see that PostRead.postSlug, PostRead.createdAt and User.team are index candidates. Note that PostRead.createdAt is filtered with an inequality, so there will be some residual scanning. Once again, our goal is to reduce that scan to only the relevant records.
SQL query
The SQL aligns with our assessment above. Nothing additional to note here.
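The before state of this query can be approximated locally. This sketch assumes a simplified shape for getRecentReads (reads of one post after a cutoff by members of one team) and a trimmed-down, hypothetical schema; it is not Prisma's generated SQL.

```python
import sqlite3

# Hypothetical, trimmed-down schema with only the pre-existing PostRead indexes.
SCHEMA = """
CREATE TABLE "User" ("id" TEXT NOT NULL PRIMARY KEY, "team" TEXT NOT NULL);
CREATE TABLE "PostRead" (
  "id" TEXT NOT NULL PRIMARY KEY,
  "postSlug" TEXT NOT NULL,
  "userId" TEXT,
  "clientId" TEXT,
  "createdAt" TEXT NOT NULL
);
CREATE INDEX "PostRead_userId_idx" ON "PostRead"("userId");
CREATE INDEX "PostRead_clientId_idx" ON "PostRead"("clientId");
"""

# Simplified stand-in for getRecentReads (not Prisma's generated SQL).
RECENT_READS_SQL = """
SELECT COUNT(*) FROM "PostRead"
WHERE "postSlug" = ?
  AND "createdAt" > ?
  AND "userId" IN (SELECT "id" FROM "User" WHERE "team" = ?)
"""


def query_plan(conn, sql, params=()):
    """Detail text of each EXPLAIN QUERY PLAN node."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
before = query_plan(conn, RECENT_READS_SQL, ("some-post", "2022-05-01", "BLUE"))
for detail in before:
    print(detail)
# User is scanned in full for the team filter, and no index mentions postSlug
# or createdAt, so those filters are applied to rows after they are fetched.
```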
EXPLAIN QUERY PLAN before
The original query plan showed that User (j0) was read in full with a SCAN, which implies that the team filter was applied during the scan. User.id is then used to match PostRead (t0) using the existing index @@index(userId). That match is then used to filter PostRead again in the outer query, but the outer SEARCH does not include postSlug or createdAt in its criteria. We can again infer that these are applied in memory with a residual scan.
Relevant Indexes
Three indexes matter for this query, though one is a repeat from above. One on User.team will improve the inner query's performance by limiting it to only the relevant records for the join. One on PostRead.userId (the existing @@index(userId)) is used by the inner query's JOIN. One on PostRead.postSlug and PostRead.createdAt will narrow the outer query, but in this case column order is important.
If we flip the order to @@index([createdAt, postSlug]), the query plan will not utilize the index. Why is that? Remember the callout that PostRead.createdAt is filtered with an inequality? An inequality can only be partially utilized, as described in SQLite's query optimizer documentation. Index columns must be utilized from left to right, and no column to the right of an inequality can narrow the search, so putting createdAt first eliminates our ability to use postSlug as an equality match (a SEARCH).
EXPLAIN QUERY PLAN after
Our new query plan has replaced the inner SCAN with a specific SEARCH on User.team, followed by another SEARCH to join with PostRead.userId. Finally, the outer query is also a SEARCH on postSlug and createdAt. We've narrowed the results down to the absolute minimum needed to perform the count 🎉
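The column-order point can be checked in isolation. The sketch below keeps only the two PostRead filters from getRecentReads (the team subquery is left out) and builds one hypothetical database per index ordering, the Prisma equivalents being @@index([postSlug, createdAt]) and @@index([createdAt, postSlug]). In this reduced form SQLite may still range-scan the flipped index on createdAt, but postSlug never appears as an equality constraint for it, which is the left-to-right rule described above.

```python
import sqlite3

# Hypothetical, trimmed-down schema with only the pre-existing PostRead indexes.
SCHEMA = """
CREATE TABLE "User" ("id" TEXT NOT NULL PRIMARY KEY, "team" TEXT NOT NULL);
CREATE TABLE "PostRead" (
  "id" TEXT NOT NULL PRIMARY KEY,
  "postSlug" TEXT NOT NULL,
  "userId" TEXT,
  "clientId" TEXT,
  "createdAt" TEXT NOT NULL
);
CREATE INDEX "PostRead_userId_idx" ON "PostRead"("userId");
CREATE INDEX "PostRead_clientId_idx" ON "PostRead"("clientId");
"""

# Only the PostRead filters from getRecentReads; the team subquery is omitted.
SQL = 'SELECT COUNT(*) FROM "PostRead" WHERE "postSlug" = ? AND "createdAt" > ?'


def plan_with_index(columns: str) -> list[str]:
    """Build a fresh database with one extra index and return the plan details."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(f'CREATE INDEX "PostRead_slug_time_idx" ON "PostRead"({columns})')
    rows = conn.execute("EXPLAIN QUERY PLAN " + SQL, ("some-post", "2022-05-01"))
    return [row[-1] for row in rows]


# Equality column first, inequality column last: both can narrow the search.
good = plan_with_index('"postSlug", "createdAt"')
# Inequality column first: the range on createdAt stops postSlug from being used.
flipped = plan_with_index('"createdAt", "postSlug"')
print(good)
print(flipped)
```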
getBlogReadRankings
Optimizing getBlogReadRankings is nearly identical to optimizing getRecentReads, but without the PostRead.createdAt filter. This is great, because it can leverage the same indexes we created above. Since PostRead.createdAt is the rightmost column in our index, the index can still serve the postSlug filter; only the unused rightmost column drops out 🎊 Rather than repeat the whole process from above, I'll keep this one short 😄
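A short sketch of that claim, assuming a hypothetical trimmed-down PostRead table and an index on (postSlug, createdAt) with a made-up name: with only the postSlug equality in the filter, the plan still searches on the index's leading column. The query is a simplified stand-in, not the actual SQL behind getBlogReadRankings.

```python
import sqlite3

# Hypothetical, trimmed-down schema plus the (postSlug, createdAt) index.
SCHEMA = """
CREATE TABLE "PostRead" (
  "id" TEXT NOT NULL PRIMARY KEY,
  "postSlug" TEXT NOT NULL,
  "userId" TEXT,
  "clientId" TEXT,
  "createdAt" TEXT NOT NULL
);
CREATE INDEX "PostRead_userId_idx" ON "PostRead"("userId");
CREATE INDEX "PostRead_clientId_idx" ON "PostRead"("clientId");
CREATE INDEX "PostRead_postSlug_createdAt_idx" ON "PostRead"("postSlug", "createdAt");
"""

# No createdAt filter: only the leading index column is constrained.
SQL = 'SELECT COUNT(*) FROM "PostRead" WHERE "postSlug" = ?'

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + SQL, ("some-post",))]
print(plan)
```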
getBlogRecommendations
This one is particularly interesting because there are two paths to optimize separately: user and clientId. I'll do my best to consolidate them here 🙂
Prisma query
Based on the Prisma operation, we can see that PostRead.postSlug is a good index candidate. Depending on the ternary, either PostRead.userId or PostRead.clientId would be useful as well.
SQL query
The SQL changes with the ternary, but follows this pattern.
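A hypothetical version of that pattern is sketched below: the by_user flag stands in for the ternary and picks userId or clientId, while the select list, the excluded-slug list, and the GROUP BY are assumptions about the query's shape. With only the existing single-column indexes, each branch searches its own index and then needs a temporary b-tree for the GROUP BY.

```python
import sqlite3

# Hypothetical, trimmed-down schema with only the pre-existing PostRead indexes.
SCHEMA = """
CREATE TABLE "PostRead" (
  "id" TEXT NOT NULL PRIMARY KEY,
  "postSlug" TEXT NOT NULL,
  "userId" TEXT,
  "clientId" TEXT,
  "createdAt" TEXT NOT NULL
);
CREATE INDEX "PostRead_userId_idx" ON "PostRead"("userId");
CREATE INDEX "PostRead_clientId_idx" ON "PostRead"("clientId");
"""


def recommendations_sql(by_user: bool, excluded_count: int) -> str:
    """Hypothetical query shape; the column choice mirrors the user/clientId ternary."""
    column = "userId" if by_user else "clientId"
    placeholders = ", ".join("?" for _ in range(excluded_count))
    return (
        f'SELECT "postSlug", COUNT(*) FROM "PostRead" '
        f'WHERE "{column}" = ? AND "postSlug" NOT IN ({placeholders}) '
        'GROUP BY "postSlug"'
    )


def query_plan(conn, sql, params=()):
    """Detail text of each EXPLAIN QUERY PLAN node."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
params = ("some-id", "post-a", "post-b")  # the id, then two excluded slugs
user_plan = query_plan(conn, recommendations_sql(True, 2), params)
client_plan = query_plan(conn, recommendations_sql(False, 2), params)
print(user_plan)
print(client_plan)
```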
EXPLAIN QUERY PLAN before
The original query plan showed that the existing indexes for PostRead.clientId and PostRead.userId are being utilized. Awesome! But notice that the PostRead.postSlug filter is missing. Let's see what happens if we fix that.
Relevant Indexes
We'll update the existing PostRead indexes to include postSlug as the rightmost column. Order is important here: with postSlug on the right, any query that was previously utilizing the existing indexes (like getReaderCount and getSlugReadsByUser) can still use the new ones. Once again, we're leveraging the fact that index columns are utilized left to right.
EXPLAIN QUERY PLAN after
🤯 Our new query plan not only turned the SEARCH into one on a covering index, but also eliminated the USE TEMP B-TREE FOR GROUP BY step. This is because index entries are stored in sorted order: for a fixed userId (or clientId), the matching postSlug values come out of the index already grouped together, so the optimizer can skip the in-memory grouping.
Why isn't postSlug shown in the index filters? Because it's an inverted condition (NOT IN), which SQLite can't use to narrow an index search, so it is checked against each index entry instead. It's still valuable to include postSlug in the index: it makes the index covering, which eliminates the additional table lookup, and it orders the values as mentioned above.
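The updated indexes can be checked with EXPLAIN QUERY PLAN too. This sketch (hypothetical schema, index names, and query shape) drops the single-column indexes, recreates them with postSlug appended, and then confirms three things: both branches search a covering index, the temporary GROUP BY b-tree is gone, and a lookup filtering only on userId can still use the new index.

```python
import sqlite3

# Hypothetical, trimmed-down schema with the original single-column indexes.
SCHEMA = """
CREATE TABLE "PostRead" (
  "id" TEXT NOT NULL PRIMARY KEY,
  "postSlug" TEXT NOT NULL,
  "userId" TEXT,
  "clientId" TEXT,
  "createdAt" TEXT NOT NULL
);
CREATE INDEX "PostRead_userId_idx" ON "PostRead"("userId");
CREATE INDEX "PostRead_clientId_idx" ON "PostRead"("clientId");
"""

# The proposed change: keep each existing column first and append postSlug.
UPDATED_INDEXES = """
DROP INDEX "PostRead_userId_idx";
DROP INDEX "PostRead_clientId_idx";
CREATE INDEX "PostRead_userId_postSlug_idx" ON "PostRead"("userId", "postSlug");
CREATE INDEX "PostRead_clientId_postSlug_idx" ON "PostRead"("clientId", "postSlug");
"""


def recommendations_sql(by_user: bool, excluded_count: int) -> str:
    """Hypothetical query shape; the column choice mirrors the user/clientId ternary."""
    column = "userId" if by_user else "clientId"
    placeholders = ", ".join("?" for _ in range(excluded_count))
    return (
        f'SELECT "postSlug", COUNT(*) FROM "PostRead" '
        f'WHERE "{column}" = ? AND "postSlug" NOT IN ({placeholders}) '
        'GROUP BY "postSlug"'
    )


def query_plan(conn, sql, params=()):
    """Detail text of each EXPLAIN QUERY PLAN node."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript(UPDATED_INDEXES)
params = ("some-id", "post-a", "post-b")
user_plan = query_plan(conn, recommendations_sql(True, 2), params)
client_plan = query_plan(conn, recommendations_sql(False, 2), params)
# A lookup filtering only on userId can still use the new index's leading column.
user_only_plan = query_plan(
    conn, 'SELECT COUNT(*) FROM "PostRead" WHERE "userId" = ?', ("some-id",)
)
for plan in (user_plan, client_plan, user_only_plan):
    print(plan)
```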