
Improve performance of token sort #3823

Merged: 1 commit merged into main on Jan 27, 2024
Conversation

codyebberson (Member)

We found a particularly poorly performing query today.

The FHIR query was Observation?_sort=code: all Observations sorted by the "code" search parameter.

This was run in a Medplum project with over 3 million Observation resources.

That FHIR query translated to the following SQL:

SELECT "Observation"."id", "Observation"."content"
FROM "Observation"
INNER JOIN "Observation_Token" AS "T1"
ON ("Observation"."id" = "T1"."resourceId" AND "T1"."code" = $1 AND "T1"."code" = $2)
WHERE "Observation"."deleted" = $3 
AND "Observation"."compartments" IS NOT NULL 
AND "Observation"."compartments" && ARRAY[$4,$5]::UUID[]
GROUP BY "Observation"."id" 
ORDER BY MIN("T1"."value") 
LIMIT ?

The first red flag here is ORDER BY MIN("T1"."value") - when using an aggregate function like this, Postgres cannot take advantage of the index, and the performance tanks.
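To make the index issue concrete, here is a small sketch using SQLite as a stand-in for Postgres (the planners differ, but the effect is the same); the table and index names are hypothetical:

```python
import sqlite3

# Minimal lookup-table shape: an index on (code, value) supports a plain
# ORDER BY value for a fixed code, but not ORDER BY MIN(value) over groups.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tok (resourceId TEXT, code TEXT, value TEXT);
    CREATE INDEX tok_code_value ON tok (code, value);
""")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
    return "\n".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Plain ORDER BY on the indexed column: the planner can walk the index in order.
indexed = plan("SELECT resourceId FROM tok WHERE code = 'c' ORDER BY value LIMIT 20")

# ORDER BY MIN(value): the aggregate values only exist after grouping,
# so the engine must materialize and sort them separately.
aggregated = plan(
    "SELECT resourceId FROM tok WHERE code = 'c' "
    "GROUP BY resourceId ORDER BY MIN(value) LIMIT 20"
)

print(indexed)     # uses the index; no separate sort step for the ORDER BY
print(aggregated)  # needs a temp B-tree to sort the aggregated values
```

The same logic is why Postgres abandons the index in the original query: the sort key is computed per group, not stored in any index.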

Why use an aggregate function in the first place? When joining on lookup tables, we must return only one row per resource, even when the lookup table contains multiple matching rows for that resource.
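A tiny reproduction of the duplicate-row problem, again using SQLite for illustration; "obs" stands in for the resource table and "tok" for the token lookup table (names and values are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE obs (id TEXT PRIMARY KEY);
    CREATE TABLE tok (resourceId TEXT, value TEXT);
    INSERT INTO obs VALUES ('obs-1');
    -- One Observation can have several token rows (e.g. multiple codings).
    INSERT INTO tok VALUES ('obs-1', '1234-5'), ('obs-1', '8480-6');
""")

# A naive join returns one output row per matching token row.
naive = conn.execute(
    "SELECT obs.id FROM obs JOIN tok ON obs.id = tok.resourceId"
).fetchall()
print(naive)  # [('obs-1',), ('obs-1',)] -- the resource appears twice

# Grouping with an aggregate sort key collapses it back to one row per resource.
grouped = conn.execute(
    "SELECT obs.id FROM obs JOIN tok ON obs.id = tok.resourceId "
    "GROUP BY obs.id ORDER BY MIN(tok.value)"
).fetchall()
print(grouped)  # [('obs-1',)]
```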

This PR converts this into the following query:

SELECT "Observation"."id", "Observation"."content" 
FROM "Observation" 
INNER JOIN (
  SELECT DISTINCT ON ("Observation_Token"."resourceId") "Observation_Token"."resourceId", "Observation_Token"."value" 
  FROM "Observation_Token"
  WHERE "Observation_Token"."code" = $1
) AS "T1" ON "Observation"."id" = "T1"."resourceId" 
WHERE "Observation"."deleted" = $2 
AND "Observation"."compartments" IS NOT NULL 
AND "Observation"."compartments" && ARRAY[$3,$4]::UUID[]
ORDER BY "T1"."value" 
LIMIT ?

This moves the "only one resource" logic into the subquery rather than in the outer query.
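DISTINCT ON is Postgres-specific, but its semantics can be sketched in plain Python: sort by (resourceId, value), then keep the first row of each resourceId group. The rows below are hypothetical:

```python
from itertools import groupby

rows = [("obs-1", "b"), ("obs-2", "x"), ("obs-1", "a")]

# ORDER BY "resourceId", "value" ...
rows.sort()

# ... then DISTINCT ON ("resourceId") keeps the first row per resourceId.
deduped = [next(group) for _, group in groupby(rows, key=lambda r: r[0])]
print(deduped)  # [('obs-1', 'a'), ('obs-2', 'x')] -- one row per resource
```

This is why the subquery's ORDER BY matters: it determines which value survives for each resource, and the surviving value is then a plain sortable column in the outer query.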

Postgres 10+ is quite smart about aggregating this: https://www.percona.com/blog/sql-optimizations-in-postgresql-in-vs-exists-vs-any-all-vs-join/

medplum=# EXPLAIN
medplum-# SELECT "Observation"."id", "Observation"."content"
medplum-# FROM "Observation"
medplum-# INNER JOIN "Observation_Token" AS "T1" ON ("Observation"."id" = "T1"."resourceId" AND "T1"."code" = 'code')
medplum-# WHERE ("Observation"."deleted" = false AND ("Observation"."compartments" IS NOT NULL AND "Observation"."compartments" && ARRAY['88bf1215-fe40-4dc6-80cb-a9540ddd98a5']::UUID[]))
medplum-# GROUP BY "Observation"."id"
medplum-# ORDER BY MIN("T1"."value")
medplum-# LIMIT 20;
                                                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=38491.98..38492.03 rows=20 width=1237)
   ->  Sort  (cost=38491.98..38559.82 rows=27133 width=1237)
         Sort Key: (min("T1".value))
         ->  Finalize GroupAggregate  (cost=32851.46..37769.98 rows=27133 width=1237)
               Group Key: "Observation".id
               ->  Gather Merge  (cost=32851.46..37318.77 rows=35976 width=1237)
                     Workers Planned: 2
                     ->  Partial GroupAggregate  (cost=31851.44..32166.23 rows=17988 width=1237)
                           Group Key: "Observation".id
                           ->  Sort  (cost=31851.44..31896.41 rows=17988 width=1214)
                                 Sort Key: "Observation".id
                                 ->  Parallel Hash Join  (cost=6209.40..21049.66 rows=17988 width=1214)
                                       Hash Cond: ("Observation".id = "T1"."resourceId")
                                       ->  Parallel Seq Scan on "Observation"  (cost=0.00..14694.65 rows=11305 width=1205)
                                             Filter: ((NOT deleted) AND (compartments IS NOT NULL) AND (compartments && '{88bf1215-fe40-4dc6-80cb-a9540ddd98a5}'::uuid[]))
                                       ->  Parallel Hash  (cost=5622.84..5622.84 rows=46925 width=25)
                                             ->  Parallel Seq Scan on "Observation_Token" "T1"  (cost=0.00..5622.84 rows=46925 width=25)
                                                   Filter: (code = 'code'::text)
(18 rows)

medplum=# EXPLAIN
medplum-# SELECT
medplum-#     "Observation"."id",
medplum-#     "Observation"."content"
medplum-# FROM
medplum-#     "Observation"
medplum-# INNER JOIN (
medplum(#     SELECT DISTINCT ON ("resourceId")
medplum(#         "resourceId",
medplum(#         "value"
medplum(#     FROM
medplum(#         "Observation_Token"
medplum(#     WHERE
medplum(#         "code" = 'code'
medplum(#     ORDER BY
medplum(#         "resourceId",
medplum(#         "value" -- or whatever criteria you want for choosing the row
medplum(# ) AS "T1" ON "Observation"."id" = "T1"."resourceId"
medplum-# WHERE
medplum-#     "Observation"."deleted" = false
medplum-#     AND "Observation"."compartments" IS NOT NULL
medplum-#     AND "Observation"."compartments" && ARRAY['88bf1215-fe40-4dc6-80cb-a9540ddd98a5']::UUID[]
medplum-# ORDER BY
medplum-#     "T1"."value"
medplum-# LIMIT 20;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=37930.03..37930.08 rows=20 width=1214)
   ->  Sort  (cost=37930.03..37979.31 rows=19713 width=1214)
         Sort Key: "T1".value
         ->  Hash Join  (cost=22123.49..37405.47 rows=19713 width=1214)
               Hash Cond: ("Observation".id = "T1"."resourceId")
               ->  Seq Scan on "Observation"  (cost=0.00..15210.76 rows=27133 width=1205)
                     Filter: ((NOT deleted) AND (compartments IS NOT NULL) AND (compartments && '{88bf1215-fe40-4dc6-80cb-a9540ddd98a5}'::uuid[]))
               ->  Hash  (cost=21480.69..21480.69 rows=51424 width=25)
                     ->  Subquery Scan on "T1"  (cost=20403.34..21480.69 rows=51424 width=25)
                           ->  Unique  (cost=20403.34..20966.45 rows=51424 width=25)
                                 ->  Sort  (cost=20403.34..20684.90 rows=112621 width=25)
                                       Sort Key: "Observation_Token"."resourceId", "Observation_Token".value
                                       ->  Seq Scan on "Observation_Token"  (cost=0.00..8258.81 rows=112621 width=25)
                                             Filter: (code = 'code'::text)
(14 rows)

Analysis from ChatGPT:

To determine which query will be more performant, let's analyze and compare the key components of their execution plans:

Query 1 (with GROUP BY and ORDER BY MIN("T1"."value"))

  1. Cost Estimate: The total cost estimate ranges from 38491.98 to 38559.82. This is relatively high, indicating a potentially more expensive operation.

  2. Sort and GroupAggregate: It performs a sort operation for the GROUP BY and then a GroupAggregate. These operations, especially on large datasets, can be computationally expensive.

  3. Parallel Execution: The query uses parallel processing (Parallel Seq Scan and Parallel Hash). Parallel processing can improve performance on systems with multiple CPU cores.

Query 2 (with DISTINCT ON in a Subquery)

  1. Cost Estimate: The total cost estimate ranges from 37930.03 to 37979.31, which is slightly lower than the first query.

  2. Hash Join and Seq Scan: This query also performs a hash join, but it relies on a subquery with a DISTINCT ON operation. This approach is generally more efficient than a GroupAggregate for selecting distinct rows.

  3. Subquery Scan and Unique: The subquery applies a Unique operation after sorting, which is typically more efficient than grouping and aggregating.

Performance Comparison and Insights

  • Cost Estimates: The cost estimates for the second query are slightly lower than the first, suggesting it might be more efficient. However, the difference is not substantial.

  • Operation Complexity: The first query's use of GroupAggregate is typically more expensive than the second query's use of Unique after sorting in the subquery.

  • Parallelism: Only the first query's plan actually uses parallel workers here; the second plan runs serially and is still estimated to be cheaper.

Conclusion

Based on the execution plans, the second query (with DISTINCT ON in the subquery) is likely to be more performant than the first query, primarily due to the more efficient handling of distinct values and potentially less costly operations.

@codyebberson codyebberson requested a review from a team as a code owner January 27, 2024 01:09
@reshmakh reshmakh added the search Features and fixes related to search label Jan 27, 2024
@reshmakh reshmakh added this to the January 31st, 2024 milestone Jan 27, 2024
@codyebberson codyebberson merged commit 9c35dc2 into main Jan 27, 2024
18 of 19 checks passed
@codyebberson codyebberson deleted the cody-token-sort-performance branch January 27, 2024 17:05
medplumbot added a commit that referenced this pull request Jan 31, 2024
Fixes #3794 - MeasureReport.period search (#3850)
Extra check for vmcontext bots (#3863)
Add and use vite-plugin-turbosnap (#3849)
Downgrade chromatic (#3848)
Repo sql fixes for cockroachdb (#3844)
Remove Health Gorilla from medplum-demo-bots (#3845)
fix-3815 cache presigned s3 binary urls (#3834)
Use tsvector index for token text search (#3791)
rate limit should return `OperationOutcome` (#3843)
Add global var "module" to vm context bots (#3842)
Fix lookup table tsv indexes (#3841)
Always use estimate count first (#3840)
Disambiguate getClient (#3839)
Fix invalid mermaid graph in diagnostic catalog docs (#3836)
fix-3809 race condition in Subscription extension fhir-path-criteria-expression %previous value lookup (#3810)
Fix Sonar code smells: mark React props readonly (#3832)
RDS proxy (#3827)
Fixed lookup tables in migration generator (#3830)
Fixed deprecated jest matchers (#3831)
Update README.md (#3828)
Update fhir-basics.md (#3829)
Case study content and images (#3820)
Added rdsReaderInstanceType and RDS upgrade docs (#3826)
Dependency upgrades (#3825)
Separate search popup menus for 'text' and 'token' (#3824)
Improve performance of token sort (#3823)
Additional logging (#3790)
Fix calendar input button style (#3817)
Don't add _total default in SearchControl (#3818)
Dark mode (#3814)
Fixes #3812 - FHIR profile cache bug (#3813)
Document using medplum client to integrate with external FHIR servers (#3811)
Use specific advisory locks (#3805)
Nested transactions (#3788)
Fix signin page on graphiql (#3802)
fix(heartbeat): start heartbeat on first bind to sub (#3793)
Fix async job tests (#3795)
Document using vm context bots (#3784)
Refactored access policy docs based on customer feedback (#3785)
Support Redis TLS config from Env (#3787)
feat(subscriptions): add `heartbeat` for WS subs (#3740)
Update Bot metrics (#3763)
github-merge-queue bot pushed a commit that referenced this pull request Jan 31, 2024, with the same list of commits as above.