-
Notifications
You must be signed in to change notification settings - Fork 329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of token sort #3823
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The latest updates on your projects. Learn more about Vercel for Git ↗︎ |
Quality Gate passedKudos, no new issues were introduced! 0 New issues |
reshmakh
approved these changes
Jan 27, 2024
rahul1
approved these changes
Jan 27, 2024
Merged
medplumbot
added a commit
that referenced
this pull request
Jan 31, 2024
Fixes #3794 - MeasureReport.period search (#3850) Extra check for vmcontext bots (#3863) Add and use vite-plugin-turbosnap (#3849) Downgrade chromatic (#3848) Repo sql fixes for cockroachdb (#3844) Remove Health Gorilla from medplum-demo-bots (#3845) fix-3815 cache presigned s3 binary urls (#3834) Use tsvector index for token text search (#3791) rate limit should return `OperationOutcome` (#3843) Add global var "module" to vm context bots (#3842) Fix lookup table tsv indexes (#3841) Always use estimate count first (#3840) Disambiguate getClient (#3839) Fix invalid mermaid graph in diagnostic catalog docs (#3836) fix-3809 race condition in Subscription extension fhir-path-criteria-expression %previous value lookup (#3810) Fix Sonar code smells: mark React props readonly (#3832) RDS proxy (#3827) Fixed lookup tables in migration generator (#3830) Fixed deprecated jest matchers (#3831) Update README.md (#3828) Update fhir-basics.md (#3829) Case study content and images (#3820) Added rdsReaderInstanceType and RDS upgrade docs (#3826) Dependency upgrades (#3825) Separate search popup menus for 'text' and 'token' (#3824) Improve performance of token sort (#3823) Additional logging (#3790) Fix calendar input button style (#3817) Don't add _total default in SearchControl (#3818) Dark mode (#3814) Fixes #3812 - FHIR profile cache bug (#3813) Document using medplum client to integrate with external FHIR servers (#3811) Use specific advisory locks (#3805) Nested transactions (#3788) Fix signin page on graphiql (#3802) fix(heartbeat): start heartbeat on first bind to sub (#3793) Fix async job tests (#3795) Document using vm context bots (#3784) Refactored access policy docs based on customer feedback (#3785) Support Redis TLS config from Env (#3787) feat(subscriptions): add `heartbeat` for WS subs (#3740) Update Bot metrics (#3763)
github-merge-queue bot
pushed a commit
that referenced
this pull request
Jan 31, 2024
Fixes #3794 - MeasureReport.period search (#3850) Extra check for vmcontext bots (#3863) Add and use vite-plugin-turbosnap (#3849) Downgrade chromatic (#3848) Repo sql fixes for cockroachdb (#3844) Remove Health Gorilla from medplum-demo-bots (#3845) fix-3815 cache presigned s3 binary urls (#3834) Use tsvector index for token text search (#3791) rate limit should return `OperationOutcome` (#3843) Add global var "module" to vm context bots (#3842) Fix lookup table tsv indexes (#3841) Always use estimate count first (#3840) Disambiguate getClient (#3839) Fix invalid mermaid graph in diagnostic catalog docs (#3836) fix-3809 race condition in Subscription extension fhir-path-criteria-expression %previous value lookup (#3810) Fix Sonar code smells: mark React props readonly (#3832) RDS proxy (#3827) Fixed lookup tables in migration generator (#3830) Fixed deprecated jest matchers (#3831) Update README.md (#3828) Update fhir-basics.md (#3829) Case study content and images (#3820) Added rdsReaderInstanceType and RDS upgrade docs (#3826) Dependency upgrades (#3825) Separate search popup menus for 'text' and 'token' (#3824) Improve performance of token sort (#3823) Additional logging (#3790) Fix calendar input button style (#3817) Don't add _total default in SearchControl (#3818) Dark mode (#3814) Fixes #3812 - FHIR profile cache bug (#3813) Document using medplum client to integrate with external FHIR servers (#3811) Use specific advisory locks (#3805) Nested transactions (#3788) Fix signin page on graphiql (#3802) fix(heartbeat): start heartbeat on first bind to sub (#3793) Fix async job tests (#3795) Document using vm context bots (#3784) Refactored access policy docs based on customer feedback (#3785) Support Redis TLS config from Env (#3787) feat(subscriptions): add `heartbeat` for WS subs (#3740) Update Bot metrics (#3763)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
We found a particularly poor performing query today.
The FHIR query was:
Observation?_sort=code
, all observations sorted by "code"This was run in a Medplum project with over 3 million
Observation
resources.That FHIR query translated to the following SQL:
The first red flag here is
ORDER BY MIN("T1"."value")
- when using an aggregate function like this, Postgres cannot take advantage of the index, and the performance tanks.Why use an aggregate function in the first place? When joining on lookup tables, we need to ensure that we only return one row per resource. We need to handle the case where there are multiple matching rows in the lookup table.
This PR converts this into the following query:
This moves the "only one resource" logic into the subquery rather than in the outer query.
Postgres 10+ is quite smart about aggregating this: https://www.percona.com/blog/sql-optimizations-in-postgresql-in-vs-exists-vs-any-all-vs-join/
Analysis from ChatGPT:
To determine which query will be more performant, let's analyze and compare the key components of their execution plans:
Query 1 (with
GROUP BY
andORDER BY MIN("T1"."value")
)Cost Estimate: The total cost estimate goes from
38491.98
to38559.82
. This is relatively high, indicating a potentially more expensive operation.Sort and GroupAggregate: It performs a sort operation for the
GROUP BY
and then aGroupAggregate
. These operations, especially on large datasets, can be computationally expensive.Parallel Execution: The query uses parallel processing (
Parallel Seq Scan
andParallel Hash
). Parallel processing can improve performance on systems with multiple CPU cores.Query 2 (with
DISTINCT ON
in a Subquery)Cost Estimate: The total cost estimate ranges from
37930.03
to37979.31
, which is slightly lower than the first query.Hash Join and Seq Scan: This query also performs a hash join, but it relies on a subquery with a
DISTINCT ON
operation. This approach is generally more efficient than aGroupAggregate
for selecting distinct rows.Subquery Scan and Unique: The subquery applies a
Unique
operation after sorting, which is typically more efficient than grouping and aggregating.Performance Comparison and Insights
Cost Estimates: The cost estimates for the second query are slightly lower than the first, suggesting it might be more efficient. However, the difference is not substantial.
Operation Complexity: The first query's use of
GroupAggregate
is typically more expensive than the second query's use ofUnique
after sorting in the subquery.Parallelism: Both queries use parallel processing, which can improve performance on multi-core systems.
Conclusion
Based on the execution plans, the second query (with
DISTINCT ON
in the subquery) is likely to be more performant than the first query, primarily due to the more efficient handling of distinct values and potentially less costly operations.