add postgresql index and use IN instead of many OR #4670

Merged: 3 commits into main from trinity/postgres-optim on Mar 5, 2024

trinity-1686a
Contributor

Description

fix part of #4658 (we'll keep that issue open to explore other indexes we may want to add):
add an index on splits(index_uid)

fix #4666:
use index_uid IN (<list of index UIDs>) instead of index_uid = 'index1' OR index_uid = 'index2' OR ... when querying multiple indexes.
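To make the difference between the two query shapes concrete, here is a minimal sketch in plain Rust that renders both WHERE clauses with numbered PostgreSQL placeholders. The function names `index_uid_in_clause` and `index_uid_or_clause` are hypothetical and are not taken from this PR's diff, which is not shown on this page; the actual change may build the query differently.

```rust
/// Hypothetical helper: renders `"index_uid" IN ($1, $2, ...)` for `n` bound values.
/// Returns `None` for an empty list, because `IN ()` is not valid PostgreSQL syntax.
fn index_uid_in_clause(n: usize) -> Option<String> {
    if n == 0 {
        return None;
    }
    // One numbered placeholder per index UID: $1, $2, ..., $n.
    let placeholders: Vec<String> = (1..=n).map(|i| format!("${i}")).collect();
    Some(format!("\"index_uid\" IN ({})", placeholders.join(", ")))
}

/// Hypothetical helper: renders the older shape,
/// `"index_uid" = $1 OR "index_uid" = $2 OR ...`, for comparison.
fn index_uid_or_clause(n: usize) -> Option<String> {
    if n == 0 {
        return None;
    }
    // One equality term per index UID, joined with OR.
    let terms: Vec<String> = (1..=n)
        .map(|i| format!("\"index_uid\" = ${i}"))
        .collect();
    Some(terms.join(" OR "))
}
```

Both clauses take the same bound values in the same order, so under this sketch switching from one to the other only changes the SQL text, not the parameter binding.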

How was this PR tested?

Tested by measuring execution time and checking the query plan on an index with a few hundred thousand splits.
With the new index, SELECT * FROM "splits" WHERE "index_uid" = $1 is ~1900x faster on that particular dataset (from 5.8s to 3.1ms).
Selecting with a list of 1k indexes, covering almost every split, is ~3.5x faster (from 3s to 850ms) with IN than with many ORs. When searching a single index, using IN instead of a single = changes neither the execution time nor the query plan.

@trinity-1686a trinity-1686a merged commit a65e269 into main Mar 5, 2024
4 checks passed
@trinity-1686a trinity-1686a deleted the trinity/postgres-optim branch March 5, 2024 13:17
Development

Successfully merging this pull request may close these issues.

when listing splits from multiple index in postgres metastore, use IN instead of many OR
2 participants