Performance Issue with Large Tables and HNSW Indexes #455
Comments
Hi @williamjeong2, can you paste the output of |
Hi @ankane, here is the output (768 dim):
It looks like the buffer hit rate is pretty low, so a lot of reads are happening from disk. I suspect you'll see better performance with an SSD, especially since HNSW does a lot of random access. I don't think partitioning will help in this situation (unless you're filtering by the partition key). You could also try prewarming the index with pg_prewarm.
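For reference, one way to check the buffer hit rate for a specific index is to query the `pg_statio_user_indexes` view; the index name below is a hypothetical placeholder, and this needs to be run against the live database:

```sql
-- Buffer hit rate for one index ('items_embedding_idx' is a placeholder name).
-- idx_blks_hit = blocks found in shared_buffers; idx_blks_read = blocks read from disk.
SELECT idx_blks_hit,
       idx_blks_read,
       round(idx_blks_hit::numeric
             / nullif(idx_blks_hit + idx_blks_read, 0), 4) AS hit_rate
FROM pg_statio_user_indexes
WHERE indexrelname = 'items_embedding_idx';
```

A hit rate well below 1.0 means most index reads are going to disk rather than being served from `shared_buffers`.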
@ankane |
Awesome, sounds good.
I measured QPS after just moving the database to an SSD (each line is a different table, with more rows as you go down: the top row has 10 million rows and the bottom row has 20 million rows):
And this is the output for the table that had slow QPS,
and for the table that had fast QPS.
Things are much better than before, but it looks like only the fastest tables are in the buffer. There is still a 'pg_prewarm' task left to try. However, in my case, my largest table is 100 GB excluding indexes (80 GB), so I don't expect it to work well. But I will give it a try.
@williamjeong2 Considering you have a 20 GB HNSW index, it is likely that pg_prewarm will help. Otherwise, 'natural' warm-up could take a long time. To check, try running your test for around 1000 seconds and see how TPS evolves. Or just use pg_prewarm :)
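For reference, warming an index with pg_prewarm looks roughly like this; the index name is a hypothetical placeholder, and the statements need a live PostgreSQL server:

```sql
-- pg_prewarm ships with PostgreSQL's contrib modules
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Read the whole index into shared_buffers; returns the number of blocks loaded
-- ('items_embedding_idx' is a placeholder name)
SELECT pg_prewarm('items_embedding_idx');
```

Note that the index only stays warm as long as `shared_buffers` has room for it; if the combined index size exceeds `shared_buffers`, prewarming can only keep part of it resident.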
@ankane @pashkinelfe Your suggestions were very helpful. This is the result of using 'pg_prewarm'. In the results below, the top two rows used 'pg_prewarm' to warm up the indexes. It didn't take long.
To summarize, using 'pg_prewarm' has performed the best in my case so far. Of course, since I have a very large index size (400 GB for the vector indexes of all tables combined), it will require a very large amount of memory. Nevertheless, I think it should be possible to achieve faster TPS than what I have now, because I have seen very fast QPS results in blog posts, including Supabase's. Other variables such as vector length, PostgreSQL configuration, etc. remain to be explored.
Great, thanks for sharing @williamjeong2!
Hello,
I'm currently facing performance challenges with pgvector on PostgreSQL, particularly with large tables and queries taking significant time to execute. I'd like to share my situation and seek advice on potential optimizations or configurations that could improve performance.
Environment & Configuration:
shared_buffers is set to 80GB, and effective_cache_size is set to 120GB.
Issues & Observations:
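For context, the corresponding lines in postgresql.conf for the setup described above would look like this (a configuration fragment, not a recommendation):

```
shared_buffers = 80GB
effective_cache_size = 120GB
```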
Given the above configuration and the challenges faced, I have a few questions:
I understand that the hardware might be a limiting factor, but any guidance on optimizing PostgreSQL or pgvector settings to mitigate some of these performance issues would be greatly appreciated.
Thank you for your time and assistance.