Add AVX-512 support to Hamming and Jaccard distance functions. #519
Nice! Here is some benchmarking of this patch on an r6i.16xlarge with gcc 11, using the Jaccard ops on the dbpedia-openai-1000k-angular dataset (binary quantized):
Some further tests on an r6i.16xlarge with gcc 11, using the Hamming ops on the dbpedia-openai-1000k-angular dataset (binary quantized), with m=16 and ef_construction=512, on the https://github.com/pgvector/pgvector/tree/hamming-performance-test branch:
Awesome, thanks @nathan-bossart and @jkatz! Added CPU dispatching for this in the bit-dispatch branch (thanks to your recent work on Postgres). Let me know what you think.
Your bit-dispatch branch looks pretty solid to me. A couple of small notes:
We use this to assume the presence of
Commit postgres/postgres@02a6a54 suggests that
Again, if you're only interested in newer compilers, I'd bet that assuming the presence of these intrinsics with those macros is sufficient, but it might not be on older systems. Presumably this is what the TODO is referring to. You might also want to do an |
Thanks @nathan-bossart, this is really helpful. For the AVX check, should it use |
I think you are right about that. That bit seems to indicate that both the OS and the processor support XGETBV, not just the processor.
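To make the point about OS versus processor support concrete, here is a minimal sketch of a runtime check in C. The thread does not show which bit is being discussed, so this assumes it is OSXSAVE (CPUID leaf 1, ECX bit 27). The function name `avx512_popcount_usable` and the helper `read_xcr0` are hypothetical, and this is not the code from the bit-dispatch branch. It relies on the `__get_cpuid` and `__get_cpuid_count` helpers from the `<cpuid.h>` header shipped with GCC and Clang, and returns false on non-x86 targets.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Hypothetical helper: read XCR0 with XGETBV (only safe once OSXSAVE is set). */
static uint64_t
read_xcr0(void)
{
	uint32_t	eax;
	uint32_t	edx;

	__asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
	return ((uint64_t) edx << 32) | eax;
}
#endif

/* Hypothetical check: can AVX-512 popcount instructions be used right now? */
bool
avx512_popcount_usable(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* Leaf 1, ECX bit 27 (OSXSAVE): the OS has enabled XSAVE/XGETBV. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;
	if ((ecx & (1u << 27)) == 0)
		return false;

	/* XCR0 bits 1-2 (SSE/AVX) and 5-7 (AVX-512 state) must be OS-enabled. */
	if ((read_xcr0() & 0xe6) != 0xe6)
		return false;

	/* Leaf 7: EBX bit 16 is AVX512F, ECX bit 14 is AVX512VPOPCNTDQ. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return (ebx & (1u << 16)) != 0 && (ecx & (1u << 14)) != 0;
#else
	return false;
#endif
}
```

The order matters: XGETBV is only executed after OSXSAVE has been confirmed, since the instruction faults when the OS has not enabled it.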
I think the most recent version of the bit-dispatch branch should address all of the issues you mentioned above. Edit: Besides the |
LGTM
Co-authored-by: Nathan Bossart <nathan@postgresql.org> Co-authored-by: "Jonathan S. Katz" <jkatz@users.noreply.github.com>
Great, thanks for driving this! Merged in the commit above.
Thanks for merging!
These distance functions are a natural fit for AVX-512 instructions. I'm seeing a decent speedup on top of the ongoing work to process these 64 bits at a time.
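For readers unfamiliar with the "64 bits at a time" approach, here is a minimal portable sketch in C. The function name `hamming_distance_64` is hypothetical and this is not pgvector's implementation; it uses the GCC/Clang builtin `__builtin_popcountll`. An AVX-512 version would follow the same XOR-then-popcount pattern on 512-bit registers, for example with `_mm512_popcnt_epi64` on CPUs that support AVX512VPOPCNTDQ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: Hamming distance between two equal-length bit strings. */
uint64_t
hamming_distance_64(const unsigned char *a, const unsigned char *b, size_t bytes)
{
	uint64_t	distance = 0;
	size_t		i = 0;

	/* Main loop: XOR eight bytes at a time and count the differing bits. */
	for (; i + sizeof(uint64_t) <= bytes; i += sizeof(uint64_t))
	{
		uint64_t	wa;
		uint64_t	wb;

		/* memcpy avoids unaligned-access and aliasing problems. */
		memcpy(&wa, a + i, sizeof(wa));
		memcpy(&wb, b + i, sizeof(wb));
		distance += (uint64_t) __builtin_popcountll(wa ^ wb);
	}

	/* Tail: handle the remaining bytes one at a time. */
	for (; i < bytes; i++)
		distance += (uint64_t) __builtin_popcount((unsigned int) (a[i] ^ b[i]));

	return distance;
}
```

A Jaccard distance can reuse the same loop structure, counting the set bits of `a & b`, `a`, and `b` instead of `a ^ b`.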
On v17 (which uses AVX-512 for pg_popcount() when possible), with ~100k randomly generated 2000-bit vectors, maintenance_work_mem = '8GB', and max_parallel_maintenance_workers = 1, I am seeing the following results:

At commit d3c49f1b7d:

With this patch:
I am quite skeptical that I've set up the attributes correctly, but this seems to be enough to get it working on my machine for benchmark purposes. If we want to proceed with these changes, I can spend more time on that.