Can I help get tinyint or half branches released? #326
Comments
Hi @nathanwilk7, the most helpful thing would be sharing specific use cases for either data type (how the vectors are generated and what specifically they're used for). These may be included in 0.6.0, but I'd like to understand how common they will be.
Sure, I'll add my use cases here. For context, we're doing chem/bio ML type work. Thanks for your thoughts on the below. Our smaller datasets have about 300 million (300M) molecules in them. For those molecules there are a few different types of vectors we'd like to generate. Some of these vectors are generated via cheminformatics methods (basically a molecular hash function with certain similarity properties) and others are generated via embeddings from various ML models.
For some higher-level context, I'm currently running Postgres via GCP Cloud SQL to store our other molecular data, and it would be nice to integrate the molecular fingerprints/counts and embeddings into Postgres as well instead of needing to bring in another ANN lib/service (e.g., faiss, pinecone, etc.). I ran some rough numbers on storage cost and found that, using the current pgvector, I estimate a (very hand-wavy, back-of-the-envelope) storage cost of about $2k-$3k/yr for each set of 300M molecule fingerprint/count vectors I store. Cutting that storage cost by 50% or 75% would make it a lot easier to move forward, especially since I'll want to store multiple embedding vectors for different models (a rough sizing sketch follows this comment). The general user stories we have are:
Lastly, in a perfect world I'd like to load up these representations for 36 billion molecules (Enamine REAL) into a vector database and run the same types of searches as above. Happy to discuss these use cases more and provide more context, of course. Please let me know if there's anything I can do to push efforts forward on smaller data sizes, product quantization, or support for sparse/bit vectors.
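For a rough sense of where numbers like that come from: pgvector's `vector` type stores 4 bytes per dimension plus a small per-value header, so the raw column size scales linearly with dimension and row count. A minimal sizing sketch (the 1024-dimension figure is an assumption for illustration, not something from this thread):

```sql
-- Per-value size for the vector type: 4 bytes per dimension + 8 bytes of header
SELECT pg_column_size('[1,2,3]'::vector);  -- 20 bytes (3 * 4 + 8)

-- Raw column size for 300M rows of a hypothetical 1024-dim vector, before indexes,
-- page overhead, and replication: roughly 1.1 TB
SELECT pg_size_pretty(300000000::bigint * (1024 * 4 + 8));
```

Halving each element (fp16) or storing count fingerprints as single bytes would cut that roughly 2x or 4x, which is the 50-75% reduction discussed above.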
Thanks @nathanwilk7, this is really great context (and a great explanation)! I really appreciate it. I agree that sparse vectors are probably a better option for 1. I'd like to see what other use cases people have before deciding to add.
Essentially we would have the very same use case as @nathanwilk7 and would see great benefit in having these types represented in pgvector. Very excited to see this!
I agree that adding sparse vector support would be an amazing feature. I currently store 300k vectors with 5k dimensions, but most of the entries are 0, so storing them as sparse vectors would save a bunch of space.
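As a sketch of what that could look like: pgvector later added a `sparsevec` type whose text format lists only the non-zero entries as `{index:value,...}/dimensions`. This didn't exist at the time of the comment above, so treat the following purely as an illustration:

```sql
-- Hypothetical table; only the non-zero entries of each 5000-dim vector are stored
CREATE TABLE docs (id bigserial PRIMARY KEY, embedding sparsevec(5000));
INSERT INTO docs (embedding) VALUES ('{7:0.5,124:1.0,4999:0.25}/5000');
SELECT id FROM docs ORDER BY embedding <-> '{7:0.4,124:1.1}/5000' LIMIT 5;
```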
In many other embedding databases, trading some precision for lower storage is often a valid option. My example: we have a database of around 1 TB when we build it with Postgres. Compared to, for instance, faiss, that's about 3x bloat, as the Postgres implementation has overhead in how data is persisted (probably for good reason). However, for some customers, a 2x disk space reduction would make this "a go". We confirmed that using fp16 doesn't affect our accuracy much, and it would be a great option to have.
I see some acceptance of the potential accuracy loss given the benefits. So maybe if we reduce that loss even further, we can draw even more attention to this feature soon. I don't see any mention of brain floats or other sub-single-precision representations. Has any investigation into those been done yet? I mention it because much of the precision is lost not when the mantissa is reduced, but when the exponent is. So going from single-precision floats to standard IEEE half floats is kinda painful. But a representation like brain float keeps the exponent the same size as in the original single precision. It's also available in most current compilers and should be vectorizable from SSE through AVX512BF16.
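For reference, the standard bit layouts (sign + exponent + mantissa) behind that argument are:

$$\text{fp32}: 1+8+23, \qquad \text{fp16}: 1+5+10 \;(\max \approx 6.5\times10^{4}), \qquad \text{bf16}: 1+8+7 \;(\max \approx 3.4\times10^{38})$$

So bf16 keeps the fp32 exponent (and therefore its dynamic range) and only gives up mantissa bits, while fp16 can overflow or underflow for values that were unremarkable in fp32.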
There's now a new halfvec branch for this for anyone who wants to try it (in a non-production environment).

```sql
CREATE TABLE items (id bigserial PRIMARY KEY, embedding halfvec(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
CREATE INDEX ON items USING hnsw (embedding halfvec_l2_ops);
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
```

@nathanwilk7 There's also a bitvector branch that could be good for bit fingerprints (if you have the bandwidth to try it out, it'd be super helpful).

```sql
CREATE TABLE items (id bigserial PRIMARY KEY, fingerprint bit(3));
INSERT INTO items (fingerprint) VALUES (B'000'), (B'111');
CREATE INDEX ON items USING hnsw (fingerprint bit_jaccard_ops);
SELECT * FROM items ORDER BY fingerprint <%> B'101' LIMIT 5;
```

@tureba From what I've seen, bfloat16 isn't common for nearest neighbor search.
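For anyone skimming the examples: `<->` is the L2 distance operator, and `<%>` on the bit branch is Jaccard distance over bit fingerprints, matching the `halfvec_l2_ops` and `bit_jaccard_ops` operator classes used in the indexes.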
Very nice, @ankane. Thanks for sharing. I'll invest some time over the next couple of days checking the code out. But from what I see so far, there is a new data type `halfvec`. Have you experimented with indexing regular 32-bit `vector` columns with it?
Yeah, think that could be a common use case. Pushed an update for casting between `vector` and `halfvec`.

```sql
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
CREATE INDEX ON items USING hnsw ((embedding::halfvec(3)) halfvec_l2_ops);

-- no re-ranking
SELECT id FROM items ORDER BY embedding::halfvec(3) <-> '[1,2,3]' LIMIT 5;

-- re-ranking
SELECT id FROM (
    SELECT * FROM items ORDER BY embedding::halfvec(3) <-> '[1,2,3]' LIMIT 20
) ORDER BY embedding <-> '[1,2,3]' LIMIT 5;
```

May also add
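One quick way to confirm the expression index is actually being used for the cast form (purely a sanity check, not from the branch docs):

```sql
-- The plan should show a scan of the hnsw expression index rather than a sequential
-- scan; if the cast in ORDER BY doesn't match the indexed expression exactly,
-- the planner falls back to scanning the table.
EXPLAIN SELECT id FROM items ORDER BY embedding::halfvec(3) <-> '[1,2,3]' LIMIT 5;
```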
Per #326 (comment) and #326 (comment) -- I've been testing this exact case using ANN Benchmark with some modifications to support the `halfvec` type.
For my test machine, I used an r7gd.16xlarge (64 vCPU, 512 GiB RAM) and stored my data on the local disk to eliminate network latency. A handful of my PostgreSQL settings of note:
Below is a sampling of test results. Note that the ANN Benchmark test only runs a single query at a time, so this is not a test of concurrency. Below I review index size, build time, recall, and query throughput.
Dataset:
| | `vector`/`vector` | `vector`/`halfvec` |
|---|---|---|
| Index size (MB) | 793 | 544 |
| Index build time (s) | 33 | 31 |
| Recall @ ef_search=10 | 0.711 | 0.710 |
| QPS @ ef_search=10 | 2130 | 2349 |
| Recall @ ef_search=40 | 0.908 | 0.907 |
| QPS @ ef_search=40 | 1090 | 1190 |
| Recall @ ef_search=200 | 0.989 | 0.989 |
| QPS @ ef_search=200 | 297 | 322 |
m=16; ef_construction=64
| | `vector`/`vector` | `vector`/`halfvec` |
|---|---|---|
| Index size (MB) | 782 | 538 |
| Index build time (s) | 37 | 34 |
| Recall @ ef_search=10 | 0.751 | 0.751 |
| QPS @ ef_search=10 | 2180 | 2178 |
| Recall @ ef_search=40 | 0.937 | 0.937 |
| QPS @ ef_search=40 | 1058 | 1119 |
| Recall @ ef_search=200 | 0.995 | 0.995 |
| QPS @ ef_search=200 | 281 | 315 |
m=16; ef_construction=128
| | `vector`/`vector` | `vector`/`halfvec` |
|---|---|---|
| Index size (MB) | 782 | 538 |
| Index build time (s) | 44 | 40 |
| Recall @ ef_search=10 | 0.770 | 0.770 |
| QPS @ ef_search=10 | 2096 | 2241 |
| Recall @ ef_search=40 | 0.949 | 0.949 |
| QPS @ ef_search=40 | 1049 | 1110 |
| Recall @ ef_search=200 | 0.997 | 0.997 |
| QPS @ ef_search=200 | 279 | 297 |
m=16; ef_construction=256
| | `vector`/`vector` | `vector`/`halfvec` |
|---|---|---|
| Index size (MB) | 782 | 538 |
| Index build time (s) | 58 | 51 |
| Recall @ ef_search=10 | 0.776 | 0.776 |
| QPS @ ef_search=10 | 2099 | 2284 |
| Recall @ ef_search=40 | 0.954 | 0.954 |
| QPS @ ef_search=40 | 1020 | 1140 |
| Recall @ ef_search=200 | 0.998 | 0.998 |
| QPS @ ef_search=200 | 268 | 302 |
m=16; ef_construction=512
| | `vector`/`vector` | `vector`/`halfvec` |
|---|---|---|
| Index size (MB) | 7811 | 538 |
| Index build time (s) | 88 | 75 |
| Recall @ ef_search=10 | 0.776 | 0.775 |
| QPS @ ef_search=10 | 1988 | 2118 |
| Recall @ ef_search=40 | 0.956 | 0.956 |
| QPS @ ef_search=40 | 983 | 1053 |
| Recall @ ef_search=200 | 0.998 | 0.998 |
| QPS @ ef_search=200 | 261 | 279 |
Dataset: gist-960-euclidean
m=16; ef_construction=32
| | `vector`/`vector` | `vector`/`halfvec` |
|---|---|---|
| Index size (MB) | 7811 | 2603 |
| Index build time (s) | 145 | 68 |
| Recall @ ef_search=10 | 0.430 | 0.427 |
| QPS @ ef_search=10 | 1160 | 1184 |
| Recall @ ef_search=40 | 0.687 | 0.684 |
| QPS @ ef_search=40 | 532 | 581 |
| Recall @ ef_search=200 | 0.905 | 0.903 |
| QPS @ ef_search=200 | 141 | 163 |
m=16; ef_construction=64
| | `vector`/`vector` | `vector`/`halfvec` |
|---|---|---|
| Index size (MB) | 7684 | 2561 |
| Index build time (s) | 161 | 77 |
| Recall @ ef_search=10 | 0.476 | 0.476 |
| QPS @ ef_search=10 | 1178 | 1223 |
| Recall @ ef_search=40 | 0.740 | 0.742 |
| QPS @ ef_search=40 | 541 | 592 |
| Recall @ ef_search=200 | 0.939 | 0.933 |
| QPS @ ef_search=200 | 143 | 160 |
m=16; ef_construction=128
| | `vector`/`vector` | `vector`/`halfvec` |
|---|---|---|
| Index size (MB) | 7680 | 2560 |
| Index build time (s) | 190 | 101 |
| Recall @ ef_search=10 | 0.497 | 0.502 |
| QPS @ ef_search=10 | 1177 | 1177 |
| Recall @ ef_search=40 | 0.771 | 0.770 |
| QPS @ ef_search=40 | 535 | 568 |
| Recall @ ef_search=200 | 0.952 | 0.953 |
| QPS @ ef_search=200 | 141 | 154 |
m=16; ef_construction=256
| | `vector`/`vector` | `vector`/`halfvec` |
|---|---|---|
| Index size (MB) | 7678 | 2559 |
| Index build time (s) | 247 | 147 |
| Recall @ ef_search=10 | 0.505 | 0.499 |
| QPS @ ef_search=10 | 1114 | 1187 |
| Recall @ ef_search=40 | 0.780 | 0.784 |
| QPS @ ef_search=40 | 513 | 570 |
| Recall @ ef_search=200 | 0.960 | 0.960 |
| QPS @ ef_search=200 | 135 | 152 |
m=16; ef_construction=512
| | `vector`/`vector` | `vector`/`halfvec` |
|---|---|---|
| Index size (MB) | 7678 | 2559 |
| Index build time (s) | 349 | 229 |
| Recall @ ef_search=10 | 0.508 | 0.508 |
| QPS @ ef_search=10 | 1105 | 1149 |
| Recall @ ef_search=40 | 0.789 | 0.785 |
| QPS @ ef_search=40 | 503 | 551 |
| Recall @ ef_search=200 | 0.968 | 0.967 |
| QPS @ ef_search=200 | 134 | 145 |
Analysis
Overall, if the vector/halfvec index takes less time to build, shrinks index size 3x while maintaining comparable recall, and has a ~10% throughput improvement, that seems to be a winner. I'd like to see the full comparison with dbpedia-openai-1000k-angular (should have that tomorrow), but generally these results are positive for fp32 => fp16 scalar quantization.
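To make the column labels concrete: `vector`/`vector` indexes the full-precision column with full-precision ops, while `vector`/`halfvec` indexes the same column through the fp16 cast from the earlier comment. A rough sketch of the two index definitions (table/column names and the dimension are placeholders, not the exact benchmark setup):

```sql
-- vector/vector: full-precision (fp32) HNSW index on the fp32 column
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)
  WITH (m = 16, ef_construction = 64);

-- vector/halfvec: same fp32 column, fp16 expression index (the scalar quantization above)
CREATE INDEX ON items USING hnsw ((embedding::halfvec(960)) halfvec_l2_ops)
  WITH (m = 16, ef_construction = 64);
```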
Using the methodology in #326 (comment), below are the results for dbpedia-openai-1000k-angular. Top-posting the analysis: there are similar patterns to the above -- roughly comparable recall, but noticeably faster index builds and a roughly 2x smaller index with vector/halfvec.
Dataset: dbpedia-openai-1000k-angular
| | `vector`/`vector` | `vector`/`halfvec` |
|---|---|---|
| Index size (MB) | 7734 | 3867 |
| Index build time (s) | 244 | 77 |
| Recall @ ef_search=10 | 0.762 | 0.761 |
| QPS @ ef_search=10 | 1258 | 1245 |
| Recall @ ef_search=40 | 0.913 | 0.911 |
| QPS @ ef_search=40 | 649 | 655 |
| Recall @ ef_search=200 | 0.973 | 0.973 |
| QPS @ ef_search=200 | 207 | 214 |
m=16; ef_construction=64
| | `vector`/`vector` | `vector`/`halfvec` |
|---|---|---|
| Index size (MB) | 7734 | 3867 |
| Index build time (s) | 264 | 90 |
| Recall @ ef_search=10 | 0.819 | 0.809 |
| QPS @ ef_search=10 | 1231 | 1219 |
| Recall @ ef_search=40 | 0.945 | 0.945 |
| QPS @ ef_search=40 | 627 | 642 |
| Recall @ ef_search=200 | 0.987 | 0.987 |
| QPS @ ef_search=200 | 191 | 190 |
m=16; ef_construction=128
| | `vector`/`vector` | `vector`/`halfvec` |
|---|---|---|
| Index size (MB) | 7734 | 3867 |
| Index build time (s) | 301 | 115 |
| Recall @ ef_search=10 | 0.843 | 0.844 |
| QPS @ ef_search=10 | 1199 | 1152 |
| Recall @ ef_search=40 | 0.962 | 0.962 |
| QPS @ ef_search=40 | 604 | 591 |
| Recall @ ef_search=200 | 0.993 | 0.993 |
| QPS @ ef_search=200 | 165 | 171 |
m=16; ef_construction=256
| | `vector`/`vector` | `vector`/`halfvec` |
|---|---|---|
| Index size (MB) | 7734 | 3867 |
| Index build time (s) | 377 | 163 |
| Recall @ ef_search=10 | 0.851 | 0.852 |
| QPS @ ef_search=10 | 1162 | 1162 |
| Recall @ ef_search=40 | 0.968 | 0.968 |
| QPS @ ef_search=40 | 567 | 578 |
| Recall @ ef_search=200 | 0.996 | 0.996 |
| QPS @ ef_search=200 | 156 | 163 |
m=16; ef_construction=512
| | `vector`/`vector` | `vector`/`halfvec` |
|---|---|---|
| Index size (MB) | 7734 | 3867 |
| Index build time (s) | 508 | 254 |
| Recall @ ef_search=10 | 0.853 | 0.851 |
| QPS @ ef_search=10 | 1118 | 1079 |
| Recall @ ef_search=40 | 0.971 | 0.971 |
| QPS @ ef_search=40 | 530 | 540 |
| Recall @ ef_search=200 | 0.997 | 0.997 |
| QPS @ ef_search=200 | 144 | 149 |
Nice results. Those numbers are roughly what I expected to see, based on past experiments. Though I admit I'm having trouble running ann-benchmarks as fluidly as you are.
Just FYI for anyone testing on x86-64: some key
Hi @ankane and @jkatz. I've tested HNSW index build time on ARM (Graviton 3) using the same workflow as #409 (comment). Unfortunately, I get only a marginal performance difference for halfvec compared to vector for serial builds, though the performance difference with many cores is still there. I've looked at the halfvec code and saw that the product calculation functions for halfvec don't perform better than those for vector. It seems we have some potential for speedup, but as of now it's only used for x86. Did I possibly miss something?
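For anyone wanting to reproduce that kind of serial-vs-parallel comparison, a sketch of the timing setup (psql; table/column names are assumed and this is not the exact workflow from #409):

```sql
-- Force a serial build for comparison (parallel HNSW builds otherwise use multiple workers)
SET max_parallel_maintenance_workers = 0;
SET maintenance_work_mem = '8GB';
\timing on
CREATE INDEX ON items USING hnsw (embedding halfvec_l2_ops);  -- assumes a halfvec column
```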
Hey @pashkinelfe, thanks for testing/sharing. Which commit hash are you using?
Hi @ankane,
Build string:
Is there more to do on the `tinyint` or `half` branches to get them released, or are they ready to be put into `0.5.1`? If there is more to do for them, let me know and I'll see if it's something I could take care of (e.g., code, docs, tests, etc.).