
Can I help get tinyint or half branches released? #326

Closed
nathanwilk7 opened this issue Oct 29, 2023 · 19 comments
@nathanwilk7

Is there more to do on the tinyint or half branches to get them released, or are they ready to be put into 0.5.1? If there is more to do for them, let me know and I'll see if it's something I could take care of (e.g.: code, docs, tests, etc.).

@nathanwilk7 changed the title from "Can I Help Get tinyint or half branches released?" to "Can I help get tinyint or half branches released?" on Oct 29, 2023
@ankane
Member

ankane commented Oct 29, 2023

Hi @nathanwilk7, the most helpful thing would be sharing specific use cases for either data type (how the vectors are generated and what specifically they're used for). These may be included in 0.6.0, but I'd like to understand how common they will be.

@nathanwilk7
Author

nathanwilk7 commented Oct 31, 2023

Sure, I'll add my use cases here. For context, we're doing chem/bio ML type work. Thanks for your thoughts on the below.

Our smaller datasets have about 300 million (300M) molecules in them. For those molecules there are a few different types of vectors we'd like to generate. Some of these vectors are generated via cheminformatics methods (basically a molecular hash function with certain similarity properties) and others are generated via embeddings from various ML models.

  1. 300M sparse "count" vectors w/ 1024-2048 dimensions (Morgan Count Fingerprints). Most of the dimensions will be 0, but the non-zero ones will be integers between 0 and some small N (e.g.: 32 would likely be a reasonable upper bound, 256 would be extremely high). For these vectors, we'd like to be able to do ANN searches across them using the standard distance measures (inner product, cosine, L2). The goal here is to be able to run searches to find the most similar molecules to a given molecule or molecule fragment. The tinyint type seems like it could be a good fit here, though supporting sparse vectors would potentially be an even bigger win; dense tinyint might be good enough for now.
  2. 300M embedding vectors w/ 128-1024 dimensions where each dimension is a non-zero decimal number (these are essentially the same as the standard embeddings everyone uses for various ML tasks). We would likely be OK giving up some precision by using half-size floats, product quantization, or any other similar technique.
  3. I'll also add that we'd like 300M sparse bit vectors w/ 1024-2048 dimensions (Morgan Bit Fingerprints). For these vectors, we'd like to be able to do ANN searches across them using Tanimoto/Jaccard distance. I recognize these are probably not going to be supported as true ANN-searchable sparse bit vectors in pgvector anytime soon, so we can probably ignore this unless you think it's on the roadmap.

For some higher-level context, I'm currently running Postgres via GCP Cloud SQL to store our other molecular data, and it would be nice to integrate the molecular fingerprints/counts and embeddings into Postgres as well instead of needing to bring in another ANN lib/service (e.g.: faiss, pinecone, etc). I ran some rough numbers on storage cost and found that, using the current pgvector, I estimate a (very hand-wavy, back-of-the-envelope) storage cost of about $2k-$3k/yr for each set of 300M molecule fingerprint count vectors I store. Cutting that storage cost down by 50% or 75% would make it a lot easier to move forward, especially since I'll want to store multiple embedding vectors for different models.

The general user stories we have are (a rough SQL sketch of a couple of these follows the list):

  1. Given a molecule which is a known "hit" (e.g.: binds to a protein of interest), find the most similar molecules to the hit so we can evaluate them as well.
  2. Given embeddings which are classified as hits by a certain model, find the orderable molecules (from vendors like Enamine) which are most similar to these hits.
  3. Count the number of hits/misses among the nearest N molecules to a given molecule.
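
Roughly, in pgvector terms, stories 1 and 3 might look something like the following (purely hypothetical table/column names for illustration, using cosine distance; our real schema differs):

-- hypothetical schema just to make the queries concrete
CREATE TABLE molecules (id bigserial PRIMARY KEY, is_hit boolean, fingerprint vector(1024));

-- 1. given a known hit, find the most similar molecules to evaluate next
SELECT id FROM molecules
ORDER BY fingerprint <=> (SELECT fingerprint FROM molecules WHERE id = 42)
LIMIT 50;

-- 3. count hits/misses among the 100 nearest neighbors of a given molecule
SELECT is_hit, count(*) FROM (
    SELECT is_hit FROM molecules
    ORDER BY fingerprint <=> (SELECT fingerprint FROM molecules WHERE id = 42)
    LIMIT 100
) nearest GROUP BY is_hit;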

Lastly, in a perfect world I'd like to load up these representations for 36 billion molecules (Enamine REAL) into a vector database and run the same types of searches as above.

Happy to discuss these use cases more and provide more context of course. Please let me know if there's anything I can do to push efforts forward on smaller data sizes, product quantization, or support for sparse/bit vectors.

@ankane
Member

ankane commented Nov 1, 2023

Thanks @nathanwilk7, this is really great context (and a great explanation)! I really appreciate it.

I agree that sparse vectors are probably a better option for 1.

I'd like to see what other use cases people have before deciding to add them.

@andreassteffen

andreassteffen commented Nov 2, 2023

Essentially, we have the very same use case as @nathanwilk7 and would see great benefit in having these types available in pgvector. Very excited to see this!

@cile98

cile98 commented Nov 15, 2023

I agree that adding sparse vector support would be an amazing feature. I currently store 300k vectors with 5k dimensions, but most of the entries are 0, so storing them as sparse vectors would save a bunch of space.

@lrotim

lrotim commented Jan 11, 2024

In many other embedding databases, trading off precision against accuracy is often a valid option.

My example: our database comes out to around 1TB when we build it with Postgres. Compared to, for instance, Faiss, that's about 3x bloat, since the Postgres implementation has overhead in how data is persisted (probably for good reason). However, for some customers, halving the disk space would make it a go.

We confirmed that using fp16 doesn't affect our accuracy much, and it would be a great option to have.

@tureba

tureba commented Feb 23, 2024

I see some acceptance of the potential accuracy loss given the benefits. So maybe if we reduce that loss even further, we can draw even more attention to this feature soon.

I don't see any mention of brain floats or other representations below single precision. Has any investigation been done on those yet?

I mention that because much of the damage comes not from reducing the mantissa but from reducing the exponent: IEEE half floats drop the exponent from 8 bits to 5, capping the representable magnitude at roughly 6.5e4, so going from single precision to standard IEEE halfs is kinda painful. A representation like brain float (bfloat16) keeps the exponent at the same 8 bits as the original single precision (range up to roughly 3.4e38) and only shrinks the mantissa to 7 bits. It's also available in most current compilers, and should be vectorizable from SSE through AVX512BF16.

@ankane
Member

ankane commented Mar 27, 2024

There's now a new halfvec branch for this for anyone who wants to try it (in a non-production environment).

CREATE TABLE items (id bigserial PRIMARY KEY, embedding halfvec(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
CREATE INDEX ON items USING hnsw (embedding halfvec_l2_ops);
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
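
To get a rough sense of the per-value storage savings, something along these lines should work (a quick sketch; exact per-datum overhead may differ slightly):

-- compare on-disk datum sizes for the same values
SELECT pg_column_size('[1,2,3]'::vector(3)) AS vector_bytes,
       pg_column_size('[1,2,3]'::halfvec(3)) AS halfvec_bytes;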

@nathanwilk7 There's also a bitvector branch that could be good for bit fingerprints (if you have the bandwidth to try it out, it'd be super helpful).

CREATE TABLE items (id bigserial PRIMARY KEY, fingerprint bit(3));
INSERT INTO items (fingerprint) VALUES (B'000'), (B'111');
CREATE INDEX ON items USING hnsw (fingerprint bit_jaccard_ops);
SELECT * FROM items ORDER BY fingerprint <%> B'101' LIMIT 5;

@tureba From what I've seen, bfloat16 isn't common for nearest neighbor search.

@tureba

tureba commented Mar 28, 2024

Very nice, @ankane. Thanks for sharing. I'll invest some time over the next couple of days checking the code out.

But from what I see so far, there is a new halfvec data type, and hnsw has been taught to index tables that use it. So you can index 32-bit vector columns with indexes holding 32-bit values, and 16-bit columns with indexes holding 16-bit values.

Have you experimented with indexing regular 32-bit vector columns using 16-bit halfvec only inside the index? Maybe an index amoption could be an easy way to control that at index creation time. I ask because in my testing, doing that was enough to bring the performance gains of smaller indexes and more index items per page, leading to faster convergence with acceptable accuracy changes, without the end user having to rewrite their tables entirely.

@ankane
Member

ankane commented Mar 28, 2024

Yeah, I think that could be a common use case. I pushed an update for casting between vector and halfvec (meant to include it earlier). This allows you to do:

CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
CREATE INDEX ON items USING hnsw ((embedding::halfvec(3)) halfvec_l2_ops);

-- no re-ranking
SELECT id FROM items ORDER BY embedding::halfvec(3) <-> '[1,2,3]' LIMIT 5;

-- re-ranking
SELECT id FROM (
    SELECT * FROM items ORDER BY embedding::halfvec(3) <-> '[1,2,3]' LIMIT 20
) nearest ORDER BY embedding <-> '[1,2,3]' LIMIT 5;

May also add quantization="half" and quantization="binary" options to index creation at some point, but would like to start here.

@jkatz
Contributor

jkatz commented Mar 30, 2024

Per #326 (comment) and #326 (comment) -- I've been testing this exact case using ANN Benchmark with some modifications to support the halfvec type. I've been running two tests:

  1. Storing a vector in the table, and building the index using a vector (fp32 => fp32); in the table I call this vector/vector
  2. Storing a vector in the table, and building the index using a halfvec (fp32 => fp16); in the table I call this vector/halfvec

A handful of my halfvec tests did not run, but thus far, the results are very promising.

For my test machine, I used an r7gd.16xlarge (64 vCPU, 512GiB RAM) and stored my data on the local disk to eliminate network latency. PostgreSQL settings of note:

| Parameter | Value |
| --- | --- |
| shared_buffers | 128GB |
| max_parallel_maintenance_workers | 63 |
| maintenance_work_mem | 64GB |
| effective_cache_size | 256GB |

Below is a sampling of test results. Note that the ANN Benchmark harness runs only a single query at a time, so this is not a test of concurrency. For each configuration I report index size, build time, recall, and query throughput (QPS).

Dataset: sift-128-euclidean

m=16; ef_construction=32

| Metric | vector/vector | vector/halfvec |
| --- | --- | --- |
| Index size (MB) | 793 | 544 |
| Index build time (s) | 33 | 31 |
| Recall @ ef_search=10 | 0.711 | 0.710 |
| QPS @ ef_search=10 | 2130 | 2349 |
| Recall @ ef_search=40 | 0.908 | 0.907 |
| QPS @ ef_search=40 | 1090 | 1190 |
| Recall @ ef_search=200 | 0.989 | 0.989 |
| QPS @ ef_search=200 | 297 | 322 |

m=16; ef_construction=64

| Metric | vector/vector | vector/halfvec |
| --- | --- | --- |
| Index size (MB) | 782 | 538 |
| Index build time (s) | 37 | 34 |
| Recall @ ef_search=10 | 0.751 | 0.751 |
| QPS @ ef_search=10 | 2180 | 2178 |
| Recall @ ef_search=40 | 0.937 | 0.937 |
| QPS @ ef_search=40 | 1058 | 1119 |
| Recall @ ef_search=200 | 0.995 | 0.995 |
| QPS @ ef_search=200 | 281 | 315 |

m=16; ef_construction=128

| Metric | vector/vector | vector/halfvec |
| --- | --- | --- |
| Index size (MB) | 782 | 538 |
| Index build time (s) | 44 | 40 |
| Recall @ ef_search=10 | 0.770 | 0.770 |
| QPS @ ef_search=10 | 2096 | 2241 |
| Recall @ ef_search=40 | 0.949 | 0.949 |
| QPS @ ef_search=40 | 1049 | 1110 |
| Recall @ ef_search=200 | 0.997 | 0.997 |
| QPS @ ef_search=200 | 279 | 297 |

m=16; ef_construction=256

| Metric | vector/vector | vector/halfvec |
| --- | --- | --- |
| Index size (MB) | 782 | 538 |
| Index build time (s) | 58 | 51 |
| Recall @ ef_search=10 | 0.776 | 0.776 |
| QPS @ ef_search=10 | 2099 | 2284 |
| Recall @ ef_search=40 | 0.954 | 0.954 |
| QPS @ ef_search=40 | 1020 | 1140 |
| Recall @ ef_search=200 | 0.998 | 0.998 |
| QPS @ ef_search=200 | 268 | 302 |

m=16; ef_construction=512

| Metric | vector/vector | vector/halfvec |
| --- | --- | --- |
| Index size (MB) | 7811 | 538 |
| Index build time (s) | 88 | 75 |
| Recall @ ef_search=10 | 0.776 | 0.775 |
| QPS @ ef_search=10 | 1988 | 2118 |
| Recall @ ef_search=40 | 0.956 | 0.956 |
| QPS @ ef_search=40 | 983 | 1053 |
| Recall @ ef_search=200 | 0.998 | 0.998 |
| QPS @ ef_search=200 | 261 | 279 |

Dataset: gist-960-euclidean

m=16; ef_construction=32

| Metric | vector/vector | vector/halfvec |
| --- | --- | --- |
| Index size (MB) | 7811 | 2603 |
| Index build time (s) | 145 | 68 |
| Recall @ ef_search=10 | 0.430 | 0.427 |
| QPS @ ef_search=10 | 1160 | 1184 |
| Recall @ ef_search=40 | 0.687 | 0.684 |
| QPS @ ef_search=40 | 532 | 581 |
| Recall @ ef_search=200 | 0.905 | 0.903 |
| QPS @ ef_search=200 | 141 | 163 |

m=16; ef_construction=64

| Metric | vector/vector | vector/halfvec |
| --- | --- | --- |
| Index size (MB) | 7684 | 2561 |
| Index build time (s) | 161 | 77 |
| Recall @ ef_search=10 | 0.476 | 0.476 |
| QPS @ ef_search=10 | 1178 | 1223 |
| Recall @ ef_search=40 | 0.740 | 0.742 |
| QPS @ ef_search=40 | 541 | 592 |
| Recall @ ef_search=200 | 0.939 | 0.933 |
| QPS @ ef_search=200 | 143 | 160 |

m=16; ef_construction=128

| Metric | vector/vector | vector/halfvec |
| --- | --- | --- |
| Index size (MB) | 7680 | 2560 |
| Index build time (s) | 190 | 101 |
| Recall @ ef_search=10 | 0.497 | 0.502 |
| QPS @ ef_search=10 | 1177 | 1177 |
| Recall @ ef_search=40 | 0.771 | 0.770 |
| QPS @ ef_search=40 | 535 | 568 |
| Recall @ ef_search=200 | 0.952 | 0.953 |
| QPS @ ef_search=200 | 141 | 154 |

m=16; ef_construction=256

| Metric | vector/vector | vector/halfvec |
| --- | --- | --- |
| Index size (MB) | 7678 | 2559 |
| Index build time (s) | 247 | 147 |
| Recall @ ef_search=10 | 0.505 | 0.499 |
| QPS @ ef_search=10 | 1114 | 1187 |
| Recall @ ef_search=40 | 0.780 | 0.784 |
| QPS @ ef_search=40 | 513 | 570 |
| Recall @ ef_search=200 | 0.960 | 0.960 |
| QPS @ ef_search=200 | 135 | 152 |

m=16; ef_construction=512

| Metric | vector/vector | vector/halfvec |
| --- | --- | --- |
| Index size (MB) | 7678 | 2559 |
| Index build time (s) | 349 | 229 |
| Recall @ ef_search=10 | 0.508 | 0.508 |
| QPS @ ef_search=10 | 1105 | 1149 |
| Recall @ ef_search=40 | 0.789 | 0.785 |
| QPS @ ef_search=40 | 503 | 551 |
| Recall @ ef_search=200 | 0.968 | 0.967 |
| QPS @ ef_search=200 | 134 | 145 |

Analysis

Overall, if vector/halfvec takes less time to build and shrinks the index size up to ~3x while maintaining comparable recall and delivering a ~10% throughput improvement, that seems to be a winner. I'd like to see the full comparison with dbpedia-openai-1000k-angular (should have that tomorrow), but generally these results are positive for fp32 => fp16 scalar quantization.

@jkatz
Contributor

jkatz commented Mar 30, 2024

Using the methodology in #326 (comment), below are the results for dbpedia-openai-1000k-angular.

Top-posting the analysis: the patterns are similar to the above -- roughly comparable recall, but vector/halfvec gives ~3x faster index builds and half the space utilization. This time QPS is roughly comparable, but that's likely because each index entry takes up an entire page, even in fp16 quantized form.

Dataset: dbpedia-openai-1000k-angular

m=16; ef_construction=32

| Metric | vector/vector | vector/halfvec |
| --- | --- | --- |
| Index size (MB) | 7734 | 3867 |
| Index build time (s) | 244 | 77 |
| Recall @ ef_search=10 | 0.762 | 0.761 |
| QPS @ ef_search=10 | 1258 | 1245 |
| Recall @ ef_search=40 | 0.913 | 0.911 |
| QPS @ ef_search=40 | 649 | 655 |
| Recall @ ef_search=200 | 0.973 | 0.973 |
| QPS @ ef_search=200 | 207 | 214 |

m=16; ef_construction=64

| Metric | vector/vector | vector/halfvec |
| --- | --- | --- |
| Index size (MB) | 7734 | 3867 |
| Index build time (s) | 264 | 90 |
| Recall @ ef_search=10 | 0.819 | 0.809 |
| QPS @ ef_search=10 | 1231 | 1219 |
| Recall @ ef_search=40 | 0.945 | 0.945 |
| QPS @ ef_search=40 | 627 | 642 |
| Recall @ ef_search=200 | 0.987 | 0.987 |
| QPS @ ef_search=200 | 191 | 190 |

m=16; ef_construction=128

| Metric | vector/vector | vector/halfvec |
| --- | --- | --- |
| Index size (MB) | 7734 | 3867 |
| Index build time (s) | 301 | 115 |
| Recall @ ef_search=10 | 0.843 | 0.844 |
| QPS @ ef_search=10 | 1199 | 1152 |
| Recall @ ef_search=40 | 0.962 | 0.962 |
| QPS @ ef_search=40 | 604 | 591 |
| Recall @ ef_search=200 | 0.993 | 0.993 |
| QPS @ ef_search=200 | 165 | 171 |

m=16; ef_construction=256

| Metric | vector/vector | vector/halfvec |
| --- | --- | --- |
| Index size (MB) | 7734 | 3867 |
| Index build time (s) | 377 | 163 |
| Recall @ ef_search=10 | 0.851 | 0.852 |
| QPS @ ef_search=10 | 1162 | 1162 |
| Recall @ ef_search=40 | 0.968 | 0.968 |
| QPS @ ef_search=40 | 567 | 578 |
| Recall @ ef_search=200 | 0.996 | 0.996 |
| QPS @ ef_search=200 | 156 | 163 |

m=16; ef_construction=512

| Metric | vector/vector | vector/halfvec |
| --- | --- | --- |
| Index size (MB) | 7734 | 3867 |
| Index build time (s) | 508 | 254 |
| Recall @ ef_search=10 | 0.853 | 0.851 |
| QPS @ ef_search=10 | 1118 | 1079 |
| Recall @ ef_search=40 | 0.971 | 0.971 |
| QPS @ ef_search=40 | 530 | 540 |
| Recall @ ef_search=200 | 0.997 | 0.997 |
| QPS @ ef_search=200 | 144 | 149 |

@tureba

tureba commented Apr 1, 2024

Nice results. Those numbers are roughly what I expected to see, based on past experiments, though I admit I'm having trouble running ann-benchmarks as fluidly as you are.

@ankane ankane mentioned this issue Apr 1, 2024
@ankane
Member

ankane commented Apr 2, 2024

Support for half vectors is now merged (32a502c) and will be included in 0.7.0 (#508) 🎉
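
For anyone with an existing vector column, something along these lines should work to convert it in place (it relies on the vector-to-halfvec cast and rewrites the table, so test on a copy first; table/column names are illustrative, and any hnsw index on the column will likely need to be recreated with a halfvec opclass afterwards):

-- convert an existing fp32 column to fp16 in place (table rewrite)
ALTER TABLE items ALTER COLUMN embedding TYPE halfvec(3) USING embedding::halfvec(3);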

@ankane ankane closed this as completed Apr 2, 2024
@ankane
Member

ankane commented Apr 2, 2024

Support for sparse vectors (abac7a3) and bit vectors (94a444f) is merged as well
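
For the sparse count-fingerprint use case above, usage should look roughly like this (a sketch; the literal format is index:value pairs followed by the number of dimensions, but double-check the type and opclass names against the README):

CREATE TABLE items (id bigserial PRIMARY KEY, embedding sparsevec(5));
INSERT INTO items (embedding) VALUES ('{1:1,3:2,5:3}/5'), ('{1:4,3:5,5:6}/5');
CREATE INDEX ON items USING hnsw (embedding sparsevec_l2_ops);
SELECT * FROM items ORDER BY embedding <-> '{1:3,3:1,5:2}/5' LIMIT 5;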

@ankane
Member

ankane commented Apr 8, 2024

Just fyi for anyone testing on x86-64: some key halfvec functions now use intrinsics, so you should see much better performance 🚀

@pashkinelfe
Contributor

pashkinelfe commented Apr 22, 2024

Hi @ankane and @jkatz. I've tested HNSW index build time on ARM (Graviton 3) using the same workflow as #409 (comment).

Unfortunately, I see only a marginal performance difference for halfvec compared to vector for a serial build, though the difference is still there when building with many cores.
[chart: HNSW index build time, vector vs halfvec, serial and parallel builds on Graviton 3]

I've looked at the halfvec code and saw that the product/distance calculation functions for halfvec don't perform better than those for vector on ARM. It seems there is potential for a speedup, but as of now the intrinsics are used only on x86. Did I possibly miss something?

@ankane
Member

ankane commented Apr 23, 2024

Hey @pashkinelfe, thanks for testing/sharing. Which commit hash are you using?

@pashkinelfe
Contributor

pashkinelfe commented Apr 23, 2024

Hi @ankane,
Commit 3df5655, PG 16.2.
The results in the previous comment are for GCC 11; I've now added measurements for GCC 13.
[chart: HNSW index build time measurements with GCC 13]

gcc -v
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 13.1.0-8ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=aarch64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.1.0 (Ubuntu 13.1.0-8ubuntu1~22.04)

Build string:
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -O3 -g3 -march=native -lstdc++ -march=native -ftree-vectorize -fassociative-math -fno-signed-zeros -fno-trapping-math -fPIC -fvisibility=hidden -I. -I./ -I/usr/local/pgsql/include/server -I/usr/local/pgsql/include/internal -D_GNU_SOURCE -c -o src/halfutils.o src/halfutils.c -MMD -MP -MF .deps/halfutils.Po
