Indexing 1M vectors
This page gives some advice and results on datasets of around 1M vectors.
When the dataset is around 1M vectors, the exhaustive index becomes too slow, so a good alternative is IndexIVFFlat. It still computes exact distances, but occasionally misses a neighbor because it is non-exhaustive.
Below are a few experiments on datasets of size 1M with different indexing methods. We focus on the tradeoff between:
- speed, measured on an "Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz" with 20 threads enabled. It is reported in ms in batch mode (i.e. all query vectors are handled simultaneously).
- accuracy. We use the *1-NN recall @R* measure with R=1, 10, or 100. This is the fraction of query vectors for which the actual nearest neighbor is ranked in the first R results.
WARNING: this is an optimistic setting: running with a single thread is about 10x slower, and submitting queries one by one is another 2-3x slower (see the timings for sift_1M below).
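The recall@R measure can be computed in a few lines of NumPy (a sketch; `ground_truth` is assumed to hold the exact 1-NN id of each query, obtained with a brute-force index):

```python
import numpy as np

def recall_at_r(ground_truth, result_ids, r):
    # ground_truth: (nq,) exact nearest-neighbor id per query
    # result_ids:   (nq, k) ids returned by the index, with k >= r
    # returns the fraction of queries whose true 1-NN is in the first r results
    return float(np.mean([gt in row[:r] for gt, row in zip(ground_truth, result_ids)]))

# toy check: the true neighbor is ranked 1st, 2nd and absent for the 3 queries
gt = np.array([5, 7, 9])
res = np.array([[5, 1, 2],
                [0, 7, 3],
                [1, 2, 3]])
print(recall_at_r(gt, res, 1))   # 1/3: only the first query has its 1-NN ranked first
print(recall_at_r(gt, res, 3))   # 2/3: the second query recovers its 1-NN within R=3
```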
The index types that are tested are referred to by their index key, as created by the `index_factory`.
For all indexes, we request 100 results per query.
CNN activation maps
We use 4096D descriptors extracted from 1M images, and descriptors of the same type for the query images. The 4096D descriptors are reduced beforehand to 256D by PCA (the cost of the PCA transformation is not reflected in the results). The most appropriate indexes for this are:
So an IVF is better for high accuracy regimes, and IMI for lower regimes.
NOTE: IVF16384 gives slightly better results than IVF4096, but the latter is significantly cheaper to add vectors to (this cost is not reflected in the measurements).
If memory is a concern, then compressed vectors can be used (with the IVFPQ index variants).
IVF4096,Flat is plotted only for reference.
OPQ32_128,IVF4096,PQ32 uses code + id = 32 + 8 = 40 bytes per vector. This makes a total of 40 MB for the database, but there are overheads due to precomputed tables, geometric array allocation, etc.
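The 40 bytes figure is simple arithmetic: 32 one-byte PQ codes plus a 64-bit id per vector:

```python
# storage per vector for "OPQ32_128,IVF4096,PQ32"
pq_code_bytes = 32    # PQ32: 32 sub-quantizers, 1 byte each
id_bytes = 8          # vector ids are stored as 64-bit integers in the inverted lists
bytes_per_vector = pq_code_bytes + id_bytes

database_mb = 1_000_000 * bytes_per_vector / 10**6
print(bytes_per_vector, database_mb)   # 40 40.0 -- before table/allocation overheads
```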
We also use a resnet-34 trained on Imagenet, from which we remove the two upper layers to get 512D activation maps, computed on the first 1M images of Flickr100M. The behavior is similar:
This is a common benchmark used in research papers. It consists of SIFT descriptors (128D) extracted from image patches.
The comments are about the same as for the CNN data.
Here we also evaluated the performance in relation with threading and batching:
- nt1: batched, single thread
- nobatch / nobatch_nt1: queries are submitted 1 by 1
- parallelqueries: queries are submitted 1 by 1 in parallel, and processed by a non-batch loop
It appears that a 1-thread run is 5x to 12x slower. When queries are submitted one at a time, threading does not help.
NOTE: in this plot we report the speed in QPS (queries per second) and the x-axis is the precision at 10 (or intersection) measure. This makes for an easy comparison with nmslib, which is the best library on this benchmark.
A comparison with the benchmarks above is not accurate because the machines are not the same. A direct comparison with nmslib shows that nmslib is faster, but uses significantly more memory. For Faiss, the build time is sub-linear and memory usage is linear.
|                          | search time | 1-R@1  | index size   | index build time |
|--------------------------|-------------|--------|--------------|------------------|
| Flat-CPU                 | 9.100 s     | 1.0000 | 512 MB       | 0 s              |
| nmslib (hnsw)            | 0.081 s     | 0.8195 | 512 + 796 MB | 173 s            |
| IVF16384,Flat            | 0.538 s     | 0.8980 | 512 + 8 MB   | 240 s            |
| IVF16384,Flat (Titan X)  | 0.059 s     | 0.8145 | 512 + 8 MB   | 5 s              |
| Flat-GPU (Titan X)       | 0.753 s     | 0.9935 | 512 MB       | 0 s              |
The first row is for exact search with Faiss. The last two results are with a GPU (Titan X). The Flat indexes are brute-force indexes that return exact results (up to ties and floating-point precision issues).
This is used as a benchmark by Annoy. The performance measure is different: intersection of the found 10-NN with the GT 10-NN.
The IVFPQ followed by a verification improves the results slightly.
We use data sampled uniformly on a unit sphere in 128D. This is the hardest data to index because there is no regularity whatsoever that the indexing method can exploit.
The measured performance is recall @ 100, otherwise it would be too low. Here, for most operating points, doing a brute-force computation is the best we can do.
There are several uses of HNSW as an indexing method in Faiss:
- the normal HNSW that operates on full vectors
- HNSW that operates on quantized vectors (SQ)
- as a quantizer for an IVF
- as an assignment index for kmeans
The various use cases are evaluated with benchs/bench_hnsw.py on SIFT1M. The output looks like this (with 20 threads):
```
testing HNSW Flat
efSearch 16 0.011 ms per query, R@1 0.8740
efSearch 32 0.020 ms per query, R@1 0.9492
efSearch 64 0.033 ms per query, R@1 0.9779
efSearch 128 0.059 ms per query, R@1 0.9887
efSearch 256 0.104 ms per query, R@1 0.9920
testing HNSW with a scalar quantizer
efSearch 16 0.005 ms per query, R@1 0.7281
efSearch 32 0.008 ms per query, R@1 0.8506
efSearch 64 0.011 ms per query, R@1 0.9242
efSearch 128 0.020 ms per query, R@1 0.9566
efSearch 256 0.039 ms per query, R@1 0.9716
testing IVF Flat (baseline)
nprobe 1 0.076 ms per query, R@1 0.4085
nprobe 4 0.067 ms per query, R@1 0.6331
nprobe 16 0.078 ms per query, R@1 0.8263
nprobe 64 0.141 ms per query, R@1 0.9470
nprobe 256 0.344 ms per query, R@1 0.9861
testing IVF Flat with HNSW quantizer
nprobe 1 0.007 ms per query, R@1 0.4058
nprobe 4 0.010 ms per query, R@1 0.6305
nprobe 16 0.021 ms per query, R@1 0.8247
nprobe 64 0.063 ms per query, R@1 0.9462
nprobe 256 0.220 ms per query, R@1 0.9842
```
The comparisons show:
- HNSW obtains much better speed/precision operating points than IVFFlat (e.g. 0.020 ms vs. 0.141 ms to get > 0.9 recall at 1), at a higher memory cost
- HNSW with a scalar quantizer is better than the classical HNSW (note that SIFT1M is originally encoded in bytes)
- using HNSW as a quantizer on top of a memory-efficient IVF improves the search speed (the performance is similar to an IMI quantizer, but the clusters are more balanced)
Tests on kmeans clustering:
```
performing kmeans on the sift1M vectors (baseline)
Clustering 1000000 points in 128D to 16384 clusters, redo 1 times, 10 iterations
  Preprocessing in 0.17 s
  Iteration 9 (612.53 s, search 612.18 s): objective=3.85228e+10 imbalance=1.235 nsplit=0
performing kmeans on the sift1M using HNSW assignment
Clustering 1000000 points in 128D to 16384 clusters, redo 1 times, 10 iterations
  Preprocessing in 0.17 s
  Iteration 9 (74.63 s, search 73.46 s): objective=3.85232e+10 imbalance=1.234 nsplit=0
```
I.e., the clustering is 8.2x faster than with an exhaustive assignment, without impacting the objective value or increasing the number of empty clusters (nsplit) or the imbalance factor, both of which would be signs of unhealthy clusters.