Naive implementations of some ANN (Approximate Nearest Neighbor) search algorithms, without any optimization or generalization.
The algorithms used in ANN search can broadly be classified as tree based, LSH based, graph based, and IVF based. Graph based and IVF based algorithms are the most popular today, while tree based and LSH based algorithms are no longer commonly used. In addition, quantization can be used to compress vectors and reduce memory usage.
[CVPR20 Tutorial] Billion-scale Approximate Nearest Neighbor Search is a good tutorial for ANN beginners.
ann-benchmarks contains some tools to benchmark various implementations of approximate nearest neighbor (ANN) search for different metrics.
Tree based methods partition the vector space, but they perform poorly in high-dimensional spaces due to the curse of dimensionality (a minimal space-partitioning sketch follows the list below).
- FLANN
  - automatically selects between the "Randomized KD Tree" and "k-means Tree" algorithms
  - takes any given dataset and a desired degree of precision, and uses these to automatically determine the best algorithm and parameter values
  - Paper: Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration
- Annoy
  - is based on a random projection forest and searches multiple trees with a shared priority queue
  - Blog: Nearest neighbors and vector models – part 2 – algorithms and data structures
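
To make the space-partitioning idea concrete, here is a minimal random-projection tree sketch in Python. It is only in the spirit of Annoy's trees, not its actual implementation; the class name, parameters, and toy dataset are all invented for illustration.

```python
import numpy as np

class RPTree:
    """Minimal random-projection tree: split points by a random hyperplane
    at each node until leaves are small, then answer queries from one leaf."""

    def __init__(self, points, leaf_size=16, rng=None):
        self.points = points
        self.rng = rng or np.random.default_rng(0)
        self.root = self._build(np.arange(len(points)), leaf_size)

    def _build(self, idx, leaf_size):
        if len(idx) <= leaf_size:
            return {"leaf": idx}
        # split by a random hyperplane through the median projection
        direction = self.rng.normal(size=self.points.shape[1])
        proj = self.points[idx] @ direction
        thresh = np.median(proj)
        left, right = idx[proj <= thresh], idx[proj > thresh]
        if len(left) == 0 or len(right) == 0:   # degenerate split, stop here
            return {"leaf": idx}
        return {
            "dir": direction, "thresh": thresh,
            "left": self._build(left, leaf_size),
            "right": self._build(right, leaf_size),
        }

    def query(self, q, k=5):
        node = self.root
        while "leaf" not in node:                # descend into a single leaf
            side = "left" if q @ node["dir"] <= node["thresh"] else "right"
            node = node[side]
        cand = node["leaf"]
        dists = np.linalg.norm(self.points[cand] - q, axis=1)
        return cand[np.argsort(dists)[:k]]

data = np.random.default_rng(42).normal(size=(1000, 32))
tree = RPTree(data)
print(tree.query(data[0], k=3))   # approximate neighbors of the first point
```

Annoy builds a whole forest of such trees and queries them together with a shared priority queue, which recovers much of the recall that a single tree loses.
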
Whereas general-purpose hashing tries to avoid collisions, the idea of LSH is to exploit collisions so that nearby points are mapped into the same bucket. It is popular in the theory community, but performs poorly in practice on real-world data.
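
A minimal sketch of the bucketing idea, assuming random-hyperplane (SimHash-style) LSH for cosine similarity; the class name and parameters are illustrative, not taken from any particular library.

```python
import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    """Random-hyperplane LSH for cosine similarity: nearby vectors tend to
    fall on the same side of random hyperplanes, so they share bucket keys."""

    def __init__(self, dim, n_bits=12, n_tables=8, seed=0):
        rng = np.random.default_rng(seed)
        # one set of random hyperplanes per hash table
        self.planes = rng.normal(size=(n_tables, n_bits, dim))
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.data = []

    def _keys(self, v):
        # sign pattern of the projections -> one bit pattern per table
        return [tuple((p @ v > 0).astype(int)) for p in self.planes]

    def add(self, v):
        i = len(self.data)
        self.data.append(v)
        for table, key in zip(self.tables, self._keys(v)):
            table[key].append(i)

    def query(self, q, k=5):
        # collect colliding candidates from every table, then rank them exactly
        cand = {i for table, key in zip(self.tables, self._keys(q))
                for i in table[key]}
        cand = sorted(cand, key=lambda i: -(q @ self.data[i]) /
                      (np.linalg.norm(q) * np.linalg.norm(self.data[i]) + 1e-12))
        return cand[:k]

rng = np.random.default_rng(1)
index = HyperplaneLSH(dim=32)
for v in rng.normal(size=(1000, 32)):
    index.add(v)
print(index.query(index.data[0]))   # candidates ranked by cosine similarity
```

Using several hash tables increases the chance that true neighbors collide with the query in at least one of them.
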
Graph based methods have become popular in recent years and are mostly built on the idea of a proximity graph. Given a query, the search starts from an entry point (randomly chosen or supplied by a separate algorithm) and greedily moves toward the point closest to the query.
Blog: Proximity Graph-based Approximate Nearest Neighbor Search
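
Below is a minimal sketch of that greedy routing step over a brute-force kNN graph; the helper names and parameters are made up, and real systems do not stop at the first local minimum the way this toy walk does.

```python
import numpy as np

def build_knn_graph(points, k=10):
    """Brute-force kNN graph, fine only for small toy datasets."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return [np.argsort(row)[1:k + 1].tolist() for row in dists]

def greedy_search(points, graph, query, entry=0):
    """Start from an entry point and repeatedly hop to whichever neighbor
    is closer to the query; stop at a local minimum of the distance."""
    current = entry
    current_dist = np.linalg.norm(points[current] - query)
    while True:
        best, best_dist = current, current_dist
        for nb in graph[current]:
            d = np.linalg.norm(points[nb] - query)
            if d < best_dist:
                best, best_dist = nb, d
        if best == current:        # no neighbor improves: local minimum
            return current, current_dist
        current, current_dist = best, best_dist

points = np.random.default_rng(7).normal(size=(500, 16))
graph = build_knn_graph(points)
idx, dist = greedy_search(points, graph, points[3] + 0.01)
print(idx, round(dist, 4))
```

NSW, HNSW, and NSG all replace this single greedy walk with a beam search over a candidate priority queue so the search can escape local minima.
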
- Delaunay Graph
  - Delaunay Triangulations, which are proximity graphs, are defined as the dual graph of the Voronoi diagram, and many graph based algorithms (like kNN graphs and NSW graphs) are approximations of the DT
- NSW
  - has small-world navigation properties, which give polylogarithmic time complexity for insertion and search
  - Paper: Approximate nearest neighbor algorithm based on navigable small world graphs
- HNSW
  - incrementally builds a multi-layer structure consisting of a hierarchical set of proximity graphs (layers) for nested subsets of the stored elements
  - can be seen as an extension of the probabilistic skip list, with proximity graphs in place of linked lists (a toy layered sketch follows this list)
  - Paper: Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
- DiskANN
  - optimizes for disk I/O and makes the earlier NSG method SSD-friendly
  - Paper: DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
  - Blog: DiskANN: A Disk-based ANNS Solution with High Recall and High QPS on Billion-scale Dataset
- NSG
  - is based on the Navigating Spreading-out Graph and is deployed in production at Taobao
  - Paper: Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity
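
To make the skip-list analogy for HNSW concrete, here is a toy layered sketch: each point draws a random level, layer l keeps every point whose level is at least l, and the search descends from the sparsest layer to the densest, reusing each layer's result as the next layer's entry point. This only illustrates the layering idea; it is not HNSW's actual incremental construction, neighbor-selection heuristics, or beam search, and all names and parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_level(p=0.5, max_level=4):
    """Skip-list-style level assignment: reach level l with probability p**l."""
    level = 0
    while rng.random() < p and level < max_level:
        level += 1
    return level

def knn_graph(points, ids, k=8):
    """Brute-force kNN graph over one layer's subset (toy-scale only)."""
    sub = points[ids]
    sq = (sub ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * sub @ sub.T      # squared distances
    order = np.argsort(d2, axis=1)[:, 1:k + 1]
    return {int(ids[i]): [int(ids[j]) for j in order[i]] for i in range(len(ids))}

def greedy(points, graph, query, entry):
    """Greedy walk to a local minimum within a single layer."""
    cur, cur_d = entry, np.linalg.norm(points[entry] - query)
    improved = True
    while improved:
        improved = False
        for nb in graph[cur]:
            d = np.linalg.norm(points[nb] - query)
            if d < cur_d:
                cur, cur_d, improved = nb, d, True
    return cur

points = rng.normal(size=(2000, 16))
levels = np.array([random_level() for _ in range(len(points))])
max_l = int(levels.max())
# layer l contains every point whose level is >= l, so layers are nested subsets
layers = [np.flatnonzero(levels >= l) for l in range(max_l + 1)]
graphs = [knn_graph(points, ids) for ids in layers]

query = points[42] + 0.01
entry = int(layers[max_l][0])           # arbitrary entry point on the sparsest layer
for l in range(max_l, -1, -1):          # coarse-to-fine descent: one layer's result
    entry = greedy(points, graphs[l], query, entry)   # seeds the next layer
print(entry)                            # usually 42, but a greedy walk can get stuck
```
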
IVF clusters the dataset to partition the space into Voronoi cells; by searching only the cells closest to the query, it reduces the number of data points that need to be scanned.
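
A minimal IVF sketch, assuming a plain k-means coarse quantizer and an invented `n_probe` parameter for the number of cells scanned per query; this illustrates the idea, not how Faiss or any other library implements it.

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd's k-means, enough for a toy coarse quantizer."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        # assign every point to its nearest centroid, then recompute centroids
        d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for c in range(k):
            members = data[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids, labels

class IVFIndex:
    """Inverted file index: one posting list of vector ids per Voronoi cell."""

    def __init__(self, data, n_cells=32):
        self.data = data
        self.centroids, labels = kmeans(data, n_cells)
        self.lists = [np.flatnonzero(labels == c) for c in range(n_cells)]

    def query(self, q, k=5, n_probe=4):
        # scan only the n_probe cells whose centroids are closest to the query
        cell_order = np.argsort(np.linalg.norm(self.centroids - q, axis=1))
        cand = np.concatenate([self.lists[c] for c in cell_order[:n_probe]])
        dists = np.linalg.norm(self.data[cand] - q, axis=1)
        return cand[np.argsort(dists)[:k]]

data = np.random.default_rng(3).normal(size=(5000, 32))
index = IVFIndex(data)
print(index.query(data[10], k=3))   # larger n_probe trades speed for recall
```
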
- PQ
  - The essence of PQ is to decompose the high-dimensional vector space into the Cartesian product of lower-dimensional subspaces and then quantize each subspace separately (a toy sketch follows this list)
  - Faiss
    - is based on PQ + IVF
  - Tutorial: https://rutgers-db.github.io/cs541-fall19/slides/notes5.pdf
  - Paper: Product Quantization for Nearest Neighbor Search
  - Survey: A Survey of Product Quantization
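
A toy product-quantization sketch to illustrate the subspace decomposition and the asymmetric distance computation via per-subspace lookup tables; the class, its parameters (`m` subspaces, `nbits` per code), and the tiny k-means trainer are all invented for the example and are far simpler than a real PQ implementation such as the one in Faiss.

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    """Tiny Lloyd's k-means used to train one codebook per subspace."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for c in range(k):
            members = data[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids

class ProductQuantizer:
    """Split each vector into m subvectors and quantize each subspace with
    its own codebook of 2**nbits centroids; store only the centroid ids."""

    def __init__(self, m=8, nbits=4):
        self.m, self.k = m, 2 ** nbits

    def fit(self, data):
        self.dsub = data.shape[1] // self.m
        self.codebooks = [kmeans(data[:, i * self.dsub:(i + 1) * self.dsub], self.k)
                          for i in range(self.m)]
        return self

    def encode(self, data):
        codes = np.empty((len(data), self.m), dtype=np.uint8)
        for i, cb in enumerate(self.codebooks):
            sub = data[:, i * self.dsub:(i + 1) * self.dsub]
            codes[:, i] = np.linalg.norm(sub[:, None] - cb[None], axis=-1).argmin(1)
        return codes

    def asymmetric_distances(self, q, codes):
        # precompute query-to-centroid distances per subspace (lookup tables);
        # each stored code's distance is then just a sum of m table lookups
        tables = np.stack([np.linalg.norm(cb - q[i * self.dsub:(i + 1) * self.dsub],
                                          axis=1) ** 2
                           for i, cb in enumerate(self.codebooks)])
        return np.sqrt(sum(tables[i][codes[:, i]] for i in range(self.m)))

data = np.random.default_rng(5).normal(size=(2000, 32))
pq = ProductQuantizer(m=8, nbits=4).fit(data)
codes = pq.encode(data)                 # 8 small codes per vector instead of 32 floats
d = pq.asymmetric_distances(data[0], codes)
print(np.argsort(d)[:5])                # approximate neighbors of data[0]
```

Because each vector is stored as a handful of small codes, distances to the query are evaluated with table lookups instead of full-dimensional arithmetic, which is what makes PQ both memory- and compute-efficient.
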