An indexer leverages hnswlib for vector search and DocumentArrayMemmap for storing full Documents

jina-ai/executor-U1MIndexer

U1MIndexer

An indexer that leverages hnswlib for vector search and DocumentArrayMemmap for storing full Documents. U1M means under one million Documents. It is a good fit if you are working with fewer than 1M Documents and aim for query-time speed. When working with one million Documents, you can expect a single query to complete in about 2 ms, and 600~800 QPS on batch queries. When working with more than 1M Documents, be careful with memory consumption. Indexing with U1MIndexer is much slower than with U100KIndexer.

U1MIndexer returns approximate nearest neighbours.

The code is based on HNSWSearcher.
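Because the results are approximate, it can help to check how many of the true nearest neighbours an index actually returns. The sketch below is not part of U1MIndexer: it uses only the Python standard library, and the helper names `exact_top_k` and `recall_at_k` are hypothetical. It computes exact neighbours by brute force and measures recall against a stand-in approximate result.

```python
import math
import random


def exact_top_k(query, vectors, k):
    """Brute-force exact k nearest neighbours by Euclidean distance."""
    dists = [(math.dist(query, v), i) for i, v in enumerate(vectors)]
    dists.sort()
    return [i for _, i in dists[:k]]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbours that the approximate result contains."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)


random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(200)]
query = [random.random() for _ in range(8)]

exact_ids = exact_top_k(query, vectors, k=5)

# Stand-in for an approximate result (an assumption for illustration):
# pretend the index missed one true neighbour and returned another id instead.
wrong_id = next(i for i in range(len(vectors)) if i not in exact_ids)
approx_ids = exact_ids[:4] + [wrong_id]

print(recall_at_k(approx_ids, exact_ids))  # 0.8
```

In practice, `approx_ids` would come from the indexer's search results rather than being constructed by hand.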

Pros & cons

Pros

  • Extremely fast queries, with speed largely independent of the number of stored Documents.
  • Always returns full Documents.

Cons

  • Indexing is relatively slow.
  • Extra dependencies: hnswlib and bidict.
  • Extra parameters to tune HNSW performance.

Performance

One can run benchmark.py to get a quick performance overview.

| Stored data | Indexing time (s) | Query size=1 (s) | Query size=8 (s) | Query size=64 (s) |
|---|---|---|---|---|
| 10000 | 2.903 | 0.002 | 0.013 | 0.092 |
| 100000 | 65.330 | 0.002 | 0.014 | 0.104 |
| 500000 | 463.509 | 0.003 | 0.019 | 0.132 |
| 1000000 | 988.535 | 0.003 | 0.017 | 0.127 |
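To relate the table to throughput, one can divide each query batch size by its measured time. The sketch below copies the rows from the table and assumes the times are in seconds, which is consistent with the roughly 2 ms single-query figure quoted earlier. The helper name `queries_per_second` is hypothetical, and because the table values are rounded to three decimals, the derived rates are only rough estimates.

```python
# Rows copied from the benchmark table:
# number of stored Documents -> {query batch size: query time}.
# Assumption: times are in seconds.
benchmark = {
    10_000: {1: 0.002, 8: 0.013, 64: 0.092},
    100_000: {1: 0.002, 8: 0.014, 64: 0.104},
    500_000: {1: 0.003, 8: 0.019, 64: 0.132},
    1_000_000: {1: 0.003, 8: 0.017, 64: 0.127},
}


def queries_per_second(batch_size, elapsed_seconds):
    """Throughput of one batch: number of queries divided by elapsed time."""
    return batch_size / elapsed_seconds


for stored, timings in benchmark.items():
    rates = {
        batch: round(queries_per_second(batch, seconds))
        for batch, seconds in timings.items()
    }
    print(f"{stored:>9} Documents: {rates}")
```

Running benchmark.py on your own hardware gives more reliable numbers than rates derived from rounded table entries.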
