Run Atlas Vector Search under various conditions to assess performance.
Configurable parameters include:
- Filtering
- numCandidates
- Limit
- Request concurrency
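
As a rough illustration of how these parameters fit together, the sketch below builds a `$vectorSearch` aggregation stage and issues several such requests with bounded concurrency. It is a minimal example, not the code in run.py: the index name `vector_index`, the field name `embedding`, and the helper names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Optional, Sequence


def build_pipeline(
    query_vector: Sequence[float],
    num_candidates: int,
    limit: int,
    filter_expr: Optional[dict] = None,
    index_name: str = "vector_index",  # hypothetical index name
    path: str = "embedding",           # hypothetical vector field name
) -> list:
    """Build an aggregation pipeline with a single $vectorSearch stage."""
    stage: dict = {
        "index": index_name,
        "path": path,
        "queryVector": list(query_vector),
        "numCandidates": num_candidates,
        "limit": limit,
    }
    # The filter is optional; omit the key entirely for unfiltered runs.
    if filter_expr is not None:
        stage["filter"] = filter_expr
    return [{"$vectorSearch": stage}]


def run_concurrently(
    run_query: Callable[[list], Any],
    pipelines: list,
    concurrency: int,
) -> list:
    """Run every pipeline with at most `concurrency` requests in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(run_query, pipelines))
```

With a PyMongo collection, `run_query` could be something like `lambda p: list(collection.aggregate(p))`. Passing the query function in keeps the sketch independent of any particular connection setup.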
Future Improvements:
- Factor out test cases into a config file that can be passed to run.py via a CLI
- 100M vector performance testing
All test results provided in the Atlas Vector Search Benchmark were produced using the scripts titled run_amazon_ecommerce_voyage_15m.py and run_amazon_ecommerce_voyage_multidim.py.
These run scripts use this dataset, embedded with voyage-3-large via save_voyage_embeddings, to assess scalar and binary quantized indexes. The multidimensional script issues queries against an index covering 4 different dimensionalities of that embedding model, produced by building a view that slices the original embeddings (detailed here).
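
One way to picture the slicing view is the sketch below, which builds a view pipeline that adds one truncated copy of the embedding per target dimensionality using `$slice`. This is an assumption-laden illustration rather than the actual view definition: the field name `embedding`, the output field naming scheme, and the dimension values in the usage note are all hypothetical.

```python
def sliced_view_pipeline(source_field: str, dims: list) -> list:
    """Return a view pipeline adding one truncated embedding field per dimensionality.

    For each d in dims, the field "<source_field>_<d>" holds the first d
    elements of the original embedding array.
    """
    return [
        {
            "$addFields": {
                f"{source_field}_{d}": {"$slice": [f"${source_field}", d]}
                for d in dims
            }
        }
    ]
```

With PyMongo, a view could then be created along the lines of `db.create_collection("embeddings_view", viewOn="source_collection", pipeline=sliced_view_pipeline("embedding", [256, 512, 1024, 2048]))`, where the view name, collection name, and dimension list are placeholders.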
The original run.py script uses the Sphere dataset.
The Cohere run script uses the Cohere Wikipedia dataset.
The Jina/Amazon script uses this dataset embedded with jina-embeddings-v3 (https://huggingface.co/jinaai/jina-embeddings-v3) and a binary quantized index, using saved exact results at 1M and 17M vectors. It also uses a range filter instead of the point filter used in the Sphere dataset tests.
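
The difference between a point filter and a range filter, and one plausible use of saved exact results, can be sketched as follows. The field names are hypothetical, and treating the saved exact results as ground truth for a recall calculation is an assumption about how they are used, not something stated above.

```python
def point_filter(field: str, value) -> dict:
    """Match documents whose field equals a single value."""
    return {field: {"$eq": value}}


def range_filter(field: str, low, high) -> dict:
    """Match documents whose field lies in the inclusive range [low, high]."""
    return {field: {"$gte": low, "$lte": high}}


def recall(approx_ids: list, exact_ids: list) -> float:
    """Fraction of the exact result IDs that also appear in the approximate results."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)
```

Either filter dictionary can be passed as the `filter` field of a `$vectorSearch` stage, provided the filtered field is declared as a filter field in the index definition.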