ipsarros/DolphinnPy: High-dimensional approximate nearest neighbor search in Python

DolphinnPy

Python 2.7

NumPy is required (see numpy.org); for instance, install it with: pip install numpy

DolphinnPy provides a simple yet efficient method for computing an (approximate) nearest neighbor in high dimensions. The algorithm is based on https://arxiv.org/abs/1612.07405, where we show linear space and sublinear query time for a specific setting of the parameters.

First, the N points are randomly mapped to keys in {0,1}^K, for K <= log N, using the Hyperplane LSH family. Then, for a given query, the candidate nearest neighbors are the points whose keys lie within a small Hamming radius of the query's key. Our approach resembles multi-probe LSH, but it differs in how the list of candidates is computed.
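A minimal sketch of the key-mapping step, assuming the standard random-hyperplane construction: one hyperplane through the origin per key bit, each with a normal vector of i.i.d. standard Gaussian coordinates, and bit i set when the point lies on the non-negative side of hyperplane i. The names make_hyperplanes, point_key and build_buckets are hypothetical and do not come from dolphinn.py; plain lists are used instead of NumPy arrays so the sketch is self-contained.

```python
import random


def make_hyperplanes(dim, K, seed=0):
    """Draw K random hyperplanes through the origin (Gaussian normal vectors)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(K)]


def point_key(point, hyperplanes):
    """Map a point to a K-bit integer key: bit i is 1 iff the point lies on
    the non-negative side of hyperplane i."""
    bits = 0
    for i, normal in enumerate(hyperplanes):
        if sum(a * b for a, b in zip(normal, point)) >= 0:
            bits |= 1 << i
    return bits


def build_buckets(points, hyperplanes):
    """Group point indices by key; each bucket is one vertex of {0,1}^K."""
    buckets = {}
    for idx, p in enumerate(points):
        buckets.setdefault(point_key(p, hyperplanes), []).append(idx)
    return buckets


# Usage: 100 random 8-dimensional points, K = floor(log2 N).
rng = random.Random(1)
points = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(100)]
K = len(points).bit_length() - 1  # floor(log2 100) = 6
hyperplanes = make_hyperplanes(dim=8, K=K)
buckets = build_buckets(points, hyperplanes)
```

Choosing K <= log N keeps the number of possible keys (2^K) at most N, so on average each non-empty bucket holds at least one point.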

Files:

main.py: reads files, builds the data structure, executes queries.
dolphinn.py: data structure constructor, query method.
utils.py: various useful functions.
bruteforce.py: linear scan for validation purposes.

Hardcoded parameters (in main.py):

K: the new dimension, i.e. the key length in bits.
num_of_probes: the maximum number of buckets visited per query.
M: the maximum number of candidate points examined per query.
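How num_of_probes and M bound a query can be sketched as follows, under stated assumptions: buckets are probed in order of increasing Hamming distance from the query's key, every generated key counts as one probe (even if its bucket is empty), and the search stops after num_of_probes keys or M examined points. This simple Hamming-distance ordering is an illustration of the parameters and may not match how dolphinn.py builds its candidate list. The function names are hypothetical, and the query key is passed in precomputed (for example by a hyperplane-LSH mapping).

```python
import itertools
import math


def hamming_ball(qkey, K):
    """Yield every K-bit key in order of increasing Hamming distance from
    qkey: radius 0 (qkey itself), then all keys at radius 1, 2, ..., K."""
    for radius in range(K + 1):
        for positions in itertools.combinations(range(K), radius):
            mask = 0
            for pos in positions:
                mask |= 1 << pos
            yield qkey ^ mask


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def probe_query(q, qkey, points, buckets, K, num_of_probes, M):
    """Return (index, distance) of the closest candidate found, or
    (None, inf) if no candidate was examined."""
    best_idx, best_dist = None, float("inf")
    examined = 0
    for probe, k in enumerate(hamming_ball(qkey, K)):
        # Stop once the probe budget or the candidate budget is spent.
        if probe >= num_of_probes or examined >= M:
            break
        for idx in buckets.get(k, []):
            if examined >= M:
                break
            d = euclidean(points[idx], q)
            examined += 1
            if d < best_dist:
                best_idx, best_dist = idx, d
    return best_idx, best_dist


# Usage on a toy 2-bit index (keys 0b00, 0b01 and 0b11 are non-empty).
points = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.1, 0.0]]
buckets = {0b00: [0, 1], 0b01: [2], 0b11: [3]}
q, qkey = [0.1, 0.1], 0b00
near_only = probe_query(q, qkey, points, buckets, K=2, num_of_probes=1, M=10)
all_probes = probe_query(q, qkey, points, buckets, K=2, num_of_probes=4, M=10)
```

With one probe only the query's own bucket is searched and point 0 is returned; with four probes every bucket is visited and the closer point 3 is found. Raising num_of_probes or M trades query time for accuracy.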

The dataset and query set file paths are set in main.py; both files must be in fvecs format. Input data is available from http://corpus-texmex.irisa.fr/
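The fvecs format used by the TEXMEX corpus stores each vector as a little-endian 32-bit integer d (the dimension) followed by d little-endian 32-bit floats. A minimal reader and writer using only the standard struct module might look like this; parse_fvecs, read_fvecs and pack_fvecs are hypothetical helpers, not functions from utils.py.

```python
import struct


def parse_fvecs(data):
    """Decode fvecs bytes into a list of float lists."""
    vectors = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated fvecs header")
        (dim,) = struct.unpack_from("<i", data, offset)
        offset += 4
        end = offset + 4 * dim
        if end > len(data):
            raise ValueError("truncated fvecs record")
        vectors.append(list(struct.unpack_from("<%df" % dim, data, offset)))
        offset = end
    return vectors


def read_fvecs(path):
    """Read every vector stored in an .fvecs file."""
    with open(path, "rb") as f:
        return parse_fvecs(f.read())


def pack_fvecs(vectors):
    """Encode float lists as fvecs bytes (useful for small test files)."""
    chunks = []
    for v in vectors:
        chunks.append(struct.pack("<i", len(v)))
        chunks.append(struct.pack("<%df" % len(v), *v))
    return b"".join(chunks)
```

Values are stored as 32-bit floats, so a round trip is exact only for numbers representable in single precision.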

How to run: python main.py

The script preprocesses the dataset, then runs Dolphinn and brute-force search on all queries. It prints K, the preprocessing time and the average query time, followed by the multiplicative approximation and the number of exact answers.
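The reported metrics can be illustrated with a short sketch: a linear scan (the role bruteforce.py plays) gives the exact nearest neighbor, the multiplicative approximation of an answer is its distance to the query divided by the exact nearest-neighbor distance, and an answer counts as exact when the two distances are equal. This sketch averages the ratio over the queries; whether main.py reports the mean, the maximum or another aggregate is an assumption here, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_nn(q, points):
    """Linear scan: index and distance of the exact nearest neighbor of q."""
    best_idx = min(range(len(points)), key=lambda i: euclidean(points[i], q))
    return best_idx, euclidean(points[best_idx], q)


def evaluate(queries, points, answers):
    """answers[j] is the point index returned by the approximate search for
    queries[j] (queries must be non-empty). Returns (mean multiplicative
    approximation, number of exact answers)."""
    ratios = []
    exact = 0
    for q, ans in zip(queries, answers):
        _, true_dist = brute_force_nn(q, points)
        approx_dist = euclidean(points[ans], q)
        if true_dist > 0:
            ratios.append(approx_dist / true_dist)
        else:
            # The query coincides with a data point; only distance 0 is exact.
            ratios.append(1.0 if approx_dist == 0 else float("inf"))
        # Compare distances, not indices, so a tie with the true NN is exact.
        if approx_dist == true_dist:
            exact += 1
    return sum(ratios) / len(ratios), exact


# Usage: the first answer is approximate, the second is the true neighbor.
points = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
queries = [[0.0, 0.1], [3.0, 3.9]]
answers = [2, 1]
mean_ratio, exact = evaluate(queries, points, answers)
```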

Some tasks:

Fix K, change num_of_probes and M: try to increase the number of exact answers or decrease the multiplicative approximation.
Fix num_of_probes and M, change K: try to increase the number of exact answers or decrease the multiplicative approximation.
After reading the files, the script calls an isotropize function on both sets. Comment out these two lines and run the script again.
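The README does not say what isotropize does. If the hyperplanes pass through the origin, as in the standard hyperplane LSH family, data that is far from centered can fall mostly on one side of every hyperplane and crowd into a few buckets. One simplified form of isotropization, shown here as an assumption rather than the repository's code, centers each coordinate and scales it to unit variance using statistics from the dataset, then applies the same transform to the queries. Full isotropization would instead whiten with the inverse square root of the covariance matrix; fit_diagonal_isotropy and apply_isotropy are hypothetical names.

```python
import math


def fit_diagonal_isotropy(points):
    """Per-coordinate mean and standard deviation of the dataset."""
    n, dim = len(points), len(points[0])
    means = [sum(p[i] for p in points) / float(n) for i in range(dim)]
    stds = []
    for i in range(dim):
        var = sum((p[i] - means[i]) ** 2 for p in points) / float(n)
        # Leave constant coordinates unscaled to avoid dividing by zero.
        stds.append(math.sqrt(var) if var > 0 else 1.0)
    return means, stds


def apply_isotropy(points, means, stds):
    """Center and rescale every coordinate with the dataset's statistics."""
    return [[(x - m) / s for x, m, s in zip(p, means, stds)] for p in points]


# Usage: fit on the dataset, then transform the dataset and the queries alike.
dataset = [[1.0, 10.0], [3.0, 30.0]]
queries = [[2.0, 40.0]]
means, stds = fit_diagonal_isotropy(dataset)
dataset_iso = apply_isotropy(dataset, means, stds)  # [[-1.0, -1.0], [1.0, 1.0]]
queries_iso = apply_isotropy(queries, means, stds)  # [[0.0, 2.0]]
```

Fitting the statistics on the dataset only, and reusing them for the queries, keeps both sets in the same coordinate system so distances remain comparable.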
