LoSHa is a framework for distributed similarity search. It includes the following algorithms:


  • simhash
  • e2lsh
  • gqr + pca hashing


Build and Run

We assume you have set up an NFS directory (denoted as /data) that every machine can access.

Build the runnable binaries

$ cd /data
$ git clone --recursive
$ cd losha
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release ..  
$ make help                     
$ make -j4 Master
$ make -j4 e2lsh 

Prepare the dataset and ground truth

$ cd ../script
$ python
$ hadoop dfs -mkdir /losha /losha/audio
$ hadoop dfs -Ddfs.blocksize=8388608 -put audio_base.idfvecs /losha/audio
$ hadoop dfs -Ddfs.blocksize=8388608 -put audio_query.idfvecs /losha/audio
$ cd ../gqr/ && mkdir build && cd ./script
$ sh
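The block size passed to the -put commands above, 8388608 bytes, is 8 MiB. A minimal sketch of computing that value instead of hard-coding it is shown below; the variable name BLOCK_SIZE is illustrative and not part of LoSHa, and the upload commands are only printed, not executed.

```shell
# 8388608 bytes is 8 MiB (8 * 1024 * 1024).
# BLOCK_SIZE is an illustrative variable name, not something LoSHa defines.
BLOCK_SIZE=$((8 * 1024 * 1024))
echo "$BLOCK_SIZE"

# Print the equivalent upload commands with the computed value (dry run only).
for f in audio_base.idfvecs audio_query.idfvecs; do
    echo "hadoop dfs -Ddfs.blocksize=$BLOCK_SIZE -put $f /losha/audio"
done
```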

Run on a single machine (assume the hostname is master; Master runs in one terminal and e2lsh runs in another)

$ cd ../../script
$ ../build/husky/Master --conf ../conf/e2lsh.conf
$ # open another shell to run e2lsh
$ ../build/e2lsh --conf ../conf/e2lsh.conf
$ sh

Run on multiple machines (assume two worker machines with hostnames worker1 and worker2)

$ ../build/husky/Master --conf ../conf/e2lsh-slaves.conf
$ ./ ../build/e2lsh --conf ../conf/e2lsh-slaves.conf
$ sh


- Master and e2lsh run in two different shells.
- Always remove the output files on HDFS before the next run.
- We suggest setting up GQR and Husky before using LoSHa.
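
The cleanup step in the notes above could be sketched as follows. This assumes Hadoop 2 or later (where -rm accepts -r and -f) and uses a hypothetical output path, /losha/audio/output; the real path is whatever your conf file specifies. If hadoop is not on the PATH, the script only prints the command it would have run.

```shell
# Hypothetical output location; replace it with the output path from your conf file.
OUTPUT_DIR="${OUTPUT_DIR:-/losha/audio/output}"

if command -v hadoop >/dev/null 2>&1; then
    # -r removes recursively; -f suppresses the error if the path does not exist.
    hadoop dfs -rm -r -f "$OUTPUT_DIR"
else
    echo "hadoop not found; would run: hadoop dfs -rm -r -f $OUTPUT_DIR"
fi
```

Running this before each new attempt keeps a stale output directory from interfering with the next run.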


LoSHa: A General Framework for Scalable Locality Sensitive Hashing (SIGIR 2017)