Add Vamana and Vamana-PQ (DiskANN) to ANN Benchmarks #230

Merged 10 commits on Apr 27, 2021
1 change: 1 addition & 0 deletions .travis.yml
@@ -31,6 +31,7 @@ env:
- LIBRARY=elasticsearch DATASET=random-xs-20-angular
- LIBRARY=elastiknn DATASET=random-xs-20-angular
- LIBRARY=opendistroknn DATASET=random-xs-20-angular
- LIBRARY=diskann DATASET=random-xs-20-angular

before_install:
- pip install -r requirements.txt
1 change: 1 addition & 0 deletions README.md
@@ -31,6 +31,7 @@ Evaluated
* [ScaNN](https://github.com/google-research/google-research/tree/master/scann)
* [Elastiknn](https://github.com/alexklibisz/elastiknn)
* [OpenDistro Elasticsearch KNN](https://github.com/opendistro-for-elasticsearch/k-NN)
* [DiskANN](https://github.com/microsoft/diskann): Vamana, Vamana-PQ

Data sets
=========
132 changes: 132 additions & 0 deletions algos.yaml
@@ -359,6 +359,72 @@ float:
args: []

  euclidean:
    vamana(diskann):
      docker-tag: ann-benchmarks-diskann
      module: ann_benchmarks.algorithms.diskann
      constructor: Vamana
      base-args : ["@metric"]
      run-groups :
        vamana_100_64_1-2:
          args : [{'l_build': 100, 'max_outdegree': 64, 'alpha': 1.2}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_100_64_1-1:
          args : [{'l_build': 100, 'max_outdegree': 64, 'alpha': 1.1}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_100_64_1:
          args : [{'l_build': 100, 'max_outdegree': 64, 'alpha': 1}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_125_32_1-2:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1.2}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_125_32_1-1:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1.1}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_125_32_1:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
    vamana-pq(diskann):
      docker-tag: ann-benchmarks-diskann_pq
      module: ann_benchmarks.algorithms.diskann
      constructor: VamanaPQ
      base-args : ["@metric"]
      run-groups :
        vamana_pq_100_64_1-2_32:
          args : [{'l_build': 100, 'max_outdegree': 64, 'alpha': 1.2, 'chunks': 32}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_100_64_1_32:
          args : [{'l_build': 100, 'max_outdegree': 64, 'alpha': 1, 'chunks': 32}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_80_64_1-2_96:
          args : [{'l_build': 80, 'max_outdegree': 64, 'alpha': 1.2, 'chunks': 96}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_80_64_1_96:
          args : [{'l_build': 80, 'max_outdegree': 64, 'alpha': 1, 'chunks': 96}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_80_64_1-2_112:
          args : [{'l_build': 80, 'max_outdegree': 64, 'alpha': 1.2, 'chunks': 112}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_80_64_1_112:
          args : [{'l_build': 80, 'max_outdegree': 64, 'alpha': 1, 'chunks': 112}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_32_1-2_32:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1.2, 'chunks': 32}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_32_1_32:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1, 'chunks': 32}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_32_1-2_96:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1.2, 'chunks': 96}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_32_1_96:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1, 'chunks': 96}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_32_1-2_112:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1.2, 'chunks': 112}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_32_1_112:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1, 'chunks': 112}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
    scann:
      docker-tag: ann-benchmarks-scann
      module: ann_benchmarks.algorithms.scann
@@ -533,6 +599,72 @@ float:
query-args: [[10, 20, 40, 80, 120, 200, 400, 600, 800]]

  angular:
    vamana(diskann):
      docker-tag: ann-benchmarks-diskann
      module: ann_benchmarks.algorithms.diskann
      constructor: Vamana
      base-args : ["@metric"]
      run-groups :
        vamana_125_64_1-2:
          args : [{'l_build': 125, 'max_outdegree': 64, 'alpha': 1.2}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_125_64_1-1:
          args : [{'l_build': 125, 'max_outdegree': 64, 'alpha': 1.1}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_125_64_1:
          args : [{'l_build': 125, 'max_outdegree': 64, 'alpha': 1}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_125_32_1-2:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1.2}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_125_32_1-1:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1.1}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_125_32_1:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
    vamana-pq(diskann):
      docker-tag: ann-benchmarks-diskann_pq
      module: ann_benchmarks.algorithms.diskann
      constructor: VamanaPQ
      base-args : ["@metric"]
      run-groups :
        vamana_pq_125_64_1-2_14:
          args : [{'l_build': 125, 'max_outdegree': 64, 'alpha': 1.2, 'chunks': 14}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_64_1_14:
          args : [{'l_build': 125, 'max_outdegree': 64, 'alpha': 1, 'chunks': 14}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_64_1-2_28:
          args : [{'l_build': 125, 'max_outdegree': 64, 'alpha': 1.2, 'chunks': 28}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_64_1_28:
          args : [{'l_build': 125, 'max_outdegree': 64, 'alpha': 1, 'chunks': 28}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_64_1-2_42:
          args : [{'l_build': 125, 'max_outdegree': 64, 'alpha': 1.2, 'chunks': 42}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_64_1_42:
          args : [{'l_build': 125, 'max_outdegree': 64, 'alpha': 1, 'chunks': 42}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_32_1-2_14:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1.2, 'chunks': 14}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_32_1_14:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1, 'chunks': 14}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_32_1-2_28:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1.2, 'chunks': 28}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_32_1_28:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1, 'chunks': 28}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_32_1-2_42:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1.2, 'chunks': 42}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
        vamana_pq_125_32_1_42:
          args : [{'l_build': 125, 'max_outdegree': 32, 'alpha': 1, 'chunks': 42}]
          query-args : [[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]]
    puffinn:
      docker-tag: ann-benchmarks-puffinn
      module: ann_benchmarks.algorithms.puffinn
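
The run-group keys in these hunks appear to encode the build parameters as `vamana[_pq]_<l_build>_<max_outdegree>_<alpha>[_<chunks>]`, with the decimal point in `alpha` written as a hyphen. A minimal sketch of that convention, inferred from the entries above rather than stated in the PR; `run_group_name` is a hypothetical helper and not part of ann-benchmarks:

```python
def run_group_name(l_build, max_outdegree, alpha, chunks=None):
    """Build a run-group key in the style seen in algos.yaml.

    Hypothetical helper: the naming pattern is inferred from the diff.
    """
    # '{:g}' renders 1 and 1.0 as '1', and 1.2 as '1.2'
    alpha_tag = '{:g}'.format(alpha).replace('.', '-')
    # Assumption: entries that carry 'chunks' are the PQ variants
    prefix = 'vamana' if chunks is None else 'vamana_pq'
    parts = [prefix, str(l_build), str(max_outdegree), alpha_tag]
    if chunks is not None:
        parts.append(str(chunks))
    return '_'.join(parts)


print(run_group_name(100, 64, 1.2))
print(run_group_name(125, 32, 1, chunks=112))
```

Generating keys this way would keep each name consistent with its `args` entry if more parameter combinations are added later.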
190 changes: 190 additions & 0 deletions ann_benchmarks/algorithms/diskann.py
@@ -0,0 +1,190 @@
import sys
import os
import vamanapy as vp
import numpy as np
import struct
import time
from ann_benchmarks.algorithms.base import BaseANN


class Vamana(BaseANN):
    def __init__(self, metric, param):
        self.metric = {'angular': 'cosine', 'euclidean': 'l2'}[metric]
        self.l_build = int(param["l_build"])
        self.max_outdegree = int(param["max_outdegree"])
        self.alpha = float(param["alpha"])
        print("Vamana: L_Build = " + str(self.l_build))
        print("Vamana: R = " + str(self.max_outdegree))
        print("Vamana: Alpha = " + str(self.alpha))
        self.params = vp.Parameters()
        self.params.set("L", self.l_build)
        self.params.set("R", self.max_outdegree)
        self.params.set("C", 750)
        self.params.set("alpha", self.alpha)
        self.params.set("saturate_graph", False)
        self.params.set("num_threads", 1)

    def fit(self, X):

        def bin_to_float(binary):
            return struct.unpack('!f',struct.pack('!I', int(binary, 2)))[0]

        print("Vamana: Starting Fit...")
        index_dir = 'indices'

        if not os.path.exists(index_dir):
            os.makedirs(index_dir)

        data_path = os.path.join(index_dir, 'base.bin')
        self.name = 'Vamana-{}-{}-{}'.format(self.l_build,
                                             self.max_outdegree, self.alpha)
        save_path = os.path.join(index_dir, self.name)
        print('Vamana: Index Stored At: ' + save_path)
        shape = [np.float32(bin_to_float('{:032b}'.format(X.shape[0]))),
                 np.float32(bin_to_float('{:032b}'.format(X.shape[1])))]
        X = X.flatten()
        X = np.insert(X, 0, shape)
        X.tofile(data_path)

        if not os.path.exists(save_path):
            print('Vamana: Creating Index')
            s = time.time()
            if self.metric == 'l2':
                index = vp.SinglePrecisionIndex(vp.Metric.FAST_L2, data_path)
            elif self.metric == 'cosine':
                index = vp.SinglePrecisionIndex(vp.Metric.INNER_PRODUCT, data_path)
            else:
                print('Vamana: Unknown Metric Error!')
            index.build(self.params, [])
            t = time.time()
            print('Vamana: Index Build Time (sec) = ' + str(t - s))
            index.save(save_path)
        if os.path.exists(save_path):
            print('Vamana: Loading Index: ' + str(save_path))
            s = time.time()
            if self.metric == 'l2':
                self.index = vp.SinglePrecisionIndex(vp.Metric.FAST_L2, data_path)
            elif self.metric == 'cosine':
                self.index = vp.SinglePrecisionIndex(vp.Metric.INNER_PRODUCT, data_path)
            else:
                print('Vamana: Unknown Metric Error!')
            self.index.load(file_name = save_path)
            print("Vamana: Index Loaded")
            self.index.optimize_graph()
            print("Vamana: Graph Optimization Completed")
            t = time.time()
            print('Vamana: Index Load Time (sec) = ' + str(t - s))
        else:
            print("Vamana: Unexpected Index Build Time Error")

        print('Vamana: End of Fit')

    def set_query_arguments(self, l_search):
        print("Vamana: L_Search = " + str(l_search))
        self.l_search = l_search

    def query(self, v, n):
        return self.index.single_numpy_query(v, n, self.l_search)

    def batch_query(self, X, n):
        self.num_queries = X.shape[0]
        self.result = self.index.batch_numpy_query(X, n, self.num_queries, self.l_search)

    def get_batch_results(self):
        return self.result.reshape((self.num_queries, self.result.shape[0] // self.num_queries))
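
A note on the `base.bin` file written in `fit`: the code prepends the point count and the dimension to the flattened array by reinterpreting each integer's bits as a `float32`, so the file seems to start with two 32-bit integers followed by the `float32` data. A sketch of producing that layout more directly, assuming this reading of the format is correct; `to_bin_bytes` and `from_bin_bytes` are hypothetical names and not part of `vamanapy`:

```python
import numpy as np


def to_bin_bytes(X):
    """Serialize X as [n_points:int32][dim:int32][row-major float32 data]."""
    X = np.ascontiguousarray(X, dtype=np.float32)
    header = np.array(X.shape, dtype=np.int32)
    return header.tobytes() + X.tobytes()


def from_bin_bytes(buf):
    """Inverse of to_bin_bytes, used here only to check the round trip."""
    n_points, dim = np.frombuffer(buf, dtype=np.int32, count=2)
    data = np.frombuffer(buf, dtype=np.float32, offset=8)
    return data.reshape(int(n_points), int(dim))


X = np.arange(12, dtype=np.float32).reshape(3, 4)
buf = to_bin_bytes(X)
assert len(buf) == 8 + X.size * 4           # 8-byte header plus the payload
assert np.array_equal(from_bin_bytes(buf), X)
```

Writing the header as integers avoids passing small integers through denormal float values, and casting to `float32` up front guards against a `float64` input changing the byte layout.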


class VamanaPQ(BaseANN):
    def __init__(self, metric, param):
        self.metric = {'angular': 'cosine', 'euclidean': 'l2'}[metric]
        self.l_build = int(param["l_build"])
        self.max_outdegree = int(param["max_outdegree"])
        self.alpha = float(param["alpha"])
        self.chunks = int(param["chunks"])
        print("Vamana PQ: L_Build = " + str(self.l_build))
        print("Vamana PQ: R = " + str(self.max_outdegree))
        print("Vamana PQ: Alpha = " + str(self.alpha))
        print("Vamana PQ: Chunks = " + str(self.chunks))
        self.params = vp.Parameters()
        self.params.set("L", self.l_build)
        self.params.set("R", self.max_outdegree)
        self.params.set("C", 750)
        self.params.set("alpha", self.alpha)
        self.params.set("saturate_graph", False)
        self.params.set("num_chunks", self.chunks)
        self.params.set("num_threads", 1)

    def fit(self, X):

        def bin_to_float(binary):
            return struct.unpack('!f',struct.pack('!I', int(binary, 2)))[0]

        print("Vamana PQ: Starting Fit...")
        index_dir = 'indices'

        if self.chunks > X.shape[1]:
            raise ValueError

        if not os.path.exists(index_dir):
            os.makedirs(index_dir)

        data_path = os.path.join(index_dir, 'base.bin')
        pq_path = os.path.join(index_dir, 'pq_memory_index')
        self.name = 'VamanaPQ-{}-{}-{}'.format(self.l_build,
                                               self.max_outdegree, self.alpha)
        save_path = os.path.join(index_dir, self.name)
        print('Vamana PQ: Index Stored At: ' + save_path)
        shape = [np.float32(bin_to_float('{:032b}'.format(X.shape[0]))),
                 np.float32(bin_to_float('{:032b}'.format(X.shape[1])))]
        X = X.flatten()
        X = np.insert(X, 0, shape)
        X.tofile(data_path)

        if not os.path.exists(save_path):
            print('Vamana PQ: Creating Index')
            s = time.time()
            if self.metric == 'l2':
                index = vp.SinglePrecisionIndex(vp.Metric.FAST_L2, data_path)
            elif self.metric == 'cosine':
                index = vp.SinglePrecisionIndex(vp.Metric.INNER_PRODUCT, data_path)
            else:
                print('Vamana PQ: Unknown Metric Error!')
            index.pq_build(data_path, pq_path, self.params)
            t = time.time()
            print('Vamana PQ: Index Build Time (sec) = ' + str(t - s))
            index.save(save_path)
        if os.path.exists(save_path):
            print('Vamana PQ: Loading Index: ' + str(save_path))
            s = time.time()
            if self.metric == 'l2':
                self.index = vp.SinglePrecisionIndex(vp.Metric.FAST_L2, data_path)
            elif self.metric == 'cosine':
                self.index = vp.SinglePrecisionIndex(vp.Metric.INNER_PRODUCT, data_path)
            else:
                print('Vamana PQ: Unknown Metric Error!')
            self.index.load(file_name = save_path)
            print("Vamana PQ: Index Loaded")
            self.index.pq_load(pq_prefix_path = pq_path)
            print("Vamana PQ: PQ Data Loaded")
            self.index.optimize_graph()
            print("Vamana PQ: Graph Optimization Completed")
            t = time.time()
            print('Vamana PQ: Index Load Time (sec) = ' + str(t - s))
        else:
            print("Vamana PQ: Unexpected Index Build Time Error")

        print('Vamana PQ: End of Fit')

    def set_query_arguments(self, l_search):
        print("Vamana PQ: L_Search = " + str(l_search))
        self.l_search = l_search

    def query(self, v, n):
        return self.index.pq_single_numpy_query(v, n, self.l_search)

    def batch_query(self, X, n):
        self.num_queries = X.shape[0]
        self.result = self.index.pq_batch_numpy_query(X, n, self.num_queries, self.l_search)

    def get_batch_results(self):
        return self.result.reshape((self.num_queries, self.result.shape[0] // self.num_queries))
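
Both classes map the `angular` metric to `vp.Metric.INNER_PRODUCT`. Ranking by inner product matches ranking by cosine similarity only when the indexed vectors have unit length, and this diff does not show whether the data is normalized elsewhere. A small NumPy sketch of normalizing before indexing, using made-up vectors; `normalize_rows` is a hypothetical helper and not part of the PR:

```python
import numpy as np


def normalize_rows(X, eps=1e-12):
    """Scale each row to unit L2 norm (all-zero rows stay zero)."""
    X = np.asarray(X, dtype=np.float32)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)


# Toy data: row 1 points the same way as the query, row 0 is simply longer.
base = np.array([[10.0, 10.0], [0.0, 1.0]], dtype=np.float32)
query = np.array([0.0, 1.0], dtype=np.float32)

raw_best = int(np.argmax(base @ query))                   # inner product on raw vectors
unit_best = int(np.argmax(normalize_rows(base) @ query))  # same ordering as cosine
print(raw_best, unit_best)
```

On the raw vectors the longer row wins, while after normalization the row aligned with the query wins, which is the behavior an angular benchmark expects.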
2 changes: 1 addition & 1 deletion install/Dockerfile
@@ -6,6 +6,6 @@ RUN pip3 install -U pip

WORKDIR /home/app
COPY requirements.txt run_algorithm.py ./
-RUN pip3 install -rrequirements.txt
+RUN pip3 install -r requirements.txt

ENTRYPOINT ["python3", "run_algorithm.py"]
29 changes: 29 additions & 0 deletions install/Dockerfile.diskann
@@ -0,0 +1,29 @@
FROM ann-benchmarks

RUN apt-get update
RUN apt-get install -y wget git cmake g++ libaio-dev libgoogle-perftools-dev clang-format-4.0 libboost-dev python3 python3-setuptools python3-pip
RUN pip3 install pybind11 numpy

RUN cd /tmp && wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
RUN cd /tmp && apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
RUN cd /tmp && rm GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
RUN cd /tmp && sh -c 'echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list'
RUN apt-get update
RUN apt-get install -y intel-mkl-64bit-2020.0-088

RUN update-alternatives --install /usr/lib/x86_64-linux-gnu/libblas.so libblas.so-x86_64-linux-gnu /opt/intel/mkl/lib/intel64/libmkl_rt.so 150
RUN update-alternatives --install /usr/lib/x86_64-linux-gnu/libblas.so.3 libblas.so.3-x86_64-linux-gnu /opt/intel/mkl/lib/intel64/libmkl_rt.so 150
RUN update-alternatives --install /usr/lib/x86_64-linux-gnu/liblapack.so liblapack.so-x86_64-linux-gnu /opt/intel/mkl/lib/intel64/libmkl_rt.so 150
RUN update-alternatives --install /usr/lib/x86_64-linux-gnu/liblapack.so.3 liblapack.so.3-x86_64-linux-gnu /opt/intel/mkl/lib/intel64/libmkl_rt.so 150

RUN echo "/opt/intel/lib/intel64" > /etc/ld.so.conf.d/mkl.conf
RUN echo "/opt/intel/mkl/lib/intel64" >> /etc/ld.so.conf.d/mkl.conf
RUN ldconfig
RUN echo "MKL_THREADING_LAYER=GNU" >> /etc/environment

RUN git clone --single-branch --branch python_bindings https://github.com/microsoft/diskann
RUN mkdir -p diskann/build
RUN cd diskann/build && cmake -DCMAKE_BUILD_TYPE=Release ..
RUN cd diskann/build && make -j
RUN cd diskann/python && pip install -e .
RUN python3 -c 'import vamanapy'