Feat added bk tree benchmark and docs
Pablo Elias committed Oct 9, 2023
1 parent 20ae652 commit a2cbbb3
Showing 5 changed files with 47 additions and 7 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -1,6 +1,6 @@
# Introduction

PyNear is a Python library, internally built in C++, for efficient KNN search using metric distance functions such as L2 distance (see VPTreeL2Index) or Hamming distance (VPTreeBinaryIndex), as well as other distance functions. It uses AVX2 instructions to optimize the distance functions and improve search performance.
PyNear is a Python library, internally built in C++, for efficient KNN search using metric distance functions such as L2 distance (see VPTreeL2Index) or Hamming distance (VPTreeBinaryIndex and BKTreeBinaryIndex), as well as other distance functions. It uses AVX2 instructions to optimize the distance functions and improve search performance.

PyNear aims to provide different efficient algorithms for nearest neighbor search. One of PyNear's differentiators is the adoption of the [Vantage Point Tree](./docs/vptrees.md) to mitigate the curse of dimensionality for high-dimensional features (see the VPTree* indices in the [docs](./docs/README.md) for more information).

30 changes: 29 additions & 1 deletion docs/README.md
@@ -5,13 +5,22 @@
PyNear provides several indices that use different distance functions or algorithms to perform the search.
Available indices are:

### Threshold-based KNN Indices:

| Index Name | Description |
|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| pynear.BKTreeBinaryIndex | Uses AVX2 optimized Hamming distance function and [BKTree](https://en.wikipedia.org/wiki/BK-tree) algorithm to perform exact searches within a threshold distance. |

### Non-threshold-based KNN Indices:

| Index Name | Description |
|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| pynear.VPTreeL2Index | Uses AVX2 optimized L2 (euclidean norm) distance function and VPTree algorithm to perform exact searches. |
| pynear.VPTreeL1Index | Uses L1 (manhattan) distance function and VPTree algorithm to perform exact searches. |
| pynear.VPTreeBinaryIndex | Uses AVX2 optimized Hamming distances function and VPTree algorithm to perform exact searches. Supports 16, 32, 64, 128 and 256 bit dimensional vectors only. |
| pynear.VPTreeBinaryIndex | Uses AVX2 optimized Hamming distance function and VPTree algorithm to perform exact searches. Supports 16, 32, 64, 128 and 256 bit dimensional vectors only. |
| pynear.VPTreeChebyshevIndex | Uses [Chebyshev](https://en.wikipedia.org/wiki/Chebyshev_distance) distance function and VPTree algorithm to perform exact searches. |


## Usage example

### Creating the index
@@ -41,6 +50,25 @@ vptree.set(data)
vptree_indices, vptree_distances = vptree.searchKNN(queries, k)
```

#### pynear.BKTreeBinaryIndex

```python
np.random.seed(seed=42)

dimension = 32 # 32 bytes = 256-bit vectors
num_points = 2021
data = np.random.normal(scale=255, loc=0, size=(num_points, dimension)).astype(dtype=np.uint8)

num_queries = 8
queries = np.random.normal(scale=255, loc=0, size=(num_queries, dimension)).astype(dtype=np.uint8)

bktree = pynear.BKTreeBinaryIndex()
bktree.set(data)

# To search with the maximum threshold, use dimension * 8 (the maximum Hamming distance in bits); any smaller threshold also works
indices, distances, keys = bktree.find_threshold(queries, dimension * 8)
```
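To make the threshold semantics concrete, here is a minimal pure-Python sketch of a BK-tree over byte strings. It is an illustration only, not PyNear's C++/AVX2 implementation: the names `hamming`, `BKNode` and `BKTree` are hypothetical, and the sketch returns `(distance, key)` pairs rather than PyNear's `indices, distances, keys` tuple.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


class BKNode:
    """Hypothetical node type: one stored point plus children keyed by distance."""

    def __init__(self, key: int, value: bytes) -> None:
        self.key = key
        self.value = value
        self.children: Dict[int, "BKNode"] = {}  # edge distance -> child


class BKTree:
    """Illustrative BK-tree; a sketch, not PyNear's implementation."""

    def __init__(self) -> None:
        self.root: Optional[BKNode] = None

    def add(self, key: int, value: bytes) -> None:
        if self.root is None:
            self.root = BKNode(key, value)
            return
        node = self.root
        while True:
            # Descend along the edge labelled with the distance to this node.
            d = hamming(value, node.value)
            child = node.children.get(d)
            if child is None:
                node.children[d] = BKNode(key, value)
                return
            node = child

    def find_threshold(self, query: bytes, threshold: int) -> List[Tuple[int, int]]:
        """Return sorted (distance, key) pairs with distance <= threshold."""
        found: List[Tuple[int, int]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = hamming(query, node.value)
            if d <= threshold:
                found.append((d, node.key))
            # Triangle inequality: matches can only lie under edges
            # whose label is within [d - threshold, d + threshold].
            for edge, child in node.children.items():
                if d - threshold <= edge <= d + threshold:
                    stack.append(child)
        return sorted(found)


# One-byte points, so the maximum possible distance is 1 * 8 = 8 bits.
points = [bytes([0b00000000]), bytes([0b00000001]),
          bytes([0b00000011]), bytes([0b11111111])]
tree = BKTree()
for key, value in enumerate(points):
    tree.add(key, value)

near = tree.find_threshold(bytes([0b00000000]), 1)
everything = tree.find_threshold(bytes([0b00000000]), 1 * 8)
print(near)        # [(0, 0), (1, 1)]
print(everything)  # [(0, 0), (1, 1), (2, 2), (8, 3)]
```

With the threshold set to the number of bits per vector, no subtree can be pruned and every stored point is returned, which is why `dimension * 8` acts as the "match everything" setting for byte-packed vectors.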

For convenience, in addition to the `searchKNN` function, vptree also provides `search1NN` for finding the single nearest neighbor.

#### pynear.VPTreeL2Index
Binary file modified docs/img/binary-index-comparison/result_k=8.png
6 changes: 3 additions & 3 deletions pynear/benchmark/bin_benchmark.yaml
@@ -6,12 +6,12 @@ benchmark:
- name: "Binary Index Comparison"
k: [8]
num_queries: [16]
dimensions: [32, 64, 256]
dataset_total_size: 1000000
dimensions: [32, 64, 128, 256]
dataset_total_size: 200000
dataset_num_clusters: 50
dataset_type: "uint8"
index_types:
- FaissIndexBinaryFlat
- AnnoyHamming
- VPTreeBinaryIndex

- BKTreeBinaryIndex
16 changes: 14 additions & 2 deletions pynear/benchmark/index_adapters.py
@@ -19,6 +19,7 @@ def create_index_adapter(index_name: str):
        "AnnoyManhattan": AnnoyManhattanAdapter,
        "AnnoyHamming": AnnoyHammingAdapter,
        "SKLearnL2": SKLearnL2Adapter,
        "BKTreeBinaryIndex": PyNearBKTreeAdapter,
        "VPTreeL2Index": pynear.VPTreeL2Index,
        "VPTreeL1Index": pynear.VPTreeL1Index,
        "VPTreeBinaryIndex": pynear.VPTreeBinaryIndex,
@@ -28,7 +29,7 @@ def create_index_adapter(index_name: str):
        raise ValueError(f"Index name {index_name} not supported")

    if index_name.startswith("VPTree"):
        return PyNearAdapter(index_name)
        return PyNearVPAdapter(index_name)

    return mapper[index_name]()

@@ -52,7 +53,7 @@ def _search_implementation(self, query, k: int):
        pass


class PyNearAdapter(IndexAdapter):
class PyNearVPAdapter(IndexAdapter):
    def __init__(self, pyvp_index_name: str):
        self._index = None
        self._pyvp_index_name = pyvp_index_name
@@ -70,6 +71,17 @@ def build_index(self, data: np.ndarray):
    def _search_implementation(self, query, k: int):
        self._index.searchKNN(query, k)

class PyNearBKTreeAdapter(IndexAdapter):
    def __init__(self):
        self._index = pynear.BKTreeBinaryIndex()
        self._dimensions = 0

    def build_index(self, data: np.ndarray):
        self._index.set(data)
        self._dimensions = data.shape[1]

    def _search_implementation(self, query, k: int):
        self._index.find_threshold(query, self._dimensions)

class FaissIndexFlatL2Adapter(IndexAdapter):
    def __init__(self):