Merged
23 commits
cb7b398
New methods loadIndexFromStream and saveIndexToStream expose de-/seri…
dbespalov Oct 12, 2020
e161db8
Implement __getstate__ and __setstate__ to allow pickling of hnswlib.…
dbespalov Oct 12, 2020
e0eacad
Verify knn_query results match before/after pickling hnswlib.Index ob…
dbespalov Oct 12, 2020
ec4f4b1
add documeentation
dbespalov Oct 12, 2020
a3646cc
clean-up readme
dbespalov Oct 12, 2020
a1ba4e5
clean-up readme
dbespalov Oct 12, 2020
cf3846c
clean-up readme
dbespalov Oct 12, 2020
27471cd
clean-up readme
dbespalov Oct 12, 2020
4220956
Update bindings_test_pickle.py
dbespalov Oct 12, 2020
72b6501
Revert "New methods loadIndexFromStream and saveIndexToStream expose …
Oct 23, 2020
3a62b41
use python's buffer protocol to avoid making copies of ann data (stat…
Oct 23, 2020
fe6d2fa
replace tab characters with spaces
Oct 23, 2020
c9fb60d
test each space (ip/cosine/l2) as a separate unittest
Oct 23, 2020
3c4510d
return array_t pointers
dbespalov Oct 25, 2020
64c5154
expose static method of Index class as copy constructor in python
dbespalov Oct 25, 2020
7b445c8
do not waste space when returning serialized appr_alg->linkLists_
dbespalov Oct 25, 2020
c02f1dc
serialize element_lookup_ and element_level_ as array_t arrays; pass …
dbespalov Oct 26, 2020
1f25102
warn that serialization is not thread safe with add_items
dbespalov Nov 3, 2020
1165370
warn that serialization is not thread safe with add_items; add todo b…
dbespalov Nov 3, 2020
2c040e6
remove camel casing
dbespalov Nov 3, 2020
6298996
add static const int data member to class Index that stores serializa…
dbespalov Nov 6, 2020
c8276d8
add todo block to convert parameter tuple to dicts
dbespalov Nov 6, 2020
345f71d
add todo block to convert parameter tuple to dicts
dbespalov Nov 6, 2020
38 changes: 35 additions & 3 deletions README.md
@@ -37,7 +37,7 @@ For other spaces use the nmslib library https://github.com/nmslib/nmslib.
#### Short API description
* `hnswlib.Index(space, dim)` creates a non-initialized HNSW index in space `space` with integer dimension `dim`.

-Index methods:
+`hnswlib.Index` methods:
* `init_index(max_elements, ef_construction = 200, M = 16, random_seed = 100)` initializes the index with no elements.
    * `max_elements` defines the maximum number of elements that can be stored in the structure (can be increased/shrunk).
    * `ef_construction` defines a construction time/accuracy trade-off (see [ALGO_PARAMS.md](ALGO_PARAMS.md)).
@@ -76,14 +76,34 @@ Index methods:

* `get_current_count()` - returns the current number of elements stored in the index



Read-only properties of `hnswlib.Index` class:

* `space` - name of the space (can be one of "l2", "ip", or "cosine").

* `dim` - dimensionality of the space.

* `M` - parameter that defines the maximum number of outgoing connections in the graph.

* `ef_construction` - parameter that controls speed/accuracy trade-off during the index construction.

* `max_elements` - current capacity of the index. Equivalent to `p.get_max_elements()`.

* `element_count` - number of items in the index. Equivalent to `p.get_current_count()`.

Properties of `hnswlib.Index` that support reading and writing:

* `ef` - parameter controlling query time/accuracy trade-off.

* `num_threads` - default number of threads to use in `add_items` or `knn_query`. Note that calling `p.set_num_threads(3)` is equivalent to `p.num_threads=3`.
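
The split between read-only and writable properties can be illustrated with a small helper. This is only a sketch: `describe_index` and `tune_for_queries` are hypothetical names, not part of hnswlib, and the `SimpleNamespace` object is a stand-in so the snippet runs without the library installed. The helpers touch only the attribute names listed above.

```python
from types import SimpleNamespace

# Property names as documented above.
READ_ONLY = ("space", "dim", "M", "ef_construction", "max_elements", "element_count")
READ_WRITE = ("ef", "num_threads")


def describe_index(index):
    """Return a dict mapping each documented property name to its current value."""
    return {name: getattr(index, name) for name in READ_ONLY + READ_WRITE}


def tune_for_queries(index, ef, num_threads):
    """Assign the two writable properties, then return the full description."""
    index.ef = ef                    # query time/accuracy trade-off
    index.num_threads = num_threads  # default thread count for add_items / knn_query
    return describe_index(index)


# Stand-in object for demonstration only; a real hnswlib.Index would be passed instead.
demo = SimpleNamespace(space="l2", dim=128, M=16, ef_construction=200,
                       max_elements=10000, element_count=10000, ef=10, num_threads=1)
info = tune_for_queries(demo, ef=50, num_threads=4)
```

With an initialized index `p`, `tune_for_queries(p, ef=50, num_threads=4)` should have the same effect on `num_threads` as `p.set_num_threads(4)`, per the note above. That `p.ef = 50` mirrors `p.set_ef(50)` in the same way is an assumption here.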




#### Python bindings examples
```python
import hnswlib
import numpy as np
import pickle

dim = 128
num_elements = 10000
@@ -106,6 +126,18 @@ p.set_ef(50) # ef should always be > k

# Query dataset, k - number of closest elements (returns 2 numpy arrays)
labels, distances = p.knn_query(data, k = 1)

# Index objects support pickling
# WARNING: serialization via pickle.dumps(p) or p.__getstate__() is NOT thread-safe with p.add_items method!
# Note: the ef parameter is included in serialization; the random number generator is re-initialized with random_seed on Index load
p_copy = pickle.loads(pickle.dumps(p)) # creates a copy of index p using pickle round-trip

### Index parameters are exposed as class properties:
print(f"Parameters passed to constructor: space={p_copy.space}, dim={p_copy.dim}")
print(f"Index construction: M={p_copy.M}, ef_construction={p_copy.ef_construction}")
print(f"Index size is {p_copy.element_count} and index capacity is {p_copy.max_elements}")
print(f"Search speed/quality trade-off parameter: ef={p_copy.ef}")

```
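
The caveats in the comments above (serialization is not thread-safe with `add_items`, `ef` travels with the state, the random number generator is re-seeded on load) can be illustrated with a pure-Python toy class. This is a sketch under stated assumptions, not the binding's implementation: `ToyIndex`, `SER_VERSION`, the dict-shaped state, and the use of a shared lock are all hypothetical choices made for illustration.

```python
import pickle
import random
import threading


class ToyIndex:
    """Hypothetical stand-in for an index; NOT hnswlib's implementation."""

    SER_VERSION = 1  # assumed version tag stored with every serialized state

    def __init__(self, dim, random_seed=100):
        self.dim = dim
        self.random_seed = random_seed
        self.ef = 10
        self.items = []
        self._rng = random.Random(random_seed)
        self._lock = threading.Lock()

    def add_items(self, vectors):
        with self._lock:  # writers and the serializer share one lock
            for v in vectors:
                if len(v) != self.dim:
                    raise ValueError("wrong dimensionality")
                self.items.append(list(v))

    def __getstate__(self):
        with self._lock:  # consistent snapshot: no add_items can interleave
            return {
                "ser_version": self.SER_VERSION,
                "dim": self.dim,
                "random_seed": self.random_seed,
                "ef": self.ef,  # query-time parameter is part of the state
                "items": [list(v) for v in self.items],
            }

    def __setstate__(self, state):
        if state["ser_version"] != self.SER_VERSION:
            raise ValueError("unsupported serialization version")
        self.dim = state["dim"]
        self.random_seed = state["random_seed"]
        self.ef = state["ef"]
        self.items = state["items"]
        # The lock and the RNG state are not pickled: rebuild both on load.
        self._rng = random.Random(self.random_seed)
        self._lock = threading.Lock()


original = ToyIndex(dim=2)
original.add_items([[0.0, 1.0], [2.0, 3.0]])
original.ef = 50
clone = pickle.loads(pickle.dumps(original))  # same round-trip as p_copy above
```

Holding one lock in both `add_items` and `__getstate__` is just one way to make a snapshot safe. The warning above indicates the real bindings do not do this, so callers would need to provide their own synchronization around `pickle.dumps(p)`.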

An example with updates after serialization/deserialization:
13 changes: 6 additions & 7 deletions hnswlib/hnswalg.h
@@ -637,7 +637,6 @@ namespace hnswlib {
if (!input.is_open())
throw std::runtime_error("Cannot open file");


// get file size:
input.seekg(0,input.end);
std::streampos total_filesize=input.tellg();
@@ -874,7 +873,7 @@ namespace hnswlib {
for (auto&& cand : sCand) {
if (cand == neigh)
continue;

dist_t distance = fstdistfunc_(getDataByInternalId(neigh), getDataByInternalId(cand), dist_func_param_);
if (candidates.size() < elementsToKeep) {
candidates.emplace(distance, cand);
@@ -1137,7 +1136,7 @@ namespace hnswlib {
}

std::priority_queue<std::pair<dist_t, tableint>, std::vector<std::pair<dist_t, tableint>>, CompareByFirst> top_candidates;
-if (has_deletions_) {
+if (has_deletions_) {
top_candidates=searchBaseLayerST<true,true>(
currObj, query_data, std::max(ef_, k));
}
@@ -1186,27 +1185,27 @@ namespace hnswlib {
std::unordered_set<tableint> s;
for (int j=0; j<size; j++){
assert(data[j] > 0);
-assert(data[j] < cur_element_count);
+assert(data[j] < cur_element_count);
assert (data[j] != i);
inbound_connections_num[data[j]]++;
s.insert(data[j]);
connections_checked++;

}
assert(s.size() == size);
}
}
if(cur_element_count > 1){
int min1=inbound_connections_num[0], max1=inbound_connections_num[0];
-for(int i=0; i < cur_element_count; i++){
+for(int i=0; i < cur_element_count; i++){
assert(inbound_connections_num[i] > 0);
min1=std::min(inbound_connections_num[i],min1);
max1=std::max(inbound_connections_num[i],max1);
}
std::cout << "Min inbound: " << min1 << ", Max inbound:" << max1 << "\n";
}
std::cout << "integrity ok, checked " << connections_checked << " connections\n";

}

};
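
For readers following the last hunk, the integrity check it touches can be paraphrased in Python. This is a simplified sketch, not a port: `check_integrity` here is a hypothetical function that operates on a single-layer adjacency list (a list of neighbour-id lists), and it accepts neighbour id 0, whereas the C++ shown above asserts `data[j] > 0`.

```python
def _require(condition, message):
    """Raise ValueError when an integrity condition does not hold."""
    if not condition:
        raise ValueError(message)


def check_integrity(link_lists):
    """Validate one layer of links.

    Returns (connections_checked, min_inbound, max_inbound); the inbound
    values are None when there are fewer than two elements.
    """
    n = len(link_lists)
    inbound = [0] * n
    connections_checked = 0
    for i, neighbours in enumerate(link_lists):
        for j in neighbours:
            _require(0 <= j < n, "neighbour id out of range")
            _require(j != i, "element links to itself")
            inbound[j] += 1
            connections_checked += 1
        # Mirrors assert(s.size() == size): no duplicate links per element.
        _require(len(set(neighbours)) == len(neighbours), "duplicate link")
    if n > 1:
        _require(all(count > 0 for count in inbound), "element has no inbound links")
        return connections_checked, min(inbound), max(inbound)
    return connections_checked, None, None


# Small hand-built graph: 0<->1, 0<->2, 1<->2, 2<->3.
graph = [[1, 2], [0, 2], [0, 1, 3], [2]]
result = check_integrity(graph)

# A self-link should be rejected.
try:
    check_integrity([[0]])
    self_link_rejected = False
except ValueError:
    self_link_rejected = True
```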