From e2000e9eddab3369673a745c65393739c36c8ce9 Mon Sep 17 00:00:00 2001
From: TakaakiFuruse
Date: Thu, 25 Feb 2021 15:36:17 +0900
Subject: [PATCH 1/5] Improved description of `add_items`

This is just a suggestion of doc improvement.
For `add_items` description, I have made `labels` part and `data_labels` part together since...

1. There's no argument called `labels` for `add_items` func.
2. It felt like `labels` were a typo of `data_labels` from a commit 5c20009.
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 8d139fdc..4acdfa6b 100644
--- a/README.md
+++ b/README.md
@@ -47,9 +47,9 @@ For other spaces use the nmslib library https://github.com/nmslib/nmslib.
     * `M` defines tha maximum number of outgoing connections in the graph ([ALGO_PARAMS.md](ALGO_PARAMS.md)).
 
 * `add_items(data, data_labels, num_threads = -1)` - inserts the `data`(numpy array of vectors, shape:`N*dim`) into the structure.
-    * `labels` is an optional N-size numpy array of integer labels for all elements in `data`.
     * `num_threads` sets the number of cpu threads to use (-1 means use default).
-    * `data_labels` specifies the labels for the data. If index already has the elements with the same labels, their features will be updated. Note that update procedure is slower than insertion of a new element, but more memory- and query-efficient.
+    * `data_labels` are optional N-size numpy array of integer labels for all elements in `data`.
+      - If index already has the elements with the same labels, their features will be updated. Note that update procedure is slower than insertion of a new element, but more memory- and query-efficient.
     * Thread-safe with other `add_items` calls, but not with `knn_query`.
 
 * `mark_deleted(data_label)` - marks the element as deleted, so it will be omitted from search results.

From 95d6b0275a57785ebb497a947afc4198f5d00eee Mon Sep 17 00:00:00 2001
From: TakaakiFuruse
Date: Sat, 6 Mar 2021 14:33:44 +0900
Subject: [PATCH 2/5] data_labels => ids

ref: https://github.com/nmslib/hnswlib/pull/289#issuecomment-789353096
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 4acdfa6b..45547478 100644
--- a/README.md
+++ b/README.md
@@ -46,9 +46,9 @@ For other spaces use the nmslib library https://github.com/nmslib/nmslib.
     * `ef_construction` defines a construction time/accuracy trade-off (see [ALGO_PARAMS.md](ALGO_PARAMS.md)).
     * `M` defines tha maximum number of outgoing connections in the graph ([ALGO_PARAMS.md](ALGO_PARAMS.md)).
 
-* `add_items(data, data_labels, num_threads = -1)` - inserts the `data`(numpy array of vectors, shape:`N*dim`) into the structure.
+* `add_items(data, ids, num_threads = -1)` - inserts the `data`(numpy array of vectors, shape:`N*dim`) into the structure.
     * `num_threads` sets the number of cpu threads to use (-1 means use default).
-    * `data_labels` are optional N-size numpy array of integer labels for all elements in `data`.
+    * `ids` are optional N-size numpy array of integer labels for all elements in `data`.
       - If index already has the elements with the same labels, their features will be updated. Note that update procedure is slower than insertion of a new element, but more memory- and query-efficient.
     * Thread-safe with other `add_items` calls, but not with `knn_query`.
 

From 9ba16e24be66b5595b9b9702d7819b2e2797e278 Mon Sep 17 00:00:00 2001
From: TakaakiFuruse
Date: Sat, 6 Mar 2021 14:38:22 +0900
Subject: [PATCH 3/5] data_label => label

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 45547478..3f14a05c 100644
--- a/README.md
+++ b/README.md
@@ -52,7 +52,7 @@ For other spaces use the nmslib library https://github.com/nmslib/nmslib.
       - If index already has the elements with the same labels, their features will be updated. Note that update procedure is slower than insertion of a new element, but more memory- and query-efficient.
     * Thread-safe with other `add_items` calls, but not with `knn_query`.
 
-* `mark_deleted(data_label)` - marks the element as deleted, so it will be omitted from search results.
+* `mark_deleted(label)` - marks the element as deleted, so it will be omitted from search results.
 
 * `resize_index(new_size)` - changes the maximum capacity of the index. Not thread safe with `add_items` and `knn_query`.
 

From 1afdce0e908a773af447f44247fe68113d1f2b27 Mon Sep 17 00:00:00 2001
From: TakaakiFuruse
Date: Sat, 6 Mar 2021 14:54:41 +0900
Subject: [PATCH 4/5] fixed sample code, data_labels => ids

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 3f14a05c..7057d58f 100644
--- a/README.md
+++ b/README.md
@@ -113,7 +113,7 @@ num_elements = 10000
 
 # Generating sample data
 data = np.float32(np.random.random((num_elements, dim)))
-data_labels = np.arange(num_elements)
+ids = np.arange(num_elements)
 
 # Declaring index
 p = hnswlib.Index(space = 'l2', dim = dim) # possible options are l2, cosine or ip
@@ -122,7 +122,7 @@ p = hnswlib.Index(space = 'l2', dim = dim) # possible options are l2, cosine or
 p.init_index(max_elements = num_elements, ef_construction = 200, M = 16)
 
 # Element insertion (can be called several times):
-p.add_items(data, data_labels)
+p.add_items(data, ids)
 
 # Controlling the recall by setting ef:
 p.set_ef(50) # ef should always be > k

From af284e6c3f408d31e9123f783dfeeb77eb54b8c6 Mon Sep 17 00:00:00 2001
From: TakaakiFuruse
Date: Sat, 6 Mar 2021 15:00:42 +0900
Subject: [PATCH 5/5] changed order of args

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 7057d58f..b2c0166c 100644
--- a/README.md
+++ b/README.md
@@ -41,7 +41,7 @@ For other spaces use the nmslib library https://github.com/nmslib/nmslib.
 * `hnswlib.Index(space, dim)` creates a non-initialized index an HNSW in space `space` with integer dimension `dim`.
 
 `hnswlib.Index` methods:
-* `init_index(max_elements, ef_construction = 200, M = 16, random_seed = 100)` initializes the index from with no elements.
+* `init_index(max_elements, M = 16, ef_construction = 200, random_seed = 100)` initializes the index from with no elements.
     * `max_elements` defines the maximum number of elements that can be stored in the structure(can be increased/shrunk).
     * `ef_construction` defines a construction time/accuracy trade-off (see [ALGO_PARAMS.md](ALGO_PARAMS.md)).
     * `M` defines tha maximum number of outgoing connections in the graph ([ALGO_PARAMS.md](ALGO_PARAMS.md)).