-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add pq support for hnsw searching #122
Conversation
Codecov Report
@@ Coverage Diff @@
## main #122 +/- ##
==========================================
+ Coverage 60.13% 66.16% +6.02%
==========================================
Files 23 23
Lines 1312 1318 +6
==========================================
+ Hits 789 872 +83
+ Misses 523 446 -77
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Totally, all of the implementations are perfect. We need some minor revisions to polish the code and API signature.
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
I'm wondering why the below annlite/bindings/hnsw_bindings.cpp Lines 224 to 255 in 103b5bf
Seem like the adding of the first row can be merged into |
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
bindings/hnsw_bindings.cpp
Outdated
throw py::attribute_error( | ||
"PQ class returning the codebook with wrong dimension"); | ||
} | ||
std::shared_ptr<float *> codebook_buffer = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::shared_ptr<float *> codebook_buffer = | |
std::shared_ptr<float> codebook_buffer = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the template type is the type, u do not want a shared pointer to a float pointer, but a shared poiinter to a float.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But now that I see it, I am not sure this is what u want, what u would like is to have an std::array<float>
instead of a float*
. Like this u can use this std::array
as an object and forget about the pointers, and just pass it around as const reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the codebook
has a known size u should use std::array
to handle the ownership of the underlying buffer in a safe way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a mistake about the shared_ptr
's type here, it should be float
.
May I ask when you suggested std::array
, what's the solution in your mind? Do you mean we should copy the codebook from the buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please stick to only one syntax to detect what variables are object members. this->
or not this->
but do not mix it
include/hnswlib/space_pq.h
Outdated
DISTFUNC<float> fstdistfunc_; | ||
size_t data_size_, d_subvectors; | ||
pq_dist_param_t param; | ||
std::shared_ptr<float *> codebook; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this shared_ptr as mentioned in other PRs is not clear. If u can use std::shared_ptr
means u can use std::array
and this is what u should have.
bindings/hnsw_bindings.cpp
Outdated
throw py::attribute_error( | ||
"PQ class returning the codebook with wrong dimension"); | ||
} | ||
std::shared_ptr<float *> codebook_buffer = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the codebook
has a known size u should use std::array
to handle the ownership of the underlying buffer in a safe way
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
the above commits fixed the before problems, except for using the |
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
I removed the ownership of |
The conflicts happened at very beginning of this pr, so merging instead of rebasing |
Signed-off-by: Jianbai Ye <jianbaiye@outlook.com>
In order to reduce the memory usage of HNSW, we can store the embedding quantized by PQ(normally will be
uint8_t
). Also, the distance computation will become a look-up operation, which should be a little bit faster while adding and searching.