-
-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nearest neighbors and approximate clustering #15
Comments
I would definitely be interested in adding it, first of all, as a variant of the nearest neighbors algorithm (similarly to A couple of questions though concerning SIMD: how much of it leaks to the public interface? In other words, do I - as a user of the algorithm - have to be aware of SIMD-specific datatype when using the algorithm? Can that implementation complexity be hidden away from me? |
@LukeMathWalker The reason that SIMD types are used in the public API as vectors for nearest neighbor is mostly due to alignment issues. By using the SIMD types I can ensure that Rust will always put them in vectors, structs, and the stack at the correct alignment. This could be hidden from the user, but I would need to specially take care of alignment issues manually in the code. I could create a wrapper type that just took in an array that creates a SIMD wrapper internally as necessary. |
That would be ideal @vadixidav - I'd like to avoid each model forcing the user to learn a different input type. It would significantly complicate the usage of the overall crate. |
Alright, I am a bit focused on some other open source stuff currently, but I will come back to this and simplify that API. Technically it could be used now easily given a speed penalty for not using SIMD, but I can make wrappers so we can get speed without exposing the SIMD types. |
Alright, I have done my best to make it as easy as possible to interact with arrays from HNSW. It now has extensive impls of https://docs.rs/hnsw/0.4.0/hnsw/struct.Hamming.html If desired, I can even make type aliases to avoid needing the Let me know if you need anything else to be done. |
Pretty sure this issue can be closed, since we already have #119 for nearest-neighbours. |
Feel free to use the HNSW crate I created which ports HNSW to Rust: https://github.com/rust-photogrammetry/hnsw. If you are interested I can add it as an ANN algorithm. I am not as familiar with clustering, but you should be able to build a very fast approximate clustering algorithm utiliIng HNSW.
Let me know if this would be a good addition and what to add to linfa (and where). HNSW should perform quite well from a benchmark perspective, but I haven't ran it neck-to-neck with existing implementations using the exact same data and bench harness yet, so I am interested to try that if python bindings are added.
The text was updated successfully, but these errors were encountered: