The current implementation of remove_many takes a list of hashes (in Python) and copies it into Rust for removal. There are a few improvements that would make it more performant, specifically accepting a MinHash sketch as the argument and calling a different Rust function that avoids copying the list (similar to what add_many already does).
Tagging @keyabarve because she has been looking for a Rust issue for some time, and this will touch Python, Rust and the interface in between =]
Suggested plan for fixing this issue:
Add tests for the new behavior (a remove_many taking a MinHash as argument). These tests will fail, because there is no implementation of remove_many that accepts MinHash sketches as arguments yet =]
Once there is a test, modify the remove_many method to also take MinHash arguments. This will involve creating a Rust function and exposing it to Python. (more details to be added here)
Tests should be passing at this point.
Add benchmarks for remove_many in benchmarks/benchmarks.py. Note there is a mem benchmark for add_many that can serve as a template, but we want to check remove_many both with lists/sets and with a MinHash as arguments.
Verify that the new version of remove_many is actually faster than the previous version.
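As a rough sketch of what the two benchmark cases could look like: the class below uses the `setup` / `time_*` method naming that asv-style benchmark files use, which is an assumption about how benchmarks/benchmarks.py is organized. `RemoveManySuite` and its methods are hypothetical names, and plain Python sets stand in for MinHash sketches so the example runs without sourmash installed.

```python
import timeit


class RemoveManySuite:
    """Hypothetical benchmark suite; Python sets stand in for MinHash sketches."""

    def setup(self):
        # Hashes held by the sketch we remove from, and the ones to remove.
        self.hashes = list(range(0, 100_000, 2))
        self.to_remove = set(self.hashes[::2])

    def time_remove_many_list(self):
        # Stand-in for the current path: materialize a list first,
        # then remove the hashes one by one.
        target = set(self.hashes)
        for h in list(self.to_remove):
            target.discard(h)
        return target  # returned only so the two paths can be compared

    def time_remove_many_sketch(self):
        # Stand-in for the proposed path: hand over the whole container
        # and let a single bulk operation do the work.
        target = set(self.hashes)
        target.difference_update(self.to_remove)
        return target


if __name__ == "__main__":
    suite = RemoveManySuite()
    suite.setup()
    for name in ("time_remove_many_list", "time_remove_many_sketch"):
        elapsed = timeit.timeit(getattr(suite, name), number=20)
        print(f"{name}: {elapsed:.4f}s for 20 runs")
```

Both methods rebuild `target` on every call, so the setup cost is the same in each case and only the removal strategy differs.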
The true fix is making remove_many more efficient, because it is a common use case. At the moment remove_many needs a list of hashes to remove (which is convenient, because we can pass a list or set with the hashes), but many times we are actually pulling the hashes from a MinHash without any modifications, and in this case it is better to pass the full MinHash into Rust and process all the data there (avoiding extracting hashes from the MinHash, then building a list, and so on).
This block needs to be changed:
into something that checks if the argument is a MinHash, and if so calls another Rust function, like add_many does:
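A minimal sketch of that dispatch, using a pure-Python stand-in rather than the actual sourmash code. `ToySketch`, `_remove_hashes` and `_remove_from_sketch` are hypothetical names for illustration only; in the real change the two private methods would correspond to calls into Rust.

```python
class ToySketch:
    """Pure-Python stand-in for a MinHash sketch (not sourmash's class)."""

    def __init__(self, hashes=()):
        self._hashes = set(hashes)

    def hashes(self):
        return sorted(self._hashes)

    def _remove_hashes(self, hashes):
        # Stand-in for the existing path: the hashes are first collected
        # into a list (the copy the issue wants to avoid), then removed.
        for h in list(hashes):
            self._hashes.discard(h)

    def _remove_from_sketch(self, other):
        # Stand-in for the proposed path: the whole sketch is handed over
        # and processed in one go, with no intermediate list.
        self._hashes -= other._hashes

    def remove_many(self, hashes):
        # Check the argument type and pick the cheaper path for sketches.
        if isinstance(hashes, ToySketch):
            self._remove_from_sketch(hashes)
        else:
            self._remove_hashes(hashes)


a = ToySketch(range(10))
a.remove_many([1, 3])                  # list (or set) of hashes still works
a.remove_many(ToySketch([0, 2, 99]))   # a sketch takes the other path
```

Keeping the list/set path as the fallback means existing callers are unaffected, and hashes that are not present (99 above) are simply ignored in both paths.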
Other tips
(... add_many, our example for modifications in this issue)
From #1552 (comment):