Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving remove_many performance #1553

Closed
luizirber opened this issue May 25, 2021 · 0 comments · Fixed by #1571
Closed

Improving remove_many performance #1553

luizirber opened this issue May 25, 2021 · 0 comments · Fixed by #1571
Labels
good next issue An issue that should be ready to resolve. rust

Comments

@luizirber
Copy link
Member

The current implementation of remove_many takes a list of hashes (in Python) and copies it into Rust for removal. There are a few improvements to make it more performant, specifically allowing the use of MinHash sketches as argument, and call a different Rust function that avoids the list copying (similar to what add_many already does).

Tagging @keyabarve because she has been looking for a Rust issue for some time, and this will touch Python, Rust and the interface inbetween =]

Suggested plan for fixing this issue:

  • Add tests for the new behavior (a remove_many taking a MinHash as argument). These tests will fail, because there is no implementation of remove_many that accepts MinHash sketches as arguments yet =])
  • Once there is a test, modify the remove_many method to also take MinHash arguments. This will involve creating a Rust function and expose it to Python. (more details to be added here)
  • Tests should be passing at this point.
  • Add benchmarks for remove_many in benchmarks/benchmarks.py. Note there is a mem benchmark for add_many that can serve as template, but we want to check both remove_many with lists/sets or with a MinHash as arguments.
  • Verify that new version of remove_many is actually faster than the previous version.

Other tips


From #1552 (comment):

The true fix is making remove_many more efficient, because it is a common use case. At the moment remove_many needs a list of hashes to remove (which is convenient, because we can pass a list or set with the hashes), but many times we are actually pulling the hashes from a MinHash without any modifications, and in this case it is better to pass the full MinHash into Rust and process all the data there (avoiding extracting hashes from the MinHash, then building a list, and so on).
This block needs to be changed:

into something that checks if the argument is a MinHash, and if so calls another Rust function, like add_many does:

@luizirber luizirber added rust good next issue An issue that should be ready to resolve. labels May 25, 2021
@ctb ctb closed this as completed in #1571 Jun 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good next issue An issue that should be ready to resolve. rust
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant