Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance changes (draft) #18

Merged
merged 3 commits into from Aug 21, 2022
Merged

Conversation

marcus-pousette
Copy link
Collaborator

@marcus-pousette marcus-pousette commented Aug 20, 2022

I was playing our with the library so I created a few changes I could possibly split into multiple PRs.
Would like to hear you thoughts about this

Changes for improved performance

  • add_document use Cow for hash keys.
  • hashbrown Hashmap instead of std::collections. Faster for small hash keys. Though does not provide the same level of HashDoS resistance
  • TermData query_terms property is replaced with query_terms_len to prevent unnecessary copies of all query terms during query (not benchmarked yet) (Breaking change for the ScoreCalculator trait)

Bench results on my computer add_100k_docs
Master
301.19 ms

This branch
221.24 ms

API change ideas

  • Filter is removed, since the Tokenizer can just be seen as a preprocessing step to indexation and query and can do anything that the Filter does. One argument less to worry about. (Breaking change)

@marcus-pousette marcus-pousette changed the title Improve add document performance change Improve add document performance change (draft) Aug 20, 2022
@marcus-pousette marcus-pousette changed the title Improve add document performance change (draft) Improve performance changes (draft) Aug 20, 2022
fmt::{Debug, Formatter},
hash::Hash,
usize,
};

use crate::{FieldAccessor, Filter, Tokenizer};
use crate::{FieldAccessor, Tokenizer};
use hashbrown::{HashMap, HashSet};
Copy link
Collaborator

@tmpfs tmpfs Aug 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's quite a significant change in performance, I guess most of it is the switch to hashbrown?

I can imagine a small gain from removing the Filter calls but not that significant.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Most is hashbrown . A small portion seem to be from the Cow change in the add_document also, but much less significant

Copy link
Collaborator

@tmpfs tmpfs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I think removing Filter is a good decision, less is more. It seems like hashbrown gives most of the performance improvement, ~25% speedup is a big deal 🙌

@marcus-pousette marcus-pousette merged commit c04d2f0 into master Aug 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants