Improve performance changes (draft) #18

marcus-pousette · 2022-08-20T21:15:28Z

I was playing around with the library and made a few changes that I could split into multiple PRs.
I would like to hear your thoughts on this.

Changes for improved performance

add_document uses Cow for hash keys.
hashbrown HashMap instead of std::collections::HashMap. Faster for small hash keys, though it does not provide the same level of HashDoS resistance.
The TermData query_terms property is replaced with query_terms_len to avoid copying all query terms during a query (not benchmarked yet). This is a breaking change for the ScoreCalculator trait.
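To illustrate the Cow idea from the first item, here is a minimal sketch using only the standard library. It is not the code from this PR: `term_counts` and the way tokens are built in `main` are hypothetical, and only show how `Cow<str>` keys let borrowed slices of a document go into a map without a fresh allocation, while still allowing owned keys where a token had to be modified.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

/// Count how often each term occurs. Keys are `Cow<str>`, so a term that is
/// an unmodified slice of the document is stored without a new allocation.
/// (Hypothetical helper, not part of the library.)
fn term_counts<'a>(tokens: Vec<Cow<'a, str>>) -> HashMap<Cow<'a, str>, usize> {
    let mut counts = HashMap::new();
    for token in tokens {
        *counts.entry(token).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let doc = "the quick fox";
    // Borrowed tokens point straight into `doc`.
    let mut tokens: Vec<Cow<str>> = doc.split(' ').map(Cow::Borrowed).collect();
    // An owned token, as produced when a token had to be normalized.
    tokens.push(Cow::Owned("THE".to_lowercase()));

    let counts = term_counts(tokens);
    // Borrowed and owned keys with the same text hash and compare equal.
    assert_eq!(counts["the"], 2);
    assert_eq!(counts.len(), 3);
}
```

Borrowed and owned variants share one key space because `Cow<str>` hashes and compares by its string contents.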

Bench results on my computer (add_100k_docs)

Master: 301.19 ms
This branch: 221.24 ms

API change ideas

Filter is removed: the Tokenizer can be seen as a preprocessing step for both indexing and querying, and it can do anything the Filter does. One argument fewer to worry about. (Breaking change)
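As a rough sketch of what folding the Filter into the Tokenizer could look like: the signature below (`fn(&str) -> Vec<Cow<str>>`) and the name `tokenizer` are assumptions for illustration, not necessarily the library's exact types. The function splits on whitespace and then does the per-token cleanup that a separate filter step would otherwise do.

```rust
use std::borrow::Cow;

/// Hypothetical tokenizer that also does the work of a filter step:
/// it trims surrounding punctuation, drops empty tokens and lowercases.
fn tokenizer(text: &str) -> Vec<Cow<'_, str>> {
    text.split_whitespace()
        // Strip non-alphanumeric characters from both ends of each token.
        .map(|raw| raw.trim_matches(|c: char| !c.is_alphanumeric()))
        // Tokens that were pure punctuation are now empty; skip them.
        .filter(|t| !t.is_empty())
        // Allocate only when the token actually needs lowercasing.
        .map(|t| {
            if t.chars().any(char::is_uppercase) {
                Cow::Owned(t.to_lowercase())
            } else {
                Cow::Borrowed(t)
            }
        })
        .collect()
}

fn main() {
    let tokens = tokenizer("Hello, world! -- again");
    assert_eq!(tokens, vec!["hello", "world", "again"]);
}
```

With this shape, the same function can be passed for both indexing and querying, so documents and queries are normalized identically.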

tmpfs · 2022-08-21T04:18:12Z

src/index.rs

    fmt::{Debug, Formatter},
    hash::Hash,
    usize,
 };

-use crate::{FieldAccessor, Filter, Tokenizer};
+use crate::{FieldAccessor, Tokenizer};
+use hashbrown::{HashMap, HashSet};


That's quite a significant change in performance. I guess most of it is the switch to hashbrown?

I can imagine a small gain from removing the Filter calls, but nothing that significant.

marcus-pousette · 2022-08-21

Yes, most of it is hashbrown. A small portion seems to come from the Cow change in add_document as well, but that is much less significant.

tmpfs

This looks great! I think removing Filter is a good decision, less is more. It seems like hashbrown gives most of the performance improvement, ~25% speedup is a big deal 🙌
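For readers wondering where the trade-off between speed and HashDoS resistance comes from: it is largely a property of the hasher. The std-only sketch below is not hashbrown's hasher and not code from this PR. It plugs a simple, unkeyed FNV-1a hasher (the `Fnv1a` type and `FastMap` alias are hypothetical) into `std::collections::HashMap`, which by default uses a randomly keyed SipHash variant. An unkeyed hasher does less work per short key, but an attacker who knows the function can craft colliding keys.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Simple FNV-1a hasher: cheap on short keys, but not seeded with a random
/// key, so it offers no HashDoS resistance. (Illustrative only.)
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A std HashMap that uses the unkeyed hasher instead of the default one.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn main() {
    let mut postings: FastMap<&str, Vec<u32>> = FastMap::default();
    postings.entry("term").or_default().push(1);
    postings.entry("term").or_default().push(7);
    assert_eq!(postings["term"], vec![1, 7]);
}
```

Whether this kind of swap is acceptable depends on whether index keys can come from untrusted input.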

marcus-pousette added 2 commits August 20, 2022 22:46

add_document cow keys

cb63e36

replace std hashmap with hashbrown hashmap

a25f1b0

marcus-pousette requested a review from tmpfs August 20, 2022 21:15

marcus-pousette changed the title ~~Improve add document performance change~~ Improve add document performance change (draft) Aug 20, 2022

marcus-pousette changed the title ~~Improve add document performance change (draft)~~ Improve performance changes (draft) Aug 20, 2022

refactor

81fc550

marcus-pousette force-pushed the add-document-cow branch from 3aac183 to 81fc550 Compare August 20, 2022 21:26

tmpfs reviewed Aug 21, 2022

tmpfs approved these changes Aug 21, 2022

marcus-pousette merged commit c04d2f0 into master Aug 21, 2022
