perf: faster term hash map #1940

Merged
PSeitz merged 3 commits into main from fuzzy_term_map
Apr 17, 2023
@PSeitz (Contributor) commented Mar 14, 2023

perf: faster arena hashmap

add inlines
faster initialization by replacing iter with vec!
remove occupied array and use table_entry.is_empty instead (saves 4 bytes per entry)
raise the saturation threshold from 1/3 to 1/2 to reduce memory consumption
use u32 for UnorderedId (we have the 4 billion limit anyway on the Columnar stuff)
fix naming LinearProbing
remove byteorder dependency
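
For readers who have not opened the diff, here is a minimal sketch of how several of the items above fit together: a sentinel value instead of an `occupied` array, `vec!` initialization, a 1/2 saturation threshold, and `u32` ids. This is not the code from this PR. `Entry`, `Table`, and all field and method names are hypothetical, and the sketch keys entries by hash only, whereas the real arena hashmap also stores and compares the key bytes.

```rust
/// Hypothetical table entry; the names are assumptions, not the PR's types.
#[derive(Clone, Copy, Debug)]
struct Entry {
    key_hash: u32,
    /// `u32::MAX` marks an empty slot, so no separate `occupied` array is needed.
    unordered_id: u32,
}

impl Entry {
    const EMPTY: Entry = Entry { key_hash: 0, unordered_id: u32::MAX };

    #[inline]
    fn is_empty(&self) -> bool {
        self.unordered_id == u32::MAX
    }
}

/// Hypothetical linear-probing table that maps a hash to a sequential u32 id.
struct Table {
    entries: Vec<Entry>,
    mask: usize,
    len: usize,
}

impl Table {
    fn with_capacity_pow2(capacity: usize) -> Table {
        assert!(capacity.is_power_of_two());
        Table {
            // `vec![value; n]` fills the table directly instead of collecting an iterator.
            entries: vec![Entry::EMPTY; capacity],
            mask: capacity - 1,
            len: 0,
        }
    }

    /// Saturated once more than half of the slots are in use (threshold 1/2).
    #[inline]
    fn is_saturated(&self) -> bool {
        self.len * 2 > self.entries.len()
    }

    /// Doubles the table and re-inserts every non-empty entry.
    fn resize(&mut self) {
        let new_capacity = self.entries.len() * 2;
        let old = std::mem::replace(&mut self.entries, vec![Entry::EMPTY; new_capacity]);
        self.mask = new_capacity - 1;
        for entry in old.into_iter().filter(|e| !e.is_empty()) {
            let mut slot = entry.key_hash as usize & self.mask;
            while !self.entries[slot].is_empty() {
                slot = (slot + 1) & self.mask; // linear probing
            }
            self.entries[slot] = entry;
        }
    }

    /// Returns the id stored for `key_hash`, assigning the next sequential id if unseen.
    fn get_or_insert(&mut self, key_hash: u32) -> u32 {
        if self.is_saturated() {
            self.resize();
        }
        let mut slot = key_hash as usize & self.mask;
        loop {
            let entry = self.entries[slot];
            if entry.is_empty() {
                let id = self.len as u32;
                self.entries[slot] = Entry { key_hash, unordered_id: id };
                self.len += 1;
                return id;
            }
            if entry.key_hash == key_hash {
                return entry.unordered_id;
            }
            slot = (slot + 1) & self.mask; // linear probing
        }
    }
}
```

In this sketch a higher saturation threshold means the table doubles later, so fewer slots sit empty at any given number of entries. The cost is longer probe sequences on average.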

@PSeitz PSeitz force-pushed the fuzzy_term_map branch 5 times, most recently from 9223807 to b9aeff7 Compare March 15, 2023 08:18
PSeitz and others added 3 commits April 17, 2023 14:11
add inlines
remove occupied array and use table_entry.is_empty instead (saves 4 bytes per entry)
reduce saturation threshold from 1/3 to 1/2 to reduce memory
use u32 for UnorderedId (we have the 4billion limit anyways on the Columnar stuff)
fix naming LinearProbing
remove byteorder dependency

memory consumption went down from 2 GB to 1.8 GB when indexing the wikipedia dataset in tantivy
Co-authored-by: Paul Masurel <paul@quickwit.io>
@codecov-commenter

Codecov Report

Merging #1940 (525b560) into main (0286ece) will decrease coverage by 0.04%.
The diff coverage is 84.92%.

@@            Coverage Diff             @@
##             main    #1940      +/-   ##
==========================================
- Coverage   94.53%   94.49%   -0.04%     
==========================================
  Files         319      320       +1     
  Lines       59629    59706      +77     
==========================================
+ Hits        56372    56421      +49     
- Misses       3257     3285      +28     
| Impacted Files | Coverage Δ |
|---|---|
| src/postings/mod.rs | 98.57% <ø> (ø) |
| stacker/example/hashmap.rs | 0.00% <0.00%> (ø) |
| stacker/src/lib.rs | 100.00% <ø> (ø) |
| sstable/src/dictionary.rs | 85.98% <75.00%> (ø) |
| sstable/src/streamer.rs | 93.19% <80.00%> (+0.07%) ⬆️ |
| sstable/src/delta.rs | 95.30% <85.18%> (-2.36%) ⬇️ |
| sstable/src/block_reader.rs | 74.32% <86.66%> (-1.09%) ⬇️ |
| stacker/src/arena_hashmap.rs | 93.45% <97.56%> (+1.30%) ⬆️ |
| columnar/src/tests.rs | 99.12% <100.00%> (ø) |
| ownedbytes/src/lib.rs | 96.94% <100.00%> (-1.72%) ⬇️ |
... and 6 more

... and 2 files with indirect coverage changes

@PSeitz PSeitz merged commit e83abbf into main Apr 17, 2023
@PSeitz PSeitz deleted the fuzzy_term_map branch April 17, 2023 07:07