Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Hashing #33

Merged
merged 5 commits into from
Nov 4, 2023
Merged

Optimize Hashing #33

merged 5 commits into from
Nov 4, 2023

Conversation

BrennenMM7
Copy link
Contributor

Hello 👋 ,

Stumbled across jammdb as I was in middle of creating my own Rust rendition of BoltDB, deciding to fork/contrib here going forward as this project looks like a great foundation to continue forward.

This PR is to optimize the hashing packages utilized on meta pages for better read/write performance from the db. Utilizing some observability tactics, I found the longest function stalling operations was the hash_self function in meta.rs. This is due to the fact that every time a new TX is created hashing must occur of the meta pages, and the following package "sha3::{Digest, Sha3_256}" creates a deep call stack.

BoltDB implements a FNV-1a hash for its meta pages, which is significantly faster to calculate than utilizing a secure Sha3_256.

I also bumped the crates to newer versions, but no real perf gain was seen from this, fine with reversion if needed.

Looking forward to more contribs 👍

Benchmarking ( Each Get/Set is a new TX, No batching, No concurrency)

Before:
Time taken to set 100000 keys: 179.91754808s
Time taken to get 100000 keys: 155.249245ms

After:
Time taken to set 100000 keys: 151.29345334s
Time taken to get 100000 keys: 44.5641ms

@pjtatlow
Copy link
Owner

Hey @BrennenMM7! This looks great, thanks for submitting it! I'm going to do some testing myself soon and merge this in.

@BrennenMM7
Copy link
Contributor Author

Sounds good, to note this will obviously effect anyone's current DB files going forward as this would be a breaking change. Since the computed hash is different they'll never match unless, the DB is opened and rehashed to the new mechanism.

@pjtatlow
Copy link
Owner

Yeah I'm wondering if there is an easy way to do that. Try to read the metadata page as an FNV-1a hash, and if it doesn't work try the old one. Then always write FNV-1a metadata, so databases will be migrated automatically.

What do you think?

@BrennenMM7
Copy link
Contributor Author

That is definitely an option, could also split the versioning of the crate to 0.1.x vs 0.2.x keeping the new hash as the latest version, and the old for other who wish to not migrate.

Another option, or additionally provide a helper function/toolkit to migrate a DB from one version to the other.

Copy link

codecov bot commented Nov 4, 2023

Codecov Report

Attention: 33 lines in your changes are missing coverage. Please review.

Comparison is base (67a3b92) 94.22% compared to head (c21d937) 93.67%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #33      +/-   ##
==========================================
- Coverage   94.22%   93.67%   -0.56%     
==========================================
  Files          21       21              
  Lines        3778     3826      +48     
==========================================
+ Hits         3560     3584      +24     
- Misses        218      242      +24     
Files Coverage Δ
src/bucket.rs 97.24% <100.00%> (+<0.01%) ⬆️
tests/common/record.rs 85.71% <100.00%> (-0.24%) ⬇️
src/db.rs 93.02% <40.00%> (+1.95%) ⬆️
src/page.rs 91.08% <38.09%> (-8.23%) ⬇️
src/meta.rs 84.95% <72.13%> (-15.05%) ⬇️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pjtatlow pjtatlow enabled auto-merge (squash) November 4, 2023 04:06
@pjtatlow pjtatlow merged commit f406a0d into pjtatlow:master Nov 4, 2023
12 checks passed
@pjtatlow
Copy link
Owner

pjtatlow commented Nov 4, 2023

@BrennenMM7 thanks again! I ended up making it backwards compatible for now. Will probably remove that eventually but for now it's not too bad!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants