Add lmdb store #10

Merged: willemolding merged 30 commits into develop on Oct 14, 2019

willemolding (Contributor) commented on Oct 10, 2019

PR summary

This PR adds a new implementation of both CAS and EAV that uses LMDB. LMDB is the main store used by the Monero cryptocurrency and is also used by Mozilla in the Firefox browser. As such it is well supported and has excellent Rust bindings, currently maintained by Mozilla.

LMDB keeps its keys in sorted order, which, with a clever key naming scheme, allows a huge optimization for queries that match exactly on an entity (the most common type of query). Rather than iterating over the entire database, it is possible to jump to the first EAVI entry for that entity and read forward until the keys move on to the next entity.
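A minimal sketch of the idea, assuming a hypothetical key layout of `entity|attribute|value|index` (the PR's actual key scheme is not reproduced here) and using `std::collections::BTreeMap` as a stand-in for LMDB's sorted key space. An exact-entity query seeks to the first key carrying the entity prefix and stops as soon as the prefix no longer matches, so it touches only that entity's entries.

```rust
use std::collections::BTreeMap;

/// Hypothetical key layout: entity, attribute, value and index joined by '|'.
/// Assumes no part contains '|'. The index is zero-padded so that
/// lexicographic byte order matches numeric order.
fn eavi_key(entity: &str, attribute: &str, value: &str, index: u64) -> String {
    format!("{}|{}|{}|{:020}", entity, attribute, value, index)
}

/// Seek to the first key starting with "entity|" and read forward until the
/// prefix stops matching. A BTreeMap stands in for LMDB's sorted keys here.
fn fetch_exact_entity<'a>(store: &'a BTreeMap<String, ()>, entity: &str) -> Vec<&'a str> {
    let prefix = format!("{}|", entity);
    store
        .range(prefix.clone()..)
        .take_while(|(k, _)| k.starts_with(&prefix))
        .map(|(k, _)| k.as_str())
        .collect()
}

fn main() {
    let mut store = BTreeMap::new();
    store.insert(eavi_key("alice", "knows", "bob", 1), ());
    store.insert(eavi_key("bob", "knows", "carol", 2), ());
    store.insert(eavi_key("alice", "likes", "rust", 3), ());
    store.insert(eavi_key("alice2", "knows", "dave", 4), ());

    // Only the two "alice" entries are visited; "alice2" and "bob" are skipped.
    for key in fetch_exact_entity(&store, "alice") {
        println!("{}", key);
    }
}
```

The separator matters: if the scan used the bare prefix `alice`, keys belonging to `alice2` would also match.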

Benchmarks show performance comparable to PickleDB for CAS operations and EAV get_all retrievals, but a significant improvement for get_exact queries.

test cas::lmdb::tests::bench_lmdb_cas_add         ... bench:      14,914 ns/iter (+/- 638)
test cas::lmdb::tests::bench_lmdb_cas_fetch       ... bench:      12,686 ns/iter (+/- 882)
test eav::lmdb::tests::bench_lmdb_eav_add         ... bench:      29,498 ns/iter (+/- 1,510)
test eav::lmdb::tests::bench_lmdb_eav_fetch_all   ... bench:   2,235,458 ns/iter (+/- 57,434)
test eav::lmdb::tests::bench_lmdb_eav_fetch_exact ... bench:       2,760 ns/iter (+/- 173)


test cas::pickle::tests::bench_pickle_cas_add         ... bench:      12,762 ns/iter (+/- 509)
test cas::pickle::tests::bench_pickle_cas_fetch       ... bench:      12,168 ns/iter (+/- 429)
test eav::pickle::tests::bench_pickle_eav_add         ... bench:      25,724 ns/iter (+/- 815)
test eav::pickle::tests::bench_pickle_eav_fetch_all   ... bench:   2,238,151 ns/iter (+/- 37,428)
test eav::pickle::tests::bench_pickle_eav_fetch_exact ... bench:     119,686 ns/iter (+/- 4,885)
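To put the two sets of numbers side by side, the small sketch below divides each LMDB figure by the matching PickleDB figure. The values are copied from the benchmark output above; the helper name is just for illustration.

```rust
/// ns/iter figures copied from the benchmark output above: (name, lmdb, pickle).
const RESULTS: [(&str, f64, f64); 5] = [
    ("cas_add", 14_914.0, 12_762.0),
    ("cas_fetch", 12_686.0, 12_168.0),
    ("eav_add", 29_498.0, 25_724.0),
    ("eav_fetch_all", 2_235_458.0, 2_238_151.0),
    ("eav_fetch_exact", 2_760.0, 119_686.0),
];

/// A ratio above 1.0 means LMDB took longer than PickleDB for that benchmark.
fn lmdb_over_pickle(lmdb: f64, pickle: f64) -> f64 {
    lmdb / pickle
}

fn main() {
    for (name, lmdb, pickle) in RESULTS.iter() {
        println!("{:<16} lmdb/pickle = {:.3}", name, lmdb_over_pickle(*lmdb, *pickle));
    }
}
```

For the last row, 2,760 / 119,686 is about 0.023, i.e. roughly 43x faster for exact-match queries, while the two add benchmarks come out about 15 to 17% slower and the fetch benchmarks are within about 4%.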

Future questions

  • How should we decide the initial memory map size and when/how should this be increased?
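One possible answer, sketched purely as an illustration: start with a modest map and, whenever a write fails because the map is full, grow the map (here by doubling, capped at a hard limit) and retry. The `ToyEnv` type, `StoreError` and `put_with_growth` are hypothetical and only model the bookkeeping; they are not the lmdb crate's API. In the LMDB C API the corresponding pieces are the `MDB_MAP_FULL` error and `mdb_env_set_mapsize`, which may only be called after open when no transactions are active in the process, so a real implementation would first abort the failed write transaction.

```rust
/// Stand-in for LMDB's MDB_MAP_FULL error (hypothetical type).
#[derive(Debug, PartialEq)]
enum StoreError {
    MapFull,
}

/// Toy model of an environment with a fixed-size memory map.
/// It only tracks bytes written; it is not the lmdb crate's API.
struct ToyEnv {
    map_size: usize,
    used: usize,
}

impl ToyEnv {
    fn put(&mut self, bytes: usize) -> Result<(), StoreError> {
        if self.used + bytes > self.map_size {
            return Err(StoreError::MapFull);
        }
        self.used += bytes;
        Ok(())
    }

    /// Plays the role of mdb_env_set_mapsize.
    fn set_map_size(&mut self, new_size: usize) {
        self.map_size = new_size;
    }
}

/// On MapFull, double the map (capped at max_size) and retry.
/// Gives up once the map has already reached max_size.
fn put_with_growth(env: &mut ToyEnv, bytes: usize, max_size: usize) -> Result<(), StoreError> {
    loop {
        match env.put(bytes) {
            Ok(()) => return Ok(()),
            Err(StoreError::MapFull) if env.map_size < max_size => {
                // Strictly increases map_size, so the loop always terminates.
                let next = env.map_size.saturating_mul(2).max(1).min(max_size);
                env.set_map_size(next);
            }
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    let mut env = ToyEnv { map_size: 1024, used: 0 };
    match put_with_growth(&mut env, 3000, 1 << 20) {
        Ok(()) => println!("stored; map size is now {} bytes", env.map_size),
        Err(e) => println!("failed: {:?}", e),
    }
}
```

Doubling keeps the number of resizes logarithmic in the final size; the cap stops a runaway writer from claiming unbounded address space.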

Review checklist

  • The story has unit or integration tests
  • No new bugs, and any tech-debt is identified and justified
  • There is enough API documentation (how to use)
  • There is enough code documentation (how the code works)

struktured (Contributor) left a comment

Nice! I was thinking we need to change the API to an iterator style rather than in-memory collections, but one thing at a time.

willemolding (Contributor, Author)

Yeah, Ashanti and I talked about that and I had a crack at it, but it's not as easy as I first thought. Agreed that it should be next, though.
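As a rough illustration of the collection-style versus iterator-style API being discussed, here is a sketch over an in-memory sorted map. The traits `EavFetchVec` and `EavFetchIter`, the `Eav` record and `MemStore` are all hypothetical names, not the crate's real API. A real LMDB-backed version would also have to keep a read transaction alive for as long as the iterator exists, which this sketch sidesteps.

```rust
use std::collections::BTreeMap;

/// Hypothetical EAV record used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
struct Eav {
    entity: String,
    attribute: String,
    value: String,
}

/// Collection style: every match is copied into a Vec before returning.
trait EavFetchVec {
    fn fetch_eav(&self, entity: &str) -> Vec<Eav>;
}

/// Iterator style: matches are yielded lazily while borrowing the store.
trait EavFetchIter {
    fn iter_eav<'a>(&'a self, entity: &str) -> Box<dyn Iterator<Item = &'a Eav> + 'a>;
}

/// In-memory stand-in for a sorted key-value store such as LMDB.
struct MemStore {
    by_key: BTreeMap<String, Eav>,
}

impl MemStore {
    fn new() -> Self {
        MemStore { by_key: BTreeMap::new() }
    }

    fn add(&mut self, eav: Eav) {
        // Hypothetical "entity|attribute|value" key so entity prefixes sort together.
        let key = format!("{}|{}|{}", eav.entity, eav.attribute, eav.value);
        self.by_key.insert(key, eav);
    }
}

impl EavFetchIter for MemStore {
    fn iter_eav<'a>(&'a self, entity: &str) -> Box<dyn Iterator<Item = &'a Eav> + 'a> {
        let prefix = format!("{}|", entity);
        Box::new(
            self.by_key
                .range(prefix.clone()..)
                .take_while(move |(k, _)| k.starts_with(&prefix))
                .map(|(_, eav)| eav),
        )
    }
}

impl EavFetchVec for MemStore {
    fn fetch_eav(&self, entity: &str) -> Vec<Eav> {
        // The collection API can be layered on top of the iterator API.
        self.iter_eav(entity).cloned().collect()
    }
}

fn main() {
    let mut store = MemStore::new();
    store.add(Eav { entity: "alice".into(), attribute: "knows".into(), value: "bob".into() });
    store.add(Eav { entity: "alice".into(), attribute: "likes".into(), value: "rust".into() });
    store.add(Eav { entity: "bob".into(), attribute: "knows".into(), value: "carol".into() });

    // The caller can stop after the first match without materialising the rest.
    if let Some(first) = store.iter_eav("alice").next() {
        println!("first match: {:?}", first);
    }
    println!("all matches: {}", store.fetch_eav("alice").len());
}
```

The design point is that an iterator lets callers stop early or stream results, while a `Vec`-returning method can still be offered as a thin wrapper for callers who want everything at once.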

maackle (Member) left a comment

This looks great, fundamentally. I made some minor comments about things to change, like the From<&[u8]> and the DRYness, which would be nice to fix before merging. Also, @AshantiMutinta didn't request changes, but she made some important points that should probably be addressed before merging.

crates/holochain_persistence_api/src/hash.rs (outdated, resolved)
crates/holochain_persistence_lmdb/src/cas/lmdb.rs (outdated, resolved)
Comment on lines 53 to 55
// These flags make writes much faster by writing to disk asynchronously rather than blocking
// This comes with some loss of data integrity guarantees
.set_flags(EnvironmentFlags::WRITE_MAP | EnvironmentFlags::MAP_ASYNC);
Member

Should be OK though, right, since our stores are all monotonic?

Contributor Author

I believe it opens up the possibility of corrupting the whole database by writing to the wrong places in memory, since it doesn't verify anything before syncing to disk. But Rust should stop us from doing that, right?

crates/holochain_persistence_lmdb/src/eav/lmdb.rs (three threads, outdated, resolved)
willemolding and others added 4 commits October 11, 2019 14:40
Co-Authored-By: Michael Dougherty <michael.dougherty@holo.host>
thedavidmeister (Collaborator)

Keen to see if this translates to something like IndexedDB in the browser in the future :)

willemolding (Contributor, Author)

Yeah, I saw your post on Mattermost. IndexedDB could probably support EAV queries even better, as I believe it supports custom indexes as well as sorted keys.

willemolding merged commit c010f7c into develop on Oct 14, 2019
5 participants