
Describe complexity bounds #24

Closed
vporton opened this issue Mar 8, 2023 · 1 comment

Comments


vporton commented Mar 8, 2023

Please describe complexity bounds of your DB. Are reads/writes bounded? Are they logarithmic?

A few minutes ago I rejected Ozon, another Rust port of BoltDB, because it doubles its capacity when it grows and therefore uses disk space inefficiently. Do you share this "feature" with Ozon?

P.S. I need a key/value store with bounded or logarithmic complexity for both reads and writes.


pjtatlow (Owner) commented Mar 8, 2023

Okay, let me explain how it works a little so you can decide if it will work for you.

So jammdb, like BoltDB itself (and I believe lmdb too), uses a copy-on-write B+ tree to store the data in order to provide MVCC. That means whatever pages were modified during a write transaction get copied, so a lot of the data may be duplicated (2x), although certainly not the entire database unless you are updating everything in the bucket. The old pages are not reclaimed until there are no open transactions using them, so if you open a read-only transaction and make a ton of writes before closing it, you will see a big increase in disk usage (♾️x). This is not a problem if you avoid long-running transactions, though what counts as "long-running" will depend on your application.
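To make the page-copying concrete, here is a minimal sketch of copy-on-write structural sharing. It uses a toy persistent binary tree, not jammdb's actual pager or B+ tree: updating one leaf copies only the nodes on the root-to-leaf path, while every untouched subtree stays shared (pointer-for-pointer) between the old and new versions, which is why a write duplicates some pages but never the whole database.

```rust
use std::rc::Rc;

// A toy persistent (copy-on-write) binary tree. Updating a leaf clones
// only the nodes on the root-to-leaf path; every untouched subtree is
// shared between the old and new versions.
enum Node {
    Leaf(u64),
    Branch(Rc<Node>, Rc<Node>),
}

// Return a new root with the leaf at bit-path `path` replaced by `value`.
fn update(node: &Rc<Node>, depth: u32, path: u32, value: u64) -> Rc<Node> {
    if depth == 0 {
        return Rc::new(Node::Leaf(value));
    }
    match &**node {
        Node::Branch(l, r) => {
            if (path >> (depth - 1)) & 1 == 0 {
                // Copy this branch, descend left, share the right subtree.
                Rc::new(Node::Branch(update(l, depth - 1, path, value), Rc::clone(r)))
            } else {
                Rc::new(Node::Branch(Rc::clone(l), update(r, depth - 1, path, value)))
            }
        }
        Node::Leaf(_) => unreachable!("tree shallower than expected"),
    }
}

// Build a depth-2 tree over values 0..4, then "commit" one write.
fn demo() -> (Rc<Node>, Rc<Node>) {
    let leaf = |v| Rc::new(Node::Leaf(v));
    let v1 = Rc::new(Node::Branch(
        Rc::new(Node::Branch(leaf(0), leaf(1))),
        Rc::new(Node::Branch(leaf(2), leaf(3))),
    ));
    let v2 = update(&v1, 2, 0b01, 99); // set the leaf at path 01 to 99
    (v1, v2)
}

fn main() {
    let (v1, v2) = demo();
    // A reader still holding v1 sees the old data; v2 copied only the
    // root and the left branch, so the right subtree is shared.
    if let (Node::Branch(_, r1), Node::Branch(_, r2)) = (&*v1, &*v2) {
        println!("right subtree shared between versions: {}", Rc::ptr_eq(r1, r2));
    }
}
```

The old root (`v1`) here plays the role of an open read transaction: as long as something holds it, the pages it references cannot be reclaimed, which is exactly the disk-usage behavior described above.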

Additionally, in order to minimize system calls to grow the file, we also grow the file by a constant value, so there will be some disk space inefficiency there too. I don't think that grow value is configurable, though it wouldn't be too hard a change to make.
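A rough sketch of that trade-off, with illustrative numbers rather than jammdb's actual page size or growth value: constant steps cost more grow calls but waste at most one step of space, while doubling (the Ozon-style strategy mentioned above) needs only O(log n) calls but can leave almost half the file unused.

```rust
// Grow a file from `size` to at least `target` bytes in fixed-size
// steps. Returns (final size, number of grow calls).
fn grow_constant(mut size: u64, step: u64, target: u64) -> (u64, u64) {
    let mut calls = 0;
    while size < target {
        size += step;
        calls += 1;
    }
    (size, calls)
}

// Grow by doubling instead. Fewer calls, but the worst-case overshoot
// is nearly the size of the data itself.
fn grow_doubling(mut size: u64, target: u64) -> (u64, u64) {
    let mut calls = 0;
    while size < target {
        size *= 2;
        calls += 1;
    }
    (size, calls)
}

fn main() {
    // A target just past a power of two shows doubling's worst case.
    let target = 1_048_577;
    let (s1, c1) = grow_constant(4096, 65_536, target);
    let (s2, c2) = grow_doubling(4096, target);
    println!("constant step: {} grow calls, {} bytes unused", c1, s1 - target);
    println!("doubling:      {} grow calls, {} bytes unused", c2, s2 - target);
}
```

With these numbers the constant strategy makes 16 grow calls and strands about 4 KiB, while doubling makes 9 calls but strands roughly 1 MiB, nearly half the file.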

In terms of time complexity, you're looking at logarithmic time for both reads and writes, but I hope this explained why answering that for disk usage isn't so simple and depends on how you use it.
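As a back-of-the-envelope check on the logarithmic claim (the fan-out figure below is an assumption for illustration, not jammdb's actual page layout): every read or write walks one root-to-leaf path, so the number of pages touched is the tree's height, which grows with log base fan-out of the key count.

```rust
// Height of a B+ tree with fan-out `b` holding `n` keys. Each lookup
// or update walks one root-to-leaf path, touching `height` pages,
// i.e. O(log_b n).
fn btree_height(n: u64, b: u64) -> u32 {
    let mut height = 1;
    let mut capacity = b; // keys addressable by a tree of this height
    while capacity < n {
        capacity *= b;
        height += 1;
    }
    height
}

fn main() {
    // Assuming roughly 100 keys per page: a billion keys fit in a
    // 5-level tree, so an operation touches only a handful of pages.
    println!("height for 1e9 keys: {}", btree_height(1_000_000_000, 100));
}
```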
