Comparison to NuDB? #36

Open
vinniefalco opened this issue Aug 24, 2016 · 6 comments

Comments

vinniefalco commented Aug 24, 2016

I've been polishing up my NuDB key/value database library, which we're using in rippled (https://github.com/ripple/rippled), so I'm looking at other key/value systems for comparison. We were very unhappy with the performance of RocksDB for our use case (which admittedly is rather unusual). Interestingly, sparkey comes the closest to NuDB in terms of implementation. Both use an append-only data file plus a key file that implements an on-disk hash table. One difference is that NuDB uses linear hashing to keep buckets at roughly 50% occupancy. When a bucket gets full, it is "spilled" over into the data file (that bucket then becomes a linked list of buckets).
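To make the comparison concrete, here is a minimal sketch of that shared layout; it is not the actual NuDB or Sparkey code, and the struct fields, capacities, and the in-memory vectors standing in for the two files are illustrative assumptions:

```cpp
// Illustrative model of the shared design: an append-only data file plus a
// key file holding an on-disk hash table of fixed-size buckets. A bucket that
// fills up "spills" into the data file, and spill records chain together like
// a linked list. The two files are in-memory vectors here so the sketch runs.
#include <cstdint>
#include <cstdio>
#include <vector>

struct Entry
{
    std::uint64_t hash;    // hash (or digest prefix) of the key
    std::uint64_t offset;  // where the record starts in the data file
    std::uint32_t size;    // size of that record, so one more read fetches it
};

struct Bucket
{
    std::vector<Entry> entries;  // bounded by the bucket capacity on disk
    int spill = -1;              // index of the overflow bucket, -1 if none
};

std::vector<Bucket> key_file;    // stands in for the on-disk hash table
std::vector<Bucket> spill_area;  // stands in for spilled buckets in the data file

// One "read" in the key file, plus one per spill bucket in the chain. The
// hashing scheme is meant to keep that chain empty almost all the time, so a
// lookup is normally a single IOP before the final read of the value itself.
bool find(std::uint64_t key_hash, Entry& out)
{
    Bucket const* b = &key_file[key_hash % key_file.size()];
    for (;;)
    {
        for (auto const& e : b->entries)
            if (e.hash == key_hash)
            {
                out = e;
                return true;
            }
        if (b->spill < 0)
            return false;
        b = &spill_area[b->spill];
    }
}

int main()
{
    key_file.resize(4);
    key_file[1].entries.push_back({9, 4096, 512});  // hash 9 lands in bucket 9 % 4 == 1
    Entry e;
    if (find(9, e))
        std::printf("found: offset=%llu size=%u\n", (unsigned long long)e.offset, e.size);
}
```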

I'd be very interested in your feedback on NuDB's design in comparison to sparkey. I'd like to hear about problems you encountered with other key value stores and the data access pattern of your software. I'm curious to know how NuDB compares for your use case, or if there are any techniques in sparkey that I could adopt. The repo is here:
https://github.com/vinniefalco/NuDB

Thanks!

spkrka commented Sep 29, 2016

I read the README and made the following observations:

  • Fixed key size might make some things more convenient, but it seems annoying for the user, except for use cases where the key size is well known and there is not much variation. I am not sure what in the design requires the key size to be fixed, since the data record is already variable length.
  • Value size up to 4 GB is reasonable. Sparkey went for 64-bit support for both keys and values, which turned out to be unnecessary for our use cases.
  • Performance independent of growth - Not sure I trust this one. Typically hash collisions increase as the data set grows (for the worst case entry, the average is fine).

It seems reasonable and a bit more sophisticated than Sparkey in some ways.

Problems encountered with other data stores:
We used to use CDB and Tokyo Cabinet which both had some problems.

  • CDB was really fast and compact, but had a 4 GB limit (we could have sharded it manually I suppose)
  • Tokyo Cabinet was fairly fast for reading while in RAM, but creating it was really slow once you went outside of RAM due to all the random access disk writes.

The main use case is building daily/weekly snapshots of data for fast random reads. Data sizes are typically in the 50-100 GB range with ~100-500 million keys (fairly small keys and values). So the build needs to be fast and not too memory intensive (currently it is memory intensive when building the index, but not when appending). Random reads should be fast when the data is in RAM (or on SSD in some cases), which is achieved by keeping the format compact and trying to minimize IO reads. (It's typically one IO access in the index and then one in the data file.)

I don't really have time to do a full review of NuDB but if you have any more concrete questions I can try my best to answer them.

@vinniefalco (Author)

You bring up a good point with respect to variable key sizes. In our use case the key is a cryptographic digest of the data; the database is used in a content-addressable storage system. A design goal of NuDB is to maximize the amount of useful work performed when reading a bucket from the key file: the more keys that fit into a bucket, the more work we can do for a single IOP.

If we allow keys to be variable-length, then the key size has to be stored in the bucket entry, or else the size of the read necessary to load the value is unknown, and that would require two reads to retrieve the value.
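As a rough sketch of that trade-off (the field widths and the 4096-byte block size are assumptions for illustration, not NuDB's actual on-disk encoding):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// With a fixed key size every bucket entry has the same width, so the reader
// knows exactly how many bytes to fetch for a whole bucket, and the
// (offset, size) pair lets it pull the value with one more read.
struct FixedEntry
{
    std::uint8_t  key[32];   // e.g. a 256-bit content digest
    std::uint64_t offset;    // where the value starts in the data file
    std::uint32_t size;      // how many bytes the value occupies
};

int main()
{
    // Capacity per bucket falls straight out of the block size. (This padded
    // in-memory struct is 48 bytes; a packed 44-byte on-disk encoding fits more.)
    constexpr std::size_t block_size = 4096;  // assumed key-file block size
    constexpr std::size_t entries_per_bucket = block_size / sizeof(FixedEntry);
    std::printf("entries per bucket: %zu\n", entries_per_bucket);
    // A variable-length key would force a per-entry length field, and the
    // reader could no longer size the bucket read (or the value read) up front.
}
```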

You said, "Not sure I trust this one. Typically hash collisions increase as the data set grows (for the worst case entry, the average is fine)." And that's true. To combat this, we've tuned the digest size to 48 bits. Given the constraint of 2^32 total values, the odds of a collision in a full database are quite low.
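For a back-of-the-envelope check (using the standard birthday approximation, not anything taken from the NuDB source), the expected number of colliding pairs for n keys in an m-sized hash space is about n(n-1)/2m:

```cpp
// Expected 48-bit hash collisions in a database filled to the 2^32-value limit.
#include <cmath>
#include <cstdio>

int main()
{
    double const n = std::pow(2.0, 32);  // maximum number of values
    double const m = std::pow(2.0, 48);  // 48-bit hash space
    double const colliding_pairs = n * (n - 1.0) / (2.0 * m);
    double const fraction_of_keys = 2.0 * colliding_pairs / n;
    std::printf("expected colliding pairs: %.0f\n", colliding_pairs);             // ~33,000
    std::printf("fraction of keys affected: %.4f%%\n", fraction_of_keys * 100.0); // ~0.0015%
}
```

So even a completely full database would see on the order of tens of thousands of colliding pairs out of four billion keys, a tiny fraction.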

Our distributed ledger is at almost 3TB now, and NuDB's been performing remarkably well.

Thanks for your insights!

@JoelKatz

If by "hash collisions" you mean objects that go into the same bucket, the solution is just to increase the number of buckets as the number of entries increases such that the distribution of bucket sizes approaches a constant distribution as the number of entries increases. Thus the probability that a random read will encounter an spilled bucket performs a random walk around the selected value as the data set size increases, it does not uniformly increase.

@vinniefalco (Author)

I believe he means objects which hash to the same value. NuDB buckets hold many keys. If we want a database that supports 2^32 distinct items, you can expect ~35% of keys to have collisions using a 32-bit hash. Therefore, we use a 48-bit hash function which reduces the number of hash collisions to a tiny fraction.

spkrka commented Sep 30, 2016

Both hash value collisions and bucket collisions are bad. Sparkey started out with 32-bit hashing, but that turned out to be bad when there are many entries, so we added 64-bit hashing as well, similar to how you're using 48-bit hashing.

But adding many entries also means that some buckets will have a disproportionately high number of entries (and probably many buckets will be empty). I think you could see this if you plotted the total number of entries against the number of entries in the biggest bucket (including spilling); I would expect it to grow logarithmically.
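For what it's worth, here is a quick balls-into-bins experiment along those lines; it uses plain uniform hashing with an arbitrary average of 64 entries per bucket, not either library's actual split logic:

```cpp
// Throw uniformly hashed keys into buckets, keeping the average occupancy
// fixed, and watch how the fullest bucket grows as the data set grows. The
// maximum creeps up only slowly (roughly with the square root of the log of
// the bucket count), matching the "worst case grows, average is fine" point.
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

int main()
{
    std::mt19937_64 rng(42);
    for (std::uint64_t n : {1u << 20, 1u << 22, 1u << 24})
    {
        std::uint64_t const buckets = n / 64;  // keep ~64 entries per bucket on average
        std::vector<std::uint32_t> count(buckets, 0);
        for (std::uint64_t i = 0; i < n; ++i)
            ++count[rng() % buckets];
        std::uint32_t max_load = 0;
        for (auto c : count)
            if (c > max_load)
                max_load = c;
        std::printf("entries=%llu  avg/bucket=64  fullest bucket=%u\n",
                    (unsigned long long)n, max_load);
    }
}
```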

@vinniefalco (Author)

NuDB's buckets hold many entries: with a 32-byte key size and a 50% load factor, each bucket holds 117 entries. For uniformly distributed random keys, the number of buckets having one spill record is always less than 1%, and the number of buckets having two spill records is 0 most of the time, sometimes 1.

Buckets in a NuDB key file are almost never empty since it uses linear hashing. Statistically speaking, the number of empty buckets will be roughly equal to the number of buckets with one or more spill records.
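To put some numbers on that, here is a toy linear-hashing simulation; the 117-entry capacity and 0.5 load-factor target come from the figures above, but the split policy and bookkeeping are a simplification rather than NuDB's actual implementation, so the exact percentages will not match:

```cpp
// Insert random hashes into a linear-hashing table: one bucket is split each
// time the average occupancy crosses the target, and splitting a bucket
// redistributes its keys (which in this toy model is also what clears any
// overflow it had). At the end, count how many buckets exceed their capacity
// (would carry at least one spill record) and how many are empty.
#include <cstdint>
#include <cstdio>
#include <random>
#include <utility>
#include <vector>

int main()
{
    std::size_t const capacity = 117;  // entries per bucket (from the figures above)
    double const load_factor = 0.5;    // target average occupancy
    std::size_t round = 1;             // buckets at the start of the current round
    std::size_t split = 0;             // next bucket to split
    std::vector<std::vector<std::uint64_t>> buckets(1);
    std::mt19937_64 rng(7);

    auto index = [&](std::uint64_t h) {
        std::size_t i = h % round;
        return i < split ? h % (2 * round) : i;  // already-split buckets use the wider modulus
    };

    std::size_t const n = 5'000'000;
    for (std::size_t k = 1; k <= n; ++k)
    {
        std::uint64_t const h = rng();
        buckets[index(h)].push_back(h);

        // Split one bucket whenever the average occupancy exceeds the target.
        if (k > load_factor * capacity * buckets.size())
        {
            buckets.emplace_back();            // new bucket appears at index round + split
            auto old = std::move(buckets[split]);
            buckets[split].clear();
            ++split;
            if (split == round) { round *= 2; split = 0; }
            for (auto h2 : old)                // redistribute with the wider modulus
                buckets[index(h2)].push_back(h2);
        }
    }

    std::size_t spilled = 0, empty = 0;
    for (auto const& b : buckets)
    {
        if (b.size() > capacity) ++spilled;
        if (b.empty()) ++empty;
    }
    std::printf("buckets=%zu  spilled=%zu (%.2f%%)  empty=%zu\n",
                buckets.size(), spilled, 100.0 * spilled / buckets.size(), empty);
}
```

With these parameters the run typically reports zero (or very nearly zero) spilled and empty buckets out of roughly 85,000; the exact figures shift a little depending on where the run ends within a split round.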
