Comparison to NuDB? #36

Open
vinniefalco opened this issue Aug 24, 2016 · 6 comments

Comments

vinniefalco commented Aug 24, 2016

I've been polishing up my NuDB key/value database library, which we're using in rippled (https://github.com/ripple/rippled), so I'm looking at other key/value systems for comparison. We were very unhappy with the performance of RocksDB for our use case (which admittedly is rather unusual). Interestingly, sparkey comes the closest to NuDB in terms of implementation. Both use an append-only data file plus a key file that implements an on-disk hash table. One difference is that NuDB uses linear hashing to keep buckets at roughly 50% occupancy. When a bucket gets full, it is "spilled" over into the data file (that bucket then becomes a linked list of buckets).
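To make the comparison concrete, here is a minimal sketch of that shared layout; it is not the actual NuDB or Sparkey code, and the struct fields, capacities, and the in-memory vectors standing in for the two files are illustrative assumptions:

```cpp
// Illustrative model of the shared design: an append-only data file plus a
// key file holding an on-disk hash table of fixed-size buckets. A bucket that
// fills up "spills" into the data file, and spill records chain together like
// a linked list. The two files are in-memory vectors here so the sketch runs.
#include <cstdint>
#include <cstdio>
#include <vector>

struct Entry
{
    std::uint64_t hash;    // hash (or digest prefix) of the key
    std::uint64_t offset;  // where the record starts in the data file
    std::uint32_t size;    // size of that record, so one more read fetches it
};

struct Bucket
{
    std::vector<Entry> entries;  // bounded by the bucket capacity on disk
    int spill = -1;              // index of the overflow bucket, -1 if none
};

std::vector<Bucket> key_file;    // stands in for the on-disk hash table
std::vector<Bucket> spill_area;  // stands in for spilled buckets in the data file

// One "read" in the key file, plus one per spill bucket in the chain. The
// hashing scheme is meant to keep that chain empty almost all the time, so a
// lookup is normally a single IOP before the final read of the value itself.
bool find(std::uint64_t key_hash, Entry& out)
{
    Bucket const* b = &key_file[key_hash % key_file.size()];
    for (;;)
    {
        for (auto const& e : b->entries)
            if (e.hash == key_hash)
            {
                out = e;
                return true;
            }
        if (b->spill < 0)
            return false;
        b = &spill_area[b->spill];
    }
}

int main()
{
    key_file.resize(4);
    key_file[1].entries.push_back({9, 4096, 512});  // hash 9 lands in bucket 9 % 4 == 1
    Entry e;
    if (find(9, e))
        std::printf("found: offset=%llu size=%u\n", (unsigned long long)e.offset, e.size);
}
```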

I'd be very interested in your feedback on NuDB's design in comparison to sparkey. I'd like to hear about problems you encountered with other key value stores and the data access pattern of your software. I'm curious to know how NuDB compares for your use case, or if there are any techniques in sparkey that I could adopt. The repo is here:
https://github.com/vinniefalco/NuDB

Thanks!

spkrka commented Sep 29, 2016

I read the README and made the following observations:

  • Fixed key size might make some things more convenient, but it seems annoying for the user, except for use cases where the key size is well known and there is not much variation. I am not sure what in the design requires the key size to be fixed, since the data record is already variable length.
  • Value size up to 4 GB is reasonable. Sparkey went for 64-bit support for both keys and values, which turned out to be unnecessary for our use cases.
  • Performance independent of growth - Not sure I trust this one. Typically hash collisions increase as the data set grows (for the worst case entry, the average is fine).

It seems reasonable and a bit more sophisticated than Sparkey in some ways.

Problems encountered with other data stores:
We used to use CDB and Tokyo Cabinet which both had some problems.

  • CDB was really fast and compact, but had a 4 GB limit (we could have sharded it manually I suppose)
  • Tokyo Cabinet was fairly fast for reading while in RAM, but creating it was really slow once you went outside of RAM due to all the random access disk writes.

The main use case is building daily/weekly snapshots of data for fast random reads. Data sizes are typically in the 50-100 GB range with ~100-500 million keys (fairly small keys and values). So the build needs to be fast and not too memory intensive (currently it is memory intensive when building the index, but not when appending). Random reads should be fast when the data is in RAM (or on SSD in some cases), which is achieved by keeping the format compact and trying to minimize IO reads. (It's typically one IO access in the index and then one in the data file.)

I don't really have time to do a full review of NuDB but if you have any more concrete questions I can try my best to answer them.

@vinniefalco (Author)

You bring up a good point with respect to variable key sizes. In our use case the key is a cryptographic digest of the data; the database is used in a content-addressable storage system. A design goal of NuDB is to maximize the amount of useful work performed when reading a bucket from the key file: the more keys that fit into a bucket, the more work we can do for a single IOP.

If we allow keys to be variable-length, then the key size has to be stored in the bucket entry, or else the size of the read necessary to load the value is unknown, and that would require two reads to retrieve the value.
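As a rough sketch of that trade-off (the field widths and the 4096-byte block size are assumptions for illustration, not NuDB's actual on-disk encoding):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// With a fixed key size every bucket entry has the same width, so the reader
// knows exactly how many bytes to fetch for a whole bucket, and the
// (offset, size) pair lets it pull the value with one more read.
struct FixedEntry
{
    std::uint8_t  key[32];   // e.g. a 256-bit content digest
    std::uint64_t offset;    // where the value starts in the data file
    std::uint32_t size;      // how many bytes the value occupies
};

int main()
{
    // Capacity per bucket falls straight out of the block size. (This padded
    // in-memory struct is 48 bytes; a packed 44-byte on-disk encoding fits more.)
    constexpr std::size_t block_size = 4096;  // assumed key-file block size
    constexpr std::size_t entries_per_bucket = block_size / sizeof(FixedEntry);
    std::printf("entries per bucket: %zu\n", entries_per_bucket);
    // A variable-length key would force a per-entry length field, and the
    // reader could no longer size the bucket read (or the value read) up front.
}
```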

You said, "Not sure I trust this one. Typically hash collisions increase as the data set grows (for the worst case entry, the average is fine)." And that's true. To combat this, we've tuned the digest size to 48 bits. Given the constraint of 2^32 total values, the odds of a collision in a full database are quite low.
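For a back-of-the-envelope check (using the standard birthday approximation, not anything taken from the NuDB source), the expected number of colliding pairs for n keys in an m-sized hash space is about n(n-1)/2m:

```cpp
// Expected 48-bit hash collisions in a database filled to the 2^32-value limit.
#include <cmath>
#include <cstdio>

int main()
{
    double const n = std::pow(2.0, 32);  // maximum number of values
    double const m = std::pow(2.0, 48);  // 48-bit hash space
    double const colliding_pairs = n * (n - 1.0) / (2.0 * m);
    double const fraction_of_keys = 2.0 * colliding_pairs / n;
    std::printf("expected colliding pairs: %.0f\n", colliding_pairs);             // ~33,000
    std::printf("fraction of keys affected: %.4f%%\n", fraction_of_keys * 100.0); // ~0.0015%
}
```

So even a completely full database would see on the order of tens of thousands of colliding pairs out of four billion keys, a tiny fraction.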

Our distributed ledger is at almost 3TB now, and NuDB's been performing remarkably well.

Thanks for your insights!

@JoelKatz

If by "hash collisions" you mean objects that go into the same bucket, the solution is just to increase the number of buckets as the number of entries increases such that the distribution of bucket sizes approaches a constant distribution as the number of entries increases. Thus the probability that a random read will encounter an spilled bucket performs a random walk around the selected value as the data set size increases, it does not uniformly increase.

@vinniefalco (Author)

I believe he means objects which hash to the same value. NuDB buckets hold many keys. If we want a database that supports 2^32 distinct items, you can expect ~35% of keys to have collisions using a 32-bit hash. Therefore, we use a 48-bit hash function which reduces the number of hash collisions to a tiny fraction.

spkrka commented Sep 30, 2016

Both hash value collisions and bucket collisions are bad. Sparkey started out with 32-bit hashing, but that turned out to be bad when there are many entries, so we added 64-bit hashing as well, similar to how you're using 48-bit hashing.

But adding many entries also means that some buckets will have a disproportionately high number of entries (and probably many buckets will be empty). I think you could see this if you plotted the total number of entries against the number of entries in the biggest bucket (including spilling); I would expect it to grow logarithmically.
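For what it's worth, here is a quick balls-into-bins experiment along those lines; it uses plain uniform hashing with an arbitrary average of 64 entries per bucket, not either library's actual split logic:

```cpp
// Throw uniformly hashed keys into buckets, keeping the average occupancy
// fixed, and watch how the fullest bucket grows as the data set grows. The
// maximum creeps up only slowly (roughly with the square root of the log of
// the bucket count), matching the "worst case grows, average is fine" point.
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

int main()
{
    std::mt19937_64 rng(42);
    for (std::uint64_t n : {1u << 20, 1u << 22, 1u << 24})
    {
        std::uint64_t const buckets = n / 64;  // keep ~64 entries per bucket on average
        std::vector<std::uint32_t> count(buckets, 0);
        for (std::uint64_t i = 0; i < n; ++i)
            ++count[rng() % buckets];
        std::uint32_t max_load = 0;
        for (auto c : count)
            if (c > max_load)
                max_load = c;
        std::printf("entries=%llu  avg/bucket=64  fullest bucket=%u\n",
                    (unsigned long long)n, max_load);
    }
}
```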

@vinniefalco (Author)

NuDB's buckets hold many entries: with a 32-byte key size and a 50% load factor, each bucket holds 117 entries. For uniformly distributed random keys, the number of buckets having one spill record is always less than 1%, and the number of buckets having two spill records is 0 most of the time, sometimes 1.

Buckets in a NuDB key file are almost never empty since it uses linear hashing. Statistically speaking, the number of empty buckets will be roughly equal to the number of buckets with one or more spill records.
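To put some numbers on that, here is a toy linear-hashing simulation; the 117-entry capacity and 0.5 load-factor target come from the figures above, but the split policy and bookkeeping are a simplification rather than NuDB's actual implementation, so the exact percentages will not match:

```cpp
// Insert random hashes into a linear-hashing table: one bucket is split each
// time the average occupancy crosses the target, and splitting a bucket
// redistributes its keys (which in this toy model is also what clears any
// overflow it had). At the end, count how many buckets exceed their capacity
// (would carry at least one spill record) and how many are empty.
#include <cstdint>
#include <cstdio>
#include <random>
#include <utility>
#include <vector>

int main()
{
    std::size_t const capacity = 117;  // entries per bucket (from the figures above)
    double const load_factor = 0.5;    // target average occupancy
    std::size_t round = 1;             // buckets at the start of the current round
    std::size_t split = 0;             // next bucket to split
    std::vector<std::vector<std::uint64_t>> buckets(1);
    std::mt19937_64 rng(7);

    auto index = [&](std::uint64_t h) {
        std::size_t i = h % round;
        return i < split ? h % (2 * round) : i;  // already-split buckets use the wider modulus
    };

    std::size_t const n = 5'000'000;
    for (std::size_t k = 1; k <= n; ++k)
    {
        std::uint64_t const h = rng();
        buckets[index(h)].push_back(h);

        // Split one bucket whenever the average occupancy exceeds the target.
        if (k > load_factor * capacity * buckets.size())
        {
            buckets.emplace_back();            // new bucket appears at index round + split
            auto old = std::move(buckets[split]);
            buckets[split].clear();
            ++split;
            if (split == round) { round *= 2; split = 0; }
            for (auto h2 : old)                // redistribute with the wider modulus
                buckets[index(h2)].push_back(h2);
        }
    }

    std::size_t spilled = 0, empty = 0;
    for (auto const& b : buckets)
    {
        if (b.size() > capacity) ++spilled;
        if (b.empty()) ++empty;
    }
    std::printf("buckets=%zu  spilled=%zu (%.2f%%)  empty=%zu\n",
                buckets.size(), spilled, 100.0 * spilled / buckets.size(), empty);
}
```

With these parameters the run typically reports zero (or very nearly zero) spilled and empty buckets out of roughly 85,000; the exact figures shift a little depending on where the run ends within a split round.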
