fast-block-store

Storing data by a hash-based key (content address) has some unique properties that allow highly performant storage systems to be built.

  • The only transaction guarantee you need is per-key. Most databases expend a tremendous amount of effort maintaining a consistent transactional state across all the keys in a database.
  • The key already includes randomization, which means you already have a vector for sharding the data across many stores.
  • The value data never changes and most use cases only require an append-only (no deletes) storage engine.
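
To make the content address concrete, here is a minimal sketch (TypeScript on Node.js; sha256, and the `contentKey`/`verify` helpers, are illustrative assumptions rather than part of this project). Because the key is derived from the value, the key/value binding can be checked one key at a time, with no cross-key coordination:

```ts
import { createHash } from 'node:crypto'

// Hypothetical helper: derive a content-addressed key by hashing the value.
// Any prefix (codec, version, etc.) may precede the digest; the store only
// requires that the key *end* in the digest.
function contentKey (value: Uint8Array, prefix: Uint8Array = new Uint8Array(0)): Uint8Array {
  const digest = createHash('sha256').update(value).digest()
  const key = new Uint8Array(prefix.length + digest.length)
  key.set(prefix, 0)
  key.set(digest, prefix.length)
  return key
}

// Per-key consistency check: does this value hash to the digest at the
// end of this key? No other key ever needs to be consulted.
function verify (key: Uint8Array, value: Uint8Array): boolean {
  const digest = createHash('sha256').update(value).digest()
  const tail = key.subarray(key.length - digest.length)
  return Buffer.from(tail).equals(digest)
}
```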

This project is a proof-of-concept block store.

Write pipeline:

  • Block (Key/Value pair) write
    • Key is a byte value that must end in a hash digest of at least 4 bytes.
      • Note: we don't specify whether this is a CID, a multihash, or a raw hash digest. This is left up to the consumer, since different use cases may have differing key requirements.
      • This means you should never store identity multihashes because they will not be properly balanced. But this shouldn't be a problem in practice: the value data is already in the key, so there's no need to ever store an identity multihash (typically used for inline CIDs) on-disc.
    • Value is byte data that matches the hash used in the key
  • Value data is written to a rolling append-only file log.
    • The Index of this value is [ LogNumber, Position, Length ]
    • Since values never change and are rarely (if ever) deleted, it's much faster to write them to an append-only file log (see the sketch after this list).
    • Keeping potentially large values out of the key-value store makes its own rebalancing and compaction much more efficient.
  • Key/Index is written to a sharded key-value store.
    • Any key-value store will work. We can run additional tests to find the ideal on-disc storage structure.
    • The number of shards is user configurable and can be any integer. Since we have randomization in the hash digest, we can use it to shard over any number of individual key-value stores.
    • This means that any compaction or re-balancing that needs to happen in the key-value store is spread out over the shards and randomized. As long as you provide a sufficient number of shards for your workload, this is unlikely to impact your overall write speed.
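
Below is a minimal sketch of the append-only log half of this pipeline (TypeScript on Node.js). The AppendLog class, the file naming scheme, and the maxLogSize rollover threshold are illustrative assumptions, not this project's actual API:

```ts
import { openSync, writeSync, closeSync } from 'node:fs'

// Index entry for a stored value: which log file, where in it, and how long.
type BlockIndex = [logNumber: number, position: number, length: number]

// Minimal rolling append-only log. Assumes it starts against fresh (empty)
// log files so the in-memory position matches the file length.
class AppendLog {
  private logNumber = 0
  private position = 0
  private fd: number

  constructor (private dir: string, private maxLogSize = 1024 * 1024 * 1024) {
    this.fd = openSync(`${dir}/log-${this.logNumber}`, 'a')
  }

  // Appends never rewrite existing bytes, so writes are sequential and
  // the returned [LogNumber, Position, Length] index is immutable.
  append (value: Uint8Array): BlockIndex {
    if (this.position + value.length > this.maxLogSize) this.roll()
    const index: BlockIndex = [this.logNumber, this.position, value.length]
    writeSync(this.fd, value)
    this.position += value.length
    return index
  }

  // Roll to a new log file once the current one exceeds the size limit.
  private roll (): void {
    closeSync(this.fd)
    this.logNumber += 1
    this.position = 0
    this.fd = openSync(`${this.dir}/log-${this.logNumber}`, 'a')
  }
}
```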

Specific to this implementation:

  • The log/offset/length of each block is stored in a sharded key-value store keyed by CID.
    • We just use the last 4 bytes (Uint32) of the hash digest for every CID and shard over multiple independent on-disc key-value stores. So the shard id is just Floor( ( Uint32(DigestTail) / 0xFFFFFFFF ) * TotalShards ) (see the sketch after this list).
    • This POC is sharding into 256 leveldb stores.
      • Parsing a CAR file of the Filecoin chain, the initial write speed is ~1M block writes per minute (AWS EC2 instance writing to EBS).
      • Write speeds remained stable even after all the chain data had been loaded. In this test the speed of CAR file parsing was still a factor in overall performance, so we haven't even saturated the store yet.
        • In order to load the CAR data faster, writes were issued with a high degree of concurrency. Leveldb optimizes under this kind of load, so serial write speeds to the store are likely much lower than this benchmark shows.
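
Here is a sketch of that shard-id computation plus the leveldb write (TypeScript). Reading the digest tail as a big-endian Uint32 and the level v8-style API are assumptions; the clamp handles the edge case where the tail equals 0xFFFFFFFF, which the formula above would otherwise map one past the last shard:

```ts
import { Level } from 'level'

// Shard id per the formula above: Floor((Uint32(DigestTail) / 0xFFFFFFFF) * TotalShards).
function shardId (key: Uint8Array, totalShards: number): number {
  // Read the last 4 bytes of the key as a big-endian Uint32 (byte order assumed).
  const view = new DataView(key.buffer, key.byteOffset + key.byteLength - 4, 4)
  const digestTail = view.getUint32(0)
  // Clamp so DigestTail === 0xFFFFFFFF still lands in the last shard.
  return Math.min(Math.floor((digestTail / 0xFFFFFFFF) * totalShards), totalShards - 1)
}

// 256 independent on-disc leveldb stores, as in this POC.
const TOTAL_SHARDS = 256
const shards = Array.from({ length: TOTAL_SHARDS }, (_, i) =>
  new Level<Uint8Array, Uint8Array>(`./shards/${i}`, { keyEncoding: 'view', valueEncoding: 'view' })
)

// Write a key -> encoded [LogNumber, Position, Length] entry to its shard.
async function putIndex (key: Uint8Array, encodedIndex: Uint8Array): Promise<void> {
  await shards[shardId(key, TOTAL_SHARDS)].put(key, encodedIndex)
}
```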
