Skip to content
This repository was archived by the owner on Jan 22, 2025. It is now read-only.
This repository was archived by the owner on Jan 22, 2025. It is now read-only.

Replace DbLedger with simple blob store #2566

Closed
@MarkJr94

Description

@MarkJr94

Problem

DbLedger currently requires rocksdb, while we use very little of what it has to offer. RocksDb types are also This also has significant build costs and is a large non-Rust dependency.

Solution

Re-implement file-based ledger storage, with a focus on writing as fast as possible, as well as providing more specifically useful API and other characteristics.

Goals

  • Remove rocksdb dependency
  • Store ledger data reliably
  • Exceed current DbLedger performance

Design

There are two primary components: BlobStore and KVStore

BlobStore

Essentially a struct that should behave as close to DbLedger as possible so that replacement is easy. It relies on the KVStore for all actual storage

KVStore

Key-value persistence layer that maps keys to a value.
Value is just a type alias for clarity: type Value = Vec<u8>. All operations must be ACID transactions to whatever degree possible.

Operations:

  • put (key: Key, value: Value) -> Result<()>
  • get (key: Key) -> Result<Option<Value>>
  • delete (key: Key) -> Result<bool>
    • return value indicates if there was something to delete
  • put_many (rows: [(K, V)]) -> Result<()>
  • get_many (keys: [Key]) -> Result<Vec<Value>>
  • delete_many (keys: [Key]) -> Result<()>
  • scan(range: RangeInclusive<Key>) -> Result<impl Iterator<Item=(Key, Value)>>

The storage strategy is essentially a simple log-structured merge tree (LSMT) approach. There is an in-memory table and series of on-disk SSTables [1]. When storing a record, it is first written to a write-log, then added to the in-memory table.

When this table fills up, it is then dumped to disk in two files. 1 file simple contains the (Key, Value) pairs. The second file is the index file, which contains some metadata, and a list of (Key, Loc) pairs where Loc represents to location and extent of the record in the data file. These files combined form a logical SSTable. When the in-memory table is dumped, it creates an SSTable of level 0. As level 0 fills up, the compaction process begins.

The compaction strategy is known as "levelled compaction" and is mostly a simplified version of what's described by ScyllaDB and LevelDB [2][3].

Implementation

Performance is very critical for this feature. I have rudimentary benchmarks that indicate same-or-better perf when compared to the current DbLedger but not very rigorous or well-documented ones. I will create them.

Questions

  • Should most of this comment (probably everything above this) be expanded on and moved to something like a proposal in the book?
  • Should this be a new crate in the workspace?
    • Currently everything is mostly a new module in the solana crate. This seems like it could be a separate crate. That would make boundaries clear and might make builds, test, and benchmarking simpler (and possibly faster).

In-flight

  • principled indexing and storage using simple LSMTree approach
    • insert
    • flushing in-memory table
    • merging on disk tables
    • read
      • simple read
      • merge read
    • range scan
    • recovery
      • in-memory write buffer
      • SSTable indexes
    • integrate
  • Change naming to be more inline with recent developments (block/slot/etc)
  • Testing
    • basic integration tests
    • utility functions for testing
    • Isolate and test unit-testable components or find some way to do I/O free testing
  • Benchmarks
    • basic benchmarks
  • Documentation
    • document all modules, types, and methods exposed to the rest of the crate
    • document important internal modules, types, and methods
  1. https://www.igvita.com/2012/02/06/sstable-and-log-structured-storage-leveldb/
  2. https://github.com/scylladb/scylla/wiki/SSTable-compaction-and-compaction-strategies#leveled-compaction
  3. https://github.com/google/leveldb/blob/master/doc/impl.md

Metadata

Metadata

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions