RocksDB Architecture Guide
# Introduction
- Derived from LevelDB
- RocksDB is a key-value (KV) store
- Developed by the Facebook engineering team
# Assumptions and Goals
- Good for fast storage (flash)
- Good for random reads
- Tradeoff between read amplification and write amplification
- Range scans and point lookups
- Production settings need configurable parameters: number of levels, size of files, size of data in a level, etc.
# Disk Format
- .sst files for data
- .log files for transactions
- manifest_file for database versions
- LOG* for server information logs
# Read/write APIs
- Puts
- Batch puts
- Disable WAL
- Async puts, fsync and fdatasync
- Iterators and snapshots
- Batch commit to transaction log and manifest updates
- Binary search for overlapping files for every level
- Checksums for reads (default false)
- Hardware assists
- Bloom filters
- Shared block cache
- ReadOnly mode
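
The "binary search for overlapping files for every level" item can be illustrated with a small standalone sketch. It is not RocksDB's code: `FileRange` and `FindFile` are hypothetical names, and the sketch assumes the files of a level are sorted by key and cover non-overlapping ranges, so at most one file per level can contain a given key.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical per-file metadata: the inclusive key range a table file covers.
struct FileRange {
    std::string smallest;
    std::string largest;
};

// Returns the index of the single file in `level` whose range can contain
// `key`, or -1 if no file covers it. Assumes `level` is sorted by key and
// that its ranges do not overlap.
int FindFile(const std::vector<FileRange>& level, const std::string& key) {
    // Binary search for the first file whose largest key is >= key.
    auto it = std::lower_bound(
        level.begin(), level.end(), key,
        [](const FileRange& f, const std::string& k) { return f.largest < k; });
    if (it == level.end() || key < it->smallest) {
        return -1;  // key lies past the last file or in a gap between files
    }
    return static_cast<int>(it - level.begin());
}
```

A point lookup would repeat this search once per level and only open the file it returns, which keeps the number of files touched per read small.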
# Compactions
- Multi-threaded
- Thread pool per environment
- Priority queues for merge sort
- User-defined hook for implementing TTL, sanity checks, etc.
- Avoid compression for two levels; snappy, bzip, zlib
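
As a rough illustration of "priority queues for merge sort", the sketch below merges several sorted runs of keys with a min-heap. It is a simplified stand-in, not RocksDB's compaction code: `MergeSortedRuns` is a hypothetical name, keys are plain strings, and duplicate keys are kept as-is rather than resolved by version.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Merges several individually sorted runs of keys into one sorted sequence.
std::vector<std::string> MergeSortedRuns(
    const std::vector<std::vector<std::string>>& runs) {
    // Heap entry: (current key, index of the run it came from).
    using Entry = std::pair<std::string, std::size_t>;
    // std::greater turns std::priority_queue into a min-heap.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    // next[r] is the position of the next unread key in run r.
    std::vector<std::size_t> next(runs.size(), 0);

    // Seed the heap with the first key of every non-empty run.
    for (std::size_t r = 0; r < runs.size(); ++r) {
        if (!runs[r].empty()) {
            heap.emplace(runs[r][0], r);
            next[r] = 1;
        }
    }

    std::vector<std::string> merged;
    while (!heap.empty()) {
        Entry top = heap.top();
        heap.pop();
        merged.push_back(top.first);
        // Refill the heap from the run that supplied the smallest key.
        std::size_t r = top.second;
        if (next[r] < runs[r].size()) {
            heap.emplace(runs[r][next[r]], r);
            ++next[r];
        }
    }
    return merged;
}
```

The heap never holds more than one entry per run, so each output key costs a logarithmic number of comparisons in the number of runs being merged.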
# Incremental Backups
- GetLiveFiles
- GetUpdatesSince
- WALs are archived
# Environments
- POSIX (production)
- HDFS environment (prototype)
# Tools and Tests
- sst_dump
- manifest_dump
- Compact database, change number of levels
- Stress test
# Java API