A high-performance, log-structured key-value database engine implemented in Node.js, inspired by the Bitcask storage model. Features fast reads/writes, automatic compaction, and crash recovery.
- Log-Structured Storage: Append-only writes for maximum write performance
- In-Memory Indexing: Hash table index for O(1) key lookups
- Background Compaction: Automatic garbage collection to reclaim space
- Crash Recovery: Automatic index rebuilding from log files on startup
- Binary Serialization: Efficient binary format for storage
- Batched Writes: Optimized batch operations for better throughput
- Persistence: All data survives process restarts
- Binary Data Support: Store any type of data (strings, JSON, buffers)
```
bitcask-store/
├── data/
│   ├── active.log       ← current write log
│   └── older-*.log      ← compacted/archived logs
├── src/
│   ├── index.js         ← entry point & public API
│   ├── storage.js       ← append-only log handling
│   ├── indexer.js       ← in-memory index map
│   ├── compactor.js     ← background compaction logic
│   └── utils.js         ← binary serialization helpers
├── test/
│   ├── test.js          ← comprehensive test suite
│   └── benchmark.js     ← performance benchmarks
├── examples/
│   └── demo.js          ← feature demonstration
└── package.json
```
```bash
git clone <repository-url>
cd bit-cask
npm install
```

Basic usage:

```js
const BitcaskStore = require('./src/index');

async function example() {
  const store = new BitcaskStore({
    dataDir: './my-data',
    maxLogSize: 64 * 1024 * 1024, // 64MB log files
    autoCompaction: true
  });
  await store.open();

  // Basic operations
  await store.put('user:1', JSON.stringify({ name: 'Alice', age: 30 }));
  const user = JSON.parse((await store.get('user:1')).toString());
  console.log(user); // { name: 'Alice', age: 30 }

  // Check existence
  console.log(store.has('user:1')); // true

  // Delete
  await store.delete('user:1');
  console.log(await store.get('user:1')); // null

  await store.close();
}
```

Configuration options:

```js
const store = new BitcaskStore({
  dataDir: './data',                   // Data directory path
  maxLogSize: 64 * 1024 * 1024,        // Max log file size (64MB)
  autoCompaction: true,                // Enable background compaction
  compactionOptions: {
    fragmentationThreshold: 0.3,       // Compact when 30% fragmented
    compactionInterval: 5 * 60 * 1000, // Check every 5 minutes
    minLogAge: 60 * 1000               // Don't compact files newer than 1 minute
  }
});
```

Opens the store and rebuilds the index from existing log files.
Closes the store, flushes pending writes, and stops background compaction.
Stores a key-value pair. The value can be a string, a Buffer, or any serializable data.
Retrieves a value by key. Returns a Buffer, or null if the key is not found.
Deletes a key. Returns a boolean indicating whether the key existed.
Checks whether a key exists (synchronous). Returns a boolean.
Returns an array of all active keys (synchronous).
Returns the number of active keys (synchronous).
```js
await store.batch([
  { op: 'put', key: 'key1', value: 'value1' },
  { op: 'put', key: 'key2', value: 'value2' },
  { op: 'delete', key: 'key3' }
]);
```

```js
// Iterate over all key-value pairs
for await (const [key, value] of store.entries()) {
  console.log(key, value.toString());
}
```

Manually trigger compaction to reclaim space from deleted entries.
Returns detailed statistics about the store:
```js
{
  keys: 1000,                   // Active keys
  deletedKeys: 50,              // Deleted keys (fragmentation)
  totalSize: 1048576,           // Total data size in bytes
  memoryUsage: 204800,          // Index memory usage
  fragmentationRatio: 0.05,     // Ratio of stale to total data
  compaction: {
    totalCompactions: 3,
    totalBytesReclaimed: 524288,
    lastCompactionTime: 1634567890123
  }
}
```

Each log entry uses a binary format for efficiency:

```
[timestamp(8)] [keySize(4)] [valueSize(4)] [key] [value]
```
- timestamp: 8-byte big-endian timestamp
- keySize: 4-byte big-endian key length
- valueSize: 4-byte big-endian value length
- key: UTF-8 encoded key
- value: Raw value bytes
Deleted entries are stored as tombstones (zero-length values).
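As an illustration of this layout, here is a minimal sketch of how an entry could be encoded and decoded with Node.js Buffers; the helper names `encodeEntry` and `decodeEntry` are hypothetical and not part of the store's public API:

```js
// Hypothetical helpers illustrating the on-disk entry layout described above.
function encodeEntry(key, value) {
  const keyBuf = Buffer.from(key, 'utf8');
  const valueBuf = Buffer.isBuffer(value) ? value : Buffer.from(value);
  const header = Buffer.alloc(16);
  header.writeBigUInt64BE(BigInt(Date.now()), 0); // timestamp (8 bytes)
  header.writeUInt32BE(keyBuf.length, 8);         // keySize (4 bytes)
  header.writeUInt32BE(valueBuf.length, 12);      // valueSize (4 bytes, 0 = tombstone)
  return Buffer.concat([header, keyBuf, valueBuf]);
}

function decodeEntry(buf, offset = 0) {
  const timestamp = buf.readBigUInt64BE(offset);
  const keySize = buf.readUInt32BE(offset + 8);
  const valueSize = buf.readUInt32BE(offset + 12);
  const key = buf.toString('utf8', offset + 16, offset + 16 + keySize);
  const value = buf.subarray(offset + 16 + keySize, offset + 16 + keySize + valueSize);
  return { timestamp, key, value, bytesRead: 16 + keySize + valueSize };
}
```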
- Writes: O(1) - Always append to active log
- Reads: O(1) - Hash table lookup + single disk read
- Memory: O(n) where n = number of unique keys ever written
- Compaction: Background process, doesn't block operations
Run benchmarks with `npm run benchmark`.
Typical performance on modern hardware:
- Sequential Writes: ~50,000 ops/sec
- Sequential Reads: ~100,000 ops/sec
- Random Reads: ~80,000 ops/sec
- Mixed Workload: ~30,000 ops/sec
Perfect for:
- Caching layers - Fast reads with persistence
- Session storage - High write throughput
- Configuration management - Small datasets with frequent updates
- Event logging - Append-heavy workloads
- Embedded databases - Applications needing simple persistence
Not ideal for:
- Range queries - No ordered iteration
- Large datasets - Index must fit in memory
- Complex queries - Simple key-value only
- Multi-node setups - Single-node design
Run the comprehensive test suite:

```bash
npm test
```

Run the performance benchmarks:

```bash
npm run benchmark
```

Run the interactive demo:

```bash
npm start
```

On startup, the store scans all log files and rebuilds the in-memory index. This ensures data consistency after crashes or unexpected shutdowns.
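To make the recovery step concrete, here is a rough sketch of that scan, reusing the hypothetical `decodeEntry` helper from the binary-format sketch above; the real logic lives in src/indexer.js and may differ in detail:

```js
const fs = require('fs');
const path = require('path');

// Sketch: replay every log file in order, recording where the latest copy of
// each key lives. Tombstones (zero-length values) remove the key again.
function rebuildIndex(dataDir) {
  const index = new Map(); // key -> { file, offset, valueSize, timestamp }
  const logFiles = fs.readdirSync(dataDir).filter(f => f.endsWith('.log')).sort();
  for (const file of logFiles) {
    const buf = fs.readFileSync(path.join(dataDir, file));
    let offset = 0;
    while (offset < buf.length) {
      const { timestamp, key, value, bytesRead } = decodeEntry(buf, offset);
      if (value.length === 0) {
        index.delete(key); // tombstone: the key was deleted
      } else {
        index.set(key, { file, offset, valueSize: value.length, timestamp });
      }
      offset += bytesRead;
    }
  }
  return index;
}
```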
Background compaction runs periodically and:
- Identifies fragmented log files
- Rewrites active entries to new log files
- Updates index references
- Removes old log files
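A simplified, synchronous sketch of a single compaction pass under the same assumptions (hypothetical `decodeEntry` helper and index layout); the background logic in src/compactor.js may differ in detail:

```js
const fs = require('fs');
const path = require('path');

// Sketch: copy only the live entries from a fragmented log into a fresh file,
// repoint the index at the new copies, then delete the old file.
function compactLog(dataDir, oldFile, index) {
  const oldBuf = fs.readFileSync(path.join(dataDir, oldFile));
  const newFile = `older-${Date.now()}.log`;
  const survivors = [];
  let newOffset = 0;
  let offset = 0;
  while (offset < oldBuf.length) {
    const { key, bytesRead } = decodeEntry(oldBuf, offset);
    const meta = index.get(key);
    // Keep an entry only if the index still points at this exact copy.
    if (meta && meta.file === oldFile && meta.offset === offset) {
      survivors.push(oldBuf.subarray(offset, offset + bytesRead));
      index.set(key, { ...meta, file: newFile, offset: newOffset });
      newOffset += bytesRead;
    }
    offset += bytesRead;
  }
  fs.writeFileSync(path.join(dataDir, newFile), Buffer.concat(survivors));
  fs.unlinkSync(path.join(dataDir, oldFile)); // stale data is reclaimed here
}
```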
- Batched async writes reduce system call overhead
- Write buffer accumulates entries before flushing
- Periodic flush ensures durability
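For illustration, a minimal write-buffer sketch along these lines (the class name and thresholds are hypothetical, not the store's actual internals):

```js
const fs = require('fs');

// Sketch: appends accumulate in memory and are flushed to the active log
// either when the buffer grows large enough or on a periodic timer,
// trading a small durability window for throughput.
class WriteBuffer {
  constructor(fd, { maxBytes = 64 * 1024, flushIntervalMs = 50 } = {}) {
    this.fd = fd;
    this.chunks = [];
    this.size = 0;
    this.maxBytes = maxBytes;
    this.timer = setInterval(() => this.flush(), flushIntervalMs);
  }

  append(entryBuf) {
    this.chunks.push(entryBuf);
    this.size += entryBuf.length;
    if (this.size >= this.maxBytes) this.flush(); // size-triggered flush
  }

  flush() {
    if (this.chunks.length === 0) return;
    const batch = Buffer.concat(this.chunks);
    this.chunks = [];
    this.size = 0;
    fs.writeSync(this.fd, batch); // one syscall covers many entries
  }

  close() {
    clearInterval(this.timer);
    this.flush();
  }
}
```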
- Index stores only metadata (file offset, size, timestamp)
- Actual values are read from disk on demand
- Memory usage scales with number of unique keys
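A sketch of the resulting read path, assuming the same hypothetical index layout as in the recovery sketch above: one Map lookup gives the entry's location, and a single positioned read fetches the value bytes.

```js
const fs = require('fs');
const path = require('path');

// Sketch: the index holds only metadata, so a read is a Map lookup plus
// one positioned disk read of exactly valueSize bytes.
function readValue(dataDir, index, key) {
  const meta = index.get(key); // { file, offset, valueSize, timestamp }
  if (!meta) return null;
  const fd = fs.openSync(path.join(dataDir, meta.file), 'r');
  const value = Buffer.alloc(meta.valueSize);
  // Value bytes start after the 16-byte header and the UTF-8 key bytes.
  const keyBytes = Buffer.byteLength(key, 'utf8');
  fs.readSync(fd, value, 0, meta.valueSize, meta.offset + 16 + keyBytes);
  fs.closeSync(fd);
  return value;
}
```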
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
ISC License - see LICENSE file for details.
Inspired by the Bitcask storage model developed by Basho Technologies for Riak. This implementation focuses on the core concepts while adding modern Node.js optimizations.