A high-performance, log-structured key-value database engine implemented in Node.js, inspired by the Bitcask storage model. Features fast reads/writes, automatic compaction, and crash recovery.
- Log-Structured Storage: Append-only writes for maximum write performance
- In-Memory Indexing: Hash table index for O(1) key lookups
- Background Compaction: Automatic garbage collection to reclaim space
- Crash Recovery: Automatic index rebuilding from log files on startup
- Binary Serialization: Efficient binary format for storage
- Batched Writes: Optimized batch operations for better throughput
- Persistence: All data survives process restarts
- Binary Data Support: Store any type of data (strings, JSON, buffers)
```
bitcask-store/
├── data/
│   ├── active.log       ← current write log
│   └── older-*.log      ← compacted/archived logs
├── src/
│   ├── index.js         ← entry point & public API
│   ├── storage.js       ← append-only log handling
│   ├── indexer.js       ← in-memory index map
│   ├── compactor.js     ← background compaction logic
│   └── utils.js         ← binary serialization helpers
├── test/
│   ├── test.js          ← comprehensive test suite
│   └── benchmark.js     ← performance benchmarks
├── examples/
│   └── demo.js          ← feature demonstration
└── package.json
```
```bash
git clone <repository-url>
cd bit-cask
npm install
```

Basic usage:

```js
const BitcaskStore = require('./src/index');

async function example() {
  const store = new BitcaskStore({
    dataDir: './my-data',
    maxLogSize: 64 * 1024 * 1024, // 64MB log files
    autoCompaction: true
  });
  await store.open();

  // Basic operations
  await store.put('user:1', JSON.stringify({ name: 'Alice', age: 30 }));
  const user = JSON.parse((await store.get('user:1')).toString());
  console.log(user); // { name: 'Alice', age: 30 }

  // Check existence
  console.log(store.has('user:1')); // true

  // Delete
  await store.delete('user:1');
  console.log(await store.get('user:1')); // null

  await store.close();
}
```

Configuration options:

```js
const store = new BitcaskStore({
  dataDir: './data',                   // Data directory path
  maxLogSize: 64 * 1024 * 1024,        // Max log file size (64MB)
  autoCompaction: true,                // Enable background compaction
  compactionOptions: {
    fragmentationThreshold: 0.3,       // Compact when 30% fragmented
    compactionInterval: 5 * 60 * 1000, // Check every 5 minutes
    minLogAge: 60 * 1000               // Don't compact files newer than 1 minute
  }
});
```

Opens the store and rebuilds the index from existing log files.
Closes the store, flushes pending writes, and stops background compaction.
Stores a key-value pair. The value can be a string, a Buffer, or any serializable data.
Retrieves a value by key. Returns a Buffer, or null if the key is not found.
Deletes a key. Returns a boolean indicating whether the key existed.
Checks whether a key exists (synchronous). Returns a boolean.
Returns an array of all active keys (synchronous).
Returns the number of active keys (synchronous).
```js
await store.batch([
  { op: 'put', key: 'key1', value: 'value1' },
  { op: 'put', key: 'key2', value: 'value2' },
  { op: 'delete', key: 'key3' }
]);
```

```js
// Iterate over all key-value pairs
for await (const [key, value] of store.entries()) {
  console.log(key, value.toString());
}
```

Manually trigger compaction to reclaim space from deleted entries.
Returns detailed statistics about the store:
```js
{
  keys: 1000,                   // Active keys
  deletedKeys: 50,              // Deleted keys (fragmentation)
  totalSize: 1048576,           // Total data size in bytes
  memoryUsage: 204800,          // Index memory usage
  fragmentationRatio: 0.05,     // Ratio of stale to total data
  compaction: {
    totalCompactions: 3,
    totalBytesReclaimed: 524288,
    lastCompactionTime: 1634567890123
  }
}
```

Each log entry uses a binary format for efficiency:

```
[timestamp(8)] [keySize(4)] [valueSize(4)] [key] [value]
```
- timestamp: 8-byte big-endian timestamp
- keySize: 4-byte big-endian key length
- valueSize: 4-byte big-endian value length
- key: UTF-8 encoded key
- value: Raw value bytes
Deleted entries are stored as tombstones (zero-length values).
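As an illustration of this layout, here is a minimal sketch of how an entry could be encoded and decoded with Node.js Buffers; the helper names `encodeEntry` and `decodeEntry` are hypothetical and not part of the store's public API:

```js
// Hypothetical helpers illustrating the on-disk entry layout described above.
function encodeEntry(key, value) {
  const keyBuf = Buffer.from(key, 'utf8');
  const valueBuf = Buffer.isBuffer(value) ? value : Buffer.from(value);
  const header = Buffer.alloc(16);
  header.writeBigUInt64BE(BigInt(Date.now()), 0); // timestamp (8 bytes)
  header.writeUInt32BE(keyBuf.length, 8);         // keySize (4 bytes)
  header.writeUInt32BE(valueBuf.length, 12);      // valueSize (4 bytes, 0 = tombstone)
  return Buffer.concat([header, keyBuf, valueBuf]);
}

function decodeEntry(buf, offset = 0) {
  const timestamp = buf.readBigUInt64BE(offset);
  const keySize = buf.readUInt32BE(offset + 8);
  const valueSize = buf.readUInt32BE(offset + 12);
  const key = buf.toString('utf8', offset + 16, offset + 16 + keySize);
  const value = buf.subarray(offset + 16 + keySize, offset + 16 + keySize + valueSize);
  return { timestamp, key, value, bytesRead: 16 + keySize + valueSize };
}
```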
- Writes: O(1) - Always append to active log
- Reads: O(1) - Hash table lookup + single disk read
- Memory: O(n) where n = number of unique keys ever written
- Compaction: Background process, doesn't block operations
Run benchmarks with `npm run benchmark`.
Typical performance on modern hardware:
- Sequential Writes: ~50,000 ops/sec
- Sequential Reads: ~100,000 ops/sec
- Random Reads: ~80,000 ops/sec
- Mixed Workload: ~30,000 ops/sec
Perfect for:
- Caching layers - Fast reads with persistence
- Session storage - High write throughput
- Configuration management - Small datasets with frequent updates
- Event logging - Append-heavy workloads
- Embedded databases - Applications needing simple persistence
Not ideal for:
- Range queries - No ordered iteration
- Large datasets - Index must fit in memory
- Complex queries - Simple key-value only
- Multi-node setups - Single-node design
Run the comprehensive test suite:

```bash
npm test
```

Run the performance benchmarks:

```bash
npm run benchmark
```

Run the interactive demo:

```bash
npm start
```

On startup, the store scans all log files and rebuilds the in-memory index. This ensures data consistency after crashes or unexpected shutdowns.
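To make the recovery step concrete, here is a rough sketch of that scan, reusing the hypothetical `decodeEntry` helper from the binary-format sketch above; the real logic lives in src/indexer.js and may differ in detail:

```js
const fs = require('fs');
const path = require('path');

// Sketch: replay every log file in order, recording where the latest copy of
// each key lives. Tombstones (zero-length values) remove the key again.
function rebuildIndex(dataDir) {
  const index = new Map(); // key -> { file, offset, valueSize, timestamp }
  const logFiles = fs.readdirSync(dataDir).filter(f => f.endsWith('.log')).sort();
  for (const file of logFiles) {
    const buf = fs.readFileSync(path.join(dataDir, file));
    let offset = 0;
    while (offset < buf.length) {
      const { timestamp, key, value, bytesRead } = decodeEntry(buf, offset);
      if (value.length === 0) {
        index.delete(key); // tombstone: the key was deleted
      } else {
        index.set(key, { file, offset, valueSize: value.length, timestamp });
      }
      offset += bytesRead;
    }
  }
  return index;
}
```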
Background compaction runs periodically and:
- Identifies fragmented log files
- Rewrites active entries to new log files
- Updates index references
- Removes old log files
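A simplified, synchronous sketch of a single compaction pass under the same assumptions (hypothetical `decodeEntry` helper and index layout); the background logic in src/compactor.js may differ in detail:

```js
const fs = require('fs');
const path = require('path');

// Sketch: copy only the live entries from a fragmented log into a fresh file,
// repoint the index at the new copies, then delete the old file.
function compactLog(dataDir, oldFile, index) {
  const oldBuf = fs.readFileSync(path.join(dataDir, oldFile));
  const newFile = `older-${Date.now()}.log`;
  const survivors = [];
  let newOffset = 0;
  let offset = 0;
  while (offset < oldBuf.length) {
    const { key, bytesRead } = decodeEntry(oldBuf, offset);
    const meta = index.get(key);
    // Keep an entry only if the index still points at this exact copy.
    if (meta && meta.file === oldFile && meta.offset === offset) {
      survivors.push(oldBuf.subarray(offset, offset + bytesRead));
      index.set(key, { ...meta, file: newFile, offset: newOffset });
      newOffset += bytesRead;
    }
    offset += bytesRead;
  }
  fs.writeFileSync(path.join(dataDir, newFile), Buffer.concat(survivors));
  fs.unlinkSync(path.join(dataDir, oldFile)); // stale data is reclaimed here
}
```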
- Batched async writes reduce system call overhead
- Write buffer accumulates entries before flushing
- Periodic flush ensures durability
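For illustration, a minimal write-buffer sketch along these lines (the class name and thresholds are hypothetical, not the store's actual internals):

```js
const fs = require('fs');

// Sketch: appends accumulate in memory and are flushed to the active log
// either when the buffer grows large enough or on a periodic timer,
// trading a small durability window for throughput.
class WriteBuffer {
  constructor(fd, { maxBytes = 64 * 1024, flushIntervalMs = 50 } = {}) {
    this.fd = fd;
    this.chunks = [];
    this.size = 0;
    this.maxBytes = maxBytes;
    this.timer = setInterval(() => this.flush(), flushIntervalMs);
  }

  append(entryBuf) {
    this.chunks.push(entryBuf);
    this.size += entryBuf.length;
    if (this.size >= this.maxBytes) this.flush(); // size-triggered flush
  }

  flush() {
    if (this.chunks.length === 0) return;
    const batch = Buffer.concat(this.chunks);
    this.chunks = [];
    this.size = 0;
    fs.writeSync(this.fd, batch); // one syscall covers many entries
  }

  close() {
    clearInterval(this.timer);
    this.flush();
  }
}
```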
- Index stores only metadata (file offset, size, timestamp)
- Actual values are read from disk on demand
- Memory usage scales with number of unique keys
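A sketch of the resulting read path, assuming the same hypothetical index layout as in the recovery sketch above: one Map lookup gives the entry's location, and a single positioned read fetches the value bytes.

```js
const fs = require('fs');
const path = require('path');

// Sketch: the index holds only metadata, so a read is a Map lookup plus
// one positioned disk read of exactly valueSize bytes.
function readValue(dataDir, index, key) {
  const meta = index.get(key); // { file, offset, valueSize, timestamp }
  if (!meta) return null;
  const fd = fs.openSync(path.join(dataDir, meta.file), 'r');
  const value = Buffer.alloc(meta.valueSize);
  // Value bytes start after the 16-byte header and the UTF-8 key bytes.
  const keyBytes = Buffer.byteLength(key, 'utf8');
  fs.readSync(fd, value, 0, meta.valueSize, meta.offset + 16 + keyBytes);
  fs.closeSync(fd);
  return value;
}
```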
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
ISC License - see LICENSE file for details.
Inspired by the Bitcask storage model developed by Basho Technologies for Riak. This implementation focuses on the core concepts while adding modern Node.js optimizations.