Bitcask-Inspired Key-Value Store

A high-performance, log-structured key-value database engine implemented in Node.js, inspired by the Bitcask storage model. Features fast reads/writes, automatic compaction, and crash recovery.

Features

  • Log-Structured Storage: Append-only writes for maximum write performance
  • In-Memory Indexing: Hash table index for O(1) key lookups
  • Background Compaction: Automatic garbage collection to reclaim space
  • Crash Recovery: Automatic index rebuilding from log files on startup
  • Binary Serialization: Efficient binary format for storage
  • Batched Writes: Optimized batch operations for better throughput
  • Persistence: All data survives process restarts
  • Binary Data Support: Store any type of data (strings, JSON, buffers)

Architecture

bitcask-store/
├── data/
│   ├── active.log         ← current write log
│   └── older-*.log        ← compacted/archived logs
├── src/
│   ├── index.js           ← entry point & public API
│   ├── storage.js         ← append-only log handling
│   ├── indexer.js         ← in-memory index map
│   ├── compactor.js       ← background compaction logic
│   └── utils.js           ← binary serialization helpers
├── test/
│   ├── test.js            ← comprehensive test suite
│   └── benchmark.js       ← performance benchmarks
├── examples/
│   └── demo.js            ← feature demonstration
└── package.json

Installation

git clone <repository-url>
cd bit-cask
npm install

Quick Start

const BitcaskStore = require('./src/index');

async function example() {
    const store = new BitcaskStore({
        dataDir: './my-data',
        maxLogSize: 64 * 1024 * 1024, // 64MB log files
        autoCompaction: true
    });
    
    await store.open();
    
    // Basic operations
    await store.put('user:1', JSON.stringify({ name: 'Alice', age: 30 }));
    const user = JSON.parse((await store.get('user:1')).toString());
    console.log(user); // { name: 'Alice', age: 30 }
    
    // Check existence
    console.log(store.has('user:1')); // true
    
    // Delete
    await store.delete('user:1');
    console.log(await store.get('user:1')); // null
    
    await store.close();
}

example().catch(console.error);

API Reference

Constructor Options

const store = new BitcaskStore({
    dataDir: './data',                    // Data directory path
    maxLogSize: 64 * 1024 * 1024,        // Max log file size (64MB)
    autoCompaction: true,                 // Enable background compaction
    compactionOptions: {
        fragmentationThreshold: 0.3,      // Compact when 30% fragmented
        compactionInterval: 5 * 60 * 1000, // Check every 5 minutes
        minLogAge: 60 * 1000              // Don't compact files newer than 1 minute
    }
});

Core Methods

await store.open()

Opens the store and rebuilds the index from existing log files.

await store.close()

Closes the store, flushes pending writes, and stops background compaction.

await store.put(key, value)

Stores a key-value pair. Value can be string, Buffer, or any serializable data.

await store.get(key)

Retrieves a value by key. Returns Buffer or null if not found.

await store.delete(key)

Deletes a key. Returns boolean indicating if the key existed.

store.has(key)

Checks if a key exists (synchronous). Returns boolean.

store.keys()

Returns an array of all active keys (synchronous).

store.size()

Returns the number of active keys (synchronous).

Batch Operations

await store.batch([
    { op: 'put', key: 'key1', value: 'value1' },
    { op: 'put', key: 'key2', value: 'value2' },
    { op: 'delete', key: 'key3' }
]);

Iteration

// Iterate over all key-value pairs
for await (const [key, value] of store.entries()) {
    console.log(key, value.toString());
}

Maintenance

await store.compact()

Manually trigger compaction to reclaim space from deleted entries.

store.getStats()

Returns detailed statistics about the store:

{
    keys: 1000,                     // Active keys
    deletedKeys: 50,                // Deleted keys awaiting compaction (fragmentation)
    totalSize: 1048576,             // Total data size in bytes
    memoryUsage: 204800,            // Index memory usage in bytes
    fragmentationRatio: 0.05,       // Fraction of entries that are stale (0-1)
    compaction: {
        totalCompactions: 3,
        totalBytesReclaimed: 524288,
        lastCompactionTime: 1634567890123  // Unix epoch milliseconds
    }
}
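As a hedged example, these statistics can drive a manual compaction policy in application code. The helper below is not part of the store's API; the ratio it computes (deleted keys over all indexed keys) is one plausible way to match the `fragmentationThreshold` option:

```javascript
// Hypothetical helper: decide whether compaction is worthwhile, mirroring
// the fragmentationThreshold constructor option (default 0.3 in this README).
function shouldCompact(stats, threshold = 0.3) {
    const total = stats.keys + stats.deletedKeys;
    // Compact once the share of stale entries reaches the threshold.
    return total > 0 && stats.deletedKeys / total >= threshold;
}
```

Usage might look like `if (shouldCompact(store.getStats())) await store.compact();`.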

Storage Format

Each log entry uses a binary format for efficiency:

[timestamp(8)] [keySize(4)] [valueSize(4)] [key] [value]
  • timestamp: 8-byte big-endian timestamp
  • keySize: 4-byte big-endian key length
  • valueSize: 4-byte big-endian value length
  • key: UTF-8 encoded key
  • value: Raw value bytes

Deleted entries are stored as tombstones (zero-length values).
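The layout above can be sketched with Node's `Buffer` API. The function names here are illustrative, not the actual `src/utils.js` exports:

```javascript
// Encode one log entry: [timestamp(8)][keySize(4)][valueSize(4)][key][value],
// all integers big-endian, matching the format described above.
function encodeEntry(key, value) {
    const keyBuf = Buffer.from(key, 'utf8');
    const header = Buffer.alloc(16);
    header.writeBigUInt64BE(BigInt(Date.now()), 0); // timestamp(8)
    header.writeUInt32BE(keyBuf.length, 8);         // keySize(4)
    header.writeUInt32BE(value.length, 12);         // valueSize(4)
    return Buffer.concat([header, keyBuf, value]);
}

// Decode a single entry starting at offset 0 of buf.
function decodeEntry(buf) {
    const timestamp = buf.readBigUInt64BE(0);
    const keySize = buf.readUInt32BE(8);
    const valueSize = buf.readUInt32BE(12);
    const key = buf.toString('utf8', 16, 16 + keySize);
    const value = buf.subarray(16 + keySize, 16 + keySize + valueSize);
    return { timestamp, key, value };
}
```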

Performance Characteristics

  • Writes: O(1) - Always append to active log
  • Reads: O(1) - Hash table lookup + single disk read
  • Memory: O(n) where n = number of unique keys ever written
  • Compaction: Background process, doesn't block operations

Benchmark Results

Run benchmarks with: npm run benchmark

Typical performance on modern hardware:

  • Sequential Writes: ~50,000 ops/sec
  • Sequential Reads: ~100,000 ops/sec
  • Random Reads: ~80,000 ops/sec
  • Mixed Workload: ~30,000 ops/sec

Use Cases

Perfect for:

  • Caching layers - Fast reads with persistence
  • Session storage - High write throughput
  • Configuration management - Small datasets with frequent updates
  • Event logging - Append-heavy workloads
  • Embedded databases - Applications needing simple persistence

Not ideal for:

  • Range queries - No ordered iteration
  • Large datasets - Index must fit in memory
  • Complex queries - Simple key-value only
  • Multi-node setups - Single-node design

Testing

Run the comprehensive test suite:

npm test

Run performance benchmarks:

npm run benchmark

Run the interactive demo:

npm start

Implementation Details

Crash Recovery

On startup, the store scans all log files and rebuilds the in-memory index. This ensures data consistency after crashes or unexpected shutdowns.
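A minimal sketch of that rebuild, assuming the binary layout from the Storage Format section and one log already read into memory (the real `storage.js` presumably streams each file from disk):

```javascript
// Replay a log buffer front to back, building key -> {offset, size} metadata.
// Later entries overwrite earlier ones (last write wins); zero-length values
// are tombstones and remove the key.
function rebuildIndex(logBuf) {
    const index = new Map();
    let offset = 0;
    while (offset + 16 <= logBuf.length) {
        const keySize = logBuf.readUInt32BE(offset + 8);
        const valueSize = logBuf.readUInt32BE(offset + 12);
        const key = logBuf.toString('utf8', offset + 16, offset + 16 + keySize);
        if (valueSize === 0) {
            index.delete(key);               // tombstone: key was deleted
        } else {
            index.set(key, {                 // store metadata only, not the value
                offset: offset + 16 + keySize,
                size: valueSize
            });
        }
        offset += 16 + keySize + valueSize;  // advance to the next entry
    }
    return index;
}
```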

Compaction Strategy

Background compaction runs periodically and:

  1. Identifies fragmented log files
  2. Rewrites active entries to new log files
  3. Updates index references
  4. Removes old log files

Write Optimization

  • Batched async writes reduce system call overhead
  • Write buffer accumulates entries before flushing
  • Periodic flush ensures durability
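The buffering idea above can be sketched as follows; the class and its fields are hypothetical, not the store's internals:

```javascript
// Accumulate encoded entries in memory and hand them to flushFn as one
// concatenated Buffer once a byte threshold is reached, so many small
// writes collapse into one syscall.
class WriteBuffer {
    constructor(flushFn, maxBytes = 4096) {
        this.flushFn = flushFn;   // e.g. (buf) => fs.appendFileSync(path, buf)
        this.maxBytes = maxBytes;
        this.chunks = [];
        this.bytes = 0;
    }
    add(entry) {
        this.chunks.push(entry);
        this.bytes += entry.length;
        if (this.bytes >= this.maxBytes) this.flush();
    }
    flush() {
        if (this.bytes === 0) return;     // nothing pending
        this.flushFn(Buffer.concat(this.chunks));
        this.chunks = [];
        this.bytes = 0;
    }
}
```

A periodic `setInterval(() => buffer.flush(), ...)` would provide the durability flush the bullet list describes.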

Memory Management

  • Index stores only metadata (file offset, size, timestamp)
  • Actual values are read from disk on demand
  • Memory usage scales with number of unique keys

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

ISC License - see LICENSE file for details.

Acknowledgments

Inspired by the Bitcask storage model developed by Basho Technologies for Riak. This implementation focuses on the core concepts while adding modern Node.js optimizations.
