"Reinventing the wheel to increase the gyrus sulcus" π§
A production-quality relational database management system built from scratch in Python for deep learning of database internals. This project is designed to be used by others while serving as a comprehensive educational resource.
- Overview
- System Architecture
- Component Design
- Implementation Roadmap
- Testing Strategy
- Development Guidelines
ModakDB is a full-featured RDBMS that implements core database concepts from first principles. While educational, it's architectured for real-world usage with proper error handling, testing, and performance considerations.
- β Storage Layer: Page-based heap file storage with efficient space management
- β SQL Parser: Full SQL DML/DDL support (SELECT, INSERT, UPDATE, DELETE, CREATE, DROP)
- β Query Execution: Volcano-style iterator model with push-based execution
- β Indexing: B+ tree indexes for fast lookups and range queries
- β Transactions: ACID compliance with MVCC (Multi-Version Concurrency Control)
- β Buffer Pool: LRU-based page caching for performance
- β Recovery: Write-Ahead Logging (WAL) for crash recovery
- β Concurrency: Two-phase locking with deadlock detection
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Client Application β
ββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββ
β SQL Interface Layer β
β β’ Connection Management β’ Session State β’ Result Sets β
ββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββ
β Parser & Planner β
β β’ Lexer & Parser β’ AST β’ Query Optimization β
ββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββ
β Query Executor β
β β’ Scan β’ Filter β’ Join β’ Aggregate β’ Sort β
ββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββ
β Transaction & Concurrency Layer β
β β’ Transaction Manager β’ Lock Manager β’ MVCC β
ββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββ
β Buffer Pool Manager β
β β’ Page Cache (LRU) β’ Pin/Unpin β’ Dirty Page Tracking β
ββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββ
β Storage Engine β
β β’ Heap Files β’ Page Layout β’ Slotted Pages β
ββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββ
β Disk Manager β
β β’ File I/O β’ Page Allocation β’ Free Space Tracking β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1. SQL String βββΆ Parser βββΆ AST
2. AST βββΆ Planner βββΆ Query Plan (Scan β Filter)
3. Executor starts iteration:
a. Scan operator requests page from Buffer Pool
b. Buffer Pool checks cache, if miss, loads from Storage
c. Storage reads page from Heap File
d. Filter operator evaluates predicate on each tuple
e. Matching tuples returned to client
βββββββββββββββββββββββββββββββββββββββββββ
β Database File (.mdb) β
βββββββββββββββββββββββββββββββββββββββββββ€
β Page 0: Header Page β
β - DB Metadata β
β - Free Page List β
βββββββββββββββββββββββββββββββββββββββββββ€
β Page 1-N: Data Pages β
β - Slotted Page Layout β
β - Page Header (24 bytes) β
β - Slot Array (grows down) β
β - Free Space β
β - Tuples (grow up) β
βββββββββββββββββββββββββββββββββββββββββββ
PAGE_SIZE = 4096
Page Header (24 bytes):
- page_id: 4 bytes (uint32)
- lsn: 8 bytes (uint64) - Log Sequence Number
- page_type: 1 byte (heap/index/meta)
- num_slots: 2 bytes (uint16)
- free_space_offset: 2 bytes (uint16)
- free_space_size: 2 bytes (uint16)
- next_page_id: 4 bytes (uint32)
- checksum: 4 bytes (uint32)
Slot Array (variable):
Each slot: (offset: 2 bytes, length: 2 bytes)
Tuples (variable):
- Grow from end of page upward
- Variable length recordsPage: Represents a single page in memoryHeapFile: Manages collection of pages for a tableSlottedPage: Implements slotted page layoutDiskManager: Handles file I/O operations
Catalog Structure:
Database
ββ Tables (dict)
β ββ TableSchema
β β ββ table_name: str
β β ββ columns: List[Column]
β β ββ indexes: List[Index]
β β ββ heap_file_id: int
β ββ Column
β ββ name: str
β ββ type: ColumnType
β ββ nullable: bool
β ββ primary_key: bool
β ββ default: Any
ββ Indexes (dict)class ColumnType(Enum):
INT = "INT" # 4 bytes
BIGINT = "BIGINT" # 8 bytes
FLOAT = "FLOAT" # 4 bytes
DOUBLE = "DOUBLE" # 8 bytes
VARCHAR = "VARCHAR" # Variable (max length specified)
CHAR = "CHAR" # Fixed length
BOOLEAN = "BOOLEAN" # 1 byte
DATE = "DATE" # 4 bytes (days since epoch)
TIMESTAMP = "TIMESTAMP" # 8 bytes (microseconds)
BLOB = "BLOB" # Variable length binarySupported SQL Statements:
- CREATE TABLE table_name (columns...)
- DROP TABLE table_name
- INSERT INTO table_name VALUES (...)
- SELECT columns FROM table WHERE condition
- UPDATE table SET col=val WHERE condition
- DELETE FROM table WHERE condition
- CREATE INDEX index_name ON table(column)
Expression Support:
- Arithmetic: +, -, *, /, %
- Comparison: =, !=, <, >, <=, >=
- Logical: AND, OR, NOT
- Functions: COUNT, SUM, AVG, MIN, MAX
- Aggregations with GROUP BY and HAVING
Abstract Syntax Tree Nodes:
- Statement (base class)
ββ SelectStatement
β ββ columns: List[Expression]
β ββ from_table: str
β ββ where: Expression
β ββ group_by: List[Expression]
β ββ having: Expression
β ββ order_by: List[(Expression, ASC/DESC)]
ββ InsertStatement
ββ UpdateStatement
ββ DeleteStatement
ββ DDLStatement (CREATE/DROP)class Operator(ABC):
@abstractmethod
def open(self): ... # Initialize operator
@abstractmethod
def next(self) -> Tuple: # Get next tuple (iterator protocol)
@abstractmethod
def close(self): ... # Clean up resources
Operator Tree Example:
ProjectionOp (name, age)
β
FilterOp (age > 25)
β
SeqScanOp (users table)Physical Operators:
β’ SeqScan: Sequential table scan
β’ IndexScan: Index-based lookup
β’ Filter: Predicate evaluation
β’ Project: Column projection
β’ NestedLoopJoin: Nested loop join
β’ HashJoin: Hash-based join
β’ Sort: External merge sort
β’ Aggregate: Grouping and aggregation
β’ Limit: Result limitingBuffer Pool:
- Fixed size array of page frames
- Hash table: page_id β frame_id
- LRU replacement policy (doubly-linked list)
- Pin counter for each frame
- Dirty bit tracking
Operations:
β’ FetchPage(page_id) β Page
β’ UnpinPage(page_id, is_dirty)
β’ FlushPage(page_id)
β’ FlushAllPages()
β’ NewPage() β Page
Eviction Policy:
1. Find unpinned page with pin_count = 0
2. If dirty, write to disk
3. Remove from buffer pool
4. Load new pageAtomicity:
- Undo logging for rollback
- Transaction abort on errors
Consistency:
- Constraint checking
- Referential integrity
Isolation:
- MVCC (Multi-Version Concurrency Control)
- Snapshot isolation level
Durability:
- Write-Ahead Logging (WAL)
- Checkpoint mechanism
Tuple Structure with Versioning:
Tuple:
- data: bytes
- xmin: Transaction ID (creator)
- xmax: Transaction ID (deleter, NULL if active)
- is_deleted: bool
Visibility Rules:
A tuple is visible to transaction T if:
1. xmin < T.start_timestamp AND
2. (xmax IS NULL OR xmax > T.start_timestamp)
Transaction States:
ACTIVE β COMMITTED
β ABORTEDB+ Tree Properties:
- Order M (max children per node)
- All records in leaf nodes
- Leaves linked for range queries
- Search time: O(log n)
Node Layout:
Internal Node:
- Keys: [K1, K2, ..., Kn]
- Children: [P0, P1, ..., Pn]
- P0 < K1 β€ P1 < K2 β€ ... < Kn β€ Pn
Leaf Node:
- Keys: [K1, K2, ..., Kn]
- Records: [R1, R2, ..., Rn]
- Next pointer: Link to sibling
Log Record Types:
- BEGIN: Transaction start
- COMMIT: Transaction commit
- ABORT: Transaction rollback
- UPDATE: Tuple modification (before/after)
- INSERT: Tuple insertion
- DELETE: Tuple deletion
- CHECKPOINT: System checkpoint
Recovery Algorithm (ARIES):
1. Analysis Pass: Scan log to find dirty pages
2. Redo Pass: Replay all operations (idempotent)
3. Undo Pass: Rollback uncommitted transactionsβ‘ Storage Layer
β‘ Page structure and layout
β‘ Disk manager (file I/O)
β‘ Heap file implementation
β‘ Slotted page management
β‘ Catalog Layer
β‘ Data type definitions
β‘ Table schema management
β‘ Column definitions
β‘ Catalog persistence
β‘ Parser Layer
β‘ SQL grammar definition (Lark)
β‘ Lexer and parser
β‘ AST construction
β‘ Statement validation
β‘ Executor Layer
β‘ Operator interface
β‘ Sequential scan
β‘ Filter operator
β‘ Projection operator
β‘ Simple DML execution
β‘ Buffer Pool Manager
β‘ Page frame allocation
β‘ LRU replacement policy
β‘ Pin/unpin mechanism
β‘ Dirty page tracking
β‘ Flush policies
β‘ B+ Tree Index
β‘ Node structure
β‘ Insert operation
β‘ Search operation
β‘ Delete operation
β‘ Range scan
β‘ Index scan operator
β‘ Transaction Management
β‘ Transaction context
β‘ Begin/Commit/Abort
β‘ Undo logging
β‘ MVCC implementation
β‘ Concurrency Control
β‘ Lock manager
β‘ Two-phase locking
β‘ Deadlock detection
β‘ Write-Ahead Logging
β‘ Log manager
β‘ Log record formats
β‘ Log buffering
β‘ Checkpoint mechanism
β‘ Recovery
β‘ ARIES algorithm
β‘ Crash recovery
β‘ Transaction rollback
β‘ Query Optimization
β‘ Cost model
β‘ Join ordering
β‘ Index selection
β‘ Advanced Operators
β‘ Hash join
β‘ Sort-merge join
β‘ Aggregation
β‘ GROUP BY / HAVING
β‘ Performance
β‘ Query caching
β‘ Statistics collection
β‘ Profiling tools
tests/unit/
βββ test_storage.py
β βββ TestPage
β βββ TestSlottedPage
β βββ TestHeapFile
β βββ TestDiskManager
β
βββ test_catalog.py
β βββ TestColumnType
β βββ TestColumn
β βββ TestTableSchema
β βββ TestCatalog
β
βββ test_parser.py
β βββ TestLexer
β βββ TestParser
β βββ TestSelectStatement
β βββ TestDDLStatements
β
βββ test_executor.py
β βββ TestSeqScan
β βββ TestFilter
β βββ TestProjection
β βββ TestJoinOperators
β
βββ test_buffer.py
β βββ TestBufferPool
β βββ TestLRUReplacer
β βββ TestPageEviction
β
βββ test_transaction.py
βββ TestTransactionManager
βββ TestMVCC
βββ TestLockManager
tests/integration/
βββ test_end_to_end.py
β βββ test_create_table_and_insert
β βββ test_query_execution
β βββ test_updates_and_deletes
β βββ test_complex_queries
β
βββ test_transactions.py
β βββ test_acid_properties
β βββ test_concurrent_transactions
β βββ test_deadlock_detection
β βββ test_rollback
β
βββ test_recovery.py
β βββ test_crash_recovery
β βββ test_wal_replay
β βββ test_checkpoint
β
βββ test_performance.py
βββ benchmark_insert
βββ benchmark_scan
βββ benchmark_index_lookup
- Unit Tests: 90%+ coverage
- Integration Tests: All major workflows
- Performance Tests: Regression detection
- Edge Cases: Null values, boundary conditions, concurrent access
modakdb/
βββ __init__.py # Package initialization
βββ api/ # Public API
β βββ __init__.py
β βββ database.py # Database class (main entry)
β βββ exceptions.py # Custom exceptions
βββ storage/ # Storage layer
β βββ __init__.py
β βββ disk_manager.py # File I/O operations
β βββ page.py # Page structure
β βββ heap_file.py # Heap file management
β βββ slotted_page.py # Slotted page layout
βββ catalog/ # Schema management
β βββ __init__.py
β βββ types.py # Data types
β βββ schema.py # Table schemas
β βββ catalog.py # System catalog
βββ parser/ # SQL parsing
β βββ __init__.py
β βββ grammar.lark # SQL grammar
β βββ parser.py # Parser implementation
β βββ ast_nodes.py # AST definitions
βββ planner/ # Query planning
β βββ __init__.py
β βββ planner.py # Query planner
β βββ optimizer.py # Query optimizer
βββ executor/ # Query execution
β βββ __init__.py
β βββ operator.py # Base operator class
β βββ scan.py # Scan operators
β βββ filter.py # Filter operator
β βββ projection.py # Projection operator
β βββ join.py # Join operators
β βββ aggregate.py # Aggregation
β βββ sort.py # Sort operator
βββ buffer/ # Buffer pool
β βββ __init__.py
β βββ buffer_pool.py # Buffer pool manager
β βββ replacer.py # LRU replacer
βββ transaction/ # Transactions
β βββ __init__.py
β βββ transaction.py # Transaction context
β βββ lock_manager.py # Lock management
β βββ mvcc.py # MVCC implementation
βββ recovery/ # Recovery
β βββ __init__.py
β βββ log_manager.py # WAL manager
β βββ log_record.py # Log records
β βββ recovery.py # Recovery algorithm
βββ index/ # Indexing
βββ __init__.py
βββ index.py # Base index interface
βββ btree.py # B+ tree implementation
# Use type hints
def insert_tuple(self, table_name: str, values: List[Any]) -> int:
"""Insert a tuple into a table.
Args:
table_name: Name of the table
values: List of column values
Returns:
Row ID of inserted tuple
Raises:
TableNotFoundError: If table doesn't exist
"""
pass
# Use dataclasses for structured data
@dataclass
class Page:
page_id: int
data: bytearray
is_dirty: bool = False
pin_count: int = 0
# Use enums for constants
class PageType(Enum):
HEAP_PAGE = 1
INDEX_PAGE = 2
META_PAGE = 3# Custom exception hierarchy
class ModakDBError(Exception):
"""Base exception for all ModakDB errors"""
pass
class StorageError(ModakDBError):
"""Storage layer errors"""
pass
class ParseError(ModakDBError):
"""SQL parsing errors"""
pass
class ExecutionError(ModakDBError):
"""Query execution errors"""
passfrom modakdb import Database
# Create database
db = Database("mydb.mdb")
# Create table
db.execute("""
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(100),
age INT,
email VARCHAR(255)
)
""")
# Insert data
db.execute("INSERT INTO users VALUES (1, 'Alice', 30, 'alice@example.com')")
db.execute("INSERT INTO users VALUES (2, 'Bob', 25, 'bob@example.com')")
# Query data
result = db.execute("SELECT name, age FROM users WHERE age > 25")
for row in result:
print(f"{row['name']}: {row['age']}")
# Transactions
with db.transaction() as txn:
txn.execute("UPDATE users SET age = 31 WHERE name = 'Alice'")
txn.execute("DELETE FROM users WHERE name = 'Bob'")
# Automatically commits if no exception
db.close()- Database System Concepts - Silberschatz, Korth, Sudarshan
- Database Management Systems - Ramakrishnan, Gehrke
- Architecture of a Database System - Hellerstein, Stonebraker, Hamilton
- The Internals of PostgreSQL - Hironobu SUZUKI (online)
- ARIES Recovery Algorithm (Mohan et al.)
- B+ tree Implementations (Comer)
- MVCC in PostgreSQL (PostgreSQL Documentation)
# Clone repository
git clone https://github.com/rohitwtbs/modakdb.git
cd modakdb
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Run with coverage
pytest --cov=modakdb --cov-report=html tests/
# Type checking
mypy modakdb/
# Linting
ruff check modakdb/MIT License - See LICENSE for details
This is an educational project, but contributions are welcome! Please:
- Write tests for new features
- Follow the existing code style
- Update documentation
- Add type hints
Happy Database Building! π
"The best way to understand databases is to build one."
MIT License - See LICENSE file