rohitwtbs/modakdb

ModakDB - Educational RDBMS in Python

"Reinventing the wheel to increase the gyrus sulcus" 🧠

A production-quality relational database management system built from scratch in Python to learn database internals in depth. It is designed to be usable by others while also serving as a comprehensive educational resource.


📋 Table of Contents

  1. Overview
  2. System Architecture
  3. Component Design
  4. Implementation Roadmap
  5. Testing Strategy
  6. Development Guidelines

🎯 Overview

ModakDB is a full-featured RDBMS that implements core database concepts from first principles. Although it is educational, it is architected for real-world use, with proper error handling, testing, and attention to performance.

Core Features

  • ✅ Storage Layer: Page-based heap file storage with efficient space management
  • ✅ SQL Parser: Full SQL DML/DDL support (SELECT, INSERT, UPDATE, DELETE, CREATE, DROP)
  • ✅ Query Execution: Volcano-style iterator model with pull-based execution
  • ✅ Indexing: B+ tree indexes for fast lookups and range queries
  • ✅ Transactions: ACID compliance with MVCC (Multi-Version Concurrency Control)
  • ✅ Buffer Pool: LRU-based page caching for performance
  • ✅ Recovery: Write-Ahead Logging (WAL) for crash recovery
  • ✅ Concurrency: Two-phase locking with deadlock detection

🏗️ System Architecture

High-Level Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Client Application                      │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────┐
│                   SQL Interface Layer                       │
│  • Connection Management  • Session State  • Result Sets    │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────┐
│                    Parser & Planner                         │
│  • Lexer & Parser  • AST  • Query Optimization              │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────┐
│                   Query Executor                            │
│  • Scan  • Filter  • Join  • Aggregate  • Sort              │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────┐
│              Transaction & Concurrency Layer                │
│  • Transaction Manager  • Lock Manager  • MVCC              │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────┐
│                  Buffer Pool Manager                        │
│  • Page Cache (LRU)  • Pin/Unpin  • Dirty Page Tracking     │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────┐
│                   Storage Engine                            │
│  • Heap Files  • Page Layout  • Slotted Pages               │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────┐
│                    Disk Manager                             │
│  • File I/O  • Page Allocation  • Free Space Tracking       │
└─────────────────────────────────────────────────────────────┘

Data Flow Example: SELECT * FROM users WHERE age > 25

1. SQL String ──▶ Parser ──▶ AST
2. AST ──▶ Planner ──▶ Query Plan (Scan → Filter)
3. Executor starts iteration:
   a. Scan operator requests a page from the Buffer Pool
   b. Buffer Pool checks its cache; on a miss, it loads the page from storage
   c. Storage reads the page from the heap file on disk
   d. Filter operator evaluates the predicate (age > 25) on each tuple
   e. Matching tuples are returned to the client

🔧 Component Design

1. Storage Layer (modakdb/storage/)

Heap File Structure

┌──────────────────────────────────────────┐
│           Database File (.mdb)           │
├──────────────────────────────────────────┤
│  Page 0: Header Page                     │
│    - DB Metadata                         │
│    - Free Page List                      │
├──────────────────────────────────────────┤
│  Pages 1..N: Data Pages                  │
│    - Slotted Page Layout                 │
│    - Page Header (27 bytes)              │
│    - Slot Array (grows down)             │
│    - Free Space                          │
│    - Tuples (grow up)                    │
└──────────────────────────────────────────┘

Page Layout (4096 bytes)

PAGE_SIZE = 4096

Page Header (27 bytes, the sum of the fields below):
  - page_id: 4 bytes (uint32)
  - lsn: 8 bytes (uint64) - Log Sequence Number
  - page_type: 1 byte (heap/index/meta)
  - num_slots: 2 bytes (uint16)
  - free_space_offset: 2 bytes (uint16)
  - free_space_size: 2 bytes (uint16)
  - next_page_id: 4 bytes (uint32)
  - checksum: 4 bytes (uint32)

Slot Array (variable):
  Each slot: (offset: 2 bytes, length: 2 bytes)
  
Tuples (variable):
  - Grow from end of page upward
  - Variable length records
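Adding up the header fields gives 4 + 8 + 1 + 2 + 2 + 2 + 4 + 4 = 27 bytes. Below is a minimal sketch of packing and reading the header with Python's standard `struct` module. It assumes little-endian byte order with no alignment padding (the `<` prefix) and the field order listed above. `HEADER_FORMAT`, `pack_header` and `unpack_header` are illustrative names, not ModakDB's actual API.

```python
import struct

PAGE_SIZE = 4096

# Fields in the order listed above (assumed little-endian, no padding):
#   page_id u32 | lsn u64 | page_type u8 | num_slots u16 |
#   free_space_offset u16 | free_space_size u16 | next_page_id u32 | checksum u32
HEADER_FORMAT = "<IQBHHHII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4+8+1+2+2+2+4+4 = 27


def pack_header(page_id: int, lsn: int, page_type: int, num_slots: int,
                free_space_offset: int, free_space_size: int,
                next_page_id: int, checksum: int = 0) -> bytes:
    """Serialize the header fields into HEADER_SIZE bytes."""
    return struct.pack(HEADER_FORMAT, page_id, lsn, page_type, num_slots,
                       free_space_offset, free_space_size, next_page_id, checksum)


def unpack_header(page: bytes) -> tuple:
    """Read the header fields back from the start of a page buffer."""
    return struct.unpack_from(HEADER_FORMAT, page, 0)


# An empty heap page: no slots, and all space after the header is free.
page = bytearray(PAGE_SIZE)
page[:HEADER_SIZE] = pack_header(page_id=1, lsn=0, page_type=1, num_slots=0,
                                 free_space_offset=PAGE_SIZE,
                                 free_space_size=PAGE_SIZE - HEADER_SIZE,
                                 next_page_id=0)
print(HEADER_SIZE, unpack_header(page))  # 27 (1, 0, 1, 0, 4096, 4069, 0, 0)
```

The `<` prefix keeps the size equal to the sum of the fields. With the default native mode, `struct` would align the 8-byte `lsn` and insert padding. Here `page_type=1` is chosen to match `HEAP_PAGE` in the `PageType` enum shown later.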

Key Classes

  • Page: Represents a single page in memory
  • HeapFile: Manages collection of pages for a table
  • SlottedPage: Implements slotted page layout
  • DiskManager: Handles file I/O operations
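The `SlottedPage` class listed above could look roughly like the following in-memory sketch. It assumes a 27-byte header (the packed size of the fields listed earlier). For brevity it keeps `num_slots` and `free_space_offset` as Python attributes instead of writing them into the header bytes. The method names `insert`, `get` and `free_space` are hypothetical. Each slot is the 4-byte (offset, length) pair described above. Slots are appended after the header, while record bytes are written backward from the end of the page, so both regions grow toward the free space in the middle.

```python
import struct
from typing import Optional

PAGE_SIZE = 4096
HEADER_SIZE = 27                     # assumed: packed size of the header fields above
SLOT = struct.Struct("<HH")          # one slot entry: (offset, length), 2 bytes each


class SlottedPage:
    def __init__(self) -> None:
        self.data = bytearray(PAGE_SIZE)
        self.num_slots = 0
        self.free_space_offset = PAGE_SIZE   # start of the lowest record written so far

    def free_space(self) -> int:
        """Bytes between the end of the slot array and the first record."""
        slots_end = HEADER_SIZE + self.num_slots * SLOT.size
        return self.free_space_offset - slots_end

    def insert(self, record: bytes) -> Optional[int]:
        """Store a record and return its slot number, or None if it does not fit."""
        if len(record) + SLOT.size > self.free_space():
            return None
        # Record bytes go just below the previous record (growing toward the header).
        self.free_space_offset -= len(record)
        end = self.free_space_offset + len(record)
        self.data[self.free_space_offset:end] = record
        # The new slot is appended right after the existing slot array.
        SLOT.pack_into(self.data, HEADER_SIZE + self.num_slots * SLOT.size,
                       self.free_space_offset, len(record))
        self.num_slots += 1
        return self.num_slots - 1

    def get(self, slot_id: int) -> bytes:
        """Follow the slot entry to the record bytes."""
        if not 0 <= slot_id < self.num_slots:
            raise IndexError(f"slot {slot_id} out of range")
        offset, length = SLOT.unpack_from(self.data, HEADER_SIZE + slot_id * SLOT.size)
        return bytes(self.data[offset:offset + length])


page = SlottedPage()
a, b = page.insert(b"alice"), page.insert(b"bob")
print(a, b, page.get(0), page.get(1), page.free_space())  # 0 1 b'alice' b'bob' 4053
```

Callers address a record by slot number rather than by raw byte offset. As a result, a record can later be moved within the page (for example, during compaction) without changing its (page_id, slot) record ID.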

2. Catalog Layer (modakdb/catalog/)

Schema Management

Catalog Structure:
  Database
    ├─ Tables (dict)
    │   └─ TableSchema
    │       ├─ table_name: str
    │       ├─ columns: List[Column]
    │       │   └─ Column
    │       │       ├─ name: str
    │       │       ├─ type: ColumnType
    │       │       ├─ nullable: bool
    │       │       ├─ primary_key: bool
    │       │       └─ default: Any
    │       ├─ indexes: List[Index]
    │       └─ heap_file_id: int
    └─ Indexes (dict)
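The catalog tree maps naturally onto dataclasses. Below is a minimal in-memory sketch that assumes tables are kept in a dict keyed by name. `Catalog`, `create_table`, `get_table` and `column_index` are hypothetical names. To keep the example short, column types are plain strings here instead of `ColumnType`, and indexes are reduced to a list of names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Column:
    name: str
    type: str                        # "INT", "VARCHAR", ... (a ColumnType in the design)
    nullable: bool = True
    primary_key: bool = False
    default: Any = None


@dataclass
class TableSchema:
    table_name: str
    columns: List[Column]
    indexes: List[str] = field(default_factory=list)  # index names only, for brevity
    heap_file_id: int = 0

    def column_index(self, name: str) -> int:
        """Position of a column in each row, used to pull values out of tuples."""
        for i, col in enumerate(self.columns):
            if col.name == name:
                return i
        raise KeyError(f"no column {name!r} in table {self.table_name!r}")


class Catalog:
    def __init__(self) -> None:
        self.tables: Dict[str, TableSchema] = {}
        self._next_heap_file_id = 1

    def create_table(self, name: str, columns: List[Column]) -> TableSchema:
        if name in self.tables:
            raise ValueError(f"table {name!r} already exists")
        if len({c.name for c in columns}) != len(columns):
            raise ValueError("duplicate column name")
        schema = TableSchema(name, columns, heap_file_id=self._next_heap_file_id)
        self._next_heap_file_id += 1     # each table gets its own heap file
        self.tables[name] = schema
        return schema

    def get_table(self, name: str) -> Optional[TableSchema]:
        return self.tables.get(name)


catalog = Catalog()
users = catalog.create_table("users", [
    Column("id", "INT", nullable=False, primary_key=True),
    Column("name", "VARCHAR"),
    Column("age", "INT"),
])
print(users.heap_file_id, users.column_index("age"))  # 1 2
```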

Supported Data Types

from enum import Enum

class ColumnType(Enum):
    INT = "INT"              # 4 bytes
    BIGINT = "BIGINT"        # 8 bytes
    FLOAT = "FLOAT"          # 4 bytes
    DOUBLE = "DOUBLE"        # 8 bytes
    VARCHAR = "VARCHAR"      # Variable (max length specified)
    CHAR = "CHAR"            # Fixed length
    BOOLEAN = "BOOLEAN"      # 1 byte
    DATE = "DATE"            # 4 bytes (days since epoch)
    TIMESTAMP = "TIMESTAMP"  # 8 bytes (microseconds since epoch)
    BLOB = "BLOB"            # Variable length binary
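The byte sizes in the comments suggest a compact binary row format. Below is a sketch of one such encoding using `struct`. It relies on assumptions that the design above does not state: little-endian values, a leading null bitmap (one bit per column), and a 2-byte length prefix for VARCHAR and BLOB. `encode_row`, `decode_row` and `FIXED` are hypothetical names. The enum is repeated here without CHAR, whose width would come from the column definition, so that the block runs on its own.

```python
import struct
from enum import Enum
from typing import Any, Dict, List, Optional, Sequence


class ColumnType(Enum):
    INT = "INT"
    BIGINT = "BIGINT"
    FLOAT = "FLOAT"
    DOUBLE = "DOUBLE"
    VARCHAR = "VARCHAR"
    BOOLEAN = "BOOLEAN"
    DATE = "DATE"
    TIMESTAMP = "TIMESTAMP"
    BLOB = "BLOB"


# struct codes matching the sizes listed above ("<" = little-endian, no padding).
FIXED: Dict[ColumnType, str] = {
    ColumnType.INT: "<i",        # 4 bytes
    ColumnType.BIGINT: "<q",     # 8 bytes
    ColumnType.FLOAT: "<f",      # 4 bytes
    ColumnType.DOUBLE: "<d",     # 8 bytes
    ColumnType.BOOLEAN: "<?",    # 1 byte
    ColumnType.DATE: "<i",       # 4 bytes, days since epoch
    ColumnType.TIMESTAMP: "<q",  # 8 bytes, microseconds since epoch
}


def encode_row(types: Sequence[ColumnType], values: Sequence[Any]) -> bytes:
    bitmap = bytearray((len(types) + 7) // 8)       # bit i set => column i is NULL
    body = bytearray()
    for i, (t, v) in enumerate(zip(types, values)):
        if v is None:
            bitmap[i // 8] |= 1 << (i % 8)
        elif t in FIXED:
            body += struct.pack(FIXED[t], v)
        else:                                       # VARCHAR / BLOB: length prefix + bytes
            raw = v.encode("utf-8") if t is ColumnType.VARCHAR else bytes(v)
            body += struct.pack("<H", len(raw)) + raw
    return bytes(bitmap + body)


def decode_row(types: Sequence[ColumnType], data: bytes) -> List[Optional[Any]]:
    pos = (len(types) + 7) // 8                     # values start after the bitmap
    values: List[Optional[Any]] = []
    for i, t in enumerate(types):
        if (data[i // 8] >> (i % 8)) & 1:
            values.append(None)                     # NULLs take no space in the body
        elif t in FIXED:
            values.append(struct.unpack_from(FIXED[t], data, pos)[0])
            pos += struct.calcsize(FIXED[t])
        else:
            (n,) = struct.unpack_from("<H", data, pos)
            raw = data[pos + 2:pos + 2 + n]
            values.append(raw.decode("utf-8") if t is ColumnType.VARCHAR else bytes(raw))
            pos += 2 + n
    return values


types = [ColumnType.INT, ColumnType.VARCHAR, ColumnType.INT, ColumnType.BOOLEAN]
encoded = encode_row(types, [1, "Alice", None, True])
print(len(encoded), decode_row(types, encoded))  # 13 [1, 'Alice', None, True]
```

The 13 bytes break down as a 1-byte bitmap, 4 bytes for `1`, 2 + 5 bytes for `"Alice"`, nothing for the NULL, and 1 byte for `True`.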

3. Parser Layer (modakdb/parser/)

SQL Grammar (using Lark)

Supported SQL Statements:
  - CREATE TABLE table_name (columns...)
  - DROP TABLE table_name
  - INSERT INTO table_name VALUES (...)
  - SELECT columns FROM table WHERE condition
  - UPDATE table SET col=val WHERE condition
  - DELETE FROM table WHERE condition
  - CREATE INDEX index_name ON table(column)

Expression Support:
  - Arithmetic: +, -, *, /, %
  - Comparison: =, !=, <, >, <=, >=
  - Logical: AND, OR, NOT
  - Functions: COUNT, SUM, AVG, MIN, MAX
  - Aggregations with GROUP BY and HAVING

AST Structure

Abstract Syntax Tree Nodes:
  - Statement (base class)
    ├─ SelectStatement
    │   ├─ columns: List[Expression]
    │   ├─ from_table: str
    │   ├─ where: Expression
    │   ├─ group_by: List[Expression]
    │   ├─ having: Expression
    │   └─ order_by: List[(Expression, ASC/DESC)]
    ├─ InsertStatement
    ├─ UpdateStatement
    ├─ DeleteStatement
    └─ DDLStatement (CREATE/DROP)
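These nodes can be modeled as dataclasses. The sketch below assumes a small expression hierarchy (`ColumnRef`, `Literal`, `BinaryOp`) that the design does not spell out. Those names and the `referenced_columns` helper are hypothetical. The example builds by hand the tree a parser would be expected to produce for `SELECT name, age FROM users WHERE age > 25`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class ColumnRef:
    name: str


@dataclass
class Literal:
    value: Union[int, float, str, bool, None]


@dataclass
class BinaryOp:
    op: str                          # "+", "=", ">", "AND", ...
    left: "Expression"
    right: "Expression"


Expression = Union[ColumnRef, Literal, BinaryOp]


class Statement:
    """Base class for all statement nodes."""


@dataclass
class SelectStatement(Statement):
    columns: List[Expression]
    from_table: str
    where: Optional[Expression] = None
    group_by: List[Expression] = field(default_factory=list)
    having: Optional[Expression] = None
    order_by: List[Tuple[Expression, str]] = field(default_factory=list)  # (expr, "ASC" | "DESC")


def referenced_columns(expr: Optional[Expression]) -> List[str]:
    """Collect column names used in an expression, e.g. to check them against the catalog."""
    if isinstance(expr, ColumnRef):
        return [expr.name]
    if isinstance(expr, BinaryOp):
        return referenced_columns(expr.left) + referenced_columns(expr.right)
    return []


# SELECT name, age FROM users WHERE age > 25
stmt = SelectStatement(
    columns=[ColumnRef("name"), ColumnRef("age")],
    from_table="users",
    where=BinaryOp(">", ColumnRef("age"), Literal(25)),
)
print(referenced_columns(stmt.where))  # ['age']
```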

4. Executor Layer (modakdb/executor/)

Volcano Iterator Model

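from abc import ABC, abstractmethod
from typing import Optional

class Operator(ABC):
    @abstractmethod
    def open(self) -> None: ...             # Initialize operator (and its children)

    @abstractmethod
    def next(self) -> Optional[tuple]: ...  # Return the next tuple, or None when exhausted

    @abstractmethod
    def close(self) -> None: ...            # Clean up resources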

Operator Tree Example:
    ProjectionOp (name, age)
         │
    FilterOp (age > 25)
         │
    SeqScanOp (users table)
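Below is a minimal sketch of the three operators in this tree. It assumes rows are plain dicts and that an in-memory list stands in for the heap file. The constructor parameters and the `run` driver are hypothetical. Each `next()` call pulls a single row up from the child, so no intermediate result is materialized.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterator, List, Optional

Row = Dict[str, Any]


class Operator(ABC):
    @abstractmethod
    def open(self) -> None: ...

    @abstractmethod
    def next(self) -> Optional[Row]: ...   # None signals end of stream

    @abstractmethod
    def close(self) -> None: ...


class SeqScanOp(Operator):
    def __init__(self, table: List[Row]) -> None:
        self.table = table                 # stand-in for a heap file
        self._it: Optional[Iterator[Row]] = None

    def open(self) -> None:
        self._it = iter(self.table)

    def next(self) -> Optional[Row]:
        assert self._it is not None, "open() must be called first"
        return next(self._it, None)

    def close(self) -> None:
        self._it = None


class FilterOp(Operator):
    def __init__(self, child: Operator, predicate: Callable[[Row], bool]) -> None:
        self.child, self.predicate = child, predicate

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[Row]:
        # Keep pulling from the child until a row satisfies the predicate.
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None

    def close(self) -> None:
        self.child.close()


class ProjectionOp(Operator):
    def __init__(self, child: Operator, columns: List[str]) -> None:
        self.child, self.columns = child, columns

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[Row]:
        row = self.child.next()
        return None if row is None else {c: row[c] for c in self.columns}

    def close(self) -> None:
        self.child.close()


def run(plan: Operator) -> List[Row]:
    """Drive a plan from its root: open, pull until None, then close."""
    plan.open()
    try:
        out = []
        while (row := plan.next()) is not None:
            out.append(row)
        return out
    finally:
        plan.close()


users = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}, {"name": "Carol", "age": 41}]
plan = ProjectionOp(FilterOp(SeqScanOp(users), lambda r: r["age"] > 25), ["name", "age"])
print(run(plan))  # [{'name': 'Alice', 'age': 30}, {'name': 'Carol', 'age': 41}]
```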

Execution Operators

Physical Operators:
  • SeqScan: Sequential table scan
  • IndexScan: Index-based lookup
  • Filter: Predicate evaluation
  • Project: Column projection
  • NestedLoopJoin: Nested loop join
  • HashJoin: Hash-based join
  • Sort: External merge sort
  • Aggregate: Grouping and aggregation
  • Limit: Result limiting

5. Buffer Pool (modakdb/buffer/)

LRU Cache Design

Buffer Pool:
  - Fixed-size array of page frames
  - Hash table: page_id → frame_id
  - LRU replacement policy (doubly-linked list)
  - Pin counter for each frame
  - Dirty bit tracking

Operations:
  • FetchPage(page_id) → Page
  • UnpinPage(page_id, is_dirty)
  • FlushPage(page_id)
  • FlushAllPages()
  • NewPage() → Page

Eviction Policy:
  1. Use a free frame if one exists; otherwise pick the least recently
     used frame whose pin_count = 0
  2. If that frame is dirty, write its page to disk
  3. Remove the old page from the page table
  4. Load the new page into the frame
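Below is a compact sketch of the fetch/unpin/evict cycle. It assumes a dict mapping page_id to bytes stands in for the disk manager. It keys frames directly by page_id instead of keeping a separate frame array, and it uses `collections.OrderedDict` as the LRU list in place of a hand-written doubly-linked list (both give O(1) move-to-end and removal). `BufferPool`, `fetch_page`, `unpin_page`, `flush_page` and `flush_all` are hypothetical snake_case versions of the operations listed above; `NewPage` is omitted.

```python
from collections import OrderedDict
from typing import Dict

PAGE_SIZE = 4096


class BufferPool:
    def __init__(self, capacity: int, disk: Dict[int, bytes]) -> None:
        self.capacity = capacity
        self.disk = disk                                   # stand-in for DiskManager
        self.frames: "OrderedDict[int, bytearray]" = OrderedDict()  # LRU order: oldest first
        self.pin_count: Dict[int, int] = {}
        self.dirty: Dict[int, bool] = {}

    def fetch_page(self, page_id: int) -> bytearray:
        if page_id in self.frames:                         # hit: mark most recently used
            self.frames.move_to_end(page_id)
        else:                                              # miss: make room, then read
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = bytearray(self.disk.get(page_id, bytes(PAGE_SIZE)))
            self.pin_count[page_id] = 0
            self.dirty[page_id] = False
        self.pin_count[page_id] += 1
        return self.frames[page_id]

    def unpin_page(self, page_id: int, is_dirty: bool) -> None:
        if self.pin_count[page_id] <= 0:
            raise ValueError(f"page {page_id} is not pinned")
        self.pin_count[page_id] -= 1
        self.dirty[page_id] = self.dirty[page_id] or is_dirty

    def flush_page(self, page_id: int) -> None:
        self.disk[page_id] = bytes(self.frames[page_id])
        self.dirty[page_id] = False

    def flush_all(self) -> None:
        for page_id in self.frames:
            self.flush_page(page_id)

    def _evict(self) -> None:
        # Walk from least to most recently used and take the first unpinned page.
        for page_id in self.frames:
            if self.pin_count[page_id] == 0:
                if self.dirty[page_id]:
                    self.flush_page(page_id)               # write back before dropping
                del self.frames[page_id]                   # safe: we return right away
                del self.pin_count[page_id], self.dirty[page_id]
                return
        raise RuntimeError("all frames are pinned")


disk = {1: b"a" * PAGE_SIZE, 2: b"b" * PAGE_SIZE, 3: b"c" * PAGE_SIZE}
pool = BufferPool(capacity=2, disk=disk)
p1 = pool.fetch_page(1)
p1[0] = ord("X")                       # modify page 1 in memory
pool.unpin_page(1, is_dirty=True)
pool.fetch_page(2)
pool.unpin_page(2, is_dirty=False)
pool.fetch_page(3)                     # pool is full: evicts page 1, writing it back first
print(list(pool.frames), disk[1][:1])  # [2, 3] b'X'
```

A complete implementation would also force the write-ahead log up to the page's LSN before writing a dirty page back. This sketch ignores that step.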

6. Transaction Management (modakdb/transaction/)

ACID Properties

Atomicity:

  • Undo logging for rollback
  • Transaction abort on errors

Consistency:

  • Constraint checking
  • Referential integrity

Isolation:

  • MVCC (Multi-Version Concurrency Control)
  • Snapshot isolation level

Durability:

  • Write-Ahead Logging (WAL)
  • Checkpoint mechanism

MVCC Implementation

Tuple Structure with Versioning:
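  Tuple:
    - data: bytes
    - xmin: Transaction ID (creator)
    - xmax: Transaction ID (deleter, NULL if not deleted)
    - is_deleted: bool

Visibility Rules:
  A tuple is visible to transaction T if:
    1. xmin is T itself, or xmin committed before T.start_timestamp, AND
    2. xmax IS NULL, or xmax is neither T nor a transaction that
       committed before T.start_timestamp
  Comparing transaction IDs alone is not enough: tuples written by aborted
  or still-running transactions must stay invisible, and T must see its own writes.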

Transaction States:
  ACTIVE → COMMITTED
         → ABORTED
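Below is a sketch of a snapshot visibility check. It assumes that snapshot timestamps and commit timestamps come from one monotonically increasing clock, and that a `commit_ts` map records when each committed transaction finished (aborted and running transactions are absent from it). The check looks at commit status rather than comparing raw transaction IDs. As a result, versions written by aborted or still-running transactions stay hidden, and a transaction sees its own writes. `TupleVersion`, `Txn`, `committed_before` and `is_visible` are hypothetical names. `is_deleted` is omitted because `xmax` already carries that information here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TupleVersion:
    data: bytes
    xmin: int                        # id of the creating transaction
    xmax: Optional[int] = None       # id of the deleting transaction, None if not deleted


@dataclass
class Txn:
    txn_id: int
    start_ts: int                    # snapshot timestamp


def committed_before(txn_id: int, ts: int, commit_ts: Dict[int, int]) -> bool:
    """True if txn_id committed strictly before timestamp ts."""
    c = commit_ts.get(txn_id)
    return c is not None and c < ts


def is_visible(v: TupleVersion, t: Txn, commit_ts: Dict[int, int]) -> bool:
    # 1. The creator is T itself, or committed before T's snapshot.
    if v.xmin != t.txn_id and not committed_before(v.xmin, t.start_ts, commit_ts):
        return False
    # 2. No deleter, or the deletion is not part of T's snapshot.
    if v.xmax is None:
        return True
    if v.xmax == t.txn_id:
        return False                 # T deleted this version itself
    return not committed_before(v.xmax, t.start_ts, commit_ts)


commit_ts = {1: 5}                   # txn 1 committed at ts 5; txn 7 is still running
row = TupleVersion(b"alice", xmin=1)
t_old = Txn(txn_id=2, start_ts=4)    # snapshot taken before txn 1 committed
t_new = Txn(txn_id=3, start_ts=6)    # snapshot taken after
print(is_visible(row, t_old, commit_ts), is_visible(row, t_new, commit_ts))  # False True
print(is_visible(TupleVersion(b"bob", xmin=7), t_new, commit_ts))            # False
```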

7. Indexing (modakdb/index/)

B+ Tree Structure

B+ Tree Properties:
  - Order M (max children per node)
  - All records in leaf nodes
  - Leaves linked for range queries
  - Search time: O(log n)

Node Layout:
  Internal Node:
    - Keys: [K1, K2, ..., Kn]
    - Children: [P0, P1, ..., Pn]
    - keys(P0) < K1 ≤ keys(P1) < K2 ≤ ... < Kn ≤ keys(Pn)
  
  Leaf Node:
    - Keys: [K1, K2, ..., Kn]
    - Records: [R1, R2, ..., Rn]
    - Next pointer: Link to sibling
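With this key ordering, choosing the child to descend into amounts to counting the keys that are ≤ the search key, which `bisect.bisect_right` does in O(log M). Below is a minimal in-memory sketch of point lookup and a range scan over the linked leaves. It assumes nodes are Python objects rather than pages. Insertion and node splits are left out, and the class and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Any, Iterator, List, Optional, Tuple, Union


class Leaf:
    def __init__(self, keys: List[Any], records: List[Any]) -> None:
        self.keys, self.records = keys, records      # sorted keys, parallel records
        self.next: Optional["Leaf"] = None           # sibling link for range scans


class Internal:
    def __init__(self, keys: List[Any], children: List["Node"]) -> None:
        assert len(children) == len(keys) + 1
        self.keys, self.children = keys, children


Node = Union[Leaf, Internal]


def find_leaf(node: Node, key: Any) -> Leaf:
    # keys(P0) < K1 <= keys(P1) < ...: the child index is the number of keys <= key.
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root: Node, key: Any) -> Optional[Any]:
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    return leaf.records[i] if i < len(leaf.keys) and leaf.keys[i] == key else None


def range_scan(root: Node, lo: Any, hi: Any) -> Iterator[Tuple[Any, Any]]:
    """Yield (key, record) pairs with lo <= key <= hi by walking the leaf chain."""
    first = find_leaf(root, lo)
    leaf: Optional[Leaf] = first
    i = bisect_left(first.keys, lo)
    while leaf is not None:
        for key, record in zip(leaf.keys[i:], leaf.records[i:]):
            if key > hi:
                return
            yield key, record
        leaf, i = leaf.next, 0


# Three leaves under one root with separator keys 20 and 30.
l1 = Leaf([5, 10], ["r5", "r10"])
l2 = Leaf([20, 25], ["r20", "r25"])
l3 = Leaf([30, 40], ["r30", "r40"])
l1.next, l2.next = l2, l3
root = Internal([20, 30], [l1, l2, l3])
print(search(root, 25), search(root, 26))       # r25 None
print([k for k, _ in range_scan(root, 8, 30)])  # [10, 20, 25, 30]
```

A key equal to a separator (for example 20) lands in the right-hand child. That matches the `Ki ≤ keys(Pi)` side of the ordering.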

8. Recovery (modakdb/recovery/)

Write-Ahead Logging (WAL)

Log Record Types:
  - BEGIN: Transaction start
  - COMMIT: Transaction commit
  - ABORT: Transaction rollback
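  - UPDATE: Tuple modification (before/after images)
  - INSERT: Tuple insertion
  - DELETE: Tuple deletion
  - CHECKPOINT: System checkpoint

Recovery Algorithm (ARIES):
  1. Analysis Pass: Scan the log from the last checkpoint to rebuild the
     dirty page table and the set of transactions active at the crash
  2. Redo Pass: Repeat history by replaying logged updates, skipping any
     page whose pageLSN shows the change is already applied (this makes
     redo idempotent)
  3. Undo Pass: Roll back uncommitted transactions, writing compensation
     log records (CLRs)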
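Below is a heavily simplified sketch of the three passes. It assumes an in-memory log whose LSNs are list positions, updates that each touch one key on one page, and no checkpoints. Unlike real ARIES, analysis starts at the beginning of the log and undo writes no compensation log records (CLRs); an aborted transaction is simply treated as a loser and undone again. `LogRecord`, `Page` and `recover` are hypothetical names. The pageLSN comparison in the redo pass is what makes replay idempotent.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set


@dataclass
class LogRecord:
    kind: str                        # "BEGIN", "UPDATE", "COMMIT" or "ABORT"
    txn: int
    page_id: Optional[int] = None
    key: Optional[str] = None
    before: Any = None               # before-image (None = key did not exist), for undo
    after: Any = None                # after-image, for redo


@dataclass
class Page:
    page_lsn: int = -1               # LSN of the last logged update applied to this page
    data: Dict[str, Any] = field(default_factory=dict)


def recover(log: List[LogRecord], pages: Dict[int, Page]) -> Set[int]:
    """Restore pages after a crash; return the ids of the loser transactions."""
    # 1. Analysis: transactions without COMMIT are losers (ABORT included,
    #    since no CLRs record their rollback). Track each page's recLSN.
    losers: Set[int] = set()
    rec_lsn: Dict[int, int] = {}
    for lsn, rec in enumerate(log):
        if rec.kind == "BEGIN":
            losers.add(rec.txn)
        elif rec.kind == "COMMIT":
            losers.discard(rec.txn)
        elif rec.kind == "UPDATE":
            rec_lsn.setdefault(rec.page_id, lsn)   # first LSN that dirtied the page

    # 2. Redo: repeat history for every update, committed or not, starting at
    #    the smallest recLSN. Skip updates the page already reflects.
    for lsn in range(min(rec_lsn.values(), default=len(log)), len(log)):
        rec = log[lsn]
        if rec.kind != "UPDATE":
            continue
        page = pages.setdefault(rec.page_id, Page())
        if page.page_lsn < lsn:
            page.data[rec.key] = rec.after
            page.page_lsn = lsn

    # 3. Undo: roll back losers' updates newest-first using before-images.
    for rec in reversed(log):
        if rec.kind == "UPDATE" and rec.txn in losers:
            data = pages[rec.page_id].data
            if rec.before is None:
                data.pop(rec.key, None)    # the update was an insert
            else:
                data[rec.key] = rec.before
    return losers


log = [
    LogRecord("BEGIN", 1),
    LogRecord("UPDATE", 1, page_id=1, key="alice", before=None, after=30),
    LogRecord("COMMIT", 1),
    LogRecord("BEGIN", 2),
    LogRecord("UPDATE", 2, page_id=1, key="bob", before=None, after=25),
]                                    # crash here: txn 2 never committed
pages = {1: Page()}                  # nothing reached disk before the crash
print(recover(log, pages), pages[1].data)  # {2} {'alice': 30}
```

Calling `recover` again on the result leaves the pages unchanged.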

🗺️ Implementation Roadmap

Phase 1: Foundation (Weeks 1-2)

□ Storage Layer
  □ Page structure and layout
  □ Disk manager (file I/O)
  □ Heap file implementation
  □ Slotted page management

□ Catalog Layer
  □ Data type definitions
  □ Table schema management
  □ Column definitions
  □ Catalog persistence

Phase 2: Query Processing (Weeks 3-4)

□ Parser Layer
  □ SQL grammar definition (Lark)
  □ Lexer and parser
  □ AST construction
  □ Statement validation

□ Executor Layer
  □ Operator interface
  □ Sequential scan
  □ Filter operator
  □ Projection operator
  □ Simple DML execution

Phase 3: Buffer Management (Week 5)

□ Buffer Pool Manager
  □ Page frame allocation
  □ LRU replacement policy
  □ Pin/unpin mechanism
  □ Dirty page tracking
  □ Flush policies

Phase 4: Indexing (Weeks 6-7)

□ B+ Tree Index
  □ Node structure
  □ Insert operation
  □ Search operation
  □ Delete operation
  □ Range scan
  □ Index scan operator

Phase 5: Transactions (Weeks 8-9)

□ Transaction Management
  □ Transaction context
  □ Begin/Commit/Abort
  □ Undo logging
  □ MVCC implementation

□ Concurrency Control
  □ Lock manager
  □ Two-phase locking
  □ Deadlock detection

Phase 6: Recovery (Week 10)

□ Write-Ahead Logging
  □ Log manager
  □ Log record formats
  □ Log buffering
  □ Checkpoint mechanism

□ Recovery
  □ ARIES algorithm
  □ Crash recovery
  □ Transaction rollback

Phase 7: Advanced Features (Weeks 11-12)

□ Query Optimization
  □ Cost model
  □ Join ordering
  □ Index selection

□ Advanced Operators
  □ Hash join
  □ Sort-merge join
  □ Aggregation
  □ GROUP BY / HAVING

□ Performance
  □ Query caching
  □ Statistics collection
  □ Profiling tools

🧪 Testing Strategy

Unit Tests Structure

tests/unit/
  ├── test_storage.py
  │   ├── TestPage
  │   ├── TestSlottedPage
  │   ├── TestHeapFile
  │   └── TestDiskManager
  │
  ├── test_catalog.py
  │   ├── TestColumnType
  │   ├── TestColumn
  │   ├── TestTableSchema
  │   └── TestCatalog
  │
  ├── test_parser.py
  │   ├── TestLexer
  │   ├── TestParser
  │   ├── TestSelectStatement
  │   └── TestDDLStatements
  │
  ├── test_executor.py
  │   ├── TestSeqScan
  │   ├── TestFilter
  │   ├── TestProjection
  │   └── TestJoinOperators
  │
  ├── test_buffer.py
  │   ├── TestBufferPool
  │   ├── TestLRUReplacer
  │   └── TestPageEviction
  │
  └── test_transaction.py
      ├── TestTransactionManager
      ├── TestMVCC
      └── TestLockManager

Integration Tests Structure

tests/integration/
  ├── test_end_to_end.py
  │   ├── test_create_table_and_insert
  │   ├── test_query_execution
  │   ├── test_updates_and_deletes
  │   └── test_complex_queries
  │
  ├── test_transactions.py
  │   ├── test_acid_properties
  │   ├── test_concurrent_transactions
  │   ├── test_deadlock_detection
  │   └── test_rollback
  │
  ├── test_recovery.py
  │   ├── test_crash_recovery
  │   ├── test_wal_replay
  │   └── test_checkpoint
  │
  └── test_performance.py
      ├── benchmark_insert
      ├── benchmark_scan
      └── benchmark_index_lookup

Test Coverage Goals

  • Unit Tests: 90%+ coverage
  • Integration Tests: All major workflows
  • Performance Tests: Regression detection
  • Edge Cases: Null values, boundary conditions, concurrent access

💻 Development Guidelines

Project Structure

modakdb/
├── __init__.py                 # Package initialization
├── api/                        # Public API
│   ├── __init__.py
│   ├── database.py            # Database class (main entry)
│   └── exceptions.py          # Custom exceptions
├── storage/                    # Storage layer
│   ├── __init__.py
│   ├── disk_manager.py        # File I/O operations
│   ├── page.py                # Page structure
│   ├── heap_file.py           # Heap file management
│   └── slotted_page.py        # Slotted page layout
├── catalog/                    # Schema management
│   ├── __init__.py
│   ├── types.py               # Data types
│   ├── schema.py              # Table schemas
│   └── catalog.py             # System catalog
├── parser/                     # SQL parsing
│   ├── __init__.py
│   ├── grammar.lark           # SQL grammar
│   ├── parser.py              # Parser implementation
│   └── ast_nodes.py           # AST definitions
├── planner/                    # Query planning
│   ├── __init__.py
│   ├── planner.py             # Query planner
│   └── optimizer.py           # Query optimizer
├── executor/                   # Query execution
│   ├── __init__.py
│   ├── operator.py            # Base operator class
│   ├── scan.py                # Scan operators
│   ├── filter.py              # Filter operator
│   ├── projection.py          # Projection operator
│   ├── join.py                # Join operators
│   ├── aggregate.py           # Aggregation
│   └── sort.py                # Sort operator
├── buffer/                     # Buffer pool
│   ├── __init__.py
│   ├── buffer_pool.py         # Buffer pool manager
│   └── replacer.py            # LRU replacer
├── transaction/                # Transactions
│   ├── __init__.py
│   ├── transaction.py         # Transaction context
│   ├── lock_manager.py        # Lock management
│   └── mvcc.py                # MVCC implementation
├── recovery/                   # Recovery
│   ├── __init__.py
│   ├── log_manager.py         # WAL manager
│   ├── log_record.py          # Log records
│   └── recovery.py            # Recovery algorithm
└── index/                      # Indexing
    ├── __init__.py
    ├── index.py               # Base index interface
    └── btree.py               # B+ tree implementation

Code Style

from dataclasses import dataclass
from enum import Enum
from typing import Any, List

# Use type hints
def insert_tuple(self, table_name: str, values: List[Any]) -> int:
    """Insert a tuple into a table.

    Args:
        table_name: Name of the table
        values: List of column values

    Returns:
        Row ID of inserted tuple

    Raises:
        TableNotFoundError: If table doesn't exist
    """
    pass

# Use dataclasses for structured data
@dataclass
class Page:
    page_id: int
    data: bytearray
    is_dirty: bool = False
    pin_count: int = 0

# Use enums for constants
class PageType(Enum):
    HEAP_PAGE = 1
    INDEX_PAGE = 2
    META_PAGE = 3

Error Handling

# Custom exception hierarchy
class ModakDBError(Exception):
    """Base exception for all ModakDB errors"""
    pass

class StorageError(ModakDBError):
    """Storage layer errors"""
    pass

class ParseError(ModakDBError):
    """SQL parsing errors"""
    pass

class ExecutionError(ModakDBError):
    """Query execution errors"""
    pass

Example Usage

from modakdb import Database

# Create database
db = Database("mydb.mdb")

# Create table
db.execute("""
    CREATE TABLE users (
        id INT PRIMARY KEY,
        name VARCHAR(100),
        age INT,
        email VARCHAR(255)
    )
""")

# Insert data
db.execute("INSERT INTO users VALUES (1, 'Alice', 30, 'alice@example.com')")
db.execute("INSERT INTO users VALUES (2, 'Bob', 25, 'bob@example.com')")

# Query data
result = db.execute("SELECT name, age FROM users WHERE age > 25")
for row in result:
    print(f"{row['name']}: {row['age']}")

# Transactions
with db.transaction() as txn:
    txn.execute("UPDATE users SET age = 31 WHERE name = 'Alice'")
    txn.execute("DELETE FROM users WHERE name = 'Bob'")
    # Automatically commits if no exception

db.close()

📚 Learning Resources

Recommended Reading

  1. Database System Concepts - Silberschatz, Korth, Sudarshan
  2. Database Management Systems - Ramakrishnan, Gehrke
  3. Architecture of a Database System - Hellerstein, Stonebraker, Hamilton
  4. The Internals of PostgreSQL - Hironobu Suzuki (online)

Key Papers

  • ARIES Recovery Algorithm (Mohan et al.)
  • The Ubiquitous B-Tree (Comer)
  • MVCC in PostgreSQL (PostgreSQL Documentation)

🚀 Quick Start

# Clone repository
git clone https://github.com/rohitwtbs/modakdb.git
cd modakdb

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run with coverage
pytest --cov=modakdb --cov-report=html tests/

# Type checking
mypy modakdb/

# Linting
ruff check modakdb/

📝 License

MIT License - See LICENSE for details


🤝 Contributing

This is an educational project, but contributions are welcome! Please:

  1. Write tests for new features
  2. Follow the existing code style
  3. Update documentation
  4. Add type hints

Happy Database Building! 🚀

"The best way to understand databases is to build one."

About

An RDBMS made for learning purposes.
