

"What I cannot create, I do not understand." - Richard Feynman

In the spirit of Feynman's immortal words, the goal of this project is to better understand the internals of databases by implementing a relational database management system (RDBMS), specifically a SQLite clone, from scratch.

This project was motivated by a desire to: 1) understand databases more deeply and 2) work on a fun project. These dual goals led to a:

  • relatively simple code base
  • relatively complete RDBMS implementation
  • written in pure python
    • No build step
  • zero configuration
    • configuration can be overridden

This makes the learndb codebase great for tinkering. However, the product has some key limitations that mean it shouldn't be used as an actual storage solution.


Learndb supports the following:

  • a rich SQL dialect (learndb-sql) with support for select, from, where, group by, having, limit, and order by
  • custom lexer and parser built using lark
  • at a high level, an engine accepts SQL statements. These statements express operations on a database (a collection of tables which contain data)
  • allows users/agents to connect to the RDBMS in multiple ways:
    • REPL
    • importing python module
    • passing a file of commands to the engine
  • on-disk btree implementation as backing data structure
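To make the "on-disk btree" bullet more concrete, here is a minimal sketch of how a btree leaf node might be packed into a fixed-size page and searched with binary search. The page size, header fields, and fixed 16-byte cell format below are hypothetical assumptions for illustration, not learndb's actual on-disk format; a real btree also needs internal nodes, variable-length records, and node splits.

```python
import bisect
import struct

PAGE_SIZE = 4096  # assumed page size; learndb's actual value may differ
# Hypothetical leaf header: node type (1 byte), is_root flag (1 byte), cell count (4 bytes)
HEADER = struct.Struct("<BBI")
# Hypothetical fixed-size cell: 4-byte signed integer key + 12-byte value slot
CELL = struct.Struct("<i12s")
LEAF_NODE_TYPE = 1
MAX_CELLS = (PAGE_SIZE - HEADER.size) // CELL.size


def serialize_leaf(cells, is_root=False):
    """Pack (key, value) cells, already sorted by key, into one fixed-size page."""
    if len(cells) > MAX_CELLS:
        raise ValueError("leaf overflow; a real btree would split the node here")
    page = bytearray(PAGE_SIZE)
    HEADER.pack_into(page, 0, LEAF_NODE_TYPE, int(is_root), len(cells))
    for i, (key, value) in enumerate(cells):
        # struct pads (or truncates) the bytes value to exactly 12 bytes
        CELL.pack_into(page, HEADER.size + i * CELL.size, key, value)
    return bytes(page)


def deserialize_leaf(page):
    """Unpack a page produced by serialize_leaf back into (key, value) cells."""
    node_type, _is_root, count = HEADER.unpack_from(page, 0)
    if node_type != LEAF_NODE_TYPE:
        raise ValueError("not a leaf page")
    cells = []
    for i in range(count):
        key, raw = CELL.unpack_from(page, HEADER.size + i * CELL.size)
        cells.append((key, raw.rstrip(b"\x00")))  # drop the null padding
    return cells


def find(page, key):
    """Binary-search the sorted keys of a leaf page; return the value or None."""
    cells = deserialize_leaf(page)
    keys = [k for k, _ in cells]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return cells[i][1]
    return None


page = serialize_leaf([(1, b"alice"), (5, b"bob"), (9, b"carol")], is_root=True)
print(find(page, 5))  # b'bob'
print(find(page, 4))  # None
```

Fixed-size pages let a storage layer read or write any node with a single seek to `page_number * PAGE_SIZE`, and keeping cells sorted by key is what makes the binary search valid.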


Learndb has the following key limitations:

  • Very simplified implementation of floating point number arithmetic (e.g. compared to IEEE 754) [1].
  • No support for common utility features, like wildcard column expansion, e.g. select * ...
  • More limitations

Getting Started: Tinkering and Beyond

  • To get started with learndb, first read …
  • Then, to understand the system at a deeper technical level, read … This is essentially a complete reference manual directed at users of the system. It outlines the operations and capabilities of the system, and describes what is (un)supported and what is undefined behavior.
  • … - this provides a component-level breakdown of the repo and the system


Installation

  • System requirements
    • requires a Linux/macOS system, since it uses fcntl to get exclusive read access to the database file
    • python >= 3.9
  • To install for development, i.e. so that the source can be edited without having to reinstall:
    • cd <repo_root>
    • create virtualenv: python3 -m venv venv
    • activate venv: source venv/bin/activate
    • install requirements: python -m pip install -r requirements.txt
    • install Learndb in edit mode: python3 -m pip install -e .
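Python's `fcntl` module is only available on Unix-like systems, which is why the requirement above rules out Windows. The sketch below shows one way to take an exclusive, non-blocking lock on a database file with `fcntl.flock`. Whether learndb uses `flock` or `fcntl.lockf`, and the helper names `open_exclusive`/`close_exclusive`, are assumptions for illustration.

```python
import fcntl
import os
import tempfile


def open_exclusive(path):
    """Open (creating if needed) a database file and take an exclusive lock.

    LOCK_NB makes the call fail immediately with OSError (typically
    BlockingIOError) instead of waiting if another open file description
    already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)  # don't leak the descriptor if locking fails
        raise
    return fd


def close_exclusive(fd):
    """Release the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "example.db")  # hypothetical file name
    fd = open_exclusive(db_path)
    try:
        open_exclusive(db_path)
    except OSError:
        print("second open refused while the lock is held")
    close_exclusive(fd)
```

Because `flock` locks belong to the open file description, a second `open_exclusive` on the same path is refused even within the same process, so a second engine instance cannot quietly use a file that is already open.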

Run REPL

source venv/bin/activate
python repl

Run Tests

  • Run all tests:

  • python -m pytest tests/*.py

  • Run btree tests:
    • python -m pytest -s tests/ # stdout shown
    • python -m pytest tests/ # stdout suppressed

  • Run end-to-end tests: python -m pytest -s tests/

  • Run end-to-end tests (employees): python -m pytest -s tests/
    • e.g. run a single test: python -m pytest -s tests/ -k test_equality_select

  • Run serde tests: ...

  • Run language parser tests: ...

  • Run specific test: python -m pytest -k test_name

  • Clear pytest cache: python -m pytest --cache-clear


Project Management

  • imminent work/issues are tracked in
  • long-term ideas are tracked in docs/


  1. When comparing two floats, e.g. 3.2 > 4.2, I consider them equal if the difference between the two is within some fixed delta. Ideally, the accepted epsilon should scale with the magnitude of the numbers.
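The footnote's point can be sketched in Python: a fixed delta treats tiny values as equal even when one is twice the other, and rejects large values that differ only in their last few bits, whereas `math.isclose` scales the tolerance with the operands. The tolerance values and the function names `equal_fixed`, `equal_scaled`, and `greater_than` are hypothetical, not learndb's.

```python
import math

FIXED_DELTA = 1e-9  # hypothetical fixed tolerance, not learndb's actual value


def equal_fixed(a, b, delta=FIXED_DELTA):
    """Fixed-delta equality: ignores the magnitude of the operands."""
    return abs(a - b) <= delta


def equal_scaled(a, b, rel_tol=1e-9, abs_tol=1e-15):
    """Equality whose tolerance scales with magnitude, with a small floor near zero."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


def greater_than(a, b, rel_tol=1e-9, abs_tol=1e-15):
    """Treat a > b as true only if a exceeds b by more than the scaled tolerance."""
    return a > b and not equal_scaled(a, b, rel_tol, abs_tol)


print((0.1 + 0.2) > 0.3)               # True: raw float comparison
print(greater_than(0.1 + 0.2, 0.3))    # False: within the scaled tolerance
print(equal_fixed(1e-12, 2e-12))       # True: fixed delta is too coarse here
print(equal_scaled(1e-12, 2e-12))      # False
print(equal_fixed(1e10, 1e10 + 1e-5))  # False: fixed delta is too strict here
print(equal_scaled(1e10, 1e10 + 1e-5)) # True
```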


Learn database internals by implementing it from scratch.