Stratum

A log-structured, hierarchical storage engine with O(1) lookup, built in C++ and exposed over gRPC — so any language can use it as a persistent backend.

Stratum stores structured data in a hierarchy of three linked tiers: a document (any HTML/text blob), a chain of input nodes attached to that document, and a chain of output nodes attached to each input. All writes are append-only to log-structured segment files on disk. An in-memory hash index gives O(1) reads. A background compaction thread merges and garbage-collects old segments automatically.

Why Stratum?

Most databases make you pick between:

A full relational DB (too heavy, forces a schema upfront)
A key-value store (too flat, you have to model relationships yourself)
A document store (good for the top level, awkward for linked chains of sub-documents)

Stratum is purpose-built for the pattern: one document → many ordered inputs → one output per input, where both inputs and outputs can be any data type (integer, float, string, list, map). It handles durability, compaction, and concurrency internally, and speaks gRPC so your Python, Go, or Node backend calls it like a local function.

Architecture

Your application (Python / Go / Node / …)
        │
        │  gRPC (any language, auto-generated client)
        ▼
  Stratum server  ─────────────────────────────────────
  │                                                    │
  │  In-memory hash index                              │
  │  unordered_map<doc_id → {byte_offset, meta}>       │
  │  O(1) lookup — only id + offset loaded into RAM    │
  │                                                    │
  │  Segment manager (append-only log)                 │
  │  active.seg  seg_002.seg  seg_001.seg  merged.seg  │
  │                                                    │
  │  Compactor thread (background)                     │
  │  Merges segments, GCs tombstoned records           │
  ──────────────────────────────────────────────────────
        │
        ▼
  Disk  (log-structured segment files)
  Document blobs · Input linked list · Output linked list

On-disk data model

Each document is stored as a blob at a known byte offset. The document record contains a pointer to the first node in the input chain. Each input node stores its value (any supported type) and two pointers — one to the next input node, one to its corresponding output node.

Document ──→ Input node 1 ──→ Input node 2 ──→ Input node 3 ──→ …
                  │                 │                 │
                  ▼                 ▼                 ▼
             Output node 1    Output node 2    Output node 3
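
The pointer layout in the diagram can be modelled in a few lines of Python. This is a minimal sketch, not the engine's code: the class names, the `NULL` sentinel, and the dict standing in for a segment file are all assumptions made for illustration. The real structs live in include/lse/types.h.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, Optional, Tuple

NULL = -1  # hypothetical "no pointer" sentinel


@dataclass
class OutputNode:
    value: Any


@dataclass
class InputNode:
    value: Any
    next_input: int = NULL  # offset of the next input node, or NULL
    output: int = NULL      # offset of the corresponding output node, or NULL


@dataclass
class Document:
    blob: str
    first_input: int = NULL  # offset of the first input node, or NULL


def walk_chain(store: Dict[int, Any], doc_offset: int) -> Iterator[Tuple[Any, Optional[Any]]]:
    """Yield (input value, output value or None) pairs in chain order."""
    offset = store[doc_offset].first_input
    while offset != NULL:
        node = store[offset]
        out = store[node.output].value if node.output != NULL else None
        yield node.value, out
        offset = node.next_input


# A dict keyed by offset stands in for the segment file.
store = {
    0: Document("<h1>Two Sum</h1>", first_input=10),
    10: InputNode([2, 7, 11, 15], next_input=20, output=30),
    20: InputNode("no output yet"),
    30: OutputNode([0, 1]),
}
pairs = list(walk_chain(store, 0))
```

Walking the chain costs one lookup per input node plus one per attached output, which is why fetching all inputs for a document is a linked-list traversal rather than a single read.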

All writes (including updates) are appended to the end of the active segment. Old versions of records become garbage and are reclaimed by the compactor. Deletes write a tombstone record; the original data is never mutated.
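
To make the append-only behaviour concrete, here is a small Python sketch of a log with an in-memory offset index. Everything in it is an assumption for illustration: the `AppendOnlyLog` class, the JSON payload, and the length + CRC-32 header layout are not Stratum's actual on-disk format, which is defined in include/lse/types.h and src/serialisation.cpp.

```python
import json
import os
import struct
import tempfile
import zlib

HEADER = struct.Struct("<II")  # assumed layout: payload length, CRC-32 of payload


class AppendOnlyLog:
    def __init__(self, path):
        self.path = path
        self.index = {}            # record id -> byte offset of the latest version
        open(path, "ab").close()   # create the file if it does not exist

    def _append(self, record):
        payload = json.dumps(record).encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # records only ever go at the end
            f.write(HEADER.pack(len(payload), zlib.crc32(payload)))
            f.write(payload)
        return offset

    def put(self, record_id, value):
        # Creates and updates are the same operation: append, then repoint the index.
        self.index[record_id] = self._append({"id": record_id, "value": value})

    def delete(self, record_id):
        # A delete appends a tombstone; earlier bytes are left untouched.
        self._append({"id": record_id, "tombstone": True})
        self.index.pop(record_id, None)

    def get(self, record_id):
        offset = self.index.get(record_id)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            length, crc = HEADER.unpack(f.read(HEADER.size))
            payload = f.read(length)
        if zlib.crc32(payload) != crc:
            raise ValueError("corrupt record")
        return json.loads(payload)["value"]


path = os.path.join(tempfile.mkdtemp(), "active.seg")
log = AppendOnlyLog(path)
log.put("doc-1", "<h1>v1</h1>")
size_after_v1 = os.path.getsize(path)
log.put("doc-1", "<h1>v2</h1>")   # update: v1 stays on disk as garbage
latest = log.get("doc-1")
log.delete("doc-1")               # delete: appends a tombstone
after_delete = log.get("doc-1")
grew = os.path.getsize(path) > size_after_v1
```

Note that the file keeps growing through the update and the delete. Reclaiming that space is the compactor's job.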

Supported value types

Both input nodes and output nodes can hold any of:

| Type | Example |
| --- | --- |
| `int64` | `42` |
| `double` | `3.14` |
| `string` | `"hello world"` |
| `list[int]` | `[2, 7, 11, 15]` |
| `list[string]` | `["foo", "bar"]` |
| `map[string, string]` | `{"key": "value"}` |
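
If you want to validate values on the client side before sending them, a check along these lines may help. It is a sketch only: `value_type` is a hypothetical helper, and how the real Python client maps values to these types (including how it treats `bool` or empty lists) is not documented here, so those choices are assumptions.

```python
def value_type(value):
    """Return the type name from the table above for a Python value."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; rejected here as an assumption.
        raise TypeError("bool is not in the supported type list")
    if isinstance(value, int):
        if not -(2**63) <= value < 2**63:
            raise OverflowError("value does not fit in int64")
        return "int64"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # An empty list is ambiguous; this sketch reports it as list[int].
        if all(isinstance(v, int) and not isinstance(v, bool) for v in value):
            return "list[int]"
        if all(isinstance(v, str) for v in value):
            return "list[string]"
        raise TypeError("lists must be all ints or all strings")
    if isinstance(value, dict):
        if all(isinstance(k, str) and isinstance(v, str) for k, v in value.items()):
            return "map[string, string]"
        raise TypeError("maps must have string keys and string values")
    raise TypeError(f"unsupported type: {type(value).__name__}")
```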

File structure

stratum/
├── CMakeLists.txt
├── Dockerfile
├── docker-compose.yml
├── proto/
│   └── lse.proto               ← gRPC contract (edit this to add RPCs)
├── include/lse/
│   ├── types.h                 ← on-disk structs, ValueVariant
│   ├── serialisation.h         ← CRC-32, encode/decode
│   ├── segment.h               ← append-only segment file
│   ├── compactor.h             ← background GC thread
│   └── storage_engine.h        ← public C++ API
├── src/
│   ├── serialisation.cpp
│   ├── segment.cpp
│   ├── compactor.cpp
│   ├── storage_engine.cpp
│   └── grpc/
│       ├── lse_service.h       ← gRPC service declaration
│       ├── lse_service.cpp     ← every RPC wired to StorageEngine
│       └── server_main.cpp     ← binary entry point
└── client/
    └── python/
        ├── lse_client.py       ← Python wrapper (import this)
        ├── lse_pb2.py          ← auto-generated stubs
        ├── lse_pb2_grpc.py     ← auto-generated stubs
        └── requirements.txt

Getting started

Prerequisites

# C++ build tools + gRPC
sudo apt install -y \
  build-essential cmake pkg-config \
  libgrpc++-dev libprotobuf-dev \
  protobuf-compiler protobuf-compiler-grpc

# Python client
pip install grpcio grpcio-tools

Option A — Run with Docker (recommended)

git clone https://github.com/yourname/stratum
cd stratum
docker-compose up --build

The server starts on port 50051. Data persists in a named Docker volume across restarts. To stop: docker-compose down. To wipe data entirely: docker-compose down -v.

Option B — Build and run locally

git clone https://github.com/yourname/stratum
cd stratum

# 1. Generate gRPC C++ code from the proto file
protoc --proto_path=proto \
       --cpp_out=src/grpc \
       --grpc_out=src/grpc \
       --plugin=protoc-gen-grpc=/usr/bin/grpc_cpp_plugin \
       proto/lse.proto

# 2. Build
mkdir build && cd build
cmake .. && make -j$(nproc)

# 3. Start the server
./lse_server --port=50051 --data-dir=/path/to/your/data
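
Whichever option you pick, a script that starts the server and then connects may want to wait until the port is accepting connections. The helper below is a stdlib-only sketch. `wait_for_port` is a hypothetical name, and it only confirms that something accepts TCP connections on the port, not that the gRPC service itself is healthy.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to host:port succeeds, False after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (assumes the server was started on the default port):
# ready = wait_for_port("localhost", 50051)
```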

Usage — Python client

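Copy client/python/lse_client.py, lse_pb2.py, and lse_pb2_grpc.py into your project. The client's method names say problem, test case, and expected output where this README says document, input node, and output node: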

from lse_client import LseClient

lse = LseClient("localhost:50051")

# ── Create a document ─────────────────────────────────────────────────────────
doc_id = lse.create_problem(
    "<h1>My Document</h1><p>Any HTML or text content.</p>",
    name="Example",
    category="tutorial",
    version="1.0"
)

# ── Read it back ──────────────────────────────────────────────────────────────
content = lse.get_problem_html(doc_id)
meta    = lse.get_problem_meta(doc_id)   # {"name": "Example", "category": "tutorial", …}

# ── Update content (old version becomes garbage — GC handles it) ──────────────
lse.update_problem_html(doc_id, "<h1>Updated content</h1>")
lse.update_problem_column(doc_id, "version", "1.1")

# ── Add input nodes (any data type) ──────────────────────────────────────────
node1 = lse.add_test_case(doc_id, [2, 7, 11, 15])      # list of ints
node2 = lse.add_test_case(doc_id, "some string input")  # string
node3 = lse.add_test_case(doc_id, {"key": "value"})     # map

# ── Attach output nodes ───────────────────────────────────────────────────────
lse.set_expected_output(doc_id, node1, [0, 1])          # list answer
lse.set_expected_output(doc_id, node2, "output string") # string answer

# ── Read all input nodes for a document ──────────────────────────────────────
nodes = lse.get_all_test_cases(doc_id)
for n in nodes:
    print(n["tc_id"], n["value"])

# ── Read a specific output node ───────────────────────────────────────────────
output = lse.get_expected_output(doc_id, node1)
print(output["value"])   # [0, 1]

# ── Update a node value ───────────────────────────────────────────────────────
lse.update_test_case(doc_id, node1, [3, 2, 4])
lse.update_expected_output(doc_id, node1, [1, 2])

# ── Delete ────────────────────────────────────────────────────────────────────
lse.delete_expected_output(doc_id, node2)
lse.delete_test_case(doc_id, node3)
lse.delete_problem(doc_id)

# ── Admin ─────────────────────────────────────────────────────────────────────
stats = lse.get_stats()
# {"problems_in_index": 12, "active_segment_bytes": 4096, "segment_count": 3}

lse.flush()        # force active segment to disk
lse.compact_now()  # trigger immediate compaction (normally runs automatically)
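
A common pattern is loading many input/output pairs at once. The helper below is a sketch built only on the `add_test_case` and `set_expected_output` calls shown above. `load_cases` and `InMemoryClient` are hypothetical names, and the stand-in client exists only so the example runs without a server. The helper is not atomic: if a call fails partway through, nodes added earlier remain.

```python
def load_cases(client, doc_id, cases):
    """Add each (input, expected_output) pair to a document; return the new node ids.

    Pass None as the expected output to add an input node with no output yet.
    """
    node_ids = []
    for input_value, output_value in cases:
        node_id = client.add_test_case(doc_id, input_value)
        if output_value is not None:
            client.set_expected_output(doc_id, node_id, output_value)
        node_ids.append(node_id)
    return node_ids


class InMemoryClient:
    """Hypothetical stand-in for LseClient, for running the helper offline."""

    def __init__(self):
        self.inputs = {}
        self.outputs = {}

    def add_test_case(self, doc_id, value):
        node_id = len(self.inputs) + 1
        self.inputs[node_id] = (doc_id, value)
        return node_id

    def set_expected_output(self, doc_id, node_id, value):
        self.outputs[node_id] = value


client = InMemoryClient()
ids = load_cases(client, 1, [([2, 7, 11, 15], [0, 1]), ("pending", None)])
```

With a running server you would pass an `LseClient` instance and a real `doc_id` instead of the stand-in.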

gRPC API reference

All operations are defined in proto/lse.proto. The full service:

Documents

| RPC | Description |
| --- | --- |
| `CreateProblem` | Create a document with optional metadata columns |
| `GetProblemHtml` | Fetch the document blob (O(1) index lookup + 1 disk seek) |
| `GetProblemMeta` | Fetch metadata only; no disk seek for the blob |
| `UpdateProblemHtml` | Append new version; old becomes garbage |
| `UpdateProblemColumn` | Update or add a metadata column |
| `DeleteProblem` | Tombstone a document |
| `ListProblems` | Return all live document IDs |

Input nodes

| RPC | Description |
| --- | --- |
| `AddTestCase` | Append a new input node to a document's chain |
| `GetTestCase` | Fetch a single input node (O(1) cache hit) |
| `GetAllTestCases` | Walk the full linked list for a document |
| `UpdateTestCase` | Rewrite a node's value |
| `DeleteTestCase` | Tombstone a node |

Output nodes

| RPC | Description |
| --- | --- |
| `SetExpectedOutput` | Attach an output node to an input node |
| `GetExpectedOutput` | Fetch the output node for an input node |
| `UpdateExpectedOutput` | Rewrite an output node's value |
| `DeleteExpectedOutput` | Remove the output node link |

Admin

| RPC | Description |
| --- | --- |
| `GetStats` | Index size, active segment size, segment count |
| `Flush` | Flush active segment to disk |
| `CompactNow` | Force immediate compaction (blocks until complete) |

Using from other languages

Because the API is defined in lse.proto, you can generate a client in any language gRPC supports.

Go:

protoc --proto_path=proto --go_out=client/go --go-grpc_out=client/go proto/lse.proto

Node.js:

npm install @grpc/grpc-js @grpc/proto-loader
# then load proto dynamically — no codegen needed

Java / Kotlin / Rust / C# / Ruby — all follow the same pattern. One .proto file, one codegen command, done.

How compaction works

Stratum uses a strategy inspired by Bitcask and LSM-trees.

1. All writes go to active.seg in append-only fashion.
2. When active.seg reaches a configured size threshold, it is sealed (renamed to seg_NNN.seg) and a fresh active.seg is opened.
3. The background compactor wakes every 30 seconds and checks total segment size.
4. When total size exceeds the compaction threshold, it scans all sealed segments, keeps only the most recent version of each record (highest timestamp wins per record ID), discards tombstoned records, writes a single merged.seg, and atomically removes the old segments.
5. The in-memory index reloads from the merged segment.

This means updates and deletes are always O(1) appends. Between compactions the segment files grow with every write; after each compaction, disk usage is proportional to live data rather than total write history.
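
The merge rule (latest timestamp wins, tombstones dropped) can be sketched in Python as follows. This is an illustration under assumptions, not the code in src/compactor.cpp: the `Record` tuple, the use of `None` as a tombstone marker, and the in-memory lists standing in for segment files are all hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class Record(NamedTuple):
    record_id: str
    timestamp: int
    value: Optional[str]  # None marks a tombstone in this sketch


def compact(segments: Iterable[List[Record]]) -> List[Record]:
    """Merge segments into one list holding only the latest live records."""
    latest: Dict[str, Record] = {}
    for segment in segments:
        for rec in segment:
            current = latest.get(rec.record_id)
            # Highest timestamp wins; on a tie the first record seen is kept.
            if current is None or rec.timestamp > current.timestamp:
                latest[rec.record_id] = rec
    # A tombstone that is the latest version removes the record entirely.
    live = [r for r in latest.values() if r.value is not None]
    return sorted(live, key=lambda r: r.timestamp)


seg_001 = [Record("a", 1, "a1"), Record("b", 2, "b1")]
seg_002 = [Record("a", 3, "a2"), Record("b", 4, None), Record("c", 5, "c1")]
merged = compact([seg_002, seg_001])  # segment order does not matter
```

Here `a` keeps only its newer version, `b` disappears because its latest record is a tombstone, and `c` is carried over unchanged.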

Configuration

| Flag | Environment variable | Default | Description |
| --- | --- | --- | --- |
| `--port` | `LSE_PORT` | `50051` | gRPC listen port |
| `--data-dir` | `LSE_DATA_DIR` | `/data` | Directory for segment files |
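
The table does not say what happens when both a flag and its environment variable are set. A common convention, assumed here and not confirmed for the real C++ server, is flag over environment variable over default. The sketch below shows that resolution order in Python; `parse_config` is a hypothetical name.

```python
import argparse
import os


def parse_config(argv=None, env=None):
    """Resolve settings with the assumed precedence: flag > env var > default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="lse_server")
    # The environment variable, if set, becomes the flag's default value.
    parser.add_argument("--port", type=int, default=int(env.get("LSE_PORT", "50051")))
    parser.add_argument("--data-dir", default=env.get("LSE_DATA_DIR", "/data"))
    return parser.parse_args(argv)


defaults = parse_config([], env={})
from_env = parse_config([], env={"LSE_PORT": "6000"})
from_flag = parse_config(["--port=7000"], env={"LSE_PORT": "6000"})
```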

Contributing

Contributions are welcome. The codebase is structured so each concern is isolated:

Add a new RPC → edit proto/lse.proto, add the handler in src/grpc/lse_service.cpp
Change storage format → edit include/lse/types.h and src/serialisation.cpp
Tune compaction → edit src/compactor.cpp

Please open an issue before submitting a large PR.

License

MIT
