GitHub - inline0/storh: File-first records for PHP: JSONC document store, append-only segmented log, and log-backed queue. Durable, queryable, zero dependencies.

File-first records for PHP: JSONC documents, append-only segmented logs, and log-backed queues.

What is storh?

storh is a standalone, framework-agnostic storage layer for PHP applications that want durable local records without a database server. It provides:

a JSONC document store with one file per record
an append-only segmented log with cursor and time-range reads
an append-only log-backed queue
Prisma/Drizzle-style fluent querying, secondary indexes, schema validation, caching, bulk JSONL import/export, maintenance APIs, benchmarks, and a CLI
an optional SQL mirror that pushes collections into SQLite or MySQL for joins, search, and reporting while the files stay canonical
UUIDv7 ids, UUID-tail sharding, atomic writes, torn-write recovery, retention, and compaction

The three engines store the same record shape (a UUIDv7 id plus an array of data) with three different disk layouts. Pick by workload:

Engine	Use it for	Avoid it for
`DocStore`	records you fetch by id, update in place, and query by indexed fields	non-indexed field scans past roughly 10k records
`SegmentedLog`	append-heavy streams read by cursor or time window, into roughly 1M records	frequent single-record rewrites
`Queue`	durable job handoff between worker processes	payloads you need to read back after completion

DocStore and SegmentedLog share the same fluent query API; only the execution differs (index lookups vs segment scans). Queue is claim-based rather than queryable.

Engines compose: one storage root can hold DocStore collections, SegmentedLog streams, and Queue directories side by side.

The caller provides a base directory. storh does not discover application paths or depend on a framework.

Quick Start

composer require storh/storh

// Composer's autoloader; adjust the path to match your project layout.
require __DIR__ . '/vendor/autoload.php';

use Storh\Cache;
use Storh\DocStore;
use Storh\Queue;
use Storh\Schema;
use Storh\SegmentedLog;
use Storh\StorageRoot;

// The caller supplies the base directory; storh does not discover paths.
$root = StorageRoot::resolve(__DIR__ . '/var/storh', 'app');

// Declare typed fields, their indexes, and the required keys.
$schema = Schema::collection('pages')
    ->string('slug')->unique()
    ->string('kind')->index()
    ->int('publishedAt')->range()
    ->required(['slug', 'kind']);

// DocStore: one JSONC file per record.
$docs = new DocStore($root, 'pages', cache: Cache::memory(), schema: $schema);
$home = $docs->put([
    'slug' => 'home',
    'kind' => 'page',
    'title' => 'Home',
    'publishedAt' => time(),
]);

// Point read by id.
echo $docs->get($home->id())?->data()['title'];

// Indexed query: pages published in the last 24 hours, newest first.
$pages = $docs
    ->query()
    ->where('kind')->eq('page')
    ->where('publishedAt')->gte(time() - 86400)
    ->orderBy('publishedAt', 'desc')
    ->limit(50)
    ->get();

// SegmentedLog: append events to a stream.
$events = new SegmentedLog($root, 'events');
$events->appendMany([[
    'type' => 'page.saved',
    'pageId' => $home->id(),
]]);

// Queue: enqueue a job, then claim and complete it as a worker would.
$queue = new Queue($root, 'jobs');
$queue->enqueue(['task' => 'render', 'pageId' => $home->id()]);

$job = $queue->claim();
if (null !== $job) {
    $queue->complete($job->id());
}

Engines

DocStore writes each record as a JSONC object under a UUID-tail-sharded path. It is best for point reads and modest field scans. putStream() ingests large iterables without retaining returned record objects.
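
To make the UUID-tail-sharded layout concrete, here is a minimal sketch. It is not storh's actual code: the two shard levels of two hex characters each, the `.jsonc` extension, and the helper names `uuidv7()` and `shardedPath()` are all assumptions made for illustration. Sharding on the tail is useful because a UUIDv7 begins with a millisecond timestamp, so ids created close together share a prefix, while the trailing bits are random and spread evenly across directories.

```php
<?php

/**
 * Build a UUIDv7 string: 48-bit Unix millisecond timestamp, version and
 * variant bits, then random bits (layout per RFC 9562).
 */
function uuidv7(?int $unixMs = null): string
{
    $unixMs ??= (int) floor(microtime(true) * 1000);

    // pack('J') yields 8 big-endian bytes; the low 6 are the 48-bit timestamp.
    $bytes = substr(pack('J', $unixMs), 2) . random_bytes(10);

    // Version 7 goes in the high nibble of byte 6.
    $bytes[6] = chr((ord($bytes[6]) & 0x0F) | 0x70);
    // The RFC variant (binary 10) goes in the top two bits of byte 8.
    $bytes[8] = chr((ord($bytes[8]) & 0x3F) | 0x80);

    $hex = bin2hex($bytes);

    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}

/**
 * Map an id to a sharded path using hex pairs taken from the END of the id.
 * Shard depth and file extension are hypothetical.
 */
function shardedPath(string $baseDir, string $id, int $levels = 2): string
{
    $hex = str_replace('-', '', $id);
    $parts = [];
    for ($i = 0; $i < $levels; $i++) {
        // Walk inward from the tail, two hex characters per directory level.
        $parts[] = substr($hex, -2 * ($i + 1), 2);
    }

    return $baseDir . '/' . implode('/', $parts) . '/' . $id . '.jsonc';
}

echo shardedPath('/data/pages', uuidv7()), PHP_EOL;
```

With two levels of two hex characters, records spread across at most 256 × 256 directories regardless of when they were created.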

SegmentedLog appends records to length- and checksum-guarded NDJSON segments. It is best for append-heavy workflows, cursor pagination, time-range scans, and compaction. appendStream() ingests large iterables without retaining returned record objects.

Queue stores job events in an append-only log and keeps pending, processing, and done state in memory. Claims, completions, requeues, and purges append bounded events instead of creating one file per job. Bulk enqueue, claim, and complete methods reduce lock and flush cost for large queues.
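
The idea of keeping queue state in memory while the log stays the source of truth can be sketched as a fold over events. This is an illustration only: the event shape (`type`, `id`, `payload`) and the function name `replayQueue()` are hypothetical, and storh's real event format is not documented here.

```php
<?php

/**
 * Rebuild pending/processing/done state by replaying events in order.
 * Events that do not apply to the current state are ignored.
 *
 * @param iterable<array{type: string, id: string, payload?: array}> $events
 */
function replayQueue(iterable $events): array
{
    $state = ['pending' => [], 'processing' => [], 'done' => []];

    foreach ($events as $event) {
        $id = $event['id'];

        switch ($event['type']) {
            case 'enqueue':
                $state['pending'][$id] = $event['payload'] ?? [];
                break;

            case 'claim':
                // Only a pending job can move to processing.
                if (array_key_exists($id, $state['pending'])) {
                    $state['processing'][$id] = $state['pending'][$id];
                    unset($state['pending'][$id]);
                }
                break;

            case 'requeue':
                // A job abandoned by a dead worker goes back to pending.
                if (array_key_exists($id, $state['processing'])) {
                    $state['pending'][$id] = $state['processing'][$id];
                    unset($state['processing'][$id]);
                }
                break;

            case 'complete':
                // Completed jobs keep only a marker, not the payload.
                if (array_key_exists($id, $state['processing'])) {
                    unset($state['processing'][$id]);
                    $state['done'][$id] = true;
                }
                break;
        }
    }

    return $state;
}

$state = replayQueue([
    ['type' => 'enqueue', 'id' => 'a', 'payload' => ['task' => 'render']],
    ['type' => 'enqueue', 'id' => 'b', 'payload' => ['task' => 'email']],
    ['type' => 'claim', 'id' => 'a'],
    ['type' => 'complete', 'id' => 'a'],
    ['type' => 'claim', 'id' => 'b'],
    ['type' => 'requeue', 'id' => 'b'],
]);
```

Because every transition is one appended event, state changes never rewrite earlier data, and the in-memory arrays can always be reconstructed from the log.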

For a side-by-side comparison with rules of thumb, see Choosing an engine in the docs.

Durability & Concurrency

storh writes each data file to a temporary file inside the target directory and publishes it with an atomic rename. Reopen and repair() remove abandoned temp files whose owner process is gone and leave the temp files of live writers alone. Writer markers make that check constant-time: opening a store sweeps the collection only when a previous writer actually crashed.
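
The temp-file-plus-rename pattern described above looks roughly like the following in plain PHP. This is a generic sketch, not storh's implementation: the temp file naming scheme (including the PID) and the function name `atomicWrite()` are assumptions, and the parent-directory sync is left out. It needs PHP 8.1 or later for `fsync()`.

```php
<?php

/**
 * Publish $contents at $path so readers see either the old file or the
 * new one, never a partial write.
 */
function atomicWrite(string $path, string $contents): void
{
    // The temp file lives in the target directory so rename() never crosses
    // filesystems. Embedding the PID is an assumption that would let a
    // sweeper tell whether the owning process is still alive.
    $tmp = sprintf(
        '%s/.%s.%d.%s.tmp',
        dirname($path),
        basename($path),
        getmypid(),
        bin2hex(random_bytes(4))
    );

    $handle = fopen($tmp, 'xb'); // 'x' fails if the temp name already exists
    if ($handle === false) {
        throw new RuntimeException("Cannot create temp file $tmp");
    }

    try {
        if (fwrite($handle, $contents) !== strlen($contents)) {
            throw new RuntimeException("Short write to $tmp");
        }
        // Flush PHP's buffer to the OS, then ask the OS to reach stable storage.
        if (!fflush($handle) || !fsync($handle)) {
            throw new RuntimeException("Cannot sync $tmp");
        }
        fclose($handle);
        $handle = null;

        // Within one directory, rename() replaces the target in a single step.
        if (!rename($tmp, $path)) {
            throw new RuntimeException("Cannot rename $tmp to $path");
        }
    } catch (Throwable $e) {
        if (is_resource($handle)) {
            fclose($handle);
        }
        @unlink($tmp);
        throw $e;
    }
}
```

A crash before `rename()` leaves only a dot-prefixed temp file behind, which is the kind of debris a later sweep can remove without touching published records.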

SegmentedLog and Queue use length- and checksum-guarded log lines. On reopen or repair, torn tails are truncated to the last committed event and in-memory state is rebuilt from durable log contents. SegmentedLog::verify() compares the current derived state index with a fresh segment replay, and Queue::verify() compares in-memory queue state with a full durable log replay, so stale or externally modified state is reported. Log and queue writes are serialized with filesystem locks.
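
As an illustration of how a length and checksum guard makes torn tails detectable, the sketch below frames each record as `<byte length> <crc32 hex> <json>` on one line. That framing, the choice of CRC32, and the function names are assumptions for the example; storh's on-disk format may differ.

```php
<?php

/** Frame one record as "<length> <crc32 hex> <json>\n" (hypothetical format). */
function encodeLine(array $record): string
{
    $json = json_encode($record, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES);

    return sprintf("%d %08x %s\n", strlen($json), crc32($json), $json);
}

/**
 * Scan from the start and stop at the first line that is incomplete or
 * fails its guard. Returns [offset past last good line, decoded records].
 */
function scanCommitted(string $bytes): array
{
    $offset = 0;
    $records = [];

    while (($end = strpos($bytes, "\n", $offset)) !== false) {
        $parts = explode(' ', substr($bytes, $offset, $end - $offset), 3);
        if (count($parts) !== 3) {
            break;
        }
        [$length, $crc, $json] = $parts;

        $intact = ctype_digit($length)
            && strlen($json) === (int) $length
            && sprintf('%08x', crc32($json)) === $crc;
        if (!$intact) {
            break;
        }

        $records[] = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
        $offset = $end + 1;
    }

    return [$offset, $records];
}

/** Truncate a log file to its last committed line and return the survivors. */
function repairTornTail(string $path): array
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    [$committed, $records] = scanCommitted($bytes);

    $handle = fopen($path, 'r+b');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    ftruncate($handle, $committed);
    fflush($handle);
    fclose($handle);

    return $records;
}
```

A write that was cut off mid-line has no trailing newline or fails the length or checksum test, so the scan stops before it and the truncation removes only uncommitted bytes.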

Queue claims and completions sync from the durable log while holding the queue lock, so multiple worker processes claim each pending job at most once. Jobs left in processing by a dead worker can be requeued by repair() or requeue_timed_out().
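
The at-most-once guarantee comes from re-reading the log while holding an exclusive lock, so a worker never acts on stale state. A stripped-down sketch of that pattern follows. The file names `queue.lock` and `queue.log`, the plain JSON-lines event format, and the function `claimNext()` are hypothetical, and the checksum guard and fsync are omitted for brevity.

```php
<?php

/**
 * Claim the oldest pending job, or return null when none is pending.
 * The lock is held across both the read and the append.
 */
function claimNext(string $dir, string $worker): ?string
{
    $lock = fopen("$dir/queue.lock", 'c');
    if ($lock === false || !flock($lock, LOCK_EX)) {
        throw new RuntimeException('Cannot acquire queue lock');
    }

    try {
        // Sync from the durable log under the lock, so claims appended by
        // other processes since our last read are visible.
        $pending = [];
        $lines = file("$dir/queue.log", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];
        foreach ($lines as $line) {
            $event = json_decode($line, true);
            if (!is_array($event)) {
                continue;
            }
            if ($event['type'] === 'enqueue') {
                $pending[$event['id']] = true;
            } elseif ($event['type'] === 'claim') {
                unset($pending[$event['id']]);
            }
        }

        $id = array_key_first($pending);
        if ($id === null) {
            return null;
        }

        // Record the claim before releasing the lock.
        $claim = json_encode(['type' => 'claim', 'id' => (string) $id, 'worker' => $worker]);
        file_put_contents("$dir/queue.log", $claim . "\n", FILE_APPEND);

        return (string) $id;
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}
```

Two workers calling `claimNext()` at the same moment are serialized by `flock()`: the second one reads the first one's claim event before choosing a job.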

SegmentedLog compaction writes new compact-* segment files before swapping the manifest. If a process exits before that swap, reopen discards unreferenced compaction output and keeps replaying the old manifest segments. Completed compactions leave old sealed segments in place so readers that already opened them stay valid.

DocStore mutations are serialized with a collection-level write lock, so record files and secondary indexes update as one consistency boundary across processes. Concurrent writes to distinct record ids are safe under the same filesystem atomic-rename assumptions. Concurrent writes to the same id are last-rename-wins.

DocStore::verify() checks record parseability and secondary-index drift against a full record scan. repair() moves corrupt record files into .storh/corrupt/ for inspection and rebuilds indexes from the remaining valid records.

storh flushes and fsyncs file handles before rename or append completion, and attempts to fsync parent directories after atomic renames. Power-loss durability still depends on the underlying filesystem and mount options honoring those syncs.

Cache Validation

STAT cache validation checks record existence, mtime, and size. HASH also checks file contents and catches same-size rewrites that preserve mtime. TRUST skips filesystem validation and is intended for single-process or shared-cache workloads where explicit storh writes publish the newest value. Files changed outside storh are not detected by TRUST, and same-stat content edits require HASH.
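
The difference between the three modes can be shown with a small fingerprint comparison. This is a sketch under assumptions: the `Validation` enum, `fingerprint()`, and `isFresh()` are hypothetical names, SHA-256 is an arbitrary hash choice, and storh's cache may store different metadata. It needs PHP 8.1 or later for enums.

```php
<?php

enum Validation
{
    case Stat;
    case Hash;
    case Trust;
}

/** Capture what a cache entry would remember about the file on disk. */
function fingerprint(string $path, Validation $mode): ?array
{
    if ($mode === Validation::Trust) {
        return []; // nothing is compared in TRUST mode
    }

    clearstatcache(true, $path);
    $stat = @stat($path);
    if ($stat === false) {
        return null; // the record no longer exists
    }

    $print = ['mtime' => $stat['mtime'], 'size' => $stat['size']];
    if ($mode === Validation::Hash) {
        $print['hash'] = hash_file('sha256', $path);
    }

    return $print;
}

/** A cached entry is fresh when its stored fingerprint still matches the file. */
function isFresh(array $stored, string $path, Validation $mode): bool
{
    return $mode === Validation::Trust || fingerprint($path, $mode) === $stored;
}

// Demonstrate a same-size rewrite that keeps the original mtime.
$path = tempnam(sys_get_temp_dir(), 'rec');
file_put_contents($path, '{"v":1}');
touch($path, 1700000000);
$statPrint = fingerprint($path, Validation::Stat);
$hashPrint = fingerprint($path, Validation::Hash);

file_put_contents($path, '{"v":2}');
touch($path, 1700000000);

var_dump(isFresh($statPrint, $path, Validation::Stat)); // STAT cannot see the edit
var_dump(isFresh($hashPrint, $path, Validation::Hash)); // HASH detects it
```

The trade-off is cost: STAT needs one `stat()` call per read, HASH has to read the whole file, and TRUST does no filesystem work at all.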

Querying and Indexes

DocStore exposes a fluent query builder:

$docs
    ->query()
    ->where('slug')->prefix('ho')
    ->orWhere(fn ($q) => $q->where('kind')->eq('post'))
    ->orderBy('id')
    ->page(100)
    ->get();

Indexes are file-backed and rebuildable:

$docs->indexes()
    ->field('slug')->unique()
    ->field('kind')
    ->field('publishedAt')->range()
    ->sync();

$docs->reindex();
$docs->query()->where('slug')->eq('home')->explain();

Two eq predicates on non-unique equality-indexed fields use automatic compound buckets, so common filters like kind = page AND bucket = 4 avoid intersecting large single-field result sets.
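
To see why a compound bucket helps, compare intersecting two single-field buckets with reading one bucket keyed on both values. The sketch below uses in-memory arrays and JSON-encoded bucket keys; both are assumptions for illustration, since storh's indexes are file-backed and their key format is not described here.

```php
<?php

/** Single-field buckets: field => encoded value => list of ids. */
function singleBuckets(array $records, array $fields): array
{
    $index = [];
    foreach ($records as $id => $data) {
        foreach ($fields as $field) {
            $index[$field][json_encode($data[$field])][] = $id;
        }
    }

    return $index;
}

/** Compound buckets: one list of ids per (value of $a, value of $b) pair. */
function compoundBuckets(array $records, string $a, string $b): array
{
    $index = [];
    foreach ($records as $id => $data) {
        $index[json_encode([$data[$a], $data[$b]])][] = $id;
    }

    return $index;
}

$records = [
    'r1' => ['kind' => 'page', 'bucket' => 4],
    'r2' => ['kind' => 'page', 'bucket' => 5],
    'r3' => ['kind' => 'post', 'bucket' => 4],
    'r4' => ['kind' => 'page', 'bucket' => 4],
];

$single = singleBuckets($records, ['kind', 'bucket']);
$compound = compoundBuckets($records, 'kind', 'bucket');

// kind = page AND bucket = 4, answered two ways.
$viaIntersect = array_values(array_intersect(
    $single['kind'][json_encode('page')],
    $single['bucket'][json_encode(4)]
));
$viaCompound = $compound[json_encode(['page', 4])] ?? [];
```

Both routes return the same ids, but the intersection has to load every `page` id and every `bucket = 4` id first, while the compound lookup reads only the matching bucket.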

SQL Mirrors

For joins, cross-table ordering, substring search, and reporting, push collections into SQLite or MySQL with SqlMirror. The files stay canonical; the mirror is a derived, disposable projection that push() keeps converged and rebuild() recreates from scratch:

// Keep a handle on the PDO connection so it can be queried after the push.
$pdo = new PDO('sqlite:' . $root . '/mirror.db');

$mirror = new Storh\SqlMirror($pdo);
$mirror->collection($docs, 'pages', $schema);
$mirror->collection($events, 'events');
$mirror->install();
$mirror->push();

$pdo->query('SELECT ... FROM storh_pages INNER JOIN storh_events ON ...');

push() reconciles by content hash and writes each collection in one transaction; flush() pushes specific ids for read-your-writes; pull() writes mirror rows back into files for restore and seeding flows; verify() reports drift. Connect with PDO (SQLite or MySQL) or an existing mysqli handle; the extensions are required only when the mirror is used.
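
Reconciling by content hash means a push only has to touch rows whose hash differs between the files and the mirror. The sketch below shows that comparison on plain arrays. The names `contentHash()` and `planPush()`, the use of SHA-256 over key-sorted JSON, and the assumption that rows missing from the files are deleted are all illustrative; storh's actual hashing and reconciliation rules may differ.

```php
<?php

/** Hash record data independently of top-level key order. */
function contentHash(array $data): string
{
    ksort($data);

    return hash('sha256', json_encode($data, JSON_THROW_ON_ERROR));
}

/**
 * Compare id => hash maps and decide what a push would have to write.
 *
 * @param array<string, string> $fileHashes   hashes of the canonical files
 * @param array<string, string> $mirrorHashes hashes stored with mirror rows
 */
function planPush(array $fileHashes, array $mirrorHashes): array
{
    $upsert = [];
    foreach ($fileHashes as $id => $hash) {
        // New id, or same id with changed content.
        if (($mirrorHashes[$id] ?? null) !== $hash) {
            $upsert[] = $id;
        }
    }

    // Rows whose file no longer exists.
    $delete = array_keys(array_diff_key($mirrorHashes, $fileHashes));

    return ['upsert' => $upsert, 'delete' => $delete];
}

$files = [
    'a' => contentHash(['title' => 'Home', 'slug' => 'home']),
    'b' => contentHash(['slug' => 'about']),
];
$mirror = [
    'a' => contentHash(['slug' => 'home', 'title' => 'Home']),
    'b' => contentHash(['slug' => 'old']),
    'c' => 'stale-row-hash',
];

$plan = planPush($files, $mirror);
```

When nothing has changed, the plan is empty and the push reduces to comparing hashes, which is consistent with an unchanged push being faster than an initial one.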

Operations

vendor/bin/storh stats var/storh pages doc
vendor/bin/storh verify var/storh events log
vendor/bin/storh compact var/storh events log
vendor/bin/storh reindex var/storh pages

composer bench
composer bench -- --dataset=100000 --engine=doc
composer bench -- --dataset=100000 --engine=cache --cache-validation=trust
composer bench -- --dataset=100000 --engine=filter
composer bench:repeat -- --dataset=1000000 --engine=filter --repeat=5 --memory-limit=512M
composer bench:range -- --datasets=1000,10000,50000,100000
composer bench:compare build/bench-main.json build/bench-current.json
composer bench:gate -- build/bench-main.json build/bench-current.json --threshold=10 --metric=doc.put --metric=log.stream
composer bench:ci

Benchmarks

Medians of three full runs of the shipped harness, 50,000 records per engine, on an Apple M1 Pro (16 GB, macOS 26.5, APFS) with PHP 8.5 CLI, opcache and JIT off:

composer bench:repeat -- --dataset=50000 --engine=all --repeat=3 --memory-limit=512M

Every write is flushed and fsynced before storh reports it stored, so write rates are filesystem-bound. Reads run against the OS page cache. Rates are derived from the medians; run composer bench for your own hardware.
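
Deriving a rate from a median works as follows: take the median wall-clock time across the repeated runs, then divide the record count by it. The timings in this sketch are invented round numbers chosen to keep the arithmetic easy to follow; they are not measurements from the harness, and the helper names are hypothetical.

```php
<?php

/** Median of a non-empty list of numbers. */
function median(array $samples): float
{
    if ($samples === []) {
        throw new InvalidArgumentException('median() needs at least one sample');
    }
    sort($samples);
    $n = count($samples);
    $mid = intdiv($n, 2);

    return $n % 2 === 1
        ? (float) $samples[$mid]
        : ($samples[$mid - 1] + $samples[$mid]) / 2;
}

/** Records per second for one operation over the whole dataset. */
function ratePerSecond(int $records, float $seconds): float
{
    return $records / $seconds;
}

// Three hypothetical run times, in seconds, for writing 50,000 records.
$runs = [20.0, 10.0, 12.5];
$rate = ratePerSecond(50000, median($runs));

printf("%.1fk records/s\n", $rate / 1000);
```

Using the median rather than the mean keeps a single slow run, for example one disturbed by background disk activity, from skewing the reported rate.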

DocStore, 50k records	median
`put()`, one durable file per record	3.3k records/s
`putStream()` bulk ingest	3.7k records/s
`importJsonl()`	3.6k records/s
`get()` point read, STAT-validated	5.6 µs
reopen an existing store	0.26 ms
indexed equality query, `limit(100)`	1.1 ms
indexed `count()` across the collection	41 µs
index build, 2 equality + 1 range field	52k records/s
full `stream()` with STAT re-validation	103k records/s
`exportJsonl()`	104k records/s

SegmentedLog, 50k records, 16 KB segments	median
`put()`, fsync per append	16k appends/s
`appendStream()` bulk ingest	56k records/s
cursor read, 100 records from the midpoint	2.1 ms
time-range read	1.1 ms
equality `count()`	11 µs
`compact()` all sealed segments	69k records/s
reopen with torn-tail recovery	218k records/s

The benchmark seals a segment every 16 KB to stress segment rolls; the default segment size is 1 MiB.

Queue, 50k jobs	median
`enqueue()`, fsync per event	22k jobs/s
`claim()`	23k jobs/s
`complete()`	23k jobs/s
`enqueueMany()`	209k jobs/s
`claimMany()`	467k jobs/s
`completeMany()`	512k jobs/s

SQL Mirror, SQLite, 50k records	median
initial `push()`	61k rows/s
`push()` with nothing changed	86k records/s
`flush()`, 100 ids	7.1 ms
indexed SQL `COUNT` over the mirror	6.9 ms
`rebuild()`	65k rows/s
`pull()` restore, one durable file per record	3.2k records/s

Micro	median
cached `get()`, cold then warm (MemoryCache, STAT)	83 µs / 6.3 µs
UUIDv7 generate	1.3 µs
UUIDv7 validate	0.30 µs
in-memory predicate filtering	4.4M rows/s

API Stability

For 0.5.0, the documented API is the surface shown in the README and docs. Some classes expose extra public methods so storh engines can cooperate internally; treat those as implementation details unless they are documented. That keeps future performance work focused on internal storage, indexing, caching, and query-planner improvements without changing user-facing calls.

Scaling & Limits

Point access is effectively unbounded, given sharding and a filesystem that can handle the file count.
Segmented-log scan and range reads stay comfortable up to roughly 1M records.
Per-file field scans (DocStore queries on non-indexed fields) degrade past roughly 10k records.
The file engines are not meant for ad-hoc relational queries, joins, or analytical filtering over many fields; push collections into a SQL mirror for those.

License

MIT © inline0.
