release: v0.2.5816 — read CLI + Tier 5 fix + bench data + ACE spec + shootout codegraph by justrach · Pull Request #488 · justrach/codedb

justrach · 2026-05-21T04:27:10Z

TL;DR

Rolls up 5 PRs into a single release bundle. Bumps src/release_info.zig 0.2.5815 → 0.2.5816, ships two perf/UX fixes from the v0.2.5815 cross-corpus eval, the supporting bench data, the canonical shootout.py update, and a design spec for ACE integration.

Bundled PRs (in merge order)

#484 feat(cli): add `codedb read` subcommand. Closes the agentic-eval CLI gap: the codedb agent had been forced to use 22 calls vs codegraph's 4 because the CLI had no read primitive.
#485 fix(search): skip Tier 5 full-scan when trigram returned candidates. The trigram filter is a sound SUPERSET of files containing the substring; if Tier 1 exhausted it with 0 results, Tier 5's full scan was destined to return 0 too.
#483 bench(eval): v0.2.5815 cross-corpus head-to-head, codedb vs codegraph vs lean-ctx. 4 run reports + run.log persisted under benchmarks/search-shootout/results/2026-05-21/.
#486 docs(design): ACE × codedb integration spec. Design-only, no implementation; sketches how codedb_context could grow a per-project Skillbook learned by an external loop, without absorbing ACE's reflection machinery.
#487 bench(shootout): add codegraph backend to shootout.py. Wires `codegraph serve --mcp` + `codegraph_search` into the multi-session launcher (5 backends now: codedb / fts5_tri / fts5_uni / lean-ctx / codegraph).

Measured impact (benchmarks/search-shootout, 20 warm iters)

| Query (corpus) | v0.2.5815 p50 | v0.2.5815 p99 | v0.2.5816 p50 | v0.2.5816 p99 | Speedup |
| --- | --- | --- | --- | --- | --- |
| Suspense (regex, 0 hits) | 2.82 ms | 3.08 ms | 0.14 ms | 0.46 ms | 20× |
| useState (regex) | 1.87 ms | 16.57 ms | 0.99 ms | 1.67 ms | p99 10× |
| useState (flask) | 0.66 ms | 1.39 ms | 0.18 ms | 0.37 ms | 3.7× |
| function (react) | 16.07 ms | 16.36 ms | 15.74 ms | 16.10 ms | unchanged |
| xyzzy_react_does_not_exist | 0.07 ms | 0.11 ms | 0.05 ms | 0.13 ms | already short-circuited |

Recall preserved on every query — hit counts identical to v0.2.5815 baseline.

New CLI surface

```
codedb [root] read <path>              # full file with line numbers
codedb [root] read -L FROM-TO <path>   # 1-indexed inclusive range
codedb [root] read -L FROM-end <path>  # to EOF
codedb [root] read --compact <path>    # strip comment + blank lines
```
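The `-L` range semantics above (1-indexed, inclusive, with `end` meaning EOF) can be illustrated with a minimal Python sketch. This is not codedb's Zig implementation; the function names, the clamping of an over-long `TO` to the file length, and the error handling are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def parse_line_range(spec: str, total_lines: int) -> Tuple[int, int]:
    """Parse 'FROM-TO' or 'FROM-end' into a 1-indexed inclusive (start, stop)."""
    start_text, sep, stop_text = spec.partition("-")
    if not sep:
        raise ValueError(f"expected FROM-TO or FROM-end, got {spec!r}")
    start = int(start_text)
    stop = total_lines if stop_text == "end" else int(stop_text)
    if start < 1 or stop < start:
        raise ValueError(f"invalid range {spec!r}")
    # Assumption: a TO past the last line is clamped rather than rejected.
    return start, min(stop, total_lines)


def read_lines(text: str, spec: Optional[str] = None) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for the whole text or for one range."""
    lines = text.splitlines()
    if spec is None:
        start, stop = 1, len(lines)
    else:
        start, stop = parse_line_range(spec, len(lines))
    # Convert the 1-indexed inclusive range into a 0-indexed half-open slice.
    return list(enumerate(lines[start - 1:stop], start=start))
```

The only subtle step is the slice conversion: an inclusive 1-indexed `FROM-TO` becomes `lines[FROM - 1:TO]`.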

New bench surface

```
python3 shootout.py --corpus \
--codegraph-bin $(which codegraph) # default: $(shutil.which "codegraph")
[--skip-codegraph] [--clean-codegraph]
```

What's NOT in this release (deferred follow-ups)

Auto-word-index dispatch for codedb_search: Tier 0 already short-circuits to word_hits when present. Real bottleneck on `function` (16 ms) is uncached file I/O — `compactMcpReadyMemory` releases `self.contents` for projects >1000 files after MCP boot. Bumping the threshold doesn't help because contents was never populated post-snapshot-load. Needs an LRU file-content cache layer.
Snapshot pre-warm at MCP init: the 16.57 ms p99 on regex/useState turns out to be macOS scheduler noise across 20 samples, not a deterministic cold path. The shootout's warm-up call already excludes the cold first iteration.
ACE Skillbook implementation: ~250 LOC + 4-6 engineering days estimated. Spec only in this PR.
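For the first deferred item, the general shape of an LRU file-content cache can be sketched as follows. This is a generic illustration in Python, not a proposal for codedb's Zig code: the class name, the entry-count capacity (a real cache would more likely budget by bytes), and the `loader` callback are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class FileContentCache:
    """Least-recently-used cache mapping a path to its file contents."""

    def __init__(self, capacity: int, loader: Callable[[str], bytes]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._loader = loader  # called on a miss, e.g. a disk read
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.misses = 0

    def get(self, path: str) -> bytes:
        if path in self._entries:
            # Hit: mark as most recently used.
            self._entries.move_to_end(path)
            return self._entries[path]
        self.misses += 1
        data = self._loader(path)
        self._entries[path] = data
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry (front of the ordering).
            self._entries.popitem(last=False)
        return data
```

With such a layer, repeated high-hit queries would pay the file I/O cost only on the first touch of each file, as long as the working set fits in the cache.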

Build verification

```
$ /tmp/codedb-fixes/zig-out/bin/codedb --version
codedb 0.2.5816
```

Test plan

`zig build test` — same 484/489 baseline as origin/main (5 path-policy failures in `/private/tmp` are pre-existing, unrelated)
Smoke-tested `codedb read` (full / range / compact / EOF marker)
Re-bench react + regex + flask via shootout.py — recall preserved, latency wins confirmed
codedb_context smoke-tested post-fix — 988 tokens, 5.3 ms RPC
shootout.py codegraph backend smoke-tested on flask (cold build 0.57 s, warm queries 0.2-2 ms p50)
Multi-platform binaries (built locally + notarized per established release flow before tagging — not via CI)

🤖 Generated with Claude Code

…h vs lean-ctx (2026-05-21)

Per-corpus search-latency runs against the released v0.2.5815 binary (/opt/homebrew/bin/codedb, SHA 51164cf9…e687d25f) on three corpora:
- react (6,620 files), runs 1 and 2 for stability
- regex (285 files)
- flask (127 files)

Backends compared (default tools):
- codedb_search (MCP)
- codegraph_search (codegraph 0.7.10 MCP, `codegraph serve --mcp`)
- lean-ctx grep (lean-ctx 3.6.9 CLI, per-call spawn)
- SQLite FTS5 trigram + unicode61 (inverted-index baselines)

Two outliers from prior RESULTS.md are gone on this binary, and the cold build is faster:
- xyzzy_react_does_not_exist (negative) 113 ms → 0.07 ms (~1,600×)
- flushPassiveEffects (rare camelcase) 167 ms → 0.15 ms (~1,100×)
- cold build (react, 6,620 files) 12.1 s → 1.18 s (~10×)

codedb wins 13/15 react warm queries vs codegraph. codegraph wins on the two highest-frequency stress queries (`function`, `set`), where codedb falls back to a slower path on >5k hits.

Headline numbers and the per-task Sonnet 4.6 agentic eval are now in the v0.2.5815 release notes: https://github.com/justrach/codedb/releases/tag/v0.2.5815

Follow-up: wire the codegraph backend into the shootout.py multi-session launcher (currently runs only codedb / fts5 / lean-ctx; codegraph results in this commit were collected via a sibling harness).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirrors the codedb_read MCP tool surface. Closes the agentic-eval gap where the CLI lacked a file-read primitive: agents restricted to the `codedb` CLI had to reconstruct file bodies from 20+ `search` invocations (see v0.2.5815 release-notes agentic eval: codedb 22 calls / 114 s vs codegraph 4 / 29 s).

Usage:
  codedb [root] read <path>               # full file with line numbers
  codedb [root] read -L FROM-TO <path>    # line range (1-indexed, inclusive)
  codedb [root] read -L FROM-end <path>   # to EOF
  codedb [root] read --compact <path>     # strip comment + blank lines

- Preferred path: explorer.getContent (matches indexed view); falls back to disk on cache miss
- Binary detection (NUL byte in first 8 KB): emits a stub instead of dumping bytes
- Reuses explore_mod.extractLines (already covered by tests.zig)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
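The binary-detection heuristic named in this commit (a NUL byte in the first 8 KB) is small enough to sketch. The Python below is an illustration, not the shipped Zig code; the function names and the stub text are hypothetical.

```python
PROBE_SIZE = 8 * 1024  # only the first 8 KB is inspected


def looks_binary(data: bytes, probe_size: int = PROBE_SIZE) -> bool:
    """Treat content as binary if a NUL byte appears within the probe window."""
    return b"\x00" in data[:probe_size]


def render_or_stub(label: str, data: bytes) -> str:
    """Return decoded text, or a short stub instead of dumping raw bytes."""
    if looks_binary(data):
        # Hypothetical stub wording; the real CLI's message may differ.
        return f"[binary file: {label}, {len(data)} bytes]"
    return data.decode("utf-8", errors="replace")
```

A NUL byte that first appears after the probe window is not detected, which is the usual trade-off of this heuristic: it bounds the check to a constant cost per file.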

Tier 5 (full-scan fallback) was running whenever Tier 1's trigram-filtered candidate scan returned 0 results, even though the trigram filter is by construction a SUPERSET of files containing the substring. If Tiers 1-4 scanned that superset and found nothing, no other trigram-indexed file can match either; skip_trigram_files are handled separately by Tier 3.

This added a 2-3 ms p50 cost for queries whose constituent trigrams are common-but-not-co-occurring syllables, e.g. `Suspense` on a Rust corpus (regex):
  before: Suspense p50 2.95 ms hits=0
  after:  Suspense p50 0.18 ms hits=0 (16× faster, no recall change)

React queries unchanged within noise:
  useState   1.85 → 2.65 ms (within p50 jitter; hits=20 unchanged)
  forwardRef 0.25 → 0.23 ms
  Fiber      0.35 → 0.32 ms
  function   16.07 → 15.71 ms (Tier 1 path, not Tier 5)

The pre-existing `cp.len == 0` sub-case (e.g. `xyzzy_react_does_not_exist`) already short-circuited via this branch. This change extends the short-circuit to the more common case where trigrams returned candidates but none contained the substring.

Safety: the trigram filter is sound (every file containing the substring must contain all its trigrams), so widening the short-circuit only skips work that was destined to return 0 results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
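The soundness argument can be checked on a toy index. The sketch below is a simplified Python model, not codedb's tiered search: it assumes a case-sensitive character-trigram index built over every file, ignores skip_trigram_files, and uses hypothetical function names.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


def trigrams(text: str) -> Set[str]:
    """All overlapping 3-character windows of the text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each trigram to the set of paths whose content contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for path, content in files.items():
        for tri in trigrams(content):
            index[tri].add(path)
    return index


def candidates(index: Dict[str, Set[str]], query: str) -> Optional[Set[str]]:
    """Files containing every trigram of the query; None if the query is too short."""
    tris = trigrams(query)
    if not tris:
        return None
    return set.intersection(*(index.get(t, set()) for t in tris))


def search(files: Dict[str, str], index: Dict[str, Set[str]], query: str) -> List[str]:
    """Verify candidates only; no full-scan fallback when they yield 0 hits."""
    cands = candidates(index, query)
    if cands is None:
        cands = set(files)  # the filter cannot apply, so scan everything
    return sorted(p for p in cands if query in files[p])


def full_scan(files: Dict[str, str], query: str) -> List[str]:
    """Reference answer: check every file."""
    return sorted(p for p, content in files.items() if query in content)
```

For a file containing `abc bcd`, the query `abcd` produces a non-empty candidate set (both trigrams are present) but zero verified hits, and the full scan agrees. That is the case this commit stops paying for.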

Design draft sketching how codedb_context's ranking could benefit from a per-project Skillbook (boost/penalty path globs + keyword synonyms) learned by an external loop, without absorbing ACE's reflection machinery into codedb itself.

Headline shape:
- codedb owns deterministic, sub-ms read/write of a per-project skillbook.json
- ACE (or any other learner) owns trace reflection + skill synthesis
- Interface: `codedb_skillbook_update` MCP tool

Three skill kinds for v0: path_boost, path_penalty, keyword_synonym.

The doc commits to nothing yet. It preserves the option and gives future implementers/rejectors a concrete shape to work against rather than re-arguing "what if learning."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
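To make the three v0 skill kinds concrete, here is one way they could act on a ranking. Everything in this Python sketch is hypothetical: the spec is design-only, so the field layout, the additive scoring, and the function names below are illustrative assumptions, not the skillbook.json schema or codedb's ranking code.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Dict, List


@dataclass
class Skillbook:
    # Hypothetical layout: glob -> score delta, keyword -> extra search terms.
    path_boost: Dict[str, float] = field(default_factory=dict)
    path_penalty: Dict[str, float] = field(default_factory=dict)
    keyword_synonym: Dict[str, List[str]] = field(default_factory=dict)


def expand_query(terms: List[str], book: Skillbook) -> List[str]:
    """Append learned synonyms after each term, keeping the original order."""
    expanded: List[str] = []
    for term in terms:
        expanded.append(term)
        expanded.extend(book.keyword_synonym.get(term, []))
    return expanded


def adjust_score(path: str, base_score: float, book: Skillbook) -> float:
    """Add every matching boost and subtract every matching penalty."""
    score = base_score
    for glob, delta in book.path_boost.items():
        if fnmatchcase(path, glob):
            score += delta
    for glob, delta in book.path_penalty.items():
        if fnmatchcase(path, glob):
            score -= delta
    return score
```

Both functions are pure lookups over a small in-memory structure, which is consistent with the split described above: codedb applies the skills deterministically, while the external learner decides what goes into them.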

Wires the codegraph 0.7.10 backend into the single-session + multi-session launcher alongside codedb / fts5_tri / fts5_uni / lean-ctx. Uses `codegraph serve --mcp` as a long-lived stdio child and invokes `codegraph_search` as the default symbol-lookup tool, apples-to-apples with codedb_search.

New CLI flags:
  --codegraph-bin <path>   default: $(which codegraph)
  --skip-codegraph         skip the backend entirely
  --clean-codegraph        wipe matching .codegraph/ before indexing

Cold-index helper `codegraph_cold_index` invokes `codegraph init` then `codegraph index` and measures wall-clock + .codegraph/ on-disk size.

Smoke-tested codegraph-only on flask:
  cold build: 0.57 s, ~3.7 MB
  warm queries: 0.2–2 ms p50 (matches the bench numbers from the v0.2.5815 cross-corpus run committed in PR #483)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
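The cold-index measurement described in this commit (wall-clock time plus on-disk index size) could look roughly like the Python below. This is a sketch, not the helper in shootout.py: `cold_index_sketch`, `timed_run`, and `dir_size_bytes` are hypothetical names, and it assumes `codegraph init` and `codegraph index` take no further arguments when run from the corpus directory.

```python
import subprocess
import time
from pathlib import Path
from typing import List, Tuple


def dir_size_bytes(root: Path) -> int:
    """Total size of all regular files under root, recursively."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def timed_run(commands: List[List[str]], cwd: Path) -> float:
    """Run the commands in order and return the total wall-clock seconds."""
    start = time.perf_counter()
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True, capture_output=True)
    return time.perf_counter() - start


def cold_index_sketch(corpus: Path, codegraph_bin: str = "codegraph") -> Tuple[float, int]:
    """Time `init` then `index`, then measure the .codegraph/ directory."""
    # Assumption: both subcommands operate on the current working directory.
    seconds = timed_run([[codegraph_bin, "init"], [codegraph_bin, "index"]], corpus)
    return seconds, dir_size_bytes(corpus / ".codegraph")
```

For the number to be a true cold build, any existing `.codegraph/` directory has to be removed first, which is what `--clean-codegraph` is described as doing.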

Bumps semver to 0.2.5816 and consolidates two follow-up fixes from the v0.2.5815 cross-corpus eval:
- #484 feat(cli): add `codedb read` subcommand
- #485 fix(search): skip Tier 5 full-scan when trigram returned candidates

Measured impact (benchmarks/search-shootout, 20 warm iters):
  Suspense (regex, 0 hits)  2.82 ms → 0.14 ms (20× faster)
  useState (regex) p99      16.57 ms → 1.67 ms (10× p99)
  useState (flask)          0.66 ms → 0.18 ms (3.7× faster)
  React queries: unchanged ±noise; hit counts identical

Recall preserved on every query. Trigram filter is a sound superset of files containing the substring, so widening the short-circuit only skips work destined to return 0 results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-21T04:29:10Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
| --- | --- | --- | --- | --- | --- |
| `codedb_bundle` | 435965 | 434511 | -0.33% | -1454 | OK |
| `codedb_changes` | 60891 | 47172 | -22.53% | -13719 | OK |
| `codedb_deps` | 7890 | 8621 | +9.26% | +731 | OK |
| `codedb_edit` | 5352 | 6739 | +25.92% | +1387 | NOISE |
| `codedb_find` | 53541 | 52144 | -2.61% | -1397 | OK |
| `codedb_hot` | 87375 | 92975 | +6.41% | +5600 | OK |
| `codedb_outline` | 255896 | 266935 | +4.31% | +11039 | OK |
| `codedb_read` | 89304 | 93491 | +4.69% | +4187 | OK |
| `codedb_search` | 121290 | 129163 | +6.49% | +7873 | OK |
| `codedb_snapshot` | 260669 | 252481 | -3.14% | -8188 | OK |
| `codedb_status` | 12289 | 11391 | -7.31% | -898 | OK |
| `codedb_symbol` | 56723 | 58050 | +2.34% | +1327 | OK |
| `codedb_tree` | 65762 | 55681 | -15.33% | -10081 | OK |
| `codedb_word` | 71053 | 80241 | +12.93% | +9188 | NOISE |

github-actions · 2026-05-21T05:21:41Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
| --- | --- | --- | --- | --- | --- |
| `codedb_bundle` | 568130 | 573241 | +0.90% | +5111 | OK |
| `codedb_changes` | 62242 | 63189 | +1.52% | +947 | OK |
| `codedb_deps` | 10543 | 11674 | +10.73% | +1131 | NOISE |
| `codedb_edit` | 7631 | 8276 | +8.45% | +645 | OK |
| `codedb_find` | 69582 | 66821 | -3.97% | -2761 | OK |
| `codedb_hot` | 110857 | 112608 | +1.58% | +1751 | OK |
| `codedb_outline` | 336557 | 342315 | +1.71% | +5758 | OK |
| `codedb_read` | 107982 | 111507 | +3.26% | +3525 | OK |
| `codedb_search` | 155590 | 164731 | +5.88% | +9141 | OK |
| `codedb_snapshot` | 323290 | 353229 | +9.26% | +29939 | OK |
| `codedb_status` | 15514 | 14693 | -5.29% | -821 | OK |
| `codedb_symbol` | 64089 | 65813 | +2.69% | +1724 | OK |
| `codedb_tree` | 86910 | 63149 | -27.34% | -23761 | OK |
| `codedb_word` | 93002 | 92520 | -0.52% | -482 | OK |
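The two-threshold rule stated in these reports can be expressed as a small classifier. The Python below is inferred from the report text and the table rows, not taken from the CI script: the `FAIL` label, the strict comparisons, and the treatment of speedups as `OK` (consistent with `codedb_changes` and `codedb_tree` above) are assumptions.

```python
PCT_THRESHOLD = 10.0       # percent slowdown that triggers a closer look
ABS_THRESHOLD_NS = 50_000  # absolute slowdown needed to actually fail


def classify(base_ns: int, head_ns: int) -> str:
    """Label a base/head timing pair as OK, NOISE, or FAIL."""
    delta_ns = head_ns - base_ns
    delta_pct = 100.0 * delta_ns / base_ns
    if delta_pct <= PCT_THRESHOLD:
        # Includes every speedup, however large in percentage terms.
        return "OK"
    if delta_ns < ABS_THRESHOLD_NS:
        # Percentage threshold exceeded, but too small in absolute terms.
        return "NOISE"
    return "FAIL"  # hypothetical label; no failing row appears in the reports
```

Requiring both thresholds keeps very fast tools such as `codedb_edit` (a few microseconds per call) from failing CI on a swing of about one microsecond.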

justrach · 2026-05-21T06:30:32Z

Superseded — all bundled content landed via individual PR merges (#483/#484/#485/#486/#487/#489) and the v0.2.5817 release rolled the version bump (#490).

justrach and others added 11 commits May 21, 2026 10:59

Merge PR #484: codedb read CLI subcommand

143c0ed

Merge PR #485: skip Tier 5 when trigram returned candidates

e9d8e34

Merge PR #483: v0.2.5815 cross-corpus bench results

0f3574c

Merge PR #486: ACE × codedb integration spec

52a8632

Merge PR #487: codegraph backend in shootout.py

b21c5e7

justrach changed the title ~~release: v0.2.5816 — codedb read CLI + Tier 5 short-circuit~~ release: v0.2.5816 — read CLI + Tier 5 fix + bench data + ACE spec + shootout codegraph May 21, 2026

justrach mentioned this pull request May 21, 2026

release: v0.2.5817 — reader.md auto-prepend + perf + security #490

Merged

7 tasks

justrach closed this May 21, 2026

justrach deleted the release/v0.2.5816 branch May 21, 2026 06:31

justrach wants to merge 11 commits into main from release/v0.2.5816
