release: v0.2.5816 — read CLI + Tier 5 fix + bench data + ACE spec + shootout codegraph#488
Closed
justrach wants to merge 11 commits into
…h vs lean-ctx (2026-05-21)

Per-corpus search-latency runs against the released v0.2.5815 binary (/opt/homebrew/bin/codedb, SHA 51164cf9…e687d25f) on three corpora:
- react (6,620 files), runs 1 and 2 for stability
- regex (285 files)
- flask (127 files)

Backends compared (default tools):
- codedb_search (MCP)
- codegraph_search (codegraph 0.7.10 MCP, `codegraph serve --mcp`)
- lean-ctx grep (lean-ctx 3.6.9 CLI, per-call spawn)
- SQLite FTS5 trigram + unicode61 (inverted-index baselines)

Two outliers from prior RESULTS.md are gone on this binary, and the cold build is faster:
- xyzzy_react_does_not_exist (negative): 113 ms → 0.07 ms (~1,600×)
- flushPassiveEffects (rare camelcase): 167 ms → 0.15 ms (~1,100×)
- cold build (react, 6,620 files): 12.1 s → 1.18 s (~10×)

codedb wins 13/15 react warm queries vs codegraph. codegraph wins on the two highest-frequency stress queries (`function`, `set`), where codedb falls back to a slower path on >5k hits.

Headline numbers and the per-task Sonnet 4.6 agentic eval are now in the v0.2.5815 release notes: https://github.com/justrach/codedb/releases/tag/v0.2.5815

Follow-up: wire the codegraph backend into the shootout.py multi-session launcher (it currently runs only codedb / fts5 / lean-ctx; the codegraph results in this commit were collected via a sibling harness).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the codedb_read MCP tool surface. Closes the agentic-eval gap where the CLI lacked a file-read primitive: agents restricted to the `codedb` CLI had to reconstruct file bodies from 20+ `search` invocations (see the v0.2.5815 release-notes agentic eval: codedb 22 calls / 114 s vs codegraph 4 / 29 s).

Usage:

    codedb [root] read <path>               # full file with line numbers
    codedb [root] read -L FROM-TO <path>    # line range (1-indexed, inclusive)
    codedb [root] read -L FROM-end <path>   # to EOF
    codedb [root] read --compact <path>     # strip comment + blank lines

- Preferred path: explorer.getContent (matches indexed view); falls back to disk on cache miss
- Binary detection (NUL byte in first 8 KB): prints a stub instead of dumping bytes
- Reuses explore_mod.extractLines (already covered by tests.zig)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
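The `-L` range syntax and the binary check described in this commit can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the Zig implementation: the function names, the stub text, and the line-number padding are assumptions made for the example.

```python
from typing import Optional, Tuple

BINARY_SNIFF_BYTES = 8 * 1024  # "first 8 KB", per the commit message


def parse_line_range(spec: str) -> Tuple[int, Optional[int]]:
    """Parse 'FROM-TO' or 'FROM-end' into (start, end); end=None means EOF."""
    start_text, sep, end_text = spec.partition("-")
    if not sep:
        raise ValueError(f"expected FROM-TO or FROM-end, got {spec!r}")
    start = int(start_text)
    end = None if end_text == "end" else int(end_text)
    if start < 1 or (end is not None and end < start):
        raise ValueError(f"invalid line range {spec!r}")
    return start, end


def looks_binary(data: bytes) -> bool:
    """Treat a file as binary if a NUL byte appears in the sniffed prefix."""
    return b"\x00" in data[:BINARY_SNIFF_BYTES]


def read_lines(data: bytes, spec: Optional[str] = None) -> str:
    """Return numbered lines for the 1-indexed, inclusive range in spec."""
    if looks_binary(data):
        return "<binary file, content omitted>"  # hypothetical stub text
    lines = data.decode("utf-8", errors="replace").splitlines()
    start, end = (1, None) if spec is None else parse_line_range(spec)
    last = len(lines) if end is None else min(end, len(lines))
    # Hypothetical formatting: right-aligned line number, two spaces, text.
    return "\n".join(f"{n:>6}  {lines[n - 1]}" for n in range(start, last + 1))
```

Clamping `TO` to the file length keeps an over-long range from raising, which is one plausible choice; the real CLI may behave differently.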
Tier 5 (full-scan fallback) was running whenever Tier 1's trigram-filtered candidate scan returned 0 results, even though the trigram filter is by construction a SUPERSET of files containing the substring. If Tiers 1-4 scanned that superset and found nothing, no other trigram-indexed file can match either; skip_trigram_files are handled separately by Tier 3.

This imposed a 2-3 ms p50 cost on queries whose constituent trigrams are common-but-not-co-occurring syllables, e.g. `Suspense` on a Rust corpus (regex):

    before: Suspense p50 2.95 ms hits=0
    after:  Suspense p50 0.18 ms hits=0   (16× faster, no recall change)

React queries unchanged within noise:

    useState    1.85 → 2.65 ms    (within p50 jitter; hits=20 unchanged)
    forwardRef  0.25 → 0.23 ms
    Fiber       0.35 → 0.32 ms
    function    16.07 → 15.71 ms  (Tier 1 path, not Tier 5)

The pre-existing `cp.len == 0` sub-case (e.g. `xyzzy_react_does_not_exist`) already short-circuited via this branch. This change extends the short-circuit to the more common case where the trigrams returned candidates but none contained the substring.

Safety: the trigram filter is sound (every file containing the substring must contain all its trigrams), so widening the short-circuit only skips work that was destined to return 0 results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
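The soundness argument above can be checked on a toy index. The sketch below is a minimal Python model, not codedb's tiered search: `trigrams`, `build_index`, `search`, and the `stats` counters are names invented for the example. It shows a file that contains every trigram of `Suspense` without containing the word, so the candidate scan finds nothing and no full scan is needed.

```python
from typing import Dict, List, Set


def trigrams(text: str) -> Set[str]:
    """All overlapping 3-character windows of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each trigram to the set of paths whose body contains it."""
    index: Dict[str, Set[str]] = {}
    for path, body in files.items():
        for tri in trigrams(body):
            index.setdefault(tri, set()).add(path)
    return index


def search(files: Dict[str, str], index: Dict[str, Set[str]],
           query: str, stats: Dict[str, int]) -> List[str]:
    """Substring search that trusts the trigram filter as a sound superset."""
    tris = trigrams(query)
    if not tris:
        # Queries shorter than 3 characters have no trigrams to filter on,
        # so this toy model has to scan everything.
        stats["full_scans"] += 1
        return sorted(p for p, body in files.items() if query in body)
    # Any file containing the query contains all of its trigrams, so the
    # intersection is a superset of the true matches.
    candidates = set.intersection(*(index.get(t, set()) for t in tris))
    stats["candidates_scanned"] += len(candidates)
    # An empty result here is final: no full-scan fallback is required.
    return sorted(p for p in candidates if query in files[p])


files = {
    "a.rs": "Susp spens ense",  # has every trigram of "Suspense", not the word
    "b.rs": "fn main() {}",
}
index = build_index(files)
stats = {"full_scans": 0, "candidates_scanned": 0}
hits = search(files, index, "Suspense", stats)
# hits == [] after scanning 1 candidate and 0 full scans
```

Files that bypass the trigram index (the `skip_trigram_files` mentioned above) are outside this model and would still need their own pass.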
Design draft sketching how codedb_context's ranking could benefit from a per-project Skillbook (boost/penalty path globs + keyword synonyms) learned by an external loop, without absorbing ACE's reflection machinery into codedb itself.

Headline shape:
- codedb owns deterministic, sub-ms read/write of a per-project skillbook.json
- ACE (or any other learner) owns trace reflection + skill synthesis
- Interface: `codedb_skillbook_update` MCP tool

Three skill kinds for v0: path_boost, path_penalty, keyword_synonym.

The doc commits to nothing yet. It preserves the option and gives future implementers/rejectors a concrete shape to work against rather than re-arguing "what if learning."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
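To make the three skill kinds concrete, here is one way a skillbook could be applied at ranking time. Everything except the kind names (path_boost, path_penalty, keyword_synonym) is an assumption for illustration: the JSON field names, the multiplicative `factor`, and the `expand_query` / `rerank` helpers are hypothetical and are not taken from the design doc.

```python
import json
from fnmatch import fnmatch
from typing import Dict, List, Tuple

# Hypothetical skillbook.json contents; the real schema is not specified here.
SKILLBOOK_JSON = """
{
  "skills": [
    {"kind": "path_boost", "glob": "src/*", "factor": 2.0},
    {"kind": "path_penalty", "glob": "vendor/*", "factor": 2.0},
    {"kind": "keyword_synonym", "keyword": "auth", "synonyms": ["login", "session"]}
  ]
}
"""


def load_skills(text: str) -> List[Dict]:
    return json.loads(text)["skills"]


def expand_query(terms: List[str], skills: List[Dict]) -> List[str]:
    """Append synonyms for any term that has a keyword_synonym skill."""
    expanded = list(terms)
    for skill in skills:
        if skill["kind"] == "keyword_synonym" and skill["keyword"] in terms:
            expanded += [s for s in skill["synonyms"] if s not in expanded]
    return expanded


def rerank(scored: List[Tuple[str, float]], skills: List[Dict]) -> List[Tuple[str, float]]:
    """Multiply or divide each score by matching path skills, then sort."""
    adjusted = []
    for path, score in scored:
        for skill in skills:
            if skill["kind"] == "path_boost" and fnmatch(path, skill["glob"]):
                score *= skill["factor"]
            elif skill["kind"] == "path_penalty" and fnmatch(path, skill["glob"]):
                score /= skill["factor"]
        adjusted.append((path, score))
    # Deterministic order: highest score first, ties broken by path.
    return sorted(adjusted, key=lambda item: (-item[1], item[0]))
```

Keeping the application step a pure function of (results, skillbook) matches the split described above: the learner writes the file, and the ranking side only reads it.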
Wires the codegraph 0.7.10 backend into the single-session + multi-session launcher alongside codedb / fts5_tri / fts5_uni / lean-ctx. Uses `codegraph serve --mcp` as a long-lived stdio child and invokes `codegraph_search` as the default symbol-lookup tool, apples-to-apples with codedb_search.

New CLI flags:

    --codegraph-bin <path>   default: $(which codegraph)
    --skip-codegraph         skip the backend entirely
    --clean-codegraph        wipe matching .codegraph/ before indexing

Cold-index helper `codegraph_cold_index` invokes `codegraph init` then `codegraph index` and measures wall-clock + .codegraph/ on-disk size.

Smoke-tested codegraph-only on flask:

    cold build:   0.57 s, ~3.7 MB
    warm queries: 0.2–2 ms p50

(matches the bench numbers from the v0.2.5815 cross-corpus run committed in PR #483)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
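The cold-index measurement described above (run the indexing commands, then record wall-clock time and on-disk index size) can be sketched generically. This is not the `codegraph_cold_index` helper from shootout.py; `cold_index` and `dir_size_bytes` are names made up for the example, and the commands are passed in so that nothing about codegraph's argument handling is assumed.

```python
import subprocess
import time
from pathlib import Path
from typing import Sequence, Tuple


def dir_size_bytes(root: Path) -> int:
    """Total size of all regular files under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def cold_index(commands: Sequence[Sequence[str]], cwd: Path,
               index_dir: Path) -> Tuple[float, int]:
    """Run each command in order inside cwd; return (seconds, index bytes)."""
    start = time.perf_counter()
    for cmd in commands:
        # check=True aborts the measurement if any indexing step fails.
        subprocess.run(list(cmd), cwd=cwd, check=True)
    elapsed = time.perf_counter() - start
    size = dir_size_bytes(index_dir) if index_dir.exists() else 0
    return elapsed, size
```

For the codegraph backend the call would presumably look like `cold_index([["codegraph", "init"], ["codegraph", "index"]], corpus, corpus / ".codegraph")`, with any extra arguments those subcommands need added by the caller.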
Bumps semver to 0.2.5816 and consolidates two follow-up fixes from the v0.2.5815 cross-corpus eval:
- #484 feat(cli): add `codedb read` subcommand
- #485 fix(search): skip Tier 5 full-scan when trigram returned candidates

Measured impact (benchmarks/search-shootout, 20 warm iters):

    Suspense (regex, 0 hits)   2.82 ms → 0.14 ms    (20× faster)
    useState (regex) p99       16.57 ms → 1.67 ms   (10× p99)
    useState (flask)           0.66 ms → 0.18 ms    (3.7× faster)
    React queries: unchanged ±noise; hit counts identical

Recall preserved on every query. The trigram filter is a sound superset of files containing the substring, so widening the short-circuit only skips work destined to return 0 results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
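The speedup factors quoted in this commit follow directly from the before/after latencies. A short check, using only the numbers reported above (the dictionary layout and the `speedup` helper are just for illustration):

```python
# (before_ms, after_ms) pairs copied from the commit message above.
measurements = {
    "Suspense (regex, 0 hits)": (2.82, 0.14),
    "useState (regex) p99": (16.57, 1.67),
    "useState (flask)": (0.66, 0.18),
}


def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster the 'after' latency is than the 'before'."""
    return before_ms / after_ms


for name, (before, after) in measurements.items():
    print(f"{name}: {speedup(before, after):.1f}x")
# Suspense (regex, 0 hits): 20.1x
# useState (regex) p99: 9.9x
# useState (flask): 3.7x
```

The rounded figures agree with the 20×, 10×, and 3.7× stated in the commit.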
Benchmark Regression Report
Thresholds: 10.00% and 50,000 ns absolute delta
TL;DR
Rolls up 5 PRs into a single release bundle. Bumps `src/release_info.zig` 0.2.5815 → 0.2.5816, ships two perf/UX fixes from the v0.2.5815 cross-corpus eval, the supporting bench data, the canonical shootout.py update, and a design spec for ACE integration.

Bundled PRs (in merge order)
- feat(cli): add `codedb read` subcommand (#484): closes the agentic-eval CLI gap (the codedb agent had been forced to use 22 calls vs codegraph's 4 because the CLI had no read primitive)
- fix(search): skip Tier 5 full-scan when trigram returned candidates: the trigram filter is a sound SUPERSET of files containing the substring; if Tier 1 exhausted it with 0 results, Tier 5's full scan was destined to return 0 too
- bench(eval): v0.2.5815 cross-corpus head-to-head: 4 run reports + run.log persisted under `benchmarks/search-shootout/results/2026-05-21/`
- docs(design): ACE × codedb integration spec: design-only; sketches how codedb_context could grow a per-project Skillbook learned by an external loop, without absorbing ACE's reflection machinery
- bench(shootout): add codegraph backend to shootout.py: wires `codegraph serve --mcp` + `codegraph_search` into the multi-session launcher (5 backends now: codedb / fts5_tri / fts5_uni / lean-ctx / codegraph)

Measured impact (benchmarks/search-shootout, 20 warm iters)
Recall preserved on every query — hit counts identical to v0.2.5815 baseline.
New CLI surface
```
codedb [root] read <path>               # full file with line numbers
codedb [root] read -L FROM-TO <path>    # 1-indexed inclusive range
codedb [root] read -L FROM-end <path>   # to EOF
codedb [root] read --compact <path>     # strip comment + blank lines
```
New bench surface
```
python3 shootout.py --corpus <corpus> \
    --codegraph-bin $(which codegraph) \
    [--skip-codegraph] [--clean-codegraph]
# --codegraph-bin default: shutil.which("codegraph")
```
What's NOT in this release (deferred follow-ups)
Build verification
```
$ /tmp/codedb-fixes/zig-out/bin/codedb --version
codedb 0.2.5816
```
Test plan
🤖 Generated with Claude Code