Release v0.2.5817 — reader.md + perf + security · justrach/codedb

TL;DR

v0.2.5817 ships reader.md, a hash-stable, agent-authored codebase map that codedb auto-prepends to codedb_context responses. It also includes the perf + security bundle from v0.2.5816 (which was never tagged) and three new codedb_context enhancements (inline symbol bodies, a callers section, and a task-length gate).

Highlights vs the released v0.2.5815:

| Metric | v0.2.5815 | v0.2.5817 |
| --- | --- | --- |
| `Suspense` regex p50 | 2.82 ms | 0.18 ms (15.6× faster) |
| `useState` regex p99 | 16.57 ms | 2.04 ms (8.1× faster) |
| `codedb read` CLI | absent | present (with security guards) |
| Sensitive-file blocking | n/a | blocked (`.env`, `id_rsa`, `.ssh/*`) |
| `.codedb/reader.md` support | n/a | present (auto-prepend, hash-verified) |
| codedb_context inline bodies | no | yes (≤6 lines for ≤3 symbols) |
| codedb_context callers section | no | yes (top 6 non-test execution sites) |

End-to-end agent eval (Sonnet 4.6, n=3 per task) shows the v0.2.5817 binary cuts median tool calls on every task: T1 flask 5→4, T2 regex 13→7, T3 react 13→10.

New — reader.md auto-prepend

A .codedb/reader.md file (≤200 LOC of markdown, with a blake2b source_hash over up to 20 listed source files) is auto-prepended to every codedb_context response. When the listed source files drift from the declared hash, codedb emits a "regenerate" hint instead; when the file is missing or malformed, codedb stays silent.

Lifecycle:

agent calls codedb_context
       ↓
       codedb loads .codedb/reader.md
       ↓
       blake2b(sorted source_files) == declared_hash?
       ├─ yes → prepend body with `<!-- reader.md (hash-verified): -->` markers
       ├─ no  → prepend "stale, regenerate" hint
       └─ missing/malformed → silent
       ↓
       (existing composer output follows)

The agent regenerates reader.md (≤200 LOC budget, picks ≤10 source_files, computes blake2b) when it sees the stale signal. Codedb itself never writes the file.
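The lifecycle above can be sketched in Python. This is a minimal illustration under stated assumptions, not codedb's implementation: the bytes fed into blake2b (each sorted path, a NUL separator, then the file contents), the 32-byte digest size, the wording of the stale hint, and the function names `source_hash` and `reader_prefix` are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def source_hash(root: Path, source_files: list[str]) -> str:
    """Hash the listed files in sorted order.

    Assumed scheme: path + NUL + file contents, blake2b with a 32-byte digest.
    """
    h = hashlib.blake2b(digest_size=32)
    for rel in sorted(source_files):
        h.update(rel.encode("utf-8") + b"\0")
        h.update((root / rel).read_bytes())
    return h.hexdigest()


def reader_prefix(
    root: Path,
    declared_hash: Optional[str],
    source_files: list[str],
    body: str,
) -> str:
    """Return the text to prepend to a context response ('' means stay silent)."""
    if declared_hash is None:
        return ""  # missing or malformed reader.md: silent
    try:
        actual = source_hash(root, source_files)
    except OSError:
        actual = None  # a listed file vanished: treated as drift (assumption)
    if actual == declared_hash:
        return f"<!-- reader.md (hash-verified): -->\n{body}\n"
    return "<!-- reader.md is stale, regenerate -->\n"
```

Sorting the paths before hashing makes the digest independent of the order in which the agent lists source_files, which is one plausible reading of "hash-stable".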

Security guards (all close P1 review findings):

source_files rejects absolute paths and .. traversal — no reading /etc/passwd via a hostile reader.md
source_files capped at 20 entries — no 600-entry × 8 MB DoS on every context call
loc_actual capped at 240 — no 60 KB body bloat
Golden blake2b roundtrip test locks the algorithm against std-library drift
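The path and size guards listed above could look roughly like the following Python sketch. Only the two caps (20 entries, 240 lines) and the "no absolute paths, no .." rule come from the release notes; the function names, the drive-letter check, and the decision to reject the whole file on any violation are assumptions for illustration.

```python
from pathlib import PurePosixPath

MAX_SOURCE_FILES = 20  # cap stated in the release notes
MAX_LOC = 240          # cap stated in the release notes


def is_safe_relative(path: str) -> bool:
    """Accept only relative paths with no '..' component."""
    if not path or path.startswith(("/", "\\")):
        return False
    if len(path) > 1 and path[1] == ":":
        return False  # Windows drive prefix such as C:\ (extra check, assumption)
    parts = PurePosixPath(path.replace("\\", "/")).parts
    return ".." not in parts


def validate_reader(source_files: list[str], loc_actual: int) -> bool:
    """Return True only if every guard passes."""
    if len(source_files) > MAX_SOURCE_FILES:
        return False
    if loc_actual > MAX_LOC:
        return False
    return all(is_safe_relative(p) for p in source_files)
```

Checking the entry count before touching any path keeps the cost of rejecting an oversized list constant, which matters because the check would run on every context call.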

The runtime overhead when reader.md is missing is ~0.1 ms (one stat + early return). When present and valid, recomputing the hash on every call adds another ~0.1 ms on small source_files.

Task-length gate: reader.md prepend is skipped for tasks ≤80 chars (narrow lookups where the composer's keyword extractor already pinpoints the answer). This avoids the ~5 KB body overhead on tasks that don't need orientation.
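As a small illustration, the gate amounts to a single length comparison. The sketch below assumes the length is counted in characters of the task string as received; the constant and function names are hypothetical.

```python
TASK_LENGTH_GATE = 80  # threshold stated in the release notes


def should_prepend_reader(task: str) -> bool:
    """Prepend reader.md only for tasks longer than the gate."""
    return len(task) > TASK_LENGTH_GATE
```

A task exactly 80 characters long is still treated as a narrow lookup here, matching the "≤80 chars" wording.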

New — `codedb_context` symbol-body inline

When the ## Symbol definitions section has ≤3 entries, codedb inlines the first ~6 lines of each definition so the agent doesn't need a follow-up codedb_read:

## Symbol definitions
- before_request (function) — src/flask/sansio/scaffold.py:460
         460 |     def before_request(self, f: T_before_request) -> T_before_request:
         461 |         """Register a function to run before each request.
         462 |
         463 |         For example, this can be used to open a database connection, or
         464 |         to load the logged in user from the session.
         465 |
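A rough Python sketch of the inlining rule follows. It is an assumption-laden illustration: the tuple layout for symbols, the `read_lines` callback, and the exact column width of the line-number gutter are hypothetical; only the ≤3 symbols and ~6 lines limits come from the notes.

```python
from typing import Callable

MAX_INLINE_SYMBOLS = 3  # inline bodies only when there are at most 3 symbols
INLINE_LINES = 6        # number of leading lines shown per symbol
SEP = "\u2014"          # same separator character as in the sample output above

Symbol = tuple[str, str, str, int]  # (name, kind, path, 1-based start line)


def render_symbols(symbols: list[Symbol], read_lines: Callable[[str], list[str]]) -> str:
    """Render the section, inlining the first lines of each body when the list is short."""
    out = ["## Symbol definitions"]
    inline = len(symbols) <= MAX_INLINE_SYMBOLS
    for name, kind, path, start in symbols:
        out.append(f"- {name} ({kind}) {SEP} {path}:{start}")
        if not inline:
            continue
        lines = read_lines(path)
        # Clamp at end of file so short files do not raise IndexError.
        for n in range(start, min(start + INLINE_LINES, len(lines) + 1)):
            out.append(f"{n:>12} | {lines[n - 1]}")
    return "\n".join(out)
```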

New — `codedb_context` callers section

For each of up to 3 symbol definitions, codedb surfaces up to 2 call sites that are not the definition itself, not in tests, and not imports, each with its enclosing scope:

## Callers (top non-test, non-import usages of these symbols)
- src/flask/app.py:1369: ... :attr:`before_request_funcs`
  [in preprocess_request (function, L1366-L1392)]

This is the execution site the agent would otherwise have needed a follow-up call to find; it is now resolved in the first response.
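The filtering described in this section can be sketched as below. The heuristics for recognising test paths and import lines, the `Hit` record, and the function names are assumptions made for illustration; the notes only state the three exclusions and the limit of 2 sites per symbol.

```python
from dataclasses import dataclass

MAX_CALLERS_PER_SYMBOL = 2  # limit stated in the release notes


@dataclass
class Hit:
    path: str
    line: int
    text: str


def is_test_path(path: str) -> bool:
    """Heuristic (assumption): a 'test'/'tests' directory or a test_ prefix."""
    parts = path.lower().split("/")
    return any(p in ("test", "tests") or p.startswith("test_") for p in parts)


def is_import_line(text: str) -> bool:
    """Heuristic (assumption): Python-style import statements only."""
    return text.lstrip().startswith(("import ", "from "))


def pick_callers(hits: list[Hit], def_path: str, def_line: int) -> list[Hit]:
    """Keep the first usages that are not the definition, a test, or an import."""
    out: list[Hit] = []
    for h in hits:
        if (h.path, h.line) == (def_path, def_line):
            continue
        if is_test_path(h.path) or is_import_line(h.text):
            continue
        out.append(h)
        if len(out) == MAX_CALLERS_PER_SYMBOL:
            break
    return out
```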

Bundled from v0.2.5816 (never tagged)

PR #484 — codedb read <path> CLI subcommand (full file, -L FROM-TO, --compact)
- P1 security: isPathSafe + watcher.isSensitivePath guards
- P2 correctness: opens project root, not cwd
PR #485 — fix(search): skip Tier 5 full-scan when trigram returned candidates
- Suspense regex query: 2.82 ms → 0.18 ms (15.6× faster)
- useState regex p99: 16.57 ms → 2.04 ms (8.1× faster)
- No recall regression — trigram filter is a sound superset
PR #487 — shootout.py codegraph backend (multi-session bench against codegraph 0.7.10)
PR #486 — ACE × codedb integration spec (design only)
PR #483 — v0.2.5815 cross-corpus bench data
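The PR #485 change rests on the trigram filter being a sound superset: any file that contains a literal must contain every trigram of that literal, so intersecting the posting lists can only drop files that cannot match. The Python sketch below illustrates that idea on an in-memory corpus. It is not codedb's search code; the index layout, the explicit `required_literal` argument (a literal the regex is known to require), and the function names are assumptions.

```python
import re


def trigrams(s: str) -> set[str]:
    """All 3-character substrings of s (empty for strings shorter than 3)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each trigram to the set of paths whose text contains it."""
    index: dict[str, set[str]] = {}
    for path, text in files.items():
        for t in trigrams(text):
            index.setdefault(t, set()).add(path)
    return index


def search(
    files: dict[str, str],
    index: dict[str, set[str]],
    pattern: str,
    required_literal: str,
) -> tuple[list[str], bool]:
    """Return (matching paths, whether the full scan was used)."""
    grams = trigrams(required_literal)
    candidates = None
    if grams:
        candidates = set(files)
        for t in grams:
            candidates &= index.get(t, set())
    if candidates:
        scope, full_scan = candidates, False  # skip the full scan
    else:
        scope, full_scan = set(files), True   # no usable candidates: scan everything
    rx = re.compile(pattern)
    return sorted(p for p in scope if rx.search(files[p])), full_scan
```

The regex still runs on every candidate, so the index only narrows the search scope and never decides a match by itself.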

Measured impact (Sonnet 4.6 sub-agents, n=3 each, vs v0.2.5815)

| Task | main median calls | exp median calls | Δ |
| --- | --- | --- | --- |
| T1 flask "find before_request decorator" (28 chars) | 5 | 4 | −1 ✓ |
| T2 regex "where is pattern compiled" (235 chars) | 13 | 7 | −6 ✓ |
| T3 react "passive effects flush" (230 chars) | 13 | 10 | −3 ✓ |

9/9 runs across the matrix returned correct answers.

Notarization & verification

All three binaries built locally on Apple Silicon. macOS binaries signed with Developer ID Application: Rachit Pradhan (WWP9DLJ27P) + hardened runtime + secure timestamp, notarized via Apple notary service.

| binary | notary submission | gatekeeper |
| --- | --- | --- |
| `codedb-darwin-arm64` | `576628b8-4f16-4a09-9e7b-917f51664033` (Accepted) | `accepted, source=Notarized Developer ID` |
| `codedb-darwin-x86_64` | `5f763d62-01c2-4245-9e6c-cc37cceec996` (Accepted) | `accepted, source=Notarized Developer ID` |
| `codedb-linux-x86_64` | n/a (statically linked, ~13 MB) | smoke-tested via emulated docker linux/amd64; `codedb --version` + tree command both green |

Verify the macOS download:

shasum -a 256 codedb-darwin-arm64
# expected: dea15a25a088f3b05d620e7a119377d09703c4e73512e35479819542c6c763c6

spctl -a -vv -t install codedb-darwin-arm64
# expected: accepted, source=Notarized Developer ID

Full SHA-256 list in checksums.sha256.

What's deferred (not blockers)

Critical-review pass from a Sonnet 4.6 sub-agent identified 11 issues. The 2 P1 (security) and 2 P2 (correctness) issues are closed in this release. P2/P3 follow-ups for the next cycle:

I04 schema_version parsed but not validated (cosmetic — only matters at format v2)
I05 reader.md not cached across calls (~0.1 ms per call; matters at scale)
I06 codedb_status doesn't surface reader.md state (small ergonomic gap)
I09 stale hint doesn't include the previous source_files list
I10 concurrent-write last-write-wins not documented
I11 cost-benefit gate for shallow workloads (partial — task-length gate handles the codedb_context side)

Full changelog: v0.2.5815…v0.2.5817
