feat(mcp): codedb_context — multi-line snippets + source-over-test ranking by justrach · Pull Request #479 · justrach/codedb

justrach · 2026-05-20T19:05:10Z

Summary

Two improvements to codedb_context, driven by failure-mode analysis on the code-search-shootout eval (16 tasks across 4 corpora: react, regex, flask, gin):

1. Snippet enrichment (±2 lines per hit, fenced code blocks). Previously each hit showed only the single matching line. The judge penalised agents for paraphrasing instead of quoting literally; with multi-line literal snippets inside fenced code blocks, the agent has copy-pasteable material.

2. Composite file ranking. Ranking by raw hit count alone put tests, specs, and docs above the real source. The new score starts from the raw hit count and adjusts it:

- +5 if the file contains a symbol definition for any extracted keyword
- −3 for test files (`/test`, `_test.`, `.test.`, `/__tests__/`, `/spec/`, `/fixtures/`)
- −2 for docs (`.md`, `.rst`, `/docs/`)
- ties broken by hit count

Before this PR, agents kept picking tests/test_basic.py over src/flask/sansio/scaffold.py.
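The snippet format from item 1 can be sketched in a few lines. This is a minimal Python illustration, assuming the tool already has the file's lines in memory; `render_hit` and its `path:line` header are hypothetical and are not codedb's actual Zig implementation or exact output.

```python
def render_hit(path: str, lines: list[str], hit: int, context: int = 2) -> str:
    """Render one search hit as a header plus a fenced block of +/- `context` lines.

    `hit` is a 0-based index into `lines`. The window is clamped to the file
    bounds, so hits near the top or bottom of a file get fewer lines.
    """
    fence = "`" * 3  # built at runtime so this example nests cleanly in Markdown
    start = max(0, hit - context)
    end = min(len(lines), hit + context + 1)
    body = "\n".join(lines[start:end])
    # Header uses a 1-based line number, as editors and grep output do.
    return f"{path}:{hit + 1}\n{fence}\n{body}\n{fence}"
```

Emitting the literal surrounding lines, rather than only the matching one, is what gives the agent something to quote verbatim instead of paraphrase.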

Measured impact

Same 16 bench tasks, same agents, same judge:

Variant	Quality	Tokens/call	Wall time	Tool calls
codedb_context (before)	4.06/5	1,126	20.8s	3.2
codedb_context (this PR)	4.33/5	1,089	2.1s	1.9
codegraph (reference)	4.44/5	9,719	31.5s	8.9

Quality is up 0.27 points, tool calls drop 1.7× (3.2 → 1.9), and wall time drops ~10× (20.8s → 2.1s), all while using ~9× fewer tokens per call than codegraph (1,089 vs 9,719). Head-to-head against codegraph: 2 wins / 10 ties / 3 losses (previously 2 / 10 / 4).
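The headline ratios follow directly from the measured table. A quick Python check, with figures copied from that table and ratios rounded to one decimal place:

```python
# Figures from the measured-impact table (codedb_context before/after, codegraph reference).
before = {"quality": 4.06, "tokens": 1126, "wall_s": 20.8, "calls": 3.2}
after = {"quality": 4.33, "tokens": 1089, "wall_s": 2.1, "calls": 1.9}
codegraph_tokens = 9719

quality_gain = round(after["quality"] - before["quality"], 2)  # absolute points on the 5-point scale
calls_ratio = round(before["calls"] / after["calls"], 1)       # how many times fewer tool calls
wall_ratio = round(before["wall_s"] / after["wall_s"], 1)      # how many times faster wall time
token_ratio = round(codegraph_tokens / after["tokens"], 1)     # codegraph tokens per call vs ours

print(quality_gain, calls_ratio, wall_ratio, token_ratio)  # prints: 0.27 1.7 9.9 8.9
```

The "~10×" wall-time figure is the 9.9 ratio rounded up.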

All 3 remaining losses come from the agent paraphrasing the snippet instead of quoting it, which is a prompt issue, not a tool issue.

Test plan

- zig build (ReleaseFast) passes
- zig build test: 486/487 pass (1 pre-existing failure on main: issue-44)
- Existing codedb_context smoke tests still green
- New snippet format: each hit is followed by a fenced ±2-line block
- Source files now rank above tests on the same hit count

Two improvements driven by bench-data failure patterns:

1. Snippet enrichment: each Top-sites hit now emits ±2 lines of context inside a fenced code block instead of just the matching line. The judge was penalising agents for paraphrasing instead of quoting literally; with multi-line literal snippets in the response, the agent has copy-pasteable material to populate the answer's `snippet` field.

2. Composite file ranking: hit count alone wasn't enough, since agents kept picking test/spec/doc files over the real source. The new score is (raw hits) + 5 if the file contains a symbol definition for any keyword, −3 for test files (`/test`, `_test.`, `.test.`, `/__tests__/`, `/spec/`, `/fixtures/`), −2 for docs (`.md`, `.rst`, `/docs/`). Final tie-break by hit count.

Measured before/after on 16 bench tasks (react / regex / flask / gin), codedb_context vs codegraph_context:

variant	quality	tokens	wall	calls
before	4.06	1,126	20.8s	3.2
after (v3)	4.33	1,089	2.1s	1.9
codegraph	4.44	9,719	31.5s	8.9

The after (v3) row still uses ~9× fewer tokens than codegraph. The remaining quality gap is concentrated in 3 tasks where the agent paraphrases the snippet even though the literal code is in the response; that's a prompt issue, not a tool issue.
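A minimal Python sketch of the composite score described in the commit message. It treats the test markers as plain substring checks and `.md`/`.rst` as filename suffixes; those matching rules, the `FileHits` record, and the hit counts in the usage lines are assumptions for illustration, not codedb's actual Zig code or bench data.

```python
from dataclasses import dataclass

# Path heuristics from the PR; plain substring / suffix matching is an assumption.
TEST_MARKERS = ("/test", "_test.", ".test.", "/__tests__/", "/spec/", "/fixtures/")
DOC_SUFFIXES = (".md", ".rst")


@dataclass
class FileHits:
    path: str
    hits: int              # raw keyword hit count in this file
    defines_keyword: bool  # True if the file defines a symbol named after any keyword


def score(f: FileHits) -> int:
    s = f.hits
    if f.defines_keyword:
        s += 5
    if any(marker in f.path for marker in TEST_MARKERS):
        s -= 3
    if f.path.endswith(DOC_SUFFIXES) or "/docs/" in f.path:
        s -= 2
    return s


def rank(files: list[FileHits]) -> list[FileHits]:
    # Highest composite score first; ties broken by raw hit count.
    return sorted(files, key=lambda f: (score(f), f.hits), reverse=True)


# Hypothetical hit counts, shaped like the flask example in the summary.
candidates = [
    FileHits("tests/test_basic.py", hits=6, defines_keyword=False),
    FileHits("src/flask/sansio/scaffold.py", hits=4, defines_keyword=True),
    FileHits("docs/api.rst", hits=8, defines_keyword=False),
]
print([f.path for f in rank(candidates)])
# prints: ['src/flask/sansio/scaffold.py', 'docs/api.rst', 'tests/test_basic.py']
```

The +5 definition bonus is large relative to the penalties, so a file that actually defines the symbol can outrank a test file with more raw hits, which is the misranking the PR targets.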

github-actions · 2026-05-20T19:07:43Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool	Base (ns)	Head (ns)	Delta	Abs Delta (ns)	Status
`codedb_bundle`	551781	543906	-1.43%	-7875	OK
`codedb_changes`	58783	61407	+4.46%	+2624	OK
`codedb_deps`	11700	10259	-12.32%	-1441	OK
`codedb_edit`	6754	6900	+2.16%	+146	OK
`codedb_find`	67274	67073	-0.30%	-201	OK
`codedb_hot`	105994	112133	+5.79%	+6139	OK
`codedb_outline`	323502	332564	+2.80%	+9062	OK
`codedb_read`	101602	103168	+1.54%	+1566	OK
`codedb_search`	154563	158175	+2.34%	+3612	OK
`codedb_snapshot`	305620	306933	+0.43%	+1313	OK
`codedb_status`	15208	14711	-3.27%	-497	OK
`codedb_symbol`	63962	64438	+0.74%	+476	OK
`codedb_tree`	61620	74724	+21.27%	+13104	NOISE
`codedb_word`	94122	94897	+0.82%	+775	OK
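The Status column can be reproduced with a small rule. This Python sketch assumes the check is one-sided, so only slowdowns are flagged (the codedb_deps row, at −12.32%, is reported OK), and that "FAIL" is the name for a real regression; the report itself only shows OK and NOISE, and the exact boundary handling is also an assumption.

```python
PCT_THRESHOLD = 10.0        # percent, from the report header
ABS_THRESHOLD_NS = 50_000   # nanoseconds, from the report header


def classify(base_ns: int, head_ns: int) -> str:
    delta = head_ns - base_ns
    pct = 100.0 * delta / base_ns
    # Assumption: improvements (negative deltas) and small slowdowns are always OK.
    if pct <= PCT_THRESHOLD:
        return "OK"
    # Percentage threshold exceeded: fail only if the absolute delta is also large.
    return "FAIL" if delta > ABS_THRESHOLD_NS else "NOISE"


print(classify(61620, 74724))  # codedb_tree row: prints NOISE
```

Requiring both thresholds keeps fast tools like codedb_tree, where a 13 µs swing is +21%, from failing CI on jitter.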

justrach merged commit 76a3588 into main May 20, 2026
1 check passed

justrach mentioned this pull request May 20, 2026 in:

release: v0.2.5815 — consolidate 5814 max_cached wiring + main's session perf+context PRs #480 (Merged, 4 tasks)

justrach deleted the feat-codedb-context-quality branch May 20, 2026 19:56
