feat(mcp): codedb_context — multi-line snippets + source-over-test ranking#479

Merged: justrach merged 1 commit into main from feat-codedb-context-quality on May 20, 2026

@justrach
Owner

Summary

Two improvements to codedb_context driven by failure-mode analysis on the code-search-shootout eval (16 tasks × 4 corpora):

1. Snippet enrichment (±2 lines per hit, in fenced code blocks). The old format returned only the single matching line per hit. The judge was penalising agents for paraphrasing instead of quoting literally; with multi-line literal snippets inside fenced code blocks, the agent has copy-pasteable material.

2. Composite file ranking. Ranking by raw hit count alone placed tests, specs, and docs above real source. The new score is the raw hit count, adjusted as follows:

  • +5 if the file contains a symbol definition for any extracted keyword
  • −3 for test files (/test, _test., .test., /__tests__/, /spec/, /fixtures/)
  • −2 for docs (.md, .rst, /docs/)
  • tie-break by hit count

Before this PR, agents kept picking tests/test_basic.py over src/flask/sansio/scaffold.py.
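
The scoring rules above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual implementation in this PR: `FileHits`, `composite_score`, and `rank` are hypothetical names, the hit counts in the usage note are made up, and it assumes the base score is the raw hit count, that `.md` / `.rst` are matched as file extensions, and that paths are relative to the repo root.

```python
from dataclasses import dataclass

# Path markers taken from the PR description.
TEST_MARKERS = ("/test", "_test.", ".test.", "/__tests__/", "/spec/", "/fixtures/")
DOC_SUFFIXES = (".md", ".rst")


@dataclass
class FileHits:
    path: str              # repo-relative path (assumption)
    hits: int              # raw keyword hit count
    defines_keyword: bool  # file holds a symbol definition for some keyword


def composite_score(f: FileHits) -> int:
    # Assumption: the score starts from the raw hit count.
    score = f.hits
    if f.defines_keyword:
        score += 5
    # Prefix a slash so a top-level "tests/..." path still matches "/test".
    probe = "/" + f.path
    if any(marker in probe for marker in TEST_MARKERS):
        score -= 3
    if probe.endswith(DOC_SUFFIXES) or "/docs/" in probe:
        score -= 2
    return score


def rank(files: list[FileHits]) -> list[FileHits]:
    # Highest composite score first; ties are broken by raw hit count.
    return sorted(files, key=lambda f: (composite_score(f), f.hits), reverse=True)
```

With hypothetical counts of 6 hits in `tests/test_basic.py` (no definition) and 4 hits in `src/flask/sansio/scaffold.py` (contains a definition), the test file scores 6 − 3 = 3 and the source file scores 4 + 5 = 9, so the source file ranks first even though it has fewer raw hits.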

Measured impact

Same 16 bench tasks, same agents, same judge:

| codedb_context | quality | tokens/call | wall | tool calls |
|---|---|---|---|---|
| before | 4.06/5 | 1,126 | 20.8s | 3.2 |
| after (this PR) | 4.33/5 | 1,089 | 2.1s | 1.9 |
| codegraph reference | 4.44/5 | 9,719 | 31.5s | 8.9 |

Quality is up by 0.27, tool calls are down 1.7×, and wall time is down 10×, all while staying ~9× under codegraph on tokens. Head-to-head: 2 wins / 10 ties / 3 losses against codegraph (was 2/10/4).

The 3 remaining losses are all "agent paraphrased the snippet" — a prompt issue, not a tool issue.

Test plan

  • zig build (ReleaseFast) passes
  • zig build test — 486/487 pass (1 pre-existing failure on main: issue-44)
  • Existing codedb_context smoke tests still green
  • New snippet format: each hit followed by a fenced ±2-line block
  • Source files now rank above tests on the same hit count
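
The snippet format checked in the test plan can be illustrated with a small sketch. This is Python for illustration only, not the code in this PR: `render_hit` is a hypothetical helper, and the `path:line` header line is an assumed layout. Only the ±2 lines of context and the fenced block come from the description above.

```python
def render_hit(path: str, lines: list[str], hit_line: int, context: int = 2) -> str:
    """Render one hit as a header plus a fenced block of surrounding lines.

    hit_line is 1-based; the window is clamped at the start and end of the file.
    """
    start = max(1, hit_line - context)
    end = min(len(lines), hit_line + context)
    body = "\n".join(lines[start - 1:end])
    fence = "`" * 3  # built at runtime so this example does not close its own fence
    # Assumption: the header is "path:line"; the real output layout may differ.
    return f"{path}:{hit_line}\n{fence}\n{body}\n{fence}"
```

A hit on line 4 of a seven-line file yields lines 2 through 6 inside the fence; a hit on line 1 yields only lines 1 through 3, because the window is clamped at the file start.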

Two improvements driven by bench-data failure patterns:

1. Snippet enrichment: each Top-sites hit now emits ±2 lines of context
   inside a ```fenced code block``` instead of just the matching line.
   The judge was penalising agents for paraphrasing instead of literal
   quoting; with multi-line literal snippets in the response, the agent
   has copy-pasteable material to populate the answer's `snippet` field.

2. Composite file ranking: hits alone weren't enough — agents kept
   picking test/spec/doc files over the real source. New score is
   (raw hits) + 5 if file contains a symbol definition for any keyword,
   −3 for test files (`/test`, `_test.`, `.test.`, `/__tests__/`, `/spec/`,
   `/fixtures/`), −2 for docs (`.md`, `.rst`, `/docs/`). Final tie-break
   by hit count.

Measured before/after on 16 bench tasks (react / regex / flask / gin),
codedb_context vs codegraph_context:
              quality   tokens   wall    calls
  before        4.06     1,126   20.8s   3.2
  after (v3)    4.33     1,089    2.1s   1.9   (still ~9× fewer tokens than codegraph)
  codegraph     4.44     9,719   31.5s   8.9

The remaining quality gap is concentrated in 3 tasks where the agent
paraphrases the snippet even though the literal code is in the response;
that's a prompt issue, not a tool issue.
@github-actions

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
|---|---|---|---|---|---|
| codedb_bundle | 551781 | 543906 | -1.43% | -7875 | OK |
| codedb_changes | 58783 | 61407 | +4.46% | +2624 | OK |
| codedb_deps | 11700 | 10259 | -12.32% | -1441 | OK |
| codedb_edit | 6754 | 6900 | +2.16% | +146 | OK |
| codedb_find | 67274 | 67073 | -0.30% | -201 | OK |
| codedb_hot | 105994 | 112133 | +5.79% | +6139 | OK |
| codedb_outline | 323502 | 332564 | +2.80% | +9062 | OK |
| codedb_read | 101602 | 103168 | +1.54% | +1566 | OK |
| codedb_search | 154563 | 158175 | +2.34% | +3612 | OK |
| codedb_snapshot | 305620 | 306933 | +0.43% | +1313 | OK |
| codedb_status | 15208 | 14711 | -3.27% | -497 | OK |
| codedb_symbol | 63962 | 64438 | +0.74% | +476 | OK |
| codedb_tree | 61620 | 74724 | +21.27% | +13104 | NOISE |
| codedb_word | 94122 | 94897 | +0.82% | +775 | OK |
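
The status column is consistent with a two-threshold rule along the following lines. This is a Python sketch under stated assumptions, not the CI script itself: `classify` and the `FAIL` label are hypothetical, and the sketch assumes only slowdowns are flagged, which matches `codedb_deps` (-12.32%) being reported as OK. Behaviour exactly at the thresholds is a guess.

```python
PCT_THRESHOLD = 10.0       # percent, from the report header
ABS_THRESHOLD_NS = 50_000  # nanoseconds, from the report header


def classify(base_ns: int, head_ns: int) -> str:
    delta_ns = head_ns - base_ns
    pct = 100.0 * delta_ns / base_ns
    # Assumption: improvements (negative deltas) are never flagged.
    if pct <= PCT_THRESHOLD:
        return "OK"
    # Percentage threshold exceeded, but the absolute change is too small to fail CI.
    if delta_ns < ABS_THRESHOLD_NS:
        return "NOISE"
    return "FAIL"  # hypothetical label; the report above shows no failing row
```

For `codedb_tree`, the delta is 13,104 ns (+21.27%), which exceeds the percentage threshold but stays under the 50,000 ns absolute threshold, so the row is marked NOISE.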

@justrach justrach merged commit 76a3588 into main May 20, 2026
1 check passed
@justrach justrach deleted the feat-codedb-context-quality branch May 20, 2026 19:56