feat(mcp): codedb_context — task-shaped composer (1 call replaces 3-5) by justrach · Pull Request #477 · justrach/codedb

justrach · 2026-05-20T17:10:09Z

Summary

Adds a new MCP tool codedb_context that takes a natural-language task and returns ONE composite block:

Keywords used — what the composer extracted from the task
Symbol definitions — findAllSymbols hits for each keyword
Most-relevant files — ranked by total content-match count across keywords
Top sites — per file, up to 3 file:line snippet lines

This is the codegraph-style "first-touch" tool — one call to orient on an unfamiliar task — but it composes only the engine primitives codedb already has, no new index structures.
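For orientation, here is a minimal sketch of what a single call might look like on the wire. The `tools/call` envelope follows the MCP JSON-RPC convention; the argument key `task` is an assumption (the PR does not show the tool's input schema), and `build_context_request` is a hypothetical helper, not part of codedb.

```python
import json


def build_context_request(task: str, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request for codedb_context.

    Assumption: the tool takes its input under the key "task".
    Check the tool's published input schema before relying on this.
    """
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "codedb_context",
            "arguments": {"task": task},  # "task" is a guessed key name
        },
    }
    return json.dumps(request)


if __name__ == "__main__":
    print(build_context_request("Find where getNextLanes is defined"))
```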

Identifier extraction heuristic

Admits:

snake_case (any underscore): find_iter, pair_freq
all-caps acronyms, 3-8 chars: API, TODO, URL
camelCase/PascalCase with internal lower→upper transition: useEffect, getNextLanes, scheduleUpdateOnFiber

Rejects sentence-leading capitalized English words (Find, React, Want, Compare) that incidentally start with a capital but have no internal transition. Also accepts "…" and `…` quoted strings literally; apostrophes are deliberately NOT treated as quotes, so a contraction such as "it's" doesn't open a fake quote.

Capped at 5 keywords / 20 content hits per keyword / 5 ranked files / 3 snippets per file — bounded output, no pathological responses.
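The admit/reject rules above can be sketched roughly as follows. This is a Python illustration, not the Zig code in the PR: the regexes, the helper names (`is_identifier_like`, `extract_keywords`), the letters-only reading of "all-caps", and the choice to collect quoted strings before bare words are all assumptions. Only the rules themselves and the 5-keyword cap come from the description.

```python
import re

MAX_KEYWORDS = 5  # cap stated in the PR description

# Double-quoted or backtick-quoted spans; apostrophes are intentionally absent.
QUOTED = re.compile(r'"([^"]+)"|`([^`]+)`')
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
LOWER_THEN_UPPER = re.compile(r"[a-z][A-Z]")


def is_identifier_like(word: str) -> bool:
    """Return True if a bare word looks like a code identifier."""
    if "_" in word and word.strip("_"):
        return True  # snake_case: any underscore
    if word.isalpha() and word.isupper() and 3 <= len(word) <= 8:
        return True  # all-caps acronym, 3-8 chars
    # camelCase / PascalCase: needs an internal lower->upper transition,
    # which rejects plain capitalized words such as "Find" or "React".
    return LOWER_THEN_UPPER.search(word) is not None


def extract_keywords(task: str) -> list[str]:
    """Collect up to MAX_KEYWORDS keywords, without duplicates."""
    keywords: list[str] = []

    def add(candidate: str) -> None:
        if candidate not in keywords and len(keywords) < MAX_KEYWORDS:
            keywords.append(candidate)

    # Quoted strings are taken literally (ordering is an assumption).
    for match in QUOTED.finditer(task):
        add(match.group(1) or match.group(2))

    # Scan the remaining text for identifier-shaped words.
    remainder = QUOTED.sub(" ", task)
    for word in WORD.findall(remainder):
        if is_identifier_like(word):
            add(word)
    return keywords


if __name__ == "__main__":
    print(extract_keywords("Find where getNextLanes is used in React; it's slow"))
```

Because the apostrophe is never a quote character, "it's" simply splits into two short words, neither of which passes the identifier test.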

Measured tokens on the 4 react bench tasks

| task | keywords extracted | tokens |
| --- | --- | --- |
| T0_getNextLanes | `getNextLanes` | 347 |
| T1_setState_trace | `setState`, `enqueueUpdate`, `scheduleUpdateOnFiber` | 689 |
| T2_snapshot_flag_sites | `renderRootSync`, `CompleteWork` | 353 |
| T3_compare | `ensureRootIsScheduled`, `scheduleUpdateOnFiber` | 629 |

Average: ≈505 tokens/call. For comparison, codegraph_context on the same corpus averaged ≈1050 tokens/call in the §17 shootout; codedb_search after PR #476 averages ≈645 tokens/call but returns a single flat search rather than a composed bundle.
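The average can be rechecked from the four measurements in the table. The snippet below only redoes that arithmetic and expresses it relative to the two comparison figures quoted above; nothing here is a new measurement.

```python
# Token counts copied from the table above.
tokens = {
    "T0_getNextLanes": 347,
    "T1_setState_trace": 689,
    "T2_snapshot_flag_sites": 353,
    "T3_compare": 629,
}

average = sum(tokens.values()) / len(tokens)  # 2018 / 4 = 504.5

print(f"average: {average} tokens/call")
print(f"relative to codegraph_context (~1050): {average / 1050:.0%}")
print(f"relative to codedb_search (~645): {average / 645:.0%}")
```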

When to pick each tool

codedb_context — first call on an unfamiliar task (composes search + symbols + ranking).
codedb_search — narrow follow-up: "more matches for that one substring".
codedb_symbol — exact-name definition lookup.
codedb_callers — call-site walk after you know the symbol.

The composer doesn't replace the focused tools; it replaces the 3-5 chained calls an agent would otherwise issue to bootstrap context.

Test plan

zig build (ReleaseFast) passes
zig build test — 486/487 pass (1 pre-existing failure on main: issue-44)
All 4 react bench tasks: keywords extracted correctly, top file in answer
Empty / too-short / too-long task strings rejected with clear error
No regression on existing tools (codedb_search, codedb_word, etc.) — only adds an enum value + dispatch case + handler fn

Adds a new MCP tool that takes a natural-language task and returns ONE tight bundle: extracted keywords + symbol definitions + ranked files + top file:line snippets per file. Targets feature parity with codegraph_context (the codegraph project's killer first-touch tool) on per-call token economy.

How it works:

- Extract candidate identifiers from the task string. Heuristic admits snake_case (any underscore), all-caps acronyms 3-8 chars (API, TODO), and camelCase/PascalCase with an internal lower→upper transition (useEffect, getNextLanes). Rejects sentence-leading English words ("Find", "React", "Want") that just happen to be capitalized.
- For each candidate (capped at 5): one findAllSymbols + one searchContent call, both into a function-scoped arena so we don't track per-result frees. Aggregate hits per file.
- Render: keywords block, symbol definitions block, top 5 files ranked by total match count, then 3 file:line snippets per file.

Measured on 4 react bench tasks (T0-T3):

T0_getNextLanes 347 tok → correctly extracts `getNextLanes`, top file ReactFiberRootScheduler.js
T1_setState_trace 689 tok → extracts setState, enqueueUpdate, scheduleUpdateOnFiber
T2_snapshot_flag_sites 353 tok → renderRootSync + CompleteWork, top file ReactFiberWorkLoop.js (11 hits)
T3_compare 629 tok → ensureRootIsScheduled + scheduleUpdateOnFiber
------
AVG ~505 tokens/call vs ~1050 tokens/call observed for codegraph_context on the same corpus; vs ~645 tokens/call for codedb_search default (single-keyword, no symbol defs, no ranking).

For narrow follow-ups, codedb_search and codedb_symbol still win on focus. codedb_context is the first-touch tool for unfamiliar territory.
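The aggregate-and-rank step can be illustrated with a small Python sketch over an in-memory corpus. It is not the Zig handler: `search_content` is a hypothetical substring-matching stand-in for the engine's searchContent, the symbol definitions block is omitted, and the tie-break by path and the de-duplication of snippet lines are assumptions. The caps (20 hits per keyword, 5 files, 3 snippets per file) are the ones stated in the PR.

```python
from collections import defaultdict

MAX_HITS_PER_KEYWORD = 20
MAX_FILES = 5
MAX_SNIPPETS_PER_FILE = 3


def search_content(corpus: dict[str, str], keyword: str) -> list[tuple[str, int, str]]:
    """Hypothetical stand-in for searchContent: capped substring hits."""
    hits = []
    for path in sorted(corpus):
        for line_no, line in enumerate(corpus[path].splitlines(), start=1):
            if keyword in line:
                hits.append((path, line_no, line.strip()))
                if len(hits) == MAX_HITS_PER_KEYWORD:
                    return hits
    return hits


def compose_context(corpus: dict[str, str], keywords: list[str]) -> list[str]:
    """Build the keywords block, ranked files, and per-file snippets."""
    hits_by_file: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for keyword in keywords:
        for path, line_no, text in search_content(corpus, keyword):
            hits_by_file[path].append((line_no, text))

    # Rank by total match count across keywords; ties broken by path.
    ranked = sorted(hits_by_file, key=lambda p: (-len(hits_by_file[p]), p))

    lines = ["keywords: " + ", ".join(keywords)]
    for path in ranked[:MAX_FILES]:
        lines.append(f"{path} ({len(hits_by_file[path])} hits)")
        # Show each matching line once, in file order.
        for line_no, text in sorted(set(hits_by_file[path]))[:MAX_SNIPPETS_PER_FILE]:
            lines.append(f"  {path}:{line_no}: {text}")
    return lines


if __name__ == "__main__":
    demo = {
        "a.js": "function foo() {}\nfoo();\nfoo();\n",
        "b.js": "foo();\nbar();\n",
    }
    print("\n".join(compose_context(demo, ["foo", "bar"])))
```

Bounding every stage is what keeps the output size predictable regardless of how common a keyword is in the corpus.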

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 68473734b6

chatgpt-codex-connector · 2026-05-20T17:12:51Z

        .codedb_query => handleQuery(alloc, args, out, ctx.explorer, ctx.store),
        .codedb_glob => handleGlob(alloc, args, out, ctx.explorer),
        .codedb_ls => handleLs(alloc, args, out, ctx.explorer),
+        .codedb_context => handleContext(alloc, args, out, ctx.explorer),


Gate codedb_context on scan readiness

The new codedb_context path is dispatched here, but it is not included in toolDependsOnScannedIndex, so it skips the startup wait and also skips the incomplete-results hint logic. In the common "call immediately after server start" case, this tool can return partial/empty ranked files and snippets while presenting them as final output, which is a correctness regression compared with other index-backed tools. Please include codedb_context in the scan-dependent tool set so behavior matches codedb_search/codedb_symbol during initial indexing.
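The requested gating can be sketched as follows. This is a Python illustration of the idea only: the real dispatch is Zig, and apart from the name `toolDependsOnScannedIndex` and the tools named in the comment above, everything here (the `threading.Event`, the timeout, the hint text) is an assumption.

```python
import threading
from typing import Callable

# Tools whose answers depend on the initial scan. codedb_search and
# codedb_symbol are named in the review; adding codedb_context is the fix.
SCAN_DEPENDENT_TOOLS = {"codedb_search", "codedb_symbol", "codedb_context"}

# Hypothetical wording; the real hint text is not shown in the PR.
INCOMPLETE_HINT = "\n[note: index scan still in progress; results may be incomplete]"


def tool_depends_on_scanned_index(tool: str) -> bool:
    return tool in SCAN_DEPENDENT_TOOLS


def dispatch(tool: str, handler: Callable[[], str],
             scan_done: threading.Event, wait_seconds: float = 2.0) -> str:
    """Run a tool handler, waiting briefly for the scan when it matters."""
    scan_complete = True
    if tool_depends_on_scanned_index(tool):
        # Event.wait returns False if the timeout expires first.
        scan_complete = scan_done.wait(timeout=wait_seconds)
    output = handler()
    if not scan_complete:
        output += INCOMPLETE_HINT  # flag partial results instead of hiding it
    return output


if __name__ == "__main__":
    still_scanning = threading.Event()
    print(dispatch("codedb_context", lambda: "ranked files: (none)",
                   still_scanning, wait_seconds=0.01))
```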

github-actions · 2026-05-20T17:13:50Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
| --- | --- | --- | --- | --- | --- |
| `codedb_bundle` | 508721 | 505493 | -0.63% | -3228 | OK |
| `codedb_changes` | 54091 | 58184 | +7.57% | +4093 | OK |
| `codedb_deps` | 9466 | 9138 | -3.47% | -328 | OK |
| `codedb_edit` | 10677 | 7260 | -32.00% | -3417 | OK |
| `codedb_find` | 61909 | 63248 | +2.16% | +1339 | OK |
| `codedb_hot` | 104731 | 111826 | +6.77% | +7095 | OK |
| `codedb_outline` | 310505 | 319360 | +2.85% | +8855 | OK |
| `codedb_read` | 97679 | 99443 | +1.81% | +1764 | OK |
| `codedb_search` | 145082 | 148848 | +2.60% | +3766 | OK |
| `codedb_snapshot` | 302867 | 299817 | -1.01% | -3050 | OK |
| `codedb_status` | 13161 | 13094 | -0.51% | -67 | OK |
| `codedb_symbol` | 64909 | 65653 | +1.15% | +744 | OK |
| `codedb_tree` | 83896 | 87428 | +4.21% | +3532 | OK |
| `codedb_word` | 84896 | 88692 | +4.47% | +3796 | OK |
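The two-threshold rule stated above the table can be sketched like this. It is an interpretation, not the CI script: the signed comparison (only slowdowns count) is inferred from the `codedb_edit` row, where a -32.00% delta is reported as OK, and the `FAIL` label and the strictness of the comparisons are assumptions.

```python
PCT_THRESHOLD = 10.0        # percent, from the report header
ABS_THRESHOLD_NS = 50_000   # nanoseconds, from the report header


def classify(base_ns: int, head_ns: int) -> str:
    """Classify one benchmark row as OK, NOISE, or FAIL."""
    delta_ns = head_ns - base_ns
    delta_pct = delta_ns / base_ns * 100
    if delta_pct <= PCT_THRESHOLD:
        return "OK"      # faster, or slower by no more than the percentage limit
    if delta_ns <= ABS_THRESHOLD_NS:
        return "NOISE"   # large percentage, but too small in absolute terms
    return "FAIL"        # assumed label for a real regression


if __name__ == "__main__":
    print(classify(10677, 7260))    # codedb_edit row
    print(classify(54091, 58184))   # codedb_changes row
```

Requiring both thresholds keeps very fast tools, where a few microseconds is a large percentage, from failing CI on measurement jitter.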

chatgpt-codex-connector Bot reviewed May 20, 2026

justrach merged commit 1d91a0d into main May 20, 2026
1 check passed

justrach mentioned this pull request May 20, 2026

release: v0.2.5815 — consolidate 5814 max_cached wiring + main's session perf+context PRs #480

Merged

justrach deleted the feat-codedb-context branch May 20, 2026 19:56
