feat(mcp): overhaul tools — get_symbol exposed, get_answer reworked, page leads tuned by RaghavChamadiya · Pull Request #210 · repowise-dev/repowise

RaghavChamadiya · 2026-05-18T10:30:44Z

Summary

Overhaul of the Repowise MCP surface plus the wiki-page generator that feeds it, so the two compound instead of fighting each other.

MCP server (8 tools, was 7)

get_symbol exposed — agents can now fetch bounded source bytes by canonical path::Name ID without computing line offsets
get_context trimmed to a triage card: removed include=["source"]; added a single hotspot bit, decision-record titles, and symbol_id pointers. Ownership, last_change, and decisions are now returned only when requested via include=
get_risk(changed_files=...) no longer bloats to 76k tokens — drops global hotspots in PR mode, caps co-change/transitive lists, emits a directive block (will_break, missing_cochanges, missing_tests) for one-glance PR review
search_codebase — new kind filter (implementation/test/config/doc), search_method per result (embedding vs bm25 fallback), Grep hint when query is a bareword identifier
Staleness signal in _meta — index_age_days always present; stale_warning fires only when repository.head_commit ≠ live .git/HEAD. Silence is the signal; the field stays trustworthy
Every tool's docstring rewritten to lead with what only that tool answers
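
The staleness rule above (index_age_days always present, stale_warning only on real HEAD divergence) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name build_meta, its parameters, and the warning wording are assumptions; only the field names index_age_days, indexed_commit, and stale_warning come from the description. How the live HEAD SHA is obtained is left to the caller.

```python
from datetime import datetime, timezone
from typing import Optional


def build_meta(
    indexed_commit: str,
    indexed_at: datetime,
    live_head: str,
    now: Optional[datetime] = None,
) -> dict:
    """Hypothetical helper: assemble the _meta block for a tool response."""
    now = now or datetime.now(timezone.utc)
    meta = {
        # Always present, so agents can reason about index age on their own.
        "index_age_days": max(0, (now - indexed_at).days),
        "indexed_commit": indexed_commit,
    }
    # The warning is added only when the indexed commit differs from the
    # live HEAD. On a fresh index the key is absent: silence is the signal.
    if live_head and live_head != indexed_commit:
        meta["stale_warning"] = (
            f"index built at {indexed_commit[:7]}, "
            f"live HEAD is {live_head[:7]}"
        )
    return meta
```

Keeping the warning key absent rather than setting it to false or null means a consumer only ever sees the field when it carries information.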

`get_answer` pipeline (new modules `_answer_pipeline.py`, `_answer_context.py`)

Hybrid retrieval — FTS + vector store in parallel, merged via Reciprocal Rank Fusion, scaled into BM25 range so existing gates calibrate
PageRank bias — damped, normalised within the candidate set; tie-breaker not override
1-hop graph expansion — rescues near-misses where retrieval landed in the right module but on the wrong file (consumer vs orchestrator)
Decision-record fusion for "why" questions — get_why ADRs are injected into the LLM context block so get_answer and get_why finally cohere
Structured prelude — top symbols, recent significant commits, decision titles prepended before file excerpts
Calibrated confidence + separate retrieval_quality — synthesis quality and retrieval quality reported on different axes
best_guesses on low-confidence path — agent gets ranked candidates with one-line justifications instead of an empty answer
Schema-versioned cache — bumped to v3; older payloads auto-invalidate
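
For readers unfamiliar with the fusion step, here is a minimal sketch of Reciprocal Rank Fusion followed by a rescale into a BM25-like range and a damped PageRank bias. It is an illustration under stated assumptions, not the contents of _answer_pipeline.py: all function names, the k=60 constant (the commonly used RRF default), the target range bounds, and the bias weight are hypothetical.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused


def scale_to_range(scores: Dict[str, float], lo: float, hi: float) -> Dict[str, float]:
    """Min-max rescale so thresholds tuned on BM25 scores remain meaningful."""
    if not scores:
        return {}
    s_min, s_max = min(scores.values()), max(scores.values())
    if s_max == s_min:
        return {doc_id: hi for doc_id in scores}
    span = s_max - s_min
    return {d: lo + (s - s_min) / span * (hi - lo) for d, s in scores.items()}


def apply_pagerank_bias(
    scores: Dict[str, float], pagerank: Dict[str, float], weight: float = 0.1
) -> Dict[str, float]:
    """Damped multiplicative bias, normalised within the candidate set only."""
    pr = {d: pagerank.get(d, 0.0) for d in scores}
    top = max(pr.values(), default=0.0)
    if top <= 0.0:
        return dict(scores)
    # The boost is at most (1 + weight), so it breaks ties between close
    # scores without overriding a clear retrieval winner.
    return {d: s * (1.0 + weight * pr[d] / top) for d, s in scores.items()}
```

A caller would fuse the FTS and vector rankings, rescale to whatever range the existing gates expect, and then apply the bias. Because the bias is capped by weight, a file with high PageRank cannot leapfrog a much stronger retrieval hit.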

Wiki page generation

The MCP retrieval upgrades only matter if the corpus has signal-dense leads. The first 200–500 chars of a page dominate the embedding signal and were previously boilerplate metadata.

file_page / module_page / repo_overview system prompts now mandate a role-leading first sentence with architectural vocabulary (orchestrator, entry point, parser, adapter, …)
file_page.j2 collapses the 8 metadata bullets into a single compact line plus a prose **Architectural signals:** line that translates raw metrics (PageRank 0.0234) into vocabulary ("entry point", "central in the dependency graph", "bridge between modules"). Raw numerics moved to a "do NOT recite verbatim" footer
module_page.j2 and repo_overview.j2 get the same treatment
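
The metric-to-vocabulary translation described above might look something like the sketch below. It is illustrative only: the function name, the betweenness input, and both thresholds are assumptions made for the example, not values taken from the templates. Only the three quoted phrases and the **Architectural signals:** label come from the PR description.

```python
from typing import List


def architectural_signals(
    pagerank: float,
    betweenness: float,
    is_entry_point: bool,
    pagerank_central: float = 0.02,    # hypothetical threshold
    betweenness_bridge: float = 0.1,   # hypothetical threshold
) -> str:
    """Turn raw graph metrics into a prose line for the page lead."""
    phrases: List[str] = []
    if is_entry_point:
        phrases.append("entry point")
    if pagerank >= pagerank_central:
        phrases.append("central in the dependency graph")
    if betweenness >= betweenness_bridge:
        phrases.append("bridge between modules")
    if not phrases:
        # No notable signal: emit nothing rather than filler text.
        return ""
    return "**Architectural signals:** " + ", ".join(phrases)
```

The point of the translation is that words like "entry point" carry retrieval signal in the first few hundred characters of a page, whereas a bare number such as 0.0234 does not.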

CLAUDE.md generator

The stale 9-tool table referenced three tools that don't exist (update_decision_records, get_dependency_path, get_architecture_diagram). Replaced with the actual 8-tool surface, with each row leading with the tool's unique value and a "verify when…" line tied to the new stale_warning / retrieval_quality / search_method signals.

Test plan

Smoke import: python -c "from repowise.server.mcp_server import get_symbol, get_answer" succeeds
repowise update --force regenerates pages under the new prompts
Restart MCP, ask get_answer("how does a repository get indexed end-to-end?"); it should now surface orchestrator.py via vector retrieval + graph expansion, not tool_symbol.py
get_context(["packages/cli/src/repowise/cli/commands/init_cmd.py"]) returns hotspot: true, symbol_id per symbol, no ownership/last_change unless requested
get_risk(targets=["X"], changed_files=["X", "Y"]) returns the new directive block and stays under ~8k tokens
search_codebase("PageGenerator") returns the grep_hint for bareword identifier queries
On a clean repo, _meta.stale_warning is absent; after committing without re-indexing, it appears with the live HEAD short SHA

…page leads tuned

MCP server (8 tools, was 7):
- Expose get_symbol so agents can fetch bounded source bytes by ID
- Remove include=["source"] from get_context; trim defaults to a triage card (hotspot bit, decision-record titles, symbol_id pointers) so the tool is a router into get_symbol / get_risk / get_why, not a kitchen sink
- get_risk PR mode: drop global hotspots, cap co-change/transitive lists, emit a directive block (will_break, missing_cochanges, missing_tests)
- search_codebase: kind filter, search_method per result, grep hint for bareword identifier queries
- Every response carries _meta.index_age_days and indexed_commit; a stale_warning fires only when the indexed HEAD diverges from live .git/HEAD (calibrated to stay silent on fresh indexes so the field remains trustworthy)
- Rewrite every tool's docstring to lead with what only that tool answers

get_answer pipeline (new modules _answer_pipeline.py, _answer_context.py):
- Hybrid retrieval: FTS + vector store run in parallel, merged via RRF, scaled into the BM25 score range so existing gates still calibrate
- PageRank bias on retrieval scores (damped, normalised within candidate set)
- 1-hop graph expansion of top hits to rescue near-misses
- Decision-record fusion for "why"-shaped questions
- Structured prelude block (top symbols, recent significant commits, decision titles) prepended to the LLM context
- Calibrated confidence + separate retrieval_quality signal so the two axes (synthesis quality vs retrieval quality) report independently
- Structured best_guesses with one-line justifications on low-confidence return path instead of an empty answer
- Schema-versioned cache (bumped to v3) so payloads from earlier pipelines auto-invalidate without manual migration

Wiki page generation (so the MCP retrieval upgrade actually finds what it should):
- file_page / module_page / repo_overview system prompts now mandate a role-leading first sentence with architectural vocabulary (orchestrator, entry point, parser, adapter, …); the first 200–500 chars dominate the embedding signal and were previously boilerplate
- file_page.j2 collapses bullet metadata into a compact line plus an "Architectural signals" prose line that translates raw metrics (PageRank 0.0234) into vocabulary ("entry point", "central in the dependency graph", "bridge between modules")
- module_page.j2 and repo_overview.j2 get the same treatment, with the raw numerics moved to a "do NOT recite verbatim" footer

CLAUDE.md generator (claude_md.j2, workspace_claude_md.j2):
- Replace the stale 9-tool table that listed three unimplemented tools (update_decision_records, get_dependency_path, get_architecture_diagram) with the actual 8-tool surface
- Each row leads with what only that tool answers; add composition tips and a "verify when..." line tied to the new stale_warning / retrieval_quality / search_method signals

…text overhaul

- Bump 'Seven MCP tools' → 'Eight MCP tools' wherever it appears
- Rewrite the MCP-tools table so each row leads with what only that tool answers; add get_symbol, drop include=["source"] from get_context, and surface the new signals (retrieval_quality, best_guesses, search_method, hotspot bit, decision_records pointer, PR-mode directive block)
- Note the _meta staleness signal (index_age_days / indexed_commit / stale_warning fires only on real HEAD divergence)
- Update tool-count cells in the task-comparison table and the competitor matrix

RaghavChamadiya requested a review from swati510 as a code owner May 18, 2026 10:30

swati510 approved these changes May 18, 2026

RaghavChamadiya merged commit ee1b71f into main May 18, 2026
5 checks passed

RaghavChamadiya deleted the feat/mcp-tools-overhaul branch May 18, 2026 10:39

RaghavChamadiya mentioned this pull request May 18, 2026

release: v0.10.0 — code health layer, MCP overhaul, doc-gen upgrade, C4 diagrams #213

Merged

7 tasks

RaghavChamadiya merged 2 commits into main from feat/mcp-tools-overhaul
