feat(mcp): overhaul tools — get_symbol exposed, get_answer reworked, page leads tuned#210
Merged
…page leads tuned
MCP server (8 tools, was 7):
- Expose get_symbol so agents can fetch bounded source bytes by ID
- Remove include=["source"] from get_context; trim defaults to a triage
card (hotspot bit, decision-record titles, symbol_id pointers) so the
tool is a router into get_symbol / get_risk / get_why, not a kitchen sink
- get_risk PR mode: drop global hotspots, cap co-change/transitive lists,
emit a directive block (will_break, missing_cochanges, missing_tests)
- search_codebase: kind filter, search_method per result, grep hint for
bareword identifier queries
- Every response carries _meta.index_age_days and indexed_commit; a
stale_warning fires only when the indexed HEAD diverges from live
.git/HEAD (calibrated to stay silent on fresh indexes so the field
remains trustworthy)
- Rewrite every tool's docstring to lead with what only that tool answers
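The staleness rule in the _meta bullet above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function name build_meta, the warning wording, and the assumption that the caller has already resolved the live HEAD SHA are all hypothetical; only the field names index_age_days, indexed_commit, and stale_warning come from the commit message.

```python
def build_meta(indexed_commit: str, live_head: str, index_age_days: int) -> dict:
    """Assemble a _meta block (hypothetical helper, not the PR's function).

    stale_warning is added only when the indexed commit differs from the
    live HEAD, so a fresh index produces no warning at all.
    """
    meta = {
        "index_age_days": index_age_days,
        "indexed_commit": indexed_commit,
    }
    # An empty live_head means HEAD could not be resolved; stay silent
    # rather than emit a warning that cannot be trusted.
    if live_head and live_head != indexed_commit:
        meta["stale_warning"] = (
            f"index built at {indexed_commit[:7]}, live HEAD is {live_head[:7]}"
        )
    return meta
```

Keying the warning on commit divergence rather than on age alone means an old but still-current index stays quiet, which matches the "silent on fresh indexes" calibration described above.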
get_answer pipeline (new modules _answer_pipeline.py, _answer_context.py):
- Hybrid retrieval: FTS + vector store run in parallel, merged via RRF,
scaled into the BM25 score range so existing gates still calibrate
- PageRank bias on retrieval scores (damped, normalised within candidate set)
- 1-hop graph expansion of top hits to rescue near-misses
- Decision-record fusion for "why"-shaped questions
- Structured prelude block (top symbols, recent significant commits,
decision titles) prepended to the LLM context
- Calibrated confidence + separate retrieval_quality signal so the two
axes (synthesis quality vs retrieval quality) report independently
- Structured best_guesses with one-line justifications on low-confidence
return path instead of an empty answer
- Schema-versioned cache (bumped to v3) so payloads from earlier pipelines
auto-invalidate without manual migration
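The hybrid-retrieval step above (RRF merge, then scaling into the BM25 score range) might look something like the sketch below. Everything here is an assumption for illustration: the function names, the conventional RRF constant k=60, and the linear rescaling against a caller-supplied bm25_max are not taken from _answer_pipeline.py.

```python
def rrf_merge(fts_ranked, vector_ranked, k=60):
    """Reciprocal Rank Fusion over two ranked lists of document IDs.

    Each list contributes 1 / (k + rank) per document, with rank
    starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = {}
    for ranked in (fts_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def scale_to_bm25_range(rrf_scores, bm25_max):
    """Linearly rescale fused scores so the best hit equals bm25_max.

    This keeps thresholds that were tuned against raw BM25 scores
    meaningful after fusion (one possible scaling, assumed here).
    """
    top = max(rrf_scores.values(), default=0.0)
    if top == 0.0:
        return {}
    return {doc_id: s / top * bm25_max for doc_id, s in rrf_scores.items()}
```

A document that appears in both lists accumulates two reciprocal-rank terms, so agreement between FTS and the vector store is rewarded without having to compare their raw scores directly.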
Wiki page generation (so the MCP retrieval upgrade actually finds what it
should):
- file_page / module_page / repo_overview system prompts now mandate a
role-leading first sentence with architectural vocabulary (orchestrator,
entry point, parser, adapter, …) — the first 200–500 chars dominate the
embedding signal and were previously boilerplate
- file_page.j2 collapses bullet metadata into a compact line plus an
"Architectural signals" prose line that translates raw metrics
(PageRank 0.0234) into vocabulary ("entry point", "central in the
dependency graph", "bridge between modules")
- module_page.j2 and repo_overview.j2 get the same treatment, with the
raw numerics moved to a "do NOT recite verbatim" footer
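The idea of translating raw metrics into vocabulary could be sketched as below. The inputs (percentile ranks and an in-degree count), the 0.9 cutoffs, and the function name are hypothetical choices made for this example; only the three output phrases come from the commit message, and the real templates may derive them differently.

```python
def architectural_signals(pagerank_pct, betweenness_pct, in_degree):
    """Map graph metrics to the role vocabulary used in page leads.

    pagerank_pct and betweenness_pct are percentile ranks in [0, 1];
    in_degree is the number of files importing this one. All thresholds
    are illustrative assumptions.
    """
    phrases = []
    if in_degree == 0:
        # Nothing imports this file, so it is likely where execution starts.
        phrases.append("entry point")
    if pagerank_pct >= 0.9:
        phrases.append("central in the dependency graph")
    if betweenness_pct >= 0.9:
        phrases.append("bridge between modules")
    return "; ".join(phrases)
```

Emitting words instead of numbers such as "PageRank 0.0234" puts terms an agent is likely to query for into the part of the page that dominates the embedding.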
CLAUDE.md generator (claude_md.j2, workspace_claude_md.j2):
- Replace the stale 9-tool table that listed three unimplemented tools
(update_decision_records, get_dependency_path, get_architecture_diagram)
with the actual 8-tool surface
- Each row leads with what only that tool answers; add composition tips
and a "verify when..." line tied to the new stale_warning /
retrieval_quality / search_method signals
…text overhaul
- Bump 'Seven MCP tools' → 'Eight MCP tools' wherever it appears
- Rewrite the MCP-tools table so each row leads with what only that tool
answers; add get_symbol, drop include=["source"] from get_context, and
surface the new signals (retrieval_quality, best_guesses, search_method,
hotspot bit, decision_records pointer, PR-mode directive block)
- Note the _meta staleness signal (index_age_days / indexed_commit /
stale_warning fires only on real HEAD divergence)
- Update tool-count cells in the task-comparison table and the competitor
matrix
swati510 approved these changes on May 18, 2026
Summary
Overhaul of the Repowise MCP surface plus the wiki-page generator that feeds it, so the two compound instead of fighting each other.
MCP server (8 tools, was 7)
- get_symbol exposed: agents can now fetch bounded source bytes by canonical path::Name ID without computing line offsets
- get_context trimmed to a triage card: removed include=["source"], added a single hotspot bit, decision-record titles, and symbol_id pointers. Ownership, last_change, decisions are now include=-gated
- get_risk(changed_files=...) no longer bloats to 76k tokens: drops global hotspots in PR mode, caps co-change/transitive lists, emits a directive block (will_break, missing_cochanges, missing_tests) for one-glance PR review
- search_codebase: new kind filter (implementation / test / config / doc), search_method per result (embedding vs bm25 fallback), Grep hint when query is a bareword identifier
- _meta: index_age_days always present; stale_warning fires only when repository.head_commit ≠ live .git/HEAD. Silence is the signal; the field stays trustworthy
get_answer pipeline (new modules _answer_pipeline.py, _answer_context.py)
- get_why ADRs are injected into the LLM context block so get_answer and get_why finally cohere
- retrieval_quality: synthesis quality and retrieval quality reported on different axes
- best_guesses on low-confidence path: agent gets ranked candidates with one-line justifications instead of an empty answer
Wiki page generation
The MCP retrieval upgrades only matter if the corpus has signal-dense leads. The first 200–500 chars of a page dominate the embedding signal and were previously boilerplate metadata.
- file_page / module_page / repo_overview system prompts now mandate a role-leading first sentence with architectural vocabulary (orchestrator, entry point, parser, adapter, …)
- file_page.j2 collapses 8 metadata bullets into a compact line plus a prose **Architectural signals:** line that translates raw metrics (PageRank 0.0234) into vocabulary ("entry point", "central in the dependency graph", "bridge between modules"). Raw numerics moved to a "do NOT recite verbatim" footer
- module_page.j2 and repo_overview.j2 get the same treatment
CLAUDE.md generator
The stale 9-tool table referenced three tools that don't exist (update_decision_records, get_dependency_path, get_architecture_diagram). Replaced with the actual 8-tool surface, with each row leading with the tool's unique value and a "verify when…" line tied to the new stale_warning / retrieval_quality / search_method signals.
Test plan
- python -c "from repowise.server.mcp_server import get_symbol, get_answer" succeeds
- repowise update --force regenerates pages under the new prompts
- get_answer("how does a repository get indexed end-to-end?"): should now surface orchestrator.py via vector retrieval + graph expansion, not tool_symbol.py
- get_context(["packages/cli/src/repowise/cli/commands/init_cmd.py"]) returns hotspot: true, symbol_id per symbol, no ownership/last_change unless requested
- get_risk(targets=["X"], changed_files=["X", "Y"]) returns the new directive block and stays under ~8k tokens
- search_codebase("PageGenerator") returns the grep_hint for bareword identifier queries
- _meta.stale_warning is absent; after committing without re-indexing, it appears with the live HEAD short SHA