
feat(mcp): overhaul tools — get_symbol exposed, get_answer reworked, page leads tuned #210

Merged
RaghavChamadiya merged 2 commits into main from feat/mcp-tools-overhaul
May 18, 2026

@RaghavChamadiya
Member

Summary

Overhaul of the Repowise MCP surface plus the wiki-page generator that feeds it, so the two compound instead of fighting each other.

MCP server (8 tools, was 7)

  • get_symbol exposed — agents can now fetch bounded source bytes by canonical path::Name ID without computing line offsets
  • get_context trimmed to a triage card: removed include=["source"]; added a single hotspot bit, decision-record titles, and symbol_id pointers. Ownership, last_change, and decisions are now gated behind include=
  • get_risk(changed_files=...) no longer bloats to 76k tokens — drops global hotspots in PR mode, caps co-change/transitive lists, emits a directive block (will_break, missing_cochanges, missing_tests) for one-glance PR review
  • search_codebase — new kind filter (implementation/test/config/doc), search_method per result (embedding vs bm25 fallback), Grep hint when query is a bareword identifier
  • Staleness signal in _meta: index_age_days is always present; stale_warning fires only when repository.head_commit ≠ live .git/HEAD. Silence is the signal; the field stays trustworthy
  • Every tool's docstring rewritten to lead with what only that tool answers
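
The staleness rule in the list above can be sketched as follows. This is a minimal illustration, not the Repowise implementation: the helper name `build_meta`, its parameters, the message wording, and the 7-character short SHA are assumptions; only the field names `index_age_days`, `indexed_commit`, and `stale_warning` come from this PR.

```python
from datetime import datetime, timezone
from typing import Optional


def build_meta(indexed_commit: str, indexed_at: datetime,
               live_head: Optional[str],
               now: Optional[datetime] = None) -> dict:
    """Hypothetical helper: build the _meta block for a tool response."""
    now = now or datetime.now(timezone.utc)
    meta = {
        # Always present, so an agent can judge freshness for itself.
        "index_age_days": max((now - indexed_at).days, 0),
        "indexed_commit": indexed_commit,
    }
    # Warn only on real divergence; an unknown live HEAD stays silent.
    if live_head is not None and live_head != indexed_commit:
        meta["stale_warning"] = (
            f"index built at {indexed_commit[:7]}, live HEAD is {live_head[:7]}"
        )
    return meta
```

Keeping the warning absent on a fresh index, rather than emitting `stale_warning: false`, is what lets an agent treat the key's presence alone as the signal.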

get_answer pipeline (new modules _answer_pipeline.py, _answer_context.py)

  • Hybrid retrieval — FTS + vector store in parallel, merged via Reciprocal Rank Fusion, scaled into BM25 range so existing gates calibrate
  • PageRank bias — damped, normalised within the candidate set; tie-breaker not override
  • 1-hop graph expansion — rescues near-misses where retrieval landed in the right module but on the wrong file (consumer vs orchestrator)
  • Decision-record fusion for "why" questions: get_why ADRs are injected into the LLM context block so get_answer and get_why finally cohere
  • Structured prelude — top symbols, recent significant commits, decision titles prepended before file excerpts
  • Calibrated confidence + separate retrieval_quality — synthesis quality and retrieval quality reported on different axes
  • best_guesses on low-confidence path — agent gets ranked candidates with one-line justifications instead of an empty answer
  • Schema-versioned cache — bumped to v3; older payloads auto-invalidate
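
To make the hybrid-retrieval bullet concrete, here is a minimal sketch of Reciprocal Rank Fusion followed by a rescale into a BM25-like range. The function names, the constant `k = 60`, the rescaling rule (stretch so the best fused score equals the top BM25 score), and the value `12.0` are all assumptions for illustration; the PR states only that the two result lists are merged via RRF and scaled into BM25 range.

```python
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over every ranked list."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused


def scale_to_bm25(fused: Dict[str, float], bm25_top: float) -> Dict[str, float]:
    """Assumed rescaling: stretch fused scores so the best equals bm25_top."""
    if not fused:
        return {}
    best = max(fused.values())
    return {doc_id: score / best * bm25_top for doc_id, score in fused.items()}


# Hypothetical result lists, best hit first.
fts_hits = ["tool_symbol.py", "orchestrator.py", "init_cmd.py"]
vector_hits = ["orchestrator.py", "init_cmd.py"]

scores = scale_to_bm25(rrf_fuse([fts_hits, vector_hits]), bm25_top=12.0)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy input, a page that both retrievers rank highly overtakes the page that only FTS ranked first, which is the behaviour the fusion step is meant to produce.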

Wiki page generation

The MCP retrieval upgrades only matter if the corpus has signal-dense leads. The first 200–500 chars of a page dominate the embedding signal and were previously boilerplate metadata.

  • file_page / module_page / repo_overview system prompts now mandate a role-leading first sentence with architectural vocabulary (orchestrator, entry point, parser, adapter, …)
  • file_page.j2 collapses the eight metadata bullets into one compact line plus a prose **Architectural signals:** line that translates raw metrics (PageRank 0.0234) into vocabulary ("entry point", "central in the dependency graph", "bridge between modules"). Raw numerics moved to a "do NOT recite verbatim" footer
  • module_page.j2 and repo_overview.j2 get the same treatment
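
The metric-to-vocabulary translation described above might look roughly like this. It is a sketch under stated assumptions: the function name, the `betweenness` and `is_entry_point` inputs, and both thresholds are hypothetical; only the three phrases and the PageRank example value come from this PR.

```python
from typing import List


def architectural_signals(pagerank: float, betweenness: float,
                          is_entry_point: bool) -> List[str]:
    """Hypothetical mapping from raw graph metrics to prose vocabulary."""
    signals: List[str] = []
    if is_entry_point:
        signals.append("entry point")
    if pagerank >= 0.02:       # assumed threshold
        signals.append("central in the dependency graph")
    if betweenness >= 0.10:    # assumed threshold
        signals.append("bridge between modules")
    return signals


line = "**Architectural signals:** " + ", ".join(
    architectural_signals(pagerank=0.0234, betweenness=0.03, is_entry_point=True)
)
```

The point of the translation is that words such as "entry point" embed close to the questions agents actually ask, whereas a bare number like 0.0234 carries almost no retrieval signal.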

CLAUDE.md generator

The stale 9-tool table referenced three tools that don't exist (update_decision_records, get_dependency_path, get_architecture_diagram). Replaced with the actual 8-tool surface, with each row leading with the tool's unique value and a "verify when…" line tied to the new stale_warning / retrieval_quality / search_method signals.

Test plan

  • Smoke import: python -c "from repowise.server.mcp_server import get_symbol, get_answer" succeeds
  • repowise update --force regenerates pages under the new prompts
  • Restart MCP, then ask get_answer("how does a repository get indexed end-to-end?"); it should now surface orchestrator.py via vector retrieval + graph expansion, not tool_symbol.py
  • get_context(["packages/cli/src/repowise/cli/commands/init_cmd.py"]) returns hotspot: true, symbol_id per symbol, no ownership/last_change unless requested
  • get_risk(targets=["X"], changed_files=["X", "Y"]) returns the new directive block and stays under ~8k tokens
  • search_codebase("PageGenerator") returns the grep_hint for bareword identifier queries
  • On a clean repo, _meta.stale_warning is absent; after committing without re-indexing, it appears with the live HEAD short SHA
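
For the grep_hint check in the plan above, a bareword detector could be as small as the sketch below. The regex, the function name `grep_hint`, and the hint wording are assumptions; the PR says only that a hint is returned when the query is a bareword identifier.

```python
import re
from typing import Optional

# Assumed definition of "bareword identifier": a single token with no
# whitespace, shaped like a Python identifier.
_BAREWORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def grep_hint(query: str) -> Optional[str]:
    """Return a hint for exact-identifier queries, otherwise None."""
    q = query.strip()
    if _BAREWORD.fullmatch(q):
        return f"'{q}' looks like an identifier; an exact grep may be faster."
    return None
```

A pattern this simple also fires on any single plain word, so a real implementation would probably need a stricter test (for example, requiring mixed case or an underscore).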

…page leads tuned

MCP server (8 tools, was 7):
- Expose get_symbol so agents can fetch bounded source bytes by ID
- Remove include=["source"] from get_context; trim defaults to a triage
  card (hotspot bit, decision-record titles, symbol_id pointers) so the
  tool is a router into get_symbol / get_risk / get_why, not a kitchen sink
- get_risk PR mode: drop global hotspots, cap co-change/transitive lists,
  emit a directive block (will_break, missing_cochanges, missing_tests)
- search_codebase: kind filter, search_method per result, grep hint for
  bareword identifier queries
- Every response carries _meta.index_age_days and indexed_commit; a
  stale_warning fires only when the indexed HEAD diverges from live
  .git/HEAD (calibrated to stay silent on fresh indexes so the field
  remains trustworthy)
- Rewrite every tool's docstring to lead with what only that tool answers

get_answer pipeline (new modules _answer_pipeline.py, _answer_context.py):
- Hybrid retrieval: FTS + vector store run in parallel, merged via RRF,
  scaled into the BM25 score range so existing gates still calibrate
- PageRank bias on retrieval scores (damped, normalised within candidate set)
- 1-hop graph expansion of top hits to rescue near-misses
- Decision-record fusion for "why"-shaped questions
- Structured prelude block (top symbols, recent significant commits,
  decision titles) prepended to the LLM context
- Calibrated confidence + separate retrieval_quality signal so the two
  axes (synthesis quality vs retrieval quality) report independently
- Structured best_guesses with one-line justifications on low-confidence
  return path instead of an empty answer
- Schema-versioned cache (bumped to v3) so payloads from earlier pipelines
  auto-invalidate without manual migration
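
A schema-versioned cache of the kind described in the last item can be sketched as below. The class, its in-memory dict, and the entry layout are hypothetical; only the version number 3 and the auto-invalidation behaviour come from this commit message.

```python
from typing import Any, Dict, Optional

SCHEMA_VERSION = 3  # "v3" is from the commit message; the layout is assumed


class AnswerCache:
    """Hypothetical in-memory cache whose entries record their schema version."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, Any]] = {}

    def put(self, question: str, payload: Any,
            version: int = SCHEMA_VERSION) -> None:
        self._store[question] = {"schema_version": version, "payload": payload}

    def get(self, question: str) -> Optional[Any]:
        entry = self._store.get(question)
        if entry is None:
            return None
        if entry["schema_version"] != SCHEMA_VERSION:
            # Written by an earlier pipeline: drop it and report a miss.
            del self._store[question]
            return None
        return entry["payload"]
```

Because stale entries are discarded lazily on read, bumping the constant is the only step needed when the payload shape changes.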

Wiki page generation (so the MCP retrieval upgrade actually finds what it
should):
- file_page / module_page / repo_overview system prompts now mandate a
  role-leading first sentence with architectural vocabulary (orchestrator,
  entry point, parser, adapter, …) — the first 200–500 chars dominate the
  embedding signal and were previously boilerplate
- file_page.j2 collapses bullet metadata into a compact line plus an
  "Architectural signals" prose line that translates raw metrics
  (PageRank 0.0234) into vocabulary ("entry point", "central in the
  dependency graph", "bridge between modules")
- module_page.j2 and repo_overview.j2 get the same treatment, with the
  raw numerics moved to a "do NOT recite verbatim" footer

CLAUDE.md generator (claude_md.j2, workspace_claude_md.j2):
- Replace the stale 9-tool table that listed three unimplemented tools
  (update_decision_records, get_dependency_path, get_architecture_diagram)
  with the actual 8-tool surface
- Each row leads with what only that tool answers; add composition tips
  and a "verify when..." line tied to the new stale_warning /
  retrieval_quality / search_method signals
@RaghavChamadiya RaghavChamadiya requested a review from swati510 as a code owner May 18, 2026 10:30
…text overhaul

- Bump 'Seven MCP tools' → 'Eight MCP tools' wherever it appears
- Rewrite the MCP-tools table so each row leads with what only that tool
  answers; add get_symbol, drop include=["source"] from get_context, and
  surface the new signals (retrieval_quality, best_guesses, search_method,
  hotspot bit, decision_records pointer, PR-mode directive block)
- Note the _meta staleness signal (index_age_days / indexed_commit /
  stale_warning fires only on real HEAD divergence)
- Update tool-count cells in the task-comparison table and the competitor
  matrix
@RaghavChamadiya RaghavChamadiya merged commit ee1b71f into main May 18, 2026
5 checks passed
@RaghavChamadiya RaghavChamadiya deleted the feat/mcp-tools-overhaul branch May 18, 2026 10:39

2 participants