MCP tool lookup can fail across resumed turn slices after prior discovery #492

@dcramer

Summary

A resumed Junior turn can fail with:

MCP tool is not active for this turn: mcp__linear__create_issue

The production investigation suggests the following sequence. The model discovered or used Linear MCP tools in an earlier trace for the same gen_ai.conversation.id. A later resumed slice then continued the Pi history in a fresh process and attempted callMcpTool with mcp__linear__create_issue. That slice rebuilt the MCP manager and catalog, and the exact tool was not active in the rebuilt catalog, so callMcpTool threw and Sentry captured the error as an unexpected exception.

Observed production signal

  • Sentry issue: JUNIOR-30 / issue id 7513266773
  • Error: MCP tool is not active for this turn: mcp__linear__create_issue
  • Failing transaction: POST /api/webhooks/slack
  • Environment: production
  • Conversation id: slack:D0ASJ2VKP1U:1780423649.750179
  • Failing trace id observed: 91b6122364e55867ab1334bcae70a0c3
  • Search by gen_ai.conversation.id showed MCP search/call activity for the same conversation. The important case is when searchMcpTools happened in a different trace/slice than the failing resumed callMcpTool.

Why this can happen in the current branch

Current code creates a fresh McpToolManager for each generateAssistantReply call:

  • packages/junior/src/chat/respond.ts creates a new manager per reply generation.
  • Resume restores Pi messages and calls agent.continue().
  • MCP providers are inferred from durable Pi history via inferActiveMcpProvidersFromPiMessages(...) and re-activated.
  • The exact MCP tool catalog returned by a previous searchMcpTools call is not persisted across the resume boundary.
  • packages/junior/src/chat/tools/skill/call-mcp-tool.ts activates the provider from the requested tool name, then requires an exact match in getResolvedActiveTools(). If the fresh catalog does not contain that tool name, it throws.

That means Pi history can remember a planned or prior tool name while the resumed runtime only has a newly listed provider catalog.
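
The failure mode can be reproduced in miniature. This is a sketch, not the Junior code: FreshCatalog and lookupExact are hypothetical stand-ins for the per-reply manager and the exact-match check, and mcp__linear__list_issues is an invented tool name used only to make the rebuilt catalog differ.

```typescript
// Hypothetical stand-in for a catalog that is rebuilt on every reply generation.
class FreshCatalog {
  private readonly active = new Map<string, string[]>();

  // Re-activating a provider records whatever it lists at that moment.
  activateProvider(provider: string, listedTools: string[]): void {
    this.active.set(provider, [...listedTools]);
  }

  resolvedActiveTools(): string[] {
    const all: string[] = [];
    this.active.forEach((tools) => all.push(...tools));
    return all;
  }
}

// Exact-match lookup: throws when the requested name is absent from the catalog.
function lookupExact(catalog: FreshCatalog, toolName: string): string {
  if (!catalog.resolvedActiveTools().includes(toolName)) {
    throw new Error(`MCP tool is not active for this turn: ${toolName}`);
  }
  return toolName;
}

// Earlier slice: the tool was discovered and is active.
const firstSlice = new FreshCatalog();
firstSlice.activateProvider("linear", ["mcp__linear__create_issue"]);

// Resumed slice: fresh process, provider re-activated, but the rebuilt
// listing (assumed here) does not contain the name the history remembers.
const resumedSlice = new FreshCatalog();
resumedSlice.activateProvider("linear", ["mcp__linear__list_issues"]);
```

Calling lookupExact(resumedSlice, "mcp__linear__create_issue") throws with the same message seen in Sentry, while the same call against firstSlice succeeds. Nothing carries the first catalog across the resume boundary.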

Span attributes note

The outer bridge span does attach tool args using gen_ai.tool.call.arguments in packages/junior/src/chat/tools/agent-tools.ts. For private Slack conversations (slack:D...), the value is payload metadata rather than raw JSON, so Sentry may show argument keys and sizes but not the literal tool_name. The thrown error currently reveals the exact requested tool name.
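
To illustrate why the literal tool_name can be absent from the span, here is a sketch of a redacting summarizer. summarizeArgs and its output shape are assumptions for illustration; the actual attribute format produced in agent-tools.ts may differ.

```typescript
// Hypothetical summary shape: argument keys and encoded sizes, no values.
interface ArgSummary {
  keys: string[];
  sizes: Record<string, number>;
}

function summarizeArgs(args: Record<string, unknown>): ArgSummary {
  const keys = Object.keys(args).sort();
  const sizes: Record<string, number> = {};
  for (const key of keys) {
    // Length of the JSON encoding; values that do not encode count as 0.
    const encoded: string | undefined = JSON.stringify(args[key]);
    sizes[key] = encoded === undefined ? 0 : encoded.length;
  }
  return { keys, sizes };
}

const argSummary = summarizeArgs({
  tool_name: "mcp__linear__create_issue",
  arguments: { title: "x" },
});
```

A summary like this shows that a tool_name key was present and how long its value was, but a reader of the span cannot recover which tool was requested.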

Expected behavior

A resumed turn should not fail unexpectedly just because the MCP bridge catalog was rebuilt in a later slice. Either:

  • resume should preserve enough MCP catalog/tool identity to make the continued tool call deterministic,
  • the resumed slice should rediscover/refresh and guide the model back through searchMcpTools when the requested tool is missing,
  • or this should be returned as a normal model-visible tool error instead of captured as an unexpected Sentry exception.
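
The second and third options could look roughly like this. It is a sketch under assumptions: ToolResult and resolveMcpTool are hypothetical names, and the provider is parsed from the mcp__<provider>__<tool> pattern visible in the error message.

```typescript
// Hypothetical result type: a miss is data for the model, not an exception.
type ToolResult =
  | { ok: true; toolName: string }
  | { ok: false; isError: true; message: string };

function resolveMcpTool(toolName: string, activeTools: readonly string[]): ToolResult {
  if (activeTools.includes(toolName)) {
    return { ok: true, toolName };
  }
  // Assumes names follow mcp__<provider>__<tool>.
  const provider = toolName.split("__")[1] ?? "unknown";
  return {
    ok: false,
    isError: true,
    message:
      `MCP tool ${toolName} is not active in this turn. ` +
      `Call searchMcpTools({ provider: "${provider}" }) to rediscover it, then retry.`,
  };
}

const missResult = resolveMcpTool("mcp__linear__create_issue", []);
```

Returning the miss as a result keeps the turn alive and gives the model a concrete next step, so a rebuilt catalog would no longer surface as an unexpected Sentry exception.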

Possible fixes

  • Persist active MCP provider catalog summaries, or at least searched tool names, in turn session state and rehydrate them across resume.
  • On missing callMcpTool exact match, add diagnostic span attributes: requested tool name, parsed provider, active providers, active tool count, and matching provider tool names/count.
  • Consider returning a tool error instructing the model to call searchMcpTools({ provider }) again instead of throwing an unexpected exception.
  • If Linear create_issue is expected to always exist, add an allowed-tools pin for Linear so provider activation fails clearly when Linear does not expose it.
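
A sketch of the first two fixes, with everything named here being hypothetical: the TurnSessionState shape, both helper functions, and the mcp.* attribute keys are illustrative and not taken from the codebase.

```typescript
// Hypothetical slice of turn session state: provider -> searched tool names.
interface TurnSessionState {
  searchedMcpTools: Record<string, string[]>;
}

// Merge newly searched names into the state without mutating the input.
function recordSearchedTools(
  state: TurnSessionState,
  provider: string,
  toolNames: string[],
): TurnSessionState {
  const merged = new Set([...(state.searchedMcpTools[provider] ?? []), ...toolNames]);
  return {
    searchedMcpTools: {
      ...state.searchedMcpTools,
      [provider]: Array.from(merged).sort(),
    },
  };
}

// Build span attributes describing a missed exact match.
function missingToolDiagnostics(
  requestedTool: string,
  activeCatalog: Record<string, string[]>,
): Record<string, string | number> {
  const provider = requestedTool.split("__")[1] ?? "";
  const activeProviders = Object.keys(activeCatalog).sort();
  let activeToolCount = 0;
  for (const name of activeProviders) {
    activeToolCount += (activeCatalog[name] ?? []).length;
  }
  return {
    "mcp.requested_tool": requestedTool,
    "mcp.parsed_provider": provider,
    "mcp.active_providers": activeProviders.join(","),
    "mcp.active_tool_count": activeToolCount,
    "mcp.provider_tool_count": (activeCatalog[provider] ?? []).length,
  };
}

// Persist before the slice ends, then rehydrate after resume (JSON round trip
// stands in for whatever storage the session state actually uses).
let sessionState: TurnSessionState = { searchedMcpTools: {} };
sessionState = recordSearchedTools(sessionState, "linear", ["mcp__linear__create_issue"]);
const restoredState = JSON.parse(JSON.stringify(sessionState)) as TurnSessionState;

const diagnostics = missingToolDiagnostics("mcp__linear__create_issue", {
  linear: ["mcp__linear__list_issues"],
});
```

Rehydrated names would still need to be checked against the provider on resume, since a tool that was listed in an earlier slice may no longer be exposed.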
