MCP tool lookup can fail across resumed turn slices after prior discovery #492


## Summary

A resumed Junior turn can fail with:

```text
MCP tool is not active for this turn: mcp__linear__create_issue
```

The production investigation suggests the following sequence. The model had previously discovered or used Linear MCP tools in another trace for the same `gen_ai.conversation.id`. A later resumed slice then continued Pi history in a fresh process and attempted `callMcpTool` with `mcp__linear__create_issue`. The resumed slice rebuilt the MCP manager and catalog, the exact tool was not active in the rebuilt catalog, and `callMcpTool` threw. Sentry captured the throw as an unexpected exception.

## Observed production signal

- Sentry issue: JUNIOR-30 / issue id `7513266773`
- Error: `MCP tool is not active for this turn: mcp__linear__create_issue`
- Failing transaction: `POST /api/webhooks/slack`
- Environment: production
- Conversation id: `slack:D0ASJ2VKP1U:1780423649.750179`
- Failing trace id observed: `91b6122364e55867ab1334bcae70a0c3`
- A search by `gen_ai.conversation.id` showed MCP search/call activity for the same conversation. The important case is when `searchMcpTools` ran in a different trace/slice than the failing resumed `callMcpTool`.

## Why this can happen in the current branch

The current code creates a fresh `McpToolManager` for each `generateAssistantReply` call:

- `packages/junior/src/chat/respond.ts` creates a new manager per reply generation.
- Resume restores Pi messages and calls `agent.continue()`.
- MCP providers are inferred from durable Pi history via `inferActiveMcpProvidersFromPiMessages(...)` and re-activated.
- The exact MCP tool catalog returned by a previous `searchMcpTools` call is not persisted across the resume boundary.
- `packages/junior/src/chat/tools/skill/call-mcp-tool.ts` activates the provider from the requested tool name, then requires an exact match in `getResolvedActiveTools()`. If the fresh catalog does not contain that tool name, it throws.

That means Pi history can remember a planned or prior tool name while the resumed runtime only has a newly listed provider catalog, which is not guaranteed to contain that name.
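The failure mode can be reproduced in miniature. The sketch below is not the real `McpToolManager`: `SketchToolManager`, its methods, and the tool name `mcp__linear__list_issues` are hypothetical, and the rebuilt catalog is simply assumed to lack `create_issue` (the issue does not establish why it was missing in production).

```typescript
// Hypothetical stand-in for the per-reply manager. This is NOT the real
// McpToolManager API, only the exact-match behavior described above.
class SketchToolManager {
  private readonly activeTools = new Set<string>();

  // Simulates listing a provider's tools when the provider is activated.
  activateProvider(listedTools: readonly string[]): void {
    for (const listed of listedTools) {
      this.activeTools.add(listed);
    }
  }

  // Requires an exact match in the active catalog, otherwise throws.
  callTool(toolName: string): string {
    if (!this.activeTools.has(toolName)) {
      throw new Error(`MCP tool is not active for this turn: ${toolName}`);
    }
    return `called ${toolName}`;
  }
}

// Slice 1: the provider lists create_issue, so the call succeeds.
const discoverySlice = new SketchToolManager();
discoverySlice.activateProvider([
  "mcp__linear__create_issue",
  "mcp__linear__list_issues", // hypothetical tool name
]);
const discoveryResult = discoverySlice.callTool("mcp__linear__create_issue");

// Slice 2: a fresh manager. History still names create_issue, but the
// rebuilt catalog (by assumption) no longer contains it.
const resumedSlice = new SketchToolManager();
resumedSlice.activateProvider(["mcp__linear__list_issues"]);

let resumeFailure = "";
try {
  resumedSlice.callTool("mcp__linear__create_issue");
} catch (error) {
  resumeFailure = error instanceof Error ? error.message : String(error);
}
console.log(resumeFailure);
```

Nothing carries over from `discoverySlice` to `resumedSlice`, which is the gap the resume path currently has for the searched tool catalog.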

## Span attributes note

The outer bridge span does attach tool args using `gen_ai.tool.call.arguments` in `packages/junior/src/chat/tools/agent-tools.ts`. For private Slack conversations (`slack:D...`), the value is payload metadata rather than raw JSON, so Sentry may show argument keys and sizes but not the literal `tool_name`. The thrown error currently reveals the exact requested tool name.
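To illustrate why the literal `tool_name` can be absent from the span, here is a minimal sketch of reducing arguments to keys and sizes. The function name `summarizeToolArgs` and the `ArgMetadata` shape are assumptions for illustration; the actual metadata format in `agent-tools.ts` is not shown in this issue.

```typescript
// Hypothetical metadata shape: enough to debug payload size and structure
// without recording any argument values.
interface ArgMetadata {
  key: string;
  type: string;
  size: number; // length of the JSON-serialized value
}

function summarizeToolArgs(args: Record<string, unknown>): ArgMetadata[] {
  return Object.keys(args)
    .sort()
    .map((key) => {
      const value = args[key];
      // JSON.stringify can return undefined at runtime (for example for
      // functions), so fall back to an empty string.
      const serialized = JSON.stringify(value) ?? "";
      return { key, type: typeof value, size: serialized.length };
    });
}

const argSummary = summarizeToolArgs({
  tool_name: "mcp__linear__create_issue",
  arguments: { title: "x" },
});
console.log(JSON.stringify(argSummary));
```

With a summary like this on the span, the key `tool_name` is visible but its value is not, so the requested tool has to be recovered from somewhere else, such as the error message.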

## Expected behavior

A resumed turn should not fail unexpectedly just because the MCP bridge catalog was rebuilt in a later slice. One of the following should hold:

- resume should preserve enough MCP catalog/tool identity to make the continued tool call deterministic,
- the resumed slice should rediscover/refresh and guide the model back through `searchMcpTools` when the requested tool is missing,
- or the missing-tool condition should be returned as a normal model-visible tool error instead of being captured as an unexpected Sentry exception.
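The first option could look roughly like the sketch below, which records searched tool names in serializable state and checks them against the rebuilt catalog on resume. Everything here is hypothetical: `SketchTurnSessionState`, `recordSearchResult`, and `findToolsMissingAfterResume` are illustrative names, and the real turn session state is not described in this issue.

```typescript
// Hypothetical slice of turn session state: provider -> tool names that
// earlier searchMcpTools calls returned. Plain JSON so it can be persisted.
interface SketchTurnSessionState {
  discoveredMcpTools: Record<string, string[]>;
}

// Returns a new state with the searched tool names merged in (deduplicated).
function recordSearchResult(
  state: SketchTurnSessionState,
  provider: string,
  toolNames: readonly string[],
): SketchTurnSessionState {
  const previous = state.discoveredMcpTools[provider] ?? [];
  const merged = Array.from(new Set([...previous, ...toolNames])).sort();
  return {
    discoveredMcpTools: { ...state.discoveredMcpTools, [provider]: merged },
  };
}

// On resume: which previously discovered tools are absent from the rebuilt catalog?
function findToolsMissingAfterResume(
  state: SketchTurnSessionState,
  rebuiltCatalog: ReadonlySet<string>,
): string[] {
  const missing: string[] = [];
  for (const provider of Object.keys(state.discoveredMcpTools)) {
    for (const toolName of state.discoveredMcpTools[provider] ?? []) {
      if (!rebuiltCatalog.has(toolName)) {
        missing.push(toolName);
      }
    }
  }
  return missing;
}

// Simulate a resume boundary with a JSON round trip.
let sessionState: SketchTurnSessionState = { discoveredMcpTools: {} };
sessionState = recordSearchResult(sessionState, "linear", [
  "mcp__linear__create_issue",
]);
const rehydratedState = JSON.parse(
  JSON.stringify(sessionState),
) as SketchTurnSessionState;
const missingTools = findToolsMissingAfterResume(
  rehydratedState,
  new Set(["mcp__linear__list_issues"]), // hypothetical rebuilt catalog
);
console.log(missingTools);
```

A non-empty result on resume would be the signal to refresh the provider or steer the model back to `searchMcpTools` before it attempts the call.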

## Possible fixes

- Persist active MCP provider catalog summaries, or at least searched tool names, in turn session state and rehydrate them across resume.
- When `callMcpTool` finds no exact match, add diagnostic span attributes: requested tool name, parsed provider, active providers, active tool count, and matching provider tool names/count.
- Consider returning a tool error instructing the model to call `searchMcpTools({ provider })` again instead of throwing an unexpected exception.
- If Linear `create_issue` is expected to always exist, add an `allowed-tools` pin for Linear so provider activation fails clearly when Linear does not expose it.
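The second and third fixes can be combined, as in this minimal sketch: a lookup that returns a model-visible error with diagnostics instead of throwing. The names `resolveMcpToolCall`, `parseMcpProvider`, and the result shapes are hypothetical, and the `mcp__<provider>__<tool>` naming pattern is an assumption inferred from the single tool name in this issue.

```typescript
// Hypothetical diagnostics payload mirroring the attributes listed above.
interface ToolLookupDiagnostics {
  requestedTool: string;
  parsedProvider: string | undefined;
  activeProviders: string[];
  activeToolCount: number;
  providerToolNames: string[];
}

type ToolLookupResult =
  | { ok: true; toolName: string }
  | { ok: false; message: string; diagnostics: ToolLookupDiagnostics };

// Assumes names look like mcp__<provider>__<tool>.
function parseMcpProvider(toolName: string): string | undefined {
  const parts = toolName.split("__");
  return parts.length >= 3 && parts[0] === "mcp" ? parts[1] : undefined;
}

function resolveMcpToolCall(
  toolName: string,
  activeTools: readonly string[],
  activeProviders: readonly string[],
): ToolLookupResult {
  if (activeTools.includes(toolName)) {
    return { ok: true, toolName };
  }
  const provider = parseMcpProvider(toolName);
  const providerToolNames =
    provider === undefined
      ? []
      : activeTools.filter((tool) => tool.startsWith(`mcp__${provider}__`));
  return {
    ok: false,
    // Returned to the model as a normal tool error rather than thrown.
    message:
      `MCP tool is not active for this turn: ${toolName}. ` +
      `Call searchMcpTools({ provider: "${provider ?? "unknown"}" }) again, ` +
      `then retry with a listed tool name.`,
    diagnostics: {
      requestedTool: toolName,
      parsedProvider: provider,
      activeProviders: [...activeProviders],
      activeToolCount: activeTools.length,
      providerToolNames,
    },
  };
}

const lookup = resolveMcpToolCall(
  "mcp__linear__create_issue",
  ["mcp__linear__list_issues"], // hypothetical rebuilt catalog
  ["linear"],
);
console.log(JSON.stringify(lookup, null, 2));
```

The `diagnostics` object is what would be copied onto span attributes. For private conversations, the same redaction rules that apply to `gen_ai.tool.call.arguments` would presumably need to be considered before recording tool names.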

