Managed MCP calls can repeatedly exhaust turn budget and keep Discord typing until retry succeeds #191

## Summary

A Hermes Discord runner with a managed MCP search tool can get stuck in a long active turn where Discord continues to show the agent as typing, but no visible reply is posted. The logs suggest the model/tool loop keeps running until cllama returns budget/deadline errors, and an interrupt from a follow-up Discord message does not fully clean up the outstanding managed tool/MCP work.

This is not project-specific. The concrete evidence below came from one deployed pod, but the failure mode appears to be a generic interaction between:

- Hermes Discord turn lifecycle / typing indicator handling
- cllama managed tool budgeting and deadlines
- stdio MCP sidecar cancellation behavior
- a realtime search MCP tool that can take long enough to hit those limits

## Environment observed

- clawdapus: `v0.13.2`
- cllama: `v0.5.0`
- Hermes runner image: `ghcr.io/mostlydev/hermes-base:v2026.3.17-claw.2`
- MCP stdio sidecar image: `ghcr.io/mostlydev/claw-mcp-stdio:v0.12.0`
- Managed MCP tool: Perplexity search via `perplexity-mcp`
- Model route: `openrouter/minimax/minimax-m2.7`
- Surface: Discord mention-triggered Hermes agent turn

## Repro shape

1. Run a Hermes Discord agent with a managed MCP search tool exposed through the stdio MCP wrapper.
2. Ask the agent to perform realtime search checks.
3. While the turn is still active, send another Discord mention asking for status.
4. Observe that the agent may remain typing for minutes and never post a visible response for that turn.

## Evidence from one observed pod

Discord/user timeline, local time (apparently UTC-4, judging by the cllama timestamps further down, which are UTC):

```text
18:35:14 user asked the agent to run realtime Perplexity/search checks
18:37:37 user asked "Any luck?" while the agent was still shown as typing
```

Hermes/Discord runner log showed a follow-up message interrupting an active session:

```text
[Discord] New message while session ... is active - triggering interrupt
⚡ Interrupted during API call.
```

cllama logs for the same window showed repeated model requests, most of them failing after about 120 s:

```text
2026-04-28T22:35:17Z request ... model=openrouter/minimax/minimax-m2.7
...
2026-04-28T22:37:17Z error latency_ms=120685 status_code=502 error="managed tool budget exhausted"
2026-04-28T22:39:18Z error latency_ms=120809 status_code=502 error="context deadline exceeded"
2026-04-28T22:39:37Z error latency_ms=120018 status_code=502 error="managed tool budget exhausted"
2026-04-28T22:41:38Z error latency_ms=120729 status_code=502 error="context deadline exceeded"
2026-04-28T22:42:17Z error latency_ms=38346 status_code=502 error="context canceled"
```
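Subtracting each `latency_ms` from its error timestamp gives an approximate start time per failed request, which makes the timeline easier to line up with the Discord messages. This is a small sketch over the five log lines above only; the log timestamps have one-second resolution, so the derived values are accurate to about a second. `ERRORS` and `approx_start` are names introduced here for illustration.

```python
from datetime import datetime, timedelta

# (error timestamp, latency_ms) copied from the cllama log lines above.
ERRORS = [
    ("2026-04-28T22:37:17Z", 120685),
    ("2026-04-28T22:39:18Z", 120809),
    ("2026-04-28T22:39:37Z", 120018),
    ("2026-04-28T22:41:38Z", 120729),
    ("2026-04-28T22:42:17Z", 38346),
]


def approx_start(end_ts: str, latency_ms: int) -> datetime:
    """Estimate when a request started from its error time and latency."""
    end = datetime.strptime(end_ts, "%Y-%m-%dT%H:%M:%SZ")
    return end - timedelta(milliseconds=latency_ms)


# Approximate start times, truncated to whole seconds.
starts = [approx_start(ts, ms).strftime("%H:%M:%S") for ts, ms in ERRORS]
for (end_ts, ms), start in zip(ERRORS, starts):
    print(f"start~{start}Z  end={end_ts[11:19]}Z  latency_ms={ms}")
```

Read this way, the second and third failures appear to cover overlapping windows (starting around 22:37:17Z and 22:37:37Z), which would be consistent with the follow-up message starting new work while an earlier request was still running. That is an inference from the timestamps, not something the logs state directly.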

The MCP sidecar logged late responses for request ids it did not recognize, in the same window as these failures:

```text
claw-mcp-stdio ignored response for unknown id 32
claw-mcp-stdio ignored response for unknown id 66
```
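One way a stdio proxy could tell an expected late response (the request was cancelled earlier) apart from a genuinely unknown id is to remember recently cancelled ids. The sketch below is illustrative only and does not reflect how `claw-mcp-stdio` is actually implemented; `PendingRequests` and `ResponseKind` are hypothetical names.

```python
from enum import Enum


class ResponseKind(Enum):
    DELIVERED = "delivered"        # a caller was still waiting on this id
    AFTER_CANCEL = "after_cancel"  # expected: this id was cancelled earlier
    UNKNOWN = "unknown"            # never issued: possible protocol mismatch


class PendingRequests:
    """Tracks JSON-RPC request ids so late responses can be classified."""

    def __init__(self, max_cancelled: int = 1024) -> None:
        self._pending: set[int] = set()
        # dict keeps insertion order, so the oldest cancelled id is evicted first.
        self._cancelled: dict[int, None] = {}
        self._max_cancelled = max_cancelled

    def sent(self, request_id: int) -> None:
        self._pending.add(request_id)

    def cancel(self, request_id: int) -> None:
        if request_id in self._pending:
            self._pending.discard(request_id)
            self._cancelled[request_id] = None
            while len(self._cancelled) > self._max_cancelled:
                self._cancelled.pop(next(iter(self._cancelled)))

    def classify(self, request_id: int) -> ResponseKind:
        if request_id in self._pending:
            self._pending.discard(request_id)
            return ResponseKind.DELIVERED
        if request_id in self._cancelled:
            del self._cancelled[request_id]
            return ResponseKind.AFTER_CANCEL
        return ResponseKind.UNKNOWN
```

With something like this, a response classified as `AFTER_CANCEL` could be logged at a lower level than `UNKNOWN`, so that normal cancellation does not look like a protocol error.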

Additional observations:

- The agent process stayed healthy.
- Memory pressure was not involved; the container was well below its memory limit.
- The Hermes session file for the affected turn contained the user prompt, but no assistant/tool messages for the turn.
- Restarting only the affected agent cleared the stuck state.

## Expected behavior

- Managed MCP calls should have a bounded failure path that returns a structured error to the runner/agent quickly enough for the agent to report failure.
- Discord typing should stop when cllama returns `managed tool budget exhausted`, `context deadline exceeded`, or when the turn is interrupted.
- A follow-up Discord interrupt should cancel outstanding cllama managed-tool work and propagate cancellation through the MCP sidecar.
- Late MCP responses after expected cancellation should either be handled cleanly or logged in a way that distinguishes normal cancellation from protocol mismatch.
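The first three expectations above can be sketched with `asyncio`: a per-call timeout that becomes a structured error, and a `finally` block that ends typing on every exit path. This is a minimal illustration under assumed names (`ToolResult`, `call_tool_bounded`, `run_turn`, and the `typing_events` list standing in for the Discord typing indicator); it is not Hermes or cllama code.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class ToolResult:
    ok: bool
    content: Optional[str] = None
    error_code: Optional[str] = None  # hypothetical code, e.g. "tool_timeout"


async def call_tool_bounded(
    call: Callable[[], Awaitable[str]], timeout_s: float
) -> ToolResult:
    """Run one tool call, converting a timeout into a structured error."""
    try:
        content = await asyncio.wait_for(call(), timeout=timeout_s)
        return ToolResult(ok=True, content=content)
    except asyncio.TimeoutError:
        # wait_for cancels the inner call before raising, so no work is left running.
        return ToolResult(ok=False, error_code="tool_timeout")


async def run_turn(
    call: Callable[[], Awaitable[str]],
    timeout_s: float,
    typing_events: List[str],
) -> ToolResult:
    typing_events.append("start")
    try:
        return await call_tool_bounded(call, timeout_s)
    finally:
        # Runs on success, timeout, and task cancellation (interrupt) alike.
        typing_events.append("stop")
```

If the task running `run_turn` is cancelled by a follow-up message, the `CancelledError` propagates into the awaited tool call and the `finally` block still records `"stop"`.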

## Possible fixes

- Clamp managed MCP/tool-loop budgets per turn and surface a structured tool error instead of allowing repeated 120s failures.
- Propagate Discord turn interruption through cllama and the stdio MCP sidecar so outstanding calls are actually canceled.
- Ensure Hermes/Discord typing lifecycle is finalized on managed tool budget exhaustion, deadline, and cancellation paths.
- Add telemetry that identifies which managed tool consumed the budget, e.g. `perplexity.search` rather than only the model route.
- Add a regression test using a slow or looping MCP tool plus a follow-up Discord interrupt.
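As an illustration of the budget clamp and per-tool telemetry ideas, the sketch below keeps one wall-clock budget per turn, caps each call's timeout by what remains, and names the tool that consumed the most time when the budget runs out. `TurnBudget` and its methods are hypothetical, the 120 s and 45 s figures are example values, and the error string simply mirrors the one in the cllama logs.

```python
from collections import defaultdict
from typing import Dict, Optional


class TurnBudget:
    """Per-turn time budget with per-tool accounting."""

    def __init__(self, total_s: float) -> None:
        self._total_s = total_s
        self._spent: Dict[str, float] = defaultdict(float)

    def remaining(self) -> float:
        return max(0.0, self._total_s - sum(self._spent.values()))

    def timeout_for(self, per_call_cap_s: float) -> float:
        """Timeout for the next call: the per-call cap, clamped to what is left."""
        return min(per_call_cap_s, self.remaining())

    def charge(self, tool_name: str, seconds: float) -> None:
        self._spent[tool_name] += seconds

    def top_consumer(self) -> Optional[str]:
        if not self._spent:
            return None
        return max(self._spent.items(), key=lambda kv: kv[1])[0]

    def exhausted_error(self) -> dict:
        """Structured error naming the tool that used the most budget."""
        tool = self.top_consumer()
        return {
            "error": "managed tool budget exhausted",
            "tool": tool,
            "spent_s": self._spent[tool] if tool is not None else 0.0,
        }


# Example: a 120 s turn budget with a 45 s cap per call.
budget = TurnBudget(total_s=120.0)
budget.charge("perplexity.search", 45.0)
budget.charge("perplexity.search", 45.0)
budget.charge("other.tool", 20.0)
next_timeout = budget.timeout_for(45.0)  # clamped to the 10 s that remain
```

The intent is that once `remaining()` reaches zero, the loop returns `exhausted_error()` to the agent once rather than starting another full-length attempt.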

## Related

This is related to, but distinct from, #190. That issue covers Hermes Discord tool-only/final-message behavior and visible progress cards. This issue is about managed MCP budget/deadline/cancellation behavior causing a long active turn and stuck typing.

