Summary
A Hermes Discord runner with a managed MCP search tool can get stuck in a long active turn where Discord continues to show the agent as typing, but no visible reply is posted. The logs suggest the model/tool loop keeps running until cllama returns budget/deadline errors, and an interrupt from a follow-up Discord message does not fully clean up the outstanding managed tool/MCP work.
This is not project-specific. The concrete evidence below came from one deployed pod, but the failure mode appears to be a generic interaction between:
- Hermes Discord turn lifecycle / typing indicator handling
- cllama managed tool budgeting and deadlines
- stdio MCP sidecar cancellation behavior
- a realtime search MCP tool that can take long enough to hit those limits
Environment observed
- clawdapus: v0.13.2
- cllama: v0.5.0
- Hermes runner image: ghcr.io/mostlydev/hermes-base:v2026.3.17-claw.2
- MCP stdio sidecar image: ghcr.io/mostlydev/claw-mcp-stdio:v0.12.0
- Managed MCP tool: Perplexity search via perplexity-mcp
- Model route: openrouter/minimax/minimax-m2.7
- Surface: Discord mention-triggered Hermes agent turn
Repro shape
- Run a Hermes Discord agent with a managed MCP search tool exposed through the stdio MCP wrapper.
- Ask the agent to perform realtime search checks.
- While the turn is still active, send another Discord mention asking for status.
- Observe that Discord may keep showing the agent as typing for minutes while no visible response is ever posted for that turn.
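The repro steps above can be sketched as a single-process harness that checks the three properties this issue is about: typing stops, the interrupt reaches the slow tool, and nothing is posted late. This is a minimal sketch under stated assumptions: `FakeChannel`, `SlowSearchTool`, `agent_turn`, and `repro` are hypothetical names, not Hermes, cllama, or claw-mcp-stdio code, and asyncio task cancellation stands in for the Discord follow-up interrupt.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class FakeChannel:
    """Hypothetical stand-in for a Discord channel: typing state plus posted messages."""
    typing: bool = False
    sent: List[str] = field(default_factory=list)

@dataclass
class SlowSearchTool:
    """Hypothetical stand-in for a slow managed MCP search call."""
    delay_s: float
    cancelled: bool = False

    async def call(self, query: str) -> str:
        try:
            await asyncio.sleep(self.delay_s)  # simulates a long realtime search
            return f"results for {query!r}"
        except asyncio.CancelledError:
            self.cancelled = True  # records that the interrupt reached the tool
            raise

async def agent_turn(channel: FakeChannel, tool: SlowSearchTool, prompt: str) -> None:
    channel.typing = True
    try:
        channel.sent.append(await tool.call(prompt))
    finally:
        channel.typing = False  # typing ends on success, error, or cancellation

async def repro(tool_delay_s: float, followup_after_s: float) -> Tuple[FakeChannel, SlowSearchTool]:
    channel = FakeChannel()
    tool = SlowSearchTool(delay_s=tool_delay_s)
    turn = asyncio.create_task(agent_turn(channel, tool, "realtime search checks"))
    await asyncio.sleep(followup_after_s)  # the "Any luck?" follow-up arrives
    if not turn.done():
        turn.cancel()  # the follow-up triggers an interrupt of the active turn
    try:
        await turn
    except asyncio.CancelledError:
        pass  # expected: the harness cancelled the turn itself
    return channel, tool
```

A real regression test would swap `SlowSearchTool` for a slow MCP server behind the stdio wrapper and assert the same things end to end.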
Evidence from one observed pod
Discord/user timeline, local time:
18:35:14 user asked the agent to run realtime Perplexity/search checks
18:37:37 user asked "Any luck?" while the agent was still shown as typing
Hermes/Discord runner log showed a follow-up message interrupting an active session:
[Discord] New message while session ... is active - triggering interrupt
⚡ Interrupted during API call.
cllama logs for the same window showed repeated model requests, most failing after roughly 120 s:
2026-04-28T22:35:17Z request ... model=openrouter/minimax/minimax-m2.7
...
2026-04-28T22:37:17Z error latency_ms=120685 status_code=502 error="managed tool budget exhausted"
2026-04-28T22:39:18Z error latency_ms=120809 status_code=502 error="context deadline exceeded"
2026-04-28T22:39:37Z error latency_ms=120018 status_code=502 error="managed tool budget exhausted"
2026-04-28T22:41:38Z error latency_ms=120729 status_code=502 error="context deadline exceeded"
2026-04-28T22:42:17Z error latency_ms=38346 status_code=502 error="context canceled"
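Each error line records when a request failed and how long it ran, so subtracting `latency_ms` from the timestamp estimates when that request started. The sketch below does that arithmetic for the five lines quoted above; it assumes the logged timestamps are truncated to whole seconds, so each estimate is only good to about one second. `ERRORS` and `request_starts` are illustrative names.

```python
from datetime import datetime, timedelta

# The five cllama error lines above: (timestamp, latency_ms, error).
ERRORS = [
    ("2026-04-28T22:37:17Z", 120685, "managed tool budget exhausted"),
    ("2026-04-28T22:39:18Z", 120809, "context deadline exceeded"),
    ("2026-04-28T22:39:37Z", 120018, "managed tool budget exhausted"),
    ("2026-04-28T22:41:38Z", 120729, "context deadline exceeded"),
    ("2026-04-28T22:42:17Z", 38346, "context canceled"),
]

def request_starts(errors):
    """Estimate each failed request's start as logged end time minus latency."""
    rows = []
    for ts, latency_ms, err in errors:
        end = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
        start = end - timedelta(milliseconds=latency_ms)
        # Keep millisecond precision for the start, whole seconds for the end.
        rows.append((start.strftime("%H:%M:%S.%f")[:-3], end.strftime("%H:%M:%S"), err))
    return rows

for start, end, err in request_starts(ERRORS):
    print(f"{start} -> {end}  {err}")
```

The estimated starts are 22:35:16.315, 22:37:17.191, 22:37:36.982, 22:39:37.271, and 22:41:38.654. Taking local time as UTC-4 (which matches 18:35:14 local against the 22:35:17Z request), each start falls within about a second of the original request, of an earlier error, or of the 18:37:37 "Any luck?" follow-up. That is consistent with failed requests being reissued immediately instead of the turn ending.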
The MCP sidecar logged responses for unknown request ids around the same time:
claw-mcp-stdio ignored response for unknown id 32
claw-mcp-stdio ignored response for unknown id 66
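The MCP specification's cancellation utility defines a `notifications/cancelled` notification carrying the `requestId` (and an optional `reason`), and says the sender should ignore any response to that request that arrives afterward. A late response for a cancelled id is therefore expected, not a protocol error. The sketch below is a hypothetical request table for a stdio bridge that tracks cancelled ids separately so the two cases can be logged differently; it is not claw-mcp-stdio's implementation, and nothing in the logs above shows whether ids 32 and 66 were ever cancelled.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class PendingRequests:
    """Hypothetical request table for a stdio MCP bridge."""
    in_flight: Dict[int, str] = field(default_factory=dict)  # id -> method
    cancelled: Set[int] = field(default_factory=set)         # ids we sent a cancellation for

    def track(self, req_id: int, method: str) -> None:
        self.in_flight[req_id] = method

    def cancel(self, req_id: int, reason: str) -> str:
        """Forget the request and return the notification to write to the server's stdin."""
        self.in_flight.pop(req_id, None)
        self.cancelled.add(req_id)  # a real bridge would also expire these entries
        return json.dumps({
            "jsonrpc": "2.0",
            "method": "notifications/cancelled",
            "params": {"requestId": req_id, "reason": reason},
        })

    def classify_response(self, line: str) -> str:
        """Decide how to handle one JSON-RPC response line read from the server."""
        req_id = json.loads(line).get("id")
        if req_id in self.in_flight:
            del self.in_flight[req_id]
            return "deliver"
        if req_id in self.cancelled:
            self.cancelled.discard(req_id)
            return "late-after-cancel"  # normal after cancellation: log quietly
        return "unknown-id"             # genuine mismatch: log loudly
```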
Additional observations:
- The agent process stayed healthy.
- Memory pressure was not involved; the container was well below its memory limit.
- The Hermes session file for the affected turn contained the user prompt, but no assistant/tool messages for the turn.
- Restarting only the affected agent cleared the stuck state.
Expected behavior
- Managed MCP calls should have a bounded failure path that returns a structured error to the runner/agent quickly enough for the agent to report failure.
- Discord typing should stop when cllama returns managed tool budget exhausted or context deadline exceeded, and when the turn is interrupted.
- A follow-up Discord interrupt should cancel outstanding cllama managed-tool work and propagate cancellation through the MCP sidecar.
- Late MCP responses after expected cancellation should either be handled cleanly or logged in a way that distinguishes normal cancellation from protocol mismatch.
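On the runner side, the bounded failure path could look like the sketch below: the 502 error strings seen in the cllama logs end the turn with a short visible failure message, and typing is cleared in a `finally` block. This assumes the runner can see cllama's status code and error string; `terminal_reason`, `run_turn`, its callback parameters, and the user-facing wording are hypothetical, and treating these three errors as terminal is a design assumption rather than current Hermes behavior.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple

# Error strings copied from the cllama log lines; the messages are illustrative.
TERMINAL_ERRORS = {
    "managed tool budget exhausted": "the managed search tool ran out of budget",
    "context deadline exceeded": "the request hit its deadline",
    "context canceled": "the request was cancelled",
}

def terminal_reason(status_code: int, error: str) -> Optional[str]:
    """Return a user-facing reason if this upstream failure should end the turn."""
    if status_code == 502:
        return TERMINAL_ERRORS.get(error)
    return None

async def run_turn(
    send_request: Callable[[], Awaitable[Tuple[int, dict]]],
    post: Callable[[str], Awaitable[None]],
    set_typing: Callable[[bool], None],
) -> None:
    set_typing(True)
    try:
        status, body = await send_request()
        reason = terminal_reason(status, body.get("error", ""))
        if reason is not None:
            # Report the failure instead of silently reissuing the request.
            await post(f"I couldn't finish that: {reason}.")
        elif status == 200:
            await post(body["content"])
        else:
            await post(f"I couldn't finish that: upstream returned HTTP {status}.")
    finally:
        set_typing(False)  # runs on success, terminal error, and cancellation alike
```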
Possible fixes
- Clamp managed MCP/tool-loop budgets per turn and surface a structured tool error instead of allowing repeated 120s failures.
- Propagate Discord turn interruption through cllama and the stdio MCP sidecar so outstanding calls are actually canceled.
- Ensure Hermes/Discord typing lifecycle is finalized on managed tool budget exhaustion, deadline, and cancellation paths.
- Add telemetry that identifies which managed tool consumed the budget (e.g. perplexity.search), not only the model route.
- Add a regression test using a slow or looping MCP tool plus a follow-up Discord interrupt.
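The first and fourth fixes can be combined in one small structure: a per-turn budget that clamps each call's timeout to what is left and records time per tool name, so an exhausted budget raises a structured error carrying per-tool usage. This is a minimal sketch; `TurnBudget`, `BudgetExhausted`, and `call_tool` are hypothetical names, and the limits shown in usage are illustrative, not cllama configuration.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict

class BudgetExhausted(Exception):
    """Structured error: the tool that was refused plus per-tool usage so far."""

    def __init__(self, tool: str, used_s: float, limit_s: float, per_tool: Dict[str, float]):
        super().__init__(f"managed tool budget exhausted before {tool} call: "
                         f"{used_s:.1f}s of {limit_s:.1f}s used")
        self.tool = tool
        self.per_tool = dict(per_tool)  # telemetry snapshot, keyed by tool name

@dataclass
class TurnBudget:
    limit_s: float         # total managed-tool time allowed for one agent turn
    per_call_cap_s: float  # upper bound for any single tool call
    per_tool: Dict[str, float] = field(default_factory=dict)  # tool name -> seconds used

    def used(self) -> float:
        return sum(self.per_tool.values())

    def deadline_for(self, tool: str) -> float:
        """Timeout for the next call: the per-call cap, clamped to what is left."""
        remaining = self.limit_s - self.used()
        if remaining <= 0:
            raise BudgetExhausted(tool, self.used(), self.limit_s, self.per_tool)
        return min(self.per_call_cap_s, remaining)

    def record(self, tool: str, elapsed_s: float) -> None:
        self.per_tool[tool] = self.per_tool.get(tool, 0.0) + elapsed_s

async def call_tool(budget: TurnBudget, tool: str, call: Callable[[], Awaitable[str]]) -> str:
    timeout = budget.deadline_for(tool)  # may raise BudgetExhausted before any work starts
    started = time.monotonic()
    try:
        # asyncio.wait_for cancels the call and raises asyncio.TimeoutError on overrun.
        return await asyncio.wait_for(call(), timeout=timeout)
    finally:
        budget.record(tool, time.monotonic() - started)
```

With `limit_s=60` and `per_call_cap_s=25`, a looping search tool gets at most two 25 s calls and one 10 s call before the next request fails fast with `BudgetExhausted`, instead of waiting out another full upstream deadline.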
Related
This is related to, but distinct from, #190. That issue covers Hermes Discord tool-only/final-message behavior and visible progress cards. This issue is about managed MCP budget/deadline/cancellation behavior causing a long active turn and stuck typing.