v0.7.1 — managed-tool duplicate replay
Fixes
- Replay cached result on duplicate managed tool calls — when a model repeats an identical managed tool call, cllama now replays the earlier model-facing result for the new
tool_call_idinstead of returning a data-less HTTP 409. Previously a model that never received the earlier data on the repeat could re-issue the same call until the managed-tool round budget was exhausted, never finalizing. Gated byCLLAMA_MANAGED_DUPLICATE_POLICY(replay, default, orrejectfor the legacy 409 path). - Consecutive-duplicate streak cutoff — after
CLLAMA_MANAGED_DUPLICATE_STREAK_CUTOFFconsecutive identical duplicate calls (default3), cllama forces graceful finalization by disabling tools and injecting a final-answer instruction beforeMaxRoundsis reached, covering models that loop even after receiving the replayed result.
Both OpenAI-compatible and Anthropic mediation paths are handled. Session-history tool traces now record duplicate_streak and policy, and a duplicate_managed_tool_call_finalization:<tool> intervention is logged when the cutoff fires.