v0.11.1 — gateway msg_too_long hardening
Why
A live Slack thread on 2026-05-07 returned `pmk 內部錯誤:An API error occurred: msg_too_long` ("pmk internal error: An API error occurred: msg_too_long") after several `mra-ask` rounds. Root-cause analysis surfaced four issues; v0.11.1 layers defenses against all of them so the failure does not reach production users again.
Spec: `apps/docs/docs/plans/2026-05-07-gateway-msg-too-long-hardening.md`. TDD plan: `apps/docs/docs/plans/2026-05-07-gateway-msg-too-long-hardening-implementation.md`.
Fixed
`msg_too_long` no longer reaches end users. Three layered defenses:
- (a) `pruneSessionIfNeeded` now runs before the LLM call (previously after). This closes a fail-loop introduced in v0.8.1, where a session over budget could never recover because prune only fired after a successful call.
- (b) The PKB seed and `mra-ask` results are capped at write-time so that a single bloated message cannot exhaust the input window on its own.
- (c) Any residual `msg_too_long` triggers a typed `PmkContextTooLongError`, an automatic `forcePruneToMinimum`, and a retry. The reply is prefixed with `:scissors: 對話過長,已自動裁掉 N 輪舊訊息` ("conversation too long; N older rounds were trimmed automatically"). A hard failure (both calls reject) shows `:x: 對話太長,請開新 thread 重新提問` ("conversation too long; please start a new thread and ask again") instead of the raw API error.
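As an illustration of the write-time cap in (b), here is a minimal sketch. The helper name `capMessageContent` appears in the test list of these notes, but its real signature, return shape, and truncation marker are not shown there, so everything below (the `CapResult` type, the `[truncated]` marker, the constant names) is an assumption.

```typescript
// Hypothetical sketch of a write-time cap; not the pmk implementation.
const SEED_CAP_DEFAULT = 12_000; // chars, documented default of PMK_SEED_CAP
const MRA_RESULT_CAP_DEFAULT = 16_000; // chars, documented default of PMK_MRA_RESULT_CAP

// Assumed return shape: the stored content plus enough detail to log an event.
type CapResult = { content: string; capped: boolean; originalLength: number };

function capMessageContent(content: string, cap: number): CapResult {
  const originalLength = content.length;
  if (originalLength <= cap) {
    return { content, capped: false, originalLength };
  }
  const marker = "\n[truncated]"; // hypothetical marker text
  if (cap <= marker.length) {
    // Cap too small to fit the marker: hard-cut without it.
    return { content: content.slice(0, Math.max(0, cap)), capped: true, originalLength };
  }
  // Reserve room for the marker so the stored content never exceeds the cap.
  const kept = content.slice(0, cap - marker.length);
  return { content: kept + marker, capped: true, originalLength };
}

// Usage: cap a seed and an mra-ask result before they are written to the session.
const seed = capMessageContent("s".repeat(20_000), SEED_CAP_DEFAULT);
const mra = capMessageContent("short result", MRA_RESULT_CAP_DEFAULT);
```

Capping when the message is written, rather than when the prompt is assembled, means an oversized message never enters the stored session in the first place.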
Changed
`PMK_MAX_SESSION_TOKENS` default lowered 60_000 → 25_000 to leave headroom for the system prompt, the retrieval prefix, the SDK-inherited host context (`claude-agent-sdk` spawns the local `claude` CLI, which inherits `~/.claude/` skills/hooks/MCP descriptions), the new turn, and the model's reply.
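The headroom idea can be sketched as a prune step that runs before the LLM call and subtracts known extras from the session budget. This is a rough illustration under stated assumptions: the 4-characters-per-token heuristic, the `Turn` type, the function name `pruneToBudget`, and the choice to subtract extras from the maximum are all hypothetical and may differ from how `pruneSessionIfNeeded` and `approxTokensFor` actually work.

```typescript
// Hypothetical sketch of extras-aware pruning; not the pmk implementation.
type Turn = { role: "user" | "assistant"; content: string };

// Assumed heuristic: roughly 4 characters per token.
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Drops the oldest turns until the session fits in (max - extras).
// Returns the number of turns dropped; mutates `session` in place.
function pruneToBudget(session: Turn[], maxSessionTokens: number, extraTokens: number): number {
  const budget = Math.max(0, maxSessionTokens - extraTokens);
  let total = session.reduce((sum, t) => sum + approxTokens(t.content), 0);
  let dropped = 0;
  // Always keep at least the most recent turn.
  while (total > budget && session.length > 1) {
    const removed = session.shift()!;
    total -= approxTokens(removed.content);
    dropped += 1;
  }
  return dropped;
}

// Usage: three 100-token turns, a 250-token maximum, 50 tokens of extras.
const demoSession: Turn[] = [
  { role: "user", content: "a".repeat(400) },
  { role: "assistant", content: "b".repeat(400) },
  { role: "user", content: "c".repeat(400) },
];
const demoDropped = pruneToBudget(demoSession, 250, 50);
```

Running this before the call is what lets an over-budget session recover: the prune no longer depends on a previous call having succeeded.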
Added
- New env vars `PMK_SEED_CAP` (default 12_000 chars) and `PMK_MRA_RESULT_CAP` (default 16_000 chars) for per-host tuning. The previously hardcoded 24_000-char `mra-ask` truncation in `buildMraSuccessMessage` is replaced by `PMK_MRA_RESULT_CAP`.
- New event types in `events-YYYY-MM.log`: `context.exceeded` (with `phase: "first-call" | "synthesise"`), `context.force-pruned`, `message.capped` (with `kind: "seed" | "mra-result"`).
- `pmk gateway audit` gains a `Context safety` section rolling up the new events. Tighten the `*_CAP` env vars if `context.exceeded` appears in your weekly audit.
- Helper `chatWithContextRetry` extracted to `packages/cli/src/gateway/slack/context-retry.ts` so the retry+force-prune+events pattern is unit-testable in isolation (no `SlackGateway` integration harness needed) and reused at both LLM call sites (`runFreeChatTurn` first-call, `synthesiseAfterMra` mra-ask round).
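The retry+force-prune+events pattern might look roughly like the sketch below. Only the names `chatWithContextRetry`, `PmkContextTooLongError`, and the event type and `phase` values come from these notes; the `RetryDeps` shape, the `dropped` field on the event, the error-detection rule, and the English reply strings (stand-ins for the Chinese ones quoted above) are assumptions made for illustration.

```typescript
// Hypothetical sketch of the retry helper; not the code in context-retry.ts.
type Phase = "first-call" | "synthesise";
type GatewayEvent =
  | { type: "context.exceeded"; phase: Phase }
  | { type: "context.force-pruned"; dropped: number }; // `dropped` field is assumed

class PmkContextTooLongError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "PmkContextTooLongError";
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof working on ES5 targets
  }
}

// Assumed detection rule: the typed error, or any error mentioning msg_too_long.
function isContextTooLong(err: unknown): boolean {
  if (err instanceof PmkContextTooLongError) return true;
  return err instanceof Error && err.message.includes("msg_too_long");
}

interface RetryDeps {
  phase: Phase;
  call: () => Promise<string>; // one LLM call against the current session
  forcePrune: () => number; // prunes the session, returns rounds dropped
  emit: (event: GatewayEvent) => void; // event sink
}

async function chatWithContextRetry(deps: RetryDeps): Promise<string> {
  try {
    return await deps.call();
  } catch (err) {
    if (!isContextTooLong(err)) throw err; // unrelated errors propagate unchanged
    deps.emit({ type: "context.exceeded", phase: deps.phase });
    const dropped = deps.forcePrune();
    deps.emit({ type: "context.force-pruned", dropped });
    try {
      const reply = await deps.call();
      return `:scissors: trimmed ${dropped} older rounds\n${reply}`;
    } catch {
      // Second rejection: return a friendly message instead of the raw API error.
      return ":x: conversation too long, please start a new thread";
    }
  }
}
```

Passing the call, the prune step, and the event sink in as dependencies is one way to make the helper testable without a Slack harness; the actual helper may be wired differently.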
Tests
`@pmk/cli` 274 → 304 (+30): unit coverage for `capMessageContent`, `forcePruneToMinimum`, `pruneSessionIfNeeded` extras-aware budgeting, `approxTokensFor` with the `extra` param, `PmkContextTooLongError` detection, the six-discriminant `chatWithContextRetry` (happy / non-context error / context-then-success-with-scissors / context-then-fail / `dropped=0` degenerate / `phase=synthesise` audit), the audit `contextSafety` rollup, the formatter's `Context safety` section with non-zero and zero-count rendering, and round-trips of the three new event types in `gateway-events.test.ts`.
The seed-cap and mra-result-cap wiring sites in `slack/index.ts` and the `runFreeChatTurn` retry-prefix wiring rely on the constituent helpers' unit tests + manual verification (no `SlackGateway` integration harness in this release; tracked as a follow-up for v0.11.2).
Verified live
The same kind of `mra-ask` round that crashed yesterday (2026-05-07 in `#新頻道` thread `1778139665.927099`) completes cleanly today: `turn.processed audience=tech hadMraAsk=true atomsInjected=1` with no `msg_too_long`. `pmk gateway audit --days 1` renders the new `Context safety` section as designed.
Forward-looking (v0.12)
v0.12 is planned to switch the gateway provider from `claude-agent-sdk` to `anthropic-api`, removing the SDK-inherited host-context as a budget unknown. The cap mechanism from v0.11.1 stays; only the budgets relax toward the model's true context window.
Upgrade
`git pull && npm run cli:build` — no schema migration. Existing sessions on disk: nothing to do; the new caps apply going forward at write-time. See v0.11 migration notes for the operator-facing summary.
PR #50.