v0.12.0 — gateway anthropic-api as default provider
Why
v0.11.1 hardened the gateway against `msg_too_long` by lowering caps and adding an auto-retry path. Cause #2 from the 2026-05-07 incident, however, was absorbed by the tighter caps rather than eliminated: `claude-agent-sdk` spawns the local `claude` CLI, which inherits the host's `~/.claude/` config (skills, hooks, MCP descriptions) as un-budgeted system context. v0.12.0 flips the default to the direct Anthropic SDK, so SDK overhead is no longer an unknown in the token budget, and restores cap headroom.
Spec: `apps/docs/docs/plans/2026-05-08-gateway-anthropic-api-default.md`. Migration: see the v0.12 migration notes.
Changed
- Default LLM provider auto-resolves to `anthropic-api` first (was `claude-agent`). Soft flip: users with `ANTHROPIC_API_KEY` set switch automatically; users without it stay on `claude-agent` with no behavioural change. `PMK_PROVIDER=claude-agent` still pins the legacy path explicitly.
- Cap defaults restored to operationally useful values now that SDK overhead is gone on the default path:
  - `PMK_MAX_SESSION_TOKENS`: 25_000 → 60_000
  - `PMK_SEED_CAP`: 12_000 → 30_000
  - `PMK_MRA_RESULT_CAP`: 16_000 → 40_000
- `gateway init` prompts for `ANTHROPIC_API_KEY` after the Slack tokens and stores it in the `apiKey` field of `~/.pmk/gateway.json` at mode 0600. Empty input keeps the existing value, or falls back to the env var if none is stored. An already-running gateway needs a graceful restart to pick up a newly set `apiKey`.
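The soft-flip precedence can be sketched as a small pure function. This is a minimal illustration, not the actual `resolver.ts` implementation: the names `ProviderId`, `GatewayConfig`, `resolveApiKey` and `resolveProvider` are hypothetical, and so is the assumption that a non-empty stored `apiKey` counts the same as (and takes precedence over) `ANTHROPIC_API_KEY`. The resolver's fail path is not modeled.

```typescript
// Hypothetical sketch of the v0.12.0 soft-flip provider resolution.
// Names are illustrative, not the real resolver.ts API.

type ProviderId = "anthropic-api" | "claude-agent";

interface GatewayConfig {
  // Assumed shape: the apiKey field written by `gateway init` to ~/.pmk/gateway.json.
  apiKey?: string;
}

type Env = Record<string, string | undefined>;

// Assumption: a non-empty stored apiKey wins; otherwise fall back to the env var.
function resolveApiKey(config: GatewayConfig, env: Env): string | undefined {
  const stored = config.apiKey?.trim();
  if (stored) return stored;
  const fromEnv = env["ANTHROPIC_API_KEY"]?.trim();
  return fromEnv ? fromEnv : undefined;
}

function resolveProvider(config: GatewayConfig, env: Env): ProviderId {
  // 1. An explicit pin always wins (e.g. PMK_PROVIDER=claude-agent).
  //    Unrecognised values fall through to auto-resolution in this sketch.
  const pinned = env["PMK_PROVIDER"];
  if (pinned === "anthropic-api" || pinned === "claude-agent") return pinned;
  // 2. Soft flip: any available API key selects the direct SDK path.
  if (resolveApiKey(config, env)) return "anthropic-api";
  // 3. Otherwise stay on the legacy path, with no behavioural change.
  return "claude-agent";
}
```

For example, `resolveProvider({}, { ANTHROPIC_API_KEY: "sk-test" })` yields `"anthropic-api"`, while adding `PMK_PROVIDER: "claude-agent"` to the same env yields `"claude-agent"`.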
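A sketch of how the restored defaults might be looked up. It assumes, beyond what these notes state, that each cap can be overridden by an environment variable of the same name and that a value which is not a positive integer falls back to the default; `CAP_DEFAULTS` and `readCap` are hypothetical names.

```typescript
// Hypothetical: v0.12.0 cap defaults keyed by their PMK_* names.
const CAP_DEFAULTS = {
  PMK_MAX_SESSION_TOKENS: 60_000, // was 25_000 in v0.11.1
  PMK_SEED_CAP: 30_000, // was 12_000
  PMK_MRA_RESULT_CAP: 40_000, // was 16_000
} as const;

type CapName = keyof typeof CAP_DEFAULTS;

// Assumption: an env var with the same name overrides the default when it
// parses to a positive integer; anything else falls back to the default.
function readCap(name: CapName, env: Record<string, string | undefined>): number {
  const raw = env[name];
  if (raw !== undefined) {
    const parsed = Number(raw);
    if (Number.isInteger(parsed) && parsed > 0) return parsed;
  }
  return CAP_DEFAULTS[name];
}
```

Under this assumption, setting `PMK_MAX_SESSION_TOKENS=25000` would restore the v0.11.1 session cap without downgrading.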
Added
- `token.usage` event in `events-YYYY-MM.log`, emitted by `AnthropicApiKeyProvider.chat()` after each successful stream completion when an `actor` is provided in `ChatOptions`. Fields: `actor`, `provider`, `model`, `inputTokens`, `outputTokens`, and optional `cacheReadTokens` / `cacheCreationTokens`. Best-effort write: failures don't break the chat.
- `Token usage` section in `pmk gateway audit` rolls up the new events: total in/out, cache read (when non-zero), top 3 actors by input tokens, and a per-model breakdown.
- `ChatOptions.actor`: optional field on the `LlmProvider.chat()` interface for usage attribution. Threaded through `chatWithContextRetry` automatically; wiring on the CLI command side is future work.
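The emission rule can be sketched as a small best-effort helper. The event fields come from the description above; everything else is an assumption for illustration, not the actual `events.ts` code: the extra `type`/`ts` fields, the JSON-lines format, the UTC month used in the file name, and the hypothetical `emitTokenUsage` and `monthlyLogName` helpers with an injected `append` writer. In the real provider the counts would come from the completed stream (the tests mention `finalMessage()`).

```typescript
// Hypothetical shapes; only the listed fields come from the release notes.
interface ChatOptions {
  actor?: string; // optional attribution, as on LlmProvider.chat()
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
  cacheCreationTokens?: number;
}

interface TokenUsageEvent extends Usage {
  type: "token.usage";
  ts: string; // assumed ISO-8601 timestamp
  actor: string;
  provider: string;
  model: string;
}

// events-YYYY-MM.log, keyed on the UTC month of the event (assumption).
function monthlyLogName(d: Date): string {
  const month = String(d.getUTCMonth() + 1).padStart(2, "0");
  return `events-${d.getUTCFullYear()}-${month}.log`;
}

// Call after a successful stream completion. Returns true if a line was written.
function emitTokenUsage(
  opts: ChatOptions,
  provider: string,
  model: string,
  usage: Usage,
  append: (file: string, line: string) => void,
  now: Date = new Date(),
): boolean {
  if (!opts.actor) return false; // no actor, no event
  const event: TokenUsageEvent = {
    type: "token.usage",
    ts: now.toISOString(),
    actor: opts.actor,
    provider,
    model,
    ...usage, // undefined cache fields are dropped by JSON.stringify
  };
  try {
    append(monthlyLogName(now), JSON.stringify(event) + "\n");
    return true;
  } catch {
    // Best-effort: a failed write must never break the chat.
    return false;
  }
}
```

Injecting `append` keeps the helper testable without touching the filesystem; in a Node process it could wrap `fs.appendFileSync`.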
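The `Token usage` rollup reduces to a single pass over the events. A sketch, assuming the events have already been parsed from the monthly logs into the minimal `UsageRecord` shape below; `summarizeTokenUsage` and its return shape are hypothetical, and tie-breaking among actors with equal input tokens is left unspecified.

```typescript
// Hypothetical parsed form of a token.usage event.
interface UsageRecord {
  actor: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
}

interface Totals {
  inputTokens: number;
  outputTokens: number;
}

interface TokenUsageSummary {
  totalIn: number;
  totalOut: number;
  cacheRead: number; // the section only renders this when non-zero
  topActors: Array<{ actor: string; inputTokens: number }>; // top 3 by input
  byModel: Map<string, Totals>;
}

function summarizeTokenUsage(events: UsageRecord[]): TokenUsageSummary {
  let totalIn = 0;
  let totalOut = 0;
  let cacheRead = 0;
  const perActor = new Map<string, number>();
  const byModel = new Map<string, Totals>();

  for (const e of events) {
    totalIn += e.inputTokens;
    totalOut += e.outputTokens;
    cacheRead += e.cacheReadTokens ?? 0;
    perActor.set(e.actor, (perActor.get(e.actor) ?? 0) + e.inputTokens);
    const m = byModel.get(e.model) ?? { inputTokens: 0, outputTokens: 0 };
    m.inputTokens += e.inputTokens;
    m.outputTokens += e.outputTokens;
    byModel.set(e.model, m);
  }

  // Rank actors by cumulative input tokens, highest first, and keep three.
  const topActors = Array.from(perActor.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, 3)
    .map(([actor, inputTokens]) => ({ actor, inputTokens }));

  return { totalIn, totalOut, cacheRead, topActors, byModel };
}
```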
Tests
`@pmk/cli`: 304 → 312 tests (+8), covering: `resolver.ts` autoResolve order (apiKey preferred, plus the fail path); `AnthropicApiKeyProvider.chat()` token-usage emission with a mocked stream and `finalMessage()`; no emission when `actor` is undefined; `events.ts` round-trip for `token.usage`; `audit.ts` aggregation; and `audit-format.ts` Token usage rendering for the non-zero and zero cases. Cap-default test assertions flipped from the v0.11.1 values to the v0.12.0 values.
Forward-looking
The `claude-agent` provider stays as a soft-flip fallback indefinitely; deprecation will be re-evaluated in v0.13+ based on usage data from the new Token usage audit section. Dollar-cost calculation is a v0.13+ candidate, gated on a stable price-table source. The `SlackGateway` integration harness remains tracked as a v0.11.2 follow-up.
Upgrade
`git pull && npm run cli:build`. No schema migration is needed. Existing sessions on disk need no action; the new caps apply going forward, at write time. See the v0.12 migration notes for the operator-facing summary.
PR #51.