v0.1.3 — DeepSeek/Kimi reasoning + cache accounting (eval harness)
Eval-harness-only release. The production MCP server is unchanged — all changes are under evals/.
Changes since v0.1.2
- DeepSeek V4 reasoning in the L4 harness —
reasoning_contentcaptured to the.thinkingsidecar and replayed on tool-call turns (V4 requires it, same as Kimi/Moonshot; verified against the live API). (#9, LEO-233) - Per-request output cap raised 4096 → 32K so reasoning turns aren't truncated mid-thought (
finish_reason: length). (#7) - Prompt-cache accounting for DeepSeek + Moonshot — cache hits billed at the cache-read rate (
prompt_cache_hit_tokens/prompt_tokens_details.cached_tokens), with cached clamped ≤ prompt tokens. Fixes v1 over-billing (a DeepSeek run dropped $13.43 → $0.74).
Full Changelog: v0.1.2...v0.1.3