v0.1.3 — DeepSeek/Kimi reasoning + cache accounting (eval harness)

@lcj-claude-coder lcj-claude-coder released this 07 Jun 15:51
e83b8de

Eval-harness-only release. The production MCP server is unchanged — all changes are under evals/.

Changes since v0.1.2

  • DeepSeek V4 reasoning in the L4 harness: reasoning_content is captured to the .thinking sidecar and replayed on tool-call turns (V4 requires this, as do Kimi/Moonshot; verified against the live API). (#9, LEO-233)
  • Per-request output cap raised from 4096 to 32K tokens so reasoning turns aren't truncated mid-thought (finish_reason: length). (#7)
  • Prompt-cache accounting for DeepSeek + Moonshot: cache hits are billed at the cache-read rate (counts read from prompt_cache_hit_tokens / prompt_tokens_details.cached_tokens), with the cached count clamped to at most the prompt token count. Fixes v1 over-billing (the reported cost of one DeepSeek run dropped from $13.43 to $0.74).
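
The cache accounting in the last item can be sketched roughly as follows. This is an illustration, not the harness's actual code: the function names (cached_prompt_tokens, request_cost) and the per-million-token rate parameters are hypothetical, and the sketch assumes the usage object is a plain dict carrying either prompt_cache_hit_tokens or prompt_tokens_details.cached_tokens.

```python
from typing import Any, Mapping


def cached_prompt_tokens(usage: Mapping[str, Any]) -> int:
    """Return the cache-hit token count, clamped to [0, prompt_tokens]."""
    prompt = int(usage.get("prompt_tokens") or 0)
    # Prefer the flat field; fall back to the nested details field.
    cached = usage.get("prompt_cache_hit_tokens")
    if cached is None:
        details = usage.get("prompt_tokens_details") or {}
        cached = details.get("cached_tokens")
    cached = int(cached or 0)
    # Clamp so a bad report can never produce negative uncached tokens.
    return max(0, min(cached, prompt))


def request_cost(
    usage: Mapping[str, Any],
    input_rate: float,       # hypothetical: USD per 1M uncached prompt tokens
    cache_read_rate: float,  # hypothetical: USD per 1M cached prompt tokens
    output_rate: float,      # hypothetical: USD per 1M completion tokens
) -> float:
    """Bill cached prompt tokens at the cache-read rate, the rest at full rate."""
    prompt = int(usage.get("prompt_tokens") or 0)
    cached = cached_prompt_tokens(usage)
    uncached = prompt - cached
    completion = int(usage.get("completion_tokens") or 0)
    total = (
        uncached * input_rate
        + cached * cache_read_rate
        + completion * output_rate
    )
    return total / 1_000_000
```

Billing every prompt token at the full input rate, as the item suggests v1 did, overstates cost most on long multi-turn runs where the bulk of each prompt is a cache hit. The clamp guards the subtraction in case a provider ever reports more cached tokens than prompt tokens.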

Full Changelog: v0.1.2...v0.1.3