v0.6.75
Highlights
- `HarmonyStreamingRouter`: delegates gpt-oss tool-call streaming to openai-harmony's `StreamableParser` (the same library vLLM and SGLang use). Closes #444 #455 #468 #480 #513.
- Structured tool-call passthrough: surfaces `(name, arguments)` natively through the engine via `GenerationOutput.tool_calls`; routes bypass regex-based reconstruction. Eliminates the class of sentinel-anchored parsing bugs that produced "harmony channel markers leak into content delta" symptoms across the four upstream issues.
Architecture
The legacy custom state machine remains as a fallback (auto-selected when the strict three-layer compat gate — tokenizer-identity allowlist + 7-marker parity + body-vocab probe — fails). All other models (Gemma 4 OutputRouter, think-tag, harmony with mismatched IDs) continue on the existing path.
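As a rough illustration of the fallback rule above, the sketch below selects the harmony router only when all three layers pass and otherwise returns the legacy path. Every name here (`GateInputs`, `select_router`, the `"harmony"` / `"legacy"` labels, the probe strings) is hypothetical and is not taken from the rapid-mlx source. The checks are simplified guesses at what each layer verifies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Sequence


@dataclass(frozen=True)
class GateInputs:
    """Hypothetical bundle of what the gate would need to inspect."""
    tokenizer_id: str                       # identity of the model's tokenizer
    model_marker_ids: Dict[str, int]        # marker string -> token ID in the model tokenizer
    reference_marker_ids: Dict[str, int]    # marker string -> token ID in the reference encoding
    model_encode: Callable[[str], List[int]]
    reference_encode: Callable[[str], List[int]]


def select_router(inputs: GateInputs,
                  allowlist: FrozenSet[str],
                  probes: Sequence[str]) -> str:
    """Return "harmony" only if every layer passes, else "legacy"."""
    # Layer 1 (assumed): the tokenizer must be on an explicit allowlist.
    if inputs.tokenizer_id not in allowlist:
        return "legacy"
    # Layer 2 (assumed): every marker must map to the same ID on both sides.
    if not inputs.reference_marker_ids:
        return "legacy"
    if inputs.model_marker_ids != inputs.reference_marker_ids:
        return "legacy"
    # Layer 3 (assumed): ordinary body text must tokenize identically.
    for text in probes:
        if inputs.model_encode(text) != inputs.reference_encode(text):
            return "legacy"
    return "harmony"
```

Failing closed at each layer matches the note's description of the gate as strict: any mismatch, including mismatched marker IDs, keeps the model on the existing path.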
Streaming fast-path enforces:
- `tool_calls[*].index` monotonicity across router chunks (OpenAI delta-merge requirement)
- `parallel_tool_calls=false` external contract cap (matching non-streaming post-parse trim)
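The two guarantees above can be pictured with a minimal sketch. It assumes each streamed fragment carries some router-side call identifier. `ToolCallDelta`, `router_id`, and `enforce_stream_contract` are hypothetical names for illustration, not the project's actual types.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass
class ToolCallDelta:
    """Hypothetical stand-in for one streamed tool-call fragment."""
    router_id: str            # router-side identifier of the call this fragment belongs to
    name: Optional[str]       # function name, present on the first fragment of a call
    arguments: str            # partial JSON argument text
    index: int = -1           # client-facing index, assigned by the function below


def enforce_stream_contract(deltas: Iterable[ToolCallDelta],
                            parallel_tool_calls: bool = True) -> Iterator[ToolCallDelta]:
    """Assign stable, first-seen-order indices and apply the single-call cap."""
    index_by_call: Dict[str, int] = {}
    for delta in deltas:
        if delta.router_id not in index_by_call:
            if not parallel_tool_calls and index_by_call:
                # Cap: once one call is open, drop fragments of any later call.
                continue
            # Each newly opened call gets the next index, so indices never restart.
            index_by_call[delta.router_id] = len(index_by_call)
        # Later fragments of a call reuse its index so clients can merge them.
        delta.index = index_by_call[delta.router_id]
        yield delta
```

Clients that follow the OpenAI streaming format concatenate `arguments` fragments per `index`, which is why an index that restarts between chunks would merge unrelated calls.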
Issues closed
- #444 — harmony streaming tool calls leak as raw content deltas
- #455 — Anthropic /v1/messages streaming + tools: harmony commentary leaks as text block
- #468 — `tool_choice="required"` not enforced + harmony channel markers leak
- #480 — harmony streaming tool calls leak channel syntax into content (gpt-oss-20b)
- #513 — marker-preserving router redesign for end-to-end fix of the above
Validation
- 16 rounds of codex adversarial review (final round: 0 BLOCKINGs)
- pr_validate 3×3 stress matrix: qwen3.5-35B-A3B-8bit / qwen3.6-27B-MLX-8bit / gpt-oss-20b-MXFP4-Q8
- 4484 unit tests pass (+9 new for structured-streaming index + cap)
- Release-SOP gates G1-G9 walked. G10 N/A. G11 deferred to #516 (waiver: auto-upgrade falls back to in-production legacy router; no behavioral regression vs pre-PR baseline).
Dependencies
- New: `openai-harmony>=0.0.6` (soft-imported at runtime)
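
A soft import generally means the package is loaded only if present, with a fallback otherwise. The sketch below shows that general pattern, not the project's actual loader. `soft_import` and the `ROUTER` labels are hypothetical; `openai_harmony` is the module name the openai-harmony package installs.

```python
import importlib
from types import ModuleType
from typing import Optional


def soft_import(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package lands here instead of crashing at startup.
        return None


harmony = soft_import("openai_harmony")
# Hypothetical labels: use the new router only when the library is available.
ROUTER = "harmony-streaming" if harmony is not None else "legacy"
```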
Installation
```
pip install --upgrade rapid-mlx
brew upgrade rapid-mlx
```