Skip to content

v0.6.75

Choose a tag to compare

@raullenchai raullenchai released this 05 Jun 04:57

Highlights

  • HarmonyStreamingRouter — delegates gpt-oss tool-call streaming to openai-harmony's StreamableParser (same library vLLM and SGLang use). Closes #444 #455 #468 #480 #513.
  • Structured tool-call passthrough — surfaces (name, arguments) natively through the engine via GenerationOutput.tool_calls; routes bypass regex-based reconstruction. Eliminates the class of sentinel-anchored parsing bugs that produced "harmony channel markers leak into content delta" symptoms across the four upstream issues.

Architecture

The legacy custom state machine remains as a fallback (auto-selected when the strict three-layer compat gate — tokenizer-identity allowlist + 7-marker parity + body-vocab probe — fails). All other models (Gemma 4 OutputRouter, think-tag, harmony with mismatched IDs) continue on the existing path.

Streaming fast-path enforces:

  • tool_calls[*].index monotonicity across router chunks (OpenAI delta-merge requirement)
  • parallel_tool_calls=false external contract cap (matching non-streaming post-parse trim)

Issues closed

  • #444 — harmony streaming tool calls leak as raw content deltas
  • #455 — Anthropic /v1/messages streaming + tools: harmony commentary leaks as text block
  • #468 — `tool_choice="required"` not enforced + harmony channel markers leak
  • #480 — harmony streaming tool calls leak channel syntax into content (gpt-oss-20b)
  • #513 — marker-preserving router redesign for end-to-end fix of the above

Validation

  • 16 rounds of codex adversarial review (final round: 0 BLOCKINGs)
  • pr_validate 3×3 stress matrix: qwen3.5-35B-A3B-8bit / qwen3.6-27B-MLX-8bit / gpt-oss-20b-MXFP4-Q8
  • 4484 unit tests pass (+9 new for structured-streaming index + cap)
  • Release-SOP gates G1-G9 walked. G10 N/A. G11 deferred to #516 (waiver: auto-upgrade falls back to in-production legacy router; no behavioral regression vs pre-PR baseline).

Dependencies

  • New: `openai-harmony>=0.0.6` (soft-imported at runtime)

Installation

```
pip install --upgrade rapid-mlx
brew upgrade rapid-mlx
```