v0.6.68

raullenchai released this 28 May 15:54

22d7488

v0.6.68 — chat launch polish + MTP guard

First release since v0.6.66 (0.6.67 was skipped because its squash subject didn't match the auto-release regex).

Highlights

🛡️ MTP injection no longer crashes on hybrid VLM models (#477, #483). Running rapid-mlx serve <Qwen3.6-VL-MTP-model> --enable-mtp --force-spec-decode previously crashed at two points: first on the outer VLM args lookup, then on the missing _step in the hybrid Gated-DeltaNet path. Both paths now fail cleanly with a single warning each, and the request continues without MTP. Proper VLM+MTP support is tracked as a follow-up.
💬 rapid-mlx chat pre-launch UX polish (#482):
- Auto-bump max_tokens to 4096 when --think is set (empty-answer fix on small reasoning models)
- Atexit zombie reap for spawned serve subprocesses + SIGTERM handler
- --port range validator + pre-flight TCP probe
- rapid-mlx run alias for chat; rapid-mlx ls for cached models
- Download confirmation gate ([y/N] for ≥10 GiB models, RAPID_MLX_AUTO_PULL=1 to skip)
- /bye /? REPL aliases; info box truncation; first-launch codex tip
- New vllm_mlx/_download_gate.py module
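
To illustrate the --port range validator and pre-flight TCP probe listed above, here is a minimal sketch of how such checks can be written with the Python standard library. The function names `validate_port` and `port_in_use`, the accepted range of 1 to 65535, and the connect-based probe are assumptions made for illustration; they are not taken from the rapid-mlx source.

```python
import socket


def validate_port(value: str) -> int:
    """Parse a --port argument and reject values outside 1..65535 (assumed range)."""
    port = int(value)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port must be between 1 and 65535, got {port}")
    return port


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Hypothetical pre-flight probe: True if something already accepts TCP on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
```

A launcher could call `validate_port` on the raw argument first and then refuse to start, or pick another port, when `port_in_use` returns True.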

Bug fixes

fix(api): honor max_completion_tokens on chat completions (#459)
fix(api): honor parallel_tool_calls=false by capping response to 1 call (#464)
fix(api): honor legacy functions/function_call by normalizing to tools (#465)
fix(anthropic): forward stop_sequences to engine on /v1/messages (#462)
fix(chat): preserve channel split on logprobs non-stream path (#460)
fix(routes): reject audio_url on text-only models (mirror image/video gate) (#466)
fix(mllm): propagate VLM image fetch errors to HTTP 400 (#458)
fix(engine): propagate per-token logprobs through OutputRouter (#456)
fix(streaming): populate reasoning_tokens in usage chunk for OutputRouter models (#454)
fix(usage): proportional reasoning_tokens split when content non-empty (#453)
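
As a rough illustration of the legacy functions/function_call normalization mentioned in #465, the sketch below maps the deprecated OpenAI-style request fields onto their tools/tool_choice equivalents. The helper name `normalize_legacy_functions` and the rule that explicit tools/tool_choice fields take precedence are assumptions for this example, not the project's actual implementation.

```python
from typing import Any


def normalize_legacy_functions(body: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a chat request with legacy fields rewritten as tools/tool_choice."""
    out = dict(body)
    functions = out.pop("functions", None)
    function_call = out.pop("function_call", None)

    # Each legacy function definition becomes a tool of type "function".
    if functions and "tools" not in out:
        out["tools"] = [{"type": "function", "function": fn} for fn in functions]

    if function_call is not None and "tool_choice" not in out:
        if isinstance(function_call, str):
            # "auto" and "none" carry over unchanged.
            out["tool_choice"] = function_call
        else:
            # {"name": "x"} becomes a named function tool choice.
            out["tool_choice"] = {
                "type": "function",
                "function": {"name": function_call["name"]},
            }
    return out
```

The input dictionary is left untouched, so a caller can still log the original request alongside the normalized one.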

Tooling

feat(pr_validate): integrate Google eng-practices code-review tiering (#474)
docs(benchmarks): add community DFlash bench for Qwen3.6-35B-A3B-8bit on M3 Ultra (#473)

Install

```bash
brew upgrade rapid-mlx

# or

pip install -U rapid-mlx==0.6.68
```
