Release vMLX 1.5.56 · jjang-ai/mlxstudio

vMLX 1.5.56

Highlights:

Fixes a false rejection of Gemma 4 12B VLM image prefill on high-memory Macs by scaling the default single-buffer guard from the Metal working set instead of using a fixed 8 GB limit.
Keeps explicit VMLX_VLM_IMAGE_PREFILL_BUFFER_GB behavior intact for users who need a hard guard.
Prevents failed media turns from poisoning later text prompts by rolling back the failed media user message in the chat panel.
Defaults Gemma 4 unified chat/image requests to visible-answer mode unless thinking is explicitly requested.
Removes deprecated mx.metal.clear_cache runtime calls in favor of mx.clear_cache, with a fallback.
Fixes model-family lookup for served aliases so registry defaults use the loaded model path.
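
The guard scaling and the clear_cache fallback described above can be sketched in a few lines. This is a minimal illustration, not the vMLX implementation: the function names, the 0.5 working-set fraction, and the reading of the environment variable as GiB are all assumptions, since the notes only say that the default guard now scales with the Metal working set and that an explicit VMLX_VLM_IMAGE_PREFILL_BUFFER_GB is still honored.

```python
import os

GIB = 1024 ** 3

# Hypothetical fraction of the Metal working set allowed for a single
# image-prefill buffer. The release notes do not state the real value.
ASSUMED_WORKING_SET_FRACTION = 0.5


def image_prefill_guard_bytes(working_set_bytes, env=None):
    """Return the single-buffer limit in bytes (illustrative helper).

    An explicit VMLX_VLM_IMAGE_PREFILL_BUFFER_GB acts as a hard guard;
    otherwise the limit scales with the working set passed in by the caller.
    """
    env = os.environ if env is None else env
    override = env.get("VMLX_VLM_IMAGE_PREFILL_BUFFER_GB")
    if override:
        # Assumption: the value is a number of GiB.
        return int(float(override) * GIB)
    return int(working_set_bytes * ASSUMED_WORKING_SET_FRACTION)


def resolve_clear_cache(mx):
    """Pick a cache-clearing callable from an MLX-like module object.

    Prefers mx.clear_cache, falls back to the deprecated
    mx.metal.clear_cache, and returns a no-op if neither exists.
    """
    fn = getattr(mx, "clear_cache", None)
    if callable(fn):
        return fn
    metal = getattr(mx, "metal", None)
    fn = getattr(metal, "clear_cache", None)
    if callable(fn):
        return fn
    return lambda: None
```

With these assumptions, a Mac reporting a 64 GiB working set gets a 32 GiB default guard, while setting VMLX_VLM_IMAGE_PREFILL_BUFFER_GB=8 pins the guard at 8 GiB regardless of memory size. In real use, `resolve_clear_cache` would be called with the imported `mlx.core` module.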

Verification:

Source live Gemma 4 12B JANG_4M verified: image, Responses image, text, multi-turn recall, cache hit, recovery after an oversized-request 413, and no-traceback guard behavior.
Source live Gemma 4 12B MXFP4 and MXFP8 image prompts verified with visible output.
Source live Qwen3.6 35B MXFP8 MTP verified without gdn_sink crash, including Chat, Responses, 160-token decode, TurboQuant KV, paged hybrid cache, and native MTP accepted-token logging.
Source live Step 3.7 Flash JANG_2L text Chat and Responses verified stable; image request no longer crashes, but output-quality overgeneration remains tracked separately.
Packaged installed /Applications/vMLX.app 1.5.56 verified: app version, bundled engine version, pycache-clean bundle, strict codesign, Gatekeeper Notarized Developer ID, GUI launch, OpenAI Chat, Responses, Anthropic, Ollama, multi-turn recall, prefix/cache hit, cache stats, and JIT soft wake on Gemma 4 12B JANG_4M.
Sequoia and Tahoe DMGs are Developer ID signed, Apple notarized, stapled, Gatekeeper accepted, and verified by the release DMG verifier.

Known follow-up:

DSV4 long full-output/code-generation exactness remains deferred for separate runtime-quality clearance.
Full cross-family real Electron UI live matrix remains tracked separately.
Structured JSON/XML repair and optional guided schema decoding support are planned for the next release so benchmark/database pipelines can validate, normalize, repair, and retry malformed model structured output.

DMG SHA256:
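
A downloaded DMG can be checked against its published digest with a short script. The following is a minimal sketch using only the Python standard library; the helper names are illustrative, and the file path and digest passed in are supplied by the user.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Compare a file's SHA-256 with a published hex digest.

    Surrounding whitespace and letter case in the published value are ignored.
    """
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

Reading in chunks keeps memory use flat for large disk images. A mismatch means the download should be discarded and fetched again.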
