Replies: 1 comment
hey @jsirish, I didn't expect anyone to start trying this project so quickly, which is very exciting! Although the inference core of this project is already fairly complete, I think some areas still need improvement. I can only run large-scale tests and answer your questions once those features are refined:
I’ve been using mlx-node on a Mac Studio M3 Ultra with 256GB of RAM. The performance is significantly better than other servers I've tested, and I'd like to make this my primary hosting solution.
Since the library is so performant, it would be incredibly helpful to have a "Performance Tuning" or "Model-Specific Optimization" section in the documentation—specifically for the Qwen 3.5 series.
Given the headroom this hardware provides, I'm curious whether there are "ideal" settings or environment variables for:
I would be happy to run benchmarks on this hardware and contribute the results to the docs if there is a preferred format!
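If a preferred format helps, here is a minimal sketch of the kind of throughput harness I could use to produce consistent numbers. Note that `generateTokens` is a placeholder async generator, not mlx-node's actual API; it would be swapped for whatever streaming call the library exposes. The harness itself only measures time-to-first-token and tokens/sec:

```javascript
// Placeholder token stream: an async generator yielding tokens.
// Replace this stub with the real mlx-node generation call.
async function* generateTokens(prompt, maxTokens) {
  for (let i = 0; i < maxTokens; i++) {
    yield `tok${i}`; // dummy token
  }
}

// Measure time-to-first-token and overall tokens/sec for one prompt.
async function benchmark(prompt, maxTokens) {
  const start = process.hrtime.bigint();
  let count = 0;
  let firstTokenMs = null;
  for await (const _tok of generateTokens(prompt, maxTokens)) {
    if (firstTokenMs === null) {
      firstTokenMs = Number(process.hrtime.bigint() - start) / 1e6;
    }
    count++;
  }
  const totalMs = Number(process.hrtime.bigint() - start) / 1e6;
  return {
    tokens: count,
    timeToFirstTokenMs: firstTokenMs,
    tokensPerSecond: count / (totalMs / 1000),
  };
}

benchmark("Hello", 256).then((r) => console.log(JSON.stringify(r, null, 2)));
```

Results could then be reported as the JSON object above (tokens, TTFT, tokens/sec) per model and setting combination.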