Replies: 1 comment
hey @jsirish, I didn't expect anyone to start trying this project so quickly, which is very exciting! Although the inference core of this project is already fairly complete, I think some areas still need improvement. I can only run large-scale tests and answer your questions once those features are refined:
I’ve been using mlx-node on a Mac Studio M3 Ultra with 256GB of RAM. The performance is significantly better than other servers I've tested, and I'd like to make this my primary hosting solution.
Since the library is so performant, it would be incredibly helpful to have a "Performance Tuning" or "Model-Specific Optimization" section in the documentation—specifically for the Qwen 3.5 series.
Given the headroom this hardware provides, I'm curious whether there are "ideal" settings or environment variables for:
I would be happy to run benchmarks on this hardware and contribute the results to the docs if there is a preferred format!
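If a preferred format helps, here is a minimal sketch of the kind of throughput harness I could use to produce consistent numbers. Note that `generateTokens` is a placeholder async generator, not mlx-node's actual API; it would be swapped for whatever streaming call the library exposes. The harness itself only measures time-to-first-token and tokens/sec:

```javascript
// Placeholder token stream: an async generator yielding tokens.
// Replace this stub with the real mlx-node generation call.
async function* generateTokens(prompt, maxTokens) {
  for (let i = 0; i < maxTokens; i++) {
    yield `tok${i}`; // dummy token
  }
}

// Measure time-to-first-token and overall tokens/sec for one prompt.
async function benchmark(prompt, maxTokens) {
  const start = process.hrtime.bigint();
  let count = 0;
  let firstTokenMs = null;
  for await (const _tok of generateTokens(prompt, maxTokens)) {
    if (firstTokenMs === null) {
      firstTokenMs = Number(process.hrtime.bigint() - start) / 1e6;
    }
    count++;
  }
  const totalMs = Number(process.hrtime.bigint() - start) / 1e6;
  return {
    tokens: count,
    timeToFirstTokenMs: firstTokenMs,
    tokensPerSecond: count / (totalMs / 1000),
  };
}

benchmark("Hello", 256).then((r) => console.log(JSON.stringify(r, null, 2)));
```

Results could then be reported as the JSON object above (tokens, TTFT, tokens/sec) per model and setting combination.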