Mixed-stack home lab: ~7× throughput meshing MLX/oMLX + LM Studio + llama.cpp + vLLM #3646
matthewdcage
started this conversation in
Show and tell
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
If you run MLX inference locally — oMLX, MLX vanilla, llama.cpp server, LM Studio, or vLLM / Docker model runner — you may still hit the same problem I did: every machine has its own port, and every editor wants a different
base_url.I'm an avid oMLX user with a mixed home lab (2× Linux, 3× Apple Silicon Macs, M4 Max 64 GB daily driver). I built llm-swarm-router as a thin mesh coordinator on
:11400— not an inference runtime.Each node keeps its own MLX/oMLX/llama.cpp stack; the agent:
netllm models --lan)/v1+ Anthropic Messages API for Cursor, Claude Code, Codex, etc.First result on my LAN: ~7× throughput vs. routing everything through one machine.
v0.3.0.1: App Intents/Shortcuts, routing policies, Anthropic streaming
tool_use, macOS menubar + Linux/Windows alpha on Releases.MLX/oMLX stays the inference layer — the router sits above it.
Looking for multi-machine testers (especially Apple Silicon + Linux mixed labs). ⭐ and issues with
./netllm doctoroutput very welcome.Beta Was this translation helpful? Give feedback.
All reactions