Mixed-stack home lab: ~7× throughput meshing MLX/oMLX + LM Studio + llama.cpp + vLLM #3646

matthewdcage · 2026-06-09T09:06:45Z

matthewdcage
Jun 9, 2026

If you run MLX inference locally — oMLX, MLX vanilla, llama.cpp server, LM Studio, or vLLM / Docker model runner — you may still hit the same problem I did: every machine has its own port, and every editor wants a different base_url.

I'm an avid oMLX user with a mixed home lab (2× Linux, 3× Apple Silicon Macs, M4 Max 64 GB daily driver). I built llm-swarm-router as a thin mesh coordinator on :11400 — not an inference runtime.

Each node keeps its own MLX/oMLX/llama.cpp stack; the agent:

Auto-discovers local backends and custom OpenAI-compat URLs
Meshes siblings over mDNS (netllm models --lan)
Exposes one stable OpenAI /v1 + Anthropic Messages API for Cursor, Claude Code, Codex, etc.

First result on my LAN: ~7× throughput vs. routing everything through one machine.

v0.3.0.1: App Intents/Shortcuts, routing policies, Anthropic streaming tool_use, macOS menubar + Linux/Windows alpha on Releases.

MLX/oMLX stays the inference layer — the router sits above it.

Looking for multi-machine testers (especially Apple Silicon + Linux mixed labs). ⭐ and issues with ./netllm doctor output very welcome.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mixed-stack home lab: ~7× throughput meshing MLX/oMLX + LM Studio + llama.cpp + vLLM #3646

Uh oh!

{{title}}

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

Mixed-stack home lab: ~7× throughput meshing MLX/oMLX + LM Studio + llama.cpp + vLLM #3646

Uh oh!

matthewdcage Jun 9, 2026

Replies: 0 comments

matthewdcage
Jun 9, 2026