English · 简体中文
Native macOS LLM inference, powered by Apple MLX.
macMLX brings local LLM inference to Apple Silicon with a first-class native macOS experience. No cloud, no telemetry, no Electron — just your Mac running models at full speed.
macMLX is for everyone: a polished SwiftUI app for newcomers, and a proper CLI for developers.
| | macMLX | LM Studio | Ollama | oMLX |
|---|---|---|---|---|
| Native macOS GUI | ✅ SwiftUI | ❌ Electron | ❌ | ❌ Web UI |
| MLX-native inference | ✅ | ❌ GGUF | ❌ GGUF | ✅ |
| CLI | ✅ | ❌ | ✅ | ✅ |
| Resumable downloads + mirrors | ✅ | ⚠ partial | ⚠ partial | ❌ |
| OpenAI-compatible API | ✅ always-on | ✅ | ✅ | ✅ |
| Zero Python required | ✅ | ✅ | ✅ | ❌ |
- macOS 14.0 (Sonoma) or later
- Apple Silicon (M1 / M2 / M3 / M4)
- No Python required
Download `macMLX-vX.X.X.dmg` from Releases, mount it, and drag macMLX.app to /Applications.
The DMG is not notarized (no paid Apple Developer account yet — #19), so Gatekeeper blocks it on first launch. Pick one of the two unblocks:
Option A — terminal (recommended, always works):

```sh
xattr -cr /Applications/macMLX.app   # clear quarantine attribute
open /Applications/macMLX.app        # first launch
```

Option B — right-click: right-click macMLX.app → Open, then click Open again in the dialog. On newer macOS versions this fallback dialog sometimes doesn't appear — if so, use Option A.
Want to see what Gatekeeper thinks of the app?

```sh
spctl --assess --verbose /Applications/macMLX.app
```

Twelve-ish shipped releases since the v0.1 MVP. Pick the ones that matter:
Downloads
- Resumable downloads survive cancels AND app quits (background URLSession + persisted resume data) — #5/#6/#8
- Live speed (MB/s) + ETA + per-file progress bar — #7
- Configurable Hugging Face endpoint for mirrors like https://hf-mirror.com (GUI + CLI, both) — #21
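The GUI's resume support is built on URLSession's persisted resume data; the same idea, stated language-agnostically, is an HTTP `Range` request starting from the current file size. A minimal Python sketch of that pattern (illustrative only, not macMLX's actual Swift code):

```python
import os
import urllib.request

def resume_download(url, dest):
    """Resume a partial download by requesting only the missing bytes.

    Sketch of the Range-request idea behind resumable downloads;
    macMLX itself uses a background URLSession with resume data.
    """
    # If a partial file exists, ask the server to start from its end
    start = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-"})
    with urllib.request.urlopen(req) as resp, open(dest, "ab") as out:
        while chunk := resp.read(1 << 16):  # append in 64 KiB chunks
            out.write(chunk)
    return os.path.getsize(dest)
```

A server that honours `Range` replies with `206 Partial Content`; one that doesn't simply restarts from byte zero.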
Chat
- Conversation sidebar: switch between saved chats, rename, delete, rewind to here (truncate after any message) — v0.3.2
- Streaming Markdown rendering with paragraph breaks preserved — #10 (+ v0.3.1 fix)
- Right-click any message: Copy / Edit / Regenerate / Delete — #11
- Per-model Parameters Inspector (⌘⌥I) — temperature, top_p, max tokens, system prompt persist to disk — #15
- Chat model switcher in toolbar loads on tap — v0.3.1
Benchmark — v0.3.0 tab for local tok/s, TTFT, peak memory, and history, with Share to Community to a GitHub-issue leaderboard — #22
Logs — v0.3.4 tab reads Pulse's store directly: search, level filter, live tail, clear
API (OpenAI-compat)
- Cold-swap: `/v1/chat/completions` auto-loads any locally-downloaded model by ID, serialises concurrent swaps — v0.3.3
- `/x/status` reports real RSS (resident set size)
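"Serialises concurrent swaps" means simultaneous requests naming a not-yet-loaded model queue behind a single load instead of racing. A toy Python sketch of that pattern (macMLX's actual implementation is Swift; all names here are hypothetical):

```python
import threading

class ModelSlot:
    """One loaded model at a time; concurrent swap requests are serialised."""

    def __init__(self, load_fn):
        self._lock = threading.Lock()
        self._load = load_fn
        self.current_id = None
        self.model = None

    def get(self, model_id):
        # The lock serialises swaps: while one thread loads, others wait,
        # then find the model already resident and skip the reload.
        with self._lock:
            if model_id != self.current_id:
                self.model = self._load(model_id)
                self.current_id = model_id
            return self.model
```

The same shape works with an async actor or serial queue; the invariant is simply "at most one load in flight".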
CLI — native ANSI dashboards (macmlx pull, serve, run), honours preferredEngine + per-model ModelParameters + HF mirror settings — v0.3.1 / v0.3.3 / v0.3.5
Stability / polish — chat survives sidebar tab switches (#1), single-instance enforcement (#2), Quit in menu bar (#17), macmlx list segfault fix (v0.3.1), ConversationStore date-precision fix (v0.3.3), and a 3-commit independent code-review sweep in v0.3.0
Full per-tag breakdown: CHANGELOG.md.
- Launch macMLX — the setup wizard points you at `~/.mac-mlx/models` and picks the MLX Swift engine
- Download a model from the built-in HuggingFace browser (resumable, works through mirrors)
- Load it and start chatting
```sh
macmlx pull mlx-community/Qwen3-8B-4bit   # download
macmlx list                               # local models
macmlx run Qwen3-8B-4bit "Hello, world"   # single prompt
macmlx run Qwen3-8B-4bit                  # interactive
macmlx serve                              # start API on :8000
macmlx ps                                 # is serve running?
macmlx stop                               # graceful SIGTERM
```

macMLX's OpenAI-compatible server runs on http://localhost:8000/v1 whenever you load a model (or whenever `macmlx serve` is running).
```sh
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen3-8B-4bit","messages":[{"role":"user","content":"Hi"}],"stream":true}'
```

Any OpenAI-compatible client works — point it at http://localhost:8000/v1 with any key:
- Cursor / Continue / Cline: set the custom base URL in settings
- Open WebUI: add as an OpenAI provider
- Raycast, Zed, etc.: same pattern
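With `"stream": true`, the response arrives as server-sent events whose `data:` payloads carry content deltas, in the standard OpenAI streaming format. A minimal parser sketch (hypothetical helper name, no third-party client needed):

```python
import json

def collect_stream(lines):
    """Join content deltas from OpenAI-style SSE lines into one string."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue                      # skip keep-alives / blank lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":   # stream terminator sentinel
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)
```

Feed it the lines of a streaming response body (e.g. from the curl above) and it reassembles the assistant's reply.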
| Engine | Status | Notes |
|---|---|---|
| MLX Swift (default) | ✅ Shipping | Apple's mlx-swift-lm, in-process. Supports models up to ~70B on 64 GB+ Macs. |
| SwiftLM (100B+ MoE) | 🕒 Deferred to v0.3 | Subprocess launch blocked by App Sandbox policy; revisit when there's a concrete user ask (#12). |
| Python mlx-lm | 🕒 Deferred to v0.3 | Same sandbox blocker (#13). |
Settings → Engine shows Install Guide links for the deferred engines; selecting them today surfaces a graceful "engine not available" state.
```
macMLX.app (SwiftUI)      macmlx (CLI)
        │                      │
        └────── MacMLXCore ────┘      (Swift SPM package)
                   │
            InferenceEngine
                   │
    MLXSwiftEngine (in-process, mlx-swift-lm 3.31.x)
                   │
    HummingbirdServer → http://localhost:8000/v1
                   │
      Apple Silicon (Metal / ANE)
```
Data lives under ~/.mac-mlx/:
```
~/.mac-mlx/
├── models/          # weights (default, changeable in Settings)
├── conversations/   # chat history JSON
├── model-params/    # per-model parameter overrides
├── downloads/       # resume-data for interrupted downloads
├── logs/            # Pulse logs
├── settings.json    # user preferences
└── macmlx.pid       # CLI daemon coordination
```
This path is deliberately a dotfile under the real `$HOME`: the macOS App Sandbox's dotfile exemption lets a sandboxed app read/write here without `user-selected.read-write` entitlements or security-scoped bookmarks, while staying visible to power users.
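Because the layout is plain directories and JSON, outside tooling can inspect it directly. A rough Python sketch that enumerates downloaded models by folder name (the one-folder-per-model layout is an assumption here; `macmlx list` is the supported interface):

```python
from pathlib import Path

def local_models(root=Path.home() / ".mac-mlx" / "models"):
    """List model IDs by directory name under the macMLX models folder.

    Assumes one subdirectory per downloaded model, which is an
    assumption about the on-disk layout, not a documented contract.
    """
    if not root.exists():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```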
```sh
git clone https://github.com/magicnight/mac-mlx
cd mac-mlx
brew bundle                            # dev tools

# GUI app
open macMLX/macMLX.xcodeproj           # or: xcodebuild -scheme macMLX build

# CLI
swift build --package-path macmlx-cli

# Core + tests
swift test --package-path MacMLXCore   # 90 tests, runs in ~3s
```

- v0.1.0 — native SwiftUI GUI, menu bar, CLI (`serve`/`pull`/`run`/`list`/`ps`/`stop`), HuggingFace downloader, OpenAI-compatible API, Sparkle auto-update, memory-aware onboarding.
- v0.2.0 — download + chat polish (10 issues): resumable downloads, HF mirrors, Markdown rendering, message edit/regenerate, Parameters Inspector.
- v0.3.x — six patch releases: Benchmark feature, cross-cutting gap fixes, UX patches, chat history sidebar, API cold-swap, Logs tab, native ANSI CLI dashboards. See CHANGELOG.md for the per-tag breakdown.
- `macmlx --version` auto-bumped from the release tag
- `macmlx search <query>` command (queries `mlx-community` by default)
- Release binary slim-down via `strip -S` + dynamic Swift stdlib
- CLI `--log-level` + `--log-stderr` flags so Pulse logging surfaces from the terminal
- #23 Vision-Language Model support — `MLXVLM` already in the dependency tree, 16 architectures (Qwen2.5-VL, SmolVLM, Gemma-3, Paligemma, …). Full plan in `.omc/plans/v0.4-vlm-plan.md`.
- v0.5 — LoRA adapter loading (drop in existing HF adapters, no training) + conversation/dataset export
- v0.6 — Speech I/O: WhisperKit for ASR (mic input in chat) + AVSpeechSynthesizer for TTS (play assistant replies)
- #20 Homebrew tap for the CLI (scheduled around v0.3.6–v0.4 once the CLI tarball lands as a release asset)
- #19 Signed + notarized DMG — needs a paid Apple Developer account
- Full native-MLX Whisper in Swift — upstream `mlx-swift-lm` doesn't ship audio models yet; WhisperKit (Core ML) covers the UX in the meantime
- #12 / #13 Subprocess-based engines (SwiftLM, Python mlx-lm) — closed as not planned because App Sandbox blocks spawning external binaries. Reopenable if sandbox policy is revisited or a Swift-native 100B+ MoE inference path appears.
See CONTRIBUTING.md. Issues and PRs welcome.
Apache 2.0 — see LICENSE
- MLX and mlx-swift-lm by Apple
- Swama — Swift inference architecture inspiration
- SwiftLM — 100B+ MoE engine (future integration)
- oMLX — feature depth reference
- Hummingbird — Swift HTTP server
- Sparkle — auto-update framework
- Pulse — logging framework
- SwiftTUI — TUI framework
Full BibTeX citations: CITATIONS.bib