A lightweight web UI for llama.cpp. Manage local GGUF models, browse and download from HuggingFace, configure launch profiles based on your hardware, and chat with running models.
Built with SvelteKit, Bun, and SQLite.
Scan your local GGUF files, detect your hardware (NVIDIA/AMD/CPU), and auto-generate optimized launch profiles. lmux estimates VRAM for weights + KV cache (including hybrid SSM/attention models like Qwen3.5), maximizes GPU layers, and picks the best context size for your hardware. Browse and download models directly from HuggingFace with VRAM fit indicators per quant variant.
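The KV-cache portion of that VRAM estimate follows from the model's attention geometry. The sketch below is illustrative only (lmux's actual estimator is not shown here, and the function and interface names are hypothetical); it applies the standard formula of two cached tensors (K and V) per layer, sized by KV heads, head dimension, context length, and element width:

```typescript
// Illustrative KV-cache sizing sketch (hypothetical names, not lmux's code).
interface ModelShape {
  nLayers: number;  // transformer layers
  nKvHeads: number; // KV heads (with GQA, fewer than attention heads)
  headDim: number;  // dimension per head
}

function kvCacheBytes(m: ModelShape, ctx: number, bytesPerElem = 2): number {
  // 2 tensors (K and V) per layer, f16 elements by default (2 bytes each)
  return 2 * m.nLayers * m.nKvHeads * m.headDim * ctx * bytesPerElem;
}

// Example: an 8B-class GQA model (32 layers, 8 KV heads, head dim 128)
// at 8192 context needs exactly 1 GiB of f16 KV cache.
const shape: ModelShape = { nLayers: 32, nKvHeads: 8, headDim: 128 };
console.log(kvCacheBytes(shape, 8192) / 1024 ** 3); // 1
```

Weights are estimated separately from the quantized file size; total fit is weights plus KV cache plus a compute-buffer margin.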
Project mode turns your local model into a coding agent with file read/write/edit, search, and sandboxed command execution. A research-backed self-planning pipeline (inspired by Self-Planning and BRAID) generates a step plan before execution:
- Retrieval -- searches the codebase for relevant files based on your request
- Planning -- generates a numbered step plan with no tools (forces thinking before acting)
- Execution -- follows the plan using tools, with a verify-and-repair loop at the end
Commands run inside a Landlock sandbox with an approval flow. Background processes (dev servers, watchers) are tracked with auto-kill. Workspace UI includes file tree, git diff view, and a control panel with sampling sliders tuned to Qwen3.5 coding defaults.
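The three stages above can be sketched as a simple orchestration. Everything here is a hypothetical stand-in for lmux's internals: `retrieve` is a naive keyword match where the real pipeline searches the codebase, and `plan` is stubbed where the real pipeline calls the model with tools disabled:

```typescript
// Hypothetical sketch of the retrieve -> plan -> execute pipeline (not lmux's API).
type Step = string;

// Stage 1: retrieval — find files relevant to the request (naive keyword match).
function retrieve(request: string, files: string[]): string[] {
  const terms = request.toLowerCase().split(/\s+/);
  return files.filter((f) => terms.some((t) => f.toLowerCase().includes(t)));
}

// Stage 2: planning — called with NO tools available, forcing a written plan
// before any action. Stubbed; the real stage prompts the model with context.
function plan(request: string, context: string[]): Step[] {
  return [
    `1. Read ${context.join(", ")}`,
    `2. Apply change: ${request}`,
    `3. Verify the result`,
  ];
}

// Stage 3: execution — follow the plan with tools, then one repair pass.
function execute(steps: Step[], runStep: (s: Step) => boolean): boolean {
  return steps.every(runStep) || steps.every(runStep);
}

const ctx = retrieve("fix config parser", ["src/config.ts", "src/ui.ts"]);
const steps = plan("fix config parser", ctx);
```

The key design point is stage 2: with tools withheld, the model cannot act before committing to a numbered plan, which the execution stage then follows step by step.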
Streaming chat with markdown, KaTeX math, syntax highlighting, and `<think>` block support. Built-in web search (SearXNG) and URL fetch tools. Multi-turn editing, image/vision support, adjustable sampling and reasoning budget, conversation tags, export, and prompt presets.
- HuggingFace search, trending, download with progress/resume
- Per-model system prompts with template variables
- KV cache persistence across sessions
- Server log viewer, model comparison page
- Keyboard shortcuts (Ctrl+N, Escape, etc.)
- Bun (runtime and package manager)
- llama.cpp server binary (`llama-server`)
- Linux (hardware detection reads `/proc/cpuinfo` and `/proc/meminfo`, uses `nvidia-smi`/`rocm-smi`)
- Optional: `landlock-restrict` for sandboxed command execution in the coding agent (`go install github.com/landlock-lsm/go-landlock/cmd/landlock-restrict@latest`)
- Optional: ripgrep (`rg`) for file search in the coding agent
```sh
# Install dependencies
bun install

# Start dev server
bun run dev

# Open http://localhost:5173
```

On first launch, lmux will:
- Create its database at `~/.local/share/lmux/lmux.db`
- Create a models directory at `~/.local/share/lmux/models/`
- Auto-detect your hardware and `llama-server` path
Go to Settings to configure paths and your HuggingFace token, then browse Models to scan local files or Search to find models on HuggingFace.
```sh
bun run dev     # Start dev server
bun run check   # TypeScript type checking
bun run lint    # Prettier + ESLint
bun run format  # Auto-format
bun run test    # Run tests
```

Schema migrations use a synchronous runner built on `bun:sqlite`. SQL files live in `src/lib/server/migrations/` and are applied automatically on startup. To add a migration, create a new numbered file like `012_add_column.sql`.
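The runner's core logic can be sketched as follows. The real implementation is built on `bun:sqlite`; here a minimal `Db` interface stands in for it so the sketch is self-contained, and all names (`applyMigrations`, `appliedNames`) are hypothetical:

```typescript
// Hypothetical sketch of a synchronous numbered-migration runner.
// A minimal Db interface stands in for bun:sqlite here.
interface Db {
  exec(sql: string): void;
  appliedNames(): Set<string>; // names already recorded in a migrations table
}

interface Migration {
  name: string; // e.g. "012_add_column.sql"
  sql: string;
}

function applyMigrations(db: Db, migrations: Migration[]): string[] {
  const done = db.appliedNames();
  const applied: string[] = [];
  // Zero-padded numeric prefixes sort lexicographically, so sorting by
  // name applies migrations in order.
  for (const m of [...migrations].sort((a, b) => a.name.localeCompare(b.name))) {
    if (done.has(m.name)) continue;
    db.exec(m.sql); // a real runner would also record m.name, inside a transaction
    applied.push(m.name);
  }
  return applied;
}
```

Already-applied files are skipped, so the runner is idempotent: running it on every startup only executes migrations newer than the last recorded one.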
- Runtime: Bun (`bun:sqlite`, `Bun.spawn`)
- Framework: SvelteKit with Svelte 5 runes
- Styling: Tailwind CSS v4 with dark theme
- Markdown: marked + DOMPurify + KaTeX
- Syntax Highlighting: highlight.js
- Sandbox: Landlock via go-landlock (optional)
MIT