feat(llm): add local MLX LLM provider for fully offline voice input by erning · Pull Request #66 · missuo/koe

erning · 2026-04-07T06:39:56Z

Summary

Add local MLX LLM provider for on-device text correction on Apple Silicon, enabling Koe to run fully offline — both ASR and LLM correction — with no network access or API keys required
Add mode field to model manifests to distinguish ASR vs LLM models, with mode-based filtering in the Setup Wizard
Add MLX as a selectable LLM provider in the Setup Wizard, with a model picker and download/status controls

Motivation

Privacy-conscious users and those in restricted network environments can now use Koe entirely on-device. By pairing local ASR (MLX Qwen3-ASR or Apple Speech) with local LLM correction (MLX Qwen3), all voice data stays on the machine — nothing is sent to any server.

GPU memory for fully offline mode (MLX ASR + MLX LLM):

Lightest (ASR 0.6B + LLM 0.6B): ~1.2 GB — runs comfortably on any Apple Silicon Mac
Heaviest (ASR 1.7B + LLM 1.7B): ~2.9 GB — still fits easily in 8 GB unified memory

Performance: The 4-bit quantized Qwen3 models generate at hundreds of tokens per second on Apple Silicon via MLX. A typical voice input correction completes in under a second with the 0.6B model.

Changes

Model manifests:

Add Qwen3-0.6B-4bit and Qwen3-1.7B-4bit LLM model manifests
Add mode field ("asr" / "llm") to all MLX manifests for filtering
Add mode to ModelManifest struct and FFI JSON output
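The mode-based filtering described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the struct is trimmed to two fields, the `id` field name, the `asr-example` id used in the usage note, the `filter_by_mode` helper, and the hand-rolled `to_json` are all assumptions made to keep the sketch free of external crates.

```rust
/// Hypothetical, trimmed-down manifest. The real ModelManifest has more
/// fields; only `mode` is confirmed by the PR description.
#[derive(Debug, Clone, PartialEq)]
struct ModelManifest {
    id: String,   // assumed field name, for illustration only
    mode: String, // "asr" or "llm"
}

impl ModelManifest {
    fn new(id: &str, mode: &str) -> Self {
        Self { id: id.to_string(), mode: mode.to_string() }
    }

    /// Hand-rolled JSON so the sketch needs no external crates.
    /// Assumes `id` and `mode` contain no characters that need escaping.
    fn to_json(&self) -> String {
        format!(r#"{{"id":"{}","mode":"{}"}}"#, self.id, self.mode)
    }
}

/// Return only the manifests whose mode matches, preserving their order.
fn filter_by_mode<'a>(manifests: &'a [ModelManifest], mode: &str) -> Vec<&'a ModelManifest> {
    manifests.iter().filter(|m| m.mode == mode).collect()
}
```

Given a list holding one ASR manifest (say `asr-example`) plus `Qwen3-0.6B-4bit` and `Qwen3-1.7B-4bit`, `filter_by_mode(&manifests, "llm")` returns only the two Qwen3 entries, which is the behavior an LLM model picker would rely on.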

Setup Wizard:

ASR pane: filter model popup by mode == "asr"
LLM pane: add provider popup (OpenAI Compatible / MLX) with independent model selection, download, and status controls for MLX

Runtime:

Swift MLXLlmManager: model loading, text generation with thinking token stripping, KV cache cleanup
Swift C bridge: koe_mlx_llm_generate, koe_mlx_llm_free_string, koe_mlx_llm_unload_model
Rust MlxLlmProvider: implements LlmProvider trait via C FFI to Swift, with timeout support
LlmProvider trait migrated to #[async_trait] for dynamic dispatch (Box<dyn LlmProvider>)
run_session() dispatches to MLX or OpenAI provider based on llm.provider config
llm_is_ready() and warmup logic updated for MLX (no HTTP needed)
Skip interim history for MLX: small models don't benefit from it, and omitting it reduces inference time
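As an illustration of the thinking token stripping mentioned in the runtime list, here is a minimal sketch. The PR does this in Swift inside MLXLlmManager; the Rust version below only shows the idea. It assumes the model wraps its reasoning in `<think>...</think>` tags, as Qwen3 does, and the function name `strip_thinking` and the handling of an unterminated block are assumptions, not taken from the PR.

```rust
/// Remove every `<think>...</think>` span from `text` and trim the result.
/// Assumption: an unterminated `<think>` drops everything after it, so a
/// truncated reasoning block never leaks into the corrected text.
fn strip_thinking(text: &str) -> String {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";

    let mut out = String::with_capacity(text.len());
    let mut rest = text;

    while let Some(start) = rest.find(OPEN) {
        // Keep everything before the opening tag.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(CLOSE) {
            // Resume scanning right after the closing tag.
            Some(end) => rest = &after_open[end + CLOSE.len()..],
            // No closing tag: discard the remainder.
            None => {
                rest = "";
                break;
            }
        }
    }

    out.push_str(rest);
    out.trim().to_string()
}
```

Text without any thinking block passes through unchanged apart from surrounding whitespace being trimmed.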

Config:

New fields: llm.provider ("openai" default, or "mlx"), llm.mlx.model
Default config template updated
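The way these config fields could drive provider selection can be sketched as below. This is a simplified, synchronous stand-in: the PR's trait uses #[async_trait], and the `LlmConfig` struct, the `build_provider` function, the `name` and `correct` methods, and the rule that llm.mlx.model must be set when the provider is "mlx" are all assumptions for illustration.

```rust
/// Simplified, synchronous stand-in for the LlmProvider trait.
trait LlmProvider {
    fn name(&self) -> &'static str;
    fn correct(&self, text: &str) -> Result<String, String>;
}

struct OpenAiProvider;

struct MlxLlmProvider {
    model: String,
}

impl LlmProvider for OpenAiProvider {
    fn name(&self) -> &'static str {
        "openai"
    }
    fn correct(&self, text: &str) -> Result<String, String> {
        // Placeholder: a real provider would send an HTTP request here.
        Ok(text.to_string())
    }
}

impl LlmProvider for MlxLlmProvider {
    fn name(&self) -> &'static str {
        "mlx"
    }
    fn correct(&self, text: &str) -> Result<String, String> {
        // Placeholder: a real provider would call into Swift over C FFI,
        // passing the configured model along with the text.
        let _ = &self.model;
        Ok(text.to_string())
    }
}

/// Hypothetical config shape mirroring llm.provider and llm.mlx.model.
struct LlmConfig {
    provider: Option<String>,
    mlx_model: Option<String>,
}

/// Pick a provider from config; a missing llm.provider falls back to "openai".
fn build_provider(cfg: &LlmConfig) -> Result<Box<dyn LlmProvider>, String> {
    match cfg.provider.as_deref().unwrap_or("openai") {
        "openai" => Ok(Box::new(OpenAiProvider)),
        "mlx" => {
            // Assumption: the MLX provider cannot start without a model name.
            let model = cfg
                .mlx_model
                .clone()
                .ok_or_else(|| "llm.mlx.model is required for the mlx provider".to_string())?;
            Ok(Box::new(MlxLlmProvider { model }))
        }
        other => Err(format!("unknown llm.provider: {}", other)),
    }
}
```

Returning `Box<dyn LlmProvider>` keeps the call site independent of the concrete provider, which matches the dynamic dispatch the PR introduces.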

Test plan

Build: make build succeeds
LLM pane: selecting OpenAI shows API fields; selecting MLX shows model picker
ASR pane: only ASR models shown (no LLM models mixed in)
Set llm.provider: mlx, download model, dictate — local correction works
Set llm.provider: openai — existing cloud LLM flow unchanged
Fully offline: set both ASR and LLM to local providers, disable network — works end-to-end

- Add `mode` field to ModelManifest struct and FFI JSON output
- Add `mode: "asr"` to existing ASR manifests for explicit filtering
- Register Qwen3 LLM manifests in DEFAULT_MANIFESTS
- Filter ASR model popup to only show mode=="asr" models
- Add LLM provider popup (OpenAI Compatible / MLX) to wizard LLM pane
- Add independent MLX model selection, download, and status controls
- Save/load llm.provider and llm.mlx.model config keys

- Update LLM section to describe both OpenAI and MLX providers
- Add provider/mlx config fields to config examples
- Update architecture description and diagram
- Add MLX LLM models to available models list
- Add fully offline mode section with GPU memory estimates
- Add mode field to manifest example
- Note MLX LLM support in wizard, pipeline, and summary

erning added 4 commits April 7, 2026 14:01

feat(mlx): add Qwen3 0.6B and 1.7B 4-bit LLM model manifests

10bd1fd

feat(llm): add local MLX LLM provider for offline ASR text correction

6fc5372

missuo merged commit a9d6ef5 into missuo:main Apr 7, 2026

erning deleted the feature/local-mlx-llm branch April 8, 2026 01:41
