feat(llm): add local MLX LLM provider for fully offline voice input #66

Merged
missuo merged 4 commits into missuo:main from erning:feature/local-mlx-llm on Apr 7, 2026

@erning erning commented Apr 7, 2026

Collaborator

Summary

  • Add local MLX LLM provider for on-device text correction on Apple Silicon, enabling Koe to run fully offline — both ASR and LLM correction — with no network access or API keys required
  • Add mode field to model manifests to distinguish ASR vs LLM models, with mode-based filtering in the Setup Wizard
  • Add MLX as a selectable LLM provider in the Setup Wizard, with a model picker and download/status controls

Motivation

Privacy-conscious users and those in restricted network environments can now use Koe entirely on-device. By pairing local ASR (MLX Qwen3-ASR or Apple Speech) with local LLM correction (MLX Qwen3), all voice data stays on the machine — nothing is sent to any server.

GPU memory for fully offline mode (MLX ASR + MLX LLM):

  • Lightest (ASR 0.6B + LLM 0.6B): ~1.2 GB — runs comfortably on any Apple Silicon Mac
  • Heaviest (ASR 1.7B + LLM 1.7B): ~2.9 GB — still fits easily in 8 GB unified memory

Performance: The 4-bit quantized Qwen3 models generate at hundreds of tokens per second on Apple Silicon via MLX. A typical voice input correction completes in under a second with the 0.6B model.

Changes

Model manifests:

  • Add Qwen3-0.6B-4bit and Qwen3-1.7B-4bit LLM model manifests
  • Add mode field ("asr" / "llm") to all MLX manifests for filtering
  • Add mode to ModelManifest struct and FFI JSON output
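
As a rough illustration of the mode-based filtering described above, here is a minimal Rust sketch. The struct is a simplified, hypothetical stand-in: only the `mode` field and its `"asr"` / `"llm"` values come from this PR, while the `id` field, the `filter_by_mode` helper, and the ASR model id used in the usage note are assumptions made for the example.

```rust
/// Simplified, hypothetical stand-in for the real ModelManifest struct.
/// Only `mode` ("asr" or "llm") is taken from the PR description.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelManifest {
    /// Hypothetical identifier field, used here only for illustration.
    pub id: String,
    /// "asr" or "llm", as described in the PR.
    pub mode: String,
}

/// Return references to the manifests whose `mode` matches exactly.
/// The Setup Wizard's ASR popup would correspond to `mode == "asr"`.
pub fn filter_by_mode<'a>(manifests: &'a [ModelManifest], mode: &str) -> Vec<&'a ModelManifest> {
    manifests.iter().filter(|m| m.mode == mode).collect()
}
```

Given a list holding one `"asr"` manifest and the two new `"llm"` manifests, `filter_by_mode(&all, "asr")` would return only the ASR entry, which is the behavior the ASR pane relies on.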

Setup Wizard:

  • ASR pane: filter model popup by mode == "asr"
  • LLM pane: add provider popup (OpenAI Compatible / MLX) with independent model selection, download, and status controls for MLX

Runtime:

  • Swift MLXLlmManager: model loading, text generation with thinking token stripping, KV cache cleanup
  • Swift C bridge: koe_mlx_llm_generate, koe_mlx_llm_free_string, koe_mlx_llm_unload_model
  • Rust MlxLlmProvider: implements LlmProvider trait via C FFI to Swift, with timeout support
  • LlmProvider trait migrated to #[async_trait] for dynamic dispatch (Box<dyn LlmProvider>)
  • run_session() dispatches to MLX or OpenAI provider based on llm.provider config
  • llm_is_ready() and warmup logic updated for MLX (no HTTP needed)
  • Skip interim history for MLX (small models don't benefit from it, and skipping it reduces inference time)
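
The thinking token stripping mentioned above lives in the Swift `MLXLlmManager`, and its implementation is not shown in this description. The following is only a sketch of the idea, written in Rust for consistency with the other examples here. It assumes the model wraps its reasoning in `<think>...</think>` tags (as Qwen3 does in thinking mode), and `strip_thinking` is a hypothetical name.

```rust
/// Remove every `<think>...</think>` span from `text` and trim the result.
/// Hypothetical helper; the PR's actual stripping is done in Swift.
pub fn strip_thinking(text: &str) -> String {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";

    let mut out = String::with_capacity(text.len());
    let mut rest = text;

    while let Some(start) = rest.find(OPEN) {
        // Keep everything before the opening tag.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(CLOSE) {
            // Skip past the closing tag and continue scanning.
            Some(end) => rest = &after_open[end + CLOSE.len()..],
            // Assumption: an unterminated block means the remainder is all
            // reasoning text, so it is dropped.
            None => rest = "",
        }
    }

    out.push_str(rest);
    out.trim().to_string()
}
```

Dropping the remainder after an unterminated `<think>` is a design choice for this sketch: for dictation, emitting nothing is arguably safer than leaking partial reasoning text into the user's document.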

Config:

  • New fields: llm.provider ("openai" default, or "mlx"), llm.mlx.model
  • Default config template updated
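
To make the provider selection concrete, here is a hedged sketch of dispatching on `llm.provider` into a `Box<dyn LlmProvider>`. It is not the code from this PR. The trait here is synchronous so the example needs only the standard library (the real trait uses `#[async_trait]`), the `name` and `correct` methods are assumed, and `OpenAiStub`, `MlxStub`, and `make_provider` are hypothetical names. Treating a missing `llm.mlx.model` as an error is also an assumption.

```rust
/// Simplified, synchronous stand-in for the LlmProvider trait.
/// Method names are assumptions made for this sketch.
pub trait LlmProvider {
    fn name(&self) -> &'static str;
    fn correct(&self, text: &str) -> Result<String, String>;
}

/// Hypothetical placeholder for the OpenAI-compatible provider.
pub struct OpenAiStub;

/// Hypothetical placeholder for the MLX provider.
pub struct MlxStub {
    pub model: String,
}

impl LlmProvider for OpenAiStub {
    fn name(&self) -> &'static str {
        "openai"
    }
    fn correct(&self, text: &str) -> Result<String, String> {
        // A real provider would call an HTTP API; the stub echoes its input.
        Ok(text.to_string())
    }
}

impl LlmProvider for MlxStub {
    fn name(&self) -> &'static str {
        "mlx"
    }
    fn correct(&self, text: &str) -> Result<String, String> {
        // A real provider would call into Swift over the C bridge.
        Ok(text.to_string())
    }
}

/// Pick a provider from the `llm.provider` and `llm.mlx.model` config values.
/// An absent provider falls back to "openai", matching the documented default.
pub fn make_provider(
    provider: Option<&str>,
    mlx_model: Option<&str>,
) -> Result<Box<dyn LlmProvider>, String> {
    match provider.unwrap_or("openai") {
        "openai" => Ok(Box::new(OpenAiStub)),
        "mlx" => match mlx_model {
            Some(model) => Ok(Box::new(MlxStub { model: model.to_string() })),
            None => Err("llm.mlx.model is required when llm.provider is mlx".to_string()),
        },
        other => Err(format!("unknown llm.provider: {}", other)),
    }
}
```

Returning a boxed trait object lets the caller treat both providers uniformly, which is the reason the PR gives for moving the trait to dynamic dispatch.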

Test plan

  • Build: make build succeeds
  • LLM pane: selecting OpenAI shows API fields; selecting MLX shows model picker
  • ASR pane: only ASR models shown (no LLM models mixed in)
  • Set llm.provider: mlx, download model, dictate — local correction works
  • Set llm.provider: openai — existing cloud LLM flow unchanged
  • Fully offline: set both ASR and LLM to local providers, disable network — works end-to-end

erning added 4 commits April 7, 2026 14:01
- Add `mode` field to ModelManifest struct and FFI JSON output
- Add `mode: "asr"` to existing ASR manifests for explicit filtering
- Register Qwen3 LLM manifests in DEFAULT_MANIFESTS
- Filter ASR model popup to only show mode=="asr" models
- Add LLM provider popup (OpenAI Compatible / MLX) to wizard LLM pane
- Add independent MLX model selection, download, and status controls
- Save/load llm.provider and llm.mlx.model config keys
- Update LLM section to describe both OpenAI and MLX providers
- Add provider/mlx config fields to config examples
- Update architecture description and diagram
- Add MLX LLM models to available models list
- Add fully offline mode section with GPU memory estimates
- Add mode field to manifest example
- Note MLX LLM support in wizard, pipeline, and summary
@missuo missuo merged commit a9d6ef5 into missuo:main Apr 7, 2026
@erning erning deleted the feature/local-mlx-llm branch April 8, 2026 01:41