feat(llm): add local MLX LLM provider for fully offline voice input #66
Merged
- Add `mode` field to `ModelManifest` struct and FFI JSON output
- Add `mode: "asr"` to existing ASR manifests for explicit filtering
- Register Qwen3 LLM manifests in `DEFAULT_MANIFESTS`
- Filter ASR model popup to only show `mode == "asr"` models
- Add LLM provider popup (OpenAI Compatible / MLX) to wizard LLM pane
- Add independent MLX model selection, download, and status controls
- Save/load `llm.provider` and `llm.mlx.model` config keys
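The commit above adds a `mode` string to each manifest and filters the ASR popup on it. A minimal Rust sketch of that filtering and of a manifest's JSON form follows, assuming the manifests live on the Rust side of the FFI boundary and reducing the manifest to two fields. Only `mode` comes from this PR; `id`, the `example-*` entries, `manifests_for_mode`, and `manifest_json` are hypothetical, and the real `ModelManifest` carries more fields.

```rust
/// Hypothetical, trimmed-down manifest: only `mode` is taken from this PR.
#[derive(Debug)]
struct ModelManifest {
    id: &'static str,   // placeholder identifier
    mode: &'static str, // "asr" or "llm"
}

/// Placeholder entries standing in for the real DEFAULT_MANIFESTS.
const DEFAULT_MANIFESTS: &[ModelManifest] = &[
    ModelManifest { id: "example-asr", mode: "asr" },
    ModelManifest { id: "example-llm-small", mode: "llm" },
    ModelManifest { id: "example-llm-large", mode: "llm" },
];

/// Returns the manifests whose `mode` matches, e.g. "asr" for the ASR popup.
fn manifests_for_mode<'a>(all: &'a [ModelManifest], mode: &str) -> Vec<&'a ModelManifest> {
    all.iter().filter(|m| m.mode == mode).collect()
}

/// Naive JSON for the FFI boundary. It does no escaping, so it only suits plain
/// identifiers like the ones above; a real serializer such as serde_json would.
fn manifest_json(m: &ModelManifest) -> String {
    format!(r#"{{"id":"{}","mode":"{}"}}"#, m.id, m.mode)
}
```

With explicit `mode` values on every manifest, the wizard can filter on an exact string match rather than inferring a model's role from its name.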
- Update LLM section to describe both OpenAI and MLX providers
- Add provider/mlx config fields to config examples
- Update architecture description and diagram
- Add MLX LLM models to available models list
- Add fully offline mode section with GPU memory estimates
- Add `mode` field to manifest example
- Note MLX LLM support in wizard, pipeline, and summary
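The `llm.provider` and `llm.mlx.model` keys decide which provider `run_session()` uses, and the `LlmProvider` trait moved to `#[async_trait]` so it can sit behind `Box<dyn LlmProvider>`. `#[async_trait]` rewrites each `async fn` into a method returning `Pin<Box<dyn Future + Send + '_>>`; the Rust sketch below writes that form out by hand so it runs with the standard library alone. The method name `correct`, the placeholder providers, `provider_from_config`, the flattened-key config map, and the rule that `llm.mlx.model` must be set for `"mlx"` are all assumptions, not the PR's code.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Roughly the shape #[async_trait] generates: a boxed, pinned future per call,
// which keeps the trait usable as a trait object.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

trait LlmProvider: Send + Sync {
    // Hypothetical method name and signature.
    fn correct<'a>(&'a self, text: &'a str) -> BoxFuture<'a, Result<String, String>>;
}

// Placeholders: the real providers make an HTTP request or an FFI call into Swift.
struct OpenAiProvider;
struct MlxProvider {
    model: String,
}

impl LlmProvider for OpenAiProvider {
    fn correct<'a>(&'a self, text: &'a str) -> BoxFuture<'a, Result<String, String>> {
        Box::pin(async move { Ok::<String, String>(format!("openai: {text}")) })
    }
}

impl LlmProvider for MlxProvider {
    fn correct<'a>(&'a self, text: &'a str) -> BoxFuture<'a, Result<String, String>> {
        Box::pin(async move { Ok::<String, String>(format!("mlx({}): {text}", self.model)) })
    }
}

// Picks a provider from flattened config keys; `llm.provider` falls back to "openai".
fn provider_from_config(cfg: &HashMap<&str, &str>) -> Result<Box<dyn LlmProvider>, String> {
    match cfg.get("llm.provider").copied().unwrap_or("openai") {
        "openai" => Ok(Box::new(OpenAiProvider)),
        "mlx" => {
            // Assumption: no default MLX model, so the key must be present.
            let model = cfg
                .get("llm.mlx.model")
                .ok_or("llm.mlx.model must be set when llm.provider is \"mlx\"")?;
            Ok(Box::new(MlxProvider { model: model.to_string() }))
        }
        other => Err(format!("unknown llm.provider: {other}")),
    }
}

// Minimal executor so the sketch runs without an async runtime such as tokio.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now(); // spin; acceptable for these always-ready futures
    }
}
```

Because callers only see `Box<dyn LlmProvider>`, the session code stays identical for both backends; only the selection step reads `llm.provider`.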
Summary
- Add a `mode` field to model manifests to distinguish ASR models from LLM models, with mode-based filtering in the Setup Wizard

Motivation
Privacy-conscious users and those in restricted network environments can now use Koe entirely on-device. By pairing local ASR (MLX Qwen3-ASR or Apple Speech) with local LLM correction (MLX Qwen3), all voice data stays on the machine — nothing is sent to any server.
GPU memory for fully offline mode (MLX ASR + MLX LLM):
Performance: The 4-bit quantized Qwen3 models generate at hundreds of tokens per second on Apple Silicon via MLX. A typical voice input correction completes in under a second with the 0.6B model.
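The weight memory of a 4-bit model can be approximated from its parameter count. The Rust sketch below assumes MLX-style affine group quantization, where each group of weights also stores one scale and one bias in the original 16-bit dtype (MLX's `quantize` defaults to 4 bits with groups of 64). It covers weights only and is not the method behind the estimates in this PR; `quantized_weight_bytes` is a hypothetical helper.

```rust
/// Rough weight-only memory estimate for a group-quantized model.
/// Assumes one scale and one bias per group, each stored as a 16-bit float.
/// KV cache, activations, and any unquantized tensors are not included.
fn quantized_weight_bytes(params: f64, bits: f64, group_size: f64) -> f64 {
    let overhead_bits = 2.0 * 16.0 / group_size; // scale + bias, amortized per weight
    params * (bits + overhead_bits) / 8.0
}
```

For a 0.6B-parameter model at 4 bits with groups of 64, this gives 0.6e9 × 4.5 / 8 ≈ 0.34 GB of weights, before the KV cache, activations, and the ASR model are added.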
Changes
Model manifests:
- Add `mode` field (`"asr"` / `"llm"`) to all MLX manifests for filtering
- Add `mode` to `ModelManifest` struct and FFI JSON output

Setup Wizard:
- Filter the ASR model popup to `mode == "asr"` models

Runtime:
- `MLXLlmManager`: model loading, text generation with thinking-token stripping, KV cache cleanup
- C FFI functions including `koe_mlx_llm_generate`, `koe_mlx_llm_free_string`, `koe_mlx_llm_unload_model`
- `MlxLlmProvider`: implements the `LlmProvider` trait via C FFI to Swift, with timeout support
- `LlmProvider` trait migrated to `#[async_trait]` for dynamic dispatch (`Box<dyn LlmProvider>`)
- `run_session()` dispatches to the MLX or OpenAI provider based on the `llm.provider` config
- `llm_is_ready()` and warmup logic updated for MLX (no HTTP needed)

Config:
- `llm.provider` (`"openai"` default, or `"mlx"`), `llm.mlx.model`

Test plan
- `make build` succeeds
- With `llm.provider: mlx`, download a model and dictate: local correction works
- With `llm.provider: openai`: existing cloud LLM flow unchanged
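The Runtime changes describe `MlxLlmProvider` calling into Swift over C FFI with timeout support, plus a `koe_mlx_llm_free_string` export, which suggests generated strings are allocated on the Swift side and must be handed back to it for freeing. A minimal Rust sketch of that ownership-plus-timeout pattern follows. The function-pointer signatures, `generate_with_timeout`, `LlmError`, and the `mock_generate` / `slow_generate` / `mock_free` stand-ins are hypothetical; the PR's actual FFI signatures are not shown here.

```rust
use std::ffi::{c_char, CStr, CString};
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

// Assumed C ABI shapes for a generate/free pair; the real exports may differ.
type GenerateFn = unsafe extern "C" fn(*const c_char) -> *mut c_char;
type FreeFn = unsafe extern "C" fn(*mut c_char);

#[derive(Debug, PartialEq)]
enum LlmError {
    InteriorNul,
    NullResult,
    Timeout,
    WorkerGone,
}

/// Runs the blocking FFI call on a worker thread and waits at most `timeout`.
/// On timeout the worker still runs to completion; only the caller stops waiting.
fn generate_with_timeout(
    generate: GenerateFn,
    free: FreeFn,
    prompt: &str,
    timeout: Duration,
) -> Result<String, LlmError> {
    let c_prompt = CString::new(prompt).map_err(|_| LlmError::InteriorNul)?;
    let (tx, rx) = mpsc::channel::<Result<String, LlmError>>();
    thread::spawn(move || {
        // SAFETY: `c_prompt` outlives the call, and `raw` is freed exactly once
        // with the matching `free` after its bytes are copied into a String.
        let result = unsafe {
            let raw = generate(c_prompt.as_ptr());
            if raw.is_null() {
                Err(LlmError::NullResult)
            } else {
                let text = CStr::from_ptr(raw).to_string_lossy().into_owned();
                free(raw);
                Ok(text)
            }
        };
        let _ = tx.send(result); // the receiver may already have timed out
    });
    match rx.recv_timeout(timeout) {
        Ok(result) => result,
        Err(RecvTimeoutError::Timeout) => Err(LlmError::Timeout),
        Err(RecvTimeoutError::Disconnected) => Err(LlmError::WorkerGone),
    }
}

// Stand-ins for the Swift side; they allocate with CString so mock_free can release it.
unsafe extern "C" fn mock_generate(prompt: *const c_char) -> *mut c_char {
    let input = unsafe { CStr::from_ptr(prompt) }.to_string_lossy();
    CString::new(format!("{} (corrected)", input.trim())).unwrap().into_raw()
}

unsafe extern "C" fn slow_generate(prompt: *const c_char) -> *mut c_char {
    thread::sleep(Duration::from_millis(200));
    unsafe { mock_generate(prompt) }
}

unsafe extern "C" fn mock_free(s: *mut c_char) {
    if !s.is_null() {
        drop(unsafe { CString::from_raw(s) });
    }
}
```

A worker thread with `recv_timeout` bounds how long dictation waits, but it cannot cancel MLX generation itself; the abandoned call finishes in the background. How the PR handles a timed-out generation is not stated here.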