perf(asr): cache MLX model across sessions to avoid repeated loads #30

Merged: missuo merged 1 commit into missuo:main from erning:perf/mlx-model-cache on Mar 29, 2026

@erning erning commented Mar 29, 2026

Collaborator

Problem

The MLX model is reloaded from disk on every session's connect() call. For Qwen3-ASR models (0.6B–1.7B parameters), this adds significant latency that dominates the "press and speak" interaction.

Solution

Track the loaded model path in MLXAsrManager. When loadModel is called with the same path that's already loaded, return immediately. The cached path is cleared on unloadModel.
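
The path-tracking idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name `ModelCacheSketch`, the `loadCount` counter, the `NSLock`, the `Bool` return value, and the log strings are all assumptions made for the example, and the real `MLXAsrManager` may have different signatures and concurrency handling.

```swift
import Foundation

// Hypothetical stand-in for MLXAsrManager. Only the names loadModel and
// unloadModel come from the PR description; everything else is assumed.
final class ModelCacheSketch {
    // Path of the model currently held in memory, or nil if none is loaded.
    private var loadedModelPath: String?
    // Counts how many times the expensive load actually ran (for illustration).
    private(set) var loadCount = 0
    // Guards the cached state in case sessions connect from different threads.
    private let lock = NSLock()

    /// Returns true if a fresh load was performed, false if the cached model was reused.
    @discardableResult
    func loadModel(path: String) -> Bool {
        lock.lock()
        defer { lock.unlock() }

        // Same path as the loaded model: skip the load and return immediately.
        if loadedModelPath == path {
            print("reusing cached model at \(path)")
            return false
        }

        // Placeholder for the expensive step of reading weights from disk.
        loadCount += 1
        loadedModelPath = path
        print("loaded model at \(path)")
        return true
    }

    /// Clears the cached path so the next loadModel call performs a fresh load.
    func unloadModel() {
        lock.lock()
        defer { lock.unlock() }
        loadedModelPath = nil
    }
}

// Usage with made-up paths:
let manager = ModelCacheSketch()
manager.loadModel(path: "/models/a")  // fresh load
manager.loadModel(path: "/models/a")  // reused, no load
manager.loadModel(path: "/models/b")  // path changed, fresh load
manager.unloadModel()
manager.loadModel(path: "/models/b")  // cache was cleared, fresh load
```

Comparing paths keeps the check cheap, but it would not notice a model being replaced on disk under the same path; whether that matters depends on how models are updated.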

Files changed

  • Packages/KoeMLX/Sources/KoeMLX/MLXAsrManager.swift

Test plan

  • make build passes
  • First session loads model (log: "model loaded from ...")
  • Subsequent sessions skip loading (log: "model already loaded ... reusing")
  • Changing model path in config triggers a fresh load

The MLX model was reloaded from disk on every session connect, adding
significant latency for Qwen3-ASR models.  Track the loaded model
path and skip loading when the same path is requested again.
@missuo missuo merged commit 5852917 into missuo:main Mar 29, 2026
@erning erning deleted the perf/mlx-model-cache branch March 29, 2026 14:50
