A menu bar voice input and translation app for macOS. Press to talk, release to paste.
- Global hotkey voice input from any app.
- Two shortcut actions:
  - `Transcription` (normal speech-to-text)
  - `Translation` (speech-to-text, then translation)
- Two trigger modes:
  - `Long Press` (release to end)
  - `Tap` (press to toggle)
- Two STT engines:
  - `MLX Audio` (on-device) with locally downloadable models
  - `Direct Dictation` powered by Apple Speech
- Two LLM paths:
  - `Apple Intelligence` (Foundation Models)
  - `Custom LLM` (local model)
- Translation target languages: English / Chinese (Simplified) / Japanese / Korean / Spanish / French / German.
- Live floating overlay: waveform, scrolling partial text, processing animation, completion state.
- Smart output option: copy-only when no writable text input is focused.
- Clipboard-safe paste flow: restores previous clipboard content.
- Local transcription history with pagination, copy, delete, clear-all, and `Normal` / `Translation` tags.
- Model download manager with progress, cancel, delete, size display, validation, and `hf-mirror.com` support.
- System controls: microphone selection, interaction sounds, launch at login, show in Dock.
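The clipboard-safe paste flow described above can be sketched as follows. This is a minimal illustration with assumed names and timing, not Voxt's actual implementation: save the current pasteboard string, write the new text, post a synthetic `Cmd+V`, then restore the old contents shortly after.

```swift
import AppKit
import CoreGraphics

// Hypothetical sketch of a clipboard-safe paste (not Voxt's real code).
func pasteRestoringClipboard(_ text: String) {
    let pasteboard = NSPasteboard.general
    // Save whatever string the user currently has on the clipboard.
    let previous = pasteboard.string(forType: .string)

    pasteboard.clearContents()
    pasteboard.setString(text, forType: .string)

    // Simulate Cmd+V (virtual keycode 9 is 'V' on ANSI keyboard layouts).
    let source = CGEventSource(stateID: .combinedSessionState)
    let keyDown = CGEvent(keyboardEventSource: source, virtualKey: 9, keyDown: true)
    keyDown?.flags = .maskCommand
    let keyUp = CGEvent(keyboardEventSource: source, virtualKey: 9, keyDown: false)
    keyUp?.flags = .maskCommand
    keyDown?.post(tap: .cghidEventTap)
    keyUp?.post(tap: .cghidEventTap)

    // Restore the old clipboard after the paste event has been delivered.
    DispatchQueue.main.asyncAfter(deadline: .now() + 0.2) {
        pasteboard.clearContents()
        if let previous { pasteboard.setString(previous, forType: .string) }
    }
}
```

Posting the key events requires the Accessibility permission listed under Requirements.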
- A `CGEvent` tap listens for global shortcuts (transcription and translation are separate).
- `AVAudioEngine` captures audio and updates live input levels.
- Voxt picks the STT engine based on settings:
  - MLX: staged correction (intermediate + final pass)
  - Dictation: streaming `SFSpeechRecognizer` output
- Text pipeline by mode:
  - Transcription mode: optional enhancement (Off / Apple Intelligence / Custom LLM)
  - Translation mode: optional enhancement first, then translation to the target language
- Output is injected via the clipboard plus a simulated `Cmd+V`, and metadata can be saved to history.
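The mode-based pipeline above can be sketched as a pure function. Type and parameter names here are hypothetical, not Voxt's actual API; enhancement and translation are passed in as closures so the two modes share one path.

```swift
import Foundation

// Illustrative sketch of the mode-based text pipeline (hypothetical names).
enum Mode { case transcription, translation }
enum Enhancement { case off, appleIntelligence, customLLM }

func process(_ raw: String,
             mode: Mode,
             enhancement: Enhancement,
             enhance: (String) -> String,
             translate: (String) -> String) -> String {
    // Optional enhancement runs first in both modes.
    let enhanced = enhancement == .off ? raw : enhance(raw)
    // Translation mode then converts the result to the target language.
    switch mode {
    case .transcription: return enhanced
    case .translation:   return translate(enhanced)
    }
}
```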
| Engine | Description | Strength | Typical Use |
|---|---|---|---|
| MLX Audio | Runs local MLX STT models | Offline, private, model-selectable | Privacy-focused and tunable setup |
| Direct Dictation | Apple Speech (SFSpeechRecognizer) | Zero setup | Fast onboarding without a model download |
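For the Direct Dictation path, streaming partial results with `SFSpeechRecognizer` might look roughly like this. This is a sketch under assumptions (permission prompts, error handling, and stop logic omitted), not Voxt's actual code.

```swift
import Speech
import AVFoundation

// Minimal streaming-dictation sketch (illustrative, not Voxt's code).
final class DictationSketch {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private let audioEngine = AVAudioEngine()
    private let request = SFSpeechAudioBufferRecognitionRequest()

    func start(onPartial: @escaping (String) -> Void) throws {
        request.shouldReportPartialResults = true

        // Feed microphone buffers from AVAudioEngine into the request.
        let input = audioEngine.inputNode
        let format = input.outputFormat(forBus: 0)
        input.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            self.request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()

        // Each callback delivers the best transcription so far.
        _ = recognizer?.recognitionTask(with: request) { result, _ in
            if let result { onPartial(result.bestTranscription.formattedString) }
        }
    }
}
```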
| Engine | Tech Path | Strength | Notes |
|---|---|---|---|
| Apple Intelligence | FoundationModels | Native system experience, no extra LLM download | Depends on system availability |
| Custom LLM | Local MLXLMCommon + Hugging Face model | Fully local, customizable prompts | Requires model download first |
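The Apple Intelligence path uses the Foundation Models framework. A rough sketch is below; the function name and prompt wording are illustrative assumptions, and availability checks are omitted.

```swift
import FoundationModels

// Illustrative sketch of an on-device translation call via the
// Foundation Models framework (macOS 26+). Not Voxt's actual code.
@available(macOS 26.0, *)
func translateOnDevice(_ text: String, to language: String) async throws -> String {
    let session = LanguageModelSession()
    let response = try await session.respond(
        to: "Translate the following text into \(language):\n\(text)")
    return response.content
}
```

In practice the app would first check the system model's availability, which is why the table notes "Depends on system availability".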
- `mlx-community/Qwen3-ASR-0.6B-4bit` (default): balanced speed and quality, lower memory usage.
- `mlx-community/Qwen3-ASR-1.7B-bf16`: quality-first, higher resource usage.
- `mlx-community/Voxtral-Mini-4B-Realtime-2602-fp16`: realtime-oriented, larger footprint.
- `mlx-community/parakeet-tdt-0.6b-v3`: lightweight and fast, especially good for English.
- `mlx-community/GLM-ASR-Nano-2512-4bit`: smallest footprint for quick drafts.
- `Qwen/Qwen2-1.5B-Instruct` (default): general enhancement/translation with lower resource pressure.
- `Qwen/Qwen2.5-3B-Instruct`: stronger formatting/reasoning with higher resource usage.
Note: these tables are a relative guide based on model positioning and common usage experience, not a fixed cross-device benchmark.
| Model | Speed | Accuracy | Resource Usage | Recommended For |
|---|---|---|---|---|
| Qwen3-ASR 0.6B (4bit) | Medium-High | Medium-High | Low | Daily default |
| Qwen3-ASR 1.7B (bf16) | Medium | High | High | Quality-first usage |
| Voxtral Realtime Mini 4B (fp16) | High | Medium-High | High | Realtime feedback priority |
| Parakeet 0.6B | High | Medium | Low | Fast English input |
| GLM-ASR Nano (4bit) | High | Medium-Low | Very Low | Low-resource devices / drafts |
| Model | Output Quality | Speed | Resource Usage | Recommended For |
|---|---|---|---|---|
| Qwen2 1.5B Instruct | Medium-High | High | Low-Medium | General enhancement and translation |
| Qwen2.5 3B Instruct | High | Medium | Medium-High | Better formatting and consistency |
- macOS 26.0+
- Microphone permission
- Accessibility permission (global hotkeys and simulated paste)
- Speech Recognition permission for `Direct Dictation`
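The three runtime permissions can be checked with standard system APIs. A sketch (the function name is illustrative, and real prompting flows are more involved):

```swift
import AVFoundation
import ApplicationServices
import Speech

// Illustrative permission checks for the requirements above.
func checkPermissions() {
    // Microphone access (prompts the user on first call).
    AVCaptureDevice.requestAccess(for: .audio) { granted in
        print("Microphone granted:", granted)
    }

    // Accessibility trust, needed for the global event tap and simulated Cmd+V.
    print("Accessibility trusted:", AXIsProcessTrusted())

    // Speech recognition authorization, needed for Direct Dictation.
    SFSpeechRecognizer.requestAuthorization { status in
        print("Speech authorized:", status == .authorized)
    }
}
```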
- Download release directly:
- Install steps:
  1. Download and unzip the latest `.zip` package from the release page.
  2. Drag `Voxt.app` into `Applications`.
  3. Launch `Voxt` and grant the required permissions on first run.
  4. If Gatekeeper blocks the launch, right-click `Voxt.app` -> `Open`.
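If right-clicking does not clear the Gatekeeper block, removing the quarantine attribute from the terminal is a common alternative (this assumes the app was copied to the default `Applications` folder):

```shell
xattr -d com.apple.quarantine /Applications/Voxt.app
```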
- Open `Voxt.xcodeproj` in Xcode and run.
- Or build from the terminal:

  ```shell
  xcodebuild -project Voxt.xcodeproj -scheme Voxt -destination 'platform=macOS' build
  ```

- mlx-audio-swift
- Kaze
- Apple `Speech` / `FoundationModels` / AppKit / SwiftUI
MIT, see LICENSE.