Voxt

A menu bar voice input and translation app for macOS. Press to talk, release to paste.

中文导航

Video

CleanShot.2026-02-28.at.09.16.52.mp4

macos install

Features

Global hotkey voice input from any app.
Two shortcut actions:
- Transcription (normal speech-to-text)
- Translation (speech-to-text then translation)
Two trigger modes: Long Press (Release to End) / Tap (Press to Toggle).
Two STT engines:
- MLX Audio (On-device) with local downloadable models
- Direct Dictation powered by Apple Speech
Two LLM paths:
- Apple Intelligence (Foundation Models)
- Custom LLM (local model)
Translation target languages: English / Chinese (Simplified) / Japanese / Korean / Spanish / French / German.
Live floating overlay: waveform, scrolling partial text, processing animation, completion state.
Smart output option: copy-only when no writable text input is focused.
Clipboard-safe paste flow: restores previous clipboard content.
Local transcription history with pagination, copy, delete, clear-all, and Normal / Translation tags.
Model download manager with progress, cancel, delete, size display, validation, and hf-mirror.com support.
System controls: microphone selection, interaction sounds, launch at login, show in Dock.

Implementation

CGEvent tap listens for global shortcuts (transcription and translation are separate).
AVAudioEngine captures audio and updates live levels.
Voxt picks the STT engine based on settings:
- MLX: staged correction (intermediate + final pass)
- Dictation: streaming SFSpeechRecognizer output
Text pipeline by mode:
- Transcription mode: optional enhancement (Off / Apple Intelligence / Custom LLM)
- Translation mode: optional enhancement first, then translate to target language
Output is injected with clipboard + simulated Cmd+V, and metadata can be saved to history.

Engines

STT Engines

Engine	Description	Strength	Typical Use
MLX Audio	Runs local MLX STT models	Offline, private, model-selectable	Privacy-focused and tunable setup
Direct Dictation	Apple Speech (`SFSpeechRecognizer`)	Zero setup	Fast onboarding without model download

Enhancement / Translation Engines

Engine	Tech Path	Strength	Notes
Apple Intelligence	`FoundationModels`	Native system experience, no extra LLM download	Depends on system availability
Custom LLM	Local `MLXLMCommon` + Hugging Face model	Fully local, customizable prompts	Requires model download first

Models

MLX STT Models

mlx-community/Qwen3-ASR-0.6B-4bit (default): balanced speed and quality, lower memory usage.
mlx-community/Qwen3-ASR-1.7B-bf16: quality-first, higher resource usage.
mlx-community/Voxtral-Mini-4B-Realtime-2602-fp16: realtime-oriented, larger footprint.
mlx-community/parakeet-tdt-0.6b-v3: lightweight and fast, especially good for English.
mlx-community/GLM-ASR-Nano-2512-4bit: smallest footprint for quick drafts.

Custom LLM Models

Qwen/Qwen2-1.5B-Instruct (default): general enhancement/translation with lower resource pressure.
Qwen/Qwen2.5-3B-Instruct: stronger formatting/reasoning with higher resource usage.

Model Comparison (Relative)

Notes: this table is a relative guide based on model positioning and common usage experience, not a fixed cross-device benchmark.

STT Model Comparison

Model	Speed	Accuracy	Resource Usage	Recommended For
Qwen3-ASR 0.6B (4bit)	Medium-High	Medium-High	Low	Daily default
Qwen3-ASR 1.7B (bf16)	Medium	High	High	Quality-first usage
Voxtral Realtime Mini 4B (fp16)	High	Medium-High	High	Realtime feedback priority
Parakeet 0.6B	High	Medium	Low	Fast English input
GLM-ASR Nano (4bit)	High	Medium-Low	Very Low	Low-resource devices / drafts

LLM Model Comparison

Model	Output Quality	Speed	Resource Usage	Recommended For
Qwen2 1.5B Instruct	Medium-High	High	Low-Medium	General enhancement and translation
Qwen2.5 3B Instruct	High	Medium	Medium-High	Better formatting and consistency

Install & Build

Requirements

macOS 26.0+
Microphone permission
Accessibility permission (global hotkeys and simulated paste)
Speech Recognition permission for Direct Dictation

Distribution

Download release directly:
- https://github.com/hehehai/voxt/releases/latest
Install steps:
1. Download and unzip the latest .zip package from the release page.
2. Drag Voxt.app into Applications.
3. Launch Voxt and grant required permissions on first run.
4. If Gatekeeper blocks launch, right-click Voxt.app -> Open.

Local Build

Open Voxt.xcodeproj in Xcode and run.
Or build from terminal:

xcodebuild -project Voxt.xcodeproj -scheme Voxt -destination 'platform=macOS' build

Thanks

mlx-audio-swift
Kaze
Apple Speech / FoundationModels / AppKit / SwiftUI

License

MIT, see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
Voxt.xcodeproj		Voxt.xcodeproj
Voxt		Voxt
updates		updates
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
README.zh-CN.md		README.zh-CN.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Voxt

Video

Features

Implementation

Engines

STT Engines

Enhancement / Translation Engines

Models

MLX STT Models

Custom LLM Models

Model Comparison (Relative)

STT Model Comparison

LLM Model Comparison

Install & Build

Requirements

Distribution

Local Build

Thanks

License

About

Uh oh!

Releases 6

Contributors 3

Languages

License

hehehai/voxt

Folders and files

Latest commit

History

Repository files navigation

Voxt

Video

Features

Implementation

Engines

STT Engines

Enhancement / Translation Engines

Models

MLX STT Models

Custom LLM Models

Model Comparison (Relative)

STT Model Comparison

LLM Model Comparison

Install & Build

Requirements

Distribution

Local Build

Thanks

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 6

Contributors 3

Languages