Open Typeless. Local Typeless. Typeless in your box.
A macOS menu bar voice input tool — hold a hotkey, speak, and your words are recognized, polished, and pasted into the active app. Your voice, straight to text. No cloud. No typing.
The original algorithm and code are based on the generous contribution of @jiamingkong. We just wanted to see if we could build a typeless in ten minutes.
- On-device inference — Powered by Apple Silicon MLX, no cloud dependency, privacy-first
- Speech recognition — Qwen3-ASR-0.6B-4bit, supports Chinese and English
- Text polishing — Qwen3-4B-Instruct, removes filler words, adds punctuation, smart formatting
- Customizable prompts — 4 preset templates (strict/light/meeting/tech) or write your own system prompt
- Selection edit mode — Select text, speak a command ("fix the typo", "make it casual"), and it's done
- Floating status indicator — Always-on-top overlay showing recording/processing state with audio level meter
- Global hotkey — Customizable key recorder, Push-to-Talk / Toggle modes
- Audio device selection — Pick your input device, defaults to built-in microphone
- Auto-paste — Injects text via Accessibility API (Cmd+V simulation)
- Vocabulary learning — Edit polished text in history, system learns corrections for future use
- Idle memory management — Auto-unload models after inactivity, reload on demand
All inference runs on-device via Apple Silicon GPU. No network required after model download.
| Stage | Latency | Notes |
|---|---|---|
| ASR (3-5s audio) | 0.07 - 0.18s | 17-51x faster than real-time |
| LLM polish (short text) | 0.35 - 0.50s | ~30 chars input |
| LLM polish (long text) | 1.1 - 1.2s | ~120 chars input |
| Full pipeline | ~1s | ASR + LLM combined (models warm) |
| ASR model load | 2s | Cold start, one-time |
| LLM model load | 10s | Cold start, one-time — bottleneck |
Memory usage:
| State | RSS |
|---|---|
| ASR model loaded | ~1.6 GB |
| Both models loaded | ~5.4 GB |
Full benchmark details and reproduction steps: docs/BENCHMARK.md
Run
make benchmarkto reproduce on your machine.
- macOS 26.2+
- Apple Silicon (M1/M2/M3/M4)
- Xcode 26.3+
- 16 GB RAM recommended (8 GB works with lightweight model — coming soon)
- ~3 GB disk space (model files)
# Clone the project
git clone https://github.com/platx-ai/Talk.git
cd Talk
# Full setup: resolve dependencies + download models
make setup
# Run
make runmake build # Debug build
make build-release # Release build
make test # Run unit tests (43 tests)
make benchmark # Run performance benchmarks
make run # Build and run
make clean # Clean build artifacts
make resolve # Resolve SPM dependencies only
make download-models # Download ML models from HuggingFace
make setup # Full setup: resolve + download models
make lint # Run SwiftLint (if installed)Record(AVAudioEngine) → ASR(Qwen3-ASR) → LLM Polish(Qwen3-4B) → Text Inject(Cmd+V)
↑ 0.1s 0.5s ↑
CoreAudio Accessibility
Device Selection API Permission
| Module | Responsibility |
|---|---|
Audio/ |
Recording engine, global hotkeys (Carbon API), audio device management, text injection |
ASR/ |
Speech recognition (MLXAudioSTT + Qwen3-ASR) |
LLM/ |
Text polishing (MLXLLM + Qwen3-4B-Instruct) |
Models/ |
Data models (AppSettings, HotKeyCombo, HistoryItem) |
Data/ |
History and vocabulary JSON persistence |
UI/ |
SwiftUI menu bar, settings panel, key recorder, floating indicator, history browser |
Utils/ |
Logging system, Metal runtime validation |
All dependencies managed via Swift Package Manager, pinned to specific commits:
| Package | Source | Purpose |
|---|---|---|
| mlx-swift | ml-explore/mlx-swift | MLX core array operations |
| mlx-swift-lm | ml-explore/mlx-swift-lm | LLM inference framework |
| mlx-audio-swift | platx-ai/mlx-audio-swift (fork) | Audio STT framework |
| swift-huggingface | huggingface/swift-huggingface | Model downloading |
mlx-audio-swift uses the platx-ai fork to fix an upstream bug where MLXAudioCodecs is missing the MLXFast dependency.
| Model | Size | Load Time | Memory | Purpose |
|---|---|---|---|---|
| Qwen3-ASR-0.6B-4bit | ~400 MB | 2s | ~1.6 GB | Speech recognition |
| Qwen3-4B-Instruct-2507-4bit | ~2.5 GB | 10s | ~4 GB | Text polishing |
Models are automatically downloaded from HuggingFace on first run to ~/.cache/huggingface/. Pre-download with make download-models.
Talk learns from your corrections to improve future polishing.
How it works: When you edit polished text in the history view, the system records the mapping (original -> corrected). The top 20 learned corrections are injected into the LLM system prompt as learned corrections, so the model applies them automatically in future polishing.
Usage:
- Automatic learning -- Edit any polished text in the history view. The system learns the correction automatically.
- Manual entry -- Settings -> Advanced -> Personal Vocabulary -> Manage Vocabulary. Add original words and their corrected forms.
- Import/Export -- JSON format. Use Manage Vocabulary to export for backup or import to share across machines.
Example: If ASR repeatedly outputs "la laam" but you correct it to "LLM", the system learns this mapping. Future polishing will automatically correct "la laam" to "LLM" without manual editing.
On first launch, you need to grant:
- Microphone — macOS will prompt automatically
- Accessibility — Manually add Talk.app in System Settings → Privacy & Security → Accessibility
# Open in Xcode
open Talk.xcodeproj
# Set your signing team: Xcode → Signing & Capabilities → Team
# Build & Run: ⌘Rmake test # 43 unit tests (HotKeyCombo, AppSettings, AudioDevice, FloatingIndicator, VocabularyManager)
make benchmark # Performance benchmarks (ASR/LLM load, inference, pipeline, memory)All changes require tests. Bugs require regression tests before fixing. See CLAUDE.md for testing rules.
DEVELOPMENT_TEAM is left empty in the project. Each developer sets their own signing team in Xcode. CLI builds use ad-hoc signing.
See ROADMAP.md for the full product roadmap.
Next up
- Custom lightweight polish model (0.5-1.5B) — < 1s load, < 1 GB memory
- Real-time transcription preview overlay
- Model auto-select by hardware (8 GB → lightweight, 16 GB+ → full)
Mid-term
- Project-aware vocabulary & prompt profiles (per-repo
.talk/config) - iCloud vocabulary sync across devices
- iOS companion app with offline on-device inference
Long-term
- Team shared terminology libraries
- Plugin system for custom post-processing pipelines