A macOS menubar application that provides system-wide voice-to-text transcription using on-device Whisper models powered by WhisperKit.
- Privacy-first: All transcription runs on-device using CoreML; network access only for model downloads
- Low friction: Single hotkey to toggle (default: Ctrl+V), minimal UI, instant output
- Intelligent output: Punctuation inferred from speech timing and patterns, noise filtering removes artifacts
- Smart model management: Starts fast with a lightweight model, automatically upgrades to higher quality in the background
- Performance monitoring: Real-time factor tracking with suggestions when system is under load
- Unobtrusive: Small overlay, menubar-only presence, no dock icon
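
The real-time factor mentioned above is conventionally the processing time divided by the duration of the audio, so values below 1.0 mean transcription runs faster than real time. The following is a minimal sketch of how such tracking could work, not Voicey's actual implementation: `LoadMonitor`, its window size, and its threshold of 1.0 are hypothetical names and values chosen for illustration.

```swift
import Foundation

/// Real-time factor: processing time divided by audio duration.
/// Returns nil for empty audio, where the ratio is undefined.
func realTimeFactor(processingSeconds: Double, audioSeconds: Double) -> Double? {
    guard audioSeconds > 0 else { return nil }
    return processingSeconds / audioSeconds
}

/// Hypothetical rolling monitor (an assumption, not Voicey's API).
/// Keeps the most recent samples and flags sustained load.
struct LoadMonitor {
    private(set) var samples: [Double] = []
    let windowSize: Int
    let threshold: Double

    init(windowSize: Int = 5, threshold: Double = 1.0) {
        self.windowSize = windowSize
        self.threshold = threshold
    }

    /// Adds a sample and drops the oldest one once the window is full.
    mutating func record(_ rtf: Double) {
        samples.append(rtf)
        if samples.count > windowSize {
            samples.removeFirst()
        }
    }

    /// Mean of the samples currently in the window, or nil if empty.
    var average: Double? {
        samples.isEmpty ? nil : samples.reduce(0, +) / Double(samples.count)
    }

    /// True when the rolling average exceeds the threshold.
    var shouldSuggestSmallerModel: Bool {
        guard let avg = average else { return false }
        return avg > threshold
    }
}
```

Averaging over a short window rather than reacting to a single slow transcription avoids showing a suggestion after one transient spike.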
- macOS 13.0 (Ventura) or later
- Apple Silicon (M1+) required for WhisperKit CoreML acceleration
- Microphone access permission
- Accessibility permission (for global hotkey and paste functionality)
- Xcode 15.0 or later
- Swift 5.9 or later
cd voicey
# Build debug version
make build
# Build release version and create app bundle
make bundle
# Sign and install to /Applications
make install
# Build with Swift Package Manager
swift build -c release
# The binary will be in .build/release/Voicey
# Generate Xcode project
make xcode
# Open in Xcode
open Voicey.xcodeproj
- Launch Voicey from Applications or build output
- Grant microphone permission when prompted
- Grant accessibility permission in System Settings > Privacy & Security > Accessibility
- Download a transcription model (Large v3 Turbo recommended for best speed/accuracy balance)
- Press Ctrl+V (or your custom hotkey) to start recording
- Speak naturally; the app will detect punctuation from your speech timing
- Press the hotkey again to stop and transcribe
- Press ESC to cancel without transcribing
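
The hotkey behavior in the steps above can be read as a small state machine. The sketch below is one way to model it, under the assumption that there are three states and that ESC only has an effect while recording; `RecorderState`, `RecorderEvent`, and `nextState` are hypothetical names, not taken from the Voicey source.

```swift
/// Hypothetical states for the record/transcribe cycle.
enum RecorderState {
    case idle, recording, transcribing
}

/// Hypothetical inputs that drive the cycle.
enum RecorderEvent {
    case hotkey, escape, transcriptionFinished
}

/// Pure transition function: returns the state that follows `state` on `event`.
/// Events that have no meaning in the current state leave it unchanged.
func nextState(_ state: RecorderState, on event: RecorderEvent) -> RecorderState {
    switch (state, event) {
    case (.idle, .hotkey):
        return .recording            // first press starts recording
    case (.recording, .hotkey):
        return .transcribing         // second press stops and transcribes
    case (.recording, .escape):
        return .idle                 // ESC cancels without transcribing
    case (.transcribing, .transcriptionFinished):
        return .idle                 // output delivered, ready for the next press
    default:
        return state
    }
}
```

Keeping the transition logic in a pure function makes the toggle behavior easy to unit test without a microphone or a registered global hotkey.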
Access settings from the menubar icon:
- General: Output mode, launch at login, dock icon visibility
- Hotkey: Customize the recording hotkey
- Audio: Select input device, test microphone
- Model: Download/manage Whisper models
- Voice Commands: Enable optional voice commands (new line, new paragraph, scratch that)
| Model | Disk Size | Memory | Notes |
|---|---|---|---|
| Large v3 Turbo | ~1.5GB | ~3GB | Recommended - Fast & accurate, 8x faster than Large |
| Large v3 | ~3GB | ~6GB | Maximum accuracy, slower |
| Distil Large v3 | ~800MB | ~2GB | Distilled model, good balance |
| Small (English) | ~250MB | ~600MB | Fast, English-only |
| Base (English) | ~80MB | ~200MB | Very fast, basic accuracy |
| Tiny (English) | ~40MB | ~100MB | Fastest, lowest accuracy |
Note: First load of each model requires CoreML compilation (1-3 minutes). Subsequent loads are instant.
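
To illustrate the "start light, upgrade in the background" idea using the figures in the table, here is a hedged sketch that starts with the smallest model and picks the largest one that fits a memory budget as the upgrade target. The `ModelOption` type, both functions, and the use of memory footprint as a stand-in for quality are assumptions for illustration; Voicey's real selection logic may differ (for example, it recommends Large v3 Turbo over Large v3).

```swift
/// Hypothetical description of a model, using the approximate memory
/// figures from the table above (1 GB treated as 1000 MB).
struct ModelOption {
    let name: String
    let approxMemoryMB: Int
}

let modelTable: [ModelOption] = [
    ModelOption(name: "Large v3 Turbo", approxMemoryMB: 3000),
    ModelOption(name: "Large v3", approxMemoryMB: 6000),
    ModelOption(name: "Distil Large v3", approxMemoryMB: 2000),
    ModelOption(name: "Small (English)", approxMemoryMB: 600),
    ModelOption(name: "Base (English)", approxMemoryMB: 200),
    ModelOption(name: "Tiny (English)", approxMemoryMB: 100),
]

/// The model with the smallest memory footprint, used for a fast first start.
func startupModel(from options: [ModelOption]) -> ModelOption? {
    options.min { $0.approxMemoryMB < $1.approxMemoryMB }
}

/// The largest model whose approximate memory use fits within the budget,
/// or nil if none fits. Assumes a larger footprint means higher quality.
func upgradeTarget(from options: [ModelOption], memoryBudgetMB: Int) -> ModelOption? {
    options
        .filter { $0.approxMemoryMB <= memoryBudgetMB }
        .max { $0.approxMemoryMB < $1.approxMemoryMB }
}
```

With a 4000 MB budget this sketch selects Large v3 Turbo as the upgrade target, since Large v3 does not fit.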
Voicey includes intelligent post-processing:
- Noise filtering: Removes Whisper artifacts like [music], *click*, breathing sounds, etc.
- Intelligent punctuation: Adds periods, commas, and question marks based on speech timing and patterns
- Text expansions: Converts common phrases (e.g., "etcetera" → "etc.", "mister" → "Mr.")
- Voice commands (optional): "new line", "new paragraph", "scratch that"
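
The post-processing steps listed above could be approximated with a few regular-expression passes. This is a sketch under stated assumptions, not Voicey's actual pipeline: the function names, the regex patterns, and the order of the passes are all hypothetical. Only the two expansions named above are included, and "scratch that" is left out because its exact behavior is not described here.

```swift
import Foundation

/// Removes bracketed or asterisk-wrapped annotations such as "[music]"
/// or "*click*", then collapses the whitespace left behind.
func stripArtifacts(_ text: String) -> String {
    let withoutTags = text.replacingOccurrences(
        of: #"\[[^\]]*\]|\*[^*]*\*"#,
        with: "",
        options: .regularExpression)
    return withoutTags
        .replacingOccurrences(of: #"\s+"#, with: " ", options: .regularExpression)
        .trimmingCharacters(in: .whitespaces)
}

/// Spoken-to-written replacements taken from the examples above.
let expansions: [(spoken: String, written: String)] = [
    ("etcetera", "etc."),
    ("mister", "Mr."),
]

/// Replaces whole-word, case-insensitive matches of each spoken form.
func applyExpansions(_ text: String) -> String {
    var result = text
    for (spoken, written) in expansions {
        let pattern = "\\b" + NSRegularExpression.escapedPattern(for: spoken) + "\\b"
        result = result.replacingOccurrences(
            of: pattern, with: written,
            options: [.regularExpression, .caseInsensitive])
    }
    return result
}

/// Turns "new paragraph" and "new line" into line breaks, also consuming
/// the surrounding whitespace and an optional trailing period or comma.
func applyVoiceCommands(_ text: String) -> String {
    let commands: [(phrase: String, output: String)] = [
        ("new paragraph", "\n\n"),
        ("new line", "\n"),
    ]
    var result = text
    for (phrase, output) in commands {
        result = result.replacingOccurrences(
            of: "\\s*\\b\(phrase)\\b[.,]?\\s*", with: output,
            options: [.regularExpression, .caseInsensitive])
    }
    return result
}

/// Runs the passes in order: artifacts, expansions, then voice commands.
func postProcess(_ text: String) -> String {
    applyVoiceCommands(applyExpansions(stripArtifacts(text)))
}
```

Voice commands run last in this sketch so that the whitespace-collapsing step in `stripArtifacts` does not remove the line breaks they insert.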
Voicey/
├── App/ # App entry, lifecycle, menubar
├── Audio/ # Audio capture and level monitoring
├── Transcription/ # WhisperKit engine, model management, post-processing
├── Output/ # Clipboard and keyboard simulation
├── UI/ # Overlay, settings, onboarding views
├── Input/ # Hotkey management and keybinding recorder
└── Utilities/ # Permissions, settings, notifications, logging
- WhisperKit - CoreML-optimized Whisper inference for Apple Silicon
- KeyboardShortcuts - Global hotkey management
| Permission | Purpose |
|---|---|
| Microphone | Audio capture for transcription |
| Accessibility | Global hotkey registration, paste simulation |
| Network | Model downloads from Hugging Face |
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for the Whisper model
- Argmax for WhisperKit
- Sindre Sorhus for KeyboardShortcuts