VoiceFlow

Lightning-fast voice-to-text for macOS
Hold Option+Space, speak, release to paste. It's that simple.

Features • Installation • Usage • Voice Commands • Building


Features

  • Push-to-talk — Hold ⌥ Space to record, release to transcribe and paste
  • 100% local — All processing on-device with Metal GPU acceleration
  • Multiple STT engines — Moonshine (ONNX), Whisper (whisper.cpp), and Qwen3-ASR
  • Multiple LLM backends — mistral.rs and llama.cpp for wide model coverage
  • Smart formatting — Automatic punctuation, capitalization, em-dashes, bullet lists
  • Voice commands — Say "new paragraph", "bullet point", "question mark" and more
  • Filler word removal — "um", "uh", "hmm" removed automatically
  • Customizable dictionary — Editable replacement file for technical terms and proper nouns
  • Context-aware — Reads cursor context via Accessibility API for seamless spacing
  • Application detection — Adapts formatting for email, Slack, code editors
  • Menu bar app — Minimal footprint, no dock icon

How It Works

VoiceFlow supports two pipeline modes:

STT + LLM (default) — Separate speech-to-text and formatting stages:

Audio → STT Engine (Moonshine/Whisper) → Prosody Processing → LLM Formatting → Output

Consolidated — A single model handles both transcription and formatting:

Audio → Qwen3-ASR (Python daemon) → Post-processing → Output

┌──────────────────────────────────────────────────────┐
│               macOS SwiftUI App                      │
│   Menu bar, hotkeys, audio recording, settings       │
└─────────────────┬────────────────────────────────────┘
                  │ FFI (C bindings)
┌─────────────────▼────────────────────────────────────┐
│            voiceflow-ffi                             │
│   C-compatible API, panic safety                     │
└─────────────────┬────────────────────────────────────┘
                  │
┌─────────────────▼────────────────────────────────────┐
│           voiceflow-core                             │
│                                                      │
│  Pipeline: Audio → STT → Prosody → LLM → Output      │
│                                                      │
│  STT Engines:         LLM Backends:   Prosody:       │
│  - Moonshine (ONNX)   - mistral.rs    - Voice cmds   │
│  - Whisper (cpp)      - llama.cpp     - Pause det.   │
│  - Qwen3-ASR (ext.)                   - Pitch det.   │
│                                       - Filler rem.  │
│                                       - Replacements │
└──────────────────────────────────────────────────────┘
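
The pipeline mode can be switched without editing the config file by using the CLI's config set-mode subcommand (documented in the CLI Reference below):

# Use the consolidated Qwen3-ASR pipeline
cargo run -p voiceflow-cli -- config set-mode consolidated

# Switch back to the default STT + LLM pipeline
cargo run -p voiceflow-cli -- config set-mode stt-plus-llm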

Installation

Requirements

  • macOS 13.0 (Ventura) or later
  • Apple Silicon (M1/M2/M3/M4) recommended for best performance

Download

  1. Download the latest VoiceFlow.dmg from Releases
  2. Open the DMG and drag VoiceFlow to Applications
  3. Launch VoiceFlow from Applications
  4. Grant Microphone and Accessibility permissions when prompted

⚠️ Important: VoiceFlow is not yet notarized by Apple.

macOS Gatekeeper will block the app on first launch. To open it:

Option A: Right-click VoiceFlow.app → click Open → click Open again in the dialog

Option B: Go to System Settings → Privacy & Security, scroll down, and click Open Anyway

This only needs to be done once. After that, the app opens normally.
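
If you prefer the terminal, removing macOS's quarantine attribute has the same effect; this is a standard Gatekeeper workaround rather than anything VoiceFlow-specific:

xattr -d com.apple.quarantine /Applications/VoiceFlow.app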

Building from Source

Prerequisites

  • Rust 1.70+ — Install via rustup
  • Xcode Command Line Tools — xcode-select --install
  • Swift 5.9+ — Included with Xcode
  • For Qwen3-ASR mode: Python 3.10+ with pip install qwen-asr torch soundfile

Build

# Clone the repository
git clone https://github.com/Era-Laboratories/voiceflow.git
cd voiceflow

# Build the Rust library
cargo build --release

# Build the macOS app
cd VoiceFlowApp
./build.sh

# Run
open build/VoiceFlow.app

Download Models

# Download default models (Moonshine Base + Qwen3 1.7B)
cargo run -p voiceflow-cli -- setup

# Specify different models
cargo run -p voiceflow-cli -- setup --whisper base --llm qwen3-4b

# Download all models for benchmarking
cargo run -p voiceflow-cli -- setup --benchmark
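
To sanity-check the downloaded models, the models and bench commands (see the CLI Reference below) can be used; the iteration count here is just an example:

# List available models
cargo run -p voiceflow-cli -- models

# Benchmark the current configuration
cargo run -p voiceflow-cli -- bench --iterations 5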

Usage

  1. Launch VoiceFlow — Look for the icon in your menu bar
  2. Hold ⌥ Space — Start speaking
  3. Release ⌥ Space — Text is transcribed, formatted, and pasted at your cursor

Status Indicator

The menu bar icon changes color to show status:

Color    State
Default  Ready
Red      Recording
Yellow   Processing

Settings

Access settings from the menu bar icon. Configure your STT engine, LLM model, pipeline mode, and hotkey preferences.

CLI Reference

The CLI provides full access to VoiceFlow's capabilities:

cargo run -p voiceflow-cli -- <command>
  • record — Record from microphone and transcribe. Flags: --clipboard, --context <type>, --raw
  • file <path> — Transcribe an audio file. Flags: --context <type>, --raw
  • setup — Download required models. Flags: --whisper <size>, --llm <model>, --benchmark
  • config show — Show current configuration
  • config set-model <model> — Set the LLM model
  • config set-whisper <size> — Set the Whisper model size
  • config set-mode <mode> — Set pipeline mode (stt-plus-llm or consolidated)
  • config set-consolidated-model <model> — Set the consolidated model (qwen3-asr-0.6b or qwen3-asr-1.7b)
  • config add-word <word> — Add to personal dictionary
  • config path — Show config file path
  • bench — Run performance benchmark. Flags: --iterations <n>, --file <path>
  • eval — Evaluate transcription quality (LibriSpeech). Flags: --limit <n>, --samples, --raw, --analyze, --stt <model>, --llm <model>, --benchmark
  • models — List available models

All commands support --verbose for debug output and --config <path> for a custom config file.
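
For example, a couple of invocations built from the commands and flags above (meeting.wav is a placeholder path, and --raw is assumed to skip LLM formatting):

# Dictate from the terminal, format for email, and copy the result to the clipboard
cargo run -p voiceflow-cli -- record --context email --clipboard

# Transcribe an existing recording and print the unformatted transcript with debug output
cargo run -p voiceflow-cli -- file meeting.wav --raw --verbose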

Voice Commands

Punctuation

Say                           Output
"period" / "full stop"        .
"comma"                       ,
"question mark"               ?
"exclamation mark" / "bang"   !
"colon"                       :
"semicolon"                   ;
"ellipsis" / "dot dot dot"    ...

Formatting

Say                           Output
"new line" / "line break"     Line break
"new paragraph"               Paragraph break
"open quote" / "close quote"  "
"apostrophe"                  '
"dash" / "em dash"            —
"hyphen"                      -

Brackets & Grouping

Say                               Output
"open paren" / "close paren"      ( )
"open bracket" / "close bracket"  [ ]
"open brace" / "close brace"      { }

Special Characters

Say                        Output
"ampersand"                &
"at sign"                  @
"hashtag" / "hash"         #
"dollar sign"              $
"percent"                  %
"asterisk" / "star"        *
"underscore"               _
"slash" / "forward slash"  /
"backslash"                \

Programming

Say             Output
"equals"        =
"plus"          +
"minus"         -
"greater than"  >
"less than"     <
"pipe"          |
"tilde"         ~
"caret"         ^

Automatic Features

  • Punctuation — Added automatically based on speech patterns
  • Capitalization — Sentences capitalized after punctuation
  • Lists — Enumerated items converted to bullet points
  • Em-dashes — Mid-sentence pauses become dashes
  • Filler removal — "um", "uh", "ah", "hmm", "er" removed
  • Spelled-out words — "S M O L L M" becomes "SMOLLM"
  • Technical terms — Configurable replacement dictionary (e.g., "G P T" → "GPT")
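
An illustrative example, combining the voice commands and automatic features above (the exact output also depends on the selected LLM and context):

Spoken: um the build passed period new paragraph ship it exclamation mark

Output:
The build passed.

Ship it!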

Supported Models

STT Engines

Moonshine (ONNX Runtime — default):

Model           Parameters  Size     Notes
Moonshine Tiny  27M         ~190 MB  Fastest
Moonshine Base  62M         ~400 MB  Default, best balance

Whisper (whisper.cpp):

Model            Parameters  Size    Expected WER
Tiny             39M         75 MB   ~7.5%
Base             74M         142 MB  ~5.0%
Small            244M        466 MB  ~4.2%
Medium           769M        1.5 GB  ~3.5%
Large V3         1.5B        3.0 GB  ~2.9%
Large V3 Turbo   809M        1.6 GB  ~3.0%
Distil-Large V3  756M        1.5 GB  ~3.5%

Qwen3-ASR (Python daemon, consolidated mode):

Model           Parameters  Size
Qwen3-ASR 0.6B  0.6B        ~1.2 GB
Qwen3-ASR 1.7B  1.7B        ~3.4 GB

LLM Models

Model            Backend     Size (Q4_K_M)  License
Qwen3 1.7B       mistral.rs  1.3 GB         Apache 2.0
Qwen3 4B         mistral.rs  2.5 GB         Apache 2.0
SmolLM3 3B       llama.cpp   1.9 GB         Apache 2.0
Gemma 2 2B       mistral.rs  1.7 GB         Gemma
Gemma 3n E2B     llama.cpp   1.8 GB         Gemma
Gemma 3n E4B     llama.cpp   3.2 GB         Gemma
Phi-4 Mini 3.8B  llama.cpp   2.4 GB         MIT
Phi-2            mistral.rs  1.6 GB         MIT

Custom GGUF models are also supported via llama.cpp.
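
Switching models is done through the config subcommands; the identifiers below are the ones used in the setup examples earlier, and other models may use different names:

# Select the Whisper model size and the LLM used for formatting
cargo run -p voiceflow-cli -- config set-whisper base
cargo run -p voiceflow-cli -- config set-model qwen3-4b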

Configuration

VoiceFlow stores its configuration in a TOML file:

~/Library/Application Support/com.era-laboratories.voiceflow/config.toml

Key Settings

# Pipeline mode: "stt-plus-llm" (default) or "consolidated"
pipeline_mode = "stt-plus-llm"

# STT engine: "moonshine" (default), "whisper", or "qwen3-asr"
stt_engine = "moonshine"
moonshine_model = "base"
whisper_model = "base"

# Consolidated mode model (used when pipeline_mode = "consolidated")
consolidated_model = "qwen3-asr-0-6b"

# LLM model for formatting
llm_model = "qwen3-1-7b"

# LLM generation parameters
[llm_options]
max_tokens = 512
temperature = 0.3
top_p = 0.9
n_gpu_layers = -1
enable_thinking = false

# Audio settings
[audio]
sample_rate = 44100
vad_threshold = 0.01
silence_duration_ms = 800

# Default context for formatting
default_context = "default"

# Auto-copy to clipboard
auto_clipboard = true

Context Types

Context  Behavior
default  General dictation with standard formatting
email    Email-appropriate tone and structure
slack    Casual, chat-style formatting
code     Code-aware formatting, preserves technical terms
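
The default context comes from default_context in the config; it can also be chosen per invocation with the --context flag (notes.wav is a placeholder path):

# Transcribe dictated notes with code-aware formatting
cargo run -p voiceflow-cli -- file notes.wav --context code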

Environment Variable Overrides

All settings can be overridden via environment variables prefixed with VOICEFLOW_:

VOICEFLOW_STT_ENGINE=whisper
VOICEFLOW_LLM_MODEL=qwen3-4b
VOICEFLOW_PIPELINE_MODE=consolidated
VOICEFLOW_LLM_TEMPERATURE=0.5
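
Because these are ordinary environment variables, they can also be set for a single run; a sketch combining the variables above with a CLI invocation:

# One-off run with the Whisper engine and a higher sampling temperature
VOICEFLOW_STT_ENGINE=whisper VOICEFLOW_LLM_TEMPERATURE=0.5 \
  cargo run -p voiceflow-cli -- record --clipboard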

File Paths

  • ~/Library/Application Support/com.era-laboratories.voiceflow/config.toml — Configuration
  • ~/Library/Application Support/com.era-laboratories.voiceflow/models/ — Downloaded ML models
  • ~/Library/Application Support/com.era-laboratories.voiceflow/prompts/ — Custom prompt templates
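
Assuming custom prompt templates are picked up from the prompts/ directory listed above (the table suggests this, but it is not spelled out), a starting point is to copy the repository's default prompt there and edit it:

# From a checkout of the repository
mkdir -p ~/Library/Application\ Support/com.era-laboratories.voiceflow/prompts
cp prompts/default.txt ~/Library/Application\ Support/com.era-laboratories.voiceflow/prompts/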

Project Structure

voiceflow/
├── crates/
│   ├── voiceflow-core/          # Core Rust library
│   │   ├── src/
│   │   │   ├── config.rs        # Configuration management
│   │   │   ├── pipeline.rs      # Main processing pipeline
│   │   │   ├── llm/             # LLM inference (mistral.rs + llama.cpp backends)
│   │   │   ├── transcribe/      # STT engines (Whisper, Moonshine)
│   │   │   ├── prosody/         # Voice commands, filler removal, pauses, pitch
│   │   │   └── audio/           # Audio capture and resampling
│   │   └── Cargo.toml
│   ├── voiceflow-cli/           # Command-line interface
│   │   └── src/commands/        # record, file, setup, config, bench, eval
│   └── voiceflow-ffi/           # C FFI for Swift bindings
├── VoiceFlowApp/                # macOS SwiftUI application
│   ├── Sources/VoiceFlowApp/    # Swift UI, audio recording, hotkeys
│   └── build.sh                 # App bundle build script
├── prompts/                     # LLM system prompts & replacement dictionary
│   ├── default.txt              # Default formatting prompt
│   └── replacements.toml        # Customizable word replacements
├── scripts/
│   └── qwen3_asr_daemon.py      # Python daemon for Qwen3-ASR
└── Cargo.toml                   # Workspace configuration

Contributing

See CONTRIBUTING.md for development setup, architecture details, and guidelines.

License

MIT License — Era Laboratories 2024

See LICENSE for details.


Made with care for fast typists who'd rather talk
