Vox is the voice input layer for the system. It captures speech via push-to-talk, transcribes it locally with faster-whisper, and injects the text into the clipboard (and optionally into the focused window). No cloud calls; no silent failures.
From PyPI (no clone required; package name is vox-core because vox is taken on PyPI):
uvx vox-coreOr install the tool, then run it (the CLI command is still vox):
pip install vox-core
voxFrom source (development or latest):
git clone https://github.com/jeffrichley/vox.git && cd vox
uv sync
uv run voxOr install in editable mode: pip install -e . (use a venv), then vox.
Packaged binaries for Windows, macOS, and Linux are built on each GitHub Release. Download the archive for your platform (e.g. vox-<version>-windows-amd64.zip, vox-<version>-macos-arm64.zip, vox-<version>-linux-x86_64.tar.gz), unpack it, then run the binary:
- Windows: Unzip the archive, then run
vox.exefrom inside thevoxfolder (e.g..\vox\vox.exe --help). - macOS / Linux: Unzip or untar the archive, then run
./vox/voxfrom the extracted directory (e.g../vox/vox --help).
On first run, the Whisper model is downloaded automatically (see Transcription model). Binaries are built with PyInstaller and are not signed; you may see a security or Gatekeeper prompt on Windows or macOS.
- Config file:
~/.vox/vox.toml. Create the directory if needed:mkdir -p ~/.vox. - Override path: set
VOX_CONFIGto the full path of your config file. - Env overrides:
VOX_HOTKEY,VOX_DEVICE_ID,VOX_MODEL_SIZE,VOX_COMPUTE_TYPE,VOX_COMPUTE_DEVICE,VOX_INJECTION_MODE,VOX_TRAYoverride the same keys from the file.
Copy the example config and edit:
cp vox.toml.example ~/.vox/vox.toml
# Edit hotkey and optionally device_id, model_size, etc.voxorvox run— Start push-to-talk. By default a small window with a Stop button appears; withuse_tray = truein config (orVOX_TRAY=1), a system tray icon is shown instead—click the icon and choose Quit to stop. Press and hold your configured hotkey, speak, release; the audio is transcribed and placed on the clipboard (and optionally pasted into the focused window).vox devices— List audio input devices (ID, name, host API). Use this to choosedevice_idin config.vox test-mic [--device ID] [--seconds N]— Record for N seconds, play back the recording, then transcribe and print text. Default 2 seconds. Use to verify mic and model before usingvox.
- First run: The model is downloaded automatically from Hugging Face (size from config, default
base). No system FFmpeg required (PyAV is used). - CPU: Use
compute_type = "int8"in config for lower memory and faster inference. - GPU: Set
compute_device = "cuda"andcompute_type = "float16"(orint8) in config. Requires CUDA 12 and cuDNN 9. - Model size: Set
model_sizein config (e.g.tiny,base,small,medium,large-v3) for speed vs accuracy.
- Microphone: Required for capture. On Windows, allow app access to the microphone. On macOS, grant Microphone access when prompted.
- Accessibility / input injection: Only needed if you use
injection_mode = "clipboard_and_paste"(paste into focused window). On Windows, run the app with normal privileges; on macOS, grant Accessibility permission to Terminal (or the app runningvox run) so it can simulate paste.
A human can verify the MVP by:
- Install: From repo run
uv sync(orpip install -e .). - Configure: Copy
vox.toml.exampleto~/.vox/vox.toml; sethotkey(e.g.ctrl+shift+v) and optionallydevice_id,model_size,compute_type,injection_mode. - Run: Execute
uv run voxorvox(orvox run). A small “Vox” window with a Stop button appears, or a tray icon ifuse_trayis enabled. - Trigger: Focus any text field (or leave focus anywhere). Press and hold the configured hotkey, speak a short phrase, release the key.
- Verify: Paste from clipboard (Ctrl+V / Cmd+V) and see the transcribed phrase. If
injection_mode = "clipboard_and_paste", the text also appears in the focused field. - Stop: Click Stop in the Vox window (or close the window) to exit.
- Errors: If mic or model is missing, a clear error message appears (no silent failure).
- Quality gate:
just test quality(tests, format, lint, types, docstrings, security checks). - Tests:
just test(pytest). Unit tests undertests/unit/, integration undertests/integration/.