Fast, local speech-to-text with a global hotkey (CLI version)
TomChat is a lightweight Linux CLI application that converts speech to text and types it into any application. Press a hotkey, speak, and your words appear. All processing happens locally - no cloud services required.
- Local Speech Recognition - NVIDIA Parakeet TDT 0.6B with ~6% WER (better than Whisper)
- Voice Activity Detection - Silero VAD auto-stops recording after silence
- Global Hotkey - Works in any application (Caps Lock by default)
- Text Refinement - Optional Ollama integration for fixing transcription errors
- Lightweight - Pure Rust CLI, minimal footprint
- OS: Linux (Ubuntu 22.04+ recommended)
- RAM: 4GB minimum
- Disk: ~200MB for models (INT8)
sudo apt update
sudo apt install -y \
build-essential \
libasound2-dev \
libssl-dev \
pkg-config \
wget
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env

git clone https://github.com/sixteen-dev/tomchat.git
cd tomchat
cargo build --release

# Run the download script
./scripts/download-parakeet.sh

This downloads:
- Parakeet TDT 0.6B v2 (INT8) - ~180MB speech recognition model
- Silero VAD - ~2MB voice activity detection model
After downloading, your models/ directory should contain:
models/
├── sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/
│ ├── encoder.int8.onnx
│ ├── decoder.int8.onnx
│ ├── joiner.int8.onnx
│ └── tokens.txt
└── silero_vad.onnx
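The layout above can be checked before the first run. The following is a minimal sketch, not part of TomChat itself: the function name `missing_model_files` is hypothetical, and it assumes the models live under `./models` relative to the working directory.

```rust
use std::path::{Path, PathBuf};

/// Files listed in the layout above, relative to the models/ directory.
const EXPECTED_FILES: [&str; 5] = [
    "sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx",
    "sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx",
    "sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx",
    "sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt",
    "silero_vad.onnx",
];

/// Returns every expected file that is absent under `models_root`.
/// (Hypothetical helper for illustration only.)
fn missing_model_files(models_root: &Path) -> Vec<PathBuf> {
    EXPECTED_FILES
        .iter()
        .map(|rel| models_root.join(rel))
        .filter(|path| !path.is_file())
        .collect()
}

fn main() {
    let missing = missing_model_files(Path::new("./models"));
    if missing.is_empty() {
        println!("All model files are present.");
    } else {
        for path in &missing {
            eprintln!("missing: {}", path.display());
        }
        std::process::exit(1);
    }
}
```

If any file is reported missing, re-running the download script or following the manual download steps should restore it.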
# Use the wrapper script (sets library path automatically)
./tomchat
# Or run directly with library path
LD_LIBRARY_PATH=./target/release ./target/release/tomchat

- Press Caps Lock to start recording
- Speak into your microphone
- Wait for auto-stop (after 1.5s of silence) or press Caps Lock again to stop manually
- Text appears in the focused application
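The auto-stop step can be pictured as a counter over per-frame VAD decisions. This is a rough sketch of that idea, not TomChat's actual implementation: `SilenceTimer` is a hypothetical type, the 100 ms frame length is made up for the demo, and it assumes leading silence before any speech does not count toward the timeout.

```rust
/// Tracks trailing silence and reports when recording should stop.
/// (Hypothetical type for illustration only.)
struct SilenceTimer {
    timeout_ms: u32,
    silent_ms: u32,
    heard_speech: bool,
}

impl SilenceTimer {
    fn new(timeout_ms: u32) -> Self {
        Self { timeout_ms, silent_ms: 0, heard_speech: false }
    }

    /// Feed one VAD decision covering `frame_ms` of audio.
    /// Returns true once silence following speech reaches the timeout.
    fn push(&mut self, is_speech: bool, frame_ms: u32) -> bool {
        if is_speech {
            // Any speech resets the silence counter.
            self.heard_speech = true;
            self.silent_ms = 0;
        } else if self.heard_speech {
            // Assumption: only silence after speech counts.
            self.silent_ms += frame_ms;
        }
        self.heard_speech && self.silent_ms >= self.timeout_ms
    }
}

fn main() {
    // 1500 ms matches the default timeout described above.
    let mut timer = SilenceTimer::new(1500);
    // Made-up VAD output: 10 speech frames, then 20 silent frames.
    let frames = std::iter::repeat(true)
        .take(10)
        .chain(std::iter::repeat(false).take(20));
    for (index, is_speech) in frames.enumerate() {
        if timer.push(is_speech, 100) {
            println!("auto-stop triggered at frame {}", index);
            break;
        }
    }
}
```

Pressing the hotkey again would simply end the recording without waiting for the timer.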
Edit config.toml to customize:
[hotkey]
combination = "caps" # Options: "caps", "ctrl+shift+space", "f24", etc.
[vad]
model_path = "./models/silero_vad.onnx"
sensitivity = "Normal" # Low, Normal, High, VeryHigh
timeout_ms = 1500 # Auto-stop after this much silence
auto_stop = true # Set false for manual stop only
[speech]
model_dir = "./models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8"
language = "en"
[text_refinement]
enabled = false # Enable for Ollama-based text cleanup
model_name = "gemma3:1b"
ollama_url = "http://localhost:11434"

export TOMCHAT_MODEL_DIR="/path/to/models"
export TOMCHAT_HOTKEY="ctrl+alt+c"

TomChat can use Ollama to fix transcription errors:
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a small model
ollama pull gemma3:1b
# Enable in config.toml
[text_refinement]
enabled = true

This fixes issues like:
- "cooper nettys" → "Kubernetes"
- Missing punctuation and capitalization
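To illustrate what a refinement call might look like, here is a sketch that builds a request body for Ollama's `POST /api/generate` endpoint using the `model`, `prompt`, and `stream` fields. The prompt wording and the function names `json_escape` and `build_refine_request` are assumptions for illustration; TomChat's real prompt and request code may differ.

```rust
/// Escapes a string for embedding inside a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a JSON body for Ollama's /api/generate endpoint.
/// (Hypothetical helper; the prompt text is illustrative.)
fn build_refine_request(model: &str, transcript: &str) -> String {
    let prompt = format!(
        "Fix transcription errors, punctuation, and capitalization. \
         Return only the corrected text.\n\n{}",
        transcript
    );
    format!(
        "{{\"model\":\"{}\",\"prompt\":\"{}\",\"stream\":false}}",
        json_escape(model),
        json_escape(&prompt)
    )
}

fn main() {
    let body = build_refine_request("gemma3:1b", "i deployed it to cooper nettys");
    // The body could then be posted to http://localhost:11434/api/generate.
    println!("{}", body);
}
```

Setting `stream` to `false` asks Ollama for a single JSON response rather than a stream of chunks, which is simpler to handle for short transcripts.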
If the download script doesn't work, download the models manually:
mkdir -p models
cd models
# Parakeet model
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
tar xjf sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
rm sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
# Silero VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

# Try a different hotkey in config.toml
combination = "ctrl+shift+space"
# Or use a function key
combination = "f24"

# List audio devices
arecord -l
# Test recording
arecord -d 3 test.wav && aplay test.wav

# Always use the wrapper script
./tomchat
# Or set LD_LIBRARY_PATH manually
LD_LIBRARY_PATH=./target/release ./target/release/tomchat

RUST_LOG=debug ./tomchat

tomchat/
├── src/
│ ├── main.rs # Entry point
│ ├── app.rs # Application logic
│ ├── config.rs # Configuration handling
│ ├── audio/
│ │ ├── mod.rs # Audio capture
│ │ └── vad.rs # Voice activity detection
│ ├── speech/
│ │ └── transcriber.rs # Parakeet transcription
│ └── text_refinement/ # Optional Ollama integration
├── scripts/
│ └── download-parakeet.sh
├── config.toml # Default configuration
├── Cargo.toml
└── README.md
| Metric | Value |
|---|---|
| Startup | ~2-3s |
| Transcription | 0.5-2s |
| Memory | ~1.5GB |
| Model Size | ~200MB (INT8) |
- sherpa-rs - Rust bindings for sherpa-onnx
- cpal - Cross-platform audio
- global-hotkey - System-wide hotkeys
- enigo - Text injection
- tokio - Async runtime
- tomchat-app - GUI version with Tauri
MIT License - see LICENSE
Named after my dog Tommy - Built with Rust + sherpa-rs