TomChat

Fast, local speech-to-text with a global hotkey (CLI version)

TomChat is a lightweight Linux CLI application that converts speech to text and types the result into whichever application has focus. Press a hotkey, speak, and your words appear. All processing happens locally; no cloud services are required.

Features

Local Speech Recognition - NVIDIA Parakeet TDT 0.6B with ~6% WER (better than Whisper)
Voice Activity Detection - Silero VAD auto-stops recording after silence
Global Hotkey - Works in any application (Caps Lock by default)
Text Refinement - Optional Ollama integration for fixing transcription errors
Lightweight - Pure Rust CLI, minimal footprint

Requirements

OS: Linux (Ubuntu 22.04+ recommended)
RAM: 4GB minimum
Disk: ~200MB for models (INT8)

Quick Start

1. Install Dependencies

sudo apt update
sudo apt install -y \
    build-essential \
    libasound2-dev \
    libssl-dev \
    pkg-config \
    wget

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env

2. Clone and Build

git clone https://github.com/sixteen-dev/tomchat.git
cd tomchat
cargo build --release

3. Download Models

# Run the download script
./scripts/download-parakeet.sh

This downloads:

Parakeet TDT 0.6B v2 (INT8) - ~180MB speech recognition model
Silero VAD - ~2MB voice activity detection model

After downloading, your models/ directory should contain:

models/
├── sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/
│   ├── encoder.int8.onnx
│   ├── decoder.int8.onnx
│   ├── joiner.int8.onnx
│   └── tokens.txt
└── silero_vad.onnx
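
As a quick sanity check on the layout above, the following is a minimal sketch (not part of TomChat itself) that reports which of the expected model files are absent. The function name `missing_model_files` and the constant `EXPECTED_FILES` are hypothetical; only the relative paths come from the directory listing.

```rust
use std::path::{Path, PathBuf};

// Relative paths taken from the models/ listing in this README.
const EXPECTED_FILES: [&str; 5] = [
    "sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/encoder.int8.onnx",
    "sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/decoder.int8.onnx",
    "sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/joiner.int8.onnx",
    "sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8/tokens.txt",
    "silero_vad.onnx",
];

/// Hypothetical helper: returns every expected model file that does not
/// exist as a regular file under `models_dir`.
fn missing_model_files(models_dir: &Path) -> Vec<PathBuf> {
    EXPECTED_FILES
        .iter()
        .map(|rel| models_dir.join(rel))
        .filter(|path| !path.is_file())
        .collect()
}
```

Calling `missing_model_files(Path::new("./models"))` after the download should return an empty vector; any paths it does return are the files to re-download.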

4. Run

# Use the wrapper script (sets library path automatically)
./tomchat

# Or run directly with library path
LD_LIBRARY_PATH=./target/release ./target/release/tomchat

5. Use

Press Caps Lock to start recording
Speak into your microphone
Wait for auto-stop (after 1.5 s of silence by default) or press Caps Lock again to stop manually
Text appears in the focused application
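
The auto-stop step can be pictured as a small timer that accumulates trailing silence. The sketch below is illustrative only and is not TomChat's actual implementation: `SilenceTimer` and `push_frame` are hypothetical names, the per-frame speech decision is assumed to come from the VAD, and ignoring silence before the first speech frame is an assumption.

```rust
/// Hypothetical helper that decides when recording should stop.
/// The `is_speech` flag for each frame is assumed to come from the VAD.
struct SilenceTimer {
    timeout_ms: u32,    // silence needed to stop, e.g. 1500
    silent_ms: u32,     // trailing silence accumulated so far
    heard_speech: bool, // whether any speech frame has been seen
}

impl SilenceTimer {
    fn new(timeout_ms: u32) -> Self {
        SilenceTimer { timeout_ms, silent_ms: 0, heard_speech: false }
    }

    /// Feed one frame's VAD decision and duration.
    /// Returns true once trailing silence reaches the timeout.
    fn push_frame(&mut self, is_speech: bool, frame_ms: u32) -> bool {
        if is_speech {
            // Any speech resets the silence counter.
            self.heard_speech = true;
            self.silent_ms = 0;
            return false;
        }
        if !self.heard_speech {
            // Assumption: silence before the first word does not count.
            return false;
        }
        self.silent_ms = self.silent_ms.saturating_add(frame_ms);
        self.silent_ms >= self.timeout_ms
    }
}
```

With a 1500 ms timeout, a speaker who pauses for one second and then resumes keeps the recording open, because the resumed speech resets the counter.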

Configuration

Edit config.toml to customize:

[hotkey]
combination = "caps"  # Options: "caps", "ctrl+shift+space", "f24", etc.

[vad]
model_path = "./models/silero_vad.onnx"
sensitivity = "Normal"  # Low, Normal, High, VeryHigh
timeout_ms = 1500       # Auto-stop after this much silence
auto_stop = true        # Set false for manual stop only

[speech]
model_dir = "./models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8"
language = "en"

[text_refinement]
enabled = false         # Enable for Ollama-based text cleanup
model_name = "gemma3:1b"
ollama_url = "http://localhost:11434"
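
To make the `combination` format concrete, here is a minimal sketch of how a string such as "ctrl+shift+space" could be split into modifiers and a key. It is an assumption about the format based on the examples in this README, not the parser TomChat or the global-hotkey crate actually uses; `Hotkey` and `parse_hotkey` are hypothetical names.

```rust
/// Hypothetical parsed form of a hotkey combination string.
#[derive(Debug, PartialEq, Eq)]
struct Hotkey {
    ctrl: bool,
    shift: bool,
    alt: bool,
    key: String, // the single non-modifier key, lowercased
}

/// Splits on '+', treats "ctrl", "shift", and "alt" as modifiers,
/// and requires exactly one remaining key token.
fn parse_hotkey(combination: &str) -> Result<Hotkey, String> {
    let mut hk = Hotkey { ctrl: false, shift: false, alt: false, key: String::new() };
    for part in combination.split('+') {
        let token = part.trim().to_ascii_lowercase();
        match token.as_str() {
            "" => return Err(format!("empty token in {combination:?}")),
            "ctrl" => hk.ctrl = true,
            "shift" => hk.shift = true,
            "alt" => hk.alt = true,
            other => {
                if !hk.key.is_empty() {
                    return Err(format!("more than one key in {combination:?}"));
                }
                hk.key = other.to_string();
            }
        }
    }
    if hk.key.is_empty() {
        return Err(format!("no key in {combination:?}"));
    }
    Ok(hk)
}
```

Under these assumptions, "caps" and "f24" parse as a bare key with no modifiers, while a trailing "+" or two non-modifier keys is rejected.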

Environment Variables

export TOMCHAT_MODEL_DIR="/path/to/models"
export TOMCHAT_HOTKEY="ctrl+alt+c"
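
The README does not state how these variables interact with config.toml. The sketch below assumes that a non-empty environment variable takes precedence over the config value; both that precedence and the helper names `resolve_setting` and `effective_hotkey` are hypothetical.

```rust
use std::env;

/// Hypothetical precedence rule: a non-empty environment value wins,
/// otherwise fall back to the value read from config.toml.
fn resolve_setting(env_value: Option<String>, config_value: &str) -> String {
    match env_value {
        Some(v) if !v.trim().is_empty() => v,
        _ => config_value.to_string(),
    }
}

/// Looks up TOMCHAT_HOTKEY and applies the rule above.
fn effective_hotkey(config_value: &str) -> String {
    resolve_setting(env::var("TOMCHAT_HOTKEY").ok(), config_value)
}
```

Keeping the lookup separate from the precedence rule lets the rule be tested without touching the process environment.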

Text Refinement (Optional)

TomChat can use Ollama to fix transcription errors:

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a small model
ollama pull gemma3:1b

# Enable in config.toml
[text_refinement]
enabled = true

This fixes issues like:

"cooper nettys" → "Kubernetes"
Missing punctuation and capitalization
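
For orientation, this is a rough sketch of the JSON body a refinement request to Ollama's `/api/generate` endpoint might carry. The `model`, `prompt`, and `stream` fields are part of Ollama's generate API; the prompt wording and the function names `json_escape` and `refinement_request_body` are hypothetical and are not taken from TomChat's source.

```rust
/// Escapes a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a non-streaming generate request body.
/// The instruction text in the prompt is an illustrative assumption.
fn refinement_request_body(model: &str, transcript: &str) -> String {
    let prompt = format!(
        "Fix punctuation, capitalization, and misheard technical terms. \
         Return only the corrected text.\n\n{transcript}"
    );
    format!(
        "{{\"model\":\"{}\",\"prompt\":\"{}\",\"stream\":false}}",
        json_escape(model),
        json_escape(&prompt)
    )
}
```

The resulting string would be sent as the body of a POST to the configured `ollama_url` followed by `/api/generate`; the HTTP call itself is omitted to keep the sketch dependency-free.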

Manual Model Download

If the download script fails, fetch the models manually:

mkdir -p models
cd models

# Parakeet model
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
tar xjf sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
rm sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2

# Silero VAD
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

Troubleshooting

Hotkey Not Working

# Try a different hotkey in config.toml
combination = "ctrl+shift+space"

# Or use a function key
combination = "f24"

Audio Issues

# List audio capture devices (arecord and aplay come from the alsa-utils package)
arecord -l

# Test recording
arecord -d 3 test.wav && aplay test.wav

Library Not Found

# Always use the wrapper script
./tomchat

# Or set LD_LIBRARY_PATH manually
LD_LIBRARY_PATH=./target/release ./target/release/tomchat

Debug Mode

RUST_LOG=debug ./tomchat

Project Structure

tomchat/
├── src/
│   ├── main.rs           # Entry point
│   ├── app.rs            # Application logic
│   ├── config.rs         # Configuration handling
│   ├── audio/
│   │   ├── mod.rs        # Audio capture
│   │   └── vad.rs        # Voice activity detection
│   ├── speech/
│   │   └── transcriber.rs # Parakeet transcription
│   └── text_refinement/  # Optional Ollama integration
├── scripts/
│   └── download-parakeet.sh
├── config.toml           # Default configuration
├── Cargo.toml
└── README.md

Performance

Metric	Value
Startup	~2-3s
Transcription	0.5-2s
Memory	~1.5GB
Model Size	~200MB (INT8)

Tech Stack

sherpa-rs - Rust bindings for sherpa-onnx
cpal - Cross-platform audio
global-hotkey - System-wide hotkeys
enigo - Text injection
tokio - Async runtime

License

MIT License - see LICENSE

Named after my dog Tommy - Built with Rust + sherpa-rs
