Privacy-first meeting transcription and voice-to-text tool for Linux
HushNote is a local-only, offline-capable voice transcription and meeting summarization tool. All processing happens on your machine using local AI models—no cloud services, no data sharing, complete privacy.
- 🎙️ Audio Recording: Capture system audio and microphone input using PulseAudio/PipeWire
- 📝 Speech-to-Text Transcription: Convert audio to text using faster-whisper (offline)
- 👥 Speaker Diarization: Identify who spoke when with interactive speaker labeling
- 🤖 AI Summarization: Generate meeting notes, action items, and summaries using Ollama
- 🔒 100% Private: All processing happens locally on your machine
- ⚡ GPU Acceleration: Support for AMD ROCm and NVIDIA CUDA
- 📊 Multiple Output Formats: TXT, JSON, SRT, VTT for transcripts; Markdown, JSON for summaries
- 🎯 Flexible Model Selection: Choose Whisper model size and Ollama model based on your needs
- 🔄 Complete Workflow: Single command to record, diarize, transcribe, and summarize meetings
- Meeting Notes: Record meetings and automatically generate summaries with action items
- Interview Transcription: Convert interviews to searchable text
- Lecture Notes: Transcribe educational content for study
- Documentation: Create written records of verbal discussions
- Accessibility: Generate captions and transcripts for audio content
✅ Zero External Services: No API calls, no cloud uploads, no telemetry
✅ Local Processing: Whisper and Ollama run entirely on your hardware
✅ No Internet Required: Works completely offline after initial setup
✅ Your Data Stays Yours: Recordings never leave your machine
✅ Open Source: Fully auditable code
- Linux (tested on CachyOS/Arch)
- Python 3.10+
- ffmpeg
- PulseAudio or PipeWire
- Ollama (for summarization)
- 4GB+ RAM (8GB+ recommended for larger models)
- Optional: GPU with ROCm or CUDA for acceleration
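A quick sanity check before installing (Ollama may legitimately be absent until the install step below):

```bash
# Verify the core tools are on PATH; anything "missing" needs installing first
for cmd in ffmpeg pactl python ollama; do
  command -v "$cmd" >/dev/null && echo "ok: $cmd" || echo "missing: $cmd"
done
python --version   # needs to report 3.10 or newer
```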
# Clone the repository
git clone https://github.com/peteonrails/hushnote.git
cd hushnote
# Install system dependencies (Arch/CachyOS)
yay -S ffmpeg pulseaudio-utils python
# Or for PipeWire
yay -S ffmpeg pipewire-pulse python
# Create Python virtual environment
python -m venv venv
# Install Python dependencies
./venv/bin/pip install -r requirements.txt
# Install and start Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1:8b # or any other model
# Make scripts executable (already done in repo)
chmod +x hushnote record_audio.sh transcribe.py summarize.py
# Test installation
./hushnote --help

# Complete workflow (stop with Ctrl+C when done)
./hushnote full
# Or with a fixed duration (1 hour)
./hushnote full -d 3600
# With speaker diarization (identify who spoke when)
./hushnote full --diarize --speakers 3
# Note: Pressing Ctrl+C during recording will stop recording
# and automatically continue with transcription and summarization

# Record until Ctrl+C
./hushnote record
# Record for 30 minutes
./hushnote record -d 1800

# Basic transcription
./hushnote transcribe recording.wav
# With specific model and format
./hushnote transcribe recording.wav -m small -f srt

# Generate meeting summary
./hushnote summarize transcript.txt
# Use specific Ollama model
./hushnote summarize transcript.txt -o qwen2.5:14b

# Identify speakers in existing recording
./hushnote diarize recording.wav --speakers 3
# Label speakers with names
./hushnote label recording_diarized.json

See DIARIZATION.md for the complete speaker diarization guide.
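End-to-end, a diarization pass looks roughly like this (the `HF_TOKEN` value below is a placeholder; see Environment Variables):

```bash
export HF_TOKEN=hf_xxxxxxxx                       # placeholder token

./hushnote diarize recording.wav --speakers 3     # detect speaker turns
./hushnote label recording_diarized.json          # interactively name each speaker
./hushnote apply-labels recording_diarized.json   # write the final labeled transcript
```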
./hushnote list

Usage: ./hushnote [COMMAND] [OPTIONS]
Commands:
record Start recording a meeting
transcribe FILE Transcribe an audio file
summarize FILE Summarize a transcription
diarize FILE Identify speakers in an audio file
label FILE Label speakers with their names (interactive)
apply-labels FILE Apply speaker labels to create final transcript
full Complete workflow: record, transcribe, and summarize
process FILE Process existing recording through full workflow
process-last Process most recent recording through full workflow
list List all recordings
Options:
-d, --duration SEC Recording duration in seconds
-m, --model MODEL Whisper model size (tiny|base|small|medium|large-v3)
-o, --ollama MODEL Ollama model for summarization
-f, --format FMT Output format (txt|json|srt|vtt|md)
-s, --speakers NUM Number of speakers (for diarization)
--diarize Enable speaker diarization in full workflow
-h, --help Show help
Environment Variables:
RECORDINGS_DIR Directory for recordings (default: ./recordings)
WHISPER_MODEL Default Whisper model (default: base)
OLLAMA_MODEL Default Ollama model (default: llama3.1:8b)
OLLAMA_URL Ollama API URL (default: http://localhost:11434)
HF_TOKEN HuggingFace API token (for speaker diarization)
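For example, to keep recordings in a custom directory and default to larger models for one session:

```bash
export RECORDINGS_DIR=~/meetings
export WHISPER_MODEL=small
export OLLAMA_MODEL=qwen2.5:14b
./hushnote full
```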
Choose based on accuracy vs. speed tradeoff:
| Model | Size | Speed | Quality | Use Case |
|---|---|---|---|---|
| tiny | 75 MB | ~10-20x realtime | Basic | Quick drafts, testing |
| base | 150 MB | ~5-10x realtime | Good | Default, balanced |
| small | 500 MB | ~2-5x realtime | Better | Accurate transcription |
| medium | 1.5 GB | ~1-2x realtime | High | Professional use |
| large-v3 | 3 GB | ~0.5-1x realtime | Best | Maximum accuracy |
Models download automatically on first use.
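A common pattern is a fast draft pass first, then a higher-quality re-run of the same file once you know the recording is worth it:

```bash
./hushnote transcribe recording.wav -m tiny     # quick draft
./hushnote transcribe recording.wav -m medium   # accurate final pass
```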
Popular models for summarization:
- `llama3.1:8b` - Balanced performance (default)
- `llama3.2:1b` - Fastest, basic summaries
- `mistral:7b` - Good quality, fast
- `qwen2.5:14b` - Better quality
- `mixtral:8x7b` - Best quality (slower)

Install models: `ollama pull model-name`
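Before running summarize, you can confirm Ollama is reachable at OLLAMA_URL using its standard REST API (this is plain Ollama, not HushNote-specific):

```bash
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.1:8b", "prompt": "Reply with OK.", "stream": false}'
```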
AMD (ROCm):
./venv/bin/pip install faster-whisper[gpu]
./transcribe.py audio.wav -d cuda

NVIDIA (CUDA):
./venv/bin/pip install faster-whisper[gpu]
./transcribe.py audio.wav -d cuda

# List available audio sources
pactl list sources short
# Set default microphone
pactl set-default-source SOURCE_NAME

# High-quality recording with best summarization
./hushnote full -m medium -o mixtral:8x7b

# Fast transcription to clipboard
./hushnote record -d 60
./hushnote transcribe recordings/meeting_*.wav -m tiny

# Record interview
./hushnote record -o interview.wav
# Generate subtitle file
./hushnote transcribe interview.wav -m small -f srt

# Transcribe multiple files
for file in recordings/*.wav; do
./hushnote transcribe "$file" -m base
done

HushNote organizes files predictably:
recordings/
├── meeting_20251005_143022.wav # Audio recording
├── meeting_20251005_143022.txt # Transcription (text)
├── meeting_20251005_143022.json # Transcription (with metadata)
├── meeting_20251005_143022.srt # Subtitles
└── meeting_20251005_143022_summary.md # Meeting summary
This is a sample transcription of the meeting audio. The Whisper model
processes the audio and converts speech to text with high accuracy.
# Meeting Summary
## Summary
Discussion of Q4 project goals and resource allocation. Team agreed on
timeline and deliverables for the upcoming sprint.
## Key Discussion Points
- Project timeline needs acceleration
- Additional resources required for frontend development
- Database migration scheduled for next week
## Action Items
- [ ] John to hire 2 frontend developers by end of month
- [ ] Sarah to prepare migration runbook by Friday
- [ ] Team to review architecture docs before Monday standup
## Decisions Made
- Approved budget increase for contractor hiring
- Selected PostgreSQL for new database backend
- Weekly sync meetings moved to Tuesdays at 2pm

Real-time voice transcription to clipboard for instant pasting:
# Capture voice, transcribe, copy to clipboard
./hushnote voice-to-clipboard
# Or with hotkey binding
./hushnote voice-to-clipboard --hotkey "Super+Alt+V"

Planned capabilities:
- Press hotkey to start recording
- Speak naturally
- Release hotkey to stop
- Transcribed text automatically copied to clipboard (cliphist)
- Paste anywhere with Ctrl+V
- Ideal for: emails, documents, chat messages, code comments
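Until `voice-to-clipboard` ships, a rough stand-in can be scripted from the existing commands (assumes Wayland's `wl-copy`; substitute `xclip` on X11):

```bash
./hushnote record -d 10                        # speak, or wait out the 10 s
latest=$(ls -t recordings/*.wav | head -n 1)   # most recent recording
./hushnote transcribe "$latest" -m tiny        # fast draft-quality pass
wl-copy < "${latest%.wav}.txt"                 # transcript to clipboard
```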
Live transcription that streams directly to cursor position:
# Stream transcription in real-time
./hushnote stream-to-cursor

Planned capabilities:
- Real-time speech-to-text streaming
- Text appears at cursor as you speak
- Works in any application
- Streaming API via named pipe for GUI integration
- Low-latency (~100-500ms)
- Ideal for: live note-taking, writing assistance, accessibility
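The named-pipe API is not implemented yet; the sketch below only illustrates the intended shape, with a hypothetical pipe path:

```bash
# Hypothetical: nothing creates this pipe today
mkfifo /tmp/hushnote-stream
# A GUI or editor plugin would block here, receiving text as it is transcribed
cat /tmp/hushnote-stream
```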
Waybar Widget:
- Click to toggle recording
- Visual indicator when recording/processing
- Quick access to recent transcriptions
- Status display (model loaded, ready, processing)
- Settings menu for model selection
System Tray Integration:
- Global hotkey support
- Background service mode
- Notification on transcription complete
- Quick copy/paste options
Standalone GUI:
- Electron or Tauri-based desktop app
- Drag-and-drop audio file processing
- Live waveform visualization
- Edit transcriptions inline
- Export in multiple formats
- Search across all transcriptions
- Zoom/Teams Integration: Automatic speaker labeling from meeting APIs
- Multi-language Support: Auto-detect and transcribe multiple languages
- Custom Vocabulary: Add domain-specific terms for better accuracy
- Noise Reduction: Pre-process audio to improve transcription quality
- Timestamps & Chapters: Automatic chapter markers for long recordings
- Integration APIs: RESTful API for third-party integration
- Mobile Companion: Android/Linux mobile app for on-the-go recording
- Live Streaming: Transcribe Zoom/Teams meetings in real-time
- Search & Index: Full-text search across all transcriptions
Typical performance on modern hardware (Ryzen 7/i7, 16GB RAM):
| Task | Model | Duration | Processing Time |
|---|---|---|---|
| Transcription | tiny | 1 hour | ~3-6 minutes |
| Transcription | base | 1 hour | ~6-12 minutes |
| Transcription | small | 1 hour | ~12-30 minutes |
| Summarization | llama3.1:8b | 1 hour transcript | ~1-2 minutes |
With GPU acceleration, transcription is 3-5x faster.
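To check whether CUDA is actually visible to CTranslate2 (faster-whisper's backend), a quick probe from the venv:

```bash
./venv/bin/python -c "import ctranslate2; print(ctranslate2.get_cuda_device_count())"
```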
# Make sure you're using the virtual environment
./venv/bin/pip install faster-whisper

# Check audio sources
pactl list sources short
# Set default source
pactl set-default-source YOUR_SOURCE_NAME
# Test with direct recording
./record_audio.sh -d 5

# Check if Ollama is running
systemctl status ollama
# Or start manually
ollama serve
# Verify models are installed
ollama list

- Use a smaller Whisper model (`tiny` or `base`)
- Enable GPU acceleration if available
- Close other resource-intensive applications
Faster-whisper works with Python 3.13. If you encounter issues with other packages, use Python 3.11 or 3.12.
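To pin the interpreter, create the venv from a specific version (assumes `python3.12` is installed):

```bash
python3.12 -m venv venv
./venv/bin/pip install -r requirements.txt
```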
┌─────────────────┐
│ Audio Input │ (PulseAudio/PipeWire)
└────────┬────────┘
│
▼
┌─────────────────┐
│ record_audio │ (ffmpeg)
│ .sh │
└────────┬────────┘
│
▼
┌─────────────────┐
│ .wav file │
└────────┬────────┘
│
▼
┌─────────────────┐
│ transcribe.py │ (faster-whisper)
└────────┬────────┘
│
▼
┌─────────────────┐
│ .txt/.json/.srt │
└────────┬────────┘
│
▼
┌─────────────────┐
│ summarize.py │ (Ollama)
└────────┬────────┘
│
▼
┌─────────────────┐
│ summary.md │
└─────────────────┘
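The same pipeline can be driven script by script; the flags shown mirror the `hushnote` wrapper and may differ slightly in the scripts themselves:

```bash
./record_audio.sh -d 60                    # capture 60 s into recordings/
./transcribe.py recordings/meeting_*.wav   # faster-whisper transcription
./summarize.py recordings/meeting_*.txt    # Ollama meeting summary
```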
Contributions welcome! Areas of interest:
- GUI development (Waybar widget, system tray, desktop app)
- Real-time streaming transcription
- Speaker diarization
- Multi-language support
- Performance optimization
- Documentation improvements
- Testing on different Linux distributions
Please open an issue before starting major work.
MIT License - see LICENSE file for details
HushNote is built on top of excellent open-source projects:
- faster-whisper - Fast Whisper implementation
- Whisper - OpenAI's speech recognition model
- Ollama - Local LLM runtime
- ffmpeg - Audio processing
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
HushNote - Because your words are your business. 🤫