Minimal, local speech-to-text for Hyprland. Press a key, speak, paste.
Super+Z → speak → Super+Z → review → Super+Z → pasted
- 100% local - Whisper.cpp, no cloud, no API keys
- Two-stage transcription - Fast preview while speaking, accurate final result
- Real-time feedback - See your words as you speak
- Single hotkey - Super+Z for everything
- FFT spectrum visualizer - 80-band frequency display (optional, requires numpy)
- QuickShell overlay - Native Wayland UI for Hyprland
```sh
# Install dependencies (Arch Linux)
yay -S whisper.cpp pipewire wl-clipboard ydotool ffmpeg

# Optional: for spectrum visualizer
yay -S python python-numpy

# Clone and run setup
git clone https://github.com/youruser/whsprland.git
cd whsprland
./setup
```

Disk space required: ~1.6GB for whisper models, ~5MB for the application.
The setup will:
- Check prerequisites (including ydotool permissions)
- Download whisper models with checksum verification
- Build spectrum analyzer (if Rust is available)
- Install daemon to `~/.local/bin/`
- Enable systemd service
- Install QuickShell integration
- Configure Hyprland keybind
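The checksum-verification step boils down to comparing a file's SHA-256 against a pinned known-good value. A minimal sketch, using a throwaway temp file in place of a real model (so `pinned` here is just computed on the spot, not an actual model hash):

```sh
# Sketch of the checksum step: compare a file's sha256 against a pinned
# value (here computed from a throwaway temp file, not a real model).
f=$(mktemp)
printf 'model bytes' > "$f"
pinned=$(sha256sum "$f" | cut -d' ' -f1)    # stand-in for a known-good hash
actual=$(sha256sum "$f" | cut -d' ' -f1)
if [ "$actual" = "$pinned" ]; then
  result="checksum OK"
else
  result="checksum MISMATCH: re-download the model"
fi
echo "$result"
rm -f "$f"
```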
The core dependencies are available on most Linux distributions:
- Fedora: `dnf install whisper-cpp pipewire wl-clipboard ydotool ffmpeg`
- Ubuntu/Debian: whisper.cpp may need to be built from source
- Nix: `nix-shell -p whisper-cpp pipewire wl-clipboard ydotool ffmpeg`
| Action | Key |
|---|---|
| Start recording | Super+Z |
| Stop & transcribe | Super+Z |
| Confirm & paste | Super+Z |
| Cancel | Esc or close button |
| Copy only | Click "Copy" |
```
Super+Z
   │
   ▼
┌─────────┐    ┌──────────┐    ┌────────┐    ┌──────────┐
│Recording│───▶│Processing│───▶│ Review │───▶│  Pasted  │
│         │    │          │    │        │    │          │
│ tiny.en │    │medium.en │    │ Edit?  │    │  Ctrl+V  │
│ preview │    │  final   │    │        │    │          │
└─────────┘    └──────────┘    └────────┘    └──────────┘
     │                              │
     │           Super+Z            │
     └──────────────────────────────┘
```
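Since a single hotkey drives every transition, the core of the flow is a small state machine that each press advances. A sketch of that logic, with a local variable standing in for the daemon's real status (the state names mirror `whsprland status`):

```sh
# Sketch of the single-hotkey state machine: each Super+Z press advances
# the state, mirroring what `whsprland toggle` does against the daemon.
state="ready"
toggle() {
  case "$state" in
    ready)     state="recording"    ;;  # first press: start recording
    recording) state="transcribing" ;;  # second press: stop & transcribe
    *)         state="ready"        ;;  # third press: confirm & paste
  esac
  echo "$state"
}
toggle    # recording
toggle    # transcribing
toggle    # ready
```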
Two-stage transcription:
- While recording: `ggml-tiny.en` runs every 0.5s for an instant preview
- After stop: `ggml-medium.en` processes the full audio for accuracy
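Conceptually the two stages are just two invocations of the same recognizer with different models. The sketch below only echoes the hypothetical `whisper-cli` commands (models and audio files are not assumed present); `partial.wav` and `full.wav` are illustrative names:

```sh
# The two passes, echoed rather than run: a fast preview pass on the
# partial capture, and an accurate final pass on the full recording.
MODELS=$HOME/.local/share/whsprland/models
preview_cmd="whisper-cli -m $MODELS/ggml-tiny.en.bin -f partial.wav"
final_cmd="whisper-cli -m $MODELS/ggml-medium.en.bin -f full.wav"
echo "$preview_cmd"   # repeated every 0.5s while recording
echo "$final_cmd"     # run once after stop
```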
Required:
- `whisper.cpp` - Speech recognition (`whisper-cli`)
- `pipewire` - Audio capture (`pw-record`)
- `wl-clipboard` - Clipboard (`wl-copy`)
- `ydotool` - Keyboard simulation
- `ffmpeg` - Audio processing

For the spectrum visualizer (one of):
- `rust` - Builds native FFT analyzer (~1ms startup, 1.3MB binary) ← recommended
- `python` + `numpy` - Fallback (~80ms startup)
- Neither - Simple volume meter

Optional:
- `quickshell` - UI overlay
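Whatever backend builds it, the visualizer ultimately consumes lines of comma-separated band values in the 0..1 range (the format `whsprland spectrum` prints). Purely as an illustration, one frame of such data can be rendered as a text bar chart with plain awk; the `frame` value below is made up:

```sh
# Render one hypothetical frame of spectrum output (0..1 per band)
# as a crude text bar chart. Sample data, not real analyzer output.
frame="0.42,0.51,0.38,0.29"
bars=$(echo "$frame" | awk -F, '{
  for (i = 1; i <= NF; i++) {
    n = int($i * 10); bar = ""          # scale 0..1 to 0..10 blocks
    for (j = 0; j < n; j++) bar = bar "#"
    printf "band %d: %-10s %.2f\n", i, bar, $i
  }
}')
echo "$bars"
```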
```
~/.local/bin/whsprland                      # Daemon
~/.local/share/whsprland/models/            # Whisper models (ggml-tiny.en.bin, ggml-medium.en.bin)
~/.local/lib/whsprland/spectrum-analyzer    # FFT analyzer (if built)
~/.config/systemd/user/whsprland.service    # Systemd service
~/.config/whsprland/                        # Config & transcription output
$XDG_RUNTIME_DIR/whsprland/                 # Runtime files (FIFO, status, audio)
```
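Because the FIFO, status, and audio files must live somewhere private, a daemon like this has to refuse to start when `XDG_RUNTIME_DIR` is unset. A sketch of such a guard (`runtime_dir` is a hypothetical helper, not the daemon's actual code):

```sh
# Sketch: fail fast when XDG_RUNTIME_DIR is unset, since the runtime
# files need a per-user private directory.
runtime_dir() {
  if [ -z "${XDG_RUNTIME_DIR:-}" ]; then
    echo "error: XDG_RUNTIME_DIR not set" >&2
    return 1
  fi
  echo "${XDG_RUNTIME_DIR%/}/whsprland"
}
msg=$( ( XDG_RUNTIME_DIR=""; runtime_dir 2>&1 ) || echo "daemon refuses to start" )
echo "$msg"
```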
```sh
whsprland daemon   # Start daemon (foreground)
whsprland start    # Start recording
whsprland stop     # Stop and transcribe
whsprland toggle   # Toggle recording
whsprland get      # Get transcription
whsprland paste    # Paste to focused window
whsprland copy     # Copy to clipboard
whsprland cancel   # Discard transcription
whsprland status   # ready|recording|transcribing|error|stopped
whsprland quit     # Stop daemon
```

```sh
# Check daemon status
$ whsprland status
ready

# Manual recording flow
$ whsprland start
$ sleep 3
$ whsprland stop
$ whsprland get
Hello, this is a test transcription.

# Get spectrum data (during recording)
$ whsprland spectrum
0.42,0.51,0.38,0.29,0.55,0.61,...
```

The setup installs QuickShell files to your config. If you have a custom setup:
- Copy `quickshell/services/Hyprwhspr.qml` to your services directory
- Copy `quickshell/modules/hyprwhspr/` to your modules directory
- Import it in your `shell.qml`: `import "modules/hyprwhspr"`
- Add the Hyprland keybind: `bind = SUPER, Z, exec, qs ipc call hyprwhspr activate`

Check service status:

```sh
systemctl --user status whsprland
journalctl --user -u whsprland -f
```

Restart after changes:

```sh
systemctl --user restart whsprland
```

Microphone not working:
```sh
# Check audio devices
pactl list short sources

# Ensure the mic is not muted
wpctl set-mute @DEFAULT_AUDIO_SOURCE@ 0

# Test recording directly
pw-record --rate=16000 --channels=1 /tmp/test.wav
# Ctrl+C after speaking, then play back:
pw-play /tmp/test.wav
```

ydotool permission denied:
```sh
# Add yourself to the input group
sudo usermod -aG input $USER
# Log out and back in for the change to take effect
```

UI shows but no waveform/transcription:
```sh
# QuickShell may not have ~/.local/bin in PATH
# Check that the daemon can be found:
~/.local/bin/whsprland status
# If that works but the issue persists, the service file uses full
# paths, so this is likely a QuickShell integration issue
```

"XDG_RUNTIME_DIR not set" error:
```sh
# XDG_RUNTIME_DIR is required for secure runtime files and is usually
# set by your login manager (systemd, elogind)
echo $XDG_RUNTIME_DIR   # Should show /run/user/YOUR_UID
```

Transcription is empty or fails:
```sh
# Check if the models exist
ls -la ~/.local/share/whsprland/models/

# Re-download models if missing
./setup   # Will offer to download missing models
```

QuickShell import errors:
If you see QML errors about missing imports, your QuickShell config may have a different module structure. Check the imports at the top of Hyprwhspr.qml and adjust to match your setup.
```sh
./setup uninstall
```

Realtime factors (RTF) are shown in the logs (`journalctl --user -u whsprland -f`):

```
(45.2x) Realtime [2.34s]: Hello this is a test...
(8.5x) Final [5.67s in 0.67s]: Hello, this is a test of the transcription.
```
RTF explained: A value like 8.5x means the model processed 8.5 seconds of audio per second of compute time. Higher is faster. Values above 1x mean faster-than-realtime processing.
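The arithmetic behind the figure is just audio duration divided by compute time; checking the sample "Final" log line above:

```sh
# RTF from the sample log line: 5.67s of audio processed in 0.67s
# of compute time.
rtf=$(awk 'BEGIN { printf "%.1f", 5.67 / 0.67 }')
echo "${rtf}x"
```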
- Realtime (tiny.en): Typically 20-50x on modern CPUs
- Final (medium.en): Typically 5-15x on CPU, 50-100x with GPU
GPU acceleration (CUDA/Vulkan) significantly improves speed. Build whisper.cpp with GPU support for best results.
MIT