whsprland

Minimal, local speech-to-text for Hyprland. Press a key, speak, paste.

Super+Z → speak → Super+Z → review → Super+Z → pasted

Features

100% local - Whisper.cpp, no cloud, no API keys
Two-stage transcription - Fast preview while speaking, accurate final result
Real-time feedback - See your words as you speak
Single hotkey - Super+Z for everything
FFT spectrum visualizer - 80-band frequency display (optional, requires Rust or python + numpy)
QuickShell overlay - Native Wayland UI for Hyprland

Quick Start

# Install dependencies (Arch Linux)
yay -S whisper.cpp pipewire wl-clipboard ydotool ffmpeg

# Optional: numpy fallback for the spectrum visualizer (not needed if Rust is available)
yay -S python python-numpy

# Clone and run setup
git clone https://github.com/youruser/whsprland.git
cd whsprland
./setup

Disk space required: ~1.6GB for whisper models, ~5MB for the application.

The setup will:

Check prerequisites (including ydotool permissions)
Download whisper models with checksum verification
Build spectrum analyzer (if Rust is available)
Install daemon to ~/.local/bin/
Enable systemd service
Install QuickShell integration
Configure Hyprland keybind

Other Distributions

The core dependencies are available on most Linux distributions:

Fedora: dnf install whisper-cpp pipewire wl-clipboard ydotool ffmpeg
Ubuntu/Debian: whisper.cpp may need to be built from source
Nix: nix-shell -p whisper-cpp pipewire wl-clipboard ydotool ffmpeg
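
As a rough illustration of the per-distribution commands above, the following bash sketch maps the `ID` field of `/etc/os-release` to the matching install hint. The helper names `install_hint` and `detect_distro_id` are hypothetical and not part of whsprland; the `ID` values (`arch`, `fedora`, `ubuntu`, `debian`, `nixos`) are the usual ones but should be checked on your system.

```shell
# Hypothetical helper: print the install hint for a distribution ID.
# The commands are copied from this README; nothing is executed.
install_hint() {
    case "$1" in
        arch)          echo "yay -S whisper.cpp pipewire wl-clipboard ydotool ffmpeg" ;;
        fedora)        echo "dnf install whisper-cpp pipewire wl-clipboard ydotool ffmpeg" ;;
        ubuntu|debian) echo "whisper.cpp may need to be built from source" ;;
        nixos)         echo "nix-shell -p whisper-cpp pipewire wl-clipboard ydotool ffmpeg" ;;
        *)             echo "unknown distribution: install the dependencies manually" ;;
    esac
}

# Hypothetical helper: read ID from an os-release style file.
# Defaults to /etc/os-release; prints "unknown" if the file cannot be read.
detect_distro_id() {
    local file="${1:-/etc/os-release}"
    # Source in a subshell so the file's variables do not leak into the caller.
    ( . "$file" 2>/dev/null && printf '%s\n' "${ID:-unknown}" ) || echo unknown
}
```

Usage: `install_hint "$(detect_distro_id)"` prints the suggested command for the current machine.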

Usage

| Action | Key |
|---|---|
| Start recording | `Super+Z` |
| Stop & transcribe | `Super+Z` |
| Confirm & paste | `Super+Z` |
| Cancel | `Esc` or close button |
| Copy only | Click "Copy" |
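
The single-hotkey flow can be pictured as a small dispatcher that maps the current state to the next CLI subcommand. The bash sketch below is an assumption about how such a dispatcher might look, not the project's actual logic: `next_command` is a hypothetical helper, and treating "ready with pending text" as the review stage is a guess, since the documented status values do not include a review state.

```shell
# Hypothetical dispatcher: decide what one Super+Z press should do.
#   $1 = output of `whsprland status`
#   $2 = pending transcription text (empty if none); assumed to come from `whsprland get`
next_command() {
    local state="$1" pending="${2:-}"
    case "$state" in
        recording)
            echo stop ;;             # second press: stop and transcribe
        ready)
            if [ -n "$pending" ]; then
                echo paste           # third press: confirm and paste
            else
                echo start           # first press: start recording
            fi ;;
        *)
            echo none ;;             # transcribing, error, stopped: ignore the key
    esac
}
```

A keybind script could then run `whsprland "$(next_command "$(whsprland status)" "$(whsprland get)")"` whenever the result is not `none`.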

How It Works

┌─────────────────────────────────────────────────────────────┐
│  Super+Z                                                    │
│     │                                                       │
│     ▼                                                       │
│  ┌─────────┐    ┌──────────┐    ┌────────┐    ┌──────────┐ │
│  │Recording│───▶│Processing│───▶│ Review │───▶│  Pasted  │ │
│  │         │    │          │    │        │    │          │ │
│  │ tiny.en │    │medium.en │    │ Edit?  │    │ Ctrl+V   │ │
│  │ preview │    │  final   │    │        │    │          │ │
│  └─────────┘    └──────────┘    └────────┘    └──────────┘ │
│       │                              │                      │
│       │         Super+Z              │                      │
│       └──────────────────────────────┘                      │
└─────────────────────────────────────────────────────────────┘

Two-stage transcription:

While recording: ggml-tiny.en runs every 0.5s for instant preview
After stop: ggml-medium.en processes full audio for accuracy
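
The two stages boil down to running the same audio through two different models. The bash sketch below shows one way to express that with `whisper-cli`; it is an illustration under stated assumptions, not the daemon's code. The function names and the `WHISPER_CLI` and `MODEL_DIR` variables are hypothetical, the model directory is taken from the Files section, and the flags used (`-m`, `-f`, `-nt`, `-np`) should be confirmed against `whisper-cli --help` for your build.

```shell
# Assumed locations; override via the environment if yours differ.
MODEL_DIR="${MODEL_DIR:-$HOME/.local/share/whsprland/models}"
WHISPER_CLI="${WHISPER_CLI:-whisper-cli}"

# Run one model over one WAV file and print the text.
#   $1 = model name without prefix/suffix (e.g. tiny.en), $2 = path to a WAV file
transcribe() {
    local model="$1" wav="$2"
    # -nt: no timestamps, -np: print only the transcription result
    "$WHISPER_CLI" -m "$MODEL_DIR/ggml-$model.bin" -f "$wav" -nt -np
}

# Stage 1: fast, rough preview while recording.
preview() { transcribe tiny.en "$1"; }

# Stage 2: slower, more accurate pass over the full recording.
final() { transcribe medium.en "$1"; }
```

Calling `preview` on the audio captured so far every 0.5s and `final` once after stopping would mirror the behaviour described above.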

Dependencies

Required:

whisper.cpp - Speech recognition (whisper-cli)
pipewire - Audio capture (pw-record)
wl-clipboard - Clipboard (wl-copy)
ydotool - Keyboard simulation
ffmpeg - Audio processing

For spectrum visualizer (one of):

rust - Builds native FFT analyzer (~1ms startup, 1.3MB binary) ← recommended
python + numpy - Fallback (~80ms startup)
Neither - Simple volume meter
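
The fallback order above (native analyzer, then numpy, then volume meter) could be detected roughly as follows. This is a bash sketch only: `pick_spectrum_backend` and the `ANALYZER` variable are hypothetical names, and the analyzer path is the one listed in the Files section.

```shell
# Assumed install location of the Rust-built analyzer.
ANALYZER="${ANALYZER:-$HOME/.local/lib/whsprland/spectrum-analyzer}"

# Print which visualizer backend is usable: native, numpy, or volume-meter.
pick_spectrum_backend() {
    if [ -x "$ANALYZER" ]; then
        echo native                  # Rust binary was built and installed
    elif command -v python3 >/dev/null 2>&1 \
         && python3 -c 'import numpy' >/dev/null 2>&1; then
        echo numpy                   # Python fallback is importable
    else
        echo volume-meter            # neither is available
    fi
}
```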

Optional:

quickshell - UI overlay

Files

~/.local/bin/whsprland                    # Daemon
~/.local/share/whsprland/models/          # Whisper models (ggml-tiny.en.bin, ggml-medium.en.bin)
~/.local/lib/whsprland/spectrum-analyzer  # FFT analyzer (if built)
~/.config/systemd/user/whsprland.service  # Systemd service
~/.config/whsprland/                      # Config & transcription output
$XDG_RUNTIME_DIR/whsprland/               # Runtime files (FIFO, status, audio)
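
To verify an installation against the layout above, a small check like the following bash sketch may help. `whsprland_missing_paths` is a hypothetical helper, not something the setup script provides; it only tests for the daemon, the two model files, and the service file.

```shell
# Print every expected path that does not exist, one per line.
#   $1 = home directory to check (defaults to $HOME)
whsprland_missing_paths() {
    local home="${1:-$HOME}" p
    for p in \
        "$home/.local/bin/whsprland" \
        "$home/.local/share/whsprland/models/ggml-tiny.en.bin" \
        "$home/.local/share/whsprland/models/ggml-medium.en.bin" \
        "$home/.config/systemd/user/whsprland.service"
    do
        [ -e "$p" ] || printf '%s\n' "$p"
    done
    return 0
}
```

An empty result means all four paths are present.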

CLI Commands

whsprland daemon    # Start daemon (foreground)
whsprland start     # Start recording
whsprland stop      # Stop and transcribe
whsprland toggle    # Toggle recording
whsprland get       # Get transcription
whsprland paste     # Paste to focused window
whsprland copy      # Copy to clipboard
whsprland cancel    # Discard transcription
whsprland status    # ready|recording|transcribing|error|stopped
whsprland spectrum  # Get spectrum data (during recording)
whsprland quit      # Stop daemon
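
When scripting these commands, it can be useful to wait until the daemon reports a particular state. The bash sketch below polls `whsprland status` once per second; `wait_for_status` is a hypothetical helper, and whether `whsprland stop` already blocks until transcription finishes is not specified here, so treat the polling as a precaution.

```shell
# Poll `whsprland status` until it prints the wanted state.
#   $1 = wanted state (e.g. ready), $2 = timeout in seconds (default 30)
# Returns 0 when the state is reached, 1 on timeout.
wait_for_status() {
    local want="$1" timeout="${2:-30}" waited=0 state
    while [ "$waited" -lt "$timeout" ]; do
        state="$(whsprland status 2>/dev/null || true)"
        if [ "$state" = "$want" ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

For example, `whsprland stop && wait_for_status ready 60 && whsprland get` waits up to a minute before reading the result.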

Examples

# Check daemon status
$ whsprland status
ready

# Manual recording flow
$ whsprland start
$ sleep 3
$ whsprland stop
$ whsprland get
Hello, this is a test transcription.

# Get spectrum data (during recording)
$ whsprland spectrum
0.42,0.51,0.38,0.29,0.55,0.61,...
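
The spectrum output can be post-processed with standard tools. Assuming each line is a comma-separated list of band levels as in the example above, this bash sketch finds the loudest band; `peak_band` is a hypothetical helper and band numbers are 1-based.

```shell
# Read comma-separated spectrum lines on stdin and print
# "<band index> <value>" for the highest band of each line.
peak_band() {
    awk -F, '
        {
            max = $1 + 0; idx = 1
            for (i = 2; i <= NF; i++) {
                if ($i + 0 > max) { max = $i + 0; idx = i }
            }
            print idx, $idx
        }'
}
```

Usage during a recording: `whsprland spectrum | peak_band`.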

QuickShell Integration

The setup script installs the QuickShell files into your QuickShell config. If you have a custom setup, integrate them manually:

Copy quickshell/services/Hyprwhspr.qml to your services directory
Copy quickshell/modules/hyprwhspr/ to your modules directory
Import in your shell.qml:

import "modules/hyprwhspr"

Add Hyprland keybind:

bind = SUPER, Z, exec, qs ipc call hyprwhspr activate

Troubleshooting

Check service status:

systemctl --user status whsprland
journalctl --user -u whsprland -f

Restart after changes:

systemctl --user restart whsprland

Microphone not working:

# Check audio devices
pactl list short sources

# Ensure mic is not muted
wpctl set-mute @DEFAULT_AUDIO_SOURCE@ 0

# Test recording directly
pw-record --rate=16000 --channels=1 /tmp/test.wav
# Ctrl+C after speaking, then play back:
pw-play /tmp/test.wav

ydotool permission denied:

# Add yourself to input group
sudo usermod -aG input "$USER"
# Log out and back in for changes to take effect

UI shows but no waveform/transcription:

# QuickShell may not have ~/.local/bin in PATH
# Check that the daemon responds when called by its full path:
~/.local/bin/whsprland status

# The systemd service already uses full paths, so if the command above
# works, the problem is most likely in the QuickShell integration

"XDG_RUNTIME_DIR not set" error:

# XDG_RUNTIME_DIR is required so runtime files live in a secure per-user directory
# It is normally set at login by systemd-logind or elogind
echo $XDG_RUNTIME_DIR  # Should show /run/user/YOUR_UID
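
A slightly more thorough check than `echo` is sketched below in bash. `check_runtime_dir` is a hypothetical helper; it only confirms that the variable points at a writable directory and prints where the whsprland runtime files would go according to the Files section.

```shell
# Verify that XDG_RUNTIME_DIR is set and usable.
# Prints a diagnostic and returns 1 on failure.
check_runtime_dir() {
    local dir="${XDG_RUNTIME_DIR:-}"
    if [ -z "$dir" ]; then
        echo "XDG_RUNTIME_DIR is not set"
        return 1
    elif [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "XDG_RUNTIME_DIR is not a writable directory: $dir"
        return 1
    fi
    echo "ok: $dir/whsprland"
}
```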

Transcription is empty or fails:

# Check if models exist
ls -la ~/.local/share/whsprland/models/

# Re-download models if missing
./setup  # Will offer to download missing models

QuickShell import errors:

If you see QML errors about missing imports, your QuickShell config may have a different module structure. Check the imports at the top of Hyprwhspr.qml and adjust to match your setup.

Uninstall

./setup uninstall

Performance

Realtime factors (RTF) are shown in logs (journalctl --user -u whsprland -f):

(45.2x) Realtime [2.34s]: Hello this is a test...
(8.5x) Final [5.67s in 0.67s]: Hello, this is a test of the transcription.

RTF explained: A value like 8.5x means the model processed 8.5 seconds of audio per second of compute time. Higher is faster. Values above 1x mean faster-than-realtime processing.
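
The factor is simply audio duration divided by compute time. The bash sketch below recomputes it from a `Final` log line, assuming the log format stays as shown in the example above; `rtf` and `final_timings` are hypothetical helpers.

```shell
# rtf AUDIO_SECONDS COMPUTE_SECONDS
# Print the speed factor with one decimal; fail if compute time is not positive.
rtf() {
    awk -v a="$1" -v c="$2" 'BEGIN { if (c <= 0) exit 1; printf "%.1f\n", a / c }'
}

# Read log lines on stdin and print "<audio> <compute>" (seconds)
# for each line of the form "... Final [5.67s in 0.67s]: ...".
final_timings() {
    sed -n 's/.*Final \[\([0-9.]*\)s in \([0-9.]*\)s].*/\1 \2/p'
}
```

For the sample line, 5.67s of audio in 0.67s of compute gives `rtf 5.67 0.67`, which prints `8.5`.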

Realtime (tiny.en): Typically 20-50x on modern CPUs
Final (medium.en): Typically 5-15x on CPU, 50-100x with GPU

GPU acceleration (CUDA/Vulkan) significantly improves speed. Build whisper.cpp with GPU support for best results.

License

MIT
