lloyd/whsprland

whsprland

Minimal, local speech-to-text for Hyprland. Press a key, speak, paste.

Super+Z → speak → Super+Z → review → Super+Z → pasted

Features

  • 100% local - Whisper.cpp, no cloud, no API keys
  • Two-stage transcription - Fast preview while speaking, accurate final result
  • Real-time feedback - See your words as you speak
  • Single hotkey - Super+Z for everything
  • FFT spectrum visualizer - 80-band frequency display (optional; built with Rust, or Python with numpy as a fallback)
  • QuickShell overlay - Native Wayland UI for Hyprland

Quick Start

# Install dependencies (Arch Linux)
yay -S whisper.cpp pipewire wl-clipboard ydotool ffmpeg

# Optional: for spectrum visualizer
yay -S python python-numpy

# Clone and run setup
git clone https://github.com/lloyd/whsprland.git
cd whsprland
./setup

Disk space required: ~1.6GB for whisper models, ~5MB for the application.

The setup will:

  1. Check prerequisites (including ydotool permissions)
  2. Download whisper models with checksum verification
  3. Build spectrum analyzer (if Rust is available)
  4. Install daemon to ~/.local/bin/
  5. Enable systemd service
  6. Install QuickShell integration
  7. Configure Hyprland keybind
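
Step 2 can be illustrated with a small helper. This is only a sketch, not the setup script's actual code: it assumes SHA-256 digests and the `sha256sum` tool from GNU coreutils, and `verify_sha256` is a hypothetical name.

```shell
# Hypothetical helper: succeed only if FILE's SHA-256 digest equals EXPECTED.
# Assumes sha256sum (GNU coreutils); the real setup script may use another algorithm.
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | awk '{print $1}') || return 1
    [ "$actual" = "$expected" ]
}

# Usage (the digest is a placeholder, not a real model checksum):
#   verify_sha256 ggml-tiny.en.bin "<expected-digest>" || echo "checksum mismatch" >&2
```

A missing file also fails the check, because the computed digest is then empty.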

Other Distributions

The core dependencies are available on most Linux distributions:

  • Fedora: dnf install whisper-cpp pipewire wl-clipboard ydotool ffmpeg
  • Ubuntu/Debian: whisper.cpp may need to be built from source
  • Nix: nix-shell -p whisper-cpp pipewire wl-clipboard ydotool ffmpeg

Usage

Action               Key
Start recording      Super+Z
Stop & transcribe    Super+Z
Confirm & paste      Super+Z
Cancel               Esc or close button
Copy only            Click "Copy"

How It Works

┌─────────────────────────────────────────────────────────────┐
│  Super+Z                                                    │
│     │                                                       │
│     ▼                                                       │
│  ┌─────────┐    ┌──────────┐    ┌────────┐    ┌──────────┐ │
│  │Recording│───▶│Processing│───▶│ Review │───▶│  Pasted  │ │
│  │         │    │          │    │        │    │          │ │
│  │ tiny.en │    │medium.en │    │ Edit?  │    │ Ctrl+V   │ │
│  │ preview │    │  final   │    │        │    │          │ │
│  └─────────┘    └──────────┘    └────────┘    └──────────┘ │
│       │                              │                      │
│       │         Super+Z              │                      │
│       └──────────────────────────────┘                      │
└─────────────────────────────────────────────────────────────┘
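
The flow in the diagram can be read as a small state machine. The sketch below is illustrative only: the `idle` state, the event names (`hotkey`, `done`, `cancel`) and the function `next_state` are assumptions, and it assumes Esc is accepted in every active state. The daemon's real internals may differ.

```shell
# Hypothetical model of the hotkey flow: print the state that follows STATE on EVENT.
# "idle" and the event names are illustrative; the other states come from the diagram.
next_state() {
    case "$1:$2" in
        idle:hotkey)       echo recording ;;   # first Super+Z starts recording
        recording:hotkey)  echo processing ;;  # second Super+Z stops and transcribes
        processing:done)   echo review ;;      # final transcription is ready
        review:hotkey)     echo pasted ;;      # third Super+Z confirms and pastes
        recording:cancel|processing:cancel|review:cancel)
                           echo idle ;;        # Esc discards (assumed for all active states)
        *)                 echo "$1" ;;        # any other event leaves the state unchanged
    esac
}

next_state idle hotkey      # prints: recording
next_state review cancel    # prints: idle
```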

Two-stage transcription:

  • While recording: ggml-tiny.en runs every 0.5s for instant preview
  • After stop: ggml-medium.en processes full audio for accuracy
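
The two stages can be approximated by hand with `whisper-cli`. This is a sketch under assumptions, not the daemon's code: `MODEL_DIR`, `WHISPER_BIN`, `transcribe`, `preview_text` and `final_text` are hypothetical names, and the model path is the one given in the Files section.

```shell
# Directory where setup stores the models (path from the Files section).
MODEL_DIR="${MODEL_DIR:-$HOME/.local/share/whsprland/models}"
# WHISPER_BIN is a hypothetical override so the sketch can point at another build.
WHISPER_BIN="${WHISPER_BIN:-whisper-cli}"

# Transcribe WAV with the named model and print plain text.
# -m selects the model, -f the input file, -nt drops timestamps, -np hides progress logs.
transcribe() {
    model=$1
    wav=$2
    "$WHISPER_BIN" -m "$MODEL_DIR/ggml-$model.bin" -f "$wav" -nt -np
}

preview_text() { transcribe tiny.en "$1"; }    # fast; suited to repeated runs while recording
final_text()   { transcribe medium.en "$1"; }  # slower; run once on the full audio

# Usage, given a 16 kHz mono recording:
#   preview_text /tmp/test.wav
#   final_text /tmp/test.wav
```

Running both on the same file is a quick way to compare the speed and accuracy of the two models on your hardware.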

Dependencies

Required:

  • whisper.cpp - Speech recognition (whisper-cli)
  • pipewire - Audio capture (pw-record)
  • wl-clipboard - Clipboard (wl-copy)
  • ydotool - Keyboard simulation
  • ffmpeg - Audio processing
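
To check by hand that the required binaries are on your PATH, something like the following works. It is a minimal sketch; `missing_commands` is a hypothetical helper, not part of whsprland.

```shell
# Print every command from the argument list that is not on PATH.
# Returns non-zero if at least one command is missing.
missing_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return "$rc"
}

# Usage, with the binaries named in the list above:
#   missing_commands whisper-cli pw-record wl-copy ydotool ffmpeg
```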

For spectrum visualizer (one of):

  • rust - Builds native FFT analyzer (~1ms startup, 1.3MB binary) ← recommended
  • python + numpy - Fallback (~80ms startup)
  • Neither - Simple volume meter

Optional:

  • quickshell - UI overlay

Files

~/.local/bin/whsprland                    # Daemon
~/.local/share/whsprland/models/          # Whisper models (ggml-tiny.en.bin, ggml-medium.en.bin)
~/.local/lib/whsprland/spectrum-analyzer  # FFT analyzer (if built)
~/.config/systemd/user/whsprland.service  # Systemd service
~/.config/whsprland/                      # Config & transcription output
$XDG_RUNTIME_DIR/whsprland/               # Runtime files (FIFO, status, audio)

CLI Commands

whsprland daemon    # Start daemon (foreground)
whsprland start     # Start recording
whsprland stop      # Stop and transcribe
whsprland toggle    # Toggle recording
whsprland get       # Get transcription
whsprland paste     # Paste to focused window
whsprland copy      # Copy to clipboard
whsprland cancel    # Discard transcription
whsprland spectrum  # Print spectrum data (during recording)
whsprland status    # ready|recording|transcribing|error|stopped
whsprland quit      # Stop daemon
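
When scripting these commands, it can help to wait for a given status before reading the result. The helper below is a sketch: `wait_for_status` and `WHSPRLAND_BIN` are hypothetical names, and it assumes `whsprland status` returns to `ready` once transcription has finished.

```shell
# WHSPRLAND_BIN is a hypothetical override; by default the installed CLI is used.
WHSPRLAND_BIN="${WHSPRLAND_BIN:-whsprland}"

# Poll `whsprland status` until it prints WANTED.
# Gives up after TRIES polls (default 30), sleeping INTERVAL seconds between polls (default 1).
wait_for_status() {
    wanted=$1
    tries=${2:-30}
    interval=${3:-1}
    while [ "$tries" -gt 0 ]; do
        if [ "$("$WHSPRLAND_BIN" status)" = "$wanted" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}

# Usage: record for three seconds, then print the text once the daemon is ready again.
#   whsprland start; sleep 3; whsprland stop
#   wait_for_status ready && whsprland get
```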

Examples

# Check daemon status
$ whsprland status
ready

# Manual recording flow
$ whsprland start
$ sleep 3
$ whsprland stop
$ whsprland get
Hello, this is a test transcription.

# Get spectrum data (during recording)
$ whsprland spectrum
0.42,0.51,0.38,0.29,0.55,0.61,...
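
The spectrum line is easy to post-process. As a sketch, assuming one line of comma-separated band levels as in the sample output above, the hypothetical helper `spectrum_peak` finds the loudest band:

```shell
# Read one line of comma-separated band levels and print "<band> <level>" for the
# loudest band. Bands are numbered from 1; the layout is assumed from the sample output.
spectrum_peak() {
    awk -F, '{
        peak = 1
        for (i = 2; i <= NF; i++)
            if ($i + 0 > $peak + 0) peak = i
        print peak, $peak
    }'
}

echo "0.42,0.51,0.38,0.29,0.55,0.61" | spectrum_peak   # prints: 6 0.61
```

During a recording, the same helper can be fed directly: `whsprland spectrum | spectrum_peak`.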

QuickShell Integration

The setup script installs the QuickShell files into your QuickShell config. If you have a custom setup, integrate them manually:

  1. Copy quickshell/services/Hyprwhspr.qml to your services directory
  2. Copy quickshell/modules/hyprwhspr/ to your modules directory
  3. Import in your shell.qml:
import "modules/hyprwhspr"
  4. Add Hyprland keybind:
bind = SUPER, Z, exec, qs ipc call hyprwhspr activate

Troubleshooting

Check service status:

systemctl --user status whsprland
journalctl --user -u whsprland -f

Restart after changes:

systemctl --user restart whsprland

Microphone not working:

# Check audio devices
pactl list short sources

# Ensure mic is not muted
wpctl set-mute @DEFAULT_AUDIO_SOURCE@ 0

# Test recording directly
pw-record --rate=16000 --channels=1 /tmp/test.wav
# Ctrl+C after speaking, then play back:
pw-play /tmp/test.wav

ydotool permission denied:

# Add yourself to input group
sudo usermod -aG input $USER
# Log out and back in for changes to take effect
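
To confirm that the group change is recorded, a check along these lines can be used. It is a sketch; `user_in_group` is a hypothetical helper.

```shell
# Succeed if the current user belongs to GROUP according to the user database.
# A session started before `usermod` ran keeps its old groups until you log in again.
user_in_group() {
    id -nG "$(id -un)" | tr ' ' '\n' | grep -Fqx -- "$1"
}

# Usage:
#   user_in_group input || echo "not in the input group yet" >&2
```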

UI shows but no waveform/transcription:

# QuickShell may not have ~/.local/bin in its PATH.
# Check that the daemon responds when called by its full path:
~/.local/bin/whsprland status

# The systemd service file already uses full paths, so if the command above
# works, the problem is most likely in the QuickShell integration.

"XDG_RUNTIME_DIR not set" error:

# This is required for secure runtime files
# Usually set at login by your session manager (systemd-logind or elogind)
echo $XDG_RUNTIME_DIR  # Should show /run/user/YOUR_UID
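
Scripts that touch the runtime files can guard against the same condition. A minimal sketch follows; `whsprland_runtime_dir` is a hypothetical helper, and the directory layout is taken from the Files section.

```shell
# Print the whsprland runtime directory, or fail if XDG_RUNTIME_DIR is unset or empty.
whsprland_runtime_dir() {
    if [ -z "${XDG_RUNTIME_DIR:-}" ]; then
        echo "XDG_RUNTIME_DIR not set" >&2
        return 1
    fi
    echo "$XDG_RUNTIME_DIR/whsprland"
}

# Usage:
#   dir=$(whsprland_runtime_dir) && ls -la "$dir"
```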

Transcription is empty or fails:

# Check if models exist
ls -la ~/.local/share/whsprland/models/

# Re-download models if missing
./setup  # Will offer to download missing models
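
The same check can be scripted. The sketch below assumes the two model file names listed in the Files section; `missing_models` is a hypothetical helper.

```shell
# Print the model files that are absent (or empty) in DIR.
# DIR defaults to the install location from the Files section.
missing_models() {
    dir=${1:-$HOME/.local/share/whsprland/models}
    rc=0
    for name in ggml-tiny.en.bin ggml-medium.en.bin; do
        if [ ! -s "$dir/$name" ]; then   # -s: file exists and is not empty
            echo "$name"
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
#   missing_models || echo "run ./setup to download the files listed above" >&2
```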

QuickShell import errors:

If you see QML errors about missing imports, your QuickShell config may have a different module structure. Check the imports at the top of Hyprwhspr.qml and adjust to match your setup.

Uninstall

./setup uninstall

Performance

Realtime factors (RTF) are shown in logs (journalctl --user -u whsprland -f):

(45.2x) Realtime [2.34s]: Hello this is a test...
(8.5x) Final [5.67s in 0.67s]: Hello, this is a test of the transcription.

RTF explained: A value like 8.5x means the model processed 8.5 seconds of audio per second of compute time. Higher is faster. Values above 1x mean faster-than-realtime processing.
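
The factor in the sample log line can be reproduced by dividing the audio duration by the compute time. A small sketch, with `speed_factor` as a hypothetical helper name:

```shell
# Speed factor = seconds of audio / seconds of compute, rounded to one decimal.
speed_factor() {
    awk -v audio="$1" -v compute="$2" 'BEGIN { printf "%.1fx\n", audio / compute }'
}

speed_factor 5.67 0.67   # prints: 8.5x
```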

  • Realtime (tiny.en): Typically 20-50x on modern CPUs
  • Final (medium.en): Typically 5-15x on CPU, 50-100x with GPU

GPU acceleration (CUDA/Vulkan) significantly improves speed. Build whisper.cpp with GPU support for best results.

License

MIT
