Home
🌐 Language: English | Français
Linux voice dictation with local speech recognition, optional translation, and full KDE Plasma 6 / GNOME integration. Supports 4 ASR backends (Parakeet-TDT, Canary, faster-whisper, Vosk) across 25+ languages.
This wiki is the technical companion to the README. The README covers what dictee is, how to install it, and typical usage. The wiki goes deeper into configuration, backend internals, the post-processing pipeline, troubleshooting, and contributing.
I want to install it → Installation · Setup-Wizard · GPU-Setup
I want to understand how it works → ASR-Backends · Post-Processing-Overview · Translation · Configuration
I want to contribute or build from source → Developer-Guide · Building from source · Testing
- Installation — 1-liner, .deb/.rpm/AUR/tarball, aarch64/Jetson, non-packaged distros
- Setup-Wizard — first-run 8-step guided flow with end-to-end GIF walkthrough
- Configuration — tab-by-tab reference of the dictee-setup UI (all backends, PP, UI options)
- Plasmoid-Widget — KDE Plasma 6 widget, 5 animation styles, advanced settings
- Tray-Icon — system tray menu, light/dark themes, GNOME/Ubuntu (AppIndicator)
- Keyboard-Shortcuts — KDE/GNOME shortcut capture, tiling WMs (Sway/i3/Hyprland), double shortcut
- Voice-Commands — every voice command per language + the floating cheatsheet (Ctrl+Alt+F9)
- GPU-Setup — CUDA prerequisites per distro, cuDNN, GPU detection, CPU fallback
- ASR-Backends — comparison table, when to choose each backend
- Parakeet-TDT-Deep-Dive — main model, 25 languages, VRAM limits, 10–15 min caps
- Canary-1B-Deep-Dive — encoder-decoder with built-in translation, 7 languages, best accuracy
- Translation — 5 backends compared (Canary built-in, LibreTranslate, Ollama, Google, Bing)
- Ollama-Setup — install, recommended models (Gemma 3 4B), structured prompts
- Post-Processing-Overview — full pipeline, step ordering, diagram
- Rules-and-Dictionary — regex rules, dictionary, ASR variants, 7 languages
- LLM-Correction — Ollama position (first/last/hybrid), prompts
- Numbers-Dates-Continuation — cardinal/ordinal/versions/decimals/times + continuation buffer
- Diarization — Sortformer, up to 4 speakers, VRAM limits, batch chunked (v1.4)
- CLI-Reference — every command: dictee, dictee-switch-backend, Rust binaries
- Troubleshooting — common errors, GPU OOM, logs, daemon socket
- FAQ — why Rust? why not Whisper streaming? multi-user support?
- Developer-Guide — build, architecture, tests, contribution workflow
- Changelog — version history (mirrors GitHub release notes)
- Repository: https://github.com/rcspam/dictee
- Latest release: https://github.com/rcspam/dictee/releases/latest
- Issues: https://github.com/rcspam/dictee/issues
- Discussions: https://github.com/rcspam/dictee/discussions
Languages: this wiki is available in English and Français — every page has a language switcher at the top.
README.fr.md is the canonical French documentation for the project overview.