Privacy-first desktop speech-to-text that runs entirely on your machine.
Free and open source. Dictx is GPL-3.0 licensed — you can build it from source with full functionality. Buy Dictx Pro ($29 one-time) for signed builds, in-app auto-updates, and priority support.
Dictx is a cross-platform desktop application for speech transcription. Press a shortcut, speak, and your words appear in any text field — no cloud, no API keys, no data leaving your computer.
Built with Tauri (Rust + React/TypeScript). Forked from Handy by cjpais.
- 100% Local — All processing happens on-device. Zero network requests for transcription.
- Multiple Models — Choose from Whisper (Small/Medium/Turbo/Large) with GPU acceleration or Parakeet V3 for CPU-optimized inference.
- Voice Activity Detection — Automatic silence filtering with Silero VAD.
- Obsidian Integration — Export transcriptions as markdown notes with YAML frontmatter, daily note appending, and configurable folder structure.
- Post-Processing — Optional LLM-based cleanup, summarization, or reformatting of transcriptions.
- Cross-Platform — macOS (Intel + Apple Silicon), Windows (x64), Linux (x64).
- CLI Control — Toggle recording, cancel operations, and configure startup behavior from the command line.
- i18n — Localized in 17 languages.
Download the latest release for your platform from the Releases page.
| Platform | Format |
|---|---|
| macOS | .dmg |
| Windows | .msi |
| Linux | .AppImage / .deb |
After installation:
- Launch Dictx and grant the required permissions (microphone, accessibility on macOS)
- Select and download a transcription model
- Configure your keyboard shortcut in Settings
- Start transcribing
To build from source, see BUILD.md.
- Press a configurable keyboard shortcut (toggle or push-to-talk)
- Speak: Dictx records and filters silence in real time
- Release the shortcut (or press it again in toggle mode): audio is processed locally through your chosen model
- Done: the transcribed text is pasted into the active text field
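Step 2 amounts to a per-frame decision: keep frames that contain speech and drop the rest. Dictx makes that decision with Silero VAD, a neural model. The sketch below swaps in a much simpler RMS-energy threshold, only to show the framing and the "hangover" that keeps word endings from being clipped. The frame length, threshold, and hangover count are hypothetical values, not Dictx settings.

```rust
/// Hypothetical gate parameters. Dictx uses Silero VAD; this RMS gate is a stand-in.
const FRAME_LEN: usize = 480; // 30 ms at 16 kHz
const RMS_THRESHOLD: f32 = 0.01;
const HANGOVER_FRAMES: usize = 3; // frames kept after speech ends, so word endings survive

/// Root-mean-square energy of one frame of samples in [-1.0, 1.0].
fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Keep frames judged to contain speech, plus a short hangover after each speech run.
fn filter_silence(samples: &[f32]) -> Vec<f32> {
    let mut kept = Vec::with_capacity(samples.len());
    let mut hangover = 0;
    for frame in samples.chunks(FRAME_LEN) {
        if rms(frame) >= RMS_THRESHOLD {
            hangover = HANGOVER_FRAMES;
            kept.extend_from_slice(frame);
        } else if hangover > 0 {
            hangover -= 1;
            kept.extend_from_slice(frame);
        }
    }
    kept
}

fn main() {
    // 10 silent frames, 2 frames of a loud tone, then 10 more silent frames.
    let mut audio = vec![0.0f32; FRAME_LEN * 10];
    audio.extend((0..FRAME_LEN * 2).map(|i| 0.5 * (i as f32 * 0.1).sin()));
    audio.extend(vec![0.0f32; FRAME_LEN * 10]);

    let speech = filter_silence(&audio);
    // 2 tone frames + 3 hangover frames survive.
    println!("kept {} of {} samples", speech.len(), audio.len()); // prints: kept 2400 of 10560 samples
}
```

A fixed energy threshold is easily fooled by background noise, which is why a trained VAD such as Silero is the better choice in practice.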
| Model | Type | Speed | Accuracy | Requirements |
|---|---|---|---|---|
| Whisper Small | GPU | Fast | Good | GPU recommended |
| Whisper Medium | GPU | Moderate | Better | GPU recommended |
| Whisper Turbo | GPU | Fast | Very Good | GPU recommended |
| Whisper Large | GPU | Slower | Best | GPU required |
| Parakeet V3 | CPU | Fast | Very Good | CPU only, auto language detection |
Dictx can automatically export every transcription to your Obsidian vault. Configure in Settings > Advanced > Obsidian Integration:
- Vault Path: Select your Obsidian vault root folder
- Subfolder: Target folder within your vault (default: `voice-notes`)
- Append to Daily Note: Add a timestamped reference to today's daily note
Exported notes include YAML frontmatter (timestamp, duration, word count, source) and optionally embed the raw transcription in a collapsible callout when post-processing is enabled.
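The note layout described above can be sketched as follows. The frontmatter key names (`timestamp`, `duration`, `word_count`, `source`), the `source: dictx` value, the callout type and title, and the daily-note line format are all assumptions for illustration. The actual output of `obsidian_export.rs` may differ. The callout and `[[wiki-link]]` syntax are standard Obsidian markdown.

```rust
/// One finished transcription, as it might be handed to an exporter (hypothetical type).
struct Transcription<'a> {
    timestamp: &'a str,        // e.g. "2024-05-01T14:05:00"
    duration_secs: f32,
    text: &'a str,             // final, possibly post-processed, text
    raw_text: Option<&'a str>, // original text when post-processing ran
}

/// Render a markdown note: YAML frontmatter, the text, and an optional
/// collapsed callout holding the raw transcription.
fn render_note(t: &Transcription<'_>) -> String {
    let word_count = t.text.split_whitespace().count();
    // Obsidian reads the block between the two `---` lines as note properties.
    let mut note = format!(
        "---\ntimestamp: {}\nduration: {:.1}\nword_count: {}\nsource: dictx\n---\n\n{}\n",
        t.timestamp, t.duration_secs, word_count, t.text
    );
    if let Some(raw) = t.raw_text {
        // `[!quote]-` is an Obsidian callout; the trailing `-` collapses it by default.
        note.push_str("\n> [!quote]- Raw transcription\n");
        for line in raw.lines() {
            note.push_str("> ");
            note.push_str(line);
            note.push('\n');
        }
    }
    note
}

/// Hypothetical line appended to today's daily note, linking to the exported note.
fn daily_note_entry(time: &str, subfolder: &str, note_name: &str) -> String {
    format!("- {time} [[{subfolder}/{note_name}]]\n")
}

fn main() {
    // Made-up example values.
    let t = Transcription {
        timestamp: "2024-05-01T14:05:00",
        duration_secs: 4.2,
        text: "Buy milk and eggs.",
        raw_text: Some("um buy milk and uh eggs"),
    };
    print!("{}", render_note(&t));
    print!("{}", daily_note_entry("14:05", "voice-notes", "2024-05-01 1405"));
}
```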
Dictx supports command-line flags for remote control and startup configuration.
Control a running instance:

```bash
dictx --toggle-transcription   # Start/stop recording
dictx --toggle-post-process    # Start/stop with post-processing
dictx --cancel                 # Cancel current operation
```

Startup options:

```bash
dictx --start-hidden   # Launch without showing the window
dictx --no-tray        # Launch without system tray icon
dictx --debug          # Enable verbose logging
```

Combine flags for autostart scenarios: `dictx --start-hidden --no-tray`
macOS: When installed as an app bundle, invoke the binary directly:

```bash
/Applications/Dictx.app/Contents/MacOS/Dictx --toggle-transcription
```
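The flag set above maps naturally onto a small data structure. The sketch below parses those flags with only the standard library. The `Action` and `CliArgs` names, and the rule that at most one remote-control flag may be given per call, are assumptions of this sketch. It is not Dictx's actual argument parser.

```rust
/// Remote-control actions sent to an already running instance.
#[derive(Debug, PartialEq)]
enum Action {
    ToggleTranscription,
    TogglePostProcess,
    Cancel,
}

/// Parsed command line: startup options plus an optional remote-control action.
#[derive(Debug, Default, PartialEq)]
struct CliArgs {
    start_hidden: bool,
    no_tray: bool,
    debug: bool,
    action: Option<Action>,
}

/// Record a remote-control action. Rejecting a second one is this sketch's own
/// assumption, not documented Dictx behavior.
fn set_action(out: &mut CliArgs, action: Action) -> Result<(), String> {
    if out.action.is_some() {
        return Err("give at most one of --toggle-transcription, --toggle-post-process, --cancel".to_string());
    }
    out.action = Some(action);
    Ok(())
}

/// Parse the documented flags; `args` must not include the program name.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<CliArgs, String> {
    let mut out = CliArgs::default();
    for arg in args {
        match arg.as_str() {
            "--start-hidden" => out.start_hidden = true,
            "--no-tray" => out.no_tray = true,
            "--debug" => out.debug = true,
            "--toggle-transcription" => set_action(&mut out, Action::ToggleTranscription)?,
            "--toggle-post-process" => set_action(&mut out, Action::TogglePostProcess)?,
            "--cancel" => set_action(&mut out, Action::Cancel)?,
            other => return Err(format!("unknown flag: {other}")),
        }
    }
    Ok(out)
}

fn main() {
    // Skip argv[0], the path of the binary itself.
    match parse_args(std::env::args().skip(1)) {
        Ok(args) => println!("{args:?}"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Keeping startup options and the remote-control action in separate fields mirrors the split above: startup options configure a fresh launch, while an action is meant for an instance that is already running.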
```
src-tauri/src/           Rust backend
├── lib.rs               App entry point, Tauri setup
├── managers/            Core logic (audio, model, transcription, history)
├── audio_toolkit/       Audio recording, resampling, VAD
├── commands/            Tauri command handlers
├── shortcut.rs          Global keyboard shortcuts
├── settings.rs          Settings management
├── tray.rs              System tray
└── obsidian_export.rs   Obsidian vault integration

src/                     React/TypeScript frontend
├── App.tsx              Main component
├── components/          UI (settings, onboarding, sidebar)
├── stores/              Zustand state management
├── hooks/               React hooks
└── i18n/                17 language translations
```
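Key dependencies: whisper-rs (Whisper via whisper.cpp), transcribe-rs (Parakeet), cpal (cross-platform audio capture), vad-rs (voice activity detection), rdev (global keyboard events), rubato (audio resampling)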
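Capture devices commonly deliver 44.1 or 48 kHz audio, often in stereo, while Whisper models expect 16 kHz mono input. The sketch below illustrates that conversion step with a naive channel-averaging downmix and linear interpolation. It is a simplified stand-in for rubato, which provides proper band-limited resamplers, and the function names are hypothetical.

```rust
/// Whisper models consume 16 kHz mono audio.
const TARGET_RATE: u32 = 16_000;

/// Average interleaved multi-channel samples down to mono.
fn downmix(interleaved: &[f32], channels: usize) -> Vec<f32> {
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Resample by linear interpolation. No anti-aliasing filter is applied, so this
/// is only a rough illustration of what a real resampler does properly.
fn resample_linear(input: &[f32], from_rate: u32, to_rate: u32) -> Vec<f32> {
    if input.is_empty() || from_rate == to_rate {
        return input.to_vec();
    }
    let ratio = from_rate as f64 / to_rate as f64; // input samples per output sample
    let out_len = (input.len() as f64 / ratio).floor() as usize;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * ratio;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            // Past the last sample, hold its value instead of reading out of bounds.
            let b = *input.get(idx + 1).unwrap_or(&a);
            a + (b - a) * frac
        })
        .collect()
}

fn main() {
    // One second of silent 48 kHz stereo audio, as a capture device might deliver it.
    let captured = vec![0.0f32; 48_000 * 2];
    let mono = downmix(&captured, 2);
    let resampled = resample_linear(&mono, 48_000, TARGET_RATE);
    // prints: 48000 mono samples -> 16000 samples at 16000 Hz
    println!("{} mono samples -> {} samples at {} Hz", mono.len(), resampled.len(), TARGET_RATE);
}
```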
macOS:
- Metal acceleration for Whisper models
- Requires Accessibility and Microphone permissions
- Debug mode: Cmd+Shift+D
Windows:
- Vulkan acceleration for Whisper models
- Debug mode: Ctrl+Shift+D
On Linux, Dictx requires a text input tool for reliable typing:

| Display Server | Tool | Install |
|---|---|---|
| X11 | xdotool | `sudo apt install xdotool` |
| Wayland | wtype | `sudo apt install wtype` |
| Both | dotool | `sudo apt install dotool` |
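The table above amounts to a small decision: detect the display server, prefer its dedicated tool, and fall back to dotool. The sketch below reads the conventional `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY` environment variables and searches `PATH`. This detection logic and the fallback order are assumptions for illustration, not how Dictx actually picks a tool.

```rust
use std::env;

#[derive(Debug, PartialEq)]
enum DisplayServer {
    X11,
    Wayland,
    Unknown,
}

/// Classify the session, trusting XDG_SESSION_TYPE first, then the display sockets.
fn detect_display_server(
    session_type: Option<&str>,
    wayland_display: Option<&str>,
    x_display: Option<&str>,
) -> DisplayServer {
    match session_type {
        Some("wayland") => DisplayServer::Wayland,
        Some("x11") => DisplayServer::X11,
        _ if wayland_display.is_some() => DisplayServer::Wayland,
        _ if x_display.is_some() => DisplayServer::X11,
        _ => DisplayServer::Unknown,
    }
}

/// Tools to try in order: the session-specific tool, then dotool (works on both).
fn candidate_tools(server: &DisplayServer) -> &'static [&'static str] {
    match server {
        DisplayServer::X11 => &["xdotool", "dotool"],
        DisplayServer::Wayland => &["wtype", "dotool"],
        DisplayServer::Unknown => &["dotool"],
    }
}

/// True if a file with this name exists in one of the PATH directories.
fn on_path(tool: &str) -> bool {
    env::var_os("PATH")
        .map(|paths| env::split_paths(&paths).any(|dir| dir.join(tool).is_file()))
        .unwrap_or(false)
}

fn main() {
    let session = env::var("XDG_SESSION_TYPE").ok();
    let wayland = env::var("WAYLAND_DISPLAY").ok();
    let x = env::var("DISPLAY").ok();
    let server = detect_display_server(session.as_deref(), wayland.as_deref(), x.as_deref());
    match candidate_tools(&server).iter().find(|t| on_path(t)) {
        Some(tool) => println!("{server:?}: using {tool}"),
        None => println!("{server:?}: no text input tool found; install one from the table above"),
    }
}
```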
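- The recording overlay is disabled by default, because some compositors may treat it as the active window.
- If you run into issues, try launching with `WEBKIT_DISABLE_DMABUF_RENDERER=1 dictx`.
- On Wayland, global shortcuts must be configured through your desktop environment, bound to a CLI command such as `dictx --toggle-transcription`.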
If you're behind a proxy or firewall, models can be installed manually. Download the model files and place them in the app data directory:
| Platform | Path |
|---|---|
| macOS | ~/Library/Application Support/com.0xnyk.dictx/ |
| Windows | C:\Users\{username}\AppData\Roaming\com.0xnyk.dictx\ |
| Linux | ~/.config/com.0xnyk.dictx/ |
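To check where manually downloaded models should go, the paths in the table can be resolved programmatically. The sketch below mirrors the table using `std::env::consts::OS`, `HOME`, and the Windows `APPDATA` variable, which normally points to `C:\Users\{username}\AppData\Roaming`. The function name is hypothetical, and it only resolves the base directory listed in the table.

```rust
use std::path::PathBuf;

/// Identifier that appears in every path in the table.
const APP_ID: &str = "com.0xnyk.dictx";

/// Resolve the app data directory for an OS name as reported by `std::env::consts::OS`.
fn app_data_dir(os: &str, home: &str, appdata: Option<&str>) -> Option<PathBuf> {
    match os {
        "macos" => Some(PathBuf::from(home).join("Library/Application Support").join(APP_ID)),
        "windows" => appdata.map(|dir| PathBuf::from(dir).join(APP_ID)),
        "linux" => Some(PathBuf::from(home).join(".config").join(APP_ID)),
        _ => None,
    }
}

fn main() {
    let home = std::env::var("HOME").unwrap_or_default();
    let appdata = std::env::var("APPDATA").ok();
    match app_data_dir(std::env::consts::OS, &home, appdata.as_deref()) {
        Some(dir) => println!("App data directory: {}", dir.display()),
        None => println!("Unsupported platform: {}", std::env::consts::OS),
    }
}
```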
Contributions are welcome. See CONTRIBUTING.md for guidelines.
GPL-3.0. See LICENSE for details and NOTICE for attribution and license history.
Originally created by cjpais as Handy (MIT). Forked and extended by 0xNyk under GPL-3.0.
- Handy by cjpais — the upstream project
- Whisper by OpenAI — speech recognition models
- whisper.cpp / ggml — cross-platform inference
- Silero VAD — voice activity detection
- Tauri — desktop application framework
