GitHub - splitlabs/dictx: Dictx — Local voice-to-text with Obsidian integration

Privacy-first desktop speech-to-text that runs entirely on your machine.

Installation · How It Works · Obsidian Integration · CLI · Build from Source · Contributing

Free and open source. Dictx is GPL-3.0 licensed — you can build it from source with full functionality. Buy Dictx Pro ($29 one-time) for signed builds, in-app auto-updates, and priority support.

Dictx is a cross-platform desktop application for speech transcription. Press a shortcut, speak, and your words appear in any text field — no cloud, no API keys, no data leaving your computer.

Built with Tauri (Rust + React/TypeScript). Forked from Handy by cjpais.

Features

100% Local — All processing happens on-device. Zero network requests for transcription.
Multiple Models — Choose from Whisper (Small/Medium/Turbo/Large) with GPU acceleration or Parakeet V3 for CPU-optimized inference.
Voice Activity Detection — Automatic silence filtering with Silero VAD.
Obsidian Integration — Export transcriptions as markdown notes with YAML frontmatter, daily note appending, and configurable folder structure.
Post-Processing — Optional LLM-based cleanup, summarization, or reformatting of transcriptions.
Cross-Platform — macOS (Intel + Apple Silicon), Windows (x64), Linux (x64).
CLI Control — Toggle recording, cancel operations, and configure startup behavior from the command line.
i18n — Localized in 17 languages.

Installation

Download the latest release for your platform from the Releases page.

Platform	Format
macOS	`.dmg`
Windows	`.msi`
Linux	`.AppImage` / `.deb`

After installation:

Launch Dictx and grant the required permissions (microphone, accessibility on macOS)
Select and download a transcription model
Configure your keyboard shortcut in Settings
Start transcribing

To build from source, see BUILD.md.

How It Works

Press a configurable keyboard shortcut (toggle or push-to-talk)
Speak — Dictx records and filters silence in real-time
Release — audio is processed locally through your chosen model
Done — transcribed text is pasted into the active text field

Models

Model	Type	Speed	Accuracy	Requirements
Whisper Small	GPU	Fast	Good	GPU recommended
Whisper Medium	GPU	Moderate	Better	GPU recommended
Whisper Turbo	GPU	Fast	Very Good	GPU recommended
Whisper Large	GPU	Slower	Best	GPU required
Parakeet V3	CPU	Fast	Very Good	CPU only, auto language detection

Obsidian Integration

Dictx can automatically export every transcription to your Obsidian vault. Configure in Settings > Advanced > Obsidian Integration:

Vault Path — Select your Obsidian vault root folder
Subfolder — Target folder within your vault (default: voice-notes)
Append to Daily Note — Add a timestamped reference to today's daily note

Exported notes include YAML frontmatter (timestamp, duration, word count, source) and optionally embed the raw transcription in a collapsible callout when post-processing is enabled.

CLI

Dictx supports command-line flags for remote control and startup configuration.

Control a running instance:

dictx --toggle-transcription    # Start/stop recording
dictx --toggle-post-process     # Start/stop with post-processing
dictx --cancel                  # Cancel current operation

Startup options:

dictx --start-hidden            # Launch without showing the window
dictx --no-tray                 # Launch without system tray icon
dictx --debug                   # Enable verbose logging

Combine flags for autostart scenarios: dictx --start-hidden --no-tray

macOS: When installed as an app bundle, invoke the binary directly:
/Applications/Dictx.app/Contents/MacOS/Dictx --toggle-transcription

Architecture

src-tauri/src/          Rust backend
├── lib.rs              App entry point, Tauri setup
├── managers/           Core logic (audio, model, transcription, history)
├── audio_toolkit/      Audio recording, resampling, VAD
├── commands/           Tauri command handlers
├── shortcut.rs         Global keyboard shortcuts
├── settings.rs         Settings management
├── tray.rs             System tray
└── obsidian_export.rs  Obsidian vault integration

src/                    React/TypeScript frontend
├── App.tsx             Main component
├── components/         UI (settings, onboarding, sidebar)
├── stores/             Zustand state management
├── hooks/              React hooks
└── i18n/               17 language translations

Key dependencies: whisper-rs, transcribe-rs (Parakeet), cpal, vad-rs, rdev, rubato

Platform Notes

macOS

Metal acceleration for Whisper models
Requires Accessibility and Microphone permissions
Debug mode: Cmd+Shift+D

Windows

Vulkan acceleration for Whisper models
Debug mode: Ctrl+Shift+D

Linux

Requires a text input tool for reliable typing:

Display Server	Tool	Install
X11	`xdotool`	`sudo apt install xdotool`
Wayland	`wtype`	`sudo apt install wtype`
Both	`dotool`	`sudo apt install dotool`

Recording overlay disabled by default (compositors may treat it as active window)
If issues occur, try: WEBKIT_DISABLE_DMABUF_RENDERER=1 dictx
Wayland shortcuts must be configured through your DE using CLI flags

Troubleshooting

Manual Model Installation

If you're behind a proxy or firewall, models can be installed manually. Download the model files and place them in the app data directory:

Platform	Path
macOS	`~/Library/Application Support/com.0xnyk.dictx/`
Windows	`C:\Users\{username}\AppData\Roaming\com.0xnyk.dictx\`
Linux	`~/.config/com.0xnyk.dictx/`

Contributing

Contributions are welcome. See CONTRIBUTING.md for guidelines.

License

GPL-3.0. See LICENSE for details and NOTICE for attribution and license history.

Originally created by cjpais as Handy (MIT). Forked and extended by 0xNyk under GPL-3.0.

Acknowledgments

Handy by cjpais — the upstream project
Whisper by OpenAI — speech recognition models
whisper.cpp / ggml — cross-platform inference
Silero VAD — voice activity detection
Tauri — desktop application framework

Name		Name	Last commit message	Last commit date
Latest commit History 630 Commits
.cargo		.cargo
.github		.github
.vscode		.vscode
assets		assets
docs/commercial		docs/commercial
landing		landing
scripts		scripts
sponsor-images		sponsor-images
src-tauri		src-tauri
src		src
tests		tests
.gitignore		.gitignore
.prettierignore		.prettierignore
.prettierrc		.prettierrc
AGENTS.md		AGENTS.md
BUILD.md		BUILD.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
CONTRIBUTING_TRANSLATIONS.md		CONTRIBUTING_TRANSLATIONS.md
CRUSH.md		CRUSH.md
HANDOFF.md		HANDOFF.md
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
bun.lock		bun.lock
eslint.config.js		eslint.config.js
flake.lock		flake.lock
flake.nix		flake.nix
index.html		index.html
package.json		package.json
playwright.config.ts		playwright.config.ts
tailwind.config.js		tailwind.config.js
tsconfig.json		tsconfig.json
tsconfig.node.json		tsconfig.node.json
vite.config.ts		vite.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Features

Installation

How It Works

Models

Obsidian Integration

CLI

Architecture

Platform Notes

macOS

Windows

Linux

Troubleshooting

Manual Model Installation

Contributing

License

Acknowledgments

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Features

Installation

How It Works

Models

Obsidian Integration

CLI

Architecture

Platform Notes

macOS

Windows

Linux

Troubleshooting

Manual Model Installation

Contributing

License

Acknowledgments

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages