Echo

Local‑first, Russian‑primary bilingual speech‑to‑text dictation.

Speak in Russian, English, or both at once — Echo transcribes it on‑device, in real time, and types it straight into whatever app you're using. No cloud, no account, no telemetry.

What it is

Echo is a tray‑first speech‑to‑text utility built on Tauri (Rust core, React UI) and whisper.cpp / NVIDIA Parakeet. It's a fork of cjpais/Handy, tuned for Russian as the primary language with first‑class support for the whole Slavic family and for mixing English into Russian speech.

Press a hotkey, talk, release — the text appears in your editor, terminal, chat, or browser. Everything runs offline.

Key capabilities

Smart dictation

🗣️ Bilingual code‑switching. Say "давай закоммитим этот pull request в main" and English terms stay in Latin script while Russian stays Cyrillic — no transliteration to «бамп».
🌍 Slavic family + English. Auto‑detects Russian, Ukrainian, Polish, Czech, Slovak, Slovenian, Bulgarian, Croatian and 25 European languages.
✏️ Auto‑punctuation & capitalization. Bilingual heuristics add commas, sentence case, and question marks from intonation cues — toggleable.
🔁 Self‑correction. Restate yourself mid‑sentence ("встречаемся в 2, нет, в 3 часа") and Echo keeps only the corrected version.
🔢 Spoken lists. Dictated enumerations ("first… second… third…" / "первое… второе…") become a clean numbered list.
🧠 Clean output. Voice‑activity detection trims silence, and known Whisper "silence hallucinations" (subtitle credits, "thanks for watching", …) are filtered out so they never land in your text.

Command Mode

Begin an utterance with a spoken instruction and Echo transforms the text in a single step — no app switching:

"translate to English …" / "переведи на английский …" → inserts the translation.
"make shorter …" / "сделай короче …" → condenses the phrasing.
"make formal …" / "ответь формально …" → rewrites in a professional tone.
End with "press enter" / "нажми ввод" to submit after pasting.

Transform commands run through your configured post‑processing provider (a local LLM via Ollama/llama.cpp, or any OpenAI‑compatible endpoint). Detection itself is local; opt in under Settings → Advanced → Voice commands.

Voice capture to folder

Save a dictation as a note instead of typing it. Set a capture folder and one or more comma‑separated trigger phrases under Settings → Advanced, then start an utterance with a phrase and Echo writes the rest to a timestamped markdown file instead of pasting it into the active app:

"capture note buy milk" → writes …-echo-note.md (with simple frontmatter) to your capture folder; nothing is typed into the current window.

Trigger phrases are fully configurable. Leave the capture folder empty to keep the feature off. Everything stays local — it's just a file written to a folder you choose, so it pairs naturally with an Obsidian inbox, a notes directory, or any tool that watches a folder.

Personalization

📓 Custom dictionary. Teach Echo unusual names, terms, and acronyms.
✂️ Snippets. Spoken trigger phrases expand into canned text — signatures, calendar links, boilerplate replies.
💬 Live subtitles. An optional caption under the recording orb streams the transcript as you speak, with adjustable size, length, and refresh rate.
🌐 Interface language. The UI itself ships in 20 languages.

Speech coach

📈 Delivery metrics. Every dictation is scored locally for pace (WPM), filler/crutch words (RU + EN — «э‑э», «ну», "um", "like", …), and weak/hedge words, shown inline in History and as an optional post‑dictation toast.
🗂️ Progress dashboard. A Coach tab tracks improvement over time: this‑week averages with deltas vs last week, WPM and filler‑rate trend sparklines (7d / 30d / all), and a practice streak. Each new report also shows whether you did better or worse than your rolling baseline — all on‑device.

Developer features

⌨️ Context‑aware formatting. Echo detects code editors (VS Code, Cursor, Zed, JetBrains, …) and strips trailing punctuation automatically.
🐫 camelCase trigger. Say "camel case my variable name" to get myVariableName.
🧰 Developer dictionary. An opt‑in built‑in glossary steers transcription toward the correct spelling of common tooling (GitHub, Vercel, TypeScript, Kubernetes, …).
🎬 Offline transcription (CLI). Batch a file headlessly: echo --transcribe-file talk.mp4 -o talk.txt — transcribes any audio/video (via ffmpeg) reusing the same engine and bilingual RU/EN steering as live dictation. Choose engine/language with --model / --language. See BUILD.md.

Privacy & performance

🔒 Truly local. Your audio never leaves the machine. No cloud calls, no telemetry, no account, no word limits — free and open‑source (MIT).
⚡ GPU accelerated. Whisper runs on the GPU (Vulkan — NVIDIA / AMD / Intel, auto‑detected) or Parakeet on CPU/DirectML. Pick the device in Settings.
🫥 Stealth UX. A floating orb shows recording / preparing / transcribing state without ever stealing focus from the app you're typing into.
🪶 Quiet at rest. The model unloads after an idle timeout and the overlay only receives live data while visible, so Echo stays out of the way when not in use.

Choosing a model

Echo ships several engines. For the bilingual Russian use case:

Model	Best for	Speed	Notes
Parakeet V3 (recommended)	RU + Slavic + EN, auto‑detect	⚡⚡ fast (GPU/CPU)	Great all‑rounder; weaker on heavy intra‑sentence code‑switching
Whisper Large v3 Turbo	Heavy RU↔EN code‑switching	⚡ (GPU)	Keeps English in Latin; needs GPU to be snappy
GigaAM v3	Pure Russian, max accuracy	⚡⚡	Russian‑only — no code‑switching or other languages
Whisper Small / Medium / Large	General multilingual	varies	Classic Whisper quality/size trade‑offs

Pick and download models in Settings → Models.

How it works

Press your shortcut (tap to lock, or hold to push‑to‑talk).
Speak. Silero VAD filters silence; the floating pill shows live state.
Release. The selected engine transcribes locally and refines the text.
Paste. Text is typed into the focused app (native Unicode input — works with Cyrillic regardless of keyboard layout), with code‑editor‑aware formatting.

Practical scenarios

Bilingual code review. In Cursor, dictate a comment mixing Russian prose with English identifiers — pull request, main, useEffect stay in Latin, trailing punctuation is stripped for the editor.
Instant translation. Reply to a foreign colleague: Command Mode "переведи на английский …", speak in Russian, the English text lands in the chat box.
Quick meeting notes. Dictate "first decisions, second action items, third owners" and get a formatted numbered list in Notion.
Templated support reply. A snippet calendar link expands to your full booking URL as you speak the trigger phrase.
Fast send. End any dictation with "press enter" to fire the message without touching the keyboard.

Limitations

Transform commands need an LLM provider. Translate / shorten / formal run through a post‑processing provider; configure a local model (Ollama / llama.cpp) or an OpenAI‑compatible endpoint in Settings → Advanced. Plain dictation, snippets, self‑correction, lists, and the developer dictionary are fully offline.
Self‑correction is heuristic. It collapses comma‑delimited restatements ("…, no, …") and is off by default; complex corrections may pass through.
Live subtitles cost compute. The streaming caption re‑transcribes the buffer periodically, so it's best paired with a fast engine (Parakeet / Whisper on GPU); it's off by default.

Quick start

Install

Download the latest build from the releases page.
Install and launch Echo, then grant microphone + accessibility permissions.
Open Settings → Models, download a model (Parakeet V3 to start).
Set your hotkey in Settings → General, and start dictating.

On Windows the app installs to %LOCALAPPDATA%/echo.

Build from source

Any package manager works (npm shown; pnpm / bun are fine):

npm install
npm run tauri dev      # development
npm run tauri build    # production bundle

Full instructions, including the GPU‑Whisper (Vulkan) toolchain on Windows, are in BUILD.md. Contributors: see CONTRIBUTING.md.

GPU acceleration

Whisper can run on the GPU via Vulkan, which auto‑detects any vendor's card (NVIDIA / AMD / Intel) and falls back to CPU when none is present. Choose the accelerator and device in Settings → Advanced → Hardware Acceleration (enable Experimental Features if the section is hidden). Parakeet uses DirectML on Windows.

Architecture

Frontend — React 18 + TypeScript + Tailwind CSS.
Backend — Rust core; a single‑threaded coordinator serialises the record → transcribe → paste pipeline to avoid races.
Key crates — whisper-rs (whisper.cpp, GPU), transcribe-rs (Parakeet / GigaAM / Canary / SenseVoice), vad-rs (Silero VAD), windows-rs (Win32 overlay integration).

CLI

Echo accepts flags, both to control a running instance and to customise startup (all platforms):

# control a running instance
echo --toggle-transcription      # toggle recording
echo --toggle-post-process       # toggle recording with post-processing
echo --cancel                    # cancel current operation

# startup
echo --start-hidden              # start with no window
echo --no-tray                   # start without the tray icon
echo --debug                     # verbose logging
echo --help                      # list all flags

macOS: when installed as a bundle, call the binary directly, e.g. /Applications/Echo.app/Contents/MacOS/Echo --toggle-transcription.

Debug mode toggles with Ctrl+Shift+D (Windows/Linux) or Cmd+Shift+D (macOS).

Platform support

macOS (Intel + Apple Silicon), x64 Windows, x64 Linux (Ubuntu 22.04 / 24.04).

Whisper models: any modern GPU (NVIDIA / AMD / Intel), or CPU.
Parakeet V3: CPU‑friendly (Intel Skylake / equivalent AMD or newer), ~5× real‑time on mid‑range hardware, automatic language detection.

Linux notes

Text injection needs a display‑server tool:

Display server	Tool	Install
X11	`xdotool`	`sudo apt install xdotool`
Wayland	`wtype`	`sudo apt install wtype`
Either	`dotool`	`sudo apt install dotool` (add user to `input` group)

Without one, Echo falls back to enigo, which has limited Wayland support.

Echo links gtk-layer-shell for the overlay. If startup fails with libgtk-layer-shell.so.0, install the runtime package (libgtk-layer-shell0 on Debian/Ubuntu, gtk-layer-shell on Fedora/Arch). The overlay is off by default on Linux (Overlay Position: None) because some compositors treat it as the active window and break paste‑back.

Wayland global shortcuts must be bound in your DE/WM to the CLI flags, e.g. Sway/i3 bindsym $mod+o exec echo --toggle-transcription, or via Unix signals (pkill -USR2 -n echo to toggle, -USR1 for post‑processing).

If Echo crashes on launch, try HANDY_NO_GTK_LAYER_SHELL=1 echo (skip the layer shell) or WEBKIT_DISABLE_DMABUF_RENDERER=1 echo (WebKit renderer). Make a workaround permanent by prefixing your .desktop Exec= line.

Manual model installation

If you're behind a proxy/firewall, download models by hand into the models folder of your app‑data directory (shown in Settings → About):

macOS ~/Library/Application Support/com.sovern.echo/models
Windows %APPDATA%\com.sovern.echo\models
Linux ~/.config/com.sovern.echo/models

Whisper (.bin, drop in directly):

https://blob.handy.computer/ggml-small.bin            # Small  (487 MB)
https://blob.handy.computer/whisper-medium-q4_1.bin   # Medium (492 MB)
https://blob.handy.computer/ggml-large-v3-turbo.bin   # Turbo  (1.6 GB)
https://blob.handy.computer/ggml-large-v3-q5_0.bin    # Large  (1.1 GB)

Parakeet (.tar.gz, extract the directory into models/, keep the exact name parakeet-tdt-0.6b-v3-int8):

https://blob.handy.computer/parakeet-v3-int8.tar.gz   # V3 (478 MB)

Restart Echo; the models appear in Settings → Models as Downloaded. Custom Whisper GGML .bin files dropped into models/ are auto‑discovered under Custom Models.

Model files are hosted on the upstream Handy CDN (blob.handy.computer).

Verify release signatures

Release artifacts are signed with Tauri's updater (minisign) key; the public key lives in src-tauri/tauri.conf.json under plugins.updater.pubkey. Decode that key and the artifact's .sig from base64 and verify with minisign (not gpg). Releasing is documented in RELEASING.md.

Links

Releases — download builds
Build from source — incl. the Windows GPU‑Whisper (Vulkan) toolchain
Contributing · Translations
Releasing — signing & publishing
Upstream: cjpais/Handy — the project Echo forks

Contributing

Issues and PRs welcome at github.com/master5d/Echo. Fork, branch, test on your platform, and open a PR with a clear description. See CONTRIBUTING.md.

Credits

Echo is a fork of Handy by cjpais — thank you for the foundation and for hosting the model CDN. Built on the work of:

whisper.cpp / ggml — fast cross‑platform Whisper inference
OpenAI Whisper — the speech‑recognition models
NVIDIA Parakeet / Canary and Sber GigaAM — multilingual & Russian ASR
Silero — lightweight voice‑activity detection
Tauri — the Rust app framework

License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 730 Commits
.cargo		.cargo
.github		.github
.nix		.nix
.vscode		.vscode
docs		docs
nix		nix
scripts		scripts
src-tauri		src-tauri
src		src
tests		tests
.gitignore		.gitignore
.prettierignore		.prettierignore
.prettierrc		.prettierrc
AGENTS.md		AGENTS.md
BUILD.md		BUILD.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
CONTRIBUTING_TRANSLATIONS.md		CONTRIBUTING_TRANSLATIONS.md
CRUSH.md		CRUSH.md
GEMINI.md		GEMINI.md
LICENSE		LICENSE
README.md		README.md
RELEASING.md		RELEASING.md
bun.lock		bun.lock
eslint.config.js		eslint.config.js
flake.lock		flake.lock
flake.nix		flake.nix
index.html		index.html
package.json		package.json
playwright.config.ts		playwright.config.ts
tailwind.config.js		tailwind.config.js
tsconfig.json		tsconfig.json
tsconfig.node.json		tsconfig.node.json
vite.config.ts		vite.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Echo

What it is

Key capabilities

Smart dictation

Command Mode

Voice capture to folder

Personalization

Speech coach

Developer features

Privacy & performance

Choosing a model

How it works

Practical scenarios

Limitations

Quick start

Install

Build from source

GPU acceleration

Architecture

CLI

Platform support

Linux notes

Manual model installation

Verify release signatures

Links

Contributing

Credits

License

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Echo

What it is

Key capabilities

Smart dictation

Command Mode

Voice capture to folder

Personalization

Speech coach

Developer features

Privacy & performance

Choosing a model

How it works

Practical scenarios

Limitations

Quick start

Install

Build from source

GPU acceleration

Architecture

CLI

Platform support

Linux notes

Manual model installation

Verify release signatures

Links

Contributing

Credits

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages