vent

Voice-to-text overlay for Wayland. Click, speak, done.

A small pill sits at the bottom of your screen. Click to record, click again to transcribe. The text gets typed into whatever window has focus. No GUI, no settings panels, no cloud. Runs locally on CPU with faster-whisper.

Requirements

Linux with a Wayland compositor (Hyprland, Sway, etc.)
Python >= 3.11

Install

curl -fsSL https://raw.githubusercontent.com/synthforged/vent/main/install.sh | bash

Or from a local clone:

git clone https://github.com/synthforged/vent.git && cd vent
./install.sh

The installer detects your distro and handles system dependencies:

Distro	Package manager
Arch / CachyOS / Manjaro	pacman
Debian / Ubuntu	apt
Fedora	dnf

It installs a venv at ~/.local/share/vent/ and a vent wrapper script in ~/.local/bin/.

Manual install

# Arch
sudo pacman -S gtk4 gtk4-layer-shell gobject-introspection cairo portaudio wtype wl-clipboard

# Debian/Ubuntu
sudo apt install build-essential python3-dev python3-venv pkg-config \
    libgirepository1.0-dev libcairo2-dev libgtk-4-dev gir1.2-gtk-4.0 \
    libportaudio2 portaudio19-dev wl-clipboard libgtk4-layer-shell-dev wtype

# Fedora
sudo dnf install gtk4-devel gobject-introspection-devel cairo-devel \
    python3-devel gcc pkg-config portaudio-devel wl-clipboard \
    gtk4-layer-shell-devel wtype

# Then, from the repository root
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
vent

Usage

Run vent. A small pill appears at the bottom of your screen.

Action	Effect
Left-click (idle)	Start recording — red waveform bars
Left-click (recording/paused)	Stop and transcribe — green pulsing dots
Right-click (recording)	Pause — pill turns amber
Right-click (paused)	Resume recording
`q`	Quit (click pill first to focus)
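
The click behaviour in the table can be read as a small state machine. The sketch below is illustrative only and is not taken from vent's source: the `State` enum, the `TRANSITIONS` table, and the `next_state` helper are hypothetical names, and the `TRANSCRIBING` state is an assumption inferred from the "green pulsing dots" phase.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()
    PAUSED = auto()
    TRANSCRIBING = auto()  # assumed state for the "green pulsing dots" phase


# (current state, mouse button) -> next state, mirroring the table above.
TRANSITIONS = {
    (State.IDLE, "left"): State.RECORDING,
    (State.RECORDING, "left"): State.TRANSCRIBING,
    (State.PAUSED, "left"): State.TRANSCRIBING,
    (State.RECORDING, "right"): State.PAUSED,
    (State.PAUSED, "right"): State.RECORDING,
}


def next_state(state: State, button: str) -> State:
    """Return the state after a click; clicks the table does not list are ignored."""
    return TRANSITIONS.get((state, button), state)
```

A table-driven lookup keeps every legal transition in one place, so a click that the table does not list (for example a right-click while idle) simply leaves the state unchanged.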

Transcribed text is copied to the clipboard via wl-copy and typed into the focused window via wtype. Audio from all pause/resume cycles is merged into a single transcription.
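
As a rough illustration of the output step, the sketch below pipes the text to wl-copy on stdin and then passes it to wtype as a single argument. It is not vent's actual code: the `deliver` function and its `run` parameter are hypothetical, and stripping surrounding whitespace and skipping empty results are assumptions made for the example.

```python
import subprocess
from typing import Callable


def deliver(text: str, run: Callable[..., object] = subprocess.run) -> list[list[str]]:
    """Copy text to the clipboard, then type it into the focused window.

    Returns the commands that were executed. The `run` parameter exists so
    the function can be exercised without a Wayland session.
    """
    text = text.strip()
    if not text:
        return []  # assumption: nothing is copied or typed for an empty result

    copy_cmd = ["wl-copy"]      # wl-copy reads the clipboard content from stdin
    type_cmd = ["wtype", text]  # wtype types its argument into the focused window

    run(copy_cmd, input=text.encode("utf-8"), check=True)
    run(type_cmd, check=True)
    return [copy_cmd, type_cmd]
```

Passing the commands as argument lists avoids shell quoting problems with punctuation in the transcript. Text that begins with a hyphen may need extra care, since wtype could interpret it as an option.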

Speech recognition

Uses faster-whisper (CTranslate2 reimplementation of OpenAI Whisper). Runs on CPU, no GPU required.

Setting	Value
Model	`small` (~244M params, ~500MB download)
Quantization	`int8`
Audio	16kHz mono float32
Language	Auto-detected (99 languages)

The model loads lazily on first transcription (2-5s). Subsequent calls reuse the loaded model and skip this delay.
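
Lazy loading of this kind can be sketched as follows. This is a minimal example and not the contents of transcriber.py: `LazyModel` and `default_factory` are hypothetical names, and the `WhisperModel` arguments simply mirror the settings table above.

```python
from typing import Any, Callable, Optional


def default_factory() -> Any:
    # Imported here so that importing this module stays cheap; the
    # expensive work happens only when the model is first requested.
    from faster_whisper import WhisperModel

    # Settings taken from the table above: small model, CPU, int8.
    return WhisperModel("small", device="cpu", compute_type="int8")


class LazyModel:
    """Create the model on first use and reuse it afterwards."""

    def __init__(self, factory: Callable[[], Any] = default_factory) -> None:
        self._factory = factory
        self._model: Optional[Any] = None

    def get(self) -> Any:
        if self._model is None:
            self._model = self._factory()  # slow path, taken only once
        return self._model
```

With this shape, only the first `get()` call pays the load cost. If `get()` could be called from several threads at once, the check would need a lock, which is left out here for brevity.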

English has the best accuracy. For other languages, swap to medium or large-v3 in transcriber.py, at the cost of more memory and slower transcription.

Development

See CONTRIBUTING.md.

ruff check .          # lint
ruff format --check . # formatting

CI runs on every PR. Ruff is the only gatekeeper.

Uninstall

rm -rf ~/.local/share/vent ~/.local/bin/vent

Security

See SECURITY.md. No telemetry. No network requests (except Whisper model download on first run).

License

MIT
