hookr-dev/YapStack

YapStack

Real-time audio capture & transcription for your desktop. On-device. Open source.


Warning

YapStack is in alpha. Officially supported on macOS (Apple Silicon recommended). Builds on Intel Macs and Windows are experimental — see Platform Support below. Schema, APIs, and on-disk formats may still change between releases; pin a specific version for any serious use, and expect to read the CHANGELOG before upgrading.

Note

Vibe-coded with AI assistance. YapStack is built using AI pair-programming as a first-class part of the workflow, and we plan to keep iterating that way. Contributions are welcome — your prompts, enhancements, modifications, and PRs back into the project. Bring whatever tools you want; we care about correctness, design clarity, and tests, not provenance.

YapStack is a privacy-first desktop app that captures mic and system audio, transcribes it locally using Whisper or Parakeet TDT, and organizes everything into searchable, editable notes. All processing happens on-device — nothing leaves your machine.

Highlights

Never miss a word

Always-on ring buffer (up to 5 min) captures audio before you hit record. Start a session and rewind to grab what you missed. Backfill transcribes retroactively while live transcription continues in parallel.
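The pre-roll idea can be illustrated with a minimal, single-threaded sketch. `PreRollBuffer` and its methods are hypothetical names, not YapStack's API, and the real buffers in yapstack-audio are described as lock-free, which this simplified version is not. It only shows how a fixed-capacity buffer keeps the newest audio and lets a caller rewind.

```rust
/// Fixed-capacity ring buffer of mono f32 samples.
/// Hypothetical illustration type, not part of YapStack.
pub struct PreRollBuffer {
    samples: Vec<f32>,
    write_pos: usize, // index where the next sample will be written
    filled: usize,    // number of valid samples currently held
}

impl PreRollBuffer {
    /// Allocate room for `seconds` of audio at `sample_rate` Hz.
    pub fn new(sample_rate: usize, seconds: usize) -> Self {
        let capacity = sample_rate * seconds;
        assert!(capacity > 0, "capacity must be non-zero");
        Self { samples: vec![0.0; capacity], write_pos: 0, filled: 0 }
    }

    /// Append samples, overwriting the oldest ones once the buffer is full.
    pub fn push(&mut self, input: &[f32]) {
        let cap = self.samples.len();
        for &s in input {
            self.samples[self.write_pos] = s;
            self.write_pos = (self.write_pos + 1) % cap;
            if self.filled < cap {
                self.filled += 1;
            }
        }
    }

    /// Copy out the most recent `n` samples in chronological order
    /// (fewer if the buffer currently holds less than `n`).
    pub fn rewind(&self, n: usize) -> Vec<f32> {
        let cap = self.samples.len();
        let n = n.min(self.filled);
        let start = (self.write_pos + cap - n) % cap;
        (0..n).map(|i| self.samples[(start + i) % cap]).collect()
    }
}
```

Under these assumptions, a 5-minute buffer at 16 kHz mono would be `PreRollBuffer::new(16_000, 300)`, and starting a session with a 30-second rewind would call `rewind(16_000 * 30)`.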

Real-time transcription

Choose between Whisper (Metal-accelerated, broad language support) and Parakeet TDT v3 (NVIDIA, faster on Apple Silicon via WebGPU + int8). Per-source VAD, hallucination filtering, two-tier prompt context for continuity. All on-device.
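To make "per-source VAD" concrete, here is a rough sketch that gives each audio source its own detector instance. The README does not say which VAD algorithm YapStack uses; the RMS energy gate below is a deliberately simple stand-in, and `EnergyVad` and its threshold values are assumptions for illustration only.

```rust
/// Energy-gate voice activity detector (stand-in technique, hypothetical type).
pub struct EnergyVad {
    threshold_rms: f32,
}

impl EnergyVad {
    /// Create a detector that flags frames at or above `threshold_rms`.
    pub fn new(threshold_rms: f32) -> Self {
        Self { threshold_rms }
    }

    /// Return true when the frame's root-mean-square level reaches the threshold.
    pub fn is_speech(&self, frame: &[f32]) -> bool {
        if frame.is_empty() {
            return false;
        }
        let mean_sq = frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32;
        mean_sq.sqrt() >= self.threshold_rms
    }
}
```

Because each source owns its own `EnergyVad`, a quiet microphone and a loud system-audio stream can be gated with different thresholds without affecting each other.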

Speaker diarization

Optional multi-speaker labeling via Sortformer (Parakeet only). Rename Speaker 1 / Speaker 2 to whatever fits the meeting.

Full audio capture

Sessions stream to WAV incrementally. Play back at 6 speeds (0.5×–2×) with seeking. Click any transcript timestamp to jump to that moment.
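Jumping from a transcript timestamp to a position in the recording comes down to simple arithmetic. The sketch below assumes uncompressed PCM with a canonical 44-byte WAV header; the README does not state the actual file layout, and `wav_byte_offset` is a hypothetical helper.

```rust
/// Byte offset of the audio frame at `timestamp_ms` in a PCM WAV file.
/// Assumes a canonical 44-byte RIFF/WAVE header (an assumption, not confirmed
/// by the project docs) and a whole number of bytes per sample.
pub fn wav_byte_offset(
    timestamp_ms: u64,
    sample_rate: u32,
    channels: u16,
    bits_per_sample: u16,
) -> u64 {
    const HEADER_BYTES: u64 = 44;
    // One frame holds one sample for every channel.
    let bytes_per_frame = u64::from(channels) * u64::from(bits_per_sample / 8);
    // Convert milliseconds to a frame index, rounding down.
    let frame = timestamp_ms * u64::from(sample_rate) / 1000;
    HEADER_BYTES + frame * bytes_per_frame
}
```

Rounding down to a whole frame keeps the offset aligned, so playback never starts in the middle of a sample.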

Voice dictation

Global shortcut-driven dictation with deep customization:

  • Multiple slots — Named slots with custom global keybinds.
  • Activation modes — Hold-to-talk or toggle.
  • AI processing — Per-slot system prompts to transform speech (e.g. "Clean & Focus", "Create Spec").
  • Output actions — Paste into active app, copy to clipboard, or create a new note.
  • Status bubble — Floating overlay (listening → transcribing → processing → done).
  • History — Past dictations grouped by day with audio replay.
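The status bubble's progression (listening → transcribing → processing → done) can be modeled as a small state machine. This is a sketch of one possible shape; `BubbleStatus` and `next` are hypothetical names, and the real app may handle skipped stages or errors differently.

```rust
/// Stages shown by the dictation status bubble, in the order listed above.
/// Hypothetical illustration type, not YapStack's actual enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BubbleStatus {
    Listening,
    Transcribing,
    Processing,
    Done,
}

impl BubbleStatus {
    /// Advance to the following stage, or return `None` once finished.
    pub fn next(self) -> Option<BubbleStatus> {
        match self {
            BubbleStatus::Listening => Some(BubbleStatus::Transcribing),
            BubbleStatus::Transcribing => Some(BubbleStatus::Processing),
            BubbleStatus::Processing => Some(BubbleStatus::Done),
            BubbleStatus::Done => None,
        }
    }
}
```

Encoding the stages as an enum lets the compiler reject any transition the `match` does not cover.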

AI session chat

Per-session chat with tool calling: rename, pin, save to notes, tag, organize into folders — each with 10s undo. AI cites transcript segments as clickable timestamp chips.

Rich notes editor

Tiptap split-pane editor alongside the transcript. Version history with restore.

Mic + system audio

Capture mic, system audio, or both. Independent per-source VAD. Stream health monitoring with auto-restart (up to 3 attempts).
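The bounded auto-restart behavior can be sketched as a retry loop that gives up after three attempts. `restart_with_limit` and `MAX_RESTART_ATTEMPTS` are hypothetical names, and the closure stands in for whatever actually reopens a stalled stream; backoff and health checks are omitted.

```rust
/// Upper bound on restart attempts, matching the "up to 3 attempts" above.
pub const MAX_RESTART_ATTEMPTS: u32 = 3;

/// Call `try_start` until it succeeds or the attempt limit is reached.
/// Returns the 1-based attempt number that succeeded, or the last error.
pub fn restart_with_limit<F>(mut try_start: F) -> Result<u32, String>
where
    F: FnMut() -> Result<(), String>,
{
    let mut last_err = String::new();
    for attempt in 1..=MAX_RESTART_ATTEMPTS {
        match try_start() {
            Ok(()) => return Ok(attempt),
            Err(e) => last_err = e,
        }
    }
    Err(format!(
        "gave up after {} attempts: {}",
        MAX_RESTART_ATTEMPTS, last_err
    ))
}
```

Capping the attempts keeps a permanently missing device from causing an endless restart loop.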

Organization & search

Folders with icons and colors, pinning, drag-and-drop sorting, Cmd+K search across sessions, notes, and segments.

Desktop integration

System tray with quick actions. Fully customizable global shortcuts. Recording indicator overlay. Close-to-minimize.

Platform Support

| Platform | Status | Notes |
| --- | --- | --- |
| macOS (Apple Silicon) | ✅ Officially supported | Primary target. Metal acceleration for Whisper, WebGPU + int8 Parakeet. |
| macOS (Intel) | ⚠️ Best-effort | Builds, but reduced performance and limited testing. |
| Windows | 🧪 Experimental | Not officially supported. CI/CD does not produce Windows builds. CUDA support exists in code; you may build and run locally at your own discretion. Official Windows support is planned for a future release. |
| Linux | ❌ Not yet | No current build target. |

Getting Started

Prerequisites

  • Rust ≥ 1.77.2 — install via rustup
  • Node.js ≥ 22
  • pnpm: corepack enable && corepack prepare pnpm@latest-10 --activate
  • cmake — required only if building with the whisper feature flag

Setup

pnpm install

Development

# Full app (Tauri + Vite)
pnpm tauri dev

# Frontend only
pnpm --filter @yapstack/desktop dev

Build

pnpm tauri build

Testing

# Everything
pnpm check          # Rust build + test + fmt + clippy + TS typecheck + ESLint + vitest

# Selective
pnpm test           # Rust + frontend tests
pnpm test:frontend  # Vitest only
pnpm test:rust      # cargo test --all
pnpm lint           # Rust fmt + clippy + ESLint
pnpm typecheck      # TypeScript type checking

Architecture

Tauri v2 app with a Rust backend and React frontend. Five crates handle distinct concerns:

| Crate | Role |
| --- | --- |
| yapstack-common | Shared types, config, audio utilities |
| yapstack-audio | Lock-free ring buffers, mic/system capture, WAV export |
| yapstack-transcription | Model management, sidecar IPC |
| yapstack-transcription-sidecar | Standalone Whisper / Parakeet inference binary |
| yapstack-desktop | Tauri command layer, live transcription controller |

Detailed docs live under docs/.

Tech Stack

  • Tauri v2 — native desktop shell
  • Rust — audio capture, transcription orchestration
  • React 19 + TypeScript — frontend UI
  • SQLite — session, note, and folder persistence
  • Whisper.cpp — on-device speech recognition
  • Parakeet TDT v3 — NVIDIA TDT speech recognition (alternative engine)
  • ONNX Runtime — Parakeet inference + WebGPU/CoreML execution
  • Tiptap — rich text editor

Contributing

PRs welcome. See CONTRIBUTING.md for the full workflow, docs/PRINCIPLES.md for design and testing posture, and docs/DEVELOPMENT.md for build details (including local Windows builds for the curious).

This project follows the Contributor Covenant Code of Conduct. Security issues should be reported per SECURITY.md, not as public issues.

License

YapStack is licensed under the GNU Affero General Public License v3.0. You can use, modify, and redistribute it under those terms; if you run YapStack as a network service, you must make your modifications available under the same license.