```
██    ██  █████  ██████
 ██  ██  ██   ██ ██   ██
  ████   ███████ ██████
   ██    ██   ██ ██
   ██    ██   ██ ██

      local-first speech I/O v1
─────────────────────────────────
 mic ───► whisper ───► transcript
                           │
                       export to
                   gitlab | github
                   sftp | webhook
                           │
 speaker ◄─── piper ◄──── text
─────────────────────────────────
 all processing on your hardware.
 no data leaves your server.
```
Local-first speech I/O stack — privacy-preserving transcription, synthesis, and export hooks into structured workflows.
Tabs: ASR | TTS | Export | Data | Settings
Documentation:
- 📖 User Guide - Complete interface and workflow documentation
- 📊 Data & Metrics - Usage tracking and analytics
- 📱 Mobile & Tablet - Touch-optimized interface guide
- 📤 Export Setup - GitLab, GitHub, SFTP, and webhook configuration
- 🧪 Testing - Automated test suite documentation
Quick Links: Quick Start | Troubleshooting | Security
Yap provides a unified web application combining ASR (speech-to-text) and TTS (text-to-speech) in a single tabbed interface served from one domain.
| Feature | Description | Backend |
|---|---|---|
| ASR | Record audio and transcribe to text | OpenAI Whisper |
| TTS | Convert text to natural speech | Piper TTS |
The application runs as Docker containers with a terminal-style dark UI, designed for private LAN use.
Yap uses a single-domain architecture where:
- UI is served at `https://APP_DOMAIN/`
- ASR API is routed at `https://APP_DOMAIN/asr/*`
- TTS API is routed at `https://APP_DOMAIN/tts/*`
This is achieved via Caddy labels (production) or nginx proxy (local mode).
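The routing rules above can be sketched as a tiny shell function. This is illustrative only: the real routing is done by the Caddy labels (production) or nginx config (local mode), and the backend port numbers come from the debug ports listed later in this README.

```shell
#!/bin/sh
# Sketch of the single-domain routing rules (not part of the repo).
# Real routing: Caddy labels (production) or nginx.conf (local mode).
route() {
  case "$1" in
    /asr/*) echo "whisper-asr backend (port 9000)" ;;
    /tts/*) echo "piper-tts backend (port 5000)" ;;
    *)      echo "static UI" ;;
  esac
}

route /asr/transcribe   # -> whisper-asr backend (port 9000)
route /tts/health       # -> piper-tts backend (port 5000)
route /index.html       # -> static UI
```

Because everything shares one origin, the browser never makes cross-origin requests, which is what keeps CORS out of the picture.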
- Browser-based multi-clip audio recording with live waveform visualization
- Whisper-powered transcription with per-clip status tracking
- Single Copy button copies complete transcript
- Configurable transcript formatting (separators, whitespace cleanup)
- Auto-transcribe and auto-copy options
- Export to GitLab, GitHub, SFTP, or webhooks
- Keyboard shortcuts: Space (record/stop), Ctrl+Enter (transcribe), Ctrl+Shift+C (copy)
See the User Guide for detailed ASR documentation.
- Text input or file upload (supports .txt and .md files)
- Multiple voice selection with preference persistence
- Adjustable speaking rate (0.5× to 2.0×)
- Markdown preview with prominent Plain/Markdown toggle
- Read-along mode with dedicated panel and paragraph highlighting
- Audio playback with Media Session API support
- Download generated audio as .wav files
- Keyboard shortcut: Ctrl+Enter to synthesize
See the User Guide for detailed TTS documentation.
- Export transcripts to GitLab or GitHub repositories (commit files directly)
- Upload transcripts via SFTP
- Generic webhooks - POST to any HTTP endpoint (n8n, Zapier, custom servers)
- GitLab via Webhook - Commit via proxy server (recommended, avoids CORS)
- Save export profiles for quick access and one-tap export
See the Export Guide for complete setup instructions.
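A generic webhook target can be exercised by hand before wiring it into Yap. The payload below is a hypothetical example, not Yap's actual export schema (see the Export Guide for that), and the `payload` helper is purely illustrative:

```shell
#!/bin/sh
# Build a hypothetical transcript payload.
# NOTE: field names here are examples only -- Yap's real export
# schema is documented in the Export Guide.
payload() {
  printf '{"title":"%s","transcript":"%s","source":"yap-asr"}' "$1" "$2"
}

PAYLOAD=$(payload "meeting-notes" "Hello world")
echo "$PAYLOAD"

# Send it to any HTTP endpoint (uncomment and substitute your own URL):
# curl -X POST -H "Content-Type: application/json" \
#   -d "$PAYLOAD" https://example.com/webhook
```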
- Always visible in main navigation for easy access
- Local-only metrics - Track ASR and TTS usage, data never leaves your server
- Summary cards: minutes recorded, transcribed, and TTS generated
- Event history table with filtering, pagination, and timestamps
- Export history as JSON for analysis
- Clear history functionality with confirmation
- Enabled by default - Set `METRICS_ENABLED=false` to disable
- When disabled, shows one-click enable button
See the Data & Metrics Guide for complete documentation.
Note: The Apps ecosystem is disabled by default. Enable it by setting `enableApps: true` in `app/ui/config.js`.
When enabled:
- Non-modal draggable/resizable app windows
- Built-in Apps:
- Send (Webhook): Send transcript or conversation data to webhooks
- External Apps: Load additional apps from a manifest URL
- See Apps Documentation for details
The TTS tab features a prominent Plain/Markdown toggle for easy switching between text and rendered markdown views.
The Settings panel provides access to all ASR behavior and formatting options.
The TTS Read-Along feature opens a dedicated panel that highlights paragraphs as they play.
Yap supports two run modes:
- Production mode (recommended): Uses Caddy reverse proxy with automatic HTTPS
- Local mode: Direct port access for testing without Caddy
- Docker with Compose V2
- NVIDIA GPU with CUDA drivers (for ASR)
- For production mode: caddy-docker-proxy
```
git clone https://github.com/itscooleric/quick-yap.git
cd quick-yap
```

```
cp app/.env.example app/.env
# Edit app/.env with your settings
```

Key variables:

```
# Single domain for unified UI
APP_DOMAIN=app.localhost

# Network
CADDY_NETWORK=caddy

# Model paths
WHISPER_MODELS_PATH=/srv/whisper-asr/models
PIPER_MODELS_PATH=/srv/piper/models

# ASR settings
ASR_ENGINE=faster_whisper
ASR_MODEL=tiny.en
```

```
sudo mkdir -p /srv/whisper-asr/models
sudo mkdir -p /srv/piper/models
```

The TTS service requires voice models to work. Download at least one:
```
cd /srv/piper/models

# Recommended: British English Cori (high quality)
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_GB/cori/high/en_GB-cori-high.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_GB/cori/high/en_GB-cori-high.onnx.json

# Set permissions
sudo chmod 644 *.onnx *.json
```

Or use the Makefile helper:

```
make tts-model-cori  # Shows download commands
```

Note: TTS will start even without models, but will show a clear warning message. See Troubleshooting for details.
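Voices in the `rhasspy/piper-voices` repository follow the same URL pattern as the Cori example above, so the pair of download links can be derived from the language family, locale, voice name, and quality. The `piper_voice_urls` helper below is a sketch, not part of the repo:

```shell
#!/bin/sh
# Derive both required download URLs for a Piper voice, following the
# pattern of the Cori example above. Helper is illustrative only.
# Args: language family, locale, voice name, quality
piper_voice_urls() {
  base="https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0"
  file="$2-$3-$4"
  echo "$base/$1/$2/$3/$4/$file.onnx"
  echo "$base/$1/$2/$3/$4/$file.onnx.json"
}

# Both files are required for a voice to work; fetch them together:
piper_voice_urls en en_GB cori high
# piper_voice_urls en en_GB cori high | xargs -n1 wget
```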
Production Mode (with Caddy):

```
# Create Caddy network (if not exists)
docker network create caddy

# Start the unified app
cd app && docker compose up -d
```

Or use the Makefile:

```
make app-up
```

Access at: `https://app.localhost` (or your configured `APP_DOMAIN`)

Local Mode (without Caddy):

```
cd app
docker compose -f docker-compose.yml -f docker-compose.local.yml up -d
```

Or use the Makefile:

```
make app-local
```

Access at: `http://localhost:8080`
Direct backend ports (for debugging):

- ASR API: `http://localhost:9000`
- TTS API: `http://localhost:5000`
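A quick reachability check against those debug ports can be scripted with curl. This is a sketch (the `check` helper is not part of the repo); the `/health` path on the TTS port is the endpoint used elsewhere in this README, and the ASR port is probed at its root:

```shell
#!/bin/sh
# Smoke-test the backend debug ports. Exits cleanly either way;
# curl is the only dependency. The `check` helper is illustrative.
check() {
  if curl -sf --max-time 2 "$2" >/dev/null 2>&1; then
    echo "$1: reachable"
  else
    echo "$1: not reachable"
  fi
}

check "ASR API" http://localhost:9000
check "TTS API" http://localhost:5000/health
```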
The original separate ASR and TTS deployments are still available in asr/ and tts/ folders. See those folders for standalone deployment instructions.
| Variable | Description | Default |
|---|---|---|
| `APP_DOMAIN` | Single domain for unified UI | `app.localhost` |
| `CADDY_NETWORK` | Docker network for Caddy | `caddy` |
| `WHISPER_MODELS_PATH` | Host path for Whisper models | `/srv/whisper-asr/models` |
| `PIPER_MODELS_PATH` | Host path for Piper voices | `/srv/piper/models` |
| `ASR_MODEL` | Whisper model size | `tiny.en` |
| `ASR_ENGINE` | ASR engine | `faster_whisper` |
| `ASR_DEVICE` | Compute device | `cuda` |
For legacy separate deployments:

| Variable | Description | Default |
|---|---|---|
| `ASR_DOMAIN` | ASR domain (legacy mode) | `asr.localhost` |
| `TTS_DOMAIN` | TTS domain (legacy mode) | `tts.localhost` |
| Model | Parameters | VRAM | Speed | Quality |
|---|---|---|---|---|
| tiny | 39M | ~1GB | Fastest | Low |
| base | 74M | ~1GB | Fast | Good |
| small | 244M | ~2GB | Moderate | Better |
| medium | 769M | ~5GB | Slow | High |
| large-v2/v3 | 1550M | ~10GB | Slowest | Highest |
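As a rough rule of thumb, the VRAM column above maps to a simple picker. The `pick_whisper_model` helper is a sketch (not part of the repo), and its thresholds just mirror the table; `large-v3` stands in for the `large-v2/v3` row:

```shell
#!/bin/sh
# Pick the largest Whisper model that fits, per the VRAM column above.
# Illustrative helper only. Arg: available VRAM in GB (integer).
pick_whisper_model() {
  if   [ "$1" -ge 10 ]; then echo "large-v3"
  elif [ "$1" -ge 5  ]; then echo "medium"
  elif [ "$1" -ge 2  ]; then echo "small"
  else                       echo "base"
  fi
}

pick_whisper_model 4    # small
# Apply it via .env: ASR_MODEL=$(pick_whisper_model 4)
```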
```
yap/
├── app/                          # Unified Application (recommended)
│   ├── docker-compose.yml        # Production config with Caddy labels
│   ├── docker-compose.local.yml  # Local development override
│   ├── .env.example              # Environment template
│   ├── README.md                 # Unified app documentation
│   └── ui/                       # Static web UI
│       ├── index.html            # Main HTML with tabs
│       ├── favicon.svg           # Yak logo
│       ├── config.js             # Optional config
│       ├── nginx.conf            # Nginx config for local mode
│       ├── css/styles.css        # Shared styles
│       └── js/                   # ES modules
│           ├── main.js           # Tab router + bootstrap
│           ├── asr.js            # ASR tab logic
│           ├── tts.js            # TTS tab logic
│           ├── addons.js         # Apps window manager
│           └── util.js           # Utility functions
├── asr/                          # Legacy Speech-to-Text (standalone)
│   ├── docker-compose.yml
│   ├── docker-compose.local.yml
│   ├── .env.example
│   ├── README.md
│   └── ui/
├── tts/                          # Text-to-Speech Backend + Legacy UI
│   ├── docker-compose.yml
│   ├── docker-compose.local.yml
│   ├── Dockerfile
│   ├── app.py
│   ├── .env.example
│   ├── README.md
│   └── ui/
├── exporter/                     # Export Service (GitLab/GitHub/SFTP)
│   ├── docker-compose.yml
│   ├── Dockerfile
│   ├── app.py
│   ├── .env.example
│   ├── README.md
│   └── requirements.txt
├── add-ons/                      # Apps documentation and examples
│   ├── README.md                 # Apps guide
│   └── ollama-summarize/         # Ollama integration docs
├── tests/                        # Automated tests
│   ├── test_tts.py               # TTS endpoint tests
│   └── requirements.txt          # Test dependencies
├── docs/
│   └── images/
├── .env.example                  # Root configuration template
├── .gitignore
├── LICENSE
├── Makefile                      # Helper commands
└── README.md
```
The Export feature allows you to commit transcripts to GitLab/GitHub or upload via SFTP.
Quick Start:
```
cd exporter
cp .env.example .env
# Edit .env with your GitLab/GitHub tokens or SFTP credentials
docker compose up -d
```

Token Requirements:

- GitLab: Personal access token with `api` scope
- GitHub: Personal access token with `repo` scope or fine-grained token with "Contents: Read and write"
Security Warning: The exporter service stores API tokens. Run on localhost only (default) and do not expose to public internet without authentication.
For complete export setup, configuration, and troubleshooting, see the Export Guide.
- Run on `localhost` only (default)
- Tokens are stored server-side, not in the browser
- See `exporter/README.md` for full documentation
Note: The Apps ecosystem is disabled by default since v1.1. To enable it, set `enableApps: true` in `app/ui/config.js`.
Yap includes an extensible apps system for additional functionality. See the apps documentation for details on:
- Using built-in apps (Ollama Summarize, Send/Webhook)
- Loading external apps from a manifest URL
- Creating custom apps
- Apps API reference
Edit `app/ui/config.js`:

```js
window.__YAP_CONFIG = {
  // Enable Apps ecosystem
  enableApps: true,

  // External apps manifest URL (optional)
  appsManifestUrl: 'https://example.com/yap-apps/manifest.json',

  // Allowed origins for iframe apps (REQUIRED for external apps)
  appsAllowedOrigins: ['https://apps.example.com'],

  // Ollama configuration
  ollamaUrl: 'http://localhost:11434',
  ollamaModel: 'llama3'
};
```

External apps are loaded from a JSON manifest:
```json
{
  "version": 1,
  "apps": [
    {
      "id": "my-app",
      "name": "My App",
      "description": "Description of my app",
      "type": "iframe",
      "entryUrl": "https://apps.example.com/my-app/index.html"
    }
  ]
}
```

Automated tests are available in the `tests/` directory:
```
# Install test dependencies
pip install -r tests/requirements.txt

# Run TTS tests (requires TTS service running)
pytest tests/test_tts.py -v

# Run unit tests (no services needed)
pytest tests/test_export.py tests/test_settings.py tests/test_read_along.py -v
```

See `tests/README.md` for more details on running tests.
When making changes, verify the following features work:
ASR Tab
- Record audio → clip appears with correct duration
- Transcribe All → clips are transcribed
- Auto-transcribe (Settings) → works when enabled
- Auto-copy (Settings) → copies after transcription
- Copy button → copies displayed transcript
- Download .txt → downloads transcript file
- Export button → opens export panel
- Settings → all toggles work and persist
- Ctrl+Enter → triggers transcribe
- Ctrl+Shift+C → triggers copy
- Clear button → respects confirm setting
TTS Tab
- Enter text → synthesize button enables
- Plain/Markdown toggle → switches between views
- Synthesize → audio plays
- Play with Read-Along → opens panel, highlights paragraphs
- Read-along panel → Pause/Stop/Close buttons work
- Download → downloads audio file
- Ctrl+Enter → triggers synthesize
Export
- Export panel opens from ASR tab
- New webhook target → can be created and saved
- GitLab via webhook → works with proxy
- Preview payload → shows correct JSON
- Send → exports successfully
- GitLab commit (requires exporter service) → works
- GitHub commit (requires exporter service) → works
Data Tab (Metrics)
- Data tab visible when metrics enabled
- Record/transcribe → events appear in Data tab
- TTS synthesize → events appear in Data tab
- Summary cards → show correct totals
- History table → shows events with pagination
- Clear history → removes all events
- Export JSON → downloads history file
Apps (when enabled)
- Apps button visible when enableApps=true
- Apps button hidden when enableApps=false
- Built-in apps work (Ollama Summarize, Send/Webhook)
Common commands for managing Yap services:
```
make help            # Show all available commands

# Unified App (recommended)
make app-up          # Start unified app (Caddy mode)
make app-local       # Start unified app (local mode)
make app-down        # Stop unified app
make app-logs        # View app logs
make app-restart     # Restart app

# Legacy ASR (standalone)
make asr-up          # Start ASR services
make asr-down        # Stop ASR services
make asr-logs        # View ASR logs
make asr-restart     # Restart ASR

# Legacy TTS (standalone)
make tts-up          # Start TTS services
make tts-down        # Stop TTS services
make tts-logs        # View TTS logs
make tts-restart     # Restart TTS
make tts-health      # Check TTS health endpoint
make tts-voices      # List available voices
make tts-model-cori  # Show commands to download Cori voice
```

Warning: These tools are designed for private LAN use and have no authentication by default.
- Do not expose to the public internet without authentication
- If you must expose publicly, add authentication via Caddy:

  ```
  asr.yourdomain.com {
      basicauth /* {
          user $2a$14$hashedpassword
      }
      # ... rest of config
  }
  ```

- Use HTTPS (automatic with Caddy)
- Consider VPN access for remote use
- ASR: Audio recordings are sent to your server for transcription
- TTS: Text is sent to your server for synthesis
- No data is sent to external services (all processing is local)
No voices available / TTS won't synthesize
The TTS backend will start successfully even without voice models, but it will display a prominent warning in the logs. To fix:
1. Check the TTS logs:

   ```
   make tts-logs  # or cd tts && docker compose logs -f
   ```

2. If you see the "NO VOICES FOUND" warning, download voice models:

   ```
   make tts-model-cori  # Shows commands to download
   ```

3. Verify the models are in the correct directory:

   ```
   ls -la $PIPER_MODELS_PATH  # Should show .onnx and .onnx.json files
   ```

4. Test API endpoints (Caddy mode):

   ```
   # Check health endpoint
   curl -k https://$APP_DOMAIN/tts/health
   # Should return: {"status":"ok","voices_count":1}

   # List available voices
   curl -k https://$APP_DOMAIN/tts/voices
   # Should return: ["en_GB-cori-high"]

   # Test synthesis (POST method)
   curl -k https://$APP_DOMAIN/tts/synthesize/en_GB-cori-high \
     -X POST \
     -H "Content-Type: text/plain" \
     -d "Hello world" \
     --output test.wav
   # Should create test.wav file
   ```

5. Test API endpoints (Local mode):

   ```
   curl http://localhost:8080/tts/health
   curl http://localhost:8080/tts/voices

   # Test synthesis
   curl http://localhost:8080/tts/synthesize/en_GB-cori-high \
     -X POST \
     -H "Content-Type: text/plain" \
     -d "Hello world" \
     --output test.wav

   # Or test directly on TTS service port (bypassing nginx)
   curl http://localhost:5000/health
   curl http://localhost:5000/voices
   ```
TTS synthesis returns 405 Method Not Allowed
If you receive a 405 error when trying to synthesize:
1. Verify the endpoint accepts POST requests:

   ```
   # Should work with POST (recommended)
   curl -X POST http://localhost:5000/synthesize/VOICE_NAME \
     -H "Content-Type: text/plain" \
     -d "Test text"

   # Also works with GET
   curl "http://localhost:5000/synthesize/VOICE_NAME?text=Test+text"
   ```

2. Check NGINX/Caddy logs for routing issues:

   ```
   docker compose logs yap-app-ui
   ```

3. Ensure no redirects are happening (redirects can change POST to GET):
   - Check for trailing slash issues in URLs
   - Verify proxy configuration preserves HTTP method
Voice files present but not detected
Each voice model requires BOTH files to work:

- `.onnx` - The neural network model file
- `.onnx.json` - The model configuration file
If voices are empty or not detected:
- Ensure both `.onnx` AND `.onnx.json` files exist for each voice
- Check file permissions: `sudo chmod 644 *.onnx *.json`
- Verify the `PIPER_MODELS_PATH` environment variable matches your directory
- Restart the TTS service: `make tts-restart`
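Unpaired voice files can be spotted with a short loop. The `check_voice_pairs` helper below is a sketch (not part of the repo); it reports any `.onnx` file whose `.onnx.json` companion is missing:

```shell
#!/bin/sh
# Report .onnx voices missing their required .onnx.json companion.
# Sketch only; helper is not part of the repo.
check_voice_pairs() {
  dir="${1:-/srv/piper/models}"
  [ -d "$dir" ] || { echo "no such directory: $dir"; return 1; }
  for model in "$dir"/*.onnx; do
    [ -e "$model" ] || continue        # no .onnx files at all
    if [ ! -f "$model.json" ]; then
      echo "missing config: $model.json"
    fi
  done
}

check_voice_pairs "${PIPER_MODELS_PATH:-/srv/piper/models}" || true
```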
Synthesis slow
- Use medium quality voices instead of high for faster synthesis
- Longer texts take more time to process
GPU not detected
```
# Verify NVIDIA Container Toolkit
nvidia-smi
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```

If GPU isn't detected, ensure:
- NVIDIA drivers are installed
- NVIDIA Container Toolkit is installed
- Docker has been restarted after toolkit installation
Microphone not working
- Check browser permissions
- Ensure HTTPS or localhost (required for `getUserMedia`)
- Check browser console for errors
Model download slow
- First startup downloads the Whisper model
- Models are cached in `$WHISPER_MODELS_PATH` for subsequent runs
- Larger models (medium, large) take longer to download
Container won't start
```
# Check logs
docker compose logs -f

# Check container status
docker ps -a
```

Network issues (Caddy mode)
```
# Verify Caddy network exists
docker network inspect caddy

# If network doesn't exist, create it
docker network create caddy
```

Environment variables not being used
- Ensure `.env` file exists in the appropriate directory (root, `asr/`, or `tts/`)
- Check for typos in variable names
- Restart containers after changing `.env`: `make tts-restart` or `make asr-restart`
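A pre-flight check for `.env` files can catch missing keys before the containers start. The `check_env` helper is a sketch (not part of the repo); the key list just mirrors the variables documented above:

```shell
#!/bin/sh
# Report keys absent from an .env file. Sketch only; helper is
# not part of the repo. Args: file, then one or more key names.
check_env() {
  file="$1"; shift
  missing=0
  for key in "$@"; do
    if ! grep -q "^$key=" "$file" 2>/dev/null; then
      echo "missing: $key"
      missing=1
    fi
  done
  return $missing
}

check_env app/.env APP_DOMAIN CADDY_NETWORK WHISPER_MODELS_PATH PIPER_MODELS_PATH || true
```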
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
MIT - See LICENSE
YAP - Local speech tools