Yap


  ██    ██  █████  ██████
   ██  ██  ██   ██ ██   ██
    ████   ███████ ██████
     ██    ██   ██ ██
     ██    ██   ██ ██

  local-first speech I/O        v1
  ─────────────────────────────────

  mic ───► whisper ───► transcript
                            │
                        export to
                     gitlab | github
                      sftp  | webhook
                            │
  speaker ◄─── piper ◄──── text

  ─────────────────────────────────
  all processing on your hardware.
  no data leaves your server.

Local-first speech I/O stack — privacy-preserving transcription, synthesis, and export hooks into structured workflows.

Quick Navigation

Tabs: ASR | TTS | Export | Data | Settings

Documentation:

📖 User Guide - Complete interface and workflow documentation
📊 Data & Metrics - Usage tracking and analytics
📱 Mobile & Tablet - Touch-optimized interface guide
📤 Export Setup - GitLab, GitHub, SFTP, and webhook configuration
🧪 Testing - Automated test suite documentation

Quick Links: Quick Start | Troubleshooting | Security

Overview

Yap provides a unified web application combining ASR (speech-to-text) and TTS (text-to-speech) in a single tabbed interface served from one domain.

Feature	Description	Backend
ASR	Record audio and transcribe to text	OpenAI Whisper
TTS	Convert text to natural speech	Piper TTS

The application runs as Docker containers with a terminal-style dark UI, designed for private LAN use.

Architecture

Yap uses a single-domain architecture where:

UI is served at https://APP_DOMAIN/
ASR API is routed at https://APP_DOMAIN/asr/*
TTS API is routed at https://APP_DOMAIN/tts/*

This is achieved via Caddy labels (production) or nginx proxy (local mode).
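The path-based routing can be pictured as a small prefix table. The Python sketch below is illustrative only, because the real rules live in the Caddy labels and in app/ui/nginx.conf. The upstream names are hypothetical. Stripping the /asr and /tts prefixes before forwarding is an assumption, based on the TTS health check answering at /tts/health through the proxy and at /health on port 5000 directly.

```python
# Hypothetical model of Yap's single-domain routing. The real rules live in the
# Caddy labels (production) and app/ui/nginx.conf (local mode).
# Assumption: the /asr/ and /tts/ prefixes are stripped before forwarding.

ROUTES = [
    ("/asr/", "asr-backend"),  # hypothetical upstream name
    ("/tts/", "tts-backend"),  # hypothetical upstream name
]


def route(path: str) -> tuple[str, str]:
    """Return (upstream, forwarded_path) for an incoming request path."""
    for prefix, upstream in ROUTES:
        if path.startswith(prefix):
            # Drop "/asr" or "/tts" but keep the leading slash of the rest.
            return upstream, path[len(prefix) - 1:]
    # Anything else is served by the static UI.
    return "ui", path


if __name__ == "__main__":
    print(route("/tts/health"))  # ('tts-backend', '/health')
    print(route("/"))            # ('ui', '/')
```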

Features

ASR Tab

Browser-based multi-clip audio recording with live waveform visualization
Whisper-powered transcription with per-clip status tracking
Single Copy button copies complete transcript
Configurable transcript formatting (separators, whitespace cleanup)
Auto-transcribe and auto-copy options
Export to GitLab, GitHub, SFTP, or webhooks
Keyboard shortcuts: Space (record/stop), Ctrl+Enter (transcribe), Ctrl+Shift+C (copy)
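Transcript formatting amounts to joining the per-clip results with a chosen separator and, optionally, normalizing whitespace. The following is a minimal Python sketch built around a hypothetical join_clips helper. The option names in Yap's Settings panel may differ.

```python
import re


def join_clips(clips: list[str], separator: str = "\n\n",
               clean_whitespace: bool = True) -> str:
    """Combine per-clip transcripts into one transcript (hypothetical helper)."""
    parts = []
    for text in clips:
        if clean_whitespace:
            # Collapse runs of spaces, tabs and newlines; trim both ends.
            text = re.sub(r"\s+", " ", text).strip()
        if text:  # drop clips that produced no text
            parts.append(text)
    return separator.join(parts)


if __name__ == "__main__":
    print(join_clips(["  hello   world ", "", "second\tclip"], separator=" | "))
    # hello world | second clip
```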

See the User Guide for detailed ASR documentation.

TTS Tab

Text input or file upload (supports .txt and .md files)
Multiple voice selection with preference persistence
Adjustable speaking rate (0.5× to 2.0×)
Markdown preview with prominent Plain/Markdown toggle
Read-along mode with dedicated panel and paragraph highlighting
Audio playback with Media Session API support
Download generated audio as .wav files
Keyboard shortcut: Ctrl+Enter to synthesize
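Piper controls speed through a length_scale value, where larger values produce slower speech. One plausible way to map the 0.5× to 2.0× slider onto that value is to clamp the rate and take its reciprocal. This mapping is an assumption for illustration and may not match what Yap's backend does.

```python
import math

MIN_RATE, MAX_RATE = 0.5, 2.0  # slider range documented for the TTS tab


def rate_to_length_scale(rate: float) -> float:
    """Convert a speaking rate (1.0 = normal) to a Piper-style length_scale.

    Assumed mapping: length_scale = 1 / rate, after clamping the rate to the
    supported range, so 2.0x speech uses half the phoneme length.
    """
    if math.isnan(rate):
        raise ValueError("rate must be a number")
    clamped = min(max(rate, MIN_RATE), MAX_RATE)
    return round(1.0 / clamped, 3)
```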

See the User Guide for detailed TTS documentation.

Export

Export transcripts to GitLab or GitHub repositories (commit files directly)
Upload transcripts via SFTP
Generic webhooks - POST to any HTTP endpoint (n8n, Zapier, custom servers)
GitLab via Webhook - Commit via proxy server (recommended, avoids CORS)
Save export profiles for quick access and one-tap export
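A generic webhook export is an HTTP POST with a JSON body. The Python sketch below builds such a request using only the standard library. The payload fields (source, transcript) are hypothetical, so check the export panel's Preview payload for the exact shape Yap sends.

```python
import json
import urllib.request


def build_webhook_request(url: str, transcript: str,
                          source: str = "yap-asr") -> urllib.request.Request:
    """Build (but do not send) a JSON POST to a webhook endpoint.

    The payload keys are hypothetical placeholders, not Yap's documented schema.
    """
    body = json.dumps({"source": source, "transcript": transcript}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_webhook_request("https://n8n.example.com/webhook/yap", "Hello")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```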

See the Export Guide for complete setup instructions.

Data Tab (Metrics)

Always visible in main navigation for easy access
Local-only metrics - Track ASR and TTS usage, data never leaves your server
Summary cards: minutes recorded, transcribed, and TTS generated
Event history table with filtering, pagination, and timestamps
Export history as JSON for analysis
Clear history functionality with confirmation
Enabled by default - Set METRICS_ENABLED=false to disable
When disabled, shows one-click enable button
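The summary cards are totals computed over the local event history. The aggregation could look like the sketch below, which assumes a hypothetical event record containing a type and a duration in seconds. Yap's actual schema is visible in its JSON export.

```python
import json
from collections import defaultdict


def summarize(events: list[dict]) -> dict[str, float]:
    """Total minutes per event type (hypothetical event schema)."""
    seconds: dict[str, float] = defaultdict(float)
    for event in events:
        seconds[event["type"]] += float(event.get("duration_s", 0))
    return {kind: round(total / 60, 2) for kind, total in seconds.items()}


if __name__ == "__main__":
    history = json.loads(
        '[{"type": "record", "duration_s": 90},'
        ' {"type": "record", "duration_s": 30},'
        ' {"type": "tts", "duration_s": 60}]'
    )
    print(summarize(history))  # {'record': 2.0, 'tts': 1.0}
```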

See the Data & Metrics Guide for complete documentation.

Apps (Optional)

Note: The Apps ecosystem is disabled by default. Enable it by setting enableApps: true in app/ui/config.js.

When enabled:

Non-modal draggable/resizable app windows
Built-in Apps:
- Send (Webhook): Send transcript or conversation data to webhooks
External Apps: Load additional apps from a manifest URL
See Apps Documentation for details

Screenshots

TTS View with Markdown Toggle

The TTS tab features a prominent Plain/Markdown toggle for easy switching between text and rendered markdown views.

Settings Panel

The Settings panel provides access to all ASR behavior and formatting options.

Read-Along Mode

The TTS Read-Along feature opens a dedicated panel that highlights paragraphs as they play.

Quick Start

Yap supports two run modes:

Production mode (recommended): Uses Caddy reverse proxy with automatic HTTPS
Local mode: Direct port access for testing without Caddy

Prerequisites

Docker with Compose V2
NVIDIA GPU with CUDA drivers (for ASR)
For production mode: caddy-docker-proxy

Setup

1. Clone the Repository

git clone https://github.com/itscooleric/quick-yap.git
cd quick-yap

2. Configure Environment

cp app/.env.example app/.env
# Edit app/.env with your settings

Key variables:

# Single domain for unified UI
APP_DOMAIN=app.localhost

# Network
CADDY_NETWORK=caddy

# Model paths
WHISPER_MODELS_PATH=/srv/whisper-asr/models
PIPER_MODELS_PATH=/srv/piper/models

# ASR settings
ASR_ENGINE=faster_whisper
ASR_MODEL=tiny.en

3. Create Model Directories

sudo mkdir -p /srv/whisper-asr/models
sudo mkdir -p /srv/piper/models

4. Download TTS Voice Models

The TTS service requires voice models to work. Download at least one:

cd /srv/piper/models

# Recommended: British English Cori (high quality)
# sudo is needed because /srv/piper/models was created with sudo in step 3
sudo wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_GB/cori/high/en_GB-cori-high.onnx
sudo wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_GB/cori/high/en_GB-cori-high.onnx.json

# Set permissions
sudo chmod 644 *.onnx *.json

Or use the Makefile helper:

make tts-model-cori  # Shows download commands

Note: TTS will start even without models, but will show a clear warning message. See Troubleshooting for details.
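Both wget commands in this step follow the rhasspy/piper-voices layout <family>/<lang_code>/<name>/<quality>/<voice>.onnx. A small Python helper can derive the pair of URLs for other voices. This assumes those voices follow the same <lang_code>-<name>-<quality> naming as en_GB-cori-high. The helper name is hypothetical.

```python
BASE = "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0"


def piper_voice_urls(voice: str) -> tuple[str, str]:
    """Return the (.onnx, .onnx.json) download URLs for a Piper voice.

    Assumes names shaped like "en_GB-cori-high": language code, voice name
    and quality separated by hyphens.
    """
    lang_code, rest = voice.split("-", 1)
    name, quality = rest.rsplit("-", 1)
    family = lang_code.split("_")[0]  # "en_GB" -> "en"
    model = f"{BASE}/{family}/{lang_code}/{name}/{quality}/{voice}.onnx"
    return model, model + ".json"


if __name__ == "__main__":
    for url in piper_voice_urls("en_GB-cori-high"):
        print(f"wget {url}")
```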

5. Start Services

Production Mode (with Caddy):

# Create Caddy network (if not exists)
docker network create caddy

# Start the unified app
cd app && docker compose up -d

Or use the Makefile:

make app-up

Access at: https://app.localhost (or your configured APP_DOMAIN)

Local Mode (without Caddy):

cd app
docker compose -f docker-compose.yml -f docker-compose.local.yml up -d

Or use the Makefile:

make app-local

Access at: http://localhost:8080

Direct backend ports (for debugging):

ASR API: http://localhost:9000
TTS API: http://localhost:5000

Legacy Separate Deployments

The original separate ASR and TTS deployments are still available in asr/ and tts/ folders. See those folders for standalone deployment instructions.

Configuration

Environment Variables

Variable	Description	Default
`APP_DOMAIN`	Single domain for unified UI	`app.localhost`
`CADDY_NETWORK`	Docker network for Caddy	`caddy`
`WHISPER_MODELS_PATH`	Host path for Whisper models	`/srv/whisper-asr/models`
`PIPER_MODELS_PATH`	Host path for Piper voices	`/srv/piper/models`
`ASR_MODEL`	Whisper model size	`tiny.en`
`ASR_ENGINE`	ASR engine	`faster_whisper`
`ASR_DEVICE`	Compute device	`cuda`

Whisper Model Sizes

Model	Parameters	VRAM	Speed	Quality
tiny	39M	~1GB	Fastest	Low
base	74M	~1GB	Fast	Good
small	244M	~2GB	Moderate	Better
medium	769M	~5GB	Slow	High
large-v2/v3	1550M	~10GB	Slowest	Highest
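To turn the table into a value for ASR_MODEL, pick the largest model whose approximate VRAM requirement fits the GPU. The Python sketch below does this. The VRAM figures are the approximations from the table, and the 0.5 GB headroom is an arbitrary assumption. English-only variants such as the default tiny.en exist for tiny through medium.

```python
# (model, approximate VRAM in GB), smallest first, per the table above.
WHISPER_VRAM_GB = [
    ("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large-v3", 10),
]


def largest_model_for(vram_gb: float, headroom_gb: float = 0.5) -> str:
    """Return the largest model whose VRAM need plus headroom fits vram_gb."""
    best = None
    for name, need in WHISPER_VRAM_GB:
        if need + headroom_gb <= vram_gb:
            best = name
    if best is None:
        raise ValueError(f"{vram_gb} GB is too little for even the tiny model")
    return best


if __name__ == "__main__":
    print(largest_model_for(8))  # medium
```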

Repository Structure

yap/
├── app/                       # Unified Application (recommended)
│   ├── docker-compose.yml     # Production config with Caddy labels
│   ├── docker-compose.local.yml # Local development override
│   ├── .env.example           # Environment template
│   ├── README.md              # Unified app documentation
│   └── ui/                    # Static web UI
│       ├── index.html         # Main HTML with tabs
│       ├── favicon.svg        # Yak logo
│       ├── config.js          # Optional config
│       ├── nginx.conf         # Nginx config for local mode
│       ├── css/styles.css     # Shared styles
│       └── js/                # ES modules
│           ├── main.js        # Tab router + bootstrap
│           ├── asr.js         # ASR tab logic
│           ├── tts.js         # TTS tab logic
│           ├── addons.js      # Apps window manager
│           └── util.js        # Utility functions
├── asr/                       # Legacy Speech-to-Text (standalone)
│   ├── docker-compose.yml
│   ├── docker-compose.local.yml
│   ├── .env.example
│   ├── README.md
│   └── ui/
├── tts/                       # Text-to-Speech Backend + Legacy UI
│   ├── docker-compose.yml
│   ├── docker-compose.local.yml
│   ├── Dockerfile
│   ├── app.py
│   ├── .env.example
│   ├── README.md
│   └── ui/
├── exporter/                  # Export Service (GitLab/GitHub/SFTP)
│   ├── docker-compose.yml
│   ├── Dockerfile
│   ├── app.py
│   ├── .env.example
│   ├── README.md
│   └── requirements.txt
├── add-ons/                   # Apps documentation and examples
│   ├── README.md              # Apps guide
│   └── ollama-summarize/      # Ollama integration docs
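├── tests/                     # Automated tests
│   ├── README.md              # How to run the tests
│   ├── test_tts.py            # TTS endpoint tests (service must be running)
│   ├── test_export.py         # Unit tests
│   ├── test_settings.py       # Unit tests
│   ├── test_read_along.py     # Unit tests
│   └── requirements.txt       # Test dependencies
├── services/
│   └── yap-metrics/
├── docs/
│   └── images/
├── .env.example               # Root configuration template
├── .gitignore
├── LICENSE
├── Makefile                   # Helper commands
├── pyproject.toml
└── README.md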

Export Configuration

The Export feature allows you to commit transcripts to GitLab/GitHub or upload via SFTP.

Quick Start:

cd exporter
cp .env.example .env
# Edit .env with your GitLab/GitHub tokens or SFTP credentials
docker compose up -d

Token Requirements:

GitLab: Personal access token with api scope
GitHub: Personal access token with repo scope or fine-grained token with "Contents: Read and write"

Security Warning: The exporter service stores API tokens server-side, not in the browser. Run it on localhost only (the default) and do not expose it to the public internet without authentication.

For complete export setup, configuration, and troubleshooting, see the Export Guide and exporter/README.md.

Apps (Optional)

Note: The Apps ecosystem is disabled by default since v1.1. To enable it, set enableApps: true in app/ui/config.js.

Yap includes an extensible apps system for additional functionality. See the apps documentation for details on:

Using built-in apps (Ollama Summarize, Send/Webhook)
Loading external apps from a manifest URL
Creating custom apps
Apps API reference

Enabling Apps

Edit app/ui/config.js:

window.__YAP_CONFIG = {
  // Enable Apps ecosystem
  enableApps: true,
  
  // External apps manifest URL (optional)
  appsManifestUrl: 'https://example.com/yap-apps/manifest.json',
  
  // Allowed origins for iframe apps (REQUIRED for external apps)
  appsAllowedOrigins: ['https://apps.example.com'],
  
  // Ollama configuration
  ollamaUrl: 'http://localhost:11434',
  ollamaModel: 'llama3'
};

External Apps Manifest

External apps are loaded from a JSON manifest:

{
  "version": 1,
  "apps": [
    {
      "id": "my-app",
      "name": "My App",
      "description": "Description of my app",
      "type": "iframe",
      "entryUrl": "https://apps.example.com/my-app/index.html"
    }
  ]
}
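Because appsAllowedOrigins is required for external iframe apps, it can help to check a manifest against that list before publishing it. The Python sketch below performs such a check by comparing the origin of each entryUrl with the allowed list. It mirrors the documented intent only and is not Yap's actual loader code. The function name is hypothetical.

```python
import json
from urllib.parse import urlsplit


def allowed_apps(manifest_text: str, allowed_origins: list[str]) -> list[dict]:
    """Return the iframe apps whose entryUrl origin is in allowed_origins."""
    manifest = json.loads(manifest_text)
    if manifest.get("version") != 1:
        raise ValueError("unsupported manifest version")
    allowed = {origin.rstrip("/") for origin in allowed_origins}
    accepted = []
    for app in manifest.get("apps", []):
        parts = urlsplit(app.get("entryUrl", ""))
        # Compare scheme://netloc only; path and query are ignored.
        origin = f"{parts.scheme}://{parts.netloc}"
        if app.get("type") == "iframe" and origin in allowed:
            accepted.append(app)
    return accepted
```

A look-alike host such as https://apps.example.com.evil.net yields a different origin string, so it is rejected even though it starts with the allowed name.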

Testing

Automated tests are available in the tests/ directory:

# Install test dependencies
pip install -r tests/requirements.txt

# Run TTS tests (requires TTS service running)
pytest tests/test_tts.py -v

# Run unit tests (no services needed)
pytest tests/test_export.py tests/test_settings.py tests/test_read_along.py -v

See tests/README.md for more details on running tests.

Manual Test Checklist

When making changes, verify the following features work:

ASR Tab

TTS Tab

Enter text → synthesize button enables
Plain/Markdown toggle → switches between views
Synthesize → audio plays
Play with Read-Along → opens panel, highlights paragraphs
Read-along panel → Pause/Stop/Close buttons work
Download → downloads audio file
Ctrl+Enter → triggers synthesize

Export

Export panel opens from ASR tab
New webhook target → can be created and saved
GitLab via webhook → works with proxy
Preview payload → shows correct JSON
Send → exports successfully
GitLab commit (requires exporter service) → works
GitHub commit (requires exporter service) → works

Data Tab (Metrics)

Data tab always visible; shows the enable button when metrics are disabled
Record/transcribe → events appear in Data tab
TTS synthesize → events appear in Data tab
Summary cards → show correct totals
History table → shows events with pagination
Clear history → removes all events
Export JSON → downloads history file

Apps (when enabled)

Apps button visible when enableApps=true
Apps button hidden when enableApps=false
Built-in apps work (Ollama Summarize, Send/Webhook)

Makefile Helpers

Common commands for managing Yap services:

make help           # Show all available commands

# Unified App (recommended)
make app-up         # Start unified app (Caddy mode)
make app-local      # Start unified app (local mode)
make app-down       # Stop unified app
make app-logs       # View app logs
make app-restart    # Restart app

# Legacy ASR (standalone)
make asr-up         # Start ASR services
make asr-down       # Stop ASR services
make asr-logs       # View ASR logs
make asr-restart    # Restart ASR

# Legacy TTS (standalone)
make tts-up         # Start TTS services
make tts-down       # Stop TTS services
make tts-logs       # View TTS logs
make tts-restart    # Restart TTS
make tts-health     # Check TTS health endpoint
make tts-voices     # List available voices
make tts-model-cori # Show commands to download Cori voice

Security Notes

Warning: These tools are designed for private LAN use and have no authentication by default.

Recommendations

Do not expose to the public internet without authentication
If you must expose publicly, add authentication via Caddy (generate the password hash with caddy hash-password):

asr.yourdomain.com {
    basicauth /* {
        user $2a$14$hashedpassword
    }
    # ... rest of config
}

Use HTTPS (automatic with Caddy)
Consider VPN access for remote use
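Once basicauth is in place, any script that calls the API must send credentials. This includes the curl checks under Troubleshooting, where curl uses -u user:password. Clients send the plaintext password over HTTPS, not the hash stored in the Caddyfile. Under the hood this is an HTTP Basic Authorization header containing the base64 encoding of user:password (RFC 7617). The Python sketch below uses only the standard library. The host name is a placeholder.

```python
import base64
import urllib.request


def with_basic_auth(req: urllib.request.Request, user: str,
                    password: str) -> urllib.request.Request:
    """Attach an HTTP Basic Authorization header to req (RFC 7617)."""
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    return req


if __name__ == "__main__":
    req = with_basic_auth(
        urllib.request.Request("https://asr.yourdomain.com/"), "user", "secret"
    )
    print(req.get_header("Authorization"))  # Basic dXNlcjpzZWNyZXQ=
```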

What is exposed

ASR: Audio recordings are sent to your server for transcription
TTS: Text is sent to your server for synthesis
No data is sent to external services (all processing is local)

Troubleshooting

TTS Issues

No voices available / TTS won't synthesize

The TTS backend will start successfully even without voice models, but it will display a prominent warning in the logs. To fix:

Check the TTS logs:

make tts-logs
# or
cd tts && docker compose logs -f

If you see "NO VOICES FOUND" warning, download voice models:
```
make tts-model-cori  # Shows commands to download
```

Verify the models are in the correct directory:

ls -la $PIPER_MODELS_PATH
# Should show .onnx and .onnx.json files

Test API endpoints (Caddy mode):

# Check health endpoint
curl -k https://$APP_DOMAIN/tts/health
# Should return: {"status":"ok","voices_count":1}

# List available voices
curl -k https://$APP_DOMAIN/tts/voices
# Should return: ["en_GB-cori-high"]

# Test synthesis (POST method)
curl -k https://$APP_DOMAIN/tts/synthesize/en_GB-cori-high \
  -X POST \
  -H "Content-Type: text/plain" \
  -d "Hello world" \
  --output test.wav
# Should create test.wav file

Test API endpoints (Local mode):

curl http://localhost:8080/tts/health
curl http://localhost:8080/tts/voices

# Test synthesis
curl http://localhost:8080/tts/synthesize/en_GB-cori-high \
  -X POST \
  -H "Content-Type: text/plain" \
  -d "Hello world" \
  --output test.wav

# Or test directly on TTS service port (bypassing nginx)
curl http://localhost:5000/health
curl http://localhost:5000/voices

TTS synthesis returns 405 Method Not Allowed

If you receive a 405 error when trying to synthesize:

Verify the endpoint accepts POST requests:

# Should work with POST (recommended)
curl -X POST http://localhost:5000/synthesize/VOICE_NAME \
  -H "Content-Type: text/plain" \
  -d "Test text"

# Also works with GET
curl "http://localhost:5000/synthesize/VOICE_NAME?text=Test+text"

Check NGINX/Caddy logs for routing issues:
```
docker compose logs yap-app-ui
```
Ensure no redirects are happening (redirects can change POST to GET):
- Check for trailing slash issues in URLs
- Verify proxy configuration preserves HTTP method
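Redirects matter for the following reason. Under the Fetch standard that browsers follow, a POST answered with 301 or 302 is retried as a GET and its body is dropped. A 303 turns anything other than GET or HEAD into GET, while 307 and 308 keep both the method and the body. The Python sketch below (a hypothetical helper) summarizes those rules.

```python
def method_after_redirect(status: int, method: str) -> str:
    """Method a browser uses to follow a redirect (Fetch standard rules).

    301/302: POST becomes GET and the body is dropped.
    303: anything other than GET or HEAD becomes GET.
    307/308: method and body are preserved.
    """
    method = method.upper()
    if status in (301, 302):
        return "GET" if method == "POST" else method
    if status == 303:
        return method if method in ("GET", "HEAD") else "GET"
    if status in (307, 308):
        return method
    raise ValueError(f"{status} is not a redirect status")
```

If the proxy must redirect, for example to add a trailing slash, a 308 keeps the POST and its text body intact.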

Voice files present but not detected

Each voice model requires BOTH files to work:

.onnx - The neural network model file
.onnx.json - The model configuration file

If voices are empty or not detected:

Ensure both .onnx AND .onnx.json files exist for each voice
Check file permissions: sudo chmod 644 *.onnx *.json
Verify the PIPER_MODELS_PATH environment variable matches your directory
Restart the TTS service: make tts-restart
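A quick way to find incomplete pairs is to scan the models directory. The Python sketch below uses a hypothetical scan_voices helper that lists voices with both files and names any file whose partner is missing. It assumes the both-files rule described above and is not the TTS service's own detection code.

```python
import os
from pathlib import Path


def scan_voices(models_dir: str) -> tuple[list[str], list[str]]:
    """Return (usable voice names, names of missing partner files)."""
    root = Path(models_dir)
    usable, missing = [], []
    # Every .onnx model needs a matching .onnx.json config ...
    for model in sorted(root.glob("*.onnx")):
        config = model.with_name(model.name + ".json")
        if config.is_file():
            usable.append(model.stem)
        else:
            missing.append(config.name)
    # ... and every config needs its model.
    for config in sorted(root.glob("*.onnx.json")):
        model = config.with_name(config.name[: -len(".json")])
        if not model.is_file():
            missing.append(model.name)
    return usable, missing


if __name__ == "__main__":
    print(scan_voices(os.environ.get("PIPER_MODELS_PATH", "/srv/piper/models")))
```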

Synthesis slow

Use medium quality voices instead of high for faster synthesis
Longer texts take more time to process

ASR Issues

GPU not detected

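# Verify the driver on the host
nvidia-smi
# Verify GPU access from a container (NVIDIA Container Toolkit); the
# nvidia/cuda:11.0-base tag used previously is no longer published
docker run --rm --gpus all ubuntu nvidia-smi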

If GPU isn't detected, ensure:

NVIDIA drivers are installed
NVIDIA Container Toolkit is installed
Docker has been restarted after toolkit installation

Microphone not working

Check browser permissions
Ensure the page is served over HTTPS or from localhost (getUserMedia requires a secure context, so plain http:// on a LAN IP will not work)
Check browser console for errors
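getUserMedia is only available in a secure context. In practice that means https:// pages, or http:// pages served from localhost or a loopback address. Local mode therefore works at http://localhost:8080 on the Docker host but not at http://<LAN-IP>:8080 from another device. The Python sketch below roughly approximates that rule using a hypothetical helper. Browsers apply the full Secure Contexts specification.

```python
import ipaddress
from urllib.parse import urlsplit


def mic_capable(url: str) -> bool:
    """Roughly: would a page at url be a secure context for getUserMedia?"""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme == "https":
        return True
    if parts.scheme != "http":
        return False
    # Plain HTTP is only trusted for localhost names and loopback addresses.
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False
```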

Model download slow

First startup downloads the Whisper model
Models are cached in $WHISPER_MODELS_PATH for subsequent runs
Larger models (medium, large) take longer to download

General Issues

Container won't start

# Check logs
docker compose logs -f

# Check container status
docker ps -a

Network issues (Caddy mode)

# Verify Caddy network exists
docker network inspect caddy

# If network doesn't exist, create it
docker network create caddy

Environment variables not being used

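Ensure the .env file exists in the appropriate directory (app/, exporter/, root, asr/, or tts/, depending on which stack you run)
Check for typos in variable names
Restart containers after changing .env: make app-restart, make tts-restart, or make asr-restart. If a change still does not apply, recreate the containers with docker compose up -d, because docker compose restart does not pick up .env changes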
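A misspelled key in .env is just an unused variable, so it silently has no effect. The Python sketch below flags keys that are close to, but do not match, the variables named in this README. The parser is deliberately minimal and has no quoting or export handling. The known-name list covers only variables mentioned here, so exporter settings and other keys will not be recognized. The function names are hypothetical.

```python
import difflib

# Variables named in this README (not an exhaustive list of Yap settings).
KNOWN_VARS = [
    "APP_DOMAIN", "CADDY_NETWORK", "WHISPER_MODELS_PATH", "PIPER_MODELS_PATH",
    "ASR_MODEL", "ASR_ENGINE", "ASR_DEVICE", "METRICS_ENABLED",
]


def parse_env(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks and comments, no quote handling."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def likely_typos(values: dict[str, str], known: list[str] = KNOWN_VARS) -> dict[str, str]:
    """Map each unknown key to the closest known name, when one is close enough."""
    hints = {}
    for key in values:
        if key not in known:
            match = difflib.get_close_matches(key, known, n=1, cutoff=0.8)
            if match:
                hints[key] = match[0]
    return hints


if __name__ == "__main__":
    sample = "APP_DOMIAN=yap.lan\n# comment\nASR_MODEL=tiny.en\n"
    print(likely_typos(parse_env(sample)))  # {'APP_DOMIAN': 'APP_DOMAIN'}
```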

Contributing

Fork the repository
Create a feature branch
Make your changes
Submit a pull request

License

MIT - See LICENSE
