Turn any EPUB or PDF into a narrated audiobook — locally, with XTTS and VoxCPM2 voice generation.
A full-stack, locally-hosted audiobook production pipeline with a guided 5-step wizard UI. Upload a book, review AI-detected chapters, pick a voice, and download studio-quality audio — no cloud APIs, no subscriptions, no data leaving your machine.
💡 Want to try it yourself? Grab any free EPUB from Project Gutenberg and follow the quickstart below.
- Drag-and-drop or browse to upload EPUB or PDF
- Validated on upload (extension + magic bytes) — no corrupt files sneak in
- Automatic metadata extraction: title, author, cover image
- EPUB: full spine traversal, section and chapter detection
- PDF: text-block extraction via PyMuPDF with layout awareness
- Confidence-scored chapter detection — the app tells you how sure it is about each split
- Cleanup rules strip headers, footers, page numbers, and OCR artifacts automatically
- Edit chapter text directly in the browser
- Split one chapter into two at any cursor position
- Merge adjacent chapters
- Skip chapters you don't want rendered (foreword, index, etc.)
- All edits are persisted — close the browser and come back later
- XTTS v2 voice cloning from 6–30 second audio clips
- VoxCPM2 profiles in `clone`, `design`, and `hifi` modes
- Multiple voice profiles supported; switch between them per render
- Voice previews generated locally before starting a full render
- Speaker embeddings and provider caches stored locally for fast re-renders
- Chapter-level job queue — each chapter is an independent render unit
- Resumable: re-run after a crash and only missing chapters are re-rendered
- Progress visible in real time in the UI
- Outputs WAV per chapter + optional MP3 (requires local `ffmpeg`)
- Results page with playable audio previews
- Download individual chapters or the full manifest
- Full `pipeline.log` for debugging render issues
- Artifacts stored under `data/projects/<book_id>/`, so they are easy to find and share
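The upload validation mentioned in the feature list (extension plus magic bytes) can be illustrated with a minimal sketch. The names `is_valid_upload` and `MAGIC_BYTES` are hypothetical and not taken from the project's code. The signatures themselves are standard: an EPUB is a ZIP container, so it starts with `PK\x03\x04`, and a PDF starts with `%PDF-`.

```python
from pathlib import Path

# Leading bytes expected for each accepted extension.
# Hypothetical table for illustration; the project may check more than this.
MAGIC_BYTES = {
    ".epub": b"PK\x03\x04",  # EPUB is a ZIP archive
    ".pdf": b"%PDF-",
}


def is_valid_upload(filename: str, head: bytes) -> bool:
    """Return True only if the extension is accepted AND the first bytes match it."""
    suffix = Path(filename).suffix.lower()
    magic = MAGIC_BYTES.get(suffix)
    return magic is not None and head.startswith(magic)
```

Checking both conditions together rejects a renamed file, for example a ZIP archive uploaded as `book.pdf`, before any extraction work starts.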
Upload (EPUB/PDF)
│
▼
Extract ──► raw_document.json (chapters, metadata, cover)
│
▼
Analyze ──► cleaned/book.json (confidence scores, cleanup rules)
│
▼
Review ──► edit / split / merge / skip chapters
│
▼
Render ──► job queue → XTTS / VoxCPM / debug_sine → WAV per chapter
│
▼
Output ──► audio files + manifest + pipeline.log
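The Render stage above, with one WAV per chapter and resumable re-runs, can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's worker code: `render_sine` stands in for the `debug_sine` provider, and the `<chapter_id>.wav` naming, the `.part` temporary suffix, and the sample rate are all hypothetical.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path

SAMPLE_RATE = 22050  # assumed value, for illustration only


def render_sine(path: Path, seconds: float = 0.1, freq: float = 440.0) -> None:
    """Write a short mono 16-bit sine tone, standing in for a real TTS render."""
    n_frames = int(SAMPLE_RATE * seconds)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)))
        for i in range(n_frames)
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)


def render_missing(chapter_ids: list[str], out_dir: Path) -> list[str]:
    """Render only chapters without a finished WAV; return the ids rendered."""
    out_dir.mkdir(parents=True, exist_ok=True)
    rendered = []
    for chapter_id in chapter_ids:
        target = out_dir / f"{chapter_id}.wav"
        if target.exists():
            continue  # finished in an earlier run, skip it
        partial = target.with_name(target.name + ".part")
        render_sine(partial)
        partial.replace(target)  # rename last, so a crash never leaves a half-written WAV
        rendered.append(chapter_id)
    return rendered


# Usage: the second run re-renders only the chapter that went missing.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    first_run = render_missing(["ch01", "ch02"], out)
    (out / "ch02.wav").unlink()  # simulate a chapter lost to a crash
    second_run = render_missing(["ch01", "ch02"], out)
```

Treating each chapter file as the unit of completion keeps the resume logic simple: no separate progress state has to survive the crash.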
Requirements: Node.js 18+ · Python 3.11 · Git
`ffmpeg` is optional (only needed for MP3 export)
# 1. Clone and enter the repo
git clone https://github.com/mateusz0909/WakeTheBook.git && cd WakeTheBook
# 2. Install Node dependencies
npm install
# 3. Create Python environment and install backend
python3.11 -m venv .venv && .venv/bin/python -m pip install -r app/backend/requirements.txt
# 4. (Optional but recommended) Set up dedicated TTS runtimes
COQUI_TOS_AGREED=1 ./scripts/setup_xtts_env.sh
# optional: VoxCPM2 runtime for clone/design/hifi profiles
./scripts/setup_voxcpm_env.sh
# 5. Run
COQUI_TOS_AGREED=1 npm run dev

Then open http://127.0.0.1:5173 in your browser. That's it.
| Service | URL |
|---|---|
| App (frontend) | http://127.0.0.1:5173 |
| API (backend) | http://127.0.0.1:8000/api |
Note on `COQUI_TOS_AGREED=1`: XTTS v2 is distributed under the Coqui CPML license. Setting this env var confirms you've reviewed and accept the terms. If you skip the XTTS setup, the app still runs and the debug provider remains available. The optional VoxCPM runtime script (`./scripts/setup_voxcpm_env.sh`) prepares the default VoxCPM2 model.
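The fallback behaviour described in the note can be pictured with a small sketch. Everything here is an assumption made for illustration: the function `available_providers`, its parameters, and the provider names (borrowed from the pipeline diagram) do not describe the project's actual selection logic.

```python
from collections.abc import Mapping


def available_providers(
    env: Mapping[str, str], xtts_installed: bool, voxcpm_installed: bool
) -> list[str]:
    """List usable TTS providers; the debug provider is always present."""
    providers = ["debug_sine"]
    # Assumption: XTTS is offered only when its runtime exists and the
    # license terms were accepted via the environment variable.
    if xtts_installed and env.get("COQUI_TOS_AGREED") == "1":
        providers.append("xtts")
    if voxcpm_installed:
        providers.append("voxcpm")
    return providers
```

In a real process, `os.environ` would be passed as `env`; taking it as a parameter keeps the function easy to test.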
app/
├── frontend/                # React 19 + Vite + TypeScript + Tailwind v4
│   └── src/
│       ├── pages/           # Library, Wizard (5 steps), Output
│       ├── components/      # WizardLayout, LibraryShell, shared UI
│       └── lib/api.ts       # All backend communication (TanStack Query)
└── backend/                 # FastAPI + SQLite + filesystem storage
    └── app/
        ├── api/             # Thin HTTP route handlers
        ├── services/        # Business logic & orchestration
        ├── repositories/    # SQLite + filesystem persistence
        ├── schemas/         # Pydantic contracts
        └── workers/         # Render job queue + XTTS / VoxCPM runtimes
data/
├── projects/<book_id>/      # Per-book artifacts (audio, chapters, logs…)
├── voices/                  # Voice profiles, samples, previews
└── cache/                   # XTTS latents and VoxCPM cache
PYTHONPATH=app/backend .venv/bin/python -m pytest app/backend/tests

Covers: book upload, extraction, analysis, chapter review, render pipeline, XTTS runtime selection, and VoxCPM voice contracts.
- Full management UI for editing samples inside existing voice profiles
- Dedicated smoke test for real XTTS chapter render on fixture text
- Dedicated smoke test for real VoxCPM runtime once installed
Built by Mateusz Byrtus — Product Owner & AI Product Builder
If you find this project useful, consider buying me a coffee ☕