📚 WakeTheBook

Turn any EPUB or PDF into a narrated audiobook — locally, with XTTS and VoxCPM2 voice generation.

A full-stack, locally-hosted audiobook production pipeline with a guided 5-step wizard UI. Upload a book, review AI-detected chapters, pick a voice, and download studio-quality audio — no cloud APIs, no subscriptions, no data leaving your machine.

💡 Want to try it yourself? Grab any free EPUB from Project Gutenberg and follow the quickstart below.

✨ Features

📖 Book Import

Drag-and-drop or browse to upload EPUB or PDF
Validated on upload (extension + magic bytes), so mislabeled or non-EPUB/PDF files are rejected before processing
Automatic metadata extraction: title, author, cover image
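A minimal sketch of an extension-plus-magic-bytes check, assuming the upload is already in memory. detect_book_format is a hypothetical helper, not the project's actual validator. The signatures themselves are standard: PDFs begin with %PDF-, and an EPUB is a ZIP archive (PK\x03\x04) containing a mimetype entry whose content is application/epub+zip.

```python
import io
import zipfile
from pathlib import Path

PDF_MAGIC = b"%PDF-"       # every PDF file starts with this header
ZIP_MAGIC = b"PK\x03\x04"  # EPUB files are ZIP archives (local file header)


def detect_book_format(filename: str, data: bytes) -> str:
    """Return "pdf" or "epub" when the extension and the content agree.

    Hypothetical helper for illustration; raises ValueError otherwise.
    """
    ext = Path(filename).suffix.lower()
    if ext == ".pdf":
        if not data.startswith(PDF_MAGIC):
            raise ValueError("file has a .pdf extension but no %PDF- header")
        return "pdf"
    if ext == ".epub":
        if not data.startswith(ZIP_MAGIC):
            raise ValueError("file has an .epub extension but is not a ZIP archive")
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                # The EPUB container format requires a "mimetype" entry
                # whose content is exactly "application/epub+zip".
                if "mimetype" not in zf.namelist():
                    raise ValueError("EPUB archive has no mimetype entry")
                if zf.read("mimetype").strip() != b"application/epub+zip":
                    raise ValueError("EPUB mimetype entry has unexpected content")
        except zipfile.BadZipFile as exc:
            # Starts like a ZIP but the archive structure is broken.
            raise ValueError(f"corrupt EPUB archive: {exc}") from exc
        return "epub"
    raise ValueError(f"unsupported extension: {ext or '(none)'}")
```

Checking both the extension and the content means a renamed PDF cannot masquerade as an EPUB, and a truncated ZIP fails before any extraction work starts.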

🔍 Smart Extraction & Analysis

EPUB: full spine traversal, section and chapter detection
PDF: text-block extraction via PyMuPDF with layout awareness
Confidence-scored chapter detection — the app tells you how sure it is about each split
Cleanup rules strip headers, footers, page numbers, and OCR artifacts automatically
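The app's actual scoring rules are not documented here, so the following only sketches one common approach, assuming plain-text lines as input: cleanup drops blank lines, bare page numbers, and known running headers, then regex heading patterns with assumed confidence weights decide where chapters start. All names, patterns, and weights are hypothetical.

```python
import re

# Hypothetical heading patterns with assumed confidence weights.
HEADING_PATTERNS = [
    (re.compile(r"^chapter\s+(\d+|[ivxlcdm]+)\b", re.IGNORECASE), 0.9),
    (re.compile(r"^(prologue|epilogue|foreword|preface)\b", re.IGNORECASE), 0.8),
    (re.compile(r"^[IVXLCDM]+\.?$"), 0.5),  # a bare roman numeral such as "XII"
]
# A line that is only a page number, e.g. "12" or "Page 12".
PAGE_NUMBER = re.compile(r"^(page\s+)?\d+$", re.IGNORECASE)


def clean_lines(lines, running_headers=()):
    """Drop blank lines, bare page numbers, and known running headers/footers."""
    headers = {h.strip().lower() for h in running_headers}
    kept = []
    for line in lines:
        text = line.strip()
        if not text or PAGE_NUMBER.match(text) or text.lower() in headers:
            continue
        kept.append(text)
    return kept


def score_heading(line):
    """Confidence in [0, 1] that a line starts a new chapter."""
    for pattern, weight in HEADING_PATTERNS:
        if pattern.match(line):
            # Long lines are more likely prose that happens to match; halve the score.
            return weight if len(line) <= 60 else weight / 2
    return 0.0


def detect_chapters(lines, threshold=0.5):
    """Split cleaned lines into chapters at headings scoring >= threshold.

    Text before the first detected heading is ignored in this sketch.
    """
    chapters = []
    for line in lines:
        confidence = score_heading(line)
        if confidence >= threshold:
            chapters.append({"title": line, "confidence": confidence, "body": []})
        elif chapters:
            chapters[-1]["body"].append(line)
    return chapters
```

Keeping the score next to each split is what lets a review screen flag low-confidence boundaries for a human to check.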

✏️ Manual Chapter Review

Edit chapter text directly in the browser
Split one chapter into two at any cursor position
Merge adjacent chapters
Skip chapters you don't want rendered (foreword, index, etc.)
All edits are persisted — close the browser and come back later
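The review operations map naturally onto pure list transformations. A minimal sketch, assuming chapters are held as an ordered list and persisted as JSON; Chapter and the helper functions are hypothetical names, and the real app persists edits through its own backend storage.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import List


@dataclass(frozen=True)
class Chapter:
    title: str
    text: str
    skip: bool = False  # skipped chapters are kept but never rendered


def split_chapter(chapters: List[Chapter], index: int, cursor: int,
                  new_title: str) -> List[Chapter]:
    """Split chapters[index] at a character offset; the tail becomes a new chapter."""
    ch = chapters[index]
    if not 0 < cursor < len(ch.text):
        raise ValueError("cursor must fall strictly inside the chapter text")
    head = replace(ch, text=ch.text[:cursor].rstrip())
    tail = Chapter(title=new_title, text=ch.text[cursor:].lstrip(), skip=ch.skip)
    return chapters[:index] + [head, tail] + chapters[index + 1:]


def merge_with_next(chapters: List[Chapter], index: int) -> List[Chapter]:
    """Merge chapters[index] with the chapter right after it, keeping the first title."""
    if not 0 <= index < len(chapters) - 1:
        raise IndexError("no adjacent chapter to merge with")
    first, second = chapters[index], chapters[index + 1]
    merged = replace(first, text=first.text + "\n\n" + second.text)
    return chapters[:index] + [merged] + chapters[index + 2:]


def set_skip(chapters: List[Chapter], index: int, skip: bool = True) -> List[Chapter]:
    return chapters[:index] + [replace(chapters[index], skip=skip)] + chapters[index + 1:]


def renderable(chapters: List[Chapter]) -> List[Chapter]:
    return [c for c in chapters if not c.skip]


def to_json(chapters: List[Chapter]) -> str:
    return json.dumps([asdict(c) for c in chapters], ensure_ascii=False)


def from_json(payload: str) -> List[Chapter]:
    return [Chapter(**item) for item in json.loads(payload)]
```

Returning new lists instead of mutating in place keeps each edit easy to save and reload, which is what makes closing the browser mid-review safe.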

🎙 Voice Profiles (XTTS + VoxCPM2)

XTTS v2 voice cloning from 6–30 second audio clips
VoxCPM2 profiles in clone, design, and hifi modes
Multiple voice profiles supported; switch between them per render
Voice previews generated locally before starting a full render
Speaker embeddings and provider caches stored locally for fast re-renders
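Two details from this list can be sketched with the standard library alone: checking that a reference clip is 6 to 30 seconds long, and deriving a stable cache key so embeddings are recomputed only when the samples change. This assumes WAV input and a SHA-256 key; the app's actual clip handling and cache layout are not documented here, and all function names are hypothetical.

```python
import hashlib
import wave
from typing import BinaryIO, Iterable

MIN_SECONDS, MAX_SECONDS = 6.0, 30.0  # clip length range stated for XTTS v2 cloning


def clip_duration_seconds(clip: BinaryIO) -> float:
    """Duration of a PCM WAV clip, computed from its frame count and sample rate."""
    with wave.open(clip, "rb") as w:
        return w.getnframes() / w.getframerate()


def validate_reference_clip(clip: BinaryIO) -> float:
    """Return the clip duration, or raise ValueError if it is outside the range."""
    seconds = clip_duration_seconds(clip)
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"reference clip is {seconds:.1f}s; expected {MIN_SECONDS:.0f}-{MAX_SECONDS:.0f}s"
        )
    return seconds


def embedding_cache_key(provider: str, samples: Iterable[bytes]) -> str:
    """Stable key for a provider plus a set of voice samples (order-independent)."""
    digest = hashlib.sha256(provider.encode("utf-8") + b"\0")
    # Sorting per-sample hashes means re-uploading the same clips in another
    # order still hits the same cache entry.
    for sample_hash in sorted(hashlib.sha256(s).hexdigest() for s in samples):
        digest.update(sample_hash.encode("ascii"))
    return digest.hexdigest()
```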

⚙️ Render Engine

Chapter-level job queue — each chapter is an independent render unit
Resumable: re-run after a crash and only missing chapters are re-rendered
Progress visible in real time in the UI
Outputs WAV per chapter + optional MP3 (requires local ffmpeg)
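Chapter-level resumability can be sketched by treating each chapter's WAV file as its completion marker: write through a temporary file so a crash never leaves a half-written WAV, and skip chapters whose output already exists on the next run. The synthesize callback stands in for whichever TTS provider is selected, and the <chapter_id>.wav naming is an assumption. The MP3 helper only builds a standard ffmpeg command line when ffmpeg is on PATH.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable, List, Optional

ProgressFn = Callable[[int, int, str], None]  # (done, total, chapter_id)


def pending_chapters(out_dir: Path, chapter_ids: Iterable[str]) -> List[str]:
    """Chapters whose WAV is missing or empty; finished ones are skipped on re-run."""
    pending = []
    for cid in chapter_ids:
        wav = out_dir / f"{cid}.wav"
        if not wav.exists() or wav.stat().st_size == 0:
            pending.append(cid)
    return pending


def render_missing(out_dir: Path, chapter_ids: Iterable[str],
                   synthesize: Callable[[str], bytes],
                   on_progress: Optional[ProgressFn] = None) -> List[str]:
    """Render only chapters without a finished WAV; return the ids rendered."""
    out_dir.mkdir(parents=True, exist_ok=True)
    todo = pending_chapters(out_dir, chapter_ids)
    for done, cid in enumerate(todo, start=1):
        partial = out_dir / f"{cid}.wav.part"
        partial.write_bytes(synthesize(cid))
        # Rename into place only after the write finished, so a crash mid-chapter
        # leaves no .wav behind and the chapter is retried on the next run.
        partial.replace(out_dir / f"{cid}.wav")
        if on_progress is not None:
            on_progress(done, len(todo), cid)
    return todo


def mp3_command(wav: Path) -> Optional[List[str]]:
    """ffmpeg command converting one chapter WAV to MP3, or None if ffmpeg is absent."""
    if shutil.which("ffmpeg") is None:
        return None
    return ["ffmpeg", "-y", "-i", str(wav), str(wav.with_suffix(".mp3"))]
```

The command list can be passed straight to subprocess.run; returning None instead of failing keeps MP3 export optional, matching the ffmpeg requirement above.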

📦 Results & Export

Results page with playable audio previews
Download individual chapters or the full manifest
Full pipeline.log for debugging render issues
Artifacts stored under data/projects/<book_id>/ — easy to find and share
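The manifest's actual schema is not documented here. As a hypothetical sketch, assuming chapter WAVs live in an audio/ subfolder of data/projects/<book_id>/, a manifest could list each chapter file with its duration read from the WAV header, plus an MP3 path when one exists. The folder name and every field name are assumptions.

```python
import json
import wave
from pathlib import Path


def build_manifest(project_dir: Path, book_id: str) -> dict:
    """Describe every rendered chapter WAV under project_dir/audio (assumed layout)."""
    chapters = []
    for wav_path in sorted((project_dir / "audio").glob("*.wav")):
        with wave.open(str(wav_path), "rb") as w:
            seconds = w.getnframes() / w.getframerate()
        entry = {
            "id": wav_path.stem,
            # Relative POSIX paths keep the manifest portable when the folder is shared.
            "wav": wav_path.relative_to(project_dir).as_posix(),
            "duration_s": round(seconds, 2),
        }
        mp3_path = wav_path.with_suffix(".mp3")
        if mp3_path.exists():
            entry["mp3"] = mp3_path.relative_to(project_dir).as_posix()
        chapters.append(entry)
    return {
        "book_id": book_id,
        "chapters": chapters,
        "total_duration_s": round(sum(c["duration_s"] for c in chapters), 2),
    }


def write_manifest(project_dir: Path, book_id: str) -> Path:
    path = project_dir / "manifest.json"
    manifest = build_manifest(project_dir, book_id)
    path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return path
```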

🔄 Pipeline

Upload (EPUB/PDF)
    │
    ▼
Extract  ──► raw_document.json  (chapters, metadata, cover)
    │
    ▼
Analyze  ──► cleaned/book.json  (confidence scores, cleanup rules)
    │
    ▼
Review   ──► edit / split / merge / skip chapters
    │
    ▼
Render   ──► job queue → XTTS / VoxCPM / debug_sine → WAV per chapter
    │
    ▼
Output   ──► audio files + manifest + pipeline.log
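One way to read the diagram is as a chain of stages that each persist a JSON artifact, so a later run can pick up where an earlier one stopped. A minimal sketch, assuming each stage is a function from the previous stage's data to its own; only the two JSON artifacts named in the diagram are modeled, and run_stages is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Callable, Dict

Stage = Callable[[dict], dict]

# Stage name -> artifact path relative to data/projects/<book_id>/, per the diagram.
STAGE_ARTIFACTS = [
    ("extract", "raw_document.json"),
    ("analyze", "cleaned/book.json"),
]


def run_stages(project_dir: Path, stages: Dict[str, Stage],
               log: Callable[[str], None] = print) -> dict:
    """Run stages in order, reusing any artifact that already exists on disk."""
    data: dict = {}
    for name, relative in STAGE_ARTIFACTS:
        artifact = project_dir / relative
        if artifact.exists():
            data = json.loads(artifact.read_text(encoding="utf-8"))
            log(f"{name}: reusing {relative}")
            continue
        data = stages[name](data)  # each stage receives the previous stage's output
        artifact.parent.mkdir(parents=True, exist_ok=True)
        artifact.write_text(json.dumps(data, indent=2), encoding="utf-8")
        log(f"{name}: wrote {relative}")
    return data
```

Passing a log callback (for example, one that appends to pipeline.log) keeps the stage runner independent of where the messages end up.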

🚀 Quickstart (5 steps)

Requirements: Node.js 18+ · Python 3.11 · Git
ffmpeg is optional (only needed for MP3 export)
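Before running the commands, a small script can confirm these requirements. This is a sketch rather than part of the repository: preflight and node_major_version are hypothetical names, and the checks simply mirror the requirements listed above plus the COQUI_TOS_AGREED variable used for XTTS.

```python
import os
import shutil
import subprocess
import sys
from typing import Callable, Dict, Mapping, Optional, Tuple


def node_major_version(output: str) -> int:
    """Parse `node --version` output such as "v20.11.1" into its major number."""
    return int(output.strip().lstrip("v").split(".")[0])


def _node_version_output() -> str:
    return subprocess.run(["node", "--version"], capture_output=True,
                          text=True, check=True).stdout


def preflight(which: Callable[[str], Optional[str]] = shutil.which,
              python_version: Tuple[int, int] = (sys.version_info.major, sys.version_info.minor),
              env: Mapping[str, str] = os.environ,
              node_version_output: Callable[[], str] = _node_version_output) -> Dict[str, bool]:
    """Check the quickstart requirements; dependencies are injectable for testing."""
    has_node = which("node") is not None
    return {
        "python": tuple(python_version) == (3, 11),  # the quickstart uses python3.11
        "node": has_node and node_major_version(node_version_output()) >= 18,
        "git": which("git") is not None,
        "ffmpeg": which("ffmpeg") is not None,                 # optional: MP3 export only
        "coqui_tos": env.get("COQUI_TOS_AGREED") == "1",       # needed for XTTS only
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"[{'ok' if ok else '--'}] {check}")
```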

# 1. Clone and enter the repo
git clone https://github.com/mateusz0909/WakeTheBook.git && cd WakeTheBook

# 2. Install Node dependencies
npm install

# 3. Create Python environment and install backend
python3.11 -m venv .venv && .venv/bin/python -m pip install -r app/backend/requirements.txt

# 4. (Optional but recommended) Set up dedicated TTS runtimes
COQUI_TOS_AGREED=1 ./scripts/setup_xtts_env.sh
# optional: VoxCPM2 runtime for clone/design/hifi profiles
./scripts/setup_voxcpm_env.sh

# 5. Run
COQUI_TOS_AGREED=1 npm run dev

Then open http://127.0.0.1:5173 in your browser. That's it.

Service	URL
App (frontend)	http://127.0.0.1:5173
API (backend)	http://127.0.0.1:8000/api

Note on COQUI_TOS_AGREED=1: XTTS v2 is distributed under the Coqui Public Model License (CPML). Setting this environment variable confirms that you have reviewed and accepted its terms. If you skip the XTTS setup, the app still runs and the debug provider (debug_sine) remains available. The optional VoxCPM runtime script (scripts/setup_voxcpm_env.sh) prepares the default VoxCPM2 model.
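The fallback behavior described in this note can be sketched as follows, assuming the app knows whether each runtime is installed: debug_sine needs no model and is always listed, XTTS additionally requires the license acknowledgement, and VoxCPM only needs its runtime. The function names and the 440 Hz tone are illustrative assumptions, not the app's actual code.

```python
import io
import math
import struct
import wave
from typing import List, Mapping


def available_providers(xtts_installed: bool, voxcpm_installed: bool,
                        env: Mapping[str, str]) -> List[str]:
    """Providers a render could use; debug_sine needs no model, so it is always listed."""
    providers = ["debug_sine"]
    # XTTS v2 additionally requires the CPML acknowledgement via COQUI_TOS_AGREED=1.
    if xtts_installed and env.get("COQUI_TOS_AGREED") == "1":
        providers.append("xtts")
    if voxcpm_installed:
        providers.append("voxcpm")
    return providers


def debug_sine_wav(seconds: float, freq_hz: float = 440.0, rate: int = 22050) -> bytes:
    """A mono 16-bit sine tone as WAV bytes, for exercising the pipeline without TTS."""
    amplitude = int(0.3 * 32767)  # 30% of full scale to avoid clipping
    frames = b"".join(
        struct.pack("<h", int(amplitude * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(int(seconds * rate))
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()
```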

🏗 Architecture

app/
├── frontend/          # React 19 + Vite + TypeScript + Tailwind v4
│   └── src/
│       ├── pages/     # Library, Wizard (5 steps), Output
│       ├── components/ # WizardLayout, LibraryShell, shared UI
│       └── lib/api.ts  # All backend communication (TanStack Query)
└── backend/           # FastAPI + SQLite + filesystem storage
    └── app/
        ├── api/        # Thin HTTP route handlers
        ├── services/   # Business logic & orchestration
        ├── repositories/ # SQLite + filesystem persistence
        ├── schemas/    # Pydantic contracts
        └── workers/    # Render job queue + XTTS / VoxCPM runtimes
data/
├── projects/<book_id>/ # Per-book artifacts (audio, chapters, logs…)
├── voices/             # Voice profiles, samples, previews
└── cache/              # XTTS latents and VoxCPM cache
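The backend layering (thin api/ handlers, services/ for business logic, repositories/ for persistence, schemas/ for contracts) can be illustrated with a small sketch. To stay self-contained it uses plain Python and an in-memory SQLite database instead of FastAPI and Pydantic, and every class and function name here is hypothetical.

```python
import sqlite3
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass
class BookOut:
    """schemas/: response shape (the real backend uses Pydantic; a dataclass stands in)."""
    id: int
    title: str
    author: str


class BookRepository:
    """repositories/: persistence only, no business rules."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS books ("
                     "id INTEGER PRIMARY KEY, title TEXT NOT NULL, author TEXT NOT NULL)")

    def insert(self, title: str, author: str) -> int:
        cur = self.conn.execute("INSERT INTO books (title, author) VALUES (?, ?)",
                                (title, author))
        self.conn.commit()
        return cur.lastrowid

    def get(self, book_id: int) -> Optional[Tuple[int, str, str]]:
        return self.conn.execute(
            "SELECT id, title, author FROM books WHERE id = ?", (book_id,)).fetchone()


class BookService:
    """services/: validation and orchestration; knows nothing about HTTP."""

    def __init__(self, repo: BookRepository) -> None:
        self.repo = repo

    def import_book(self, title: str, author: str) -> BookOut:
        title, author = title.strip(), author.strip() or "Unknown"
        if not title:
            raise ValueError("title is required")
        return BookOut(self.repo.insert(title, author), title, author)

    def get_book(self, book_id: int) -> Optional[BookOut]:
        row = self.repo.get(book_id)
        return BookOut(*row) if row else None


def get_book_handler(service: BookService, book_id: int) -> Tuple[int, dict]:
    """api/: a thin handler that maps service results to a status code and body."""
    book = service.get_book(book_id)
    if book is None:
        return 404, {"detail": "Book not found"}
    return 200, asdict(book)
```

Because the service never touches HTTP and the handler never touches SQL, each layer can be tested on its own, which is the usual motivation for this split.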

🧪 Tests

PYTHONPATH=app/backend .venv/bin/python -m pytest app/backend/tests

Covers: book upload, extraction, analysis, chapter review, render pipeline, XTTS runtime selection, and VoxCPM voice contracts.

🗺 Roadmap

Full management UI for editing samples inside existing voice profiles
Dedicated smoke test for real XTTS chapter render on fixture text
Dedicated smoke test for real VoxCPM runtime once installed

🧑‍💻 Author

Built by Mateusz Byrtus — Product Owner & AI Product Builder
Portfolio · LinkedIn

If you find this project useful, consider buying me a coffee ☕

All processing is local. Your books never leave your machine.
