mateusz0909/WakeTheBook

📚 WakeTheBook

Turn any EPUB or PDF into a narrated audiobook — locally, with XTTS and VoxCPM2 voice generation.

Python · FastAPI · React · TypeScript · Tailwind CSS · TanStack Query · XTTS

A full-stack, locally-hosted audiobook production pipeline with a guided 5-step wizard UI. Upload a book, review AI-detected chapters, pick a voice, and download studio-quality audio — no cloud APIs, no subscriptions, no data leaving your machine.


💡 Want to try it yourself? Grab any free EPUB from Project Gutenberg and follow the quickstart below.


✨ Features

📖 Book Import

  • Drag-and-drop or browse to upload EPUB or PDF
  • Validated on upload (extension + magic bytes) — no corrupt files sneak in
  • Automatic metadata extraction: title, author, cover image
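
The extension plus magic-bytes check described above could look roughly like the sketch below. It is not the project's actual code: the function name `validate_upload` and the `MAGIC_BYTES` table are hypothetical. It relies only on two well-known facts: an EPUB is a ZIP container (so it starts with `PK\x03\x04`) and a PDF starts with `%PDF-`.

```python
from pathlib import Path

# Leading bytes ("magic numbers") for the two supported formats.
# An EPUB is a ZIP container, so it starts with the ZIP local-file header.
MAGIC_BYTES = {
    ".epub": b"PK\x03\x04",
    ".pdf": b"%PDF-",
}


def validate_upload(filename: str, head: bytes) -> str:
    """Return the normalized extension, or raise ValueError if rejected.

    `head` is the first few bytes of the uploaded file.
    (Hypothetical helper, not taken from the WakeTheBook codebase.)
    """
    ext = Path(filename).suffix.lower()
    magic = MAGIC_BYTES.get(ext)
    if magic is None:
        raise ValueError(f"unsupported extension: {ext or '(none)'}")
    if not head.startswith(magic):
        raise ValueError(f"{filename} does not look like a {ext} file")
    return ext
```

Checking both signals catches the common case of a file that was renamed to `.epub` or `.pdf` without actually being one. A real validator would probably also try to open the file, since a matching header alone does not prove the rest is intact.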

🔍 Smart Extraction & Analysis

  • EPUB: full spine traversal, section and chapter detection
  • PDF: text-block extraction via PyMuPDF with layout awareness
  • Confidence-scored chapter detection — the app tells you how sure it is about each split
  • Cleanup rules strip headers, footers, page numbers, and OCR artifacts automatically
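
To illustrate the kind of cleanup rule listed above, here is a minimal sketch that drops standalone page numbers and lines that repeat across many pages (typical running headers and footers). The function name `clean_pages`, the `min_repeats` threshold, and the regular expression are all assumptions made for illustration. The project's real rules are not shown in this README.

```python
import re
from collections import Counter

# A line that is only a page number, optionally prefixed with "Page".
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d{1,4}\s*$", re.IGNORECASE)


def clean_pages(pages: list[str], min_repeats: int = 3) -> list[str]:
    """Drop standalone page numbers and lines repeated on many pages.

    Hypothetical helper: a line seen on at least `min_repeats` pages is
    treated as a running header or footer.
    """
    # Count on how many pages each non-empty line appears.
    seen: Counter = Counter()
    for page in pages:
        seen.update({line.strip() for line in page.splitlines() if line.strip()})
    repeated = {line for line, count in seen.items() if count >= min_repeats}

    cleaned = []
    for page in pages:
        kept = [
            line
            for line in page.splitlines()
            if line.strip() not in repeated and not PAGE_NUMBER.match(line)
        ]
        cleaned.append("\n".join(kept).strip())
    return cleaned
```

A heuristic like this can misfire: a bare number such as a year on its own line would also be removed. That is one reason a manual review step after analysis is useful.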

✏️ Manual Chapter Review

  • Edit chapter text directly in the browser
  • Split one chapter into two at any cursor position
  • Merge adjacent chapters
  • Skip chapters you don't want rendered (foreword, index, etc.)
  • All edits are persisted — close the browser and come back later
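
The split and merge operations above can be thought of as simple list transformations. The sketch below shows one way to model them. The `Chapter` dataclass, its fields, and both function names are hypothetical and are not taken from the project's schemas.

```python
from dataclasses import dataclass


@dataclass
class Chapter:
    # Hypothetical shape; the real schema may differ.
    title: str
    text: str
    skipped: bool = False


def split_chapter(chapters: list[Chapter], index: int, cursor: int) -> list[Chapter]:
    """Split chapter `index` in two at character offset `cursor`."""
    chapter = chapters[index]
    if not 0 < cursor < len(chapter.text):
        raise ValueError("cursor must fall strictly inside the chapter text")
    first = Chapter(chapter.title, chapter.text[:cursor].rstrip(), chapter.skipped)
    second = Chapter(
        f"{chapter.title} (continued)",
        chapter.text[cursor:].lstrip(),
        chapter.skipped,
    )
    return chapters[:index] + [first, second] + chapters[index + 1:]


def merge_chapters(chapters: list[Chapter], index: int) -> list[Chapter]:
    """Merge chapter `index` with the chapter that follows it."""
    if not 0 <= index < len(chapters) - 1:
        raise IndexError("no following chapter to merge with")
    a, b = chapters[index], chapters[index + 1]
    # Assumption: the merged chapter is skipped only if both halves were.
    merged = Chapter(a.title, f"{a.text}\n\n{b.text}", a.skipped and b.skipped)
    return chapters[:index] + [merged] + chapters[index + 2:]
```

Both functions return a new list instead of mutating the input, which makes it easier to persist each edit as a distinct state.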

🎙 Voice Profiles (XTTS + VoxCPM2)

  • XTTS v2 voice cloning from 6–30 second audio clips
  • VoxCPM2 profiles in clone, design, and hifi modes
  • Multiple voice profiles supported; switch between them per render
  • Voice previews generated locally before starting a full render
  • Speaker embeddings and provider caches stored locally for fast re-renders

⚙️ Render Engine

  • Chapter-level job queue — each chapter is an independent render unit
  • Resumable: re-run after a crash and only missing chapters are re-rendered
  • Progress visible in real time in the UI
  • Outputs WAV per chapter + optional MP3 (requires local ffmpeg)
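
Resumability at chapter level can be as simple as checking which output files already exist. The sketch below assumes one `<chapter_id>.wav` file per chapter inside a single audio directory. That naming scheme and the function name `pending_chapters` are illustrative guesses, not the project's documented layout.

```python
from pathlib import Path


def pending_chapters(chapter_ids: list[str], audio_dir: Path) -> list[str]:
    """Return the chapters that still need rendering, in order.

    Hypothetical helper: assumes each rendered chapter is stored as
    `<chapter_id>.wav` directly inside `audio_dir`.
    """
    pending = []
    for chapter_id in chapter_ids:
        wav = audio_dir / f"{chapter_id}.wav"
        # Treat a missing or zero-byte file as "not rendered yet", so a
        # crash in the middle of a write does not count as done.
        if not wav.is_file() or wav.stat().st_size == 0:
            pending.append(chapter_id)
    return pending
```

A more robust variant would write to a temporary file and rename it on success, so that a partially written WAV is never mistaken for a finished one.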

📦 Results & Export

  • Results page with playable audio previews
  • Download individual chapters or the full manifest
  • Full pipeline.log for debugging render issues
  • Artifacts stored under data/projects/<book_id>/ — easy to find and share

🔄 Pipeline

Upload (EPUB/PDF)
    │
    ▼
Extract  ──► raw_document.json  (chapters, metadata, cover)
    │
    ▼
Analyze  ──► cleaned/book.json  (confidence scores, cleanup rules)
    │
    ▼
Review   ──► edit / split / merge / skip chapters
    │
    ▼
Render   ──► job queue → XTTS / VoxCPM / debug_sine → WAV per chapter
    │
    ▼
Output   ──► audio files + manifest + pipeline.log
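
The diagram lists `debug_sine` next to the real TTS providers. Its name suggests a placeholder that emits a test tone instead of speech, which is handy for exercising the pipeline without a model installed. The sketch below is only a guess at what such a provider might do. The function name, default frequency, and sample rate are assumptions.

```python
import math
import struct
import wave


def write_sine_wav(path, seconds=1.0, freq_hz=440.0, sample_rate=22050) -> int:
    """Write a mono 16-bit sine tone to `path` and return the frame count.

    Illustrative stand-in for a debug provider; not the project's code.
    """
    n_frames = int(seconds * sample_rate)
    amplitude = 0.3 * 32767  # leave headroom below full scale
    frames = bytearray()
    for i in range(n_frames):
        sample = int(amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        frames += struct.pack("<h", sample)  # little-endian signed 16-bit

    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return n_frames
```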

🚀 Quickstart (5 commands)

Requirements: Node.js 18+ · Python 3.11 · Git
ffmpeg is optional (only needed for MP3 export)

# 1. Clone and enter the repo
git clone https://github.com/mateusz0909/WakeTheBook.git && cd WakeTheBook

# 2. Install Node dependencies
npm install

# 3. Create Python environment and install backend
python3.11 -m venv .venv && .venv/bin/python -m pip install -r app/backend/requirements.txt

# 4. (Optional but recommended) Set up dedicated TTS runtimes
COQUI_TOS_AGREED=1 ./scripts/setup_xtts_env.sh
# optional: VoxCPM2 runtime for clone/design/hifi profiles
./scripts/setup_voxcpm_env.sh

# 5. Run
COQUI_TOS_AGREED=1 npm run dev

Then open http://127.0.0.1:5173 in your browser. That's it.

Service          URL
App (frontend)   http://127.0.0.1:5173
API (backend)    http://127.0.0.1:8000/api

Note on COQUI_TOS_AGREED=1: XTTS v2 is distributed under the Coqui Public Model License (CPML). Setting this environment variable confirms that you have reviewed and accepted its terms. If you skip the XTTS setup, the app still runs and the debug provider remains available. The optional VoxCPM setup script (scripts/setup_voxcpm_env.sh) prepares the default VoxCPM2 model.


🏗 Architecture

app/
├── frontend/          # React 19 + Vite + TypeScript + Tailwind v4
│   └── src/
│       ├── pages/     # Library, Wizard (5 steps), Output
│       ├── components/ # WizardLayout, LibraryShell, shared UI
│       └── lib/api.ts  # All backend communication (TanStack Query)
└── backend/           # FastAPI + SQLite + filesystem storage
    └── app/
        ├── api/        # Thin HTTP route handlers
        ├── services/   # Business logic & orchestration
        ├── repositories/ # SQLite + filesystem persistence
        ├── schemas/    # Pydantic contracts
        └── workers/    # Render job queue + XTTS / VoxCPM runtimes
data/
├── projects/<book_id>/ # Per-book artifacts (audio, chapters, logs…)
├── voices/             # Voice profiles, samples, previews
└── cache/              # XTTS latents and VoxCPM cache

🧪 Tests

PYTHONPATH=app/backend .venv/bin/python -m pytest app/backend/tests

Covers: book upload, extraction, analysis, chapter review, render pipeline, XTTS runtime selection, and VoxCPM voice contracts.


🗺 Roadmap

  • Full management UI for editing samples inside existing voice profiles
  • Dedicated smoke test for real XTTS chapter render on fixture text
  • Dedicated smoke test for real VoxCPM runtime once installed

🧑‍💻 Author

Built by Mateusz Byrtus — Product Owner & AI Product Builder
Portfolio · LinkedIn

If you find this project useful, consider buying me a coffee ☕



All processing is local. Your books never leave your machine.
