pdf2audiobook

Convert any PDF into a chapter-aware audiobook with a single command — or use the web app to upload, listen, and read along in real time.

How it works

The pipeline runs four stages:

Parse — Extract chapters and structure from the PDF (auto-selects Docling for complex layouts or PyMuPDF for simple ones)
Clean — Remove citations, tables, figures, and normalize text for speech using any LLM via LiteLLM (falls back to regex if unavailable)
Chunk — Split cleaned text into sentence-boundary-respecting chunks using spaCy
Synthesize — Generate audio with parallel TTS, stitch chapters, and produce a final M4B (with chapter markers) or MP3

Between parsing and the streaming stages, the pipeline generates an executive summary of the entire document via LLM and injects it as the first chapter. This gives listeners a high-level overview before diving into the full content.

Stages 2–4 run as a streaming pipeline: while one thread synthesizes Chapter 0 (CPU-heavy), another cleans Chapter 1 via LLM (I/O-heavy). Audio becomes available chapter-by-chapter instead of waiting for the entire book.

Quick start

# Clone and install
git clone https://github.com/matt-ebrahim/pdf2audiobook.git
cd pdf2audiobook
pip install -e ".[kokoro,web]"

# Download the spaCy model
python -m spacy download en_core_web_sm

# Run via CLI
pdf2audiobook paper.pdf --tts kokoro --llm gpt-4o-mini

# Or launch the web app
pdf2audiobook-web
# Open http://localhost:8000

CLI usage

pdf2audiobook <pdf> [options]

Options:
  -o, --output-dir DIR    Output directory (default: ./output/<pdf_name>)
  -c, --config FILE       Path to TOML config file
  --tts ENGINE            kokoro | chatterbox | openai | elevenlabs
  --llm MODEL             LiteLLM model string, or "none" to skip LLM cleaning
  --parser PARSER         auto | docling | pymupdf
  --voice NAME            Voice name (engine-specific)
  --format FORMAT         m4b | mp3

Examples

# Local TTS with Kokoro, LLM cleaning with GPT-4o-mini
pdf2audiobook paper.pdf --tts kokoro --llm gpt-4o-mini

# API TTS with OpenAI, no LLM cleaning
pdf2audiobook book.pdf --tts openai --llm none --voice alloy

# Force Docling parser, output as MP3
pdf2audiobook scanned.pdf --parser docling --format mp3

# Resume an interrupted run (just re-run the same command)
pdf2audiobook paper.pdf --tts kokoro --llm gpt-4o-mini -o output/paper

Checkpoint/resume is automatic — if you interrupt a run, re-running the same command picks up where it left off.

Web app

The web interface provides real-time streaming with:

Upload — Drag-and-drop or click to upload any PDF
Live progress — Watch chapters flow through cleaning → chunking → synthesis with color-coded status indicators
Instant playback — Start listening as soon as the first chapter is ready, with auto-advance to the next
Read along — Text reader panel shows the cleaned chapter text with paragraph highlighting synced to audio playback
PDF view — Embedded PDF viewer tab to reference the original document
Downloads — Download individual chapter MP3s or all chapters as a zip archive
Custom player — Previous/next, seekable progress bar, playback speed control (0.5x–2x), keyboard shortcuts (Space, arrows)

pip install -e ".[kokoro,web]"
pdf2audiobook-web

TTS engines

Engine	Type	Install	Notes
Kokoro	Local	`pip install -e ".[kokoro]"`	Free, runs on CPU, good quality
Chatterbox	Local	`pip install -e ".[chatterbox]"`	Voice cloning support
OpenAI TTS	API	`pip install -e ".[openai-tts]"`	Set `OPENAI_API_KEY`
ElevenLabs	API	`pip install -e ".[elevenlabs]"`	Set `ELEVENLABS_API_KEY`

LLM cleaning

Text cleaning uses LiteLLM, which supports any LLM provider with a single model string:

--llm gpt-4o-mini                    # OpenAI
--llm claude-sonnet-4-20250514             # Anthropic
--llm gemini/gemini-2.0-flash        # Google
--llm ollama/llama3.2                # Local via Ollama
--llm none                           # Skip LLM, use regex only

Set the corresponding API key as an environment variable (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.) or in the config file.

Configuration

Copy config.example.toml and customize:

cp config.example.toml config.toml
pdf2audiobook paper.pdf -c config.toml

See config.example.toml for all available options with descriptions.

Project structure

src/pdf2audiobook/
├── cli.py              # Command-line interface
├── webapp.py           # FastAPI web application
├── pipeline.py         # Main pipeline orchestrator
├── summary.py          # Executive summary generation via LLM
├── streaming.py        # Streaming pipeline (concurrent chapter processing)
├── checkpoint.py       # Checkpoint/resume system
├── progress.py         # Progress reporting with SSE event support
├── config.py           # TOML configuration loading
├── models.py           # Data models (Chapter, ChunkMeta, etc.)
├── static/
│   └── index.html      # Web UI (single-page app)
├── parse/
│   ├── detector.py     # Auto-detect PDF complexity
│   ├── docling_parser.py   # Docling parser (complex PDFs)
│   └── pymupdf_parser.py   # PyMuPDF parser (simple PDFs)
├── clean/
│   └── cleaner.py      # LLM + regex text cleaning
├── chunk/
│   └── chunker.py      # spaCy sentence-boundary chunking
└── synth/
    ├── base.py         # Abstract TTS engine interface
    ├── synthesizer.py  # Synthesis orchestrator
    ├── stitcher.py     # Audio stitching & M4B creation
    ├── kokoro_tts.py   # Kokoro TTS backend
    ├── openai_tts.py   # OpenAI TTS backend
    ├── elevenlabs_tts.py   # ElevenLabs TTS backend
    └── chatterbox_tts.py   # Chatterbox TTS backend

Requirements

Python 3.9+
ffmpeg (for M4B creation): brew install ffmpeg / apt install ffmpeg

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
src/pdf2audiobook		src/pdf2audiobook
tests		tests
.gitignore		.gitignore
README.md		README.md
config.example.toml		config.example.toml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pdf2audiobook

How it works

Quick start

CLI usage

Examples

Web app

TTS engines

LLM cleaning

Configuration

Project structure

Requirements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

pdf2audiobook

How it works

Quick start

CLI usage

Examples

Web app

TTS engines

LLM cleaning

Configuration

Project structure

Requirements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages