AudioBookMaker is a modern, full-stack web application that converts eBooks (EPUB and PDF) into audiobooks using advanced AI text-to-speech technology powered by Coqui TTS. Built with a FastAPI backend, Next.js frontend, and asynchronous task processing, it provides a seamless experience for creating personalized audiobooks.
- Multi-Format Support: Convert EPUB and PDF files to audiobooks
- Multilingual TTS: Support for multiple languages using Coqui XTTS v2
- Voice Cloning: Optional voice cloning using sample audio files or browser recording for personalized narration
- Asynchronous Processing: Background job processing with Celery workers
- Real-time Progress Tracking: Monitor conversion progress in real-time
- Resume Capability: Automatically resume interrupted conversions
- Modern UI: Beautiful, responsive web interface built with Next.js and shadcn/ui
- Job Queue Management: Redis-backed task queue for scalable processing
- Job History: Track all conversion jobs with detailed status information
- Smart Text Processing: Advanced sentence segmentation with NLTK
- Hardware Acceleration: Automatic GPU detection (CUDA, MPS) with CPU fallback
- Docker Support: Containerized deployment with Docker Compose
- Blob Storage Ready: Configurable for cloud storage (S3, etc.)
- In-Browser Voice Recording: Record voice samples directly in the browser (up to 10 seconds)
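The Smart Text Processing feature above implies that text is split into sentences and grouped into pieces small enough for the TTS model. The following is a minimal sketch of that idea, not the project's actual implementation: the regex splitter is a naive stand-in for NLTK's sentence tokenizer, and the names `split_sentences`, `pack_chunks`, and the `max_chars` limit are hypothetical.

```python
import re
from typing import List

# Naive stand-in for a real sentence tokenizer such as NLTK's sent_tokenize:
# split on whitespace that follows '.', '!' or '?'.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Return the non-empty sentences found in text."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def pack_chunks(sentences: List[str], max_chars: int = 200) -> List[str]:
    """Greedily join sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "Hello world. How are you? Fine!"
    for chunk in pack_chunks(split_sentences(sample), max_chars=25):
        print(chunk)
```

Packing whole sentences, rather than cutting at a fixed character count, keeps each synthesized segment ending at a natural pause.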
The application consists of four main components:
- Frontend (Next.js): Runs on port 3000, provides the user interface
- Backend (FastAPI): Runs on port 8000, handles API requests
- Redis: Runs on port 6379, serves as message broker and result backend
- Celery Worker: Background worker that processes TTS conversion jobs
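One way a background worker could support resuming an interrupted conversion is to write one audio file per text chunk and skip chunks whose file already exists. The sketch below illustrates that pattern only; it is an assumption about how resumption might work, and `convert_chunks`, the `chunk_00000.wav` naming scheme, and the `synthesize` callback are all hypothetical rather than taken from the project.

```python
from pathlib import Path
from typing import Callable, List


def convert_chunks(
    chunks: List[str],
    out_dir: Path,
    synthesize: Callable[[str, Path], None],
) -> List[int]:
    """Render each chunk to its own file, skipping chunks already on disk.

    Returns the indices that were rendered during this call.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    rendered: List[int] = []
    for index, text in enumerate(chunks):
        target = out_dir / f"chunk_{index:05d}.wav"
        if target.exists():
            # Finished in an earlier run, so there is nothing to redo.
            continue
        # Write to a temporary name first so that a crash mid-write
        # never leaves a file that looks complete.
        partial = target.with_name(target.name + ".part")
        synthesize(text, partial)
        partial.replace(target)
        rendered.append(index)
    return rendered
```

In a real worker, `synthesize` would call the TTS engine; here it is injected so the skip logic can be exercised without a model.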
Backend:
- FastAPI - High-performance async web framework
- Celery - Distributed task queue for background processing
- Redis - Message broker and result backend
- SQLAlchemy - ORM for job persistence
- Coqui TTS - Advanced neural TTS engine
- PyTorch - Deep learning framework for TTS models
- NLTK - Natural language processing for text segmentation
Frontend:
- Next.js 16 - React framework with App Router
- React 19 - UI library
- TypeScript - Type-safe development
- shadcn/ui - Premium UI component library
- Tailwind CSS 4 - Utility-first CSS framework
- Axios - HTTP client for API requests
- Lucide React - Icon library
- Web Audio API - Browser-based voice recording
- Node.js 20+ and npm/pnpm
- Python 3.9+
- Redis (or use Docker)
- ffmpeg
- CUDA (optional, for GPU acceleration)
- Clone the repository:
  git clone https://github.com/maru-775/AudioBookMaker.git
  cd AudioBookMaker
- Start all services:
  docker-compose up -d
- Access the application:
  - Frontend: http://localhost:3000
  - Backend API: http://localhost:8000
  - API Documentation: http://localhost:8000/docs
- Navigate to backend directory:
  cd backend
- Create a virtual environment:
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Set environment variables (optional):
  export DEVICE=auto  # Options: auto, cuda, mps, cpu
  export COQUI_TOS_AGREED=1
  export CELERY_BROKER_URL=redis://localhost:6379/0
  export CELERY_RESULT_BACKEND=redis://localhost:6379/0
- Start the FastAPI server:
  uvicorn src.api.main:app --reload --host 0.0.0.0 --port 8000
- Start the Celery worker (in a new terminal):
  celery -A src.core.worker.celery_app worker --loglevel=info
Navigate to frontend directory:
cd frontend -
Install dependencies:
npm install
-
Set environment variables:
# Create .env.local file echo "NEXT_PUBLIC_API_URL=http://localhost:8000" > .env.local
-
Start the development server:
npm run dev
-
Access the application:
- Frontend: http://localhost:3000
- Upload an eBook:
  - Navigate to http://localhost:3000
  - Select an EPUB or PDF file (or paste text directly)
  - Optionally provide a voice sample:
    - Upload: Choose a WAV or MP3 file (recommended 10 seconds)
    - Record: Record your voice directly in the browser (up to 10 seconds)
- Configure conversion:
  - Select target language
  - Adjust speech speed if desired
  - Enable "Preview Mode" to convert only the first paragraph
  - Click "Create Audiobook"
- Monitor progress:
  - View real-time progress in the job history panel
  - Download completed audiobooks directly from the interface
  - Refresh job history manually using the refresh button
The backend provides a RESTful API for programmatic access:
# Create a conversion job
curl -X POST "http://localhost:8000/api/convert" \
-F "file=@book.epub" \
-F "language=en" \
-F "speed=1.0"
# Check job status
curl "http://localhost:8000/api/jobs/{job_id}"
# List all jobs
curl "http://localhost:8000/api/jobs"
# Download audiobook
curl "http://localhost:8000/api/download/{job_id}" -O
Full API documentation is available at http://localhost:8000/docs
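The status endpoint can also be polled from a script. The sketch below uses only the Python standard library and the `/api/jobs/{job_id}` path from the curl examples; the JSON `status` field and the `completed` / `failed` values are assumptions about the response shape, so check http://localhost:8000/docs for the actual schema before relying on them.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Tuple

BASE_URL = "http://localhost:8000"


def job_url(job_id: str, base_url: str = BASE_URL) -> str:
    """Build the status URL for one job."""
    return f"{base_url.rstrip('/')}/api/jobs/{job_id}"


def fetch_json(url: str) -> Dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], Dict] = fetch_json,
    interval: float = 2.0,
    max_polls: int = 300,
    # Assumed terminal states; the real API may use different names.
    done_states: Tuple[str, ...] = ("completed", "failed"),
) -> Dict:
    """Poll the job until it reaches a terminal state or polls run out."""
    for _ in range(max_polls):
        job = fetch(job_url(job_id))
        if job.get("status") in done_states:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing the fetch function as a parameter keeps the polling loop testable without a running server.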
Configuration is managed via environment variables or a .env file in the backend directory:
# Application Settings
APP_NAME=AudioBookMaker API
DEBUG=False
OUTPUT_DIR=audiobooks
# TTS Model Settings
MODEL_NAME=tts_models/multilingual/multi-dataset/xtts_v2
DEVICE=auto # auto, cuda, mps, or cpu
# Celery Settings
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0
# Database (if using PostgreSQL)
DATABASE_URL=postgresql://user:password@localhost/audiobooks
Create .env.local in the frontend directory:
NEXT_PUBLIC_API_URL=http://localhost:8000
The application automatically detects available hardware:
- CUDA (NVIDIA GPUs): Set DEVICE=cuda
- MPS (Apple Silicon): Set DEVICE=mps
- CPU: Set DEVICE=cpu
- Auto-detect: Set DEVICE=auto (recommended)
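To illustrate how the backend settings and the DEVICE option could fit together, here is a minimal sketch that reads the documented variables with their documented defaults and resolves `auto` to a concrete device. It is not the project's `config.py`: the `Settings` class, the function names, and the CUDA-then-MPS-then-CPU preference order are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    model_name: str
    device: str
    broker_url: str
    result_backend: str
    output_dir: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return Settings(
        model_name=env.get("MODEL_NAME", "tts_models/multilingual/multi-dataset/xtts_v2"),
        device=env.get("DEVICE", "auto").lower(),
        broker_url=env.get("CELERY_BROKER_URL", "redis://localhost:6379/0"),
        result_backend=env.get("CELERY_RESULT_BACKEND", "redis://localhost:6379/0"),
        output_dir=env.get("OUTPUT_DIR", "audiobooks"),
    )


def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Turn 'auto' into a concrete device; explicit choices pass through."""
    if requested != "auto":
        return requested
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device(requested: str) -> str:
    """Resolve the device using PyTorch if it is installed, else assume CPU only."""
    try:
        import torch
    except ImportError:
        return resolve_device(requested, False, False)
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return resolve_device(requested, torch.cuda.is_available(), mps_ok)
```

For example, `detect_device(load_settings().device)` would yield `cpu` on a machine without PyTorch or a supported GPU.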
# Build optimized images
docker-compose -f docker-compose.prod.yml build
# Deploy
docker-compose -f docker-compose.prod.yml up -d
AudioBookMaker/
├── backend/
│ ├── src/
│ │ ├── api/ # FastAPI routes and endpoints
│ │ ├── core/ # Core business logic
│ │ │ ├── converter.py # E-book to audiobook conversion
│ │ │ ├── text_processor.py # Text extraction and processing
│ │ │ ├── celery_app.py # Celery configuration
│ │ │ ├── worker.py # Celery worker tasks
│ │ │ └── database.py # Database models
│ │ ├── utils/ # Utility functions
│ │ └── config.py # Configuration management
│ ├── audiobooks/ # Generated audiobook files
│ ├── requirements.txt # Python dependencies
│ └── Dockerfile
├── frontend/
│ ├── src/
│ │ ├── app/ # Next.js app router pages
│ │ ├── components/ # React components
│ │ │ ├── ui/ # shadcn/ui components
│ │ │ └── VoiceRecorder.tsx # Voice recording component
│ │ └── lib/ # Utilities and API client
│ ├── public/ # Static assets
│ ├── package.json
│ └── Dockerfile
├── docker-compose.yml # Docker orchestration
├── LICENSE
└── README.md
- Backend: Add routes in backend/src/api/main.py
- Workers: Add tasks in backend/src/core/worker.py
- Frontend: Create components in frontend/src/components/
This project is licensed under the MIT License. See the LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Made by maru-775