Audiobook Conversion Tool

This tool converts a PDF document into an audiobook. It uses a combination of OCR, image interpretation, and text-to-speech (TTS) technologies to generate an audio file for each page of the document. The tool also provides a server for downloading the generated audio files.

Installation

Clone this repo and create an environment from requirements.txt

The first time you use modal it will prompt you to log in. You will need to set the OPENAI_API_KEY in modal secrets for this to work.

Running audiobook.py tts requires a model to be downloaded that is ~2GB. It will be saved in the repo itself, but is ignored by git.

##Usage

Fill out chapters.yaml with the chapters you actively want converted, going off the zero-indexed page numbers.

To start the conversion process, run the following command:

modal run convert.py

To download files run:

modal run server.py

take the urls generated and replace the ones you see in api.py (those are mine and won't work for you)

Then run:

python api.py

You may to edit the functionality in api.py as appropriate to interact with the api.

Directory Structure

The project uses a specific directory structure for input and output files:

mount/: This directory should contain the input PDF file.
outputs/: This directory is created by the scripts and contains the generated audio files and TTS models.
- outputs/models/: Contains the downloaded TTS models.
- outputs/pages/: Contains directories for each page of the PDF, each with its own audio and image files.

Please ensure that the mount/ directory exists and contains the input PDF file before running the scripts.

Note

The conversion process can take a significant amount of time, especially for large PDF files. The image interpretation is expensive — a full 400 page audiobook could cost $100, $80 in GPT cost from vision and $20 in GPU cost on spot pricing.

This could probably be made cheaper with some optimization, including local llama models (which this has an integration with via Ollama, but is optional.)

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.gitignore		.gitignore
README.md		README.md
api.py		api.py
audiobook.py		audiobook.py
convert.py		convert.py
server.py		server.py
test_audiobook.py		test_audiobook.py
tts_demo.ipynb		tts_demo.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Audiobook Conversion Tool

Installation

Directory Structure

Note

About

Releases

Packages

Languages

maxtheman/audiobook

Folders and files

Latest commit

History

Repository files navigation

Audiobook Conversion Tool

Installation

Directory Structure

Note

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages