- Install ffmpeg and Python 3.10. (Other Python versions are currently untested.)
- `pip install -r requirements.txt`
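A quick sanity check before installing (any Python 3.10.x should do):

```sh
ffmpeg -version     # confirm ffmpeg is on your PATH
python --version    # should report Python 3.10.x
pip install -r requirements.txt
```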
Windows:
- Open a command prompt and run `run.bat`, then select option 2, "Start main server".
- Open another command prompt and run `run.bat` again, this time selecting whichever AI model you want to run.
- Go to localhost:8000
Unix (Linux/Mac):
- Open the terminal and run `make main`.
- Open another terminal and run `make` to see all the possible models you can run. Pick one and run it with `make chatterbox-service`, `make higgs-service`, etc. (see the sketch below).
- Go to localhost:8000
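Put together, the Unix flow looks like this:

```sh
# Terminal 1: start the main server
make main

# Terminal 2: list the available model services, then start one
make
make chatterbox-service   # or make higgs-service, etc.

# Then open http://localhost:8000 in your browser
```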
- Optional: You can run multiple AI models at once if you wish.
- Optional: You can also run Ollama for local speaker identification (see the sketch below).
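Ollama runs as its own process as well; a minimal sketch (the model name here is just an example, not a project requirement -- pull whichever model your text workflow expects):

```sh
ollama serve            # start the Ollama server (skip if it already runs as a background service)
ollama pull llama3.1    # example model choice; substitute your own
```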
Another YouTube video: Windows promo and quick install guide
- Local, open source tool. This repo is MIT licensed and designed for local or hosted usage.
- Model-agnostic, workflow-specific. BookForge Studio is designed for creating fully voiced audiobooks with AI audio models and a unique voice for every character. Different models can be swapped out and even run in parallel if you find one model is better at a certain task. New open source AI audio models will be added as they are released.
- Supports single-speaker and multi-speaker generation. Some models, like dia, only do multi-speaker generation; many only do single-speaker; some, like higgs, do both. We aim to support whichever modes are viable (or both, for experimentation purposes), with voice clones correctly assigned to each party.
- Models run in separate processes from the main server. This means you run the main server, open another command prompt, and then start a process for the model you want to use. Some of these models take tons of VRAM -- higgs specifically is gigantic -- while others, like chatterbox or VibeVoice-1.5B, are smaller. Each 'service' command automatically creates a virtual environment just for the model you choose. Ollama, if you use it, likewise runs as its own process.
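A full session is therefore several terminals running side by side, roughly like this (which services you start is up to you):

```sh
# Terminal 1
make main                 # main server, UI at localhost:8000
# Terminal 2
make chatterbox-service   # smaller model; creates its own venv on first run
# Terminal 3
make higgs-service        # gigantic model; expect heavy VRAM usage
# Terminal 4 (optional)
ollama serve              # local LLM for speaker identification
```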
- Actors: An 'actor' is a voice clip + some extra data which can be reused to represent a specific character. You can 'favorite' actors.
- Voice Modes: A 'voice mode' is a series of 'steps' which each do one task and can be chained together. Start with the default voice modes visible in the interface. You can customize them for a specific workflow (e.g. you may want separate voice modes for high, medium, and low cfg scale inference, in case one works better with a specific character or context).
- BookForge Studio Script: The fundamental 'project file' for BookForge Studio. This file includes all your work on a chapter of a book: each line of dialogue, which character said it, which actor and voice mode are assigned, and links to all generated audio. (We say "chapter" rather than "book" because performance will be awful in several ways if you try to do an entire book in one file.)
- Text Workflows: Various ways to turn text or a CSV into a BFS Script. These can use external APIs, a local LLM (hosted with Ollama), or simply a dragged-in pre-annotated CSV (see the example below).
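For instance, a pre-annotated CSV just pairs each line with its speaker; something along these lines (the column names are illustrative, not the exact schema the workflow expects):

```csv
speaker,text
Narrator,"I have just returned from a visit to my landlord."
Heathcliff,"Come in! Come in!"
```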
Audio files and text files (like an audiobook, a chapter of an audiobook, or CSV files which include audiobook text and speakers) should be placed in the `files/input` folder. You can also drag and drop files in the interface. (Output files will be generated in `files/output`.)
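So the relevant layout is:

```
files/
├── input/    # drop audiobooks, chapters, and speaker-annotated CSVs here
└── output/   # generated files end up here
```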
BookForge Studio comes conveniently pre-loaded with:
- 24+ public domain books (in English), pre-annotated with speakers, from the ANITA dataset
- One book, Wuthering Heights, has also been turned into a "starter project" to show what a basic script looks like. However, you can turn any of the ANITA dataset books into a BFS script using the "Make Script from CSV" workflow.
- 250+ voice clips pre-assigned to 'actor' files from the tts-voices-sampler dataset.
Check out `README-developers.md`.
This repo is MIT licensed.
Chatterbox and VibeVoice are MIT. Dia and Higgs are Apache 2.0.
Some of the voice clips are completely free to use in any context, some are free for non-commercial use only, and there are a couple of other wrinkles. Read the README for our dataset to get the details.
This project was mostly built in mid-2025, and other models have been released while we were making the tutorial videos. Let us know if you want to see specific models added, or even better, make a PR -- the 'microservice' setup means that all you need to do to add a new model to this project is add it in `./models/<model name>/` and `./backend/models/<model name>/` (see the sketch below).
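Concretely, a new model touches just two directories (the role comments here are a rough reading of the layout; `<model name>` is whatever you call it):

```
models/<model name>/           # the standalone model service
backend/models/<model name>/   # the main server's integration for that model
```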
Thanks to psdwizzard and cursedhelm for the help thus far!

