
jnesew/AutoAudio


AutoAudio

AutoAudio converts a book file (EPUB, TXT, Markdown, or RST) into chapter and part audiobook files using ComfyUI + VibeVoice.

What you need before running

1) Python and dependencies

Install project dependencies:

python -m pip install -r requirements.txt

2) System tools

AutoAudio uses ffmpeg and ffprobe for stitching audio and writing metadata. Make sure both are installed and on your PATH.
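Before a long run, you can confirm that both tools are on PATH. The following is a minimal sketch that uses only the Python standard library. The helper name `missing_tools` is hypothetical and is not part of AutoAudio.

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names of required executables that are not found on PATH."""
    # shutil.which returns None when no executable with that name is on PATH.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("ffmpeg and ffprobe found on PATH.")
```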

3) ComfyUI runtime requirements (required for real generation)

AutoAudio expects a running ComfyUI server and a compatible workflow/node setup:

  • ComfyUI server reachable at 127.0.0.1:8188 by default (or set --comfyui-server-address)
  • The VibeVoice Single Speaker custom node available in ComfyUI (VibeVoiceSingleSpeakerNode)
  • A reference voice file available in ComfyUI's input files as default_voice.wav
    • The bundled workflow resources/workflows/vibevoice_single_speaker.json loads this filename by default.
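You can check the first two requirements from Python before starting a real run. The sketch below rests on two assumptions. First, `--comfyui-server-address` takes a plain `host:port` string. Second, the server exposes ComfyUI's standard `GET /object_info/<node_class>` route, which returns an empty JSON object for node classes that are not registered. All function names are hypothetical.

```python
import json
import socket
import urllib.request


def parse_server_address(address="127.0.0.1:8188"):
    """Split a host:port string (the --comfyui-server-address format) into parts."""
    host, _, port = address.rpartition(":")
    return host, int(port)


def server_reachable(address="127.0.0.1:8188", timeout=3.0):
    """Return True if a TCP connection to the ComfyUI address succeeds."""
    host, port = parse_server_address(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False


def node_available(address="127.0.0.1:8188",
                   node="VibeVoiceSingleSpeakerNode", timeout=5.0):
    """Ask ComfyUI whether a node class is registered.

    Assumption: the server implements ComfyUI's /object_info/<node_class>
    route, which returns {} for unknown node classes.
    """
    host, port = parse_server_address(address)
    url = f"http://{host}:{port}/object_info/{node}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        info = json.load(response)
    return node in info
```

Checking the TCP connection first keeps the two failure modes separate: a server that is not running, and a server that is running but has no VibeVoice node.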

If you do not have a live ComfyUI runtime yet, you can still run pipeline logic with --comfyui-mode spoof for testing/development.

Quick usage flow

  1. Start ComfyUI and verify the VibeVoice node loads correctly.
  2. Put your reference voice clip in ComfyUI input files as default_voice.wav.
  3. Choose an input book (.epub, .txt, .md, .markdown, or .rst).
  4. Run AutoAudio from CLI or GUI.
  5. Collect generated chapter/part files from your output directory (default: audiobook_output/).
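Step 3 restricts input to a fixed set of extensions. The sketch below rejects unsupported files early and builds the same command line shown under Run methods. `build_command` is a hypothetical helper. The default `output_dir` mirrors the documented `audiobook_output/` default.

```python
import shlex
from pathlib import Path

# Extensions accepted according to the quick usage flow.
SUPPORTED_EXTENSIONS = {".epub", ".txt", ".md", ".markdown", ".rst"}


def build_command(input_book, output_dir="audiobook_output", extra_args=()):
    """Build an auto_audiobook.py argument list for a supported input book."""
    suffix = Path(input_book).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported input format: {suffix or '(no extension)'}")
    return [
        "python", "auto_audiobook.py",
        "--input-book", str(input_book),
        "--output-dir", str(output_dir),
        *extra_args,  # e.g. --fetch-metadata, --output-format mp3
    ]


if __name__ == "__main__":
    cmd = build_command("book.epub", extra_args=["--fetch-metadata", "--output-format", "mp3"])
    print(shlex.join(cmd))
```

You can pass the resulting list directly to `subprocess.run`, which avoids shell-quoting problems with book paths that contain spaces.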

Run methods

CLI

Basic run:

python auto_audiobook.py --input-book /path/to/book.epub --output-dir /path/to/output

Run with metadata fetch and MP3 output:

python auto_audiobook.py \
  --input-book /path/to/book.epub \
  --output-dir /path/to/output \
  --fetch-metadata \
  --output-format mp3

Resume from a compatible checkpoint left by a previous run:

python auto_audiobook.py --input-book /path/to/book.epub --output-dir /path/to/output --resume yes

GUI

Launch desktop app:

python auto_audiobook.py --gui

Notes:

  • GUI mode requires PySide6 (already included in requirements.txt).
  • In GUI, pick input/output paths, optionally enable Fetch metadata, then click Start.
  • If a compatible checkpoint exists, the GUI enables Resume automatically.
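If the GUI fails to start, first confirm that PySide6 is importable in the interpreter you are using. This is a small sketch; `pyside6_installed` is a hypothetical helper name.

```python
import importlib.util


def pyside6_installed():
    """Return True if PySide6 can be found in the current interpreter."""
    # find_spec locates the package without importing (and initializing) Qt.
    return importlib.util.find_spec("PySide6") is not None


if __name__ == "__main__":
    if pyside6_installed():
        print("PySide6 found; `python auto_audiobook.py --gui` should be able to start.")
    else:
        print("PySide6 missing; reinstall with: python -m pip install -r requirements.txt")
```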

CLI arguments

Input/output and source parsing

  • --input-book <path>: input book file path.
  • --output-dir <path>: output directory for generated files.
  • --source-mode {auto,epub,text}: force source parser mode.
  • --pages-per-chapter <int>: EPUB chapter grouping helper.
  • --target-words-per-chapter <int>: text chapter sizing target.
  • --min-paragraphs-per-chapter <int>: lower bound when grouping text chapters.
  • --chapters-per-part <int>: how many chapter files per final "part" file.
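The README does not spell out how the grouping flags combine. The sketch below shows one plausible interpretation. Paragraphs are packed into chapters until a word target is reached and a minimum paragraph count is met, and chapters are then split into parts of a fixed size, with the last part taking the remainder. Both functions are hypothetical illustrations of `--target-words-per-chapter`, `--min-paragraphs-per-chapter`, and `--chapters-per-part`, not AutoAudio's actual implementation.

```python
import math


def group_paragraphs(paragraphs, target_words=3000, min_paragraphs=3):
    """Greedily pack paragraphs into chapters of roughly target_words words.

    Hypothetical rule: close a chapter once it reaches target_words AND holds
    at least min_paragraphs paragraphs. AutoAudio's real rule may differ.
    """
    chapters, current, words = [], [], 0
    for paragraph in paragraphs:
        current.append(paragraph)
        words += len(paragraph.split())
        if words >= target_words and len(current) >= min_paragraphs:
            chapters.append(current)
            current, words = [], 0
    if current:
        chapters.append(current)  # leftover paragraphs form a final, shorter chapter
    return chapters


def group_into_parts(chapters, chapters_per_part):
    """Split chapters into parts of chapters_per_part; the last part takes the remainder."""
    return [chapters[i:i + chapters_per_part]
            for i in range(0, len(chapters), chapters_per_part)]


# 23 chapters with --chapters-per-part 5 -> ceil(23 / 5) = 5 parts.
parts = group_into_parts(list(range(1, 24)), 5)
print(len(parts), math.ceil(23 / 5))  # 5 5
print([len(p) for p in parts])        # [5, 5, 5, 5, 3]
```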

Generation tuning

  • --max-words-per-chunk <int>
  • --diffusion-steps <int>
  • --temperature <float>
  • --top-p <float>
  • --cfg-scale <float>
  • --free-memory-after-generate (flag)
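`--max-words-per-chunk` caps how much text is sent to the TTS model in one request. The sketch below illustrates one way to apply such a cap. It prefers to break at sentence ends and hard-splits any single sentence that exceeds the limit on its own. This is a hypothetical illustration, not AutoAudio's chunker; the sentence-splitting regex is an assumption and ignores abbreviations such as "Mr.".

```python
import re


def chunk_text(text, max_words=250):
    """Split text into chunks of at most max_words words, breaking at sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        # A sentence longer than the limit is split on word boundaries.
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk if this sentence would push the current one over the limit.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


print(chunk_text("One two three. Four five. Six seven eight nine.", max_words=5))
# ['One two three. Four five.', 'Six seven eight nine.']
```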

Output and metadata

  • --output-format {flac,mp3,m4b}
  • --fetch-metadata (flag; optional online Gutenberg/Gutendex lookup)
  • --gutenberg-id <id> (manual Gutenberg ID override)
  • --title <value> (manual title override)
  • --author <value> (manual author override)

Metadata precedence, from highest to lowest:

  1. User overrides (--title, --author)
  2. Embedded source metadata
  3. Fetched online metadata (if enabled)
  4. Fallback defaults
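The precedence list amounts to a per-field "first non-empty value wins" lookup. The sketch below assumes each source is a plain dict with `title` and `author` keys, and that fetched metadata is `None` when `--fetch-metadata` is not set. The function and key names are hypothetical.

```python
def resolve_metadata(overrides, embedded, fetched, defaults):
    """Pick each field from the highest-precedence source that provides a value."""
    # Order follows the documented precedence: user > embedded > fetched > defaults.
    sources = [overrides, embedded, fetched or {}, defaults]
    resolved = {}
    for field in ("title", "author"):
        for source in sources:
            value = source.get(field)
            if value:  # skip missing or empty values
                resolved[field] = value
                break
    return resolved


print(resolve_metadata(
    overrides={"title": None, "author": "Jane Austen"},            # --author only
    embedded={"title": "Pride and Prejudice", "author": "Austen, Jane"},
    fetched=None,                                                 # --fetch-metadata not set
    defaults={"title": "Unknown Title", "author": "Unknown Author"},
))
# {'title': 'Pride and Prejudice', 'author': 'Jane Austen'}
```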

ComfyUI connection/runtime controls

  • --comfyui-mode {network,spoof}
  • --comfyui-server-address <host:port>
  • --comfyui-timeout-seconds <float>
  • --comfyui-spoof-scenario {success,timeout,malformed_history,missing_view_payload,connection_error}
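During development without a live ComfyUI runtime, you can run each documented spoof scenario in turn. This sketch combines only the flags listed above. `spoof_commands` and `run_all` are hypothetical helpers, and the sketch assumes it runs from the repository root, where `auto_audiobook.py` lives. Each scenario writes to its own output directory and passes `--resume no`, so no run resumes from an earlier scenario's checkpoint.

```python
import subprocess
import sys

SPOOF_SCENARIOS = ["success", "timeout", "malformed_history",
                   "missing_view_payload", "connection_error"]


def spoof_commands(input_book, output_root="spoof_runs"):
    """Return one auto_audiobook.py invocation per documented spoof scenario."""
    return [
        [
            sys.executable, "auto_audiobook.py",
            "--input-book", input_book,
            "--output-dir", f"{output_root}/{scenario}",  # separate dir per scenario
            "--comfyui-mode", "spoof",
            "--comfyui-spoof-scenario", scenario,
            "--resume", "no",
        ]
        for scenario in SPOOF_SCENARIOS
    ]


def run_all(input_book):
    """Run every scenario and print its exit code."""
    for scenario, cmd in zip(SPOOF_SCENARIOS, spoof_commands(input_book)):
        result = subprocess.run(cmd)
        print(f"{scenario}: exit code {result.returncode}")
```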

Run control

  • --resume {auto,yes,no}
  • --gui (launches desktop GUI instead of CLI pipeline run)

Outputs and run artifacts

  • Chapter files: Chapter_###_<title>.<format>
  • Part files: <book title> - Part_###.<format>
  • Segment cache: <output-dir>/.segments/
  • Run log: <output-dir>/autoaudio_debug.log
  • Resume checkpoint state: resources/.autoaudio_state/checkpoint_state.json
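After a run, you can collect the generated files by matching the naming patterns above. This is a minimal sketch; `list_outputs` is a hypothetical helper, and it assumes the chapter number is zero-padded to three digits, as `###` suggests, so plain lexicographic sorting keeps chapters in order.

```python
from pathlib import Path


def list_outputs(output_dir, fmt="flac"):
    """Return (chapter_files, part_files) matching the documented naming patterns."""
    out = Path(output_dir)
    # Zero-padded numbers (Chapter_001, Chapter_002, ...) sort correctly as strings.
    chapters = sorted(out.glob(f"Chapter_*.{fmt}"))
    parts = sorted(out.glob(f"* - Part_*.{fmt}"))
    return chapters, parts


if __name__ == "__main__":
    chapters, parts = list_outputs("audiobook_output", fmt="flac")
    print(f"{len(chapters)} chapter file(s), {len(parts)} part file(s)")
```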

Troubleshooting

  • Cannot connect to ComfyUI: verify server is running and address matches --comfyui-server-address.
  • No audio generated: verify the VibeVoice node is installed and workflow-compatible.
  • Missing reference voice: ensure default_voice.wav exists in ComfyUI input files.
  • Metadata fetch returns nothing: online lookup is optional; run without --fetch-metadata to stay fully offline.

License

AutoAudio source code is licensed under the MIT License. See LICENSE.

Third-party dependencies are licensed under their own terms. See THIRD_PARTY_DEPENDENCIES.md.

About

Automatically generate audiobooks using a TTS model.
