A Bash script that converts Markdown files into narrated audio using Text-to-Speech (TTS) via Docker and the Coqui-TTS framework.
- Converts a Markdown (`.md`) file to MP3 audio.
- Uses the VCTK multi-speaker TTS model with customizable voices.
- Cleans Markdown syntax (headings, links, code blocks, etc.) for more natural-sounding speech.
- Splits large files into manageable chunks for processing.
- Optional playback of the generated audio.
- Docker (with GPU support recommended for faster synthesis)
- Docker Compose
- Python 3 (for model invocation and Markdown cleaning)
- ffmpeg (for audio processing)
- mp3wrap (for merging audio chunks)
- str (for string manipulation)
- Clone the repository:

      git clone https://github.com/ngpepin/TTS.git
      cd narrate-md

- Make the script executable:

      chmod +x narrate-md.sh

- (Optional) Place the script in your `PATH` for global access:

      sudo ln -s "$(pwd)/narrate-md.sh" /usr/local/bin/narrate-md

`./narrate-md.sh [-p] <input.md>`

| Flag | Description |
| ---- | ------------------------------------ |
| `-p` | Play the generated audio immediately |

`./narrate-md.sh -p README.md  # Converts README.md to README.mp3 and plays it`

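
The usage line above implies an optional `-p` flag followed by a single input file, with the output name derived from the input. A minimal sketch of how that parsing could look, assuming `getopts` is used; the function name `parse_args` and the variables `PLAY`, `INPUT`, and `OUTPUT` are hypothetical and not taken from `narrate-md.sh`:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: parse an optional -p flag and derive the MP3 name.
# parse_args, PLAY, INPUT and OUTPUT are illustrative names only.

parse_args() {
    local OPTIND=1 opt
    PLAY=0
    while getopts ":p" opt; do
        case "$opt" in
            p) PLAY=1 ;;   # -p: play the audio after generation
            *) echo "Usage: narrate-md.sh [-p] <input.md>" >&2; return 1 ;;
        esac
    done
    shift $((OPTIND - 1))

    # Exactly one positional argument (the Markdown file) is expected.
    if [ "$#" -ne 1 ]; then
        echo "Usage: narrate-md.sh [-p] <input.md>" >&2
        return 1
    fi
    INPUT="$1"
    OUTPUT="${INPUT%.md}.mp3"   # README.md -> README.mp3
}

parse_args -p README.md && echo "play=$PLAY input=$INPUT output=$OUTPUT"
```

Stripping the suffix with `${INPUT%.md}` keeps any leading directory, so the MP3 would land next to the source file under this assumption.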
Edit these variables in the script to customize behavior:

    MODEL="tts_models/en/vctk/vits"   # TTS model (default: VCTK multi-speaker)
    SPEAKER="p230"                    # Speaker ID (e.g., p230-p260 for VCTK)
    MAX_LINES_PER_CHUNK=20            # Split large files into chunks of this size

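
To illustrate how `MODEL` and `SPEAKER` might reach the synthesizer, the sketch below composes a Coqui `tts` command line and prints it instead of running it. This is an assumption about the wiring: the script itself may invoke the model differently (for example through Python inside the Docker container), and `build_tts_cmd` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map the configuration variables onto Coqui `tts` CLI flags.
# Nothing is synthesized here; the command is only printed (dry run).

MODEL="tts_models/en/vctk/vits"
SPEAKER="p230"

build_tts_cmd() {
    local text="$1" wav_file="$2"
    local cmd=(tts
        --model_name "$MODEL"
        --speaker_idx "$SPEAKER"
        --text "$text"
        --out_path "$wav_file")
    # %q shell-quotes each argument so the printed line can be pasted safely.
    printf '%q ' "${cmd[@]}"
    printf '\n'
}

build_tts_cmd "Hello from narrate-md" chunk_001.wav

# Trying another voice only needs a different speaker ID:
SPEAKER="p245" build_tts_cmd "Hello from narrate-md" chunk_001.wav
```

If the `tts` CLI is available, `tts --model_name tts_models/en/vctk/vits --list_speaker_idxs` should list the speaker IDs the model accepts.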
- Markdown Cleaning:
  - Strips headings, links, code blocks, and formatting
  - Adds pauses for punctuation (e.g., commas → ",,")
- Audio Generation:
  - Uses Coqui-TTS in a Docker container
  - Converts text to WAV → MP3 with tempo adjustment
- Chunk Handling:
  - Splits large files (>20 lines by default)
  - Merges chunks into a single MP3

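
The cleaning and chunking steps above can be approximated with standard tools. The following is a rough sketch, not the script's actual implementation (the requirements suggest the real cleaning is done in Python); `clean_markdown` and `split_into_chunks` are hypothetical names, and the `sed` rules cover only the common cases.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "clean, then split" stages using sed and split.

MAX_LINES_PER_CHUNK=20

# Read Markdown on stdin, write speech-friendly text on stdout.
clean_markdown() {
    sed -E \
        -e '/^`{3}/,/^`{3}/d' \
        -e 's/^#{1,6}[[:space:]]+//' \
        -e 's/!\[([^]]*)\]\([^)]*\)/\1/g' \
        -e 's/\[([^]]*)\]\([^)]*\)/\1/g' \
        -e 's/(\*\*|__|\*|`)//g' \
        -e 's/,/,,/g'
    # Rules, in order: drop fenced code blocks, strip heading markers,
    # keep only image alt text, keep only link text, remove emphasis and
    # inline-code markers, double commas to lengthen the pause.
}

# Read text on stdin and write files of at most MAX_LINES_PER_CHUNK lines
# (chunk_aa, chunk_ab, ...) into the given directory.
split_into_chunks() {
    local out_dir="$1"
    mkdir -p "$out_dir"
    split -l "$MAX_LINES_PER_CHUNK" - "$out_dir/chunk_"
}

work_dir="$(mktemp -d)"
printf '%s\n' '# Title' 'See [the docs](https://example.com), then **run** it.' |
    clean_markdown | split_into_chunks "$work_dir"
cat "$work_dir"/chunk_*
rm -rf "$work_dir"
```

Each chunk would then be synthesized separately and the resulting MP3 files merged in suffix order, which is why `split`'s sortable suffixes are convenient here.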
- VCTK: Multi-speaker English (113 speakers, various accents)
- LJSpeech: High-quality single-speaker English (via Tacotron2)

MIT License. See LICENSE for details.

Warning: This is a work in progress. Error handling is minimal, and results may vary with complex Markdown.