TTS

Convert a Markdown file into an audio narration using VCTK and Coqui-TTS.

A Bash script that converts Markdown files into narrated audio using Text-to-Speech (TTS) via Docker and the Coqui-TTS framework.

Features

Converts a Markdown (.md) file to MP3 audio.
Uses the VCTK multi-speaker TTS model with customizable voices.
Cleans Markdown syntax (headings, links, code blocks, etc.) for more natural-sounding speech.
Splits large files into manageable chunks for processing.
Optional playback of the generated audio.

Prerequisites

Docker (with GPU support recommended for faster synthesis)
Docker Compose
Python 3 (for model invocation and Markdown cleaning)
ffmpeg (for audio processing)
mp3wrap (for merging audio chunks)
str (for string manipulation)
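Before running the script, it can help to confirm that the command-line tools are actually on your PATH. The snippet below is a minimal sketch, not part of `narrate-md.sh`; the helper name `missing_tools` is hypothetical, and the command names passed to it are assumptions based on the list above (Docker Compose may be installed as a `docker compose` plugin rather than a separate binary, so it is not checked here).

```shell
# Print the name of every given command that is not found on PATH,
# one per line. Prints nothing when all commands are available.
# (missing_tools is a hypothetical helper, not part of narrate-md.sh.)
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumed command names for the prerequisites; adjust to your setup.
missing="$(missing_tools docker python3 ffmpeg mp3wrap)"
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are printed on a single line.
    echo "Missing tools:" $missing >&2
fi
```

If the snippet prints nothing, all four commands were found.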

Installation

Clone the repository:

git clone https://github.com/ngpepin/TTS.git
cd TTS

Make the script executable:
```
chmod +x narrate-md.sh
```

(Optional) Place the script in your PATH for global access:

sudo ln -s "$(pwd)/narrate-md.sh" /usr/local/bin/narrate-md

Usage

   ./narrate-md.sh [-p] <input.md>

Options

| Flag | Description                          |
| ---- | ------------------------------------ |
| `-p` | Play the generated audio immediately |

Example

./narrate-md.sh -p README.md  # Converts README.md to README.mp3 and plays it
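For readers who want to adapt the interface, the following sketch shows one way the documented `[-p] <input.md>` usage and the `README.md` to `README.mp3` naming could be implemented. It is an assumption about how such parsing might look, not the script's actual code; `parse_args`, `PLAY`, `INPUT`, and `OUTPUT` are hypothetical names.

```shell
# Hypothetical re-creation of the documented interface: [-p] <input.md>
# Sets PLAY (0 or 1), INPUT, and OUTPUT; returns 1 on bad usage.
parse_args() {
    PLAY=0
    OPTIND=1    # reset so the function can be called more than once
    while getopts "p" opt; do
        case "$opt" in
            p) PLAY=1 ;;
            *) echo "Usage: narrate-md.sh [-p] <input.md>" >&2; return 1 ;;
        esac
    done
    shift $((OPTIND - 1))

    # Exactly one input file is expected after the options.
    if [ $# -ne 1 ]; then
        echo "Usage: narrate-md.sh [-p] <input.md>" >&2
        return 1
    fi

    INPUT="$1"
    # Replace a trailing .md with .mp3 (README.md becomes README.mp3).
    OUTPUT="${INPUT%.md}.mp3"
}

parse_args -p README.md && echo "play=$PLAY in=$INPUT out=$OUTPUT"
# prints: play=1 in=README.md out=README.mp3
```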

Configuration

Edit these variables in the script to customize behavior:

MODEL="tts_models/en/vctk/vits"  # TTS model (default: VCTK multi-speaker)
SPEAKER="p230"                   # Speaker ID (e.g., p230-p260 for VCTK)
MAX_LINES_PER_CHUNK=20           # Split large files into chunks of this size
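If you would rather not edit the script for every run, one common shell pattern is to treat the values above as defaults that the environment can override. This is a sketch of that pattern only; the script as documented uses plain assignments, so the override works only if you change those lines yourself.

```shell
# Assumed variant of the configuration block: keep the documented
# defaults, but let a value already set in the environment win.
MODEL="${MODEL:-tts_models/en/vctk/vits}"
SPEAKER="${SPEAKER:-p230}"
MAX_LINES_PER_CHUNK="${MAX_LINES_PER_CHUNK:-20}"

echo "model=$MODEL speaker=$SPEAKER chunk=$MAX_LINES_PER_CHUNK"
```

With that change in place, a one-off run with a different voice could look like `SPEAKER=p245 ./narrate-md.sh notes.md`, where `p245` is just an example ID.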

Technical Details

Pipeline

Markdown Cleaning:
- Strips headings, links, code blocks, and formatting
- Adds pauses for punctuation (e.g., commas → ",,")
Audio Generation:
- Uses Coqui-TTS in a Docker container
- Synthesizes each chunk to WAV, then converts it to MP3 with a tempo adjustment
Chunk Handling:
- Splits large files (>20 lines by default)
- Merges chunks into a single MP3
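The cleaning and chunking stages above can be approximated with standard tools. The sketch below is illustrative and is not the script's actual implementation (the prerequisites suggest the real cleaning is done in Python); `clean_markdown` and `split_into_chunks` are hypothetical names, and the `sed` rules cover only the simple cases named in the list.

```shell
# Sketch only: an approximation of the cleaning and chunking steps.

# Read Markdown on stdin, write speech-friendly text on stdout.
# Rules, in order:
#   1. delete fenced code blocks (from one ``` line to the next)
#   2. strip leading heading markers (# through ######)
#   3. replace [text](url) links with just the link text
#   4. remove emphasis and inline-code markers (* and `)
#   5. double each comma to lengthen the pause
clean_markdown() {
    sed -e '/^```/,/^```/d' \
        -e 's/^#\{1,6\} *//' \
        -e 's/\[\([^]]*\)]([^)]*)/\1/g' \
        -e 's/[*`]//g' \
        -e 's/,/,,/g'
}

# Split a cleaned text file into chunks of at most N lines.
#   $1 = input file, $2 = lines per chunk, $3 = output prefix
# Produces files named <prefix>aa, <prefix>ab, ...
split_into_chunks() {
    split -l "$2" "$1" "$3"
}

printf '%s\n' '# Title' 'Hello, **world**.' | clean_markdown
# prints:
#   Title
#   Hello,, world.
```

Each chunk would then go through synthesis, conversion, and merging. As a rough guide, ffmpeg can change tempo with its `atempo` audio filter (for example `ffmpeg -i chunk.wav -filter:a "atempo=1.1" chunk.mp3`, where 1.1 is an assumed value), and mp3wrap takes an output name followed by the input files.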

Supported Models

VCTK: Multi-speaker English (about 110 speakers, various accents)
LJSpeech: High-quality single-speaker English (via Tacotron2)

License

MIT License. See LICENSE for details.

Warning
This is a work in progress. Error handling is minimal, and results may vary with complex Markdown.
