Listen to technical Markdown out loud, with interactive pauses on code blocks.
md-tts reads a Markdown file aloud and stops on every code block, table and flashcard so you can actually look at the screen and study. It recognises `<details><summary>Q</summary>A</details>` blocks as flashcards (question → wait → answer) and detects the dominant language of the document (Spanish or English) to pick a single TTS voice for the whole session.
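The dominant-language detection can be pictured as a stopword-counting pass over the whole document. This is a minimal sketch under assumed names (`detect_dominant_language` and the stopword sets are illustrative, not the actual md-tts internals):

```python
import re

# Illustrative stopword sets; the real detector may use a different word list.
ES_STOPWORDS = {"el", "la", "los", "las", "de", "que", "y", "en", "un", "una", "es", "para"}
EN_STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "that", "for", "it", "on"}

def detect_dominant_language(text: str) -> str:
    """Return 'es' or 'en' depending on which stopword set dominates the text."""
    words = re.findall(r"[a-záéíóúñü]+", text.lower())
    es_hits = sum(w in ES_STOPWORDS for w in words)
    en_hits = sum(w in EN_STOPWORDS for w in words)
    return "es" if es_hits > en_hits else "en"
```

Counting over the whole document (rather than per paragraph) is what gives a single, stable session voice.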
A --no-pause "podcast mode" is included for when you just want continuous playback in the background (commute, gym): instead of waiting on code blocks, it announces them and moves on.
Existing TTS tools for Markdown either:
- treat code blocks as silence and skip them, leaving the listener confused about what just happened;
- read code character-by-character as if it were prose ("open-paren-self-comma-x"), which is unusable; or
- support SSML pauses but not interactive pauses where playback waits for the listener.
After testing 8+ tools (Speechify, NaturalReader, Study MD Desk, VoxTrack and several SSML-based pipelines), I found none that offered the combination *parse Markdown structure → speak prose → stop on code → wait for me*. md-tts is a small Python CLI that does exactly that.
It is intentionally minimal. It targets developers who want to revise their own technical notes while away from the keyboard.
- 🛑 Interactive pauses on code blocks and tables.
- 🎴 Flashcard mode for `<details><summary>Q</summary>A</details>` (speak Q, wait, speak A).
- 🌍 ES/EN dominant-language detection: the parser picks a single session voice based on the document's dominant language. Per-paragraph voice switching was tried and proved unstable on SAPI5; it lives in the roadmap.
- 🎧 Podcast mode (`--no-pause`) that announces skipped blocks in the chosen language instead of waiting.
- 🔊 Cross-platform TTS via `pyttsx4` (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux). No cloud account, no API key.
- 🌐 Optional Edge neural voices (`--backend edge`): natural-sounding Microsoft voices, picking a voice per paragraph based on the detected language. Requires internet.
- 🧪 Unit tested on Python 3.11 / 3.12 / 3.13 (see CI).
```bash
pip install md-tts
```

This gives you the offline pyttsx4 backend (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux) plus the Markdown parser and interactive CLI: no API keys, no internet.

On Linux you also need `espeak` installed at the system level: `sudo apt-get install espeak libespeak1`.
| Install | Adds | When to use |
|---|---|---|
| `pip install md-tts` | base only | Local TTS playback, no neural voices. |
| `pip install "md-tts[edge]"` | edge-tts + pygame | Microsoft Edge neural voices (`--backend edge`) and real pause/resume during playback. |
| `pip install "md-tts[export]"` | edge-tts only | MP3 export (`--export out.mp3`) on headless or mobile environments where pygame/SDL2 is hard to install. |
pygame requires SDL2 native libraries and is painful to build on Termux. If you only want to generate MP3 files and listen to them with a regular Android audio player, use the `[export]` extra:

```bash
pkg install python
pip install "md-tts[export]"
md-tts notes.md --backend edge --export notes.mp3
```

That avoids pygame entirely. For local playback on Termux with pyttsx4 (no Edge), also run `pkg install espeak`. Interactive controls (SPACE pause, +/- rate) work only with the `[edge]` extra, which is not recommended on Termux.
```bash
git clone https://github.com/jmponcebe/md-tts.git
cd md-tts
uv sync --extra dev   # or: pip install -e ".[dev]"
```

```bash
# Default: interactive — ENTER skips each code block / table / flashcard.
md-tts notes.md

# Podcast mode: never wait, just announce skipped blocks.
md-tts notes.md --no-pause

# Force a language (no auto-detect):
md-tts notes.md --lang es

# Force a specific voice by id (use --list-voices to discover them):
md-tts notes.md --voice "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_ES-ES_HELENA_11.0"

# Tune speed:
md-tts notes.md --rate 220

# Inspect voices available on this system (path is optional with this flag):
md-tts --list-voices

# Switch to Microsoft Edge neural voices (requires internet, sounds much better).
# Edge mode auto-picks an es-ES voice for Spanish paragraphs and en-US for
# English ones, so a bilingual document is read with the right voice per
# paragraph automatically.
md-tts notes.md --backend edge

# Inspect the Edge voice catalogue:
md-tts --backend edge --list-voices

# Export to MP3 for offline listening (commute, gym, etc.):
md-tts notes.md --backend edge --export notes.mp3
# Code blocks become short "skipping code block" announcements; <details>
# cards keep their question + 3 s silence + answer pattern.
```

You can also run the module directly:
```bash
python -m md_tts notes.md
```

| Markdown construct | Behaviour |
|---|---|
| Headings | Spoken with a `Chapter:` / `Section:` prefix (`Capítulo:` in Spanish). |
| Paragraphs | Spoken as prose. |
| Inline code `` ` `` | Quoted in the spoken output (e.g. 'git status') so it is audibly distinct from prose. |
| Fenced code blocks | Pause + print to terminal. |
| Tables | Pause + print rows. |
| Inline images | Announced inline as `[imagen: alt]`. |
| Lists | Spoken as `Punto 1: ...`, `Punto 2: ...` (Spanish prefix currently used in both languages). |
| Block quotes | Prefixed with `Cita:`. |
| HR (`---`) | Spoken as `Separador.` |
| `<details><summary>Q</summary>A</details>` | Flashcard: speak Q, wait for ENTER, speak A. |
Math blocks (`$$ ... $$`) and standalone image blocks are not yet detected as pause points; they fall through as text. Adding them is on the roadmap.
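The inline-code quoting behaviour from the table above can be sketched as a single substitution over the prose text. This is a minimal illustration (the real md-tts parser works on markdown-it tokens, and the function name is an assumption):

```python
import re

def quote_inline_code(text: str) -> str:
    """Replace `code` spans with quoted speech, e.g. `git status` -> 'git status'.

    A sketch of the behaviour described above, not the actual implementation.
    """
    return re.sub(r"`([^`]+)`", r"'\1'", text)
```

The single quotes make the TTS engine pause slightly around the span, which is what makes the code audibly distinct from prose.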
```text
.md file
  │
  ▼
parser.parse_markdown(text) → Iterator[Block]
  │   kind ∈ {text, code, table, card}
  ▼
cli.run()  ← argparse + interactive loop
  │
  ▼
reader.build_reader(backend) → LocalReader (pyttsx4) or EdgeReader (edge-tts)
```
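The `Block` stream in the diagram can be pictured as a small dataclass plus a generator. The following is a sketch under assumed field names (the real parser builds on markdown-it-py tokens rather than a hand-rolled splitter):

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass
class Block:
    kind: str   # one of {"text", "code", "table", "card"}; names are assumptions
    text: str   # prose to speak, or raw content to print during a pause

def parse_markdown(text: str) -> Iterator[Block]:
    """Naive fence-aware splitter, for illustration only."""
    in_code, buf = False, []
    for line in text.splitlines():
        if line.strip().startswith("```"):
            # A fence closes the current run: prose before it, code inside it.
            yield Block("code" if in_code else "text", "\n".join(buf))
            in_code, buf = not in_code, []
        else:
            buf.append(line)
    if buf:
        yield Block("text", "\n".join(buf))
```

The CLI loop then just iterates: speak `text` blocks, pause and print `code`/`table`/`card` blocks.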
Three modules plus per-backend implementations, ~700 lines total. The parser builds on top of markdown-it-py and pre-processes <details> HTML blocks with a regex/placeholder trick before parsing, because markdown-it treats raw HTML as opaque tokens.
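The `<details>` pre-processing could look roughly like this. It is a hedged sketch of the regex/placeholder idea only (function name and placeholder format are assumptions, not the actual md-tts code):

```python
import re

DETAILS_RE = re.compile(
    r"<details>\s*<summary>(?P<q>.*?)</summary>(?P<a>.*?)</details>",
    re.DOTALL,
)

def extract_cards(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Pull <details> flashcards out of the Markdown before parsing.

    markdown-it would treat the raw HTML as an opaque token, so each card is
    replaced with a plain-text placeholder that survives parsing and is later
    resolved back to its (question, answer) pair.
    """
    cards: list[tuple[str, str]] = []

    def _sub(m: re.Match) -> str:
        cards.append((m.group("q").strip(), m.group("a").strip()))
        return f"\n\nCARD-PLACEHOLDER-{len(cards) - 1}\n\n"

    return DETAILS_RE.sub(_sub, text), cards
```

Surrounding the placeholder with blank lines keeps it a standalone paragraph, so markdown-it emits it as its own token.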
The local backend uses pyttsx4 (a maintained fork of pyttsx3) because pyttsx3 2.99 exhibits a SAPI5 bug on Windows where only the first runAndWait() call produces audio. The edge backend uses edge-tts to call Microsoft Edge's neural voices over HTTPS (no Microsoft account, no API key) and plays the resulting MP3 with pygame.mixer.music, which exposes real pause/unpause cross-platform (SDL_mixer under the hood) — that's what enables the SPACE control during a paragraph. Per-paragraph voice switching works on edge because each utterance is an independent HTTP request, with no shared engine state to corrupt.
- Interactive controls during playback (SPACE / s / n / b / +/- / q) — v0.3
- Optional cloud-quality TTS backend (Microsoft Edge neural voices) — v0.2
- Rewind / skip-back during interactive mode — v0.3
- MP3 export of an entire document for offline mobile listening — v0.4
- PyPI release (`pip install md-tts`)
- Math blocks (`$$ ... $$`) detected as pause points instead of being read as prose
- Standalone image blocks announced as `[image: alt-text]` instead of being silently flattened
- Bookmarks: persist a per-document position so `--resume` picks up where you left off
- `--chapter` flag to start playback from a specific heading
- Real-time rate change (requires a streaming pitch-preserving resampler, which is non-trivial)
- More backends: Piper (local neural, fast, free), Azure TTS (premium voices via API key)
```bash
uv sync --extra dev   # install dev extras (pytest, pytest-cov, ruff)
uv run pytest         # 48 tests
uv run ruff check .
uv run ruff format .
```

Conventional commits, feature branches off main, squash-merge by default. See .github/copilot-instructions.md for the full contributor guide.
MIT — see LICENSE.
Jose María Ponce Bernabé. Built as a side-project while studying for AI / Data Engineering interviews — needed a way to revise PharmaGraphRAG and DengueMLOps notes during commutes.