# md-tts

Listen to technical Markdown out loud, with interactive pauses on code blocks.


md-tts reads a Markdown file aloud and stops on every code block, table and flashcard so you can actually look at the screen and study. It recognises `<details><summary>Q</summary>A</details>` blocks as flashcards (question → wait → answer) and detects the dominant language of the document (Spanish or English) to pick a single TTS voice for the whole session.

A `--no-pause` "podcast mode" is included for when you just want continuous playback in the background (commute, gym): instead of waiting on code blocks, it announces them and moves on.

## Why this exists

Existing TTS tools for Markdown either:

- treat code blocks as silence and skip them, leaving the listener confused about what just happened;
- read code character-by-character as if it were prose ("open-paren-self-comma-x"), which is unusable; or
- support SSML pauses but not interactive pauses where playback waits for the listener.

After testing 8+ tools (Speechify, NaturalReader, Study MD Desk, VoxTrack and several SSML-based pipelines), none offered the combination of "parse Markdown structure → speak prose → stop on code → wait for me". md-tts is a small Python CLI that does exactly that.

It is intentionally minimal. It targets developers who want to revise their own technical notes while away from the keyboard.

## Features

- 🛑 Interactive pauses on code blocks and tables.
- 🎴 Flashcard mode for `<details><summary>Q</summary>A</details>` (speak Q, wait, speak A).
- 🌍 ES/EN dominant-language detection: the parser picks a single session voice based on the document’s dominant language. Per-paragraph voice switching was tried and proved unstable on SAPI5; it lives in the roadmap.
- 🎧 Podcast mode (`--no-pause`) that announces skipped blocks in the chosen language instead of waiting.
- 🔊 Cross-platform TTS via pyttsx4 (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux). No cloud account, no API key.
- 🌐 Optional Edge neural voices (`--backend edge`): natural-sounding Microsoft voices, one voice per paragraph based on the detected language. Requires internet.
- 🧪 Unit tested on Python 3.11 / 3.12 / 3.13 (see CI).
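Dominant-language detection can be done without any ML dependency. The sketch below is a hypothetical illustration of the idea (counting common stopwords per language and picking the winner); the function name and word lists are illustrative, not the actual md-tts API.

```python
# Hypothetical sketch: detect the dominant language (es/en) of a document
# by counting frequent stopwords for each language. Names and word lists
# are illustrative, not md-tts internals.
import re

ES_STOPWORDS = {"el", "la", "los", "las", "de", "que", "y", "en", "un", "una", "es", "para"}
EN_STOPWORDS = {"the", "of", "and", "to", "in", "a", "is", "that", "for", "with", "on", "as"}


def detect_dominant_language(text: str) -> str:
    """Return 'es' or 'en' depending on which stopword set dominates."""
    words = re.findall(r"[a-záéíóúñü]+", text.lower())
    es_hits = sum(w in ES_STOPWORDS for w in words)
    en_hits = sum(w in EN_STOPWORDS for w in words)
    return "es" if es_hits > en_hits else "en"
```

A whole-document decision like this is what makes a single stable session voice possible on the local backend.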

## Installation

```bash
pip install md-tts
```

This gives you the offline pyttsx4 backend (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux) plus the Markdown parser and interactive CLI — no API keys, no internet.

On Linux you also need eSpeak installed at the system level: `sudo apt-get install espeak libespeak1`.

### Optional extras

| Install | Adds | When to use |
| --- | --- | --- |
| `pip install md-tts` | base only | Local TTS playback, no neural voices. |
| `pip install "md-tts[edge]"` | edge-tts + pygame | Microsoft Edge neural voices (`--backend edge`) and real pause/resume during playback. |
| `pip install "md-tts[export]"` | edge-tts only | MP3 export (`--export out.mp3`) on headless or mobile environments where pygame/SDL2 is hard to install. |

### Termux / Android

pygame requires SDL2 native libraries and is painful to build on Termux. If you only want to generate MP3 files and listen to them with a regular Android audio player, use the `[export]` extra:

```bash
pkg install python
pip install "md-tts[export]"
md-tts notes.md --backend edge --export notes.mp3
```

That avoids pygame entirely. For local playback on Termux with pyttsx4 (no Edge), also run `pkg install espeak`. Interactive controls (SPACE pause, +/- rate) work only with the `[edge]` extra, which is not recommended on Termux.

### From source (development)

```bash
git clone https://github.com/jmponcebe/md-tts.git
cd md-tts
uv sync --extra dev      # or: pip install -e ".[dev]"
```

## Usage

```bash
# Default: interactive — ENTER skips each code block / table / flashcard.
md-tts notes.md

# Podcast mode: never wait, just announce skipped blocks.
md-tts notes.md --no-pause

# Force a language (no auto-detect):
md-tts notes.md --lang es

# Force a specific voice by id (use --list-voices to discover them):
md-tts notes.md --voice "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_ES-ES_HELENA_11.0"

# Tune speed:
md-tts notes.md --rate 220

# Inspect voices available on this system (path is optional with this flag):
md-tts --list-voices

# Switch to Microsoft Edge neural voices (requires internet, sounds much
# better). Edge mode auto-picks an es-ES voice for Spanish paragraphs and an
# en-US voice for English ones, so a bilingual document is read with the
# right voice per paragraph automatically.
md-tts notes.md --backend edge

# Inspect the Edge voice catalogue:
md-tts --backend edge --list-voices

# Export to MP3 for offline listening (commute, gym, etc.):
md-tts notes.md --backend edge --export notes.mp3
# Code blocks become short "skipping code block" announcements; <details>
# cards keep their question + 3 s silence + answer pattern.
```

You can also run the module directly:

```bash
python -m md_tts notes.md
```

## Markdown features supported

| Markdown construct | Behaviour |
| --- | --- |
| Headings | Spoken with "Chapter:" / "Section:" prefix (or "Capítulo:" in Spanish). |
| Paragraphs | Spoken as prose. |
| Inline code | Quoted in the spoken output (e.g. 'git status') so it’s audibly distinct from prose. |
| Fenced code blocks | Pause + print to terminal. |
| Tables | Pause + print rows. |
| Inline images | Announced inline as [imagen: alt]. |
| Lists | Spoken as "Punto 1: ...", "Punto 2: ..." (Spanish prefix used in both languages currently). |
| Block quotes | Prefixed with "Cita:". |
| HR (`---`) | Spoken as "Separador". |
| `<details><summary>Q</summary>A</details>` | Flashcard: speak Q, wait for ENTER, speak A. |

Math blocks (`$$ ... $$`) and standalone image blocks are not yet detected as pause points — they fall through as text. Adding them is on the roadmap.
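Detecting fenced math blocks as pause points is mostly a regex problem. The sketch below is a hypothetical illustration of what that roadmap item could look like (not current behaviour; the name and pattern are illustrative):

```python
# Hypothetical sketch: find $$ ... $$ math blocks so they could become
# pause points instead of being read as prose. Illustrative only — this
# is a roadmap item, not current md-tts behaviour.
import re

MATH_BLOCK_RE = re.compile(r"^\$\$\n(?P<body>.*?)\n\$\$$", re.DOTALL | re.MULTILINE)


def find_math_blocks(text: str) -> list[str]:
    """Return the bodies of all $$ ... $$ blocks in the document."""
    return [m.group("body") for m in MATH_BLOCK_RE.finditer(text)]
```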

## Architecture

```
.md file
   │
   ▼
parser.parse_markdown(text)         → Iterator[Block]
   │                                  kind ∈ {text, code, table, card}
   ▼
cli.run()                           ← argparse + interactive loop
   │
   ▼
reader.build_reader(backend)        → LocalReader (pyttsx4) or EdgeReader (edge-tts)
```

Three modules plus per-backend implementations, ~700 lines total. The parser builds on top of markdown-it-py and pre-processes `<details>` HTML blocks with a regex/placeholder trick before parsing, because markdown-it treats raw HTML as opaque tokens.
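The placeholder trick can be sketched roughly like this: swap each `<details>` block for a unique marker before markdown-it-py sees the text, keeping the Q/A pair in a side table for later. This is a hypothetical illustration; the real parser's names and internals may differ.

```python
# Hypothetical sketch of the <details> pre-processing step: replace each
# flashcard block with a placeholder the Markdown parser will pass through
# untouched, and remember the Q/A pair on the side. Illustrative only.
import re

DETAILS_RE = re.compile(
    r"<details>\s*<summary>(?P<q>.*?)</summary>(?P<a>.*?)</details>",
    re.DOTALL,
)


def extract_flashcards(text: str) -> tuple[str, dict[str, tuple[str, str]]]:
    """Return (text with placeholders, {placeholder: (question, answer)})."""
    cards: dict[str, tuple[str, str]] = {}

    def _replace(m: re.Match) -> str:
        key = f"@@CARD{len(cards)}@@"
        cards[key] = (m.group("q").strip(), m.group("a").strip())
        return key

    return DETAILS_RE.sub(_replace, text), cards
```

After markdown-it-py has tokenised the cleaned text, the placeholders can be matched back to their Q/A pairs and emitted as `card` blocks.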

The local backend uses pyttsx4 (a maintained fork of pyttsx3) because pyttsx3 2.99 exhibits a SAPI5 bug on Windows where only the first `runAndWait()` call produces audio. The edge backend uses edge-tts to call Microsoft Edge's neural voices over HTTPS (no Microsoft account, no API key) and plays the resulting MP3 with `pygame.mixer.music`, which exposes real pause/unpause cross-platform (SDL_mixer under the hood); that is what enables the SPACE control during a paragraph. Per-paragraph voice switching works on the edge backend because each utterance is an independent HTTP request, with no shared engine state to corrupt.
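Because each edge utterance is a fresh request, choosing a voice per paragraph reduces to a lookup on the detected language. The mapping below is a hypothetical sketch: the two voice names are real edge-tts voice identifiers, but the function and default are illustrative, not the md-tts internals.

```python
# Hypothetical sketch of per-paragraph voice selection in the edge backend.
# The voice identifiers are real edge-tts voices; the mapping and function
# are illustrative only.
VOICE_BY_LANG = {
    "es": "es-ES-ElviraNeural",
    "en": "en-US-AriaNeural",
}


def pick_voice(lang: str, default: str = "en") -> str:
    """Map a detected paragraph language to an edge-tts voice name."""
    return VOICE_BY_LANG.get(lang, VOICE_BY_LANG[default])
```

Each paragraph would then be synthesised with something like `edge_tts.Communicate(paragraph, pick_voice(lang))`, with no engine state carried over between utterances.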

## Roadmap

- Interactive controls during playback (SPACE / s / n / b / +/- / q) — v0.3
- Optional cloud-quality TTS backend (Microsoft Edge neural voices) — v0.2
- Rewind / skip-back during interactive mode — v0.3
- MP3 export of an entire document for offline mobile listening — v0.4
- PyPI release (`pip install md-tts`)
- Math blocks (`$$ ... $$`) detected as pause points instead of being read as prose
- Standalone image blocks announced as [image: alt-text] instead of being silently flattened
- Bookmarks: persist a per-document position so `--resume` picks up where you left off
- `--chapter` flag to start playback from a specific heading
- Real-time rate change (requires a streaming pitch-preserving resampler — non-trivial)
- More backends: Piper (local neural, fast, free), Azure TTS (premium voices via API key)

## Development

```bash
uv sync --extra dev          # install dev extras (pytest, pytest-cov, ruff)
uv run pytest                # 48 tests
uv run ruff check .
uv run ruff format .
```

Conventional commits, feature branches off main, squash-merge by default. See .github/copilot-instructions.md for the full contributor guide.

## License

MIT — see LICENSE.

## Author

Jose María Ponce Bernabé. Built as a side-project while studying for AI / Data Engineering interviews — needed a way to revise PharmaGraphRAG and DengueMLOps notes during commutes.
