A Claude Code skill that generates spoken audio responses using local TTS. Runs entirely on your machine with MLX Audio - no cloud API calls, no usage limits.
- "read it to me [URL]" - Fetches public web content from a URL and reads it aloud
- "talk to me [topic]" - Generates a conversational spoken response
- "speak" / "say it" / "voice reply" - Converts any response to audio
- macOS with Apple Silicon (M1/M2/M3/M4)
- Claude Code CLI
- uv package manager
- Install uv (prefer package manager install):
    brew install uv
  If Homebrew is unavailable, use the official Astral installer from astral.sh.
- Copy the skill to your Claude Code skills directory:
    mkdir -p ~/.claude/skills/audio-reply
    cp SKILL.md ~/.claude/skills/audio-reply/
- First run will auto-download the model (~500MB):
    uv run mlx_audio.tts.generate \
      --model mlx-community/chatterbox-turbo-fp16 \
      --text "Hello, testing audio." \
      --play
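If you would rather script the same smoke test, a minimal Python sketch could look like this. It assumes only the flags shown in the command above (`--model`, `--text`, `--play`); the helper names `build_tts_command` and `speak` are hypothetical and not part of the skill.

```python
import shutil
import subprocess

# Model name taken from the command above.
MODEL = "mlx-community/chatterbox-turbo-fp16"


def build_tts_command(text: str, model: str = MODEL) -> list[str]:
    """Build the argv for the TTS command shown above (hypothetical helper)."""
    return [
        "uv", "run", "mlx_audio.tts.generate",
        "--model", model,
        "--text", text,
        "--play",
    ]


def speak(text: str) -> int:
    """Run the command and return its exit code; raises if uv is not on PATH."""
    if shutil.which("uv") is None:
        raise RuntimeError("uv not found on PATH; install it first")
    # Passing a list (not a shell string) avoids quoting problems in the text.
    return subprocess.run(build_tts_command(text), check=False).returncode
```

Calling `speak("Hello, testing audio.")` should behave like the shell command, including the model download on first run.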
Once installed, just use natural language in Claude Code:
> read it to me https://example.com/article
> talk to me about what's new in Python 3.12
> explain quantum computing, then speak it
- URL fetch mode is for public `http(s)` pages only.
- Local/private targets (`localhost`, RFC1918 ranges, loopback, link-local, `.local`) should not be fetched.
- Do not use private/authenticated/signed URLs; prefer redacted public links or pasted text.
- The skill runs `uv` locally for TTS playback; install `uv` from a trusted source and verify with `command -v uv && uv --version`.
- Audio artifacts are temporary and cleaned up, but content/audio may still be retained in chat history by your client.
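The URL constraints above can be approximated with a short check. This is a sketch under stated assumptions, not the skill's actual validation: the function name `is_public_http_url` and the blocked-suffix list are hypothetical, and the check only inspects the URL string. It does not resolve DNS names, so a hostname that points at a private address would still pass, and it cannot detect signed URLs.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumption: an illustrative suffix list, not an exhaustive one.
BLOCKED_SUFFIXES = (".local", ".localhost")


def is_public_http_url(url: str) -> bool:
    """Return True only for http(s) URLs whose host is not obviously local or private."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. an unbalanced IPv6 bracket
    if parts.scheme not in ("http", "https"):
        return False
    if parts.username or parts.password:
        return False  # embedded credentials imply an authenticated URL
    host = (parts.hostname or "").rstrip(".").lower()
    if not host or host == "localhost" or host.endswith(BLOCKED_SUFFIXES):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; this sketch does not resolve it
    # Covers RFC1918 ranges, loopback, and link-local literals.
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```

A stricter variant would also resolve the hostname and apply the same address checks to every resolved IP before fetching.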
- Claude Code detects trigger phrases ("read it to me", "talk to me", etc.)
- For URLs: validates safety constraints, then fetches and extracts readable content
- Generates natural conversational text
- Runs MLX Audio TTS locally on your Mac's Apple Silicon GPU via MLX
- Plays audio and cleans up temp files automatically
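To make the first step concrete, the routing of a prompt to one of the three modes can be pictured as below. This is purely illustrative: Claude Code performs trigger detection itself from the skill description, and the function `detect_mode` and the mode names are hypothetical.

```python
import re

# Matches the first http(s) URL in a prompt.
URL_RE = re.compile(r"https?://\S+")


def detect_mode(prompt: str):
    """Map a prompt to (mode, argument), or None when no trigger phrase matches."""
    text = prompt.strip()
    lowered = text.lower()
    if lowered.startswith("read it to me"):
        match = URL_RE.search(text)
        return ("read_url", match.group(0)) if match else None
    if lowered.startswith("talk to me"):
        topic = text[len("talk to me"):].strip()
        # Drop a leading "about" so "talk to me about X" yields "X".
        if topic.lower().startswith("about "):
            topic = topic[len("about "):]
        return ("talk", topic)
    # Crude substring match; a real detector would respect word boundaries.
    if any(phrase in lowered for phrase in ("speak", "say it", "voice reply")):
        return ("speak", text)
    return None
```

For a `read_url` result, the URL would then go through the safety constraints before any fetch happens.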
This skill uses chatterbox-turbo-fp16 - a fast, natural-sounding TTS model optimized for Apple Silicon via MLX.
- Speed: ~120 tokens/second on M-series chips
- Quality: Natural conversational voice with emotion support
- Size: ~500MB download (cached after first use)
- MLX Audio - The TTS framework powering this skill
- Chatterbox Model - The voice model on Hugging Face
- MLX - Apple's ML framework for Apple Silicon
License: MIT