Audio Reply Skill for Claude Code

A Claude Code skill that generates spoken audio responses using local TTS. Speech synthesis runs entirely on your machine with MLX Audio, so there are no cloud TTS API calls and no usage limits.

Features

"read it to me [URL]" - Fetches public web content from a URL and reads it aloud
"talk to me [topic]" - Generates a conversational spoken response
"speak" / "say it" / "voice reply" - Converts any response to audio

Requirements

macOS with Apple Silicon (M1/M2/M3/M4)
Claude Code CLI
uv package manager

Installation

Install uv, preferably with a package manager:
```
brew install uv
```
If Homebrew is unavailable, use the official Astral installer from astral.sh.

Copy the skill to your Claude Code skills directory:

```
mkdir -p ~/.claude/skills/audio-reply
cp SKILL.md ~/.claude/skills/audio-reply/
```

Run a test generation. The first run downloads the model automatically (~500 MB):

```
uv run mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "Hello, testing audio." \
  --play
```
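
If you want to trigger the same test from a script instead of typing it, a small wrapper might look like the sketch below. It reuses only the model name and flags shown in the command above; the function names `build_tts_command` and `speak` are hypothetical and are not part of the skill or of MLX Audio.

```python
import subprocess

# Model name taken from the test command in this README.
MODEL = "mlx-community/chatterbox-turbo-fp16"


def build_tts_command(text: str, model: str = MODEL) -> list[str]:
    """Build the argument list for the generate command (hypothetical helper)."""
    return [
        "uv", "run", "mlx_audio.tts.generate",
        "--model", model,
        "--text", text,
        "--play",
    ]


def speak(text: str) -> int:
    """Run the command and return its exit code (needs uv and Apple Silicon)."""
    # An argument list (no shell=True) keeps quotes or $ in the text
    # from being interpreted by a shell.
    return subprocess.run(build_tts_command(text), check=False).returncode
```

Calling `speak("Hello, testing audio.")` should behave like the shell command above, including the model download on first use.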

Usage

Once installed, just use natural language in Claude Code:

> read it to me https://example.com/article
> talk to me about what's new in Python 3.12
> explain quantum computing, then speak it

Security Notes

URL fetch mode is for public http(s) pages only.
Local/private targets (localhost, RFC1918 ranges, loopback, link-local, .local) should not be fetched.
Do not use private/authenticated/signed URLs; prefer redacted public links or pasted text.
The skill runs uv locally for TTS playback; install uv from a trusted source and verify with command -v uv && uv --version.
Audio artifacts are temporary and cleaned up, but content/audio may still be retained in chat history by your client.
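
The URL rules above can be sketched as a pre-fetch check. This is an illustration using only the Python standard library, not the skill's actual validation. The function name `is_public_http_url` is hypothetical, and the check does not resolve DNS, so a public-looking hostname that points at a private address would still pass. Signed URLs are not detected either.

```python
import ipaddress
from urllib.parse import urlsplit


def is_public_http_url(url: str) -> bool:
    """Return True only for http(s) URLs that do not obviously target a local host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    if not host:
        return False
    # Embedded credentials (user:pass@host) indicate an authenticated URL.
    if parts.username or parts.password:
        return False
    if host == "localhost" or host.endswith((".localhost", ".local")):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return True  # ordinary hostname; DNS is NOT resolved in this sketch
    # Covers RFC1918 ranges, loopback, and link-local addresses.
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)
```

For example, `is_public_http_url("https://example.com/article")` is accepted, while `http://localhost:8080`, `http://192.168.1.10/`, and `http://printer.local/` are rejected.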

How It Works

Claude Code detects trigger phrases ("read it to me", "talk to me", etc.)
For URLs: validates safety constraints, then fetches and extracts readable content
Generates natural conversational text
Runs MLX Audio TTS locally on your Mac (MLX executes on the Apple Silicon GPU and CPU, not the Neural Engine)
Plays audio and cleans up temp files automatically
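
The "extracts readable content" step could be approximated as below. This is a minimal sketch with the standard-library `html.parser`, assuming the page HTML has already been fetched. The class and function names and the list of skipped tags are illustrative choices, not the skill's actual extraction logic.

```python
from html.parser import HTMLParser


class ReadableText(HTMLParser):
    """Collect visible text, skipping scripts, styles, and page chrome."""

    SKIP = {"script", "style", "noscript", "nav", "header", "footer", "aside"}
    BLOCK = {"p", "div", "li", "br", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK and self._skip_depth == 0:
            self.chunks.append(" ")  # keep adjacent blocks from running together

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK and self._skip_depth == 0:
            self.chunks.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_readable_text(html: str) -> str:
    """Return the page's visible text with whitespace collapsed."""
    parser = ReadableText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```

The resulting string is what would be turned into conversational text and passed to the TTS step. A real page may need a more careful extractor, since unclosed tags or heavy client-side rendering will defeat this approach.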

Model

This skill uses chatterbox-turbo-fp16 - a fast, natural-sounding TTS model optimized for Apple Silicon via MLX.

Speed: ~120 tokens/second on M-series chips
Quality: Natural conversational voice with emotion support
Size: ~500MB download (cached after first use)

Links

MLX Audio - The TTS framework powering this skill
Chatterbox Model - The voice model on Hugging Face
MLX - Apple's ML framework for Apple Silicon

License

MIT
