v0.6 — Speech I/O Settings schema (audio foundation) by magicnight · Pull Request #41 · magicnight/Mac-MLX

magicnight · 2026-05-10T13:06:55Z

Summary

Schema-only foundation for v0.6 speech I/O. No runtime audio yet — this is just the persistence layer the upcoming `STTService` / `TTSService` actors will read.

Plan: `docs/superpowers/plans/2026-05-10-v0.6.md`.

What lands

Five new `Settings` fields, audio-off defaults:

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `audioEnabled` | `Bool` | `false` | master toggle (mic + playback off until set) |
| `sttModel` | `String?` | `nil` | whisper-small / medium / large-v3 / fun-asr |
| `ttsModel` | `String?` | `nil` | marvis / chatterbox / cosyvoice2 |
| `ttsVoice` | `String?` | `nil` | id passed to TTS; voice cloning via `~/.mac-mlx/audio/voices/<name>.wav` |
| `ttsAutoSpeak` | `Bool` | `false` | auto-play assistant replies (off by default) |

Backwards-compatible decode: pre-v0.6 `settings.json` files (which have none of these keys) load unchanged via `decodeIfPresent` so existing installs upgrade without surprise mic permission prompts.
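The decode behavior described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `AudioSettingsSketch` is a hypothetical stand-in for the audio portion of `Settings` (the real type carries other, pre-v0.6 fields that are not listed here), and the legacy payload uses a made-up key.

```swift
import Foundation

// Hypothetical stand-in for the audio fields of `Settings`.
// Field names, types, and defaults follow the PR description.
struct AudioSettingsSketch: Codable, Equatable {
    var audioEnabled: Bool = false
    var sttModel: String? = nil
    var ttsModel: String? = nil
    var ttsVoice: String? = nil
    var ttsAutoSpeak: Bool = false

    init() {}

    enum CodingKeys: String, CodingKey {
        case audioEnabled, sttModel, ttsModel, ttsVoice, ttsAutoSpeak
    }

    // Every key is optional on disk: a missing key falls back to the
    // audio-off default instead of throwing `keyNotFound`.
    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: CodingKeys.self)
        audioEnabled = try c.decodeIfPresent(Bool.self, forKey: .audioEnabled) ?? false
        sttModel = try c.decodeIfPresent(String.self, forKey: .sttModel)
        ttsModel = try c.decodeIfPresent(String.self, forKey: .ttsModel)
        ttsVoice = try c.decodeIfPresent(String.self, forKey: .ttsVoice)
        ttsAutoSpeak = try c.decodeIfPresent(Bool.self, forKey: .ttsAutoSpeak) ?? false
    }
}

// Illustrative legacy payload: it contains none of the audio keys.
// `someOlderKey` is invented for this example.
let legacyJSON = Data(#"{"someOlderKey": "value"}"#.utf8)
let legacy = try JSONDecoder().decode(AudioSettingsSketch.self, from: legacyJSON)
```

The `Bool` fields need the explicit `?? false` fallback because a synthesized decoder would treat a missing non-optional key as an error; the `String?` fields simply stay `nil`.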

Test plan

- `swift test --package-path MacMLXCore --filter Settings`: 5/5 prior + 3 new tests green
  - Default state (audio off, no models)
  - JSON round-trip with audio fields populated
  - Legacy v0.5-shape JSON decodes with audio off
- Local `xcodebuild macMLX`: green
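For readers who want to see what the default-state and round-trip checks might look like, here is a rough sketch in plain Swift rather than XCTest. It is not the `SettingsAudioTests` code from this PR: `AudioFieldsSketch` is a hypothetical, simplified type with synthesized `Codable` conformance, and the voice id is a made-up value.

```swift
import Foundation

// Hypothetical simplified type; synthesized Codable is enough for a
// round-trip where every key is written before it is read back.
struct AudioFieldsSketch: Codable, Equatable {
    var audioEnabled = false
    var sttModel: String? = nil
    var ttsModel: String? = nil
    var ttsVoice: String? = nil
    var ttsAutoSpeak = false
}

// Default state: audio off, no models selected.
let defaults = AudioFieldsSketch()

// Populate every audio field, then encode and decode again.
var populated = AudioFieldsSketch()
populated.audioEnabled = true
populated.sttModel = "whisper-small"
populated.ttsModel = "marvis"
populated.ttsVoice = "example-voice" // hypothetical voice id
populated.ttsAutoSpeak = true

let data = try JSONEncoder().encode(populated)
let decoded = try JSONDecoder().decode(AudioFieldsSketch.self, from: data)

precondition(defaults.audioEnabled == false && defaults.sttModel == nil)
precondition(decoded == populated)
```

Conforming to `Equatable` keeps the round-trip assertion to a single comparison instead of one check per field.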

What's deferred to v0.6 implementation

- `mlx-swift-audio` SPM dependency
- `STTService` actor (Whisper / Fun-ASR wrapper)
- `TTSService` actor (Marvis / Chatterbox / CosyVoice 2 wrapper)
- `AVAudioEngine` mic capture + silence detection
- Push-to-talk button in `ChatInputView`
- Speaker buttons on `ChatMessageView` (assistant role)
- `AudioSettingsSection` in Settings tab
- Voice-clone reference recording UI

🤖 Generated with Claude Code

Schema-only foundation. No runtime audio yet; this is just the persistence layer the upcoming STTService / TTSService work will read.

Five new Settings fields, audio-off defaults:

- audioEnabled : Bool (master toggle; mic + playback off until set)
- sttModel : String? (whisper-small / medium / large-v3 / fun-asr)
- ttsModel : String? (marvis / chatterbox / cosyvoice2)
- ttsVoice : String? (id passed to TTS; voice cloning via ~/.mac-mlx/audio/voices/<name>.wav)
- ttsAutoSpeak : Bool (auto-play assistant replies; off by default)

Backwards-compatible decode: pre-v0.6 settings.json files (which have none of these keys) load unchanged via decodeIfPresent so existing installs upgrade without surprise mic permission prompts.

3 new SettingsAudioTests cover default state, JSON round-trip, and legacy-JSON decode against a hand-written v0.5-shape payload. 136/136 Core tests green; local Xcode app build green.

Subsequent v0.6 work (mlx-swift-audio SPM dep, STTService / TTSService actors, mic capture, push-to-talk button, speaker buttons on chat bubbles) is tracked in docs/superpowers/plans/2026-05-10-v0.6.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

magicnight force-pushed the feat/v0.6-audio-foundation branch from c029b2a to a2e8d0d Compare May 10, 2026 13:24

magicnight force-pushed the feat/v0.6-audio-foundation branch from a2e8d0d to 6856665 Compare May 10, 2026 13:37

magicnight merged commit d215590 into main May 10, 2026
2 checks passed

magicnight deleted the feat/v0.6-audio-foundation branch May 10, 2026 13:45

magicnight merged 1 commit into main from feat/v0.6-audio-foundation
