st is a Go CLI for speech-to-text (stt) and text-to-speech (tts) with pluggable provider integrations.
This repository currently includes:
- OpenAI provider (official Go SDK)
- Batch and streaming transcription
- ffmpeg fallback conversion for unsupported input file extensions
- Disk-backed TOML config at ~/.st/config.toml

go build -o st ./cmd/st
./st config init

This creates ~/.st/config.toml.
Set your API key either:
- In config: openai.api_key = "..."
- Via env var (default): OPENAI_API_KEY
./st stt ./audio.mp3
./st stt ./audio.wav -o transcript.txt
./st stt ./audio.mp3 --stream
./st tts ./script.txt -o speech.mp3
./st tts -t "hello? can you hear me?" > speech.mp3
./st tts -t "hello? can you hear me?" -o speech.mp3

Providers implement internal/providers.Provider and register themselves via providers.Register(name, factory).
Add another provider by:
- Implementing the interface in a new package under internal/providers/<name>
- Registering it in init()
- Adding provider-specific config section support
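
The registration steps above follow a common Go registry pattern, sketched below in a single file. The `Provider` interface, the `Factory` signature, the `New` lookup, and the `echo` provider are all hypothetical stand-ins; the real definitions in internal/providers may look different.

```go
package main

import (
	"fmt"
	"sync"
)

// Provider is a hypothetical stand-in for internal/providers.Provider;
// the real interface presumably exposes transcription and synthesis methods.
type Provider interface {
	Name() string
}

// Factory builds a Provider. The real factory signature is an assumption.
type Factory func() (Provider, error)

var (
	mu       sync.RWMutex
	registry = map[string]Factory{}
)

// Register records a factory under a name. It panics on a duplicate name,
// mirroring the behavior of database/sql.Register.
func Register(name string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[name]; dup {
		panic("providers: Register called twice for " + name)
	}
	registry[name] = f
}

// New looks up a registered factory by name and builds the provider.
func New(name string) (Provider, error) {
	mu.RLock()
	f, ok := registry[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("unknown provider %q", name)
	}
	return f()
}

// echoProvider is a toy provider used only for this illustration.
type echoProvider struct{}

func (echoProvider) Name() string { return "echo" }

// In a real provider package, init() runs when the package is imported,
// so importing the package is enough to make the provider available.
func init() {
	Register("echo", func() (Provider, error) { return echoProvider{}, nil })
}

func main() {
	p, err := New("echo")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected provider:", p.Name())
}
```

Registering from init() keeps the CLI core free of provider-specific imports beyond the package import that triggers registration.
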
- OpenAI upload limit is 25 MB per transcription request.
- If an input extension is unsupported, st attempts conversion via ffmpeg to wav automatically.
- Without -o, commands write to stdout.
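
The ffmpeg fallback noted above might be structured along these lines. This is a sketch under stated assumptions: the set of supported extensions is taken from OpenAI's documented audio formats and may not match what st checks, and `needsConversion`, `wavTarget`, and `ffmpegArgs` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supported lists extensions assumed to be accepted without conversion.
// ASSUMPTION: this set may not match the list st actually uses.
var supported = map[string]bool{
	".flac": true, ".mp3": true, ".mp4": true, ".mpeg": true, ".mpga": true,
	".m4a": true, ".ogg": true, ".wav": true, ".webm": true,
}

// needsConversion reports whether the file extension is outside the
// supported set. The comparison is case-insensitive.
func needsConversion(path string) bool {
	return !supported[strings.ToLower(filepath.Ext(path))]
}

// wavTarget returns the input path with its extension replaced by .wav.
// A real tool would more likely write to a temporary directory.
func wavTarget(path string) string {
	return strings.TrimSuffix(path, filepath.Ext(path)) + ".wav"
}

// ffmpegArgs builds the ffmpeg arguments: -y overwrites an existing output
// file, -i names the input, and the output format follows from the .wav suffix.
func ffmpegArgs(in, out string) []string {
	return []string{"-y", "-i", in, out}
}

func main() {
	for _, in := range []string{"audio.mp3", "memo.aiff"} {
		if !needsConversion(in) {
			fmt.Printf("%s: upload as-is\n", in)
			continue
		}
		out := wavTarget(in)
		fmt.Printf("%s: ffmpeg %s\n", in, strings.Join(ffmpegArgs(in, out), " "))
	}
}
```

To run the conversion, the arguments can be passed to `exec.Command("ffmpeg", args...)` from the standard os/exec package. Note that wav output is uncompressed, so a converted file can be much larger than its source and may approach the 25 MB upload limit.
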