nick1udwig/st

st

st is a Go CLI for speech-to-text (stt) and text-to-speech (tts) with pluggable provider integrations.

This repository currently includes:

  • OpenAI provider (official Go SDK)
  • Batch and streaming transcription
  • ffmpeg fallback conversion for unsupported input file extensions
  • Disk-backed TOML config at ~/.st/config.toml

Install

go build -o st ./cmd/st

Initialize config

./st config init

This creates ~/.st/config.toml.

Set your API key in one of two ways:

  • In config: openai.api_key = "..."
  • Via env var (default): OPENAI_API_KEY
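The two key sources above could be resolved along these lines. This is a minimal sketch, not st's actual code: the function name resolveAPIKey is hypothetical, and the precedence (config value first, then the environment variable) is an assumption that this README does not confirm.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// resolveAPIKey is a hypothetical helper. It returns the value from the
// config file when one is set, and otherwise falls back to the named
// environment variable. The config-first precedence is an assumption.
func resolveAPIKey(configKey, envVar string) (string, error) {
	if configKey != "" {
		return configKey, nil
	}
	if v := os.Getenv(envVar); v != "" {
		return v, nil
	}
	return "", errors.New("no API key: set openai.api_key in the config or export " + envVar)
}

func main() {
	// An empty first argument stands in for a config file without openai.api_key.
	key, err := resolveAPIKey("", "OPENAI_API_KEY")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Report only the length so the key itself never reaches the terminal.
	fmt.Println("API key found, length:", len(key))
}
```

Returning an error instead of an empty string lets the CLI fail before any request is sent, with a message naming both places the key can live.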

Usage

Transcribe audio (batch)

./st stt ./audio.mp3
./st stt ./audio.wav -o transcript.txt

Transcribe audio (streaming)

./st stt ./audio.mp3 --stream

Synthesize speech from text file

./st tts ./script.txt -o speech.mp3

Synthesize speech from raw text

./st tts -t "hello? can you hear me?" > speech.mp3
./st tts -t "hello? can you hear me?" -o speech.mp3

Provider architecture

Providers implement internal/providers.Provider and register themselves via providers.Register(name, factory).

Add another provider by:

  1. Implementing the interface in a new package under internal/providers/<name>
  2. Registering it in init()
  3. Adding support for a provider-specific config section
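The registration steps above follow a common Go registry pattern, sketched below in a single file. Everything here is illustrative: the real internal/providers.Provider interface is not shown in this README, so the one-method Provider, the Factory signature, the New lookup function, and the echoProvider type are all assumptions. Only the Register(name, factory) call shape comes from the text above.

```go
package main

import (
	"fmt"
	"sync"
)

// Provider is a hypothetical stand-in for internal/providers.Provider.
// The real interface presumably exposes transcription and synthesis methods.
type Provider interface {
	Name() string
}

// Factory builds a Provider. This signature is assumed, not taken from st.
type Factory func() (Provider, error)

var (
	mu       sync.RWMutex
	registry = map[string]Factory{}
)

// Register adds a factory under a name and panics on duplicates,
// so a naming collision surfaces at startup.
func Register(name string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[name]; dup {
		panic("provider registered twice: " + name)
	}
	registry[name] = f
}

// New looks up a registered factory by name and invokes it.
func New(name string) (Provider, error) {
	mu.RLock()
	f, ok := registry[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("unknown provider %q", name)
	}
	return f()
}

// echoProvider is a toy provider used only for this example.
type echoProvider struct{}

func (echoProvider) Name() string { return "echo" }

// init registers the provider as a side effect of the package being linked in.
func init() {
	Register("echo", func() (Provider, error) { return echoProvider{}, nil })
}

func main() {
	p, err := New("echo")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected provider:", p.Name())
}
```

In a multi-package layout, the init function would live in internal/providers/<name>, and the main package would need to import that package (often with a blank import) for the registration to run.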

Notes

  • OpenAI limits uploads to 25 MB per transcription request.
  • If an input file has an unsupported extension, st automatically attempts to convert it to WAV with ffmpeg.
  • Without -o, commands write to stdout.
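The first two notes amount to a pre-flight check before upload, which might look roughly like the sketch below. None of it is taken from st's source: the helper names are hypothetical, the supported-extension set lists only the two formats used in this README's examples, reading 25 MB as 25 × 1024 × 1024 bytes is an assumption, and the ffmpeg argument list is one plausible invocation rather than the one st runs.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// maxUploadBytes treats "25 MB" as 25 MiB. This reading is an assumption.
const maxUploadBytes int64 = 25 * 1024 * 1024

// supported is an illustrative subset, not the provider's authoritative list.
var supported = map[string]bool{".mp3": true, ".wav": true}

// needsConversion reports whether the file extension falls outside the
// supported set (compared case-insensitively).
func needsConversion(path string) bool {
	return !supported[strings.ToLower(filepath.Ext(path))]
}

// wavPath swaps the input's extension for .wav.
func wavPath(path string) string {
	return strings.TrimSuffix(path, filepath.Ext(path)) + ".wav"
}

// ffmpegArgs builds arguments for a plain conversion:
// -y overwrites an existing output file, -i names the input.
func ffmpegArgs(in, out string) []string {
	return []string{"-y", "-i", in, out}
}

// withinLimit reports whether a file of the given size fits in one request.
func withinLimit(size int64) bool {
	return size <= maxUploadBytes
}

func main() {
	in := "talk.ogg"
	if needsConversion(in) {
		fmt.Println("would run: ffmpeg", strings.Join(ffmpegArgs(in, wavPath(in)), " "))
	}
	fmt.Println("30 MiB fits in one request:", withinLimit(30*1024*1024))
}
```

Uncompressed WAV output is usually larger than its compressed source, so a check like withinLimit is more useful after conversion than before it.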
