vox — Multi-Provider AI Multimedia CLI

A Rust CLI that provides a unified interface to multiple AI providers (StepFun, MiniMax) for text, image, speech, video, music, search, and vision capabilities.

Features

Multi-provider: StepFun and MiniMax, built on a shared OpenAI-compatible base adapter
7 capabilities: text chat/completion, image generation, speech synthesis, video generation, music generation, web search, vision/image understanding
Interactive REPL: vox text repl for multi-turn chat with history
Provider management: vox provider add/remove/list/status
Model management: vox models list/set per capability per provider
Diagnostics: vox doctor to check config, connectivity, and auth
Auto-retry: exponential backoff (3 retries) on transient failures
Config migration: auto-upgrades old model names and API URLs
JSON output: --format json for scripting
Shell completion: vox completion bash|zsh|fish|elvish
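
The auto-retry feature above can be illustrated with a minimal sketch of exponential backoff. This is not vox's actual RetryProvider: the function name `retry_with_backoff`, its parameters, and the doubling schedule are assumptions made for illustration. Note that this sketch retries on every error, whereas the feature list says vox retries only on transient failures.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper (not taken from the vox source): runs `op` and, on
/// failure, retries up to `max_retries` more times, doubling the delay
/// before each retry. `op` receives the zero-based attempt number.
fn retry_with_backoff<T, E>(
    max_retries: u32,
    base_delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Retries exhausted: surface the last error to the caller.
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                // Delay grows as base, 2 * base, 4 * base, ...
                sleep(base_delay * 2u32.pow(attempt));
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Simulated request that fails twice before succeeding.
    let result: Result<&str, &str> =
        retry_with_backoff(3, Duration::from_millis(10), |attempt| {
            if attempt < 2 {
                Err("transient failure")
            } else {
                Ok("done")
            }
        });
    println!("{:?}", result);
}
```

With `max_retries = 3`, the operation runs at most four times: the initial attempt plus three retries.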

Install

# From crates.io
cargo install vox-ai

# Or build from source
git clone https://github.com/huangcheng/vox.git
cd vox
cargo build --release
# Binary at target/release/vox

Quick Start

# Add your API key
vox provider add stepfun YOUR_API_KEY
vox provider add minimax YOUR_API_KEY

# Chat
vox text chat --message "Explain Rust ownership"
vox --provider minimax text chat --message "Hello"

# Generate image
vox image generate "A cat in space" --output cat.png

# Speech synthesis
vox speech generate --text "Hello world" --voice cixingnansheng --out hello.mp3

# Web search
vox search query "Rust programming language"

# Vision
vox vision analyze photo.jpg --prompt "What's in this image?"

# Launch TUI mode
vox tui

Configuration

Config file location:

macOS/Linux: ~/.config/vox/config.toml
Windows: %APPDATA%\vox\config.toml

Example (config.example.toml):

provider = "stepfun"

[stepfun]
api_key = "sk-your-api-key-here"

[minimax]
api_key = "your-minimax-api-key-here"

Provider Details

Provider  Base URL                     Chat Model    Speech Model
StepFun   https://api.stepfun.com/v1   step-1-8k     stepaudio-2.5-tts
MiniMax   https://api.minimaxi.com/v1  MiniMax-M2.7  speech-2.8-hd

Model Override

[minimax]
api_key = "..."
model = "MiniMax-M2.7-highspeed"  # Override default chat model

Or set and list models per capability from the CLI:

vox models set speech speech-2.8-hd
vox models list --provider stepfun
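
The overrides above imply some order of precedence between the --model flag, the model key in the config file, and the built-in default. The README does not state that order, so the sketch below assumes flag first, then config, then default. The function names are hypothetical. The default model names are taken from the Provider Details table.

```rust
/// Built-in default chat models, as listed under Provider Details.
fn default_chat_model(provider: &str) -> Option<&'static str> {
    match provider {
        "stepfun" => Some("step-1-8k"),
        "minimax" => Some("MiniMax-M2.7"),
        _ => None,
    }
}

/// Hypothetical resolution order (an assumption, not confirmed by the
/// vox source): --model flag, then config `model`, then the default.
fn resolve_chat_model(
    provider: &str,
    cli_model: Option<&str>,
    config_model: Option<&str>,
) -> Option<String> {
    cli_model
        .or(config_model)
        .or(default_chat_model(provider))
        .map(str::to_owned)
}

fn main() {
    // The config overrides the default; no --model flag was given.
    let model = resolve_chat_model("minimax", None, Some("MiniMax-M2.7-highspeed"));
    println!("{:?}", model);
}
```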

CLI Reference

vox [OPTIONS] [COMMAND]

Commands:
  text        Text generation and chat
  image       Image generation
  speech      Speech synthesis (TTS)
  video       Video generation
  music       Music generation
  search      Web search
  vision      Image understanding
  doctor      Run diagnostics
  provider    Manage providers
  models      Manage models
  config      Manage configuration
  completion  Shell completion script
  tui         Launch terminal UI

Options:
  --provider <PROVIDER>      Provider (minimax, stepfun)
  --model <MODEL>            Model name override
  --format <FORMAT>          Output format (text, json) [default: text]
  --output-dir <DIR>         Default output directory
  --config <PATH>            Config file path
  --quiet                    Suppress progress output
  --verbose                  Debug output

Architecture

src/
  providers/
    mod.rs         AIProvider trait, RetryProvider, factory
    openai.rs      Shared OpenAI-compatible HTTP client
    stepfun.rs     StepFun adapter (~200 LOC)
    minimax.rs     MiniMax adapter (~230 LOC)
  config.rs        Config, migration, provider/model management
  cli.rs           Clap CLI definitions
  app.rs           Command dispatch
  capabilities.rs  Per-provider capability flags
  models.rs        Static model registry

The AIProvider trait defines one method per capability (chat, image_generate, speech_synthesize, etc.). The shared OpenAIClient supplies default implementations for OpenAI-compatible endpoints, so each provider adapter overrides only the APIs that differ from that baseline.
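
A minimal sketch of this pattern follows, assuming a simplified trait shape. Only the names AIProvider, OpenAIClient, chat, and speech_synthesize come from this README. The signatures, the ProviderError type, the `client` accessor, and the `custom/speech` path are hypothetical. The methods return the URL they would call rather than performing HTTP.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ProviderError {
    Unsupported(&'static str),
}

/// Stand-in for the shared OpenAI-compatible client: it only builds the
/// request URL instead of sending a request.
struct OpenAIClient {
    base_url: String,
}

impl OpenAIClient {
    fn endpoint(&self, path: &str) -> String {
        format!("{}/{}", self.base_url.trim_end_matches('/'), path)
    }
}

trait AIProvider {
    fn client(&self) -> &OpenAIClient;

    /// Default: the OpenAI-compatible chat endpoint.
    fn chat(&self, _message: &str) -> Result<String, ProviderError> {
        Ok(self.client().endpoint("chat/completions"))
    }

    /// Default: unsupported unless a provider overrides it.
    fn speech_synthesize(&self, _text: &str) -> Result<String, ProviderError> {
        Err(ProviderError::Unsupported("speech_synthesize"))
    }
}

struct StepFun {
    client: OpenAIClient,
}

struct MiniMax {
    client: OpenAIClient,
}

// In this sketch StepFun relies entirely on the defaults.
impl AIProvider for StepFun {
    fn client(&self) -> &OpenAIClient {
        &self.client
    }
}

// In this sketch MiniMax overrides one method with a hypothetical path.
impl AIProvider for MiniMax {
    fn client(&self) -> &OpenAIClient {
        &self.client
    }

    fn speech_synthesize(&self, _text: &str) -> Result<String, ProviderError> {
        Ok(self.client.endpoint("custom/speech"))
    }
}

fn main() {
    let stepfun = StepFun {
        client: OpenAIClient { base_url: "https://api.stepfun.com/v1".to_string() },
    };
    let minimax = MiniMax {
        client: OpenAIClient { base_url: "https://api.minimaxi.com/v1".to_string() },
    };
    println!("{:?}", stepfun.chat("Hello"));
    println!("{:?}", minimax.speech_synthesize("Hello"));
}
```

Which methods each real adapter overrides is not stated here; the split above only illustrates how default trait methods keep adapters small.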

License

MIT
