Sonica

GPU-accelerated audio visualizer video generator.

Takes an audio file, runs FFT analysis, renders visualizations with GPU shaders, and outputs an MP4 video with the original audio. Any resolution, any aspect ratio — landscape, portrait, square.

Requirements

Rust 1.70+
ffmpeg (must be in PATH)
macOS (Metal) — tested
Linux (Vulkan) / Windows (DX12) — should work but untested

Install

cargo install --git https://github.com/rath/sonica

All templates and shaders are embedded in the binary, so no additional files are needed.

With subtitle support

cargo install --git https://github.com/rath/sonica --features subtitles

This adds speech-to-text subtitle overlay via whisper.cpp. Whisper models are automatically downloaded on first use.

Build from source

git clone https://github.com/rath/sonica
cd sonica
cargo build --release
# Or with subtitle support:
cargo build --release --features subtitles

Usage

# Basic — generates output.mp4 with frequency bars + default effects
sonica audio.wav

# Specify output and template
sonica audio.wav -o visualizer.mp4 -t circular_spectrum

# Cycle through all templates (equal duration each)
sonica audio.wav -t all --effects crt

# CRT retro style
sonica audio.wav --effects crt

# Portrait video for Instagram/TikTok (9:16)
sonica audio.wav -t circular_spectrum --width 1080 --height 1920

# Square video for Instagram feed (1:1)
sonica audio.wav -t particle_burst --width 1080 --height 1080

# 4K 60fps high quality
sonica track.flac -t kaleidoscope --width 3840 --height 2160 --fps 60 --crf 12

# Hardware encoding on macOS
sonica audio.wav --codec h264_videotoolbox --pix-fmt nv12

# Korean title with Google Noto Sans KR
sonica audio.wav --title "안녕하세요, SONICA" --font-url "https://raw.githubusercontent.com/notofonts/noto-cjk/main/Sans/SubsetOTF/KR/NotoSansKR-Regular.otf"

# Working example of a Noto Sans KR TTF/OTF URL
sonica audio.wav --title "안녕하세요" --font-url "https://raw.githubusercontent.com/notofonts/noto-cjk/main/Sans/SubsetOTF/KR/NotoSansKR-Regular.otf"

# Local font file path example (macOS)
sonica audio.wav --title "안녕하세요" --font "/System/Library/Fonts/Supplemental/NotoSansCJK-Regular.ttc"

# Speech-to-text subtitles (requires --features subtitles)
sonica audio.wav -o output.mp4 --subtitles

# Subtitles with specific language and model
sonica audio.wav -o output.mp4 --subtitles --whisper-model small --subtitle-lang ko

# Customize subtitle appearance
sonica audio.wav -o output.mp4 --subtitles --subtitle-font-size 64 --subtitle-max-chars 30

# List available templates
sonica --list-templates

Templates

circular_spectrum

Radial spectrum analyzer with beat-reactive radius

demo-circular_spectrum.mp4

particle_burst

Beat-driven particle system

demo-particle_burst.mp4

waveform_scope

PCM waveform oscilloscope with glow

frequency_bars

Classic equalizer bars with log frequency mapping
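
As an illustration of what "log frequency mapping" can mean for frequency_bars, the sketch below spaces bar edges evenly on a logarithmic scale and converts an edge frequency to an FFT bin. The function names, the 20 Hz to 20 kHz range, and the FFT size are assumptions chosen for the example, not values taken from Sonica's source.

```rust
/// Returns `bar_count + 1` band edges in Hz, evenly spaced on a log scale.
/// `f_min` and `f_max` are illustrative bounds, not values taken from Sonica.
fn log_band_edges(bar_count: usize, f_min: f32, f_max: f32) -> Vec<f32> {
    assert!(bar_count > 0 && f_min > 0.0 && f_max > f_min);
    let ratio = f_max / f_min;
    (0..=bar_count)
        .map(|i| f_min * ratio.powf(i as f32 / bar_count as f32))
        .collect()
}

/// Maps a frequency in Hz to the nearest FFT bin, clamped to the Nyquist bin.
fn freq_to_bin(freq: f32, fft_size: usize, sample_rate: f32) -> usize {
    let bin = (freq * fft_size as f32 / sample_rate).round() as usize;
    bin.min(fft_size / 2)
}
```

With this spacing each bar covers the same musical interval, so low frequencies are not squeezed into the first few bars as they would be with linear bins.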

kaleidoscope

Audio-reactive fractal kaleidoscope

spectrogram

Scrolling time-frequency heatmap

all

Cycle through all templates, equal duration each.
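
A minimal sketch of how "equal duration each" could be computed: split the track into one slot per template and pick the slot that contains the current timestamp. The `template_at` helper and the template order (taken from the listing above) are assumptions for illustration; the order Sonica actually uses is not stated here.

```rust
/// Template names in the order they appear in this README (order is assumed).
const TEMPLATES: [&str; 6] = [
    "circular_spectrum",
    "particle_burst",
    "waveform_scope",
    "frequency_bars",
    "kaleidoscope",
    "spectrogram",
];

/// Returns the template active at time `t` (seconds) for a track of
/// `duration` seconds, giving every template an equal share of the track.
fn template_at(t: f64, duration: f64) -> &'static str {
    let slot = duration / TEMPLATES.len() as f64;
    // Float-to-integer casts saturate, so negative, NaN, or oversized
    // values fall back to the first or last slot after clamping.
    let index = (t / slot) as usize;
    TEMPLATES[index.min(TEMPLATES.len() - 1)]
}
```

For a 60-second track each template gets 10 seconds, so `template_at(35.0, 60.0)` falls in the fourth slot.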

Effects

Post-processing effects can be combined with --effects:

# Single effect
sonica audio.wav --effects bloom

# Multiple effects
sonica audio.wav --effects bloom,vignette,chromatic_aberration

# CRT preset (scanlines + chromatic aberration + vignette + film grain + color grading)
sonica audio.wav --effects crt

Available effects: bloom, chromatic_aberration, vignette, film_grain, crt_scanlines, color_grading

When --effects is not specified, each template uses its own default effects.
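
The rules above (comma-separated names, a crt preset that expands to five effects, and "none" to disable) can be sketched as a small parser. This is an illustrative reading of the documented behavior, not Sonica's code: `parse_effects`, the order of effects inside the preset, and the choice to drop duplicates are all assumptions.

```rust
/// Effect names listed under "Available effects".
const KNOWN: [&str; 6] = [
    "bloom",
    "chromatic_aberration",
    "vignette",
    "film_grain",
    "crt_scanlines",
    "color_grading",
];

/// Appends `name` unless it is already in the chain (assumed behavior).
fn push_unique(list: &mut Vec<String>, name: &str) {
    if !list.iter().any(|e| e.as_str() == name) {
        list.push(name.to_string());
    }
}

/// Parses an `--effects` value such as "bloom,vignette" or "crt".
fn parse_effects(arg: &str) -> Result<Vec<String>, String> {
    let mut out = Vec::new();
    for name in arg.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        match name {
            // "none" disables post-processing entirely.
            "none" => return Ok(Vec::new()),
            // The preset expands to the five effects named in this README;
            // the order used here is an assumption.
            "crt" => {
                for e in ["crt_scanlines", "chromatic_aberration", "vignette", "film_grain", "color_grading"] {
                    push_unique(&mut out, e);
                }
            }
            other if KNOWN.contains(&other) => push_unique(&mut out, other),
            other => return Err(format!("unknown effect: {other}")),
        }
    }
    Ok(out)
}
```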

Subtitles

Speech-to-text subtitle overlay using local whisper.cpp inference. Requires building with --features subtitles.

# Auto-detect language, base model (downloaded automatically)
sonica audio.wav --subtitles

# Korean speech with small model for better accuracy
sonica audio.wav --subtitles --whisper-model small --subtitle-lang ko

# Use a local model file
sonica audio.wav --subtitles --whisper-model /path/to/ggml-large-v3-turbo.bin

Available models: tiny, base, small, medium, large (and .en English-only variants). Models are cached at ~/.cache/sonica/models/ after first download.

Subtitles are rendered with a semi-transparent black background at the bottom center of the video. Font size and line wrapping can be adjusted with --subtitle-font-size and --subtitle-max-chars.
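
To show what a --subtitle-max-chars limit implies, here is a greedy word-wrap sketch. It assumes wrapping happens at whitespace and that length is counted in characters rather than bytes (which matters for Korean text). `wrap_subtitle` is a hypothetical helper, and Sonica's real wrapping rules may differ.

```rust
/// Greedily wraps `text` into lines of at most `max_chars` characters.
/// A single word longer than the limit is kept whole on its own line.
fn wrap_subtitle(text: &str, max_chars: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    let mut current_len = 0usize; // length in characters, not bytes

    for word in text.split_whitespace() {
        let word_len = word.chars().count();
        // Start a new line if adding " word" would exceed the limit.
        if current_len > 0 && current_len + 1 + word_len > max_chars {
            lines.push(std::mem::take(&mut current));
            current_len = 0;
        }
        if current_len > 0 {
            current.push(' ');
            current_len += 1;
        }
        current.push_str(word);
        current_len += word_len;
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```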

CLI Reference

sonica [OPTIONS] [INPUT]

Arguments:
  [INPUT]  Input audio file (WAV, MP3, FLAC, OGG)

Options:
  -o, --output <PATH>        Output video file [default: output.mp4]
  -t, --template <NAME>      Template name, or "all" to cycle [default: frequency_bars]
  -b, --bitrate <RATE>       Video bitrate (e.g. 2400k, 5M), overrides --crf
      --width <PX>           Video width [default: 1920]
      --height <PX>          Video height [default: 1080]
      --fps <N>              Frames per second [default: 30]
      --crf <N>              H.264 quality, 0-51, lower=better [default: 18]
      --effects <LIST>       Post-processing effects, comma-separated (use "none" to disable)
      --smoothing <F>        Audio smoothing factor, 0.0-1.0 [default: 0.85]
      --title <TEXT>         Title text overlay (top right)
      --font <PATH>          Font file for title/time overlay (TTF/OTF)
      --font-url <URL>       Font URL for title/time overlay (TTF/OTF or Google Fonts URL)
      --show-time            Show elapsed time overlay, MM:SS.CC (bottom right)
      --param <KEY=VALUE>    Template parameter overrides, comma-separated
      --config <PATH>        Config file path [default: ./sonica.toml]
      --codec <NAME>         FFmpeg video codec [default: libx264]
      --pix-fmt <FMT>        FFmpeg pixel format [default: yuv420p]
      --list-templates       List available templates and exit
      --subtitles            Enable speech-to-text subtitles (requires --features subtitles)
      --whisper-model <M>    Whisper model name or file path [default: base]
      --subtitle-lang <L>    Subtitle language, ISO 639-1 (e.g. "en", "ko"). Auto-detect if omitted
      --subtitle-font-size <PX>  Subtitle font size [default: 48]
      --subtitle-max-chars <N>   Max characters per subtitle line [default: 42]
  -h, --help                 Print help
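
As a small illustration of the MM:SS.CC format shown by --show-time, the sketch below formats a timestamp in seconds. It assumes CC means centiseconds and that values are rounded to the nearest centisecond; `format_elapsed` is a hypothetical helper, not a function from Sonica.

```rust
/// Formats elapsed seconds as MM:SS.CC, where CC is assumed to be centiseconds.
fn format_elapsed(seconds: f64) -> String {
    // Work in whole centiseconds to avoid drift between the fields.
    let total_cs = (seconds.max(0.0) * 100.0).round() as u64;
    let minutes = total_cs / 6000;
    let secs = (total_cs / 100) % 60;
    let cs = total_cs % 100;
    format!("{:02}:{:02}.{:02}", minutes, secs, cs)
}
```

At 30 fps, frame `n` would be stamped with `format_elapsed(n as f64 / 30.0)`.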

Template Parameters

Each template defines configurable parameters (bar count, colors, etc.) that can be overridden with --param:

# Change bar count and disable mirroring
sonica audio.wav -t frequency_bars --param bar_count=128,mirror=false,gap_ratio=0.1

# Kaleidoscope with 8-fold symmetry
sonica audio.wav -t kaleidoscope --param symmetry=8,zoom=2.0

# More particles
sonica audio.wav -t particle_burst --param particle_count=500

Use --list-templates to see available templates. Check each template's manifest.json for parameter definitions.
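
The comma-separated KEY=VALUE syntax accepted by --param could be parsed along these lines. This is a sketch under assumptions: `parse_params` is a hypothetical name, values are kept as strings, and type checking against each template's manifest.json is left out.

```rust
use std::collections::BTreeMap;

/// Parses "bar_count=128,mirror=false" into a key/value map.
/// Values stay as strings; converting them to numbers or booleans would
/// depend on the parameter definitions in the template manifest.
fn parse_params(arg: &str) -> Result<BTreeMap<String, String>, String> {
    let mut params = BTreeMap::new();
    for pair in arg.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("expected KEY=VALUE, got '{pair}'"))?;
        params.insert(key.trim().to_string(), value.trim().to_string());
    }
    Ok(params)
}
```

A caller could then convert individual values as needed, for example `params["bar_count"].parse::<u32>()`.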

Configuration File

Sonica loads config from the first file found in this order:

1. --config <path> (explicit)
2. ./sonica.toml (current directory)
3. ~/.config/sonica/config.toml (global, works on all platforms)
4. Platform-specific config dir (~/Library/Application Support on macOS)

CLI flags always take priority over config values. See sonica.toml.example for a full annotated example.

[output]
width = 1280
height = 720
fps = 60
crf = 15
codec = "libx264"
pix_fmt = "yuv420p"
font = "/path/to/NotoSansKR-Regular.otf"
font_url = "https://raw.githubusercontent.com/notofonts/noto-cjk/main/Sans/SubsetOTF/KR/NotoSansKR-Regular.otf"

[audio]
smoothing = 0.9

effects = ["bloom", "vignette"]

[subtitle]
whisper_model = "base"
language = "ko"
font_size = 48.0
max_chars_per_line = 42

For Korean text, use a font that includes CJK glyphs (for example NotoSansKR-Regular.otf from Google Fonts) via --font, --font-url, or font / font_url in the config.
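
The two precedence rules in this section (the first config file found wins, and CLI flags override config values) can be sketched as follows. Both helpers are hypothetical and only model the documented behavior. The `exists` callback stands in for a real filesystem check so the example stays self-contained.

```rust
use std::path::{Path, PathBuf};

/// Returns the first candidate path for which `exists` reports true.
/// Candidates are expected in the search order listed above.
fn first_existing<'a>(
    candidates: &'a [PathBuf],
    exists: impl Fn(&Path) -> bool,
) -> Option<&'a PathBuf> {
    candidates.iter().find(|p| exists(p.as_path()))
}

/// Resolves one setting: CLI flag first, then config value, then default.
fn resolve<T>(cli: Option<T>, config: Option<T>, default: T) -> T {
    cli.or(config).unwrap_or(default)
}
```

For instance, with `fps = 60` in the config and no --fps flag, `resolve(None, Some(60), 30)` yields 60; passing `--fps 24` would make it `resolve(Some(24), Some(60), 30)`, which yields 24.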

Supported Audio Formats

WAV, MP3, FLAC, OGG/Vorbis, AAC — via symphonia.

How It Works

1. Decode audio to mono PCM samples
2. Transcribe speech to timed subtitles via whisper.cpp (optional)
3. Analyze in 3 passes:
   - Global stats (peak levels, beat detection, tempo)
   - Per-frame FFT with frequency band extraction (parallelized)
   - Bidirectional smoothing and normalization
4. Render each frame on GPU via wgpu (Metal/Vulkan) with WGSL shaders
5. Post-process through a chain of effect shaders
6. Overlay title, time, and subtitles on CPU
7. Encode by piping raw RGBA frames to ffmpeg
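
The exact smoothing algorithm is not spelled out here, so the following is only one plausible reading of "bidirectional smoothing and normalization": an exponential filter run forward and then backward over a per-frame value series, followed by scaling to a peak of 1.0. It assumes the smoothing factor is the weight given to the neighbouring frame (so 0.0 leaves the data unchanged), and both function names are made up for the example.

```rust
/// Runs an exponential filter forward, then backward, over `values`.
/// `smoothing` in 0.0..=1.0 is the weight of the neighbouring frame
/// (assumed meaning of --smoothing; higher means smoother).
fn smooth_bidirectional(values: &[f32], smoothing: f32) -> Vec<f32> {
    let mut out = values.to_vec();
    // Forward pass: each frame leans on the previous one.
    for i in 1..out.len() {
        out[i] = smoothing * out[i - 1] + (1.0 - smoothing) * out[i];
    }
    // Backward pass: each frame leans on the next one, which offsets
    // the lag introduced by the forward pass.
    for i in (0..out.len().saturating_sub(1)).rev() {
        out[i] = smoothing * out[i + 1] + (1.0 - smoothing) * out[i];
    }
    out
}

/// Scales `values` in place so the largest value becomes 1.0.
fn normalize(values: &mut [f32]) {
    let peak = values.iter().cloned().fold(0.0f32, f32::max);
    if peak > 0.0 {
        for v in values.iter_mut() {
            *v /= peak;
        }
    }
}
```

Because the whole track is analyzed before rendering, a backward pass is possible at all; a live visualizer could only smooth forward in time.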

Performance

On Apple M2 Max, 100 seconds of audio:

Resolution	Effects	Time	Speed
1280x720	none	~8s	12x realtime
1920x1080	CRT (5 passes)	~43s	2.3x realtime
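
For readers who want to reproduce the "Speed" column or estimate how much raw data is piped to ffmpeg, the arithmetic is sketched below. The helper names are illustrative, and the frame rate used for the benchmark is not stated above, so `frame_count` takes it as a parameter.

```rust
/// Speed relative to realtime: seconds of audio per second of rendering.
fn realtime_factor(audio_secs: f64, render_secs: f64) -> f64 {
    audio_secs / render_secs
}

/// Size of one raw RGBA frame in bytes (4 bytes per pixel).
fn raw_frame_bytes(width: u64, height: u64) -> u64 {
    width * height * 4
}

/// Number of frames for a track of `audio_secs` at `fps` frames per second.
fn frame_count(audio_secs: f64, fps: f64) -> u64 {
    (audio_secs * fps).ceil() as u64
}
```

For example, `realtime_factor(100.0, 8.0)` gives 12.5, in line with the 12x figure for 1280x720, and one 1920x1080 frame is 8,294,400 bytes before encoding.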

License

MIT
