GPU-accelerated audio visualizer video generator.
Takes an audio file, runs FFT analysis, renders visualizations with GPU shaders, and outputs an MP4 video with the original audio. Any resolution, any aspect ratio — landscape, portrait, square.
- Rust 1.70+
- ffmpeg (must be in PATH)
- macOS (Metal) — tested
- Linux (Vulkan) / Windows (DX12) — should work but untested
cargo install --git https://github.com/rath/sonica
All templates and shaders are embedded in the binary, so no additional files are needed.
cargo install --git https://github.com/rath/sonica --features subtitles
This adds speech-to-text subtitle overlay via whisper.cpp. Whisper models are downloaded automatically on first use.
git clone https://github.com/rath/sonica
cd sonica
cargo build --release
# Or with subtitle support:
cargo build --release --features subtitles
# Basic: generates output.mp4 with frequency bars + default effects
sonica audio.wav
# Specify output and template
sonica audio.wav -o visualizer.mp4 -t circular_spectrum
# Cycle through all templates (equal duration each)
sonica audio.wav -t all --effects crt
# CRT retro style
sonica audio.wav --effects crt
# Portrait video for Instagram/TikTok (9:16)
sonica audio.wav -t circular_spectrum --width 1080 --height 1920
# Square video for Instagram feed (1:1)
sonica audio.wav -t particle_burst --width 1080 --height 1080
# 4K 60fps high quality
sonica track.flac -t kaleidoscope --width 3840 --height 2160 --fps 60 --crf 12
# Hardware encoding on macOS
sonica audio.wav --codec h264_videotoolbox --pix-fmt nv12
# Korean title with Google Noto Sans KR
sonica audio.wav --title "안녕하세요, SONICA" --font-url "https://raw.githubusercontent.com/notofonts/noto-cjk/main/Sans/SubsetOTF/KR/NotoSansKR-Regular.otf"
# Example of a working Noto Sans KR TTF/OTF URL
sonica audio.wav --title "안녕하세요" --font-url "https://raw.githubusercontent.com/notofonts/noto-cjk/main/Sans/SubsetOTF/KR/NotoSansKR-Regular.otf"
# Example of a local font file path (macOS)
sonica audio.wav --title "안녕하세요" --font "/System/Library/Fonts/Supplemental/NotoSansCJK-Regular.ttc"
# Speech-to-text subtitles (requires --features subtitles)
sonica audio.wav -o output.mp4 --subtitles
# Subtitles with specific language and model
sonica audio.wav -o output.mp4 --subtitles --whisper-model small --subtitle-lang ko
# Customize subtitle appearance
sonica audio.wav -o output.mp4 --subtitles --subtitle-font-size 64 --subtitle-max-chars 30
# List available templates
sonica --list-templates
Radial spectrum analyzer with beat-reactive radius
demo-circular_spectrum.mp4
Beat-driven particle system
demo-particle_burst.mp4
PCM waveform oscilloscope with glow
Classic equalizer bars with log frequency mapping
Audio-reactive fractal kaleidoscope
Scrolling time-frequency heatmap
Cycle through all templates, equal duration each.
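As a rough illustration of "equal duration each", the sketch below maps a frame number to a template slot by splitting the video into equal slices. The function name and the integer-division approach are assumptions made for this example, not Sonica's actual code.

```rust
/// Pick which template is active for a given frame when cycling with `-t all`.
/// Assumption: the video is split into `template_count` equal slices, in order.
fn template_index_for_frame(frame: usize, total_frames: usize, template_count: usize) -> usize {
    if total_frames == 0 || template_count == 0 {
        return 0;
    }
    // Integer math avoids floating-point drift between slices.
    let idx = frame * template_count / total_frames;
    // Clamp so an out-of-range frame number still maps to the last template.
    idx.min(template_count - 1)
}
```

With 600 frames and 6 templates, frames 0 to 99 use the first template, frames 100 to 199 the second, and so on.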
Post-processing effects can be combined with --effects:
# Single effect
sonica audio.wav --effects bloom
# Multiple effects
sonica audio.wav --effects bloom,vignette,chromatic_aberration
# CRT preset (scanlines + chromatic aberration + vignette + film grain + color grading)
sonica audio.wav --effects crt
Available effects: bloom, chromatic_aberration, vignette, film_grain, crt_scanlines, color_grading
When --effects is not specified, each template uses its own default effects.
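One way to picture how an --effects value could be turned into a shader chain is the sketch below. It expands the crt preset into the five effects listed above and treats "none" as "no effects". The function name, the order of effects inside the preset, and the de-duplication step are assumptions for illustration only.

```rust
/// Expand a comma-separated `--effects` value into individual effect names.
/// Assumptions: `crt` expands in the order shown here, duplicates are dropped,
/// and `none` anywhere in the list disables all effects.
fn expand_effects(list: &str) -> Vec<String> {
    const CRT_PRESET: [&str; 5] = [
        "crt_scanlines",
        "chromatic_aberration",
        "vignette",
        "film_grain",
        "color_grading",
    ];
    let mut chain: Vec<String> = Vec::new();
    for name in list.split(',').map(str::trim).filter(|n| !n.is_empty()) {
        let names: Vec<&str> = match name {
            "none" => return Vec::new(),
            "crt" => CRT_PRESET.to_vec(),
            other => vec![other],
        };
        for effect in names {
            // Keep the first occurrence so the chain order stays predictable.
            if !chain.iter().any(|e| e == effect) {
                chain.push(effect.to_string());
            }
        }
    }
    chain
}
```

Under these assumptions, "bloom,crt,vignette" yields six effects, because vignette is already part of the preset.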
Speech-to-text subtitle overlay using local whisper.cpp inference. Requires building with --features subtitles.
# Auto-detect language, base model (downloaded automatically)
sonica audio.wav --subtitles
# Korean speech with small model for better accuracy
sonica audio.wav --subtitles --whisper-model small --subtitle-lang ko
# Use a local model file
sonica audio.wav --subtitles --whisper-model /path/to/ggml-large-v3-turbo.bin
Available models: tiny, base, small, medium, large (and .en English-only variants). Models are cached at ~/.cache/sonica/models/ after first download.
Subtitles are rendered with a semi-transparent black background at the bottom center of the video. Font size and line wrapping can be adjusted with --subtitle-font-size and --subtitle-max-chars.
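To make the effect of --subtitle-max-chars concrete, here is a minimal greedy word-wrap sketch. It counts Unicode characters rather than bytes, which matters for Korean text. The function name and the greedy strategy are assumptions; Sonica's real wrapping logic may differ.

```rust
/// Greedy word wrap: start a new line when the next word would exceed `max_chars`.
/// Lengths are counted in `char`s, not bytes, so CJK text is measured correctly.
fn wrap_subtitle(text: &str, max_chars: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    let mut current_len = 0usize;
    for word in text.split_whitespace() {
        let word_len = word.chars().count();
        let needed = if current.is_empty() {
            word_len
        } else {
            current_len + 1 + word_len // +1 for the joining space
        };
        if needed > max_chars && !current.is_empty() {
            lines.push(std::mem::take(&mut current));
            current_len = 0;
        }
        if !current.is_empty() {
            current.push(' ');
            current_len += 1;
        }
        current.push_str(word);
        current_len += word_len;
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```

In this sketch a single word longer than the limit is kept whole on its own line rather than being split.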
sonica [OPTIONS] [INPUT]

Arguments:
  [INPUT]  Input audio file (WAV, MP3, FLAC, OGG)

Options:
  -o, --output <PATH>            Output video file [default: output.mp4]
  -t, --template <NAME>          Template name, or "all" to cycle [default: frequency_bars]
  -b, --bitrate <RATE>           Video bitrate (e.g. 2400k, 5M), overrides --crf
      --width <PX>               Video width [default: 1920]
      --height <PX>              Video height [default: 1080]
      --fps <N>                  Frames per second [default: 30]
      --crf <N>                  H.264 quality, 0-51, lower=better [default: 18]
      --effects <LIST>           Post-processing effects, comma-separated (use "none" to disable)
      --smoothing <F>            Audio smoothing factor, 0.0-1.0 [default: 0.85]
      --title <TEXT>             Title text overlay (top right)
      --font <PATH>              Font file for title/time overlay (TTF/OTF)
      --font-url <URL>           Font URL for title/time overlay (TTF/OTF or Google Fonts URL)
      --show-time                Show elapsed time overlay, MM:SS.CC (bottom right)
      --param <KEY=VALUE>        Template parameter overrides, comma-separated
      --config <PATH>            Config file path [default: ./sonica.toml]
      --codec <NAME>             FFmpeg video codec [default: libx264]
      --pix-fmt <FMT>            FFmpeg pixel format [default: yuv420p]
      --list-templates           List available templates and exit
      --subtitles                Enable speech-to-text subtitles (requires --features subtitles)
      --whisper-model <M>        Whisper model name or file path [default: base]
      --subtitle-lang <L>        Subtitle language, ISO 639-1 (e.g. "en", "ko"). Auto-detect if omitted
      --subtitle-font-size <PX>  Subtitle font size [default: 48]
      --subtitle-max-chars <N>   Max characters per subtitle line [default: 42]
  -h, --help                     Print help
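The encoder-related options above (--codec, --pix-fmt, --crf, --bitrate, --width, --height, --fps) plausibly map onto an ffmpeg command line roughly as sketched below. The ffmpeg flags are standard ones, but the EncodeOptions struct, the ffmpeg_args function, the AAC audio codec, and the exact argument order are assumptions; how Sonica actually invokes ffmpeg is not documented here.

```rust
/// Encoder settings mirroring the CLI options (this struct is hypothetical).
struct EncodeOptions {
    width: u32,
    height: u32,
    fps: u32,
    crf: u8,
    bitrate: Option<String>,
    codec: String,
    pix_fmt: String,
}

/// Build ffmpeg arguments for raw RGBA frames on stdin plus an audio file.
fn ffmpeg_args(opts: &EncodeOptions, audio: &str, output: &str) -> Vec<String> {
    let size = format!("{}x{}", opts.width, opts.height);
    let fps = opts.fps.to_string();
    let mut args: Vec<String> = [
        "-y",
        // Input 0: raw RGBA frames read from stdin.
        "-f", "rawvideo", "-pix_fmt", "rgba",
        "-s", size.as_str(), "-r", fps.as_str(),
        "-i", "-",
        // Input 1: the original audio file.
        "-i", audio,
        "-c:v", opts.codec.as_str(),
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();

    // A bitrate, when given, replaces the CRF quality setting.
    match &opts.bitrate {
        Some(rate) => {
            args.push("-b:v".to_string());
            args.push(rate.clone());
        }
        None => {
            args.push("-crf".to_string());
            args.push(opts.crf.to_string());
        }
    }

    // Audio codec and -shortest are assumptions made for this sketch.
    args.extend(
        ["-pix_fmt", opts.pix_fmt.as_str(), "-c:a", "aac", "-shortest", output]
            .iter()
            .map(|s| s.to_string()),
    );
    args
}
```

Note that -crf is an option of x264-style encoders; hardware encoders such as h264_videotoolbox generally take their quality settings through other flags.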
Each template defines configurable parameters (bar count, colors, etc.) with --param:
# Change bar count and disable mirroring
sonica audio.wav -t frequency_bars --param bar_count=128,mirror=false,gap_ratio=0.1
# Kaleidoscope with 8-fold symmetry
sonica audio.wav -t kaleidoscope --param symmetry=8,zoom=2.0
# More particles
sonica audio.wav -t particle_burst --param particle_count=500
Use --list-templates to see available templates. Check each template's manifest.json for parameter definitions.
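A --param value such as bar_count=128,mirror=false could be split into key/value pairs along these lines. This is a minimal sketch with a hypothetical function name; it keeps values as strings and leaves type checking against the template's manifest.json out of scope.

```rust
use std::collections::HashMap;

/// Parse a comma-separated list of KEY=VALUE pairs into a map.
/// Values stay as strings; validating them against a manifest is not shown.
fn parse_params(spec: &str) -> Result<HashMap<String, String>, String> {
    let mut params = HashMap::new();
    for pair in spec.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("expected KEY=VALUE, got `{pair}`"))?;
        let key = key.trim();
        if key.is_empty() {
            return Err(format!("missing key in `{pair}`"));
        }
        // A later occurrence of the same key overwrites the earlier one.
        params.insert(key.to_string(), value.trim().to_string());
    }
    Ok(params)
}
```

A malformed entry such as a bare symmetry (no equals sign) is reported as an error instead of being silently ignored.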
Sonica loads config from the first file found in this order:
- --config <path> (explicit)
- ./sonica.toml (current directory)
- ~/.config/sonica/config.toml (global, works on all platforms)
- Platform-specific config dir (~/Library/Application Support on macOS)
CLI flags always take priority over config values. See sonica.toml.example for a full annotated example.
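The lookup order and the "CLI wins" rule can be sketched as follows. Both function names are hypothetical, and the file-existence check is passed in as a closure so the example stays independent of the real filesystem; this illustrates the documented behavior and is not Sonica's own code.

```rust
use std::path::{Path, PathBuf};

/// Return the first config path that exists, checking the explicit
/// `--config` path first and then the fallback candidates in order.
fn resolve_config<F>(explicit: Option<&Path>, candidates: &[PathBuf], exists: F) -> Option<PathBuf>
where
    F: Fn(&Path) -> bool,
{
    explicit
        .into_iter()
        .map(Path::to_path_buf)
        .chain(candidates.iter().cloned())
        .find(|p| exists(p.as_path()))
}

/// Resolve one setting: CLI flag first, then config value, then built-in default.
fn effective<T>(cli: Option<T>, config: Option<T>, default: T) -> T {
    cli.or(config).unwrap_or(default)
}
```

In real use the closure would typically be something like `|p| p.is_file()`. With the example config below it, `effective(None, Some(60), 30)` gives 60 for fps, while passing --fps 24 on the command line would give 24.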
[output]
width = 1280
height = 720
fps = 60
crf = 15
codec = "libx264"
pix_fmt = "yuv420p"
font = "/path/to/NotoSansKR-Regular.otf"
font_url = "https://raw.githubusercontent.com/notofonts/noto-cjk/main/Sans/SubsetOTF/KR/NotoSansKR-Regular.otf"
[audio]
smoothing = 0.9
effects = ["bloom", "vignette"]
[subtitle]
whisper_model = "base"
language = "ko"
font_size = 48.0
max_chars_per_line = 42
For Korean text, use a font that includes CJK glyphs (for example NotoSansKR-Regular.otf from Google Fonts) via --font, --font-url, or font / font_url in the config.
Supported input formats: WAV, MP3, FLAC, OGG/Vorbis, AAC, all decoded via symphonia.
- Decode audio to mono PCM samples
- Transcribe speech to timed subtitles via whisper.cpp (optional)
- Analyze in 3 passes:
  - Global stats (peak levels, beat detection, tempo)
  - Per-frame FFT with frequency band extraction (parallelized)
  - Bidirectional smoothing and normalization
- Render each frame on GPU via wgpu (Metal/Vulkan) with WGSL shaders
- Post-process through a chain of effect shaders
- Overlay title, time, and subtitles on CPU
- Encode by piping raw RGBA frames to ffmpeg
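The "bidirectional smoothing and normalization" pass in the pipeline above can be illustrated with a forward-then-backward exponential filter followed by peak normalization. This is one common way to smooth without introducing lag, shown here as an assumption; the function names, the exact filter, and the way the --smoothing factor is applied are not taken from Sonica's source.

```rust
/// Smooth a per-frame series with an exponential filter run forward and then
/// backward, so peaks are softened on both sides instead of only trailing.
/// Assumption: `smoothing` (0.0..1.0) is the weight given to the neighbor value.
fn smooth_bidirectional(values: &[f32], smoothing: f32) -> Vec<f32> {
    let mut out = values.to_vec();
    // Forward pass: each frame leans on the previous smoothed frame.
    for i in 1..out.len() {
        out[i] = smoothing * out[i - 1] + (1.0 - smoothing) * out[i];
    }
    // Backward pass: each frame leans on the next smoothed frame.
    for i in (0..out.len().saturating_sub(1)).rev() {
        out[i] = smoothing * out[i + 1] + (1.0 - smoothing) * out[i];
    }
    out
}

/// Scale the series so its largest absolute value becomes 1.0.
fn normalize_peak(values: &mut [f32]) {
    let peak = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    if peak > 0.0 {
        for v in values.iter_mut() {
            *v /= peak;
        }
    }
}
```

Because the whole track is analyzed before rendering, the backward pass can look at future frames, which a live visualizer could not do.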
On Apple M2 Max, 100 seconds of audio:
| Resolution | Effects | Time | Speed |
|---|---|---|---|
| 1280x720 | none | ~8s | 12x realtime |
| 1920x1080 | CRT (5 passes) | ~43s | 2.3x realtime |
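The Speed column is simply audio duration divided by render time, and the amount of raw data piped to ffmpeg follows from the frame size. The helper names below are made up for this sketch, and it assumes 4 bytes per RGBA pixel.

```rust
/// Speed relative to realtime: seconds of audio rendered per second of wall time.
fn realtime_factor(audio_secs: f64, render_secs: f64) -> f64 {
    audio_secs / render_secs
}

/// Size of one uncompressed RGBA frame in bytes (4 bytes per pixel).
fn rgba_frame_bytes(width: u64, height: u64) -> u64 {
    width * height * 4
}
```

For the table above, 100 s of audio in about 8 s is 12.5x, and in about 43 s is roughly 2.3x. A single 1920x1080 RGBA frame is 8,294,400 bytes, so at 30 fps roughly 249 MB of raw video is produced per second of output.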
MIT