
Sonica

GPU-accelerated audio visualizer video generator.

Takes an audio file, runs FFT analysis, renders visualizations with GPU shaders, and outputs an MP4 video with the original audio. Any resolution, any aspect ratio — landscape, portrait, square.

Requirements

  • Rust 1.70+
  • ffmpeg (must be in PATH)
  • macOS (Metal) — tested
  • Linux (Vulkan) / Windows (DX12) — should work but untested

Install

cargo install --git https://github.com/rath/sonica

All templates and shaders are embedded in the binary, so no additional files are needed.

With subtitle support

cargo install --git https://github.com/rath/sonica --features subtitles

This adds speech-to-text subtitle overlay via whisper.cpp. Whisper models are automatically downloaded on first use.

Build from source

git clone https://github.com/rath/sonica
cd sonica
cargo build --release
# Or with subtitle support:
cargo build --release --features subtitles

Usage

# Basic — generates output.mp4 with frequency bars + default effects
sonica audio.wav

# Specify output and template
sonica audio.wav -o visualizer.mp4 -t circular_spectrum

# Cycle through all templates (equal duration each)
sonica audio.wav -t all --effects crt

# CRT retro style
sonica audio.wav --effects crt

# Portrait video for Instagram/TikTok (9:16)
sonica audio.wav -t circular_spectrum --width 1080 --height 1920

# Square video for Instagram feed (1:1)
sonica audio.wav -t particle_burst --width 1080 --height 1080

# 4K 60fps high quality
sonica track.flac -t kaleidoscope --width 3840 --height 2160 --fps 60 --crf 12

# Hardware encoding on macOS
sonica audio.wav --codec h264_videotoolbox --pix-fmt nv12

# Korean title with Google Noto Sans KR
sonica audio.wav --title "안녕하세요, SONICA" --font-url "https://raw.githubusercontent.com/notofonts/noto-cjk/main/Sans/SubsetOTF/KR/NotoSansKR-Regular.otf"

# Working example of a Noto Sans KR TTF/OTF URL
sonica audio.wav --title "안녕하세요" --font-url "https://raw.githubusercontent.com/notofonts/noto-cjk/main/Sans/SubsetOTF/KR/NotoSansKR-Regular.otf"

# Example of a local font file path (macOS)
sonica audio.wav --title "안녕하세요" --font "/System/Library/Fonts/Supplemental/NotoSansCJK-Regular.ttc"

# Speech-to-text subtitles (requires --features subtitles)
sonica audio.wav -o output.mp4 --subtitles

# Subtitles with specific language and model
sonica audio.wav -o output.mp4 --subtitles --whisper-model small --subtitle-lang ko

# Customize subtitle appearance
sonica audio.wav -o output.mp4 --subtitles --subtitle-font-size 64 --subtitle-max-chars 30

# List available templates
sonica --list-templates

Templates

circular_spectrum

Radial spectrum analyzer with beat-reactive radius

Demo video: demo-circular_spectrum.mp4

particle_burst

Beat-driven particle system

Demo video: demo-particle_burst.mp4

waveform_scope

PCM waveform oscilloscope with glow

frequency_bars

Classic equalizer bars with log frequency mapping

kaleidoscope

Audio-reactive fractal kaleidoscope

spectrogram

Scrolling time-frequency heatmap

all

Cycle through all templates, equal duration each.
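One way to read "equal duration each" is that the frame range is split into equal slices, one per template. The sketch below only illustrates that arithmetic. `template_for_frame` is a hypothetical helper, and the template order and frame-based slicing are assumptions rather than Sonica's confirmed behavior.

```rust
/// Pick the template that is active at `frame` when cycling through
/// `templates` with an equal share of `total_frames` for each.
/// Hypothetical helper for illustration only.
fn template_for_frame<'a>(templates: &[&'a str], frame: usize, total_frames: usize) -> &'a str {
    assert!(!templates.is_empty() && total_frames > 0);
    // Integer form of (frame / total_frames) * templates.len().
    let idx = frame * templates.len() / total_frames;
    // Clamp so that frame >= total_frames still maps to the last template.
    templates[idx.min(templates.len() - 1)]
}

fn main() {
    // Assumed order; the real order is whatever --list-templates reports.
    let templates = ["frequency_bars", "circular_spectrum", "particle_burst"];
    let total_frames = 900; // e.g. 30 seconds at 30 fps
    for frame in [0, 299, 300, 899] {
        println!("frame {frame}: {}", template_for_frame(&templates, frame, total_frames));
    }
}
```

With three templates and 900 frames, each template gets 300 frames, so the switch happens at frames 300 and 600.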

Effects

Post-processing effects can be combined with --effects:

# Single effect
sonica audio.wav --effects bloom

# Multiple effects
sonica audio.wav --effects bloom,vignette,chromatic_aberration

# CRT preset (scanlines + chromatic aberration + vignette + film grain + color grading)
sonica audio.wav --effects crt

Available effects: bloom, chromatic_aberration, vignette, film_grain, crt_scanlines, color_grading

When --effects is not specified, each template uses its own default effects.
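To make the relationship between the crt preset and the individual effects concrete, here is a minimal sketch of how an --effects value could be expanded into a list of passes. The preset contents come from the description above. The function name `expand_effects`, the pass order, and the handling of "none" are assumptions, not Sonica's actual code.

```rust
/// Expand an `--effects` value into an ordered list of effect names.
/// Illustrative only: the function and the pass order are assumptions.
fn expand_effects(arg: &str) -> Vec<String> {
    let mut out = Vec::new();
    for name in arg.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        match name {
            // "none" disables post-processing entirely.
            "none" => return Vec::new(),
            // The CRT preset stands for five individual effects.
            "crt" => out.extend(
                ["crt_scanlines", "chromatic_aberration", "vignette", "film_grain", "color_grading"]
                    .iter()
                    .map(|s| s.to_string()),
            ),
            other => out.push(other.to_string()),
        }
    }
    out
}

fn main() {
    println!("{:?}", expand_effects("bloom,vignette"));
    println!("{:?}", expand_effects("crt"));
    println!("{:?}", expand_effects("none"));
}
```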

Subtitles

Speech-to-text subtitle overlay using local whisper.cpp inference. Requires building with --features subtitles.

# Auto-detect language, base model (downloaded automatically)
sonica audio.wav --subtitles

# Korean speech with small model for better accuracy
sonica audio.wav --subtitles --whisper-model small --subtitle-lang ko

# Use a local model file
sonica audio.wav --subtitles --whisper-model /path/to/ggml-large-v3-turbo.bin

Available models: tiny, base, small, medium, large (and .en English-only variants). Models are cached at ~/.cache/sonica/models/ after first download.

Subtitles are rendered with a semi-transparent black background at the bottom center of the video. Font size and line wrapping can be adjusted with --subtitle-font-size and --subtitle-max-chars.
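The effect of --subtitle-max-chars can be pictured as a greedy word wrap. The sketch below is a hypothetical illustration, not Sonica's wrapping code. It counts Unicode characters rather than bytes, which matters for Korean text, and it assumes that words are separated by whitespace.

```rust
/// Greedy word wrap that keeps each line within `max_chars` characters.
/// A single word longer than `max_chars` is left unbroken on its own line.
/// Hypothetical sketch; Sonica's real wrapping rules may differ.
fn wrap_subtitle(text: &str, max_chars: usize) -> Vec<String> {
    let mut lines: Vec<String> = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        let cur_len = current.chars().count();
        let word_len = word.chars().count();
        // The +1 accounts for the space that would join the word to the line.
        if !current.is_empty() && cur_len + 1 + word_len > max_chars {
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}

fn main() {
    for line in wrap_subtitle("the quick brown fox jumps", 10) {
        println!("{line}");
    }
}
```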

CLI Reference

sonica [OPTIONS] [INPUT]

Arguments:
  [INPUT]  Input audio file (WAV, MP3, FLAC, OGG)

Options:
  -o, --output <PATH>        Output video file [default: output.mp4]
  -t, --template <NAME>      Template name, or "all" to cycle [default: frequency_bars]
  -b, --bitrate <RATE>       Video bitrate (e.g. 2400k, 5M), overrides --crf
      --width <PX>           Video width [default: 1920]
      --height <PX>          Video height [default: 1080]
      --fps <N>              Frames per second [default: 30]
      --crf <N>              H.264 quality, 0-51, lower=better [default: 18]
      --effects <LIST>       Post-processing effects, comma-separated (use "none" to disable)
      --smoothing <F>        Audio smoothing factor, 0.0-1.0 [default: 0.85]
      --title <TEXT>         Title text overlay (top right)
      --font <PATH>          Font file for title/time overlay (TTF/OTF)
      --font-url <URL>       Font URL for title/time overlay (TTF/OTF or Google Fonts URL)
      --show-time            Show elapsed time overlay, MM:SS.CC (bottom right)
      --param <KEY=VALUE>    Template parameter overrides, comma-separated
      --config <PATH>        Config file path [default: ./sonica.toml]
      --codec <NAME>         FFmpeg video codec [default: libx264]
      --pix-fmt <FMT>        FFmpeg pixel format [default: yuv420p]
      --list-templates       List available templates and exit
      --subtitles            Enable speech-to-text subtitles (requires --features subtitles)
      --whisper-model <M>    Whisper model name or file path [default: base]
      --subtitle-lang <L>    Subtitle language, ISO 639-1 (e.g. "en", "ko"). Auto-detect if omitted
      --subtitle-font-size <PX>  Subtitle font size [default: 48]
      --subtitle-max-chars <N>   Max characters per subtitle line [default: 42]
  -h, --help                 Print help

Template Parameters

Each template defines configurable parameters (bar count, colors, etc.) that can be overridden with --param:

# Change bar count and disable mirroring
sonica audio.wav -t frequency_bars --param bar_count=128,mirror=false,gap_ratio=0.1

# Kaleidoscope with 8-fold symmetry
sonica audio.wav -t kaleidoscope --param symmetry=8,zoom=2.0

# More particles
sonica audio.wav -t particle_burst --param particle_count=500

Use --list-templates to see available templates. Check each template's manifest.json for parameter definitions.
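The KEY=VALUE,KEY=VALUE syntax shown above is simple to split. The following sketch shows one way to do it; `parse_params` is a hypothetical function, and it keeps values as strings because the parameter types are defined per template in manifest.json.

```rust
/// Split a `--param` value such as "bar_count=128,mirror=false" into pairs.
/// Hypothetical parser for illustration; values are left as strings.
fn parse_params(arg: &str) -> Result<Vec<(String, String)>, String> {
    arg.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|pair| {
            pair.split_once('=')
                .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
                .ok_or_else(|| format!("expected KEY=VALUE, got `{pair}`"))
        })
        .collect()
}

fn main() {
    match parse_params("bar_count=128,mirror=false,gap_ratio=0.1") {
        Ok(pairs) => {
            for (key, value) in pairs {
                println!("{key} = {value}");
            }
        }
        Err(e) => eprintln!("error: {e}"),
    }
}
```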

Configuration File

Sonica loads config from the first file found in this order:

  1. --config <path> (explicit)
  2. ./sonica.toml (current directory)
  3. ~/.config/sonica/config.toml (global, works on all platforms)
  4. Platform-specific config dir (~/Library/Application Support on macOS)

CLI flags always take priority over config values. See sonica.toml.example for a full annotated example.

[output]
width = 1280
height = 720
fps = 60
crf = 15
codec = "libx264"
pix_fmt = "yuv420p"
font = "/path/to/NotoSansKR-Regular.otf"
font_url = "https://raw.githubusercontent.com/notofonts/noto-cjk/main/Sans/SubsetOTF/KR/NotoSansKR-Regular.otf"

[audio]
smoothing = 0.9

effects = ["bloom", "vignette"]

[subtitle]
whisper_model = "base"
language = "ko"
font_size = 48.0
max_chars_per_line = 42

For Korean text, use a font that includes CJK glyphs (for example NotoSansKR-Regular.otf from Google Fonts) via --font, --font-url, or font / font_url in the config.
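The "first file found" lookup described at the top of this section can be sketched as a search over an ordered list of candidate paths. This is an illustration under assumptions, not Sonica's implementation. `first_existing` is a hypothetical helper, and the platform-specific directory is left out because its exact file name is not stated here.

```rust
use std::path::{Path, PathBuf};

/// Return the first candidate for which `exists` reports true.
/// Taking `exists` as a parameter keeps the lookup order easy to test.
fn first_existing(candidates: &[PathBuf], exists: impl Fn(&Path) -> bool) -> Option<PathBuf> {
    candidates.iter().find(|p| exists(p.as_path())).cloned()
}

fn main() {
    // Assumption: HOME is set; fall back to the current directory otherwise.
    let home = PathBuf::from(std::env::var("HOME").unwrap_or_else(|_| ".".into()));
    // An explicit --config path would be placed first in this list.
    let candidates = vec![
        PathBuf::from("./sonica.toml"),
        home.join(".config/sonica/config.toml"),
    ];
    match first_existing(&candidates, |p| p.exists()) {
        Some(path) => println!("using config: {}", path.display()),
        None => println!("no config file found, using defaults"),
    }
}
```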

Supported Audio Formats

WAV, MP3, FLAC, OGG/Vorbis, AAC — via symphonia.

How It Works

  1. Decode audio to mono PCM samples
  2. Transcribe speech to timed subtitles via whisper.cpp (optional)
  3. Analyze in 3 passes:
    • Global stats (peak levels, beat detection, tempo)
    • Per-frame FFT with frequency band extraction (parallelized)
    • Bidirectional smoothing and normalization
  4. Render each frame on GPU via wgpu (Metal/Vulkan) with WGSL shaders
  5. Post-process through a chain of effect shaders
  6. Overlay title, time, and subtitles on CPU
  7. Encode by piping raw RGBA frames to ffmpeg
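As an illustration of the third analysis pass, the sketch below shows one common way to do bidirectional smoothing: an exponential moving average run forward and then backward, followed by peak normalization. This is a generic technique and an assumption about what the step involves, not Sonica's actual code. Treating `alpha` like the --smoothing factor (higher means smoother) is also an assumption.

```rust
/// Exponential smoothing run forward and then backward over a per-frame series.
/// `alpha` is the weight of the neighbouring frame (0.0 = no smoothing).
fn smooth_bidirectional(values: &[f32], alpha: f32) -> Vec<f32> {
    let mut out = values.to_vec();
    // Forward pass: each frame leans on the frame before it.
    for i in 1..out.len() {
        out[i] = alpha * out[i - 1] + (1.0 - alpha) * out[i];
    }
    // Backward pass: each frame leans on the frame after it,
    // which offsets the lag introduced by the forward pass.
    for i in (0..out.len().saturating_sub(1)).rev() {
        out[i] = alpha * out[i + 1] + (1.0 - alpha) * out[i];
    }
    out
}

/// Scale the series so that its peak value becomes 1.0.
fn normalize(values: &mut [f32]) {
    let peak = values.iter().cloned().fold(0.0_f32, f32::max);
    if peak > 0.0 {
        for v in values.iter_mut() {
            *v /= peak;
        }
    }
}

fn main() {
    // A single spike spreads to both sides instead of only trailing behind.
    let mut band = smooth_bidirectional(&[0.0, 0.0, 4.0, 0.0, 0.0], 0.5);
    normalize(&mut band);
    println!("{band:?}");
}
```

Because the whole audio file is available before rendering starts, an offline tool can look ahead in this way, which a live visualizer cannot.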

Performance

On Apple M2 Max, 100 seconds of audio:

Resolution   Effects          Time   Speed
1280x720     none             ~8s    12x realtime
1920x1080    CRT (5 passes)   ~43s   2.3x realtime
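The speed figures are the audio duration divided by the render time. A small sketch of that arithmetic follows; `realtime_factor` is a hypothetical helper. Because the measured times are approximate, the computed values only roughly match the rounded figures in the table.

```rust
/// Speed relative to realtime: seconds of audio rendered per second of wall time.
fn realtime_factor(audio_secs: f64, render_secs: f64) -> f64 {
    audio_secs / render_secs
}

fn main() {
    // Render times taken from the benchmark above (100 seconds of audio).
    for (label, render_secs) in [("1280x720, no effects", 8.0), ("1920x1080, CRT", 43.0)] {
        println!("{label}: {:.1}x realtime", realtime_factor(100.0, render_secs));
    }
}
```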

License

MIT
