# Fast-DACVAE

Fast inference engine for DACVAE, a neural audio codec that compresses and reconstructs audio with a convolutional encoder-decoder and a VAE bottleneck. This library accelerates DACVAE inference by up to 11.2x on NVIDIA GPUs through graph-level optimizations alone: no custom kernels, no quality loss at FP32, and no changes to model weights.

## Benchmark

NVIDIA H100 PCIe | `facebook/dacvae-watermarked` (107.7M params) | 100 s of audio @ 48 kHz

### Full Precision (FP32) — Zero Quality Loss

| Method | Latency | Speedup | Real-time Factor |
|---|---|---|---|
| PyTorch FP32 | 1,047 ms | 1.0x | 96x |
| + channels_last + wn_off | 549 ms | 1.9x | 182x |
| + torch.compile + graph | 209 ms | 5.0x | 478x |

### Half Precision (FP16 / BF16)

| Method | Latency | Speedup | RTF | SNR vs FP32 |
|---|---|---|---|---|
| PyTorch FP16 | 775 ms | 1.4x | 129x | 40.4 dB |
| + channels_last + wn_off | 307 ms | 3.4x | 326x | 40.2 dB |
| + torch.compile + graph (FP16) | 93 ms | 11.2x | 1,071x | 40.2 dB |
| + torch.compile + graph (BF16) | 100 ms | 10.5x | 1,004x | 29.8 dB |
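As a sanity check on the numbers above: speedup is the FP32 baseline latency divided by the optimized latency, and the real-time factor (RTF) is the clip duration divided by processing time. The snippet below recomputes both from the table's rounded latencies, so the last digit can differ slightly from the table, which presumably uses unrounded measurements.

```python
# Recompute speedup and real-time factor (RTF) from the benchmark latencies.
# Clip: 100 s of audio; latencies are in milliseconds.
clip_ms = 100 * 1000
baseline_ms = 1047  # PyTorch FP32

for name, latency_ms in [
    ("PyTorch FP32", 1047),
    ("+ torch.compile + graph (FP32)", 209),
    ("+ torch.compile + graph (FP16)", 93),
]:
    speedup = baseline_ms / latency_ms
    rtf = clip_ms / latency_ms
    print(f"{name}: {speedup:.1f}x speedup, {rtf:.0f}x real-time")
```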

## Quick Start

```bash
pip install git+https://github.com/kadirnar/fast-dacvae.git
```
```python
import torch

from dacvae import DACVAE
from dacvae.optimize import optimize_dacvae

model = DACVAE.load("facebook/dacvae-watermarked").cuda().eval()
audio = torch.randn(1, 1, 4_800_000, device="cuda")  # 100 s @ 48 kHz

# FP32: zero quality loss, ~209 ms
replay = optimize_dacvae(model, audio, dtype="fp32")
output = replay()

# FP16: fastest, ~93 ms
replay = optimize_dacvae(model, audio, dtype="fp16")
output = replay()

# BF16: ~100 ms
replay = optimize_dacvae(model, audio, dtype="bf16")
output = replay()
```

## Requirements

- PyTorch 2.9+
- NVIDIA GPU (Hopper/Ampere)

## License

Apache 2.0
