Model/Pipeline/Scheduler description
Stability AI recently released Stable Audio 3 Medium, a 2-billion-parameter text-to-audio diffusion model. It can generate high-quality music and sound effects and perform targeted audio editing from text prompts.
Given Diffusers' expanding support for audio modalities, adding a dedicated pipeline for Stable Audio 3 Medium would allow the community to easily download, run, and fine-tune this model using standard Diffusers syntax.
Open source status
Provide useful links for the implementation
Model Weights: https://huggingface.co/stabilityai/stable-audio-3-medium
Original Code/Tools: https://github.com/Stability-AI/stable-audio-tools
Announcement/Paper: https://stability.ai/news/stable-audio-open
Motivation:
Stable Audio 3 Medium is a significant step forward for open-weight audio generation. Currently, running it requires pulling custom code from the original research repository, which can be difficult to manage in production environments.
Integrating it into the Diffusers ecosystem would:
i. Make it more accessible to developers building custom audio tools, AI-assisted DAWs, or automated sound design workflows.
ii. Let the model benefit from standard Diffusers memory optimizations (such as sequential CPU offloading), making it feasible to run this large model on consumer hardware.
iii. Allow chaining with other modalities (e.g., generating video with a video pipeline, then passing the context to this pipeline for matching audio).
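To make the "standard Diffusers syntax" and offloading points concrete, here is a rough sketch of what usage might look like. It uses the existing `StableAudioPipeline` class and its call arguments as a stand-in. Whether the Stable Audio 3 Medium checkpoint would load with that class, or need a new one, is an open question and part of this request. The helper names `build_call_kwargs` and `generate` are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, Optional


def build_call_kwargs(
    prompt: str,
    seconds: float = 10.0,
    steps: int = 100,
    negative_prompt: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect call arguments, named after the existing StableAudioPipeline.__call__.

    Assumption: a Stable Audio 3 Medium pipeline would accept the same names.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    kwargs: Dict[str, Any] = {
        "prompt": prompt,
        "audio_end_in_s": seconds,
        "num_inference_steps": steps,
    }
    if negative_prompt is not None:
        kwargs["negative_prompt"] = negative_prompt
    return kwargs


def generate(repo_id: str, prompt: str, seconds: float = 10.0):
    """Load a checkpoint with sequential CPU offloading and return one waveform."""
    # Heavy imports are deferred so build_call_kwargs works without torch installed.
    import torch
    from diffusers import StableAudioPipeline  # existing class, used as a stand-in

    pipe = StableAudioPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
    # Moves submodules to the GPU only while they run (requires `accelerate`).
    pipe.enable_sequential_cpu_offload()
    result = pipe(**build_call_kwargs(prompt, seconds))
    return result.audios[0]
```

With a dedicated pipeline in place, the only change for users would ideally be the repository id passed to `generate`; the offloading call is already shared by all Diffusers pipelines.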