## AudioLDM 2, but faster ⚡️

AudioLDM 2 was proposed in [AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining](https://arxiv.org/abs/2308.05734)
by Haohe Liu et al. AudioLDM 2 takes a text prompt as input and predicts the corresponding audio. It can generate realistic sound effects, human speech and music.

In this Colab, we show how to use AudioLDM 2 in the Hugging Face 🧨 Diffusers library. We explore code optimisations such as half-precision and flash attention, and model optimisations such as scheduler choice and negative prompting, which together reduce inference time by over **10 times** with minimal degradation in the quality of the output audio.

Read to the end to find out how to generate a 10-second audio sample in just 2 seconds using a T4 GPU.
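
To check speed-ups like these on your own hardware, it helps to time every configuration the same way. The sketch below is a minimal, hypothetical helper (`time_call` is not part of Diffusers) that averages wall-clock time over a few runs. The optional `sync` argument is an assumption: it is meant for passing something like `torch.cuda.synchronize`, so that queued GPU work has finished before the clock stops.

```python
import time
from typing import Any, Callable, Optional, Tuple


def time_call(
    fn: Callable[[], Any],
    repeats: int = 3,
    sync: Optional[Callable[[], None]] = None,
) -> Tuple[Any, float]:
    """Run fn() `repeats` times; return (last result, mean seconds per run)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    result = None
    total = 0.0
    for _ in range(repeats):
        if sync is not None:
            sync()  # finish pending work before starting the clock
        start = time.perf_counter()
        result = fn()
        if sync is not None:
            sync()  # wait for asynchronous work before stopping the clock
        total += time.perf_counter() - start
    return result, total / repeats


# Stand-in workload (hypothetical); in the notebook this would be a pipeline call.
result, seconds = time_call(lambda: sum(range(100_000)))
print(f"result={result}, mean time={seconds:.6f} s")
```

Running one untimed warm-up call first is usually worthwhile, since the first GPU call often includes one-off setup costs.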

In [None]:
!nvidia-smi

In [None]:
!pip install --quiet --upgrade git+https://github.com/huggingface/diffusers.git git+https://github.com/huggingface/transformers.git accelerate

In [None]:
from diffusers import AudioLDM2Pipeline
import torch
import scipy.io.wavfile  # import the submodule explicitly so scipy.io.wavfile.write is always available
import os


model_id = "cvssp/audioldm2"  # model name
pipe = AudioLDM2Pipeline.from_pretrained(model_id, torch_dtype=torch.float16)

In [None]:
pipe.to("cuda");

In [None]:
generator = torch.Generator("cuda").manual_seed(0)

In [None]:
negative_prompt = "Low quality, average quality."

In [None]:
pipe.scheduler.compatibles

In [None]:
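from diffusers import DPMSolverMultistepScheduler

# Swap in the DPM-Solver multistep scheduler, reusing the current scheduler's config
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Keep sub-models on the CPU and move each to the GPU only while it runs (saves GPU memory)
pipe.enable_model_cpu_offload()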


In [None]:
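import pandas as pd

# Load the captions and their output file names from an Excel sheet
# (replace the placeholder path below with the path to your own file)
dataset = pd.read_excel('path_excel_file_with_all_the_captions_associated_to_each_file_name_respectively')
descriptions = list(dataset['Selected Caption'])  # text prompts
names = list(dataset['file_name'])  # output file name for each prompt
print(descriptions)
print(names)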


# Generate one 10-second clip per caption and save it under the matching file name
for name, description in zip(names, descriptions):
    audio = pipe(
        description,
        negative_prompt=negative_prompt,
        audio_length_in_s=10,
        num_inference_steps=20,
        generator=generator,
    ).audios[0]
    scipy.io.wavfile.write(name, rate=16000, data=audio)
    print("saving audio: ", name)
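
The loop above writes each clip to whatever string is stored in the `file_name` column. If those names lack a `.wav` extension, or the clips should land in a separate folder, a small helper can normalise them first. This is a sketch only: `output_path` and the `generated_audio` folder name are hypothetical and not part of the original notebook.

```python
from pathlib import Path


def output_path(name: str, out_dir: str = "generated_audio") -> Path:
    """Build out_dir/<base name>, appending .wav when the name lacks it."""
    base = Path(name).name  # drop any directory part of the spreadsheet name
    if not base:
        raise ValueError("file name must not be empty")
    if Path(base).suffix.lower() != ".wav":
        base += ".wav"  # append rather than replace, so distinct names stay distinct
    return Path(out_dir) / base


print(output_path("dog_bark"))
print(output_path("clips/rain.mp3", out_dir="out"))
```

Before saving, the folder would need to exist (for example via `Path("generated_audio").mkdir(parents=True, exist_ok=True)`), and the helper's result, converted with `str`, could then be passed as the first argument to `scipy.io.wavfile.write`.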


