# Audio Generation with AudioLDM

Generate speech, sound effects, music and beyond.

https://audioldm.github.io

## This notebook supports

- Text2Audio Generation: generate audio given text input.
- Audio2Audio Generation: given an audio clip, generate another clip that contains the same type of sound.
- Audio2Audio Style Transfer: transform the sound of an audio clip into a different one described by a text prompt.

## Important tricks to make your generated audio sound better

1. Give AudioLDM more hints: use more adjectives to describe the sound (e.g., "clear", "high quality") or make your target more specific (e.g., "water stream in a forest" instead of "stream"). This helps AudioLDM understand what you want.
2. Use general terms such as 'man' or 'woman' rather than the names of specific individuals, and avoid abstract objects that people may not be familiar with.
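The first tip can be applied mechanically. Below is a minimal sketch of a hypothetical helper, `build_prompt`, which is not part of the `audioldm` package: it prepends quality adjectives and an optional setting to a generic subject, and the resulting string would be passed as `text` in the cells below.

```python
def build_prompt(subject, setting="", qualities=("high quality", "clear")):
    """Compose a more descriptive prompt (hypothetical helper, not part of audioldm).

    subject:   a generic description of the sound, e.g. "water stream"
    setting:   optional context that makes the target more specific
    qualities: adjectives prepended to steer the output quality
    """
    words = [", ".join(qualities), subject, setting]
    # Drop empty pieces so missing qualities or settings leave no stray spaces.
    return " ".join(w for w in words if w)


# A vague prompt versus a more specific one.
print(build_prompt("stream", qualities=()))
print(build_prompt("water stream", setting="in a forest"))
```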

In [None]:
#@title Settings

import os

connect_google_drive = False #@param {type:"boolean"}
use_model_path = False #@param {type:"boolean"}
model_path = '' #@param {type:"string"}

In [None]:
#@title Dependencies

if connect_google_drive:
  from google.colab import drive
  drive.mount('/gdrive')

if not use_model_path:
  model_path = '/content/AudioLDM-S-Full/'
  print('Getting AudioLDM model...')
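  os.system('curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash')
  os.system('sudo apt-get install -y git-lfs')
  os.system('git lfs install')  # enable the LFS filters so the clone downloads the checkpoint, not just a pointer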
  os.system('git clone https://huggingface.co/haoheliu/AudioLDM-S-Full')
  os.system('mv AudioLDM-S-Full/audioldm-s-full AudioLDM-S-Full/audioldm-s-full.ckpt')
  print('Done getting AudioLDM model!')

ckpt_path = os.path.join(model_path, 'audioldm-s-full.ckpt')
%env AUDIOLDM_CACHE_DIR=$model_path

!pip3 install audioldm==0.0.19
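The checkpoint is fetched through Git LFS. If LFS is not active during the clone, Git leaves a small text "pointer" file in place of the real weights, the rename still succeeds, and the failure only shows up later inside `audioldm.build_model`. A minimal sketch of a hypothetical `check_checkpoint` helper (not part of `audioldm`) that catches this early by looking for the standard LFS pointer header; you could call `check_checkpoint(ckpt_path)` before building the model.

```python
import os

# Git LFS pointer files begin with this line instead of the real binary data.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def check_checkpoint(path):
    """Raise an informative error if `path` is missing or is only a Git LFS pointer.

    Hypothetical helper for this notebook; not part of the audioldm package.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Checkpoint not found: {path}")
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    if head == LFS_POINTER_PREFIX:
        raise RuntimeError(
            f"{path} is a Git LFS pointer, not the model weights; "
            "run `git lfs pull` inside the cloned repository."
        )
    return path
```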

In [None]:
#@title Imports

import random
import uuid
import audioldm
import soundfile as sf
from IPython.display import Audio, display

model = audioldm.build_model(ckpt_path=ckpt_path)

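def save_waveform(waveform, file_prefix):
    # With batchsize=1, drop the leading batch axis so a single waveform remains.
    if len(waveform) == 1:
        waveform = waveform[0]
    # A random UUID keeps repeated runs from overwriting earlier outputs.
    file_name = f'{file_prefix}_{uuid.uuid4()}.wav'
    # AudioLDM produces 16 kHz audio; soundfile expects (frames, channels), hence the transpose.
    sf.write(file_name, waveform.T, 16000)
    display(Audio(filename=file_name, autoplay=False))
    print(f'Created {file_name} file')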
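`save_waveform` relies on `soundfile` to write the 16 kHz output. For reference, here is a minimal stdlib-only sketch of the same step using Python's `wave` module. It assumes a mono float waveform with samples in [-1.0, 1.0] and converts them to 16-bit PCM; the helper name `write_wav_16k` is hypothetical.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # AudioLDM generates 16 kHz audio


def write_wav_16k(file_name, samples):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(file_name, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Example: a 0.5-second 440 Hz tone.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE // 2)]
write_wav_16k("tone_440hz.wav", tone)
```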

## Text2Audio Generation

In [None]:
#@title Generate audio guided by a text prompt

text = '' #@param {type:"string"}

waveform = audioldm.text_to_audio(
    model,
    text,
    None,
    random.randint(0, 10_000_000),
    duration=10.0,
    guidance_scale=2.5,
    ddim_steps=200,
    n_candidate_gen_per_text=3,
    batchsize=1
)
save_waveform(waveform, 'text2audio')
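`n_candidate_gen_per_text=3` asks AudioLDM for several candidates per requested output. As far as we understand the `audioldm` code, it then keeps the candidate that its CLAP model scores as most similar to the text (check the package source to confirm). The sketch below only illustrates that best-of-N reranking idea with toy vectors; `pick_best_candidate` is a hypothetical helper and the embeddings are made up, not real CLAP outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def pick_best_candidate(text_embedding, candidate_embeddings):
    """Return the index of the candidate whose embedding is closest to the text embedding."""
    scores = [cosine_similarity(text_embedding, c) for c in candidate_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)


# Three toy candidates; the second points in nearly the same direction as the text.
text_vec = [1.0, 0.0, 1.0]
candidates = [[0.0, 1.0, 0.0], [0.9, 0.1, 1.1], [-1.0, 0.0, -1.0]]
print(pick_best_candidate(text_vec, candidates))  # → 1
```

Raising the candidate count trades generation time for a better chance that one candidate matches the prompt well.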

## Audio2Audio Generation

In [None]:
#@title Generate audio guided by an audio file (the output will contain audio events similar to those in the input file).

audio_file_path = '' #@param {type:"string"}

assert audio_file_path, 'Please set audio_file_path to an input audio file.'
assert os.path.exists(audio_file_path), 'The input audio file \'%s\' does not exist.' % audio_file_path

waveform = audioldm.text_to_audio(
    model,
    '',
    audio_file_path,
    random.randint(0, 10_000_000),
    duration=10.0,
    guidance_scale=2.5,
    ddim_steps=200,
    n_candidate_gen_per_text=3,
    batchsize=1
)
save_waveform(waveform, 'audio2audio')
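Each generation cell draws a fresh seed with `random.randint(0, 10_000_000)` and never reports it, so a result you like cannot easily be regenerated. A minimal sketch of a hypothetical `choose_seed` helper (not part of `audioldm`): it reuses a seed you pass in, or draws a new one and prints it so you can reuse it later. Its return value would go where the cells currently call `random.randint(...)`; identical seeds should reproduce the same output, up to any GPU nondeterminism.

```python
import random


def choose_seed(seed=None, upper=10_000_000):
    """Return `seed` if given; otherwise draw one in [0, upper] and print it for reuse."""
    if seed is None:
        seed = random.randint(0, upper)
        print(f"Using random seed {seed}; pass seed={seed} to reproduce this output.")
    return seed


# Usage inside a generation cell (sketch):
# waveform = audioldm.text_to_audio(model, text, None, choose_seed(), duration=10.0, ...)
```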

## Audio2Audio Style Transfer

In [None]:
#@title Text-guided Audio-to-Audio Style Transfer

audio_file_path = '' #@param {type:"string"}
text = '' #@param {type:"string"}
transfer_strength = 0.5 #@param {type:"slider", min:0.0, max:1.0, step:0.1}

assert audio_file_path, 'Please set audio_file_path to an input audio file.'
assert os.path.exists(audio_file_path), 'The original audio file \'%s\' for style transfer does not exist.' % audio_file_path

waveform = audioldm.style_transfer(
    model,
    text,
    audio_file_path,
    transfer_strength,
    random.randint(0, 10_000_000),
    duration=10.0,
    guidance_scale=2.5,
    ddim_steps=200,
    batchsize=1
)
save_waveform(waveform, 'transfer_audio2audio')
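`transfer_strength` controls how much of the input survives. Assuming AudioLDM's `style_transfer` follows SDEdit-style editing, the input's latent is noised for a fraction of the DDIM schedule and then denoised under the text condition. With `ddim_steps=200`, a strength of 0.5 would then re-run about 100 denoising steps from a half-noised input. The sketch below only reproduces that arithmetic; `denoising_steps` is a hypothetical helper, and the exact rounding inside `audioldm` is an assumption.

```python
def denoising_steps(transfer_strength, ddim_steps=200):
    """Number of DDIM steps re-run on the noised input (SDEdit-style assumption).

    0.0 leaves the input unchanged; 1.0 noises it fully, so the output
    largely ignores the input and follows the text prompt.
    """
    if not 0.0 <= transfer_strength <= 1.0:
        raise ValueError("transfer_strength must be in [0.0, 1.0]")
    return int(transfer_strength * ddim_steps)


for strength in (0.0, 0.5, 1.0):
    print(strength, denoising_steps(strength))
```

In practice, lower strengths keep more of the original recording's timing and texture, while higher strengths let the text prompt dominate.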