In [None]:
#@markdown <h3> 🔧 Prepare and Setup Environment</h3>
!git clone https://github.com/glory20h/VoiceLDM.git
%cd VoiceLDM
!pip install -r requirements.txt

import torch
import torchaudio
from IPython.display import Audio
from voiceldm import VoiceLDMPipeline

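# Use the first GPU when one is available; otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Build the VoiceLDM pipeline on the selected device.
pipe = VoiceLDMPipeline(device=device)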

## 💡 Tips for Better Audio Generation

### Dual Classifier-Free Guidance Matters!

The weights for dual classifier-free guidance strongly affect how often you get satisfactory results, so it is worth tuning them. Here are some key tips:

1. Different prompts respond best to different weight settings. Experiment with the weights to find the combination that suits your use case.

2. Setting both `desc_guidance_scale` and `cont_guidance_scale` to 7 is a good starting point.

3. If the generated audio doesn't align well with the content prompt, try decreasing `desc_guidance_scale` and increasing `cont_guidance_scale`.

4. If the generated audio doesn't align well with the description prompt, try decreasing `cont_guidance_scale` and increasing `desc_guidance_scale`.
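
One way to experiment systematically is to enumerate a small grid of weight pairs around the suggested default and listen to each result. The sketch below only builds the grid; `guidance_grid` is a hypothetical helper, and the values 5 and 9 are illustrative assumptions, not recommendations from the authors.

```python
from itertools import product


def guidance_grid(desc_scales, cont_scales):
    """Return one keyword-argument dict per (desc, cont) guidance-scale pair."""
    return [
        {"desc_guidance_scale": d, "cont_guidance_scale": c}
        for d, c in product(desc_scales, cont_scales)
    ]


# Start from the suggested default of 7 and probe one step on either side.
# The neighbouring values 5 and 9 are illustrative assumptions.
grid = guidance_grid(desc_scales=[5, 7, 9], cont_scales=[5, 7, 9])

for settings in grid:
    print(settings)
```

Each dict can be unpacked into the `pipe(...)` call in the generation cell (for example `pipe(desc_prompt=desc_prompt, cont_prompt=cont_prompt, **settings, ...)`), so that every combination is rendered with the same prompts and compared by ear.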

In [None]:
desc_prompt = "She is talking in a park."
cont_prompt = "Good morning! How are you feeling today?"
audio_prompt = None
num_inference_steps = 50
desc_guidance_scale = 7
cont_guidance_scale = 7

audio = pipe(
    desc_prompt=desc_prompt,
    cont_prompt=cont_prompt,
    audio_prompt=audio_prompt,
    num_inference_steps=num_inference_steps,
    desc_guidance_scale=desc_guidance_scale,
    cont_guidance_scale=cont_guidance_scale,
    device=device,
)

Audio(data=audio, rate=16000)
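
To keep a generated clip after the session ends, the samples can be written to a WAV file. The following is a minimal sketch using only the Python standard library. It assumes the pipeline output is a one-dimensional sequence of float samples in the range [-1, 1] at the 16 kHz rate passed to `Audio` above, which this notebook does not state explicitly. `save_wav` and the file name are hypothetical, and a synthetic tone stands in for real output.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # same rate that the notebook passes to Audio(...)


def save_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    # Clip each sample to [-1, 1], then scale to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))


# Stand-in for pipeline output (assumption): half a second of a 440 Hz tone.
demo = [
    0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE // 2)
]
save_wav("demo.wav", demo)
```

If the assumption about the output shape holds, the equivalent call for the notebook's result would be `save_wav("output.wav", audio)`.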