# AudioGen
Welcome to AudioGen's demo Jupyter notebook. Here you will find a series of self-contained examples showing how to use AudioGen in different settings.

First, we initialize AudioGen. For now, only a medium-sized model is provided: `facebook/audiogen-medium`, a 1.5B-parameter transformer decoder.

**Important note:** This variant differs from the original AudioGen model presented in ["AudioGen: Textually-guided audio generation"](https://arxiv.org/abs/2209.15352): its architecture is similar to MusicGen's, with a smaller frame rate and multiple streams of tokens, which reduces generation time.

## Installation of the required packages and GitHub repositories

In [None]:
# Install the latest stable release of audiocraft
!python -m pip install -U audiocraft
# Optional: install the development version from GitHub instead (bleeding edge)
!python -m pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft

In [None]:
from audiocraft.models import AudioGen

model = AudioGen.get_pretrained('facebook/audiogen-medium')

Next, let us configure the generation parameters. Specifically, you can control the following:
* `use_sampling` (bool, optional): use sampling if True, otherwise use argmax decoding. Defaults to True.
* `top_k` (int, optional): top_k used for sampling. Defaults to 250.
* `top_p` (float, optional): top_p used for sampling; when set to 0, top_k is used instead. Defaults to 0.0.
* `temperature` (float, optional): softmax temperature parameter. Defaults to 1.0.
* `duration` (float, optional): duration of the generated waveform, in seconds. Defaults to 10.0.
* `cfg_coef` (float, optional): coefficient used for classifier-free guidance. Defaults to 3.0.

Any parameter you do not set keeps its default value.
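To make the interaction between `use_sampling`, `temperature`, `top_k` and `top_p` more concrete, here is a minimal pure-Python sketch of how such parameters are typically applied when picking the next token. It is an illustration of the general technique, not AudioGen's actual implementation; the function name `sample_next_token` and its `rng` argument are hypothetical.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=250, top_p=0.0,
                      use_sampling=True, rng=None):
    """Pick a token index from raw logits (illustrative sketch only)."""
    if not use_sampling:
        # Argmax decoding: always return the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])

    rng = rng or random.Random()

    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    if top_p > 0.0:
        # Nucleus sampling: keep the smallest set of top tokens whose
        # cumulative probability reaches top_p.
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
    elif top_k > 0:
        # Top-k sampling: keep only the k most likely tokens.
        kept = order[:top_k]
    else:
        kept = order

    # random.choices renormalizes the weights of the kept tokens.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

Lower temperatures and smaller `top_k` or `top_p` values concentrate the choice on the most likely tokens, while `use_sampling=False` makes the choice deterministic.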

In [None]:
model.set_generation_params(
    use_sampling=True,
    top_k=250,
    duration=10
)
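For intuition about `cfg_coef`: classifier-free guidance is commonly formulated as an extrapolation from the unconditional prediction towards the text-conditioned one. The sketch below shows that common formulation on plain Python lists; it is an assumption-laden illustration rather than code taken from AudioGen, and `apply_cfg` is a hypothetical helper.

```python
def apply_cfg(cond_logits, uncond_logits, cfg_coef=3.0):
    """Combine conditional and unconditional logits (illustrative sketch).

    cfg_coef = 0 ignores the text prompt, cfg_coef = 1 returns the
    conditional logits unchanged, and larger values push the result
    further in the direction indicated by the prompt.
    """
    return [u + (c - u) * cfg_coef for c, u in zip(cond_logits, uncond_logits)]
```

Under this formulation, a larger coefficient makes the output follow the text description more closely, usually at the cost of some diversity.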

Next, we can start generating audio using one of the following modes:
* Audio continuation using `model.generate_continuation`
* Text-conditional samples using `model.generate`

### Text-conditional Generation

In [None]:
from audiocraft.utils.notebook import display_audio
import os
from audiocraft.data.audio import audio_write
import time
import pandas as pd
import math

# Spreadsheet with one row per clip: a caption and the output file name (placeholder path)
dataset = pd.read_excel('path_excel_file_with_all_the_captions_associated_to_each_file_name_respectively')
descriptions = list(dataset['Selected Caption'])
names = list(dataset['file_name'])
print(descriptions)
print(names)
start_time = time.time()

# Number of captions passed to model.generate in a single call
batch_size = 4

num_batches = math.ceil(len(descriptions) / batch_size)

for batch_idx in range(num_batches):
    start_idx = batch_idx * batch_size
    end_idx = (batch_idx + 1) * batch_size  # slicing clips this at the end of the list
    batch_descriptions = descriptions[start_idx:end_idx]
    batch_names = names[start_idx:end_idx]

    # One generated waveform per description in the batch
    output = model.generate(batch_descriptions, progress=True)

    for idx, name in enumerate(batch_names):
        # Move the waveform to the CPU before writing it to disk
        audio_write(f"{name}", output[idx].cpu(), model.sample_rate)
        # Report the position in the full list, not just within the batch
        print("saving audio number: ", start_idx + idx)
        print("saving audio", name)

end_time = time.time()
execution_time = end_time - start_time
print("Execution Time:", execution_time, "seconds")
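The batching arithmetic in the loop above can be checked in isolation, without loading the model. The following sketch reproduces the same slicing logic as a small generator; `iter_batches` and the example captions and file stems are hypothetical and only serve to show that the last batch may be shorter than `batch_size`.

```python
import math


def iter_batches(descriptions, names, batch_size):
    """Yield (start_idx, batch_descriptions, batch_names) tuples."""
    if len(descriptions) != len(names):
        raise ValueError("descriptions and names must have the same length")
    num_batches = math.ceil(len(descriptions) / batch_size)
    for batch_idx in range(num_batches):
        start_idx = batch_idx * batch_size
        end_idx = start_idx + batch_size
        # Slicing past the end of a list is safe: the last batch is just shorter.
        yield start_idx, descriptions[start_idx:end_idx], names[start_idx:end_idx]


# Hypothetical data: 10 captions split into batches of 4.
captions = [f"caption {i}" for i in range(10)]
stems = [f"clip_{i}" for i in range(10)]
for start, batch_caps, batch_stems in iter_batches(captions, stems, batch_size=4):
    print(start, len(batch_caps), batch_stems[0])
# prints:
# 0 4 clip_0
# 4 4 clip_4
# 8 2 clip_8
```

Checking the lengths up front guards against a spreadsheet in which some rows are missing a caption or a file name, which would otherwise silently misalign the two lists.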