<a href="https://colab.research.google.com/github/naashonomics/GenAI/blob/main/Build_meta_audiocraft_app.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**AudioCraft**
AudioCraft is a PyTorch library for deep learning research on audio generation. AudioCraft contains inference and training code for two state-of-the-art AI generative models producing high-quality audio: AudioGen and MusicGen.

**Installation**
AudioCraft requires Python 3.9 and PyTorch 2.1.0. To install AudioCraft, you can run the following:

Source: https://github.com/facebookresearch/audiocraft

In [1]:
# Skip this line if you already have PyTorch installed.
!python -m pip install 'torch==2.1.0'
# You might need the following before trying to install the packages.
!python -m pip install setuptools wheel
# Then run exactly one of the following; the alternatives are commented out.
!python -m pip install -U audiocraft  # stable release
# !python -m pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft  # bleeding edge
# !python -m pip install -e .  # if you cloned the repo locally (mandatory if you want to train)
# !python -m pip install -e '.[wm]'  # if you want to train a watermarking model

Collecting torch==2.1.0
  Downloading torch-2.1.0-cp311-cp311-manylinux1_x86_64.whl.metadata (25 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.1.0)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.1.0)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.1.0)
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch==2.1.0)
  Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch==2.1.0)
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch==2.1.0)
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylin

**Packages we will use**


*   **torchaudio**: This library will be used for audio processing.
*   **audiocraft**: This library will provide music generation models.
*   **MusicGen from audiocraft.models**: This model will generate music as per the given prompt.
*   **Audio from IPython.display**: This class will display the audio output as a playable widget.
*   **Textarea from ipywidgets**: This widget will be used to display a multiline text area, similar to an HTML textarea.
*   **Button from ipywidgets**: This widget will be used to display a button, similar to an HTML button.

In [2]:
import torchaudio
import audiocraft
from audiocraft.models import MusicGen
from IPython.display import Audio
from ipywidgets import Textarea
from ipywidgets import Button

**Loading the Pretrained Model**
The MusicGen model has already learned the relationships between text descriptions and music audio from many examples.

Next, we do the following:

*   Load the MusicGen model with its get_pretrained() method and store the model in a variable for later use.
*   Use the facebook/musicgen-small checkpoint because it is lightweight and faster than the larger variants.




In [3]:
model = MusicGen.get_pretrained('facebook/musicgen-small')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


state_dict.bin:   0%|          | 0.00/841M [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.39M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/892M [00:00<?, ?B/s]

compression_state_dict.bin:   0%|          | 0.00/236M [00:00<?, ?B/s]



**Configure Model Parameters**

The next step is to configure the parameters that control how music generation works. The MusicGen model has several options you can set before generating music samples. Some of these options are as follows:

* Model Size: The model comes in several variants (small, medium, large, and melody), and the choice affects output quality and computational requirements.

* Duration: The length of the generated music (e.g., 10 seconds, 30 seconds, etc.).

* Top-k Sampling: Controls how many of the highest probability tokens are considered during sampling, influencing diversity.

* Top-p Sampling (Nucleus Sampling): Filters tokens based on cumulative probability, keeping sampling more controlled.
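
To make the two sampling options above more concrete, here is a minimal pure-Python sketch of top-k and top-p filtering on a toy probability distribution. It is an illustration of the general technique, not AudioCraft's implementation; the function names and the toy probabilities are hypothetical.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = order[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


# Toy distribution over five hypothetical token ids 0..4.
toy_probs = [0.5, 0.2, 0.15, 0.1, 0.05]
top_k_result = top_k_filter(toy_probs, k=2)    # keeps tokens 0 and 1
top_p_result = top_p_filter(toy_probs, p=0.8)  # keeps tokens 0, 1 and 2
print(sorted(top_k_result), sorted(top_p_result))
```

A smaller k or p restricts sampling to fewer candidate tokens, which tends to make the output less varied. In AudioCraft, these options are, to my knowledge, exposed as the `top_k` and `top_p` arguments of set_generation_params(); check the documentation of your installed version for the exact defaults.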

In this task, you’ll set the duration of the generated music, in seconds, using the model's set_generation_params() method.

In [9]:
model.set_generation_params(duration=8)

The key steps are to pass text prompts that describe the music, generate audio with the model, and then play the audio.

1. Give the model text prompts describing the music you want it to generate. For classic rock, prompt it with “classic rock song”; for a catchy pop song, prompt it with “catchy pop song”.

2. Call the generate() method on your model and pass a list of text prompts in square brackets.

3. Read the sampling rate of the generated audio from the model's sample_rate attribute.

4. Take the first result from the generated batch and pass it, together with the sampling rate, to the Audio() function to play it.



> Note: Music generation might take a couple of minutes.



In [10]:
results = model.generate(['classic guitar beats'])
sampling_rate = model.sample_rate
# Move the tensor to the CPU before converting it to NumPy (needed on a GPU runtime).
Audio(results[0].cpu().numpy(), rate=sampling_rate)
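
As a rough sanity check on the cell above, the length of the returned audio can be estimated from the requested duration and the sampling rate. The sketch below assumes a sampling rate of 32,000 Hz and 32-bit float samples; both are assumptions, so read the real values from model.sample_rate and the tensor's dtype. The helper names are hypothetical.

```python
def expected_num_samples(duration_s, sample_rate_hz):
    """Approximate number of samples per channel for a clip of this length."""
    return round(duration_s * sample_rate_hz)


def clip_size_bytes(num_samples, channels=1, bytes_per_sample=4):
    """Raw in-memory size of the clip (4 bytes per sample assumes float32)."""
    return num_samples * channels * bytes_per_sample


ASSUMED_SAMPLE_RATE_HZ = 32_000  # assumption; use model.sample_rate in the notebook
num_samples = expected_num_samples(duration_s=8, sample_rate_hz=ASSUMED_SAMPLE_RATE_HZ)
size_bytes = clip_size_bytes(num_samples)
print(num_samples, size_bytes)  # 256000 1024000
```

The last dimension of results[0] should be close to this estimate for an 8-second clip; a large mismatch usually means the duration or the sampling rate is not what you expected.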

**Create an Input Text Box**

1. For improved interactivity between the user and the code, we use widgets. Create an empty text box inside the Jupyter Notebook that allows multiline input using the Textarea() function. This lets the user enter a text description to generate music.

2. Display the text box.



>Note: This text box doesn’t possess any functionality yet, as we haven’t linked it to the model.



In [11]:
description = Textarea(rows=4)
description

Textarea(value='', rows=4)

In [5]:
generate_button = Button(description="Generate Tune")
generate_button

Button(description='Generate Tune', style=ButtonStyle())

In [None]:
# Create a text area and a button
description = Textarea(value='', placeholder='Give a music prompt', disabled=False, rows=4)
generate_button = Button(description="Generate Tune")


# A function to generate music as prompted
def generate_tune(event):
    results = model.generate([description.value])
    sampling_rate = model.sample_rate
    # Move the tensor to the CPU before converting it to NumPy (needed on a GPU runtime).
    display(Audio(results[0].cpu().numpy(), rate=sampling_rate))

# Create a click event on the button
generate_button.on_click(generate_tune)


# Display the UI items
display(description)
display(generate_button)
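
The linking step in the cell above follows a simple callback pattern: on_click() registers a function, and the widget later calls that function with the button as its argument. The sketch below imitates this with a hypothetical `FakeButton` class so it runs without ipywidgets or the model; the dictionary and list are stand-ins for the text area and for model.generate(). It also adds a guard that skips blank prompts, which is a suggestion and not part of the original handler.

```python
class FakeButton:
    """Hypothetical stand-in for ipywidgets.Button, for illustration only."""

    def __init__(self, description):
        self.description = description
        self._callbacks = []

    def on_click(self, callback):
        # Register a function to call when the button is clicked.
        self._callbacks.append(callback)

    def click(self):
        # Simulate a click: each callback receives the button itself.
        for callback in self._callbacks:
            callback(self)


prompt_box = {"value": "  classic rock song  "}  # stands in for the Textarea
requested_prompts = []                            # stands in for model.generate


def generate_tune(button):
    prompt = prompt_box["value"].strip()
    if not prompt:
        return  # nothing to generate for a blank text box
    requested_prompts.append(prompt)


button = FakeButton(description="Generate Tune")
button.on_click(generate_tune)
button.click()               # records the trimmed prompt
prompt_box["value"] = "   "
button.click()               # ignored: the prompt is blank
print(requested_prompts)
```

In the notebook itself, the same guard could wrap the call to model.generate() so that clicking the button with an empty text area does not trigger a generation run.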

In [None]:
results = model.generate(['rock song'])
sampling_rate = model.sample_rate
# Move the tensor to the CPU before converting it to NumPy (needed on a GPU runtime).
Audio(results[0].cpu().numpy(), rate=sampling_rate)

In [None]:
results = model.generate(['upbeat rock song with guitar solo'])
sampling_rate = model.sample_rate
# Move the tensor to the CPU before converting it to NumPy (needed on a GPU runtime).
Audio(results[0].cpu().numpy(), rate=sampling_rate)