# Audio data augmentation tutorial

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sensein/senselab/blob/main/tutorials/audio/audio_data_augmentation.ipynb)


In this tutorial, we will explore how to augment audio data using both the `audiomentations` and `torch_audiomentations` libraries in combination with the `senselab` package. Data augmentation creates variations of existing audio data and can be used, for example, to improve the robustness of machine learning models by simulating different real-world conditions.

First, we should install senselab if it has not already been installed.

In [None]:
%pip install 'senselab[audio]'

Next, we import the modules required for augmentation, plotting, and audio processing.

In [1]:
# import all necessary modules
from audiomentations import Compose as AudiomentationsCompose
from audiomentations import Gain as AudiomentationsGain
from torch_audiomentations import Compose as TorchAudiomentationsCompose
from torch_audiomentations import Gain as TorchAudiomentationsGain

from senselab.audio.data_structures import Audio
from senselab.audio.tasks.data_augmentation import augment_audios
from senselab.audio.tasks.plotting import play_audio, plot_waveform

Now, we define the augmentations that we will apply. We will create one augmentation pipeline using the `audiomentations` library and another using the `torch_audiomentations` library.

In this example, we will apply a simple `Gain` augmentation, which, with the positive range used here, increases the volume of the audio.
- `min_gain_in_db` and `max_gain_in_db` specify the range of gain (in decibels) to apply to the audio.
- `p=1.0` ensures that the transformation is always applied.
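
To make the parameters above concrete, here is a minimal pure-Python sketch of what a gain transform does conceptually: it draws a gain uniformly from the configured range, converts it from decibels to a linear amplitude factor with `10 ** (dB / 20)`, and applies it with probability `p`. The names `db_to_amplitude_factor` and `sketch_gain` are hypothetical helpers written for this illustration; they are not part of `audiomentations`, `torch_audiomentations`, or `senselab`, and the libraries' actual implementations may differ.

```python
import math
import random


def db_to_amplitude_factor(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


def sketch_gain(samples, min_gain_db, max_gain_db, p, rng):
    """Hypothetical stand-in for a Gain transform (not the library code)."""
    # With probability 1 - p the transform is skipped and the input is returned.
    if rng.random() >= p:
        return list(samples)
    # Draw a gain uniformly from the configured range, in decibels.
    gain_db = rng.uniform(min_gain_db, max_gain_db)
    factor = db_to_amplitude_factor(gain_db)
    return [s * factor for s in samples]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
quiet = [0.0, 0.01, -0.02, 0.05]
louder = sketch_gain(quiet, 14.99, 15.0, p=1.0, rng=rng)

# A 15 dB gain multiplies the amplitude by about 5.62.
print(round(db_to_amplitude_factor(15.0), 2))
```

Because the range used in this tutorial is very narrow (14.99 to 15 dB), every augmented copy is amplified by almost exactly the same factor, which makes the effect easy to hear and to see in the waveform plot.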

In [None]:
# Define augmentation
augment = AudiomentationsCompose([
    AudiomentationsGain(min_gain_in_db=14.99, max_gain_in_db=15, p=1.0)
    ])

# Define torch-based augmentation
torch_augment = TorchAudiomentationsCompose([
    TorchAudiomentationsGain(min_gain_in_db=14.99, max_gain_in_db=15, p=1.0)
    ])

Next, we load an audio file and perform basic analysis by playing the audio and visualizing its waveform.

In [None]:
# Download a sample audio file and load it
!mkdir -p tutorial_audio_files
!wget -O tutorial_audio_files/audio_48khz_mono_16bits.wav https://github.com/sensein/senselab/raw/main/src/tests/data_for_testing/audio_48khz_mono_16bits.wav

audio = Audio(filepath="tutorial_audio_files/audio_48khz_mono_16bits.wav")

# Play the audio
play_audio(audio)
# Plot the waveform
plot_waveform(audio)

We will now apply the `audiomentations` augmentation pipeline to the audio and visualize the changes.

In [None]:
# Apply the augmentation pipeline to a batch of three copies of the audio using the wrapper
augmented_audios = augment_audios([audio, audio, audio], augment)
# Play the augmented audio
play_audio(augmented_audios[0])
# Plot the waveform of the augmented audio
plot_waveform(augmented_audios[0])
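
Besides listening and plotting, the effect of a gain augmentation can be checked numerically by comparing peak levels before and after. The sketch below does this on a small list of floats standing in for the audio samples; `peak_dbfs` and `measured_gain_db` are hypothetical helpers written for this illustration, and how you extract raw samples from a senselab `Audio` object (for example, through a waveform attribute) is an assumption you should check against the senselab documentation.

```python
import math


def peak_dbfs(samples):
    """Peak level in dB relative to full scale (an amplitude of 1.0)."""
    peak = max(abs(s) for s in samples)
    return -math.inf if peak == 0 else 20 * math.log10(peak)


def measured_gain_db(original, augmented):
    """Difference in peak level between two signals, in decibels."""
    return peak_dbfs(augmented) - peak_dbfs(original)


# Toy signal standing in for real audio samples in the range [-1.0, 1.0].
original = [0.0, 0.1, -0.2, 0.15]
factor = 10 ** (15 / 20)  # linear amplitude factor for a 15 dB gain
augmented = [s * factor for s in original]

print(round(measured_gain_db(original, augmented), 2))  # 15.0
print(peak_dbfs(augmented) > 0)  # True: the peak now exceeds full scale
```

A peak above 0 dBFS means some samples fall outside `[-1.0, 1.0]`, so a large gain such as 15 dB may cause clipping if the result is later written to a fixed-point format. For louder recordings, a smaller gain range is likely a safer choice.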

Similarly, let's apply the augmentation using the `torch_audiomentations` library.

In [None]:
# Apply the augmentations using the wrapper with torch
torch_augmented_audios = augment_audios([audio], torch_augment)
# Play the audio augmented with torch
play_audio(torch_augmented_audios[0])
# Plot the waveform of the audio augmented with torch
plot_waveform(torch_augmented_audios[0])