
# Audio Data Augmentation

**Author**: [Moto Hira](mailto:moto@meta.com)

``torchaudio`` provides a variety of ways to augment audio data.

In this tutorial, we look at how to apply effects, filters,
RIR (room impulse response), and codecs.

At the end, we synthesize noisy speech over phone from clean speech.


In [None]:
import os
import random

import torch
import torchaudio
import torchaudio.functional as F

print(torch.__version__)
print(torchaudio.__version__)

## Preparation

First, we import the modules and load the audio clips we use in this tutorial:
one randomly chosen track from each genre of a local copy of the GTZAN dataset.




In [None]:
from IPython.display import Audio, display
import matplotlib.pyplot as plt

In [None]:
print(os.getcwd())
print(os.listdir("../../../datasets/GTZAN/gtzan_genre/genres/blues"))

In [None]:
root = '../../../datasets/GTZAN/gtzan_genre/genres/'
genres = ["blues", "classical", "country", "disco", "hiphop", "jazz", "metal", "pop", "reggae", "rock"]

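# Pick one random track per genre and keep [name, waveform, sample_rate].
test_list = []
for genre in genres:
    genre_dir = os.path.join(root, genre)
    song = random.choice(os.listdir(genre_dir))
    audio, sr = torchaudio.load(os.path.join(genre_dir, song))
    test_list.append(['test_audio_' + genre, audio, sr])

# Print a short summary instead of dumping the full tensors.
for name, audio, sr in test_list:
    print(name, tuple(audio.shape), sr)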

## Applying effects and filtering

``torchaudio.sox_effects`` lets you apply filters similar to those
available in ``sox`` directly to Tensor objects and file-object audio sources.

There are two functions for this:

-  ``torchaudio.sox_effects.apply_effects_tensor`` for applying effects
   to Tensor.
-  ``torchaudio.sox_effects.apply_effects_file`` for applying effects to
   other audio sources.

Both functions accept effect definitions in the form
``List[List[str]]``.
This is mostly consistent with how the ``sox`` command works, but one caveat is
that ``sox`` adds some effects automatically, whereas ``torchaudio``’s
implementation does not.
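
As a minimal sketch of that format, the hypothetical helper `make_effects` below (it is not part of ``torchaudio``) turns effect names and their arguments into the ``List[List[str]]`` structure. The point it illustrates is that every argument, including numeric ones, has to be passed as a string.

```python
from typing import List, Sequence, Union

Arg = Union[str, int, float]


def make_effects(*effects: Sequence[Arg]) -> List[List[str]]:
    """Convert (name, arg, ...) tuples into the List[List[str]] form.

    Hypothetical helper for illustration; it only stringifies the parts
    and does not check that the effect names exist in sox.
    """
    return [[str(part) for part in effect] for effect in effects]


# Roughly the effect chain of `sox in.wav out.wav lowpass -1 300 rate 8000`.
effects = make_effects(("lowpass", -1, 300), ("rate", 8000))
print(effects)
```

The resulting list can be passed as the ``effects`` argument of either function listed above.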

For the list of available effects, please refer to [the sox
documentation](http://sox.sourceforge.net/sox.html).

**Tip** If you need to load and resample your audio data on the fly,
then you can use ``torchaudio.sox_effects.apply_effects_file``
with effect ``"rate"``.
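
One way this tip could look in practice is sketched below. The helper names `resample_effects` and `load_resampled` are hypothetical, and the sketch assumes a ``torchaudio`` build in which the sox effects are available.

```python
from typing import List


def resample_effects(target_sample_rate: int) -> List[List[str]]:
    """Build the effect chain that resamples audio to `target_sample_rate`."""
    return [["rate", str(target_sample_rate)]]


def load_resampled(path: str, target_sample_rate: int):
    """Load `path` and resample it while decoding (hypothetical helper)."""
    # Imported here so the helper above can be used without torchaudio installed.
    import torchaudio

    # Returns a (waveform, sample_rate) tuple.
    return torchaudio.sox_effects.apply_effects_file(
        path, effects=resample_effects(target_sample_rate)
    )


print(resample_effects(16000))
```

A call such as `waveform, sample_rate = load_resampled("speech.wav", 16000)` would then return the audio at 16 kHz; the file name here is only a placeholder.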

**Note** ``torchaudio.sox_effects.apply_effects_file`` accepts a
file-like object or path-like object.
Similar to ``torchaudio.load``, when the audio format cannot be
inferred from either the file extension or header, you can provide
the ``format`` argument to specify the format of the audio source.

**Note** This process is not differentiable.




In [None]:
torchaudio.sox_effects.effect_names()

Each effect is described in the sox manual: https://sox.sourceforge.net/sox.html#EFFECTS

In [None]:
effects_to_keep = ['allpass',
 'band',
 'bandpass',
 'bandreject',
 'bass',
 'bend',
 'chorus',
 'compand',
 'contrast',
 'delay',
 'dither',
 'divide',
 'earwax',
 'echo',
 'echos',
 'equalizer',
 'flanger',
 'highpass',
 'hilbert',
 'loudness',
 'lowpass',
 'mcompand',
 'norm',
 'overdrive',
 'phaser',
 'pitch',
 'reverb',
 'speed',
 'stretch',
 'tempo',
 'treble',
 'tremolo']
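
Since the set of effects can vary between builds, it may be worth checking a hand-picked list like the one above against what is actually available. The helper `split_by_availability` below is a hypothetical sketch, shown with stand-in values so that it runs on its own.

```python
from typing import Iterable, List, Tuple


def split_by_availability(
    wanted: Iterable[str], available: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (supported, missing) effect names, preserving the order of `wanted`."""
    wanted = list(wanted)
    available_set = set(available)
    supported = [name for name in wanted if name in available_set]
    missing = [name for name in wanted if name not in available_set]
    return supported, missing


# Stand-in values; in the notebook, pass `effects_to_keep` and
# `torchaudio.sox_effects.effect_names()` instead.
supported, missing = split_by_availability(
    ["lowpass", "reverb", "made_up_effect"],
    ["lowpass", "highpass", "reverb", "speed"],
)
print(supported, missing)
```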

In [None]:
for audio in test_list:
    print(audio[0])
    # Inside a loop, the Audio object must be passed to display() to be rendered.
    display(Audio(audio[1], rate=audio[2]))

In [None]:
def plot_waveform(waveform, sample_rate, title="Waveform", xlim=None):
    waveform = waveform.numpy()

    num_channels, num_frames = waveform.shape
    time_axis = torch.arange(0, num_frames) / sample_rate

    figure, axes = plt.subplots(num_channels, 1)
    if num_channels == 1:
        axes = [axes]
    for c in range(num_channels):
        axes[c].plot(time_axis, waveform[c], linewidth=1)
        axes[c].grid(True)
        if num_channels > 1:
            axes[c].set_ylabel(f"Channel {c+1}")
        if xlim:
            axes[c].set_xlim(xlim)
    figure.suptitle(title)
    plt.show(block=False)

In [None]:
def play_audio(waveform, sample_rate):
    waveform = waveform.numpy()

    num_channels, num_frames = waveform.shape
    if num_channels == 1:
        display(Audio(waveform[0], rate=sample_rate))
    elif num_channels == 2:
        display(Audio((waveform[0], waveform[1]), rate=sample_rate))
    else:
        raise ValueError("Waveforms with more than 2 channels are not supported.")

In [None]:
for name, waveform, sample_rate in test_list:
    plot_waveform(waveform, sample_rate, title=name)
    play_audio(waveform, sample_rate)

In [None]:
trans_list = []
for audio in test_list:
    # Define effects
    effects = [
        ["lowpass", "-1", "1000"],  # apply single-pole lowpass filter
        ["speed", "1.5"],  # increase the speed by a factor of 1.5
        # `speed` only changes the sample rate, so it is necessary to
        # add the `rate` effect with the original sample rate after it.
        ["rate", f"{audio[2]}"],
        ["reverb", "-w"],  # reverberation (wet signal only)
    ]

    # Apply effects
    y_trans, sr_trans = torchaudio.sox_effects.apply_effects_tensor(audio[1], audio[2], effects)
    trans_list.append([audio[0] + '_trans', y_trans, sr_trans])
    print(y_trans.shape)

### Effects applied:
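
A rough way to sanity-check the printed shapes: ``speed`` with a factor of 1.5 followed by ``rate`` at the original sample rate should leave about 1/1.5 of the original frames. The helper name and the clip length below are illustrative assumptions (a 30-second clip at 22,050 Hz), and the other effects in the chain may shift the real count slightly.

```python
def expected_num_frames(num_frames: int, speed: float) -> int:
    """Estimate the frame count after `speed` followed by `rate` back to the original rate."""
    return round(num_frames / speed)


# Illustrative values: a 30-second clip at 22,050 Hz sped up by 1.5x.
original_frames = 30 * 22050
estimate = expected_num_frames(original_frames, 1.5)

# Prints the original frame count, the estimate, and the estimated duration in seconds.
print(original_frames, estimate, estimate / 22050)
```

Under these assumptions the 30-second clip shrinks to roughly 20 seconds.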




In [None]:
for name, waveform, sample_rate in trans_list:
    plot_waveform(waveform, sample_rate, title=name)
    play_audio(waveform, sample_rate)