### Overview of the Code

The code below defines a Python class, `AudioAugmentor`, that performs several audio augmentations using `torch`, `librosa`, `soundfile`, `pydub`, and `numpy`. Its components and functionality are summarized here:

### Components and Functionalities

1. **Class Initialization**:
   - The `AudioAugmentor` class is initialized with an optional configuration dictionary (`config`). This configuration dictates which augmentations to apply and their parameters.

2. **Loading and Saving Audio**:
   - `load_audio`: Loads an audio file using `librosa` and converts it to a PyTorch tensor.
   - `save_audio`: Saves a PyTorch tensor as an audio file using `soundfile`.

3. **File Conversion**:
   - `mp3_to_wav`: Converts an MP3 file to WAV format using `pydub`.

4. **Audio Augmentation Methods**:
   - `add_noise`: Adds Gaussian noise to the audio.
   - `reverse_audio`: Reverses the audio.
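   - `change_speed`: Changes the playback speed with `librosa.effects.time_stretch`, which alters duration without changing pitch.
   - `slow_down_audio`: Slows the audio down using the same time-stretch call with a default rate of 0.5.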
   - `add_echo`: Adds an echo effect to the audio.
   - `add_sonic_boom_effect`: Adds a sonic boom effect, which is a combination of a delay and amplification followed by decay.
   - `pitch_shift`: Shifts the pitch of the audio.
   - `time_masking`: Masks a portion of the audio by setting it to zero.

5. **Applying Augmentations**:
   - `apply_augmentation`: Applies each augmentation enabled in the configuration dictionary to the original input independently (the effects are not chained) and returns a dictionary of augmented audio tensors keyed by augmentation name.
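
To make the `add_echo` entry in the list above concrete, here is a minimal sketch of the same delay-and-decay arithmetic on a toy signal. It uses NumPy arrays instead of PyTorch tensors purely to keep the example light (the slicing works the same way on tensors), and the 4 Hz "sample rate" is a made-up value chosen so that a 0.5 s delay is exactly two samples.

```python
import numpy as np

def add_echo(audio, sr, delay=0.5, decay=0.6):
    """Mix a delayed, attenuated copy of `audio` into itself."""
    delay_samples = int(sr * delay)
    # Buffer long enough to hold the original plus the delayed copy.
    out = np.zeros(len(audio) + delay_samples, dtype=np.float32)
    out[:len(audio)] += audio              # original (dry) signal
    out[delay_samples:] += decay * audio   # echo, shifted by delay_samples
    # Truncate to the input length, so the echo's tail is discarded.
    return out[:len(audio)]

# Toy input: a single impulse at index 0 (illustrative values only).
impulse = np.array([1.0, 0.0, 0.0, 0.0], dtype=np.float32)
echoed = add_echo(impulse, sr=4, delay=0.5, decay=0.6)
print(echoed)  # impulse stays at index 0, a 0.6-scaled copy appears at index 2
```

Because the result is cut back to the input length, any part of the echo that would fall after the end of the clip is lost. `add_sonic_boom_effect` follows the same pattern with a larger gain on the delayed copy.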

### Detailed Method Descriptions

- **Loading and Saving**:
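  - **load_audio**: Uses `librosa.load` to read the audio file and resample it to the specified sample rate (`sample_rate`, 16000 Hz by default). `librosa.load` also converts the audio to mono by default. The method returns a 1-D float PyTorch tensor and the sample rate.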
  - **save_audio**: Uses `soundfile.write` to write a PyTorch tensor as an audio file.

- **Conversion**:
  - **mp3_to_wav**: Converts an MP3 file to WAV format using `pydub.AudioSegment.from_mp3` and `export`.

- **Augmentation Methods**:
  - **add_noise**: Adds random Gaussian noise to the audio tensor. The `noise_factor` controls the intensity of the noise.
  - **reverse_audio**: Uses `torch.flip` to reverse the audio tensor.
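  - **change_speed**: Uses `librosa.effects.time_stretch` to change the speed of the audio without changing its pitch. A `speed_factor` above 1.0 shortens the audio and a value below 1.0 lengthens it; the default is 2.0.
  - **slow_down_audio**: Makes the same `time_stretch` call as `change_speed`, but with a default `rate` of 0.5, which doubles the duration.
  - **add_echo**: Adds an echo by mixing in a copy of the signal delayed by `delay` seconds and scaled by `decay`. The result is truncated to the original length.
  - **add_sonic_boom_effect**: Mixes in a copy of the signal delayed by `delay` seconds and scaled by `increase_factor`, then multiplies everything from the delay point onward (original signal included) by `decay`. The result is truncated to the original length.
  - **pitch_shift**: Shifts the pitch of the audio by `n_steps` semitones using `librosa.effects.pitch_shift`.
  - **time_masking**: Zeroes out one contiguous, randomly positioned segment of the audio tensor. `mask_time` is the fraction of the total length to mask (0.5 masks half the clip), not a duration in seconds.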

- **apply_augmentation**:
  - This method applies the selected augmentations as specified in the `config` dictionary. Each augmentation can be enabled or disabled and configured with specific parameters (e.g., noise factor, speed factor). It collects the results in a dictionary and returns it.
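
As an illustration of the configuration format described above, the following sketch shows one possible `config` dictionary. The top-level keys and parameter names match those read by `apply_augmentation`; the parameter values are arbitrary examples, and `enabled_augmentations` is a hypothetical helper (not part of the class) that mirrors the `.get(name, {}).get('enabled', False)` check used in the code.

```python
# Illustrative configuration; the values are examples, not recommendations.
config = {
    "add_noise": {"enabled": True, "noise_factor": 0.05},
    "change_speed": {"enabled": True, "speed_factor": 1.5},
    "reverse_audio": {"enabled": False},
    "add_echo": {"enabled": True, "delay": 0.25, "decay": 0.5},
    "time_masking": {"enabled": True, "mask_time": 0.1},
}

def enabled_augmentations(config):
    """Hypothetical helper: names of entries whose 'enabled' flag is true."""
    return [name for name, opts in config.items() if opts.get("enabled", False)]

print(enabled_augmentations(config))
```

Augmentations that are absent from the dictionary (here `pitch_shift`, `slow_down_audio`, and `add_sonic_boom_effect`) are treated the same as ones with `"enabled": False`, so they are skipped.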


In [None]:
import torch
import librosa
import soundfile as sf
from pydub import AudioSegment
import numpy as np


# Reference: https://pytorch.org/audio/stable/tutorials/audio_data_augmentation_tutorial.html
class AudioAugmentor:
    def __init__(self, config=None):
        self.config = config if config else {}

    def load_audio(self, file_path, sample_rate=16000):
        audio, sr = librosa.load(file_path, sr=sample_rate)
        return torch.tensor(audio).float(), sr

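    def save_audio(self, audio, file_path, sr=16000):
        # detach() and cpu() make the conversion safe for tensors that track gradients or live on a GPU
        sf.write(file_path, audio.detach().cpu().numpy(), sr)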

    def mp3_to_wav(self, mp3_path, wav_path):
        mp3_audio = AudioSegment.from_mp3(mp3_path)
        mp3_audio.export(wav_path, format="wav")
        print("MP3 converted to WAV successfully!")

    def add_noise(self, audio, noise_factor=0.3):
        noise = torch.randn(audio.size())
        augmented_audio = audio + noise_factor * noise
        return augmented_audio

    def reverse_audio(self, audio):
        return torch.flip(audio, dims=[0])

    def change_speed(self, audio, speed_factor=2.0):
        return torch.tensor(librosa.effects.time_stretch(audio.numpy(), rate=speed_factor)).float()

    def slow_down_audio(self, audio, rate=0.5):
        # Same operation as change_speed; a rate below 1.0 lengthens the audio
        return self.change_speed(audio, speed_factor=rate)

    def add_echo(self, audio, sr, delay=0.5, decay=0.6):
        delay_samples = int(sr * delay)
        echo_audio = torch.zeros(len(audio) + delay_samples)
        echo_audio[:len(audio)] += audio
        echo_audio[delay_samples:] += decay * audio
        return echo_audio[:len(audio)]

    def add_sonic_boom_effect(self, audio, sr, delay=0.05, increase_factor=10, decay=0.6):
        delay_samples = int(sr * delay)
        boom_audio = torch.zeros(len(audio) + delay_samples)
        boom_audio[:len(audio)] += audio
        boom_audio[delay_samples:] += increase_factor * audio
        boom_audio[delay_samples:] *= decay
        return boom_audio[:len(audio)]

    def pitch_shift(self, audio, sr=16000, n_steps=5):
        audio_np = audio.numpy()
        shifted_audio = librosa.effects.pitch_shift(audio_np, sr=sr, n_steps=n_steps)
        return torch.tensor(shifted_audio).float()

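    def time_masking(self, audio, mask_time=0.5):
        # mask_time is the fraction of the clip to zero out, clamped to [0, 1]
        mask_samples = int(min(max(mask_time, 0.0), 1.0) * len(audio))
        masked_audio = audio.clone()
        # randint's upper bound is exclusive; the +1 also avoids an error when the mask covers the whole clip
        start_idx = np.random.randint(0, len(audio) - mask_samples + 1)
        masked_audio[start_idx:start_idx + mask_samples] = 0.0
        return masked_audio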

    def apply_augmentation(self, audio, sr):
        augmented_audios = {}
        if self.config.get('add_noise', {}).get('enabled', False):
            noise_factor = self.config['add_noise'].get('noise_factor', 0.3)
            augmented_audios['add_noise'] = self.add_noise(audio, noise_factor)
        if self.config.get('change_speed', {}).get('enabled', False):
            speed_factor = self.config['change_speed'].get('speed_factor', 2.0)
            augmented_audios['change_speed'] = self.change_speed(audio, speed_factor)
        if self.config.get('reverse_audio', {}).get('enabled', False):
            augmented_audios['reverse_audio'] = self.reverse_audio(audio)
        if self.config.get('slow_down_audio', {}).get('enabled', False):
            rate = self.config['slow_down_audio'].get('rate', 0.5)
            augmented_audios['slow_down_audio'] = self.slow_down_audio(audio, rate)
        if self.config.get('add_echo', {}).get('enabled', False):
            delay = self.config['add_echo'].get('delay', 0.5)
            decay = self.config['add_echo'].get('decay', 0.6)
            augmented_audios['add_echo'] = self.add_echo(audio, sr, delay, decay)
        if self.config.get('pitch_shift', {}).get('enabled', False):
            n_steps = self.config['pitch_shift'].get('n_steps', 2)
            augmented_audios['pitch_shift'] = self.pitch_shift(audio, sr, n_steps)
        if self.config.get('time_masking', {}).get('enabled', False):
            mask_time = self.config['time_masking'].get('mask_time', 0.5)
            augmented_audios['time_masking'] = self.time_masking(audio, mask_time)
        if self.config.get('add_sonic_boom_effect', {}).get('enabled', False):
            delay = self.config['add_sonic_boom_effect'].get('delay', 0.05)
            increase_factor = self.config['add_sonic_boom_effect'].get('increase_factor', 10)
            decay = self.config['add_sonic_boom_effect'].get('decay', 0.6)
            augmented_audios['add_sonic_boom_effect'] = self.add_sonic_boom_effect(audio, sr, delay, increase_factor, decay)

        return augmented_audios
