# **Audio Processing For Machine Learning**

## **Working with Audio in Python**

In this project, we provide a concise yet comprehensive tutorial on processing audio files and preparing them for machine learning applications. Audio data is a rich and complex source of information, widely used in domains such as speech recognition, music classification, and sentiment analysis. This tutorial guides you through the essential steps to analyze and extract features from audio files so that they can be used in machine learning models.

1. **Understanding Audio as Data:** <br>
    * How audio is represented in digital form (sample rate, amplitude, frequency).
    * The fundamentals of waveform visualisation and spectrograms.
2. **Preprocessing Audio:** <br>
    * Techniques for loading audio data using libraries like `librosa`.
    * Methods to trim, normalize, and resample audio files for consistency.
3. **Feature Extraction:** <br>
    * Extracting meaningful features such as Mel Frequency Cepstral Coefficients (MFCCs), chroma features, and spectral contrast.
    * Converting raw audio into numerical representations suitable for machine learning algorithms.
4. **Building Machine Learning Models:** <br>
    * Using extracted features to train machine learning models to recognize patterns in audio.
    * Implementing classifiers for tasks like speech emotion recognition or genre classification. 

### **Why Process Audio Data for Machine Learning**
Audio data contains both temporal and frequency information, making it unique and challenging to work with. By transforming audio into numerical formats, such as spectrograms or MFCCs, we can leverage machine learning models to analyze, classify, and make predictions from audio inputs.
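
As a rough illustration of turning a waveform into a numerical time-frequency representation, the sketch below builds a magnitude spectrogram with a plain NumPy short-time Fourier transform. It is not the `librosa` pipeline used in this tutorial: the helper `magnitude_spectrogram`, the synthetic 440 Hz tone, and the frame and hop sizes are all assumptions chosen for the example.

```python
import numpy as np


def magnitude_spectrogram(signal, frame_length=1024, hop_length=256):
    """Return a (frequency_bins, frames) magnitude spectrogram.

    Hypothetical helper: a plain NumPy short-time Fourier transform
    with a Hann window, written only for illustration.
    """
    window = np.hanning(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    # Slice the signal into overlapping, windowed frames
    frames = np.stack([
        signal[i * hop_length: i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real-valued signal
    return np.abs(np.fft.rfft(frames, axis=1)).T


# Synthetic one-second 440 Hz tone (assumed sample rate of 16 kHz)
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)

spec = magnitude_spectrogram(tone)

# Centre frequency (Hz) of each row of the spectrogram
freqs = np.fft.rfftfreq(1024, d=1 / sample_rate)
peak_hz = freqs[spec.mean(axis=1).argmax()]

print("Spectrogram shape (bins, frames):", spec.shape)
print("Strongest frequency bin (Hz):", peak_hz)
```

Each column of `spec` describes the frequency content of one short time frame, which is the kind of fixed-size numerical input a machine learning model can consume. The strongest bin lands close to 440 Hz rather than exactly on it, because the frame length limits the frequency resolution.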

In [1]:
# Data Manipulation and Analysis
import pandas as pd
import numpy as np

# Visualisation Libraries
import matplotlib.pyplot as plt
import seaborn as sns

# File Handling
from glob import glob

# Audio Processing
import librosa
import librosa.display

# Interactive Audio Playback
import IPython.display as ipd

# Utility Functions
from itertools import cycle

# Set Plotting Theme
sns.set_theme(style="white", palette=None)

# Define Color Palette and Cycle for Visualizations
color_pal = plt.rcParams["axes.prop_cycle"].by_key()["color"]
color_cycle = cycle(color_pal)

### **Key Concepts in Digital Audio**
To work effectively with audio in its digital form, it's important to understand several key concepts. These fundamental terms define how audio is represented, measured, and processed in digital systems. <br>

1. **Frequency (Hz)**

    * **Definition**: Frequency represents the number of complete wave cycles that occur in one second. It determines the pitch of a sound. The unit of frequency is **Hertz (Hz)**.
    * **Key Insights**:
        * High-frequency sounds have shorter wavelengths and are perceived as high-pitched (e.g., a whistle or a violin).
        * Low-frequency sounds have longer wavelengths and are perceived as low-pitched (e.g., a drum or bass guitar).
    * **Real-Life Examples**:
        * The typical human hearing range is 20 Hz to 20,000 Hz.
        * Speech frequencies generally range between 250 Hz and 8,000 Hz.

2. **Intensity (dB/Power)**:

    * **Definition**: Intensity measures the strength or power of a sound wave, which is reflected in its amplitude. In audio processing, intensity is often expressed in decibels (dB).
    * **Key Insights**:
        * Larger amplitudes indicate louder sounds, while smaller amplitudes correspond to quieter sounds.
        * The decibel scale is logarithmic: an increase of 10 dB corresponds to a tenfold increase in sound power, so a small increase in decibels represents a significant change in perceived loudness.
    * **Real-Life Examples**:
        * A whisper is around 30 dB, normal conversation is about 60 dB, and a rock concert can reach 110 dB or more.
        * Prolonged exposure to sounds above 85 dB can cause hearing damage.

3. **Sample Rate**:

    * **Definition**: The sample rate specifies how many samples of an audio signal a digital system captures per second. It is measured in Hertz (Hz) or samples per second.
    * **Key Insights**:
        * Higher sample rates capture more detail in the audio but result in larger file sizes.
        * The sample rate must be at least twice the highest frequency in the audio to avoid aliasing, as dictated by the Nyquist-Shannon sampling theorem.
    * **Common Sample Rates**:
        * 44.1 kHz: Used in CDs and many digital audio formats.
        * 48 kHz: Standard for professional audio and video production.
        * 96 kHz and 192 kHz: Used for high-resolution audio applications.
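
The three concepts above can be checked numerically on synthetic tones. The following is a minimal NumPy sketch, not part of the dataset workflow: the helpers `sine`, `dominant_frequency`, and `rms_db` are hypothetical names introduced here, and the decibel values are relative to an arbitrary reference amplitude of 1.0 rather than calibrated sound pressure levels such as the 60 dB of a conversation.

```python
import numpy as np


def sine(freq_hz, sample_rate, duration_s=1.0, amplitude=1.0):
    """Generate a sine tone (hypothetical helper for this example)."""
    t = np.arange(int(sample_rate * duration_s)) / sample_rate
    return amplitude * np.sin(2 * np.pi * freq_hz * t)


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the strongest FFT bin of a real signal."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)
    return freqs[spectrum.argmax()]


def rms_db(signal, reference=1.0):
    """Return the RMS level in decibels relative to `reference`."""
    rms = np.sqrt(np.mean(signal ** 2))
    return 20 * np.log10(rms / reference)


sample_rate = 8000  # assumed rate; its Nyquist limit is 4000 Hz

# Frequency: the FFT recovers the pitch of a 440 Hz tone
f_tone = dominant_frequency(sine(440, sample_rate), sample_rate)

# Intensity: halving the amplitude lowers the level by 20*log10(2) dB
level_drop = rms_db(sine(440, sample_rate, amplitude=1.0)) - rms_db(
    sine(440, sample_rate, amplitude=0.5)
)

# Sample rate: a 5000 Hz tone exceeds the Nyquist limit and aliases
# to 8000 - 5000 = 3000 Hz
aliased = dominant_frequency(sine(5000, sample_rate), sample_rate)

print(f"Dominant frequency of the 440 Hz tone: {f_tone:.1f} Hz")
print(f"Level drop when amplitude is halved: {level_drop:.2f} dB")
print(f"Apparent frequency of the 5000 Hz tone: {aliased:.1f} Hz")
```

The last result shows why the sample rate matters: once a tone exceeds half the sample rate, the sampled data is indistinguishable from a lower-frequency tone, and no later processing can recover the original pitch.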

In [2]:
# Collect the dataset's .wav file paths using glob (sorted so indices are reproducible)
audio_file = sorted(glob("ravdess-emotional-speech-audio/*/*.wav"))

In [3]:
# Play audio file
ipd.Audio(audio_file[5])

In [None]:
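# Load the audio file as a waveform array `y` together with its sample rate `sr`
# (librosa resamples to 22,050 Hz by default; pass sr=None to keep the native rate)
y, sr = librosa.load(audio_file[5])
print(f"Samples: {y.shape[0]}, sample rate: {sr} Hz")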