# 🎵 Audio Fundamentals for Machine Learning

Welcome to the Audio ML Learning Journey! This notebook will teach you the fundamentals of working with audio data for machine learning applications.

## 📚 What You'll Learn

- **Audio Signal Basics**: Understanding digital audio and sound waves
- **Time vs Frequency Domain**: How to analyze audio in different domains
- **Audio Processing**: Loading, visualizing, and manipulating audio data
- **Practical Applications**: Real-world examples and interactive experiments

## 🎯 Learning Objectives

By the end of this notebook, you'll be able to:
1. Load and visualize audio files
2. Understand the difference between time and frequency domain representations
3. Apply basic audio processing techniques
4. Create meaningful audio visualizations
5. Prepare audio data for machine learning

Let's begin our audio adventure! 🚀

## 📦 Import Required Libraries

First, let's import all the libraries we'll need for our audio processing journey.

In [None]:
# Core libraries for numerical computing and data manipulation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Audio processing libraries
import librosa
import librosa.display
import soundfile as sf

# Machine learning libraries
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Deep learning libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, optimizers

# Visualization libraries
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import ipywidgets as widgets
from IPython.display import Audio, display

# System libraries
import os
import sys
import warnings
from pathlib import Path

# Add our custom modules to path
sys.path.append('../src')

# Import our custom audio processing modules
try:
    from audio_processing import AudioProcessor, create_synthetic_audio
    from feature_extraction import FeatureExtractor, AudioFeatureExtractor
    from visualization import AudioVisualizer
    print("✅ Successfully imported custom audio modules!")
except ImportError as e:
    print(f"⚠️ Could not import custom modules: {e}")
    print("Please ensure you're running this notebook from the notebooks/ directory")

# Configure plotting
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
warnings.filterwarnings('ignore')

print("🎵 Core libraries imported successfully!")
print(f"   NumPy version: {np.__version__}")
print(f"   Librosa version: {librosa.__version__}")
print(f"   TensorFlow version: {tf.__version__}")

## 🌊 Understanding Audio Signals

### What is Sound?

Sound is a **pressure wave** that travels through air (or another medium). Three quantities describe it once digitized:

- **Sampling Rate**: How many times per second the wave is measured (Hz)
- **Amplitude**: The strength of the wave, perceived as loudness
- **Frequency**: How many cycles the wave completes per second (Hz), perceived as pitch
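
As a rough illustration of how these three quantities turn into an array of samples, here is a minimal NumPy sketch. `make_wave` is a hypothetical stand-in written for this example; the project's own `create_synthetic_audio` may be implemented differently. It also prints the Nyquist limit, which is half the sampling rate and the highest frequency a sampled signal can represent.

```python
import numpy as np

def make_wave(duration, sample_rate, frequency=440.0, wave_type="sine",
              amplitude=0.5, seed=0):
    """Hypothetical stand-in for create_synthetic_audio; returns a float array."""
    n_samples = int(duration * sample_rate)
    # Sample k is taken at time k / sample_rate (the endpoint is excluded)
    t = np.arange(n_samples) / sample_rate
    phase = frequency * t  # cycles elapsed at each sample
    if wave_type == "sine":
        wave = np.sin(2 * np.pi * phase)
    elif wave_type == "square":
        wave = np.sign(np.sin(2 * np.pi * phase))
    elif wave_type == "sawtooth":
        wave = 2 * (phase - np.floor(phase + 0.5))
    elif wave_type == "noise":
        # Uniform white noise; the frequency argument is ignored here
        wave = np.random.default_rng(seed).uniform(-1.0, 1.0, n_samples)
    else:
        raise ValueError(f"unknown wave_type: {wave_type!r}")
    return amplitude * wave

tone = make_wave(2.0, 22050, frequency=440, wave_type="sine")
print(len(tone))     # number of samples = duration * sample_rate
print(22050 / 2)     # Nyquist limit in Hz for this sampling rate
```

Doubling `duration` or `sample_rate` doubles the number of samples, while `frequency` and `amplitude` only change the values stored in them.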

Let's create some synthetic audio to understand these concepts!

In [None]:
# Create different types of synthetic audio signals

duration = 2.0  # seconds
sample_rate = 22050  # samples per second

print("🎼 Creating synthetic audio signals...")

# 1. Pure sine wave (musical note A4 = 440 Hz)
sine_wave = create_synthetic_audio(duration, sample_rate, frequency=440, wave_type='sine')

# 2. Square wave (more harmonic content)
square_wave = create_synthetic_audio(duration, sample_rate, frequency=440, wave_type='square')

# 3. Sawtooth wave (rich harmonics)
sawtooth_wave = create_synthetic_audio(duration, sample_rate, frequency=440, wave_type='sawtooth')

# 4. White noise (random)
noise = create_synthetic_audio(duration, sample_rate, frequency=440, wave_type='noise')

# Create time axis for plotting: sample k occurs at k / sample_rate seconds
time_axis = np.arange(len(sine_wave)) / sample_rate

print(f"✅ Created {len(sine_wave):,} samples for each signal")
print(f"   Duration: {duration} seconds")
print(f"   Sample rate: {sample_rate:,} Hz")

### 📊 Visualizing Audio Waveforms

Let's visualize these different types of audio signals to understand their characteristics:
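
The plots below annotate each waveform with its RMS and peak values. As a small worked example of what those numbers mean, the following sketch (plain NumPy, with signals generated inline rather than through the custom modules) computes both for a sine and a sawtooth of the same peak amplitude. For ideal waveforms with peak amplitude A, the RMS is A/√2 for a sine and A/√3 for a sawtooth, so two signals with identical peaks can carry different energy.

```python
import numpy as np

sample_rate = 22050
t = np.arange(sample_rate) / sample_rate  # one second of sample times
amplitude = 0.5
phase = 440 * t                           # cycles elapsed at each sample

sine = amplitude * np.sin(2 * np.pi * phase)
sawtooth = amplitude * 2 * (phase - np.floor(phase + 0.5))

def rms(signal):
    """Root-mean-square level: square, average, then take the root."""
    return float(np.sqrt(np.mean(signal ** 2)))

print(f"sine     RMS: {rms(sine):.3f}  peak: {np.max(np.abs(sine)):.3f}")
print(f"sawtooth RMS: {rms(sawtooth):.3f}  peak: {np.max(np.abs(sawtooth)):.3f}")
```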

In [None]:
# Create a comprehensive visualization of different waveforms

fig, axes = plt.subplots(4, 1, figsize=(14, 12))

signals = {
    'Sine Wave (Pure Tone)': sine_wave,
    'Square Wave (Harmonic Rich)': square_wave, 
    'Sawtooth Wave (Very Harmonic)': sawtooth_wave,
    'White Noise (Random)': noise
}

colors = ['blue', 'green', 'red', 'purple']

for idx, (name, signal) in enumerate(signals.items()):
    # Show only first 0.01 seconds for clarity
    samples_to_show = int(0.01 * sample_rate)
    
    axes[idx].plot(time_axis[:samples_to_show], signal[:samples_to_show], 
                   color=colors[idx], linewidth=1.5, alpha=0.8)
    
    axes[idx].set_title(f'{name}', fontsize=14, fontweight='bold')
    axes[idx].set_ylabel('Amplitude')
    axes[idx].grid(True, alpha=0.3)
    
    # Add statistics
    rms = np.sqrt(np.mean(signal**2))
    max_amp = np.max(np.abs(signal))
    axes[idx].text(0.02, 0.98, f'RMS: {rms:.3f}\nMax: {max_amp:.3f}', 
                   transform=axes[idx].transAxes, verticalalignment='top',
                   bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))

axes[-1].set_xlabel('Time (seconds)')
plt.suptitle('Different Types of Audio Waveforms', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

print("🎵 Notice how different waveforms have different shapes and characteristics!")

### 🎧 Listen to the Audio Signals

Let's actually listen to these different signals to hear how they sound:

In [None]:
print("🎧 Listen to different waveform types:")
print("\n1. Sine Wave (Pure tone - sounds like a whistle):")
display(Audio(sine_wave, rate=sample_rate))

print("\n2. Square Wave (Harsh, digital sound):")
display(Audio(square_wave, rate=sample_rate))

print("\n3. Sawtooth Wave (Buzzy, rich sound):")
display(Audio(sawtooth_wave, rate=sample_rate))

print("\n4. White Noise (Static sound):")
display(Audio(noise, rate=sample_rate))

## 🔄 Time Domain vs Frequency Domain

### Time Domain
Shows how the signal changes over **time** (this is what we plotted above).

### Frequency Domain 
Shows which **frequencies** are present in the signal, computed with the Fast Fourier Transform (FFT).
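
To make the idea concrete, here is a minimal sketch that recovers the frequency of a pure tone from its samples. It uses NumPy's `np.fft.rfft` and `np.fft.rfftfreq`; the scaling by `2 / N` is a choice made for this example so that a sine of amplitude A produces a peak of height roughly A.

```python
import numpy as np

sample_rate = 22050
t = np.arange(sample_rate) / sample_rate      # one second, so bins are 1 Hz apart
signal = 0.5 * np.sin(2 * np.pi * 440 * t)

# rfft returns only the non-negative frequency bins of a real-valued signal
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)

# Scale so that a sine of amplitude A shows a peak of height ~A
magnitude = 2 * np.abs(spectrum) / len(signal)

peak_bin = int(np.argmax(magnitude))
peak_freq = freqs[peak_bin]
peak_mag = magnitude[peak_bin]
print(f"peak at {peak_freq:.1f} Hz with magnitude {peak_mag:.3f}")
```

The frequency resolution is `sample_rate / len(signal)`, so longer signals separate nearby frequencies more finely.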

Let's explore both representations:

In [None]:
def plot_time_and_frequency_domain(signal, signal_name, sample_rate):
    """Plot both time domain and frequency domain representations."""
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    
    # Time domain (first 0.01 seconds)
    samples_to_show = int(0.01 * sample_rate)
    time_short = np.arange(samples_to_show) / sample_rate
    
    ax1.plot(time_short, signal[:samples_to_show], linewidth=2)
    ax1.set_title(f'{signal_name} - Time Domain')
    ax1.set_xlabel('Time (seconds)')
    ax1.set_ylabel('Amplitude')
    ax1.grid(True, alpha=0.3)
    
    # Frequency domain (FFT)
    fft = np.fft.fft(signal)
    freqs = np.fft.fftfreq(len(signal), 1/sample_rate)
    
    # Only show positive frequencies up to 2000 Hz for clarity
    positive_freqs = freqs[:len(freqs)//2]
    positive_fft = np.abs(fft[:len(fft)//2])
    
    freq_mask = positive_freqs <= 2000
    
    ax2.plot(positive_freqs[freq_mask], positive_fft[freq_mask], linewidth=2)
    ax2.set_title(f'{signal_name} - Frequency Domain')
    ax2.set_xlabel('Frequency (Hz)')
    ax2.set_ylabel('Magnitude')
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

# Analyze each signal
signals_to_analyze = [
    (sine_wave, "Sine Wave"),
    (square_wave, "Square Wave"),
    (sawtooth_wave, "Sawtooth Wave"),
    (noise, "White Noise")
]

for signal, name in signals_to_analyze:
    plot_time_and_frequency_domain(signal, name, sample_rate)
    print(f"\n{name}: Notice the frequency content!")
    if name == "Sine Wave":
        print("   - Single peak at 440 Hz (pure tone)")
    elif name == "Square Wave":
        print("   - Peaks at odd harmonics (440 and 1320 Hz; the next, 2200 Hz, is beyond the 2000 Hz plot limit)")
    elif name == "Sawtooth Wave":
        print("   - Harmonics at every integer multiple of 440 Hz (440, 880, 1320, 1760 Hz...)")
    elif name == "White Noise":
        print("   - Energy spread across all frequencies")
    print()

## 🎛️ Interactive Audio Explorer

Let's create an interactive widget to explore how frequency and amplitude affect audio signals:

In [None]:
# Interactive audio parameter explorer

def interactive_audio_explorer(frequency=440, amplitude=0.5, wave_type='sine'):
    """Interactive function to explore audio parameters."""
    
    # Create audio signal
    duration = 1.0
    t = np.linspace(0, duration, int(sample_rate * duration), False)
    
    if wave_type == 'sine':
        signal = amplitude * np.sin(2 * np.pi * frequency * t)
    elif wave_type == 'square':
        signal = amplitude * np.sign(np.sin(2 * np.pi * frequency * t))
    elif wave_type == 'sawtooth':
        signal = amplitude * 2 * (t * frequency - np.floor(t * frequency + 0.5))
    else:
        # Fail loudly instead of hitting a NameError on `signal` below
        raise ValueError(f"Unsupported wave_type: {wave_type!r}")
    
    # Plot time and frequency domain
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    
    # Time domain (first 0.02 seconds)
    samples_to_show = int(0.02 * sample_rate)
    time_short = t[:samples_to_show]
    
    ax1.plot(time_short, signal[:samples_to_show], linewidth=2, color='blue')
    ax1.set_title(f'{wave_type.capitalize()} Wave - {frequency} Hz')
    ax1.set_xlabel('Time (seconds)')
    ax1.set_ylabel('Amplitude')
    ax1.set_ylim(-1, 1)
    ax1.grid(True, alpha=0.3)
    
    # Frequency domain
    fft = np.fft.fft(signal)
    freqs = np.fft.fftfreq(len(signal), 1/sample_rate)
    positive_freqs = freqs[:len(freqs)//2]
    positive_fft = np.abs(fft[:len(fft)//2])
    
    freq_mask = positive_freqs <= 2000
    
    ax2.plot(positive_freqs[freq_mask], positive_fft[freq_mask], linewidth=2, color='red')
    ax2.set_title('Frequency Spectrum')
    ax2.set_xlabel('Frequency (Hz)')
    ax2.set_ylabel('Magnitude')
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
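    # Display audio. IPython's Audio rescales to full volume by default, which would
    # hide amplitude changes; normalize=False keeps the chosen level (requires |signal| <= 1).
    print(f"🎵 Playing {wave_type} wave at {frequency} Hz with amplitude {amplitude}")
    display(Audio(signal, rate=sample_rate, normalize=False))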

# Create interactive widgets
frequency_slider = widgets.IntSlider(
    value=440, min=100, max=1000, step=10,
    description='Frequency (Hz):'
)

amplitude_slider = widgets.FloatSlider(
    value=0.5, min=0.1, max=1.0, step=0.1,
    description='Amplitude:'
)

wave_type_dropdown = widgets.Dropdown(
    options=['sine', 'square', 'sawtooth'],
    value='sine',
    description='Wave Type:'
)

# Create interactive widget
interactive_widget = widgets.interact(
    interactive_audio_explorer,
    frequency=frequency_slider,
    amplitude=amplitude_slider,
    wave_type=wave_type_dropdown
)

print("🎛️ Use the controls above to explore different audio parameters!")
print("   - Try different frequencies to hear pitch changes")
print("   - Change amplitude to hear volume changes")
print("   - Switch wave types to hear timbre differences")

## 🔧 Audio Processing with Our Custom Modules

Now let's use our custom audio processing modules to explore more advanced concepts:
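
The next cell modulates the chord's amplitude at 5 Hz. Amplitude modulation and frequency modulation are easy to confuse: a periodic change in loudness is usually called tremolo, while vibrato is a periodic change in pitch. The sketch below builds one of each for comparison, using plain NumPy and illustrative parameter names (`carrier_hz`, `rate_hz`, `depth_hz`) that are not part of the custom modules.

```python
import numpy as np

sample_rate = 22050
duration = 2.0
t = np.arange(int(duration * sample_rate)) / sample_rate

carrier_hz = 440.0   # pitch of the note
rate_hz = 5.0        # how fast the modulation cycles

# Tremolo: the amplitude envelope wobbles, the frequency stays fixed
envelope = 1 + 0.1 * np.sin(2 * np.pi * rate_hz * t)
tremolo = 0.5 * envelope * np.sin(2 * np.pi * carrier_hz * t)

# Vibrato: the instantaneous frequency wobbles by +/- depth_hz around the carrier.
# Phase is the running integral of frequency, approximated by a cumulative sum.
depth_hz = 6.0
inst_freq = carrier_hz + depth_hz * np.sin(2 * np.pi * rate_hz * t)
phase = 2 * np.pi * np.cumsum(inst_freq) / sample_rate
vibrato = 0.5 * np.sin(phase)

print(f"tremolo peak: {np.max(np.abs(tremolo)):.3f}")  # above 0.5 because of the envelope
print(f"vibrato peak: {np.max(np.abs(vibrato)):.3f}")  # never above 0.5
```

In a spectrogram, tremolo appears as a steady line whose brightness pulses, whereas vibrato appears as a line that wiggles up and down in frequency.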

In [None]:
# Initialize our custom audio processor
processor = AudioProcessor(sample_rate=22050)
visualizer = AudioVisualizer()

print("🔧 Initialized custom audio processing modules")

# Create a more complex audio signal for processing
print("\n🎼 Creating a complex musical signal...")

duration = 4.0
t = np.linspace(0, duration, int(sample_rate * duration), False)

# Create a chord (multiple frequencies)
frequencies = [261.63, 329.63, 392.00]  # C major chord (C, E, G)
complex_signal = np.zeros_like(t)

for i, freq in enumerate(frequencies):
    amplitude = 0.3 / (i + 1)  # Decreasing amplitude for each note
    complex_signal += amplitude * np.sin(2 * np.pi * freq * t)

# Add some amplitude modulation (tremolo: a periodic change in loudness)
modulation = 1 + 0.1 * np.sin(2 * np.pi * 5 * t)  # 5 Hz tremolo
complex_signal *= modulation

# Add a bit of noise for realism
complex_signal += 0.02 * np.random.normal(0, 1, len(complex_signal))

print(f"✅ Created complex signal with {len(frequencies)} frequencies")

# Get signal information
info = processor.get_audio_info(complex_signal)
print("\n📊 Audio Signal Information:")
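for key, value in info.items():
    # Format floats compactly; print any other value type unchanged
    formatted = f"{value:.4f}" if isinstance(value, (float, np.floating)) else value
    print(f"   {key}: {formatted}")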

In [None]:
# Visualize the complex signal
print("📊 Visualizing the complex musical signal...")

visualizer.plot_waveform(
    complex_signal, 
    sample_rate=sample_rate,
    title="Complex Musical Signal (C Major Chord with Tremolo)"
)

# Let's listen to it
print("\n🎧 Listen to the complex musical signal:")
display(Audio(complex_signal, rate=sample_rate))

In [None]:
# Create spectrogram visualization
print("🌈 Creating spectrogram visualization...")

visualizer.plot_spectrogram(
    complex_signal,
    sample_rate=sample_rate,
    title="Spectrogram - Time vs Frequency vs Amplitude"
)

print("\n🔍 In the spectrogram, you can see:")
print("   - Horizontal lines at ~262, 330, and 392 Hz (the chord frequencies)")
print("   - Color intensity shows amplitude over time")
print("   - The 5 Hz amplitude modulation may show as faint periodic intensity changes, not frequency shifts")

## ✂️ Audio Segmentation and Processing

For machine learning, we often need to break audio into smaller, manageable segments:
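
As a sketch of what fixed-length segmentation with overlap involves, the hypothetical helper below slices a signal into windows and zero-pads the last one. It is written for illustration only; `AudioProcessor.segment_audio` may handle hop sizes and padding differently.

```python
import numpy as np

def segment_signal(signal, sample_rate, segment_length=1.0, hop_length=0.5,
                   pad_final=True):
    """Hypothetical helper: split a 1-D signal into fixed-length, overlapping windows."""
    seg_samples = int(segment_length * sample_rate)
    hop_samples = int(hop_length * sample_rate)
    segments = []
    for start in range(0, len(signal), hop_samples):
        chunk = signal[start:start + seg_samples]
        if len(chunk) < seg_samples:
            if not pad_final:
                break
            # Zero-pad the trailing chunk so every segment has the same length
            chunk = np.pad(chunk, (0, seg_samples - len(chunk)))
        segments.append(chunk)
    return np.stack(segments)

sample_rate = 22050
signal = np.random.default_rng(0).normal(size=4 * sample_rate)  # 4 s placeholder audio
segments = segment_signal(signal, sample_rate)
print(segments.shape)  # (8, 22050): windows start every 0.5 s, the last one is padded
```

A hop shorter than the segment length makes neighbouring segments overlap, which yields more training examples from the same recording at the cost of correlated samples.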

In [None]:
# Demonstrate audio segmentation
print("✂️ Demonstrating audio segmentation...")

# Create segments
segment_length = 1.0  # 1 second segments
segments = processor.segment_audio(
    complex_signal, 
    segment_length=segment_length,
    hop_length=0.5,  # 50% overlap
    pad_final=True
)

print(f"\n📊 Created {len(segments)} segments from {info['duration']:.2f}s audio")

# Visualize first few segments
fig, axes = plt.subplots(min(4, len(segments)), 1, figsize=(12, 8))
if len(segments) == 1:
    axes = [axes]

for i in range(min(4, len(segments))):
    segment_time = np.arange(len(segments[i])) / sample_rate
    axes[i].plot(segment_time, segments[i], linewidth=1, alpha=0.8)
    axes[i].set_title(f'Segment {i+1}')
    axes[i].set_ylabel('Amplitude')
    axes[i].grid(True, alpha=0.3)
    
    if i == len(axes) - 1:
        axes[i].set_xlabel('Time (seconds)')

plt.suptitle('Audio Segments', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

# Listen to individual segments
print("\n🎧 Listen to individual segments:")
for i in range(min(3, len(segments))):
    print(f"\nSegment {i+1}:")
    display(Audio(segments[i], rate=sample_rate))

## 🔄 Audio Data Augmentation

For machine learning, we often need to create variations of our data to improve model robustness:
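
The three augmentations used in the next cell can be approximated in a few lines of NumPy. The functions below are simplified sketches with the same names as the `AudioProcessor` methods, not their actual implementations. In particular, this `change_speed` resamples by linear interpolation, which also shifts the pitch, whereas a library routine might stretch time without changing pitch.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(signal, noise_factor=0.05):
    """Add zero-mean Gaussian noise scaled by noise_factor."""
    return signal + noise_factor * rng.standard_normal(len(signal))

def time_shift(signal, shift_samples=1000):
    """Circularly shift the signal; samples pushed off the end wrap to the start."""
    return np.roll(signal, shift_samples)

def change_speed(signal, speed_factor=1.2):
    """Resample by linear interpolation; speed_factor > 1 shortens the signal."""
    new_length = int(round(len(signal) / speed_factor))
    old_positions = np.arange(len(signal))
    new_positions = np.linspace(0, len(signal) - 1, new_length)
    return np.interp(new_positions, old_positions, signal)

sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
original = 0.5 * np.sin(2 * np.pi * 440 * t)

noisy = add_noise(original)
shifted = time_shift(original)
fast = change_speed(original, 1.2)
slow = change_speed(original, 0.8)
print(len(noisy), len(shifted), len(fast), len(slow))
```

Because speed changes alter the length, augmented clips usually need trimming or padding back to a common size before they are batched for training.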

In [None]:
# Demonstrate data augmentation techniques
print("🔄 Demonstrating audio data augmentation...")

# Use first segment for demonstration
original_segment = segments[0]

# 1. Add noise
noisy_audio = processor.add_noise(original_segment, noise_factor=0.05)

# 2. Time shift
shifted_audio = processor.time_shift(original_segment, shift_samples=1000)

# 3. Speed change
fast_audio = processor.change_speed(original_segment, speed_factor=1.2)
slow_audio = processor.change_speed(original_segment, speed_factor=0.8)

# Visualize augmentations
augmented_signals = {
    'Original': original_segment,
    'With Noise': noisy_audio,
    'Time Shifted': shifted_audio,
    'Faster (1.2x)': fast_audio,
    'Slower (0.8x)': slow_audio
}

fig, axes = plt.subplots(len(augmented_signals), 1, figsize=(12, 12))

for idx, (name, signal) in enumerate(augmented_signals.items()):
    # Show at most the first 0.5 seconds for clarity
    samples_to_show = min(len(signal), int(0.5 * sample_rate))
    time_short = np.arange(samples_to_show) / sample_rate
    axes[idx].plot(time_short, signal[:samples_to_show], linewidth=1)
    
    axes[idx].set_title(f'{name}')
    axes[idx].set_ylabel('Amplitude')
    axes[idx].grid(True, alpha=0.3)
    
    # Add RMS value
    rms = np.sqrt(np.mean(signal**2))
    axes[idx].text(0.02, 0.98, f'RMS: {rms:.3f}', 
                   transform=axes[idx].transAxes, verticalalignment='top',
                   bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))

axes[-1].set_xlabel('Time (seconds)')
plt.suptitle('Audio Data Augmentation Techniques', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n🎧 Listen to the augmented versions:")
for name, signal in list(augmented_signals.items())[:3]:  # First 3 for brevity
    print(f"\n{name}:")
    display(Audio(signal, rate=sample_rate))

## 🧠 Key Concepts Summary

### What We've Learned:

1. **Audio Signals**: Digital representations of sound waves
2. **Time Domain**: Shows amplitude changes over time
3. **Frequency Domain**: Shows what frequencies are present (FFT)
4. **Waveform Types**: Sine, square, sawtooth have different harmonic content
5. **Audio Processing**: Segmentation and augmentation (noise, time shift, speed change)
6. **Visualization**: Waveforms, spectrograms help understand audio

### Why This Matters for ML:

- **Feature Extraction**: We need to convert audio to numbers ML can understand
- **Data Preparation**: Segmentation creates consistent input sizes
- **Augmentation**: Creates more training data and robust models
- **Visualization**: Helps debug and understand model behavior
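
To illustrate the first two points, here is a minimal sketch that turns audio segments into a feature matrix. The three features (RMS level, zero-crossing rate, spectral centroid) and the helper name `simple_features` are choices made for this example, not the project's `FeatureExtractor`.

```python
import numpy as np

def simple_features(segment, sample_rate):
    """Hypothetical feature set: RMS level, zero-crossing rate, spectral centroid (Hz)."""
    rms = np.sqrt(np.mean(segment ** 2))
    # Fraction of adjacent sample pairs whose sign differs
    zcr = np.mean(np.signbit(segment[:-1]) != np.signbit(segment[1:]))
    magnitude = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1 / sample_rate)
    # Magnitude-weighted mean frequency; the epsilon guards against silent input
    centroid = np.sum(freqs * magnitude) / (np.sum(magnitude) + 1e-12)
    return np.array([rms, zcr, centroid])

sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
low_tone = 0.5 * np.sin(2 * np.pi * 220 * t)
high_tone = 0.5 * np.sin(2 * np.pi * 880 * t)

# One row per segment, one column per feature
X = np.stack([simple_features(s, sample_rate) for s in (low_tone, high_tone)])
print(X.shape)
print(X.round(3))
```

Both tones have the same loudness, so the RMS column cannot tell them apart, while the zero-crossing rate and spectral centroid do. This is why feature choice matters for a classifier.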

## 🎯 Practice Exercises

Try these exercises to reinforce your learning:
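
The note frequencies used in Exercise 1 follow twelve-tone equal temperament, where each semitone multiplies the frequency by 2^(1/12) and A4 is tuned to 440 Hz. The short sketch below derives them from MIDI note numbers; `note_frequency` is a helper defined here for illustration.

```python
def note_frequency(midi_number, a4_hz=440.0):
    """Equal-temperament frequency for a MIDI note number (A4 = 69)."""
    return a4_hz * 2 ** ((midi_number - 69) / 12)

# MIDI numbers for a few notes, including D#4 for the C minor chord
midi = {"C4": 60, "D#4": 63, "E4": 64, "G4": 67, "A4": 69}
for name, number in midi.items():
    print(f"{name}: {note_frequency(number):.2f} Hz")
```

Adding 12 to a MIDI number raises the note by one octave and doubles its frequency, which is how C5 (523.25 Hz) relates to C4.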

In [None]:
# Exercise 1: Create your own musical chord
print("🎵 Exercise 1: Create Your Own Musical Chord")
print("Try creating different chords by changing the frequencies below:")
print()

# Define some musical notes (frequencies in Hz)
notes = {
    'C4': 261.63, 'D4': 293.66, 'D#4': 311.13, 'E4': 329.63, 'F4': 349.23,
    'G4': 392.00, 'A4': 440.00, 'B4': 493.88, 'C5': 523.25
}

print("Available notes:", list(notes.keys()))
print("\nTry creating different chords:")
print("- Major chord: C4, E4, G4")
print("- Minor chord: C4, D#4, G4")
print("- Sus4 chord: C4, F4, G4")

# TODO: Modify this list to create your own chord!
my_chord_notes = ['C4', 'E4', 'G4']  # C Major chord

# Create the chord
duration = 3.0
t = np.linspace(0, duration, int(sample_rate * duration), False)
my_chord = np.zeros_like(t)

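for note in my_chord_notes:
    if note in notes:
        freq = notes[note]
        my_chord += 0.3 * np.sin(2 * np.pi * freq * t)
    else:
        # Warn instead of silently dropping a note that is missing from `notes`
        print(f"⚠️ Unknown note skipped: {note}")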

print(f"\n🎼 Your chord: {' + '.join(my_chord_notes)}")
display(Audio(my_chord, rate=sample_rate))

# Visualize your chord
visualizer.plot_waveform(my_chord, sample_rate=sample_rate, title=f"Your Chord: {' + '.join(my_chord_notes)}")

In [None]:
# Exercise 2: Analyze the frequency content of your chord
print("🔍 Exercise 2: Frequency Analysis of Your Chord")

# Create FFT analysis
fft = np.fft.fft(my_chord)
freqs = np.fft.fftfreq(len(my_chord), 1/sample_rate)
positive_freqs = freqs[:len(freqs)//2]
positive_fft = np.abs(fft[:len(fft)//2])

# Plot frequency spectrum
plt.figure(figsize=(12, 6))
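# Show only the lowest tenth of the positive spectrum (up to about 1.1 kHz), where the chord notes lie
plt.plot(positive_freqs[:len(positive_freqs)//10], positive_fft[:len(positive_fft)//10])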
plt.title(f'Frequency Spectrum of Your Chord: {" + ".join(my_chord_notes)}')
plt.xlabel('Frequency (Hz)')
plt.ylabel('Magnitude')
plt.grid(True, alpha=0.3)

# Mark expected frequencies
for note in my_chord_notes:
    if note in notes:
        freq = notes[note]
        plt.axvline(x=freq, color='red', linestyle='--', alpha=0.7, label=f'{note} ({freq:.1f} Hz)')

plt.legend()
plt.show()

print("\n🎯 Can you identify the peaks corresponding to your chord notes?")
print("🎯 Do you see any harmonics (multiples of the fundamental frequencies)?")

## 🚀 Next Steps

Congratulations! You've learned the fundamentals of audio processing for machine learning. 

### What's Next:

1. **Feature Extraction** (Notebook 02): Learn to extract MFCC, spectral, and other ML features
2. **Model Training** (Notebook 03): Build and train audio classification models
3. **Real-time Processing** (Notebook 04): Apply your knowledge to live audio

### Key Takeaways:

- Audio is just numbers that can be processed mathematically
- Time and frequency domains provide different perspectives
- Visualization helps understand and debug audio processing
- Proper preprocessing is crucial for ML success

### Resources for Further Learning:

- [Librosa Documentation](https://librosa.org/)
- [Digital Signal Processing Basics](https://www.dspguide.com/)
- [Audio ML Papers](https://paperswithcode.com/area/audio)

Happy audio learning! 🎵🤖