"""
RAVDESS Emotion Recognition Tutorial
====================================

This tutorial teaches you how to build an emotion recognition system using
the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset.

Learning Objectives:
- Understand audio feature extraction (MFCC, Mel-spectrogram, etc.)
- Learn LSTM neural networks for sequence classification
- Apply machine learning to audio emotion recognition
- Visualize and interpret audio features

YOUR GOAL: improve the model's accuracy to 25%.


In [None]:

# ========================= STEP 1: Install and Import Libraries =========================
print("\n📚 STEP 1: Installing and Importing Required Libraries")
print("-" * 50)
print("We need several Python libraries for audio processing and machine learning:")
print("• kagglehub: Download datasets from Kaggle")
print("• librosa: Audio analysis and feature extraction")
print("• tensorflow: Deep learning framework")
print("• sklearn: Machine learning utilities")
print("• matplotlib/seaborn: Data visualization")

!pip install kagglehub librosa soundfile -q
!pip install tensorflow
import os
import numpy as np
import pandas as pd
import kagglehub
import librosa
import librosa.display
import soundfile as sf
import zipfile
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, BatchNormalization
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
import warnings
warnings.filterwarnings('ignore')

print("✅ All libraries imported successfully!")
print("\nNext: Let's download the RAVDESS dataset...")



📚 STEP 1: Installing and Importing Required Libraries
--------------------------------------------------
We need several Python libraries for audio processing and machine learning:
• kagglehub: Download datasets from Kaggle
• librosa: Audio analysis and feature extraction
• tensorflow: Deep learning framework
• sklearn: Machine learning utilities
• matplotlib/seaborn: Data visualization
✅ All libraries imported successfully!

Next: Let's download the RAVDESS dataset...


In [None]:

print("\n📖 Understanding RAVDESS filename format:")
print("Example: 03-01-06-01-02-01-12.wav")
print("Position 1: Modality (01=full-AV, 02=video-only, 03=audio-only)")
print("Position 2: Vocal channel (01=speech, 02=song)")
print("Position 3: Emotion (01=neutral, 02=calm, 03=happy, 04=sad, 05=angry, 06=fearful, 07=disgust, 08=surprised)")
print("Position 4: Emotional intensity (01=normal, 02=strong)")
print("Position 5: Statement (01='Kids are talking by the door', 02='Dogs are sitting by the door')")
print("Position 6: Repetition (01=1st repetition, 02=2nd repetition)")
print("Position 7: Actor (01-24, odd=male, even=female)")



📖 Understanding RAVDESS filename format:
Example: 03-01-06-01-02-01-12.wav
Position 1: Modality (01=full-AV, 02=video-only, 03=audio-only)
Position 2: Vocal channel (01=speech, 02=song)
Position 3: Emotion (01=neutral, 02=calm, 03=happy, 04=sad, 05=angry, 06=fearful, 07=disgust, 08=surprised)
Position 4: Emotional intensity (01=normal, 02=strong)
Position 5: Statement (01='Kids are talking by the door', 02='Dogs are sitting by the door')
Position 6: Repetition (01=1st repetition, 02=2nd repetition)
Position 7: Actor (01-24, odd=male, even=female)
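parse_filename() in Step 4 keeps only some of these fields and drops modality and vocal channel. The sketch below decodes all seven positions. This is handy for confirming that a folder holds audio-only speech files (prefix `03-01-`). The lookup tables are transcribed from the list above. The helper name `decode_ravdess_name` is hypothetical. Unlike the notebook, it keeps the dataset's original code 8 for "surprised" rather than remapping it to 0.

```python
import os

# Lookup tables transcribed from the RAVDESS filename description above
MODALITY = {1: 'full-AV', 2: 'video-only', 3: 'audio-only'}
VOCAL_CHANNEL = {1: 'speech', 2: 'song'}
EMOTION = {1: 'neutral', 2: 'calm', 3: 'happy', 4: 'sad',
           5: 'angry', 6: 'fearful', 7: 'disgust', 8: 'surprised'}
INTENSITY = {1: 'normal', 2: 'strong'}
STATEMENT = {1: 'Kids are talking by the door', 2: 'Dogs are sitting by the door'}


def decode_ravdess_name(filename):
    """Decode all seven dash-separated fields of a RAVDESS filename (hypothetical helper)."""
    stem, _ = os.path.splitext(os.path.basename(filename))
    codes = [int(part) for part in stem.split('-')]  # raises ValueError on non-numeric parts
    if len(codes) != 7:
        raise ValueError(f"expected 7 fields, got {len(codes)}: {filename}")
    modality, channel, emotion, intensity, statement, repetition, actor = codes
    return {
        'modality': MODALITY[modality],
        'vocal_channel': VOCAL_CHANNEL[channel],
        'emotion': EMOTION[emotion],
        'intensity': INTENSITY[intensity],
        'statement': STATEMENT[statement],
        'repetition': repetition,
        'actor': actor,
        'gender': 'male' if actor % 2 == 1 else 'female',  # odd = male, even = female
    }


print(decode_ravdess_name("03-01-06-01-02-01-12.wav"))
```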


In [None]:

# ========================= STEP 3: Configuration and Setup =========================
print("\n⚙️ STEP 3: Setting Up Configuration Parameters")
print("-" * 50)
print("Let's define our emotion labels and audio processing parameters.")

# Emotion mapping - this tells us what each number means
EMOTIONS = {
    1: 'neutral',     # No particular emotion
    2: 'calm',        # Peaceful, relaxed
    3: 'happy',       # Joyful, positive
    4: 'sad',         # Sorrowful, negative
    5: 'angry',       # Aggressive, hostile
    6: 'fearful',     # Scared, anxious
    7: 'disgust',     # Repulsed, disgusted
    0: 'surprised'    # Shocked, amazed (originally 8, changed to 0)
}


⚙️ STEP 3: Setting Up Configuration Parameters
--------------------------------------------------
Let's define our emotion labels and audio processing parameters.



🔧 STEP 4: Creating Functions to Parse Filenames (DEBUG VERSION)
--------------------------------------------------

🔍 Checking multiple possible dataset locations...

❌ Path not found: /kaggle/input/ravdess-emotional-speech-audio

❌ Path not found: /kaggle/input/ravdess-emotional-speech-audio/audio_speech_actors_01-24

❌ Path not found: /kaggle/input/ravdess

✅ Path exists: /kaggle/input/

🔍 Exploring directory: /kaggle/input/

📁 Contents of /kaggle/input/:

🎵 Searching for .wav files in all subdirectories...

❌ No .wav files found in /kaggle/input/

✅ Path exists: .

🔍 Exploring directory: .

📁 Contents of .:
  📁 .config/
  📁 sample_data/

🎵 Searching for .wav files in all subdirectories...

❌ No .wav files found in .

❌ Path not found: ./ravdess-emotional-speech-audio

❌ Path not found: ./audio_speech_actors_01-24

🔍 Let's check what's available in /kaggle/input/

Available datasets in /kaggle/input/:


💡 RECOMMENDATIONS:
❌ No RAVDESS audio files found!

Possible solutions:
1. Make 

In [None]:
# ========================= STEP 4: Filename Parsing Functions =========================
import os
import pandas as pd
print("\n🔧 STEP 4: Creating Functions to Parse Filenames")
print("-" * 50)
print("We need to extract emotion information from the filename structure.")

def parse_filename(filename):
    """
    Parse RAVDESS filename to extract metadata.

    Args:
        filename (str): The audio filename to parse

    Returns:
        dict: Dictionary containing parsed information, or None if invalid

    Example:
        Input: "03-01-06-01-02-01-12.wav"
        Output: {'emotion': 6, 'emotion_label': 'fearful', 'intensity': 1, ...}
    """
    # Remove .wav extension and split by dashes
    parts = filename.replace('.wav', '').split('-')

    # Check if we have exactly 7 parts
    if len(parts) != 7:
        print(f"⚠️ Invalid filename format: {filename}")
        return None

    try:
        # Extract emotion (3rd position)
        emotion = int(parts[2])

        # Convert surprise from 8 to 0 for easier processing
        if emotion == 8:
            emotion = 0

        # Create metadata dictionary
        metadata = {
            'emotion': emotion,
            'emotion_label': EMOTIONS.get(emotion, 'unknown'),
            'intensity': int(parts[3]),      # 1=normal, 2=strong
            'statement': int(parts[4]),      # Which sentence was spoken
            'repetition': int(parts[5]),     # 1st or 2nd repetition
            'actor': int(parts[6]),          # Actor ID (1-24)
            'gender': 'female' if int(parts[6]) % 2 == 0 else 'male'  # Even=female, odd=male
        }

        return metadata

    except (ValueError, IndexError) as e:
        print(f"⚠️ Error parsing filename {filename}: {e}")
        return None

# Test the function
test_filename = "03-01-06-01-02-01-12.wav"
test_result = parse_filename(test_filename)
print(f"\n🧪 Testing filename parser:")
print(f"Input: {test_filename}")
print(f"Output: {test_result}")

def load_audio_files(data_path):
    """
    Load all audio files and extract their metadata.

    Args:
        data_path (str): Path to the dataset

    Returns:
        pd.DataFrame: DataFrame containing file information
    """
    print("🔍 Searching for audio files...")

    # Find the audio directory
    audio_path = os.path.join(data_path, 'audio_speech_actors_01-24')
    if not os.path.exists(audio_path):
        # Search for the correct directory
        for root, dirs, files in os.walk(data_path):
            if 'Actor_01' in dirs or any('.wav' in f for f in files):
                audio_path = root
                break

    print(f"📁 Audio files located in: {audio_path}")

    file_data = []

    # Walk through all directories and files
    for root, dirs, files in os.walk(audio_path):
        for filename in files:
            if filename.endswith('.wav'):
                file_path = os.path.join(root, filename)
                metadata = parse_filename(filename)

                # Only keep files with valid emotions
                if metadata and metadata['emotion'] in EMOTIONS:
                    metadata['file_path'] = file_path
                    metadata['filename'] = filename
                    file_data.append(metadata)

    # Create DataFrame
    df = pd.DataFrame(file_data)
    print(f"✅ Found {len(df)} valid audio files")

    if len(df) > 0:
        print("\n📈 Data distribution:")
        print("Emotions:")
        print(df['emotion_label'].value_counts())
        print(f"\nGender distribution:")
        print(df['gender'].value_counts())
        print(f"\nIntensity distribution:")
        print(df['intensity'].value_counts())

    return df

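# Define the data path: use the Kaggle mount if present, otherwise download via kagglehub
data_path = '/kaggle/input/ravdess-emotional-speech-audio'
if not os.path.exists(data_path):
    import kagglehub  # installed in Step 1
    # Assumed Kaggle dataset handle; dataset_download() returns the local path of the files
    data_path = kagglehub.dataset_download("uwrfkaggler/ravdess-emotional-speech-audio")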

# Load file information
print("\n🔄 Loading audio file information...")
file_df = load_audio_files(data_path)

if len(file_df) == 0:
    raise ValueError("❌ No valid audio files found. Please check the data path.")

print(f"✅ Successfully loaded information for {len(file_df)} audio files")


🔧 STEP 4: Creating Functions to Parse Filenames
--------------------------------------------------
We need to extract emotion information from the filename structure.

🧪 Testing filename parser:
Input: 03-01-06-01-02-01-12.wav
Output: {'emotion': 6, 'emotion_label': 'fearful', 'intensity': 1, 'statement': 2, 'repetition': 1, 'actor': 12, 'gender': 'female'}

🔄 Loading audio file information...
🔍 Searching for audio files...
📁 Audio files located in: /kaggle/input/ravdess-emotional-speech-audio/audio_speech_actors_01-24
✅ Found 0 valid audio files


ValueError: ❌ No valid audio files found. Please check the data path.
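The error above means `load_audio_files()` found no `.wav` files under the hard-coded Kaggle path. That path does not exist outside a Kaggle kernel; the earlier diagnostic output shows a Colab-style working directory instead. One way to make the lookup more robust is to search several candidate roots for the directory that actually holds the `Actor_XX` folders.

The sketch below uses only the standard library. The helper name, the regular expression, and the candidate list are assumptions. On Colab, a candidate root could be the local path returned by `kagglehub.dataset_download(...)` for the RAVDESS dataset.

```python
import os
import re

# Matches RAVDESS-style names such as 03-01-06-01-02-01-12.wav (seven two-digit fields)
RAVDESS_NAME = re.compile(r'^\d{2}(-\d{2}){6}\.wav$')


def find_ravdess_audio_dir(candidate_roots):
    """Return the directory that holds the RAVDESS audio, or None if nothing matches.

    Hypothetical helper: walks each existing root and, at the first folder containing a
    RAVDESS-named .wav file, returns that folder's parent if it is an Actor_XX folder
    (i.e. the directory holding Actor_01 ... Actor_24), otherwise the folder itself.
    """
    for root in candidate_roots:
        if not root or not os.path.isdir(root):
            continue  # skip missing paths such as /kaggle/input on Colab
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # visit subfolders in a deterministic order
            if any(RAVDESS_NAME.match(name) for name in filenames):
                if os.path.basename(dirpath).startswith('Actor_'):
                    return os.path.dirname(dirpath)
                return dirpath
    return None


print(find_ravdess_audio_dir(['/kaggle/input/ravdess-emotional-speech-audio', '.']))
```

In the notebook, the returned directory could be passed to `load_audio_files()` in place of the hard-coded `data_path`.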

In [None]:

# Audio processing parameters
SAMPLE_RATE = 22050    # How many samples per second (Hz)
DURATION = 3.0         # Length of audio to analyze (seconds)
OFFSET = 0.5          # Skip first 0.5 seconds (remove silence)

# Feature extraction parameters
N_MELS = 128          # Number of Mel frequency bands
N_MFCC = 13           # Number of MFCC coefficients

# Model training parameters
BATCH_SIZE = 32       # Number of samples per training batch
EPOCHS = 50           # Maximum number of training epochs
VALIDATION_SPLIT = 0.15  # 15% for validation (matches the 70/15/15 split in Step 11)
TEST_SPLIT = 0.15        # 15% for testing (matches the 70/15/15 split in Step 11)

print(f"📊 Emotion categories: {list(EMOTIONS.values())}")
print(f"🎵 Audio parameters: Sample rate={SAMPLE_RATE}Hz, Duration={DURATION}s")
print(f"🧠 Training parameters: Batch size={BATCH_SIZE}, Max epochs={EPOCHS}")

📊 Emotion categories: ['neutral', 'calm', 'happy', 'sad', 'angry', 'fearful', 'disgust', 'surprised']
🎵 Audio parameters: Sample rate=22050Hz, Duration=3.0s
🧠 Training parameters: Batch size=32, Max epochs=50
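These parameters fix the shape of every feature matrix. Assume librosa's defaults for the STFT-based features: `hop_length=512` and `center=True`. A 3.0 s clip at 22,050 Hz then has 66,150 samples and yields 1 + ⌊66150 / 512⌋ = 130 frames. Each Mel-spectrogram is therefore 128 × 130, and each MFCC matrix is 13 × 130. The arithmetic as a quick check (the helper name `n_frames` is hypothetical):

```python
def n_frames(sr, duration, hop_length=512):
    """Number of STFT frames for a clip of `duration` seconds with center=True padding.

    hop_length=512 is assumed to match librosa's default for mfcc/melspectrogram/chroma_stft.
    """
    n_samples = int(sr * duration)
    return 1 + n_samples // hop_length


sr, duration, n_mels, n_mfcc = 22050, 3.0, 128, 13  # values from the configuration cell
frames = n_frames(sr, duration)
print(int(sr * duration))                    # 66150 samples
print(frames)                                # 130 frames
print((n_mels, frames), (n_mfcc, frames))    # (128, 130) (13, 130)
print(round(frames * 512 / sr, 2))           # 3.02 seconds covered by the frame grid
```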


In [None]:

# ========================= STEP 5: Audio Feature Extraction =========================
print("\n🎵 STEP 5: Understanding Audio Feature Extraction")
print("-" * 50)
print("Audio signals are complex. We need to extract meaningful features that")
print("capture the characteristics that distinguish different emotions.")
print("\nKey audio features we'll extract:")
print("• MFCC: Captures the shape of the spectral envelope")
print("• Mel-spectrogram: Time-frequency representation of audio")
print("• Chroma: Represents the 12 different pitch classes")
print("• Spectral contrast: Measures the difference in amplitude between spectral peaks and valleys in each frequency band")
print("• Zero crossing rate: How often the signal crosses zero")

def extract_audio_features(file_path, sr=SAMPLE_RATE, duration=DURATION, offset=OFFSET, verbose=False):
    """
    Extract comprehensive audio features for emotion recognition.

    Args:
        file_path (str): Path to the audio file
        sr (int): Sample rate
        duration (float): Duration to load (seconds)
        offset (float): Offset from start (seconds)
        verbose (bool): Whether to show detailed output

    Returns:
        tuple: (features_dict, raw_audio) or (None, None) if error
    """
    try:
        # Step 1: Load audio file
        if verbose:
            print(f"🎵 Loading audio: {os.path.basename(file_path)}")
        audio, _ = librosa.load(file_path, sr=sr, duration=duration, offset=offset)

        # Step 2: Ensure consistent length
        target_length = int(sr * duration)
        if len(audio) < target_length:
            # Pad with zeros if too short
            audio = np.pad(audio, (0, target_length - len(audio)), mode='constant')
        else:
            # Truncate if too long
            audio = audio[:target_length]

        if verbose:
            print(f"   Audio length: {len(audio)} samples ({len(audio)/sr:.2f} seconds)")

        # Step 3: Extract features
        features = {}

        if verbose:
            print("   Extracting MFCC features...")
        # MFCC: Mel-frequency cepstral coefficients
        # These capture the shape of the spectral envelope
        mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=N_MFCC)
        features['mfcc'] = mfccs
        if verbose:
            print(f"     MFCC shape: {mfccs.shape}")

        if verbose:
            print("   Extracting Mel-spectrogram...")
        # Mel-spectrogram: Time-frequency representation
        mel_spec = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=N_MELS)
        mel_spec_db = librosa.power_to_db(mel_spec, ref=np.max)
        features['mel_spectrogram'] = mel_spec_db
        if verbose:
            print(f"     Mel-spectrogram shape: {mel_spec_db.shape}")

        if verbose:
            print("   Extracting Chroma features...")
        # Chroma: Represents the 12 different pitch classes
        chroma = librosa.feature.chroma_stft(y=audio, sr=sr)
        features['chroma'] = chroma
        if verbose:
            print(f"     Chroma shape: {chroma.shape}")

        if verbose:
            print("   Extracting Spectral contrast...")
        # Spectral contrast: Difference between peaks and valleys in spectrum
        contrast = librosa.feature.spectral_contrast(y=audio, sr=sr)
        features['spectral_contrast'] = contrast
        if verbose:
            print(f"     Spectral contrast shape: {contrast.shape}")

        if verbose:
            print("   Extracting Zero crossing rate...")
        # Zero crossing rate: How often signal crosses zero
        zcr = librosa.feature.zero_crossing_rate(audio)
        features['zcr'] = zcr
        if verbose:
            print(f"     ZCR shape: {zcr.shape}")

        return features, audio

    except Exception as e:
        print(f"❌ Error processing {file_path}: {e}")
        return None, None

def prepare_lstm_features(features):
    """
    Prepare features for LSTM input.
    LSTM needs input in format: (time_steps, features)

    Args:
        features (dict): Dictionary of extracted features

    Returns:
        np.array: Features formatted for LSTM
    """
    # Use Mel-spectrogram as our main feature
    mel_spec = features['mel_spectrogram']  # Shape: (n_mels, time_frames)

    # Transpose to get (time_frames, n_mels) for LSTM
    mel_spec = mel_spec.T

    return mel_spec

# Test feature extraction on one file
print("\n🧪 Testing feature extraction on a sample file...")
test_file = file_df.iloc[0]
print(f"Test file: {test_file['filename']} (Emotion: {test_file['emotion_label']})")

test_features, test_audio = extract_audio_features(test_file['file_path'], verbose=True)
if test_features:
    test_lstm_features = prepare_lstm_features(test_features)
    print(f"✅ LSTM features shape: {test_lstm_features.shape}")
    print("   Format: (time_frames, mel_frequency_bands)")
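prepare_lstm_features() feeds the LSTM only the Mel-spectrogram and ignores the other four extracted features. One possible variation stacks MFCC, chroma, spectral contrast, and ZCR into a single per-frame vector. It assumes all features share the same frame grid, which holds with librosa's default hop length.

The per-frame dimension breaks down as follows:
- MFCC: 13 coefficients (N_MFCC=13)
- Chroma: 12 pitch classes
- Spectral contrast: 7 rows (librosa's default six bands plus one)
- ZCR: 1 row

That gives 33 values per frame. The helper name is hypothetical, and whether this raises accuracy on RAVDESS is not established here. Because these features live on very different scales, standardizing each column with training-set statistics is usually advisable.

```python
import numpy as np


def prepare_stacked_features(features, keys=('mfcc', 'chroma', 'spectral_contrast', 'zcr')):
    """Stack several (n_rows, time_frames) matrices into one (time_frames, total_rows) array.

    Hypothetical alternative to prepare_lstm_features(); assumes every matrix listed
    in `keys` has the same number of time frames.
    """
    matrices = [np.asarray(features[k]) for k in keys]
    frame_counts = {m.shape[1] for m in matrices}
    if len(frame_counts) != 1:
        raise ValueError(f"features have different frame counts: {sorted(frame_counts)}")
    stacked = np.vstack(matrices)   # (total_rows, time_frames)
    return stacked.T                # (time_frames, total_rows), the layout the LSTM expects


# Example with dummy arrays shaped like the librosa outputs for a 3 s clip
dummy = {
    'mfcc': np.zeros((13, 130)),
    'chroma': np.zeros((12, 130)),
    'spectral_contrast': np.zeros((7, 130)),
    'zcr': np.zeros((1, 130)),
}
print(prepare_stacked_features(dummy).shape)  # (130, 33)
```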

In [None]:

# ========================= STEP 6: Data Augmentation Functions =========================
print("\n🔄 STEP 6: Data Augmentation Techniques")
print("-" * 50)
print("Data augmentation helps improve model performance by creating variations")
print("of our training data. This helps the model generalize better.")

def add_noise(audio, noise_factor=0.005):
    """
    Add Gaussian white noise to audio signal.
    This simulates real-world recording conditions.

    Args:
        audio (np.array): Original audio signal
        noise_factor (float): Standard deviation of the Gaussian noise, in waveform amplitude units (not a percentage)

    Returns:
        np.array: Audio with added noise
    """
    noise = np.random.normal(0, noise_factor, len(audio))
    return audio + noise

def shift_audio(audio, shift_max=0.2):
    """
    Shift audio in time (circular shift).
    This simulates different timing in speech.

    Args:
        audio (np.array): Original audio signal
        shift_max (float): Maximum shift as fraction of audio length

    Returns:
        np.array: Time-shifted audio
    """
    shift = np.random.randint(-int(len(audio) * shift_max), int(len(audio) * shift_max))
    return np.roll(audio, shift)

# Demonstrate data augmentation
print("\n🧪 Demonstrating data augmentation:")
original_audio = test_audio
noisy_audio = add_noise(original_audio)
shifted_audio = shift_audio(original_audio)

print(f"Original audio range: [{original_audio.min():.4f}, {original_audio.max():.4f}]")
print(f"Noisy audio range: [{noisy_audio.min():.4f}, {noisy_audio.max():.4f}]")
print(f"Shifted audio range: [{shifted_audio.min():.4f}, {shifted_audio.max():.4f}]")
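The two augmentation functions above can be combined into a per-sample random choice, so that each augmented copy receives exactly one transformation. The sketch below picks among three options:
- additive noise
- circular time shift
- a random gain change

The gain option and its ±6 dB range are assumptions added for illustration. `add_noise` and `shift_audio` are re-declared so the block runs on its own. They gain an optional `rng` argument (a `np.random.Generator`, which makes the choice reproducible) but stay call-compatible with the versions above.

```python
import numpy as np


def add_noise(audio, noise_factor=0.005, rng=None):
    """Add zero-mean Gaussian noise with standard deviation noise_factor."""
    rng = np.random.default_rng() if rng is None else rng
    return audio + rng.normal(0.0, noise_factor, len(audio))


def shift_audio(audio, shift_max=0.2, rng=None):
    """Circularly shift the signal by up to shift_max * len(audio) samples in either direction."""
    rng = np.random.default_rng() if rng is None else rng
    limit = int(len(audio) * shift_max)
    return np.roll(audio, rng.integers(-limit, limit + 1))


def change_gain(audio, low_db=-6.0, high_db=6.0, rng=None):
    """Scale the signal by a random gain drawn uniformly in decibels (assumed range)."""
    rng = np.random.default_rng() if rng is None else rng
    return audio * 10.0 ** (rng.uniform(low_db, high_db) / 20.0)


AUGMENTATIONS = {'noise': add_noise, 'shift': shift_audio, 'gain': change_gain}


def augment_once(audio, rng):
    """Apply exactly one randomly chosen augmentation; return (name, augmented_audio)."""
    names = list(AUGMENTATIONS)
    name = names[rng.integers(len(names))]
    return name, AUGMENTATIONS[name](audio, rng=rng)


rng = np.random.default_rng(0)
name, augmented = augment_once(np.sin(np.linspace(0, 100, 66150)), rng)
print(name, augmented.shape)
```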


In [None]:
# ========================= STEP 7: Feature Visualization Functions =========================
print("\n📊 STEP 7: Creating Feature Visualization Functions")
print("-" * 50)
print("Visualization helps us understand what our features look like and")
print("how different emotions appear in the feature space.")

def visualize_audio_features(features, audio, emotion_label, filename, sample_rate=SAMPLE_RATE):
    """
    Create comprehensive visualization of audio features.

    Args:
        features (dict): Extracted audio features
        audio (np.array): Raw audio signal
        emotion_label (str): Emotion name
        filename (str): Audio filename
        sample_rate (int): Audio sample rate
    """
    fig, axes = plt.subplots(2, 3, figsize=(18, 10))
    fig.suptitle(f'Audio Feature Visualization - {emotion_label.upper()} ({filename})', fontsize=16)

    # 1. Raw audio waveform
    time_axis = np.linspace(0, len(audio)/sample_rate, len(audio))
    axes[0, 0].plot(time_axis, audio, color='blue', alpha=0.7)
    axes[0, 0].set_title('Raw Audio Waveform')
    axes[0, 0].set_xlabel('Time (seconds)')
    axes[0, 0].set_ylabel('Amplitude')
    axes[0, 0].grid(True, alpha=0.3)

    # 2. MFCC features
    mfcc = features['mfcc']
    img1 = axes[0, 1].imshow(mfcc, aspect='auto', origin='lower', cmap='viridis')
    axes[0, 1].set_title(f'MFCC Features ({mfcc.shape[0]} x {mfcc.shape[1]})')
    axes[0, 1].set_xlabel('Time Frames')
    axes[0, 1].set_ylabel('MFCC Coefficients')
    plt.colorbar(img1, ax=axes[0, 1])

    # 3. Mel-spectrogram
    mel_spec = features['mel_spectrogram']
    img2 = axes[0, 2].imshow(mel_spec, aspect='auto', origin='lower', cmap='magma')
    axes[0, 2].set_title(f'Mel-Spectrogram ({mel_spec.shape[0]} x {mel_spec.shape[1]})')
    axes[0, 2].set_xlabel('Time Frames')
    axes[0, 2].set_ylabel('Mel Frequency Bands')
    plt.colorbar(img2, ax=axes[0, 2])

    # 4. Chroma features
    chroma = features['chroma']
    img3 = axes[1, 0].imshow(chroma, aspect='auto', origin='lower', cmap='coolwarm')
    axes[1, 0].set_title(f'Chroma Features ({chroma.shape[0]} x {chroma.shape[1]})')
    axes[1, 0].set_xlabel('Time Frames')
    axes[1, 0].set_ylabel('Pitch Classes')
    plt.colorbar(img3, ax=axes[1, 0])

    # 5. Spectral contrast
    contrast = features['spectral_contrast']
    img4 = axes[1, 1].imshow(contrast, aspect='auto', origin='lower', cmap='plasma')
    axes[1, 1].set_title(f'Spectral Contrast ({contrast.shape[0]} x {contrast.shape[1]})')
    axes[1, 1].set_xlabel('Time Frames')
    axes[1, 1].set_ylabel('Frequency Bands')
    plt.colorbar(img4, ax=axes[1, 1])

    # 6. Zero crossing rate
    zcr = features['zcr']
    axes[1, 2].plot(zcr[0], color='red', linewidth=2)
    axes[1, 2].set_title(f'Zero Crossing Rate (Length: {zcr.shape[1]})')
    axes[1, 2].set_xlabel('Time Frames')
    axes[1, 2].set_ylabel('ZCR')
    axes[1, 2].grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

    # Print feature statistics
    print(f"\n📊 Feature Statistics for {emotion_label.upper()}:")
    print(f"   MFCC: mean={np.mean(mfcc):.3f}, std={np.std(mfcc):.3f}")
    print(f"   Mel-spectrogram: mean={np.mean(mel_spec):.3f}, std={np.std(mel_spec):.3f}")
    print(f"   Chroma: mean={np.mean(chroma):.3f}, std={np.std(chroma):.3f}")
    print(f"   Spectral contrast: mean={np.mean(contrast):.3f}, std={np.std(contrast):.3f}")
    print(f"   Zero crossing rate: mean={np.mean(zcr):.6f}, std={np.std(zcr):.6f}")
    print("-" * 60)

def compare_emotions_features(all_features_samples, emotion_labels, sample_indices):
    """
    Compare features across different emotions.

    Args:
        all_features_samples (list): List of feature dictionaries
        emotion_labels (list): List of emotion names
        sample_indices (list): List of sample indices
    """
    n_emotions = len(sample_indices)
    fig, axes = plt.subplots(n_emotions, 3, figsize=(15, 4*n_emotions))

    if n_emotions == 1:
        axes = axes.reshape(1, -1)

    fig.suptitle('Feature Comparison Across Emotions', fontsize=16)

    for i, (features, emotion, idx) in enumerate(zip(all_features_samples, emotion_labels, sample_indices)):
        # MFCC comparison
        img1 = axes[i, 0].imshow(features['mfcc'], aspect='auto', origin='lower', cmap='viridis')
        axes[i, 0].set_title(f'{emotion.upper()} - MFCC')
        axes[i, 0].set_ylabel('MFCC Coefficients')
        if i == n_emotions - 1:
            axes[i, 0].set_xlabel('Time Frames')

        # Mel-spectrogram comparison
        img2 = axes[i, 1].imshow(features['mel_spectrogram'], aspect='auto', origin='lower', cmap='magma')
        axes[i, 1].set_title(f'{emotion.upper()} - Mel-Spectrogram')
        axes[i, 1].set_ylabel('Mel Frequency Bands')
        if i == n_emotions - 1:
            axes[i, 1].set_xlabel('Time Frames')

        # Chroma comparison
        img3 = axes[i, 2].imshow(features['chroma'], aspect='auto', origin='lower', cmap='coolwarm')
        axes[i, 2].set_title(f'{emotion.upper()} - Chroma')
        axes[i, 2].set_ylabel('Pitch Classes')
        if i == n_emotions - 1:
            axes[i, 2].set_xlabel('Time Frames')

    plt.tight_layout()
    plt.show()

In [None]:

# ========================= STEP 8: Batch Processing Audio Files =========================
print("\n⚙️ STEP 8: Processing All Audio Files")
print("-" * 50)
print("Now we'll process all audio files and extract features.")
print("This may take a few minutes depending on the dataset size.")

X_features = []            # LSTM-ready features
X_raw_audio = []           # Raw audio for visualization
y_labels = []              # Emotion labels
file_info = []             # File metadata
all_features_for_viz = []  # Complete feature dictionaries for visualization

# For visualization samples
viz_samples = {}           # {emotion_label: [indices]}
samples_per_emotion = 2    # Show 2 samples per emotion

def features_from_audio(audio, sr=SAMPLE_RATE):
    """Extract the same feature set as extract_audio_features() from an in-memory signal."""
    mel_spec = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=N_MELS)
    return {
        'mfcc': librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=N_MFCC),
        'mel_spectrogram': librosa.power_to_db(mel_spec, ref=np.max),
        'chroma': librosa.feature.chroma_stft(y=audio, sr=sr),
        'spectral_contrast': librosa.feature.spectral_contrast(y=audio, sr=sr),
        'zcr': librosa.feature.zero_crossing_rate(audio),
    }

def store_sample(features, audio, row):
    """Append one sample (LSTM features, raw audio, label, metadata) to the lists above."""
    X_features.append(prepare_lstm_features(features))
    X_raw_audio.append(audio)
    y_labels.append(row['emotion'])
    file_info.append(row)
    all_features_for_viz.append(features)

print(f"🔄 Processing {len(file_df)} audio files...")

processed_count = 0
augmented_count = 0

for idx, row in file_df.iterrows():
    # Extract features from the audio file (without verbose output)
    features, audio = extract_audio_features(row['file_path'], verbose=False)

    if features is not None:
        store_sample(features, audio, row)
        processed_count += 1

        # Collect samples for visualization
        emotion_label = row['emotion_label']
        viz_samples.setdefault(emotion_label, [])
        if len(viz_samples[emotion_label]) < samples_per_emotion:
            viz_samples[emotion_label].append(len(X_features) - 1)

        # Data augmentation (exercise: choose one of the four augmentation methods;
        # this version uses additive noise), applied with a 50% chance per file.
        # Caution: augmenting before the train/test split lets a noisy copy of a test
        # file land in the training set, which inflates the measured test accuracy.
        if np.random.random() > 0.5:
            try:
                noisy_audio = add_noise(audio)
                store_sample(features_from_audio(noisy_audio), noisy_audio, row)
                augmented_count += 1
            except Exception as e:
                print(f"   ⚠️ Error in augmentation for file {idx}: {e}")

    # Progress update every 50 files or at the end
    if (idx + 1) % 50 == 0 or idx == len(file_df) - 1:
        print(f"   Progress: {idx + 1}/{len(file_df)} files processed")

print(f"✅ Feature extraction complete!")
print(f"   Original samples processed: {processed_count}")
print(f"   Augmented samples created: {augmented_count}")
print(f"   Total samples: {len(X_features)}")

In [None]:

# ========================= STEP 9: Feature Visualization Examples =========================
print("\n🎨 STEP 9: Visualizing Extracted Features")
print("-" * 50)
print("Let's visualize the features we extracted to understand what they represent.")

# Show detailed features for first sample of each emotion
for emotion_label, indices in viz_samples.items():
    if indices:  # If we have samples for this emotion
        idx = indices[0]  # Take the first sample
        row_info = file_info[idx]
        features = all_features_for_viz[idx]
        audio = X_raw_audio[idx]

        print(f"\n📈 Showing feature visualization for '{emotion_label.upper()}' emotion:")
        print(f"   File: {row_info['filename']}")
        print(f"   Actor: {row_info['actor']} ({row_info['gender']})")
        print(f"   Intensity: {'Normal' if row_info['intensity'] == 1 else 'Strong'}")

        visualize_audio_features(features, audio, emotion_label, row_info['filename'])

# Compare features across different emotions
print("\n🔍 Comparing Features Across Different Emotions:")
print("This helps us see how different emotions create different patterns.")

comparison_features = []
comparison_labels = []
comparison_indices = []

# Select up to 4 emotions for comparison
for emotion_label, indices in list(viz_samples.items())[:4]:
    if indices:
        idx = indices[0]
        comparison_features.append(all_features_for_viz[idx])
        comparison_labels.append(emotion_label)
        comparison_indices.append(idx)

if comparison_features:
    compare_emotions_features(comparison_features, comparison_labels, comparison_indices)

# Show feature dimension statistics
print(f"\n📊 Feature Dimension Summary:")
if all_features_for_viz:
    sample_features = all_features_for_viz[0]
    print(f"   MFCC: {sample_features['mfcc'].shape} (coefficients x time_frames)")
    print(f"   Mel-spectrogram: {sample_features['mel_spectrogram'].shape} (mel_bands x time_frames)")
    print(f"   Chroma: {sample_features['chroma'].shape} (pitch_classes x time_frames)")
    print(f"   Spectral contrast: {sample_features['spectral_contrast'].shape} (bands x time_frames)")
    print(f"   Zero crossing rate: {sample_features['zcr'].shape} (1 x time_frames)")

# Emotion distribution visualization
print("\n📈 Dataset Distribution Analysis (counts include augmented copies):")
plt.figure(figsize=(15, 5))

# Pie chart of emotion distribution
plt.subplot(1, 3, 1)
emotion_counts = pd.Series([file_info[i]['emotion_label'] for i in range(len(file_info))]).value_counts()
colors = plt.cm.Set3(np.linspace(0, 1, len(emotion_counts)))
plt.pie(emotion_counts.values, labels=emotion_counts.index, autopct='%1.1f%%', colors=colors)
plt.title('Emotion Distribution')

# Bar chart of emotion counts
plt.subplot(1, 3, 2)
plt.bar(emotion_counts.index, emotion_counts.values, color=colors)
plt.title('Sample Count per Emotion')
plt.xlabel('Emotion Category')
plt.ylabel('Number of Samples')
plt.xticks(rotation=45)

# Gender distribution
plt.subplot(1, 3, 3)
gender_counts = pd.Series([file_info[i]['gender'] for i in range(len(file_info))]).value_counts()
plt.bar(gender_counts.index, gender_counts.values, color=['lightblue', 'lightpink'])
plt.title('Gender Distribution')
plt.xlabel('Gender')
plt.ylabel('Number of Samples')

plt.tight_layout()
plt.show()

In [None]:

# ========================= STEP 10: Preparing Data for LSTM =========================
print("\n🔧 STEP 10: Preparing Data for LSTM Neural Network")
print("-" * 50)
print("LSTM (Long Short-Term Memory) networks work with sequences.")
print("We need to format our data properly for the LSTM to understand it.")

# Find the maximum sequence length
max_length = max([x.shape[0] for x in X_features])
print(f"📏 Maximum sequence length found: {max_length} time frames")
# Each frame advances librosa's default hop length of 512 samples
print(f"   This represents about {max_length * 512 / SAMPLE_RATE:.2f} seconds of audio")

def pad_sequence(seq, max_len):
    """
    Pad sequences to have the same length.
    Shorter sequences are padded with zeros.
    Longer sequences are truncated.

    Args:
        seq (np.array): Input sequence
        max_len (int): Target length

    Returns:
        np.array: Padded/truncated sequence
    """
    if seq.shape[0] < max_len:
        # Pad with zeros if too short
        pad_width = max_len - seq.shape[0]
        return np.pad(seq, ((0, pad_width), (0, 0)), mode='constant')
    else:
        # Truncate if too long
        return seq[:max_len]

print("⚙️ Padding all sequences to uniform length...")
# Pad all sequences to the same length
X_padded = np.array([pad_sequence(x, max_length) for x in X_features])
y_array = np.array(y_labels)

print(f"✅ Feature matrix shape: {X_padded.shape}")
print(f"   Format: (samples, time_steps, features)")
print(f"   - {X_padded.shape[0]} samples")
print(f"   - {X_padded.shape[1]} time steps")
print(f"   - {X_padded.shape[2]} features (mel frequency bands)")
print(f"📊 Label array shape: {y_array.shape}")
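Step 11 below splits samples at random. Two things make that optimistic. Step 8's augmentation creates noisy copies of files before the split, and each actor records many clips. A random split can therefore place near-duplicates, or the same voice, in both training and test sets, which tends to overstate test accuracy.

A stricter alternative is a speaker-independent split that holds out whole actors. The sketch builds boolean masks from an array of actor IDs; in the notebook these could come from `[row['actor'] for row in file_info]`. The helper name and the held-out actor IDs are arbitrary choices for illustration.

```python
import numpy as np


def actor_split_masks(actors, val_actors, test_actors):
    """Return boolean masks (train, val, test) that keep each actor in exactly one split.

    Hypothetical helper: `actors` is a 1-D array with one actor ID per sample.
    """
    actors = np.asarray(actors)
    if set(val_actors) & set(test_actors):
        raise ValueError("an actor cannot be in both the validation and test sets")
    val_mask = np.isin(actors, list(val_actors))
    test_mask = np.isin(actors, list(test_actors))
    train_mask = ~(val_mask | test_mask)
    return train_mask, val_mask, test_mask


# Example: 24 actors with 3 clips each; each held-out split gets one male (odd ID)
# and one female (even ID) actor
actors = np.repeat(np.arange(1, 25), 3)
train_m, val_m, test_m = actor_split_masks(actors, val_actors=[21, 22], test_actors=[23, 24])
print(train_m.sum(), val_m.sum(), test_m.sum())  # 60 6 6
```

With the real arrays, `X_padded[train_m]` and `y_array[train_m]` would give the training set, and likewise for the other masks.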



In [None]:
# ========================= STEP 11: Dataset Splitting =========================
print("\n📂 STEP 11: Splitting Dataset into Train/Validation/Test Sets")
print("-" * 50)
print("We split our data into three parts:")
print("• Training set (70%): Used to train the model")
print("• Validation set (15%): Used to tune the model during training")
print("• Test set (15%): Used to evaluate final model performance")

# First split: separate test set (15%)
X_train_val, X_test, y_train_val, y_test = train_test_split(
    X_padded, y_array,
    test_size=0.15,
    random_state=42,
    stratify=y_array  # Ensure balanced distribution
)

# Second split: separate training and validation (from remaining 85%)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val,
    test_size=0.176,  # 0.176 * 0.85 ≈ 0.15 of total data
    random_state=42,
    stratify=y_train_val
)

print(f"📊 Dataset split complete:")
print(f"   Training set: {X_train.shape[0]} samples ({X_train.shape[0]/len(X_padded)*100:.1f}%)")
print(f"   Validation set: {X_val.shape[0]} samples ({X_val.shape[0]/len(X_padded)*100:.1f}%)")
print(f"   Test set: {X_test.shape[0]} samples ({X_test.shape[0]/len(X_padded)*100:.1f}%)")

# Convert labels to one-hot encoding
print("\n🔢 Converting labels to one-hot encoding...")
print("One-hot encoding converts labels like 'happy' to vectors like [0,0,1,0,0,0,0,0]")

num_classes = len(EMOTIONS)
y_train_cat = to_categorical(y_train, num_classes)
y_val_cat = to_categorical(y_val, num_classes)
y_test_cat = to_categorical(y_test, num_classes)

print(f"✅ One-hot encoding complete:")
print(f"   Number of emotion classes: {num_classes}")
print(f"   Original label example: {y_train[0]} ({EMOTIONS[y_train[0]]})")
print(f"   One-hot label example: {y_train_cat[0]}")

# Check class distribution in each set
print(f"\n📈 Class distribution verification:")
for set_name, y_set in [("Training", y_train), ("Validation", y_val), ("Test", y_test)]:
    unique, counts = np.unique(y_set, return_counts=True)
    print(f"   {set_name} set:")
    for emotion_id, count in zip(unique, counts):
        emotion_name = EMOTIONS[emotion_id]
        percentage = count / len(y_set) * 100
        print(f"     {emotion_name}: {count} samples ({percentage:.1f}%)")
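The 0.176 in the second split is chosen so that validation ends up at roughly 15% of the whole dataset: 0.85 × 0.176 ≈ 0.15. The sketch below does that arithmetic for a hypothetical dataset size, assuming each float `test_size` is rounded up to a whole number of samples (scikit-learn rounds a float `test_size` up). The helper name `three_way_split_sizes` is illustrative, not part of the notebook.

```python
import math

def three_way_split_sizes(n_samples, test_frac=0.15, val_frac_of_rest=0.176):
    """Return (n_train, n_val, n_test) for the two-stage split used above.

    Assumes each float test_size is rounded up to a whole sample count.
    """
    n_test = math.ceil(test_frac * n_samples)        # first split: hold out the test set
    n_rest = n_samples - n_test                      # what remains for train + validation
    n_val = math.ceil(val_frac_of_rest * n_rest)     # second split: carve out validation
    n_train = n_rest - n_val
    return n_train, n_val, n_test

n = 1440  # hypothetical dataset size, for illustration only
n_train, n_val, n_test = three_way_split_sizes(n)
for name, size in [("train", n_train), ("val", n_val), ("test", n_test)]:
    print(f"{name:5}: {size:4d} samples ({size / n:.1%})")
```

Writing `test_size=0.15 / 0.85` in the second split would state the intent directly instead of relying on the rounded constant.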

In [None]:

# ========================= STEP 12: Building the LSTM Model =========================
print("\n🧠 STEP 12: Building the LSTM Neural Network")
print("-" * 50)
print("LSTM (Long Short-Term Memory) networks are well suited to sequence data such as audio.")
print("They can carry information across time steps, which helps capture how emotion unfolds over an utterance.")
print("\nOur model architecture:")
print("1. LSTM Layer 1: 128 units, returns the full sequence")
print("2. Batch Normalization: stabilizes training")
print("3. Dropout: reduces overfitting (30%)")
print("4. LSTM Layer 2: 64 units, returns only the final output")
print("5. Batch Normalization: stabilizes training")
print("6. Dropout: reduces overfitting (30%)")
print("7. Dense Layer 1: 64 neurons, ReLU activation")
print("8. Batch Normalization: stabilizes training")
print("9. Dropout: reduces overfitting (50%)")
print("10. Dense Layer 2: 32 neurons, ReLU activation")
print("11. Dropout: reduces overfitting (30%)")
print(f"12. Output Layer: {num_classes} neurons (one per emotion), Softmax activation")

def create_emotion_model(input_shape, num_classes):
    """
    Create an LSTM model for emotion recognition.

    Args:
        input_shape (tuple): Shape of input data (time_steps, features)
        num_classes (int): Number of emotion classes

    Returns:
        tensorflow.keras.Model: Uncompiled LSTM model (compile it before training)
    """
    print(f"🏗️ Building model with input shape: {input_shape}")
    print(f"   Output classes: {num_classes}")

    model = Sequential([
        # First LSTM layer - processes sequences and passes to next layer
        LSTM(128,
             return_sequences=True,  # Return full sequence for next LSTM layer
             input_shape=input_shape,
             name='lstm_1'),
        BatchNormalization(name='batch_norm_1'),
        Dropout(0.3, name='dropout_1'),

        # Second LSTM layer - final sequence processing
        LSTM(64,
             return_sequences=False,  # Return only final output
             name='lstm_2'),
        BatchNormalization(name='batch_norm_2'),
        Dropout(0.3, name='dropout_2'),

        # Fully connected layers for classification
        Dense(64, activation='relu', name='dense_1'),
        BatchNormalization(name='batch_norm_3'),
        Dropout(0.5, name='dropout_3'),

        Dense(32, activation='relu', name='dense_2'),
        Dropout(0.3, name='dropout_4'),

        # Output layer - one neuron per emotion class
        Dense(num_classes, activation='softmax', name='output')
    ])

    return model

# Create the model
input_shape = (X_train.shape[1], X_train.shape[2])  # (time_steps, features)
model = create_emotion_model(input_shape, num_classes)

# Compile the model
print("\n⚙️ Compiling the model...")
print("• Optimizer: Adam (adaptive learning rate)")
print("• Loss function: Categorical crossentropy (for multi-class classification)")
print("• Metrics: Accuracy")

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Display model architecture
print("\n🏗️ Model Architecture Summary:")
model.summary()

# Calculate total parameters
total_params = model.count_params()
print(f"\n📊 Model Statistics:")
print(f"   Total parameters: {total_params:,}")
print(f"   Model layers: {len(model.layers)}")
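The total reported by `model.count_params()` can be checked by hand. A Keras `LSTM` layer with `u` units and `d` input features has four gates, each with an input kernel, a recurrent kernel and a bias, giving 4·(u·(d + u) + u) weights; a `Dense` layer has in·out + out; and `BatchNormalization` stores four vectors per feature (gamma, beta, moving mean, moving variance), two of which are non-trainable but still counted. Dropout adds nothing, and the number of time steps does not affect the count. The sketch applies these formulas to the stack in `create_emotion_model`; the 128 features in the example are an assumption, so substitute `X_train.shape[2]` for your data.

```python
def lstm_params(units, input_dim):
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * (units * (input_dim + units) + units)

def dense_params(in_dim, out_dim):
    # weight matrix + bias vector
    return in_dim * out_dim + out_dim

def batchnorm_params(dim):
    # gamma, beta (trainable) + moving mean, moving variance (non-trainable)
    return 4 * dim

def emotion_model_params(n_features, num_classes):
    """Parameter count for LSTM(128) -> LSTM(64) -> Dense(64) -> Dense(32) -> output."""
    return (
        lstm_params(128, n_features) + batchnorm_params(128)
        + lstm_params(64, 128) + batchnorm_params(64)
        + dense_params(64, 64) + batchnorm_params(64)
        + dense_params(64, 32)
        + dense_params(32, num_classes)
    )

# Hypothetical example: 128 features per time step, 8 emotion classes
print(f"{emotion_model_params(128, 8):,}")  # 188,520
```

Almost all of the parameters sit in the first LSTM layer, whose size grows linearly with the feature count.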

In [None]:

# ========================= STEP 13: Setting Up Training Callbacks =========================
print("\n⚙️ STEP 13: Setting Up Training Callbacks")
print("-" * 50)
print("Callbacks help us train the model more effectively:")
print("• Early Stopping: stops training if validation accuracy has not improved for 10 epochs")
print("• Learning Rate Reduction: halves the learning rate when validation loss stops improving")

# Early stopping: prevent overfitting
early_stopping = EarlyStopping(
    monitor='val_accuracy',      # Watch validation accuracy
    patience=10,                 # Wait 10 epochs before stopping
    restore_best_weights=True,   # Use the best weights found
    verbose=1,                   # Print when stopping
    mode='max'                   # We want to maximize accuracy
)

# Learning rate reduction: improve convergence
lr_reduction = ReduceLROnPlateau(
    monitor='val_loss',          # Watch validation loss
    factor=0.5,                  # Reduce LR by half
    patience=5,                  # Wait 5 epochs before reducing
    min_lr=1e-7,                 # Don't go below this learning rate
    verbose=1                    # Print when reducing
)

callbacks = [early_stopping, lr_reduction]

print("✅ Callbacks configured:")
print(f"   Early stopping patience: {early_stopping.patience} epochs")
print(f"   Learning rate reduction factor: {lr_reduction.factor} (patience: {lr_reduction.patience} epochs)")
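To make the `patience` setting concrete, here is a simplified, standalone re-implementation of the early-stopping rule for a metric that should be maximized, such as `val_accuracy`. It is an approximation for illustration, not Keras's actual source: it ignores `min_delta` and `baseline`, which are left at their defaults above. The validation scores in the example are made up.

```python
def simulate_early_stopping(val_scores, patience=10):
    """Return (stop_epoch, best_epoch), both 0-based, for a metric to maximize."""
    best_score = float("-inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # training would stop here
    return len(val_scores) - 1, best_epoch  # ran to the last available epoch

# Made-up validation accuracies: improvement stops after epoch 2
scores = [0.20, 0.30, 0.35] + [0.34] * 12
stop_epoch, best_epoch = simulate_early_stopping(scores, patience=10)
print(stop_epoch, best_epoch)  # 12 2
```

With `restore_best_weights=True`, the weights from `best_epoch` are restored when stopping triggers, so the kept model is 10 epochs older than the last one trained.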

In [None]:

# ========================= STEP 14: Training the Model =========================
print("\n🚀 STEP 14: Training the Model")
print("-" * 50)
print("Now we train our LSTM model on the emotion recognition task.")
print("This process will take several minutes...")
print(f"\nTraining parameters:")
print(f"   Batch size: {BATCH_SIZE}")
print(f"   Maximum epochs: {EPOCHS}")
print(f"   Training samples: {len(X_train)}")
print(f"   Validation samples: {len(X_val)}")

print("\n🏃‍♂️ Starting training...")
print("Watch the accuracy and loss values:")
print("• Training accuracy should increase and training loss should decrease over time")
print("• Validation accuracy should rise too; a large, widening gap between training and")
print("  validation accuracy (or a rising validation loss) is a sign of overfitting")

# Train the model
history = model.fit(
    X_train, y_train_cat,                    # Training data
    validation_data=(X_val, y_val_cat),     # Validation data
    epochs=EPOCHS,                          # Maximum number of epochs
    batch_size=BATCH_SIZE,                  # Samples per batch
    callbacks=callbacks,                    # Early stopping and LR reduction
    verbose=1                               # Show progress bar
)

print("\n🎉 Training completed!")

# Get training history for analysis
train_acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
train_loss = history.history['loss']
val_loss = history.history['val_loss']

print(f"📊 Training Results:")
print(f"   Final training accuracy: {train_acc[-1]:.4f} ({train_acc[-1]*100:.1f}%)")
print(f"   Final validation accuracy: {val_acc[-1]:.4f} ({val_acc[-1]*100:.1f}%)")
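print(f"   Best validation accuracy: {max(val_acc):.4f} ({max(val_acc)*100:.1f}%) at epoch {int(np.argmax(val_acc)) + 1}")
print(f"   Total epochs trained: {len(train_acc)}")
print("   Note: if early stopping triggered, restore_best_weights=True reverted the model to its best")
print("   validation epoch, so the final-epoch figures above may not describe the model evaluated next.")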
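Because `val_acc[-1]` reflects only the last epoch, it helps to locate the best epoch explicitly and check the train/validation gap there as a quick overfitting signal. A minimal sketch on plain Python lists; the helper name `summarize_history` and the example numbers are invented, and in the notebook you would call it as `summarize_history(train_acc, val_acc)`.

```python
def summarize_history(train_acc, val_acc):
    """Return (best_epoch, best_val_acc, gap) with best_epoch counted from 1."""
    # Index of the epoch with the highest validation accuracy
    best_idx = max(range(len(val_acc)), key=lambda i: val_acc[i])
    # A large positive train-minus-val gap suggests overfitting
    gap = train_acc[best_idx] - val_acc[best_idx]
    return best_idx + 1, val_acc[best_idx], gap

train_example = [0.30, 0.45, 0.60, 0.75, 0.85]
val_example = [0.28, 0.40, 0.48, 0.47, 0.46]
best_epoch, best_val, gap = summarize_history(train_example, val_example)
print(f"best epoch {best_epoch}: val_acc={best_val:.2f}, train-val gap={gap:.2f}")
```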

In [None]:

# ========================= STEP 15: Model Evaluation =========================
print("\n📊 STEP 15: Evaluating Model Performance")
print("-" * 50)
print("Now let's see how well our model performs on the test set.")
print("The test set contains data the model has never seen before.")

# Evaluate on test set
print("🧪 Testing model on unseen data...")
test_loss, test_accuracy = model.evaluate(X_test, y_test_cat, verbose=0)

print(f"\n🎯 Test Results:")
print(f"   Test Loss: {test_loss:.4f}")
print(f"   Test Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.1f}%)")

# Generate predictions
print("\n🔮 Generating detailed predictions...")
y_pred = model.predict(X_test, verbose=0)
y_pred_classes = np.argmax(y_pred, axis=1)

# Calculate additional metrics
print("\n📈 Detailed Performance Analysis:")

# Classification report
emotion_names = [EMOTIONS[i] for i in range(num_classes)]
print("\n📋 Classification Report:")
print("This shows precision, recall, and F1-score for each emotion.")
print("• Precision: Of all predictions for this emotion, how many were correct?")
print("• Recall: Of all actual instances of this emotion, how many were found?")
print("• F1-score: Harmonic mean of precision and recall")
print("\n" + classification_report(y_test, y_pred_classes, labels=list(range(num_classes)), target_names=emotion_names))

# Confusion matrix analysis
print("\n🔍 Confusion Matrix Analysis:")
print("Shows which emotions the model confuses with each other.")
cm = confusion_matrix(y_test, y_pred_classes, labels=list(range(num_classes)))  # always num_classes x num_classes

# Print simplified confusion matrix with emotion names
print("\nConfusion Matrix (Actual vs Predicted):")
print(f"{'':12}", end="")
for emotion in emotion_names:
    print(f"{emotion[:8]:>8}", end="")
print()

for i, actual_emotion in enumerate(emotion_names):
    print(f"{actual_emotion[:12]:12}", end="")
    for j in range(len(emotion_names)):
        print(f"{cm[i,j]:8d}", end="")
    print()
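Every number in the classification report can be recovered from the confusion matrix, where rows are actual classes and columns are predicted classes (the `confusion_matrix` convention used above). Precision divides the diagonal by the column sums, recall divides it by the row sums, and F1 is their harmonic mean. A short NumPy sketch on a made-up 3-class matrix; the helper name `metrics_from_confusion` is illustrative, and it returns 0 wherever a denominator is 0.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Per-class precision, recall, F1 and overall accuracy from a confusion matrix.

    Rows are actual classes, columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class actually occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, accuracy

example_cm = [[5, 1, 0],
              [2, 6, 2],
              [0, 1, 3]]
p, r, f1, acc = metrics_from_confusion(example_cm)
print(np.round(p, 3), np.round(r, 3), np.round(f1, 3), acc)
```

The recall vector is exactly what the "Accuracy per Emotion Class" bar chart in the next step plots.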


In [None]:

# ========================= STEP 16: Visualization of Results =========================
print("\n📈 STEP 16: Visualizing Training Results and Performance")
print("-" * 50)

# Create comprehensive visualization
fig = plt.figure(figsize=(20, 12))

# Training history - Accuracy
plt.subplot(2, 3, 1)
plt.plot(train_acc, label='Training Accuracy', linewidth=2, color='blue')
plt.plot(val_acc, label='Validation Accuracy', linewidth=2, color='orange')
plt.title('Model Accuracy Over Time', fontsize=14)
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True, alpha=0.3)

# Training history - Loss
plt.subplot(2, 3, 2)
plt.plot(train_loss, label='Training Loss', linewidth=2, color='red')
plt.plot(val_loss, label='Validation Loss', linewidth=2, color='purple')
plt.title('Model Loss Over Time', fontsize=14)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True, alpha=0.3)

# Confusion Matrix Heatmap
plt.subplot(2, 3, 3)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
           xticklabels=[name[:6] for name in emotion_names],
           yticklabels=[name[:6] for name in emotion_names])
plt.title('Confusion Matrix', fontsize=14)
plt.xlabel('Predicted Emotion')
plt.ylabel('Actual Emotion')

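# Per-class accuracy (equivalent to per-class recall: correct predictions / actual samples of each class)
plt.subplot(2, 3, 4)
row_totals = cm.sum(axis=1)
class_accuracy = np.divide(cm.diagonal(), row_totals,
                           out=np.zeros(len(cm)), where=row_totals > 0)  # 0 for classes absent from the test set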
bars = plt.bar(range(len(emotion_names)), class_accuracy,
               color=plt.cm.viridis(np.linspace(0, 1, len(emotion_names))))
plt.title('Accuracy per Emotion Class', fontsize=14)
plt.xlabel('Emotion')
plt.ylabel('Accuracy')
plt.xticks(range(len(emotion_names)), [name[:6] for name in emotion_names], rotation=45)
plt.ylim(0, 1)

# Add accuracy values on bars
for bar, acc in zip(bars, class_accuracy):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01,
             f'{acc:.2f}', ha='center', va='bottom')

# Model prediction confidence distribution
plt.subplot(2, 3, 5)
confidence_scores = np.max(y_pred, axis=1)
plt.hist(confidence_scores, bins=20, alpha=0.7, color='green', edgecolor='black')
plt.title('Prediction Confidence Distribution', fontsize=14)
plt.xlabel('Confidence Score')
plt.ylabel('Number of Predictions')
plt.axvline(np.mean(confidence_scores), color='red', linestyle='--',
           label=f'Mean: {np.mean(confidence_scores):.3f}')
plt.legend()

# Learning curve comparison
plt.subplot(2, 3, 6)
epochs = range(1, len(train_acc) + 1)
plt.plot(epochs, train_acc, 'b-', label='Training Accuracy', linewidth=2)
plt.plot(epochs, val_acc, 'r-', label='Validation Accuracy', linewidth=2)
plt.fill_between(epochs, train_acc, alpha=0.3, color='blue')
plt.fill_between(epochs, val_acc, alpha=0.3, color='red')
plt.title('Learning Curves Comparison', fontsize=14)
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print summary statistics
print(f"\n📊 Final Performance Summary:")
print(f"   Best training accuracy: {max(train_acc):.4f}")
print(f"   Best validation accuracy: {max(val_acc):.4f}")
print(f"   Final test accuracy: {test_accuracy:.4f}")
print(f"   Average prediction confidence: {np.mean(confidence_scores):.3f}")
print(f"   Epochs trained: {len(train_acc)}")

# Identify best and worst performing emotions
best_emotion_idx = np.argmax(class_accuracy)
worst_emotion_idx = np.argmin(class_accuracy)
print(f"   Best recognized emotion: {emotion_names[best_emotion_idx]} ({class_accuracy[best_emotion_idx]:.3f})")
print(f"   Most challenging emotion: {emotion_names[worst_emotion_idx]} ({class_accuracy[worst_emotion_idx]:.3f})")
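The confidence histogram is most informative next to accuracy: if the model is reasonably calibrated, predictions above a confidence threshold should be right more often than average. The sketch below computes coverage (fraction of samples kept) and accuracy on the kept samples for a given threshold. The helper name `accuracy_above_threshold` and the probabilities and labels are invented; in the notebook you would pass `y_pred` and `y_test`.

```python
import numpy as np

def accuracy_above_threshold(probs, y_true, threshold):
    """Return (coverage, accuracy) over predictions whose max probability >= threshold."""
    probs = np.asarray(probs)
    y_true = np.asarray(y_true)
    confidence = probs.max(axis=1)
    keep = confidence >= threshold
    if not keep.any():
        return 0.0, float("nan")  # nothing passes the threshold
    coverage = keep.mean()
    correct = probs.argmax(axis=1)[keep] == y_true[keep]
    return float(coverage), float(correct.mean())

example_probs = np.array([[0.9, 0.05, 0.05],
                          [0.4, 0.35, 0.25],
                          [0.1, 0.8, 0.1],
                          [0.3, 0.3, 0.4]])
example_true = np.array([0, 1, 1, 0])
for t in (0.0, 0.5):
    cov, acc = accuracy_above_threshold(example_probs, example_true, t)
    print(f"threshold {t:.1f}: coverage {cov:.0%}, accuracy {acc:.0%}")
```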

In [None]:

# ========================= STEP 17: Saving the Model =========================
print("\n💾 STEP 17: Saving the Trained Model")
print("-" * 50)
print("Saving our trained model so we can use it later without retraining.")

# Save the model
model.save('/content/emotion_recognition_model.h5')
print("✅ Model saved as 'emotion_recognition_model.h5'")

# Save emotion labels mapping
import json
with open('/content/emotion_labels.json', 'w', encoding='utf-8') as f:
    json.dump(EMOTIONS, f, ensure_ascii=False, indent=2)
print("✅ Emotion labels saved as 'emotion_labels.json'")

# Save training history
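# Cast to plain Python floats so json.dump can serialize them
history_dict = {
    'train_accuracy': [float(v) for v in train_acc],
    'val_accuracy': [float(v) for v in val_acc],
    'train_loss': [float(v) for v in train_loss],
    'val_loss': [float(v) for v in val_loss],
    'test_accuracy': float(test_accuracy),
    'test_loss': float(test_loss)
}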

with open('/content/training_history.json', 'w') as f:
    json.dump(history_dict, f, indent=2)
print("✅ Training history saved as 'training_history.json'")

print(f"\n📁 All files saved to /content/:")
print(f"   • emotion_recognition_model.h5 (trained model)")
print(f"   • emotion_labels.json (emotion mappings)")
print(f"   • training_history.json (training metrics)")
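One caveat when reloading `emotion_labels.json`: JSON object keys are always strings. Assuming, as the indexing above suggests, that `EMOTIONS` maps integer IDs to names, the reloaded dict has keys `'0'`, `'1'` and so on, and a lookup such as `EMOTIONS[0]` raises `KeyError`. A small sketch of the round trip and the fix, using a two-entry stand-in for `EMOTIONS`:

```python
import json

labels = {0: "neutral", 1: "calm"}  # stand-in for EMOTIONS (integer keys)

text = json.dumps(labels)
reloaded = json.loads(text)
print(list(reloaded.keys()))  # ['0', '1'] (keys became strings)

# Convert the keys back to int so lookups by class index work again
restored = {int(k): v for k, v in reloaded.items()}
print(restored[0])  # neutral
```

With the saved file, the same conversion applies to the result of `json.load(f)` after opening `/content/emotion_labels.json`.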

In [None]:

# ========================= STEP 18: Testing Individual Predictions =========================
print("\n🧪 STEP 18: Testing Individual Audio File Predictions")
print("-" * 50)
print("Let's test our model on individual audio files to see how it performs.")

def predict_emotion(file_path, model, emotions_dict):
    """
    Predict emotion for a single audio file.

    Args:
        file_path (str): Path to audio file
        model: Trained Keras model
        emotions_dict (dict): Emotion ID to name mapping

    Returns:
        tuple: (predicted_emotion_name, confidence_score, all_probabilities)
    """
    try:
        # Extract features
        features, audio = extract_audio_features(file_path)
        if features is None:
            return None, None, None

        # Prepare for LSTM
        lstm_features = prepare_lstm_features(features)
        lstm_features_padded = pad_sequence(lstm_features, max_length)
        lstm_features_batch = np.expand_dims(lstm_features_padded, axis=0)

        # Make prediction
        prediction = model.predict(lstm_features_batch, verbose=0)

        # Get results
        emotion_id = np.argmax(prediction)
        confidence = np.max(prediction)
        emotion_name = emotions_dict[emotion_id]

        return emotion_name, confidence, prediction[0]

    except Exception as e:
        print(f"❌ Error predicting emotion: {e}")
        return None, None, None

def analyze_prediction(prediction_probs, emotions_dict):
    """
    Analyze and display prediction probabilities.

    Args:
        prediction_probs (np.array): Probability for each emotion class
        emotions_dict (dict): Emotion ID to name mapping
    """
    print("   Prediction probabilities:")
    for i, prob in enumerate(prediction_probs):
        emotion_name = emotions_dict[i]
        print(f"     {emotion_name:10}: {prob:.3f} ({prob*100:.1f}%)")

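# Test on several example files
# Note: file_df covers the whole dataset, so these files may include samples the model was trained on
print("🧪 Testing model predictions on sample files (drawn from the full dataset, not only the test split):")
test_indices = np.random.choice(len(file_df), size=min(5, len(file_df)), replace=False)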

for i, idx in enumerate(test_indices):
    test_file = file_df.iloc[idx]
    print(f"\n📄 Test Example {i+1}:")
    print(f"   File: {test_file['filename']}")
    print(f"   Actual emotion: {test_file['emotion_label'].upper()}")
    print(f"   Actor: {test_file['actor']} ({test_file['gender']})")
    print(f"   Intensity: {'Normal' if test_file['intensity'] == 1 else 'Strong'}")

    # Make prediction
    predicted_emotion, confidence, all_probs = predict_emotion(
        test_file['file_path'], model, EMOTIONS
    )

    if predicted_emotion:
        print(f"   Predicted emotion: {predicted_emotion.upper()}")
        print(f"   Confidence: {confidence:.3f} ({confidence*100:.1f}%)")

        # Check if prediction is correct
        is_correct = predicted_emotion == test_file['emotion_label']
        print(f"   Result: {'✅ CORRECT' if is_correct else '❌ INCORRECT'}")

        # Show detailed probabilities
        analyze_prediction(all_probs, EMOTIONS)
    else:
        print("   ❌ Failed to make prediction")
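When a prediction is wrong, the runner-up classes are often informative, for example a close second between two low-arousal emotions. A sketch that ranks the probability vector returned by `predict_emotion`; the helper name `top_k_emotions` and the example mapping and probabilities are illustrative. In the loop above it would be called as `top_k_emotions(all_probs, EMOTIONS)`.

```python
import numpy as np

def top_k_emotions(probs, emotions_dict, k=3):
    """Return the k most probable (emotion_name, probability) pairs, highest first."""
    probs = np.asarray(probs)
    order = np.argsort(probs)[::-1][:k]  # indices sorted by descending probability
    return [(emotions_dict[int(i)], float(probs[i])) for i in order]

example_emotions = {0: "neutral", 1: "calm", 2: "happy", 3: "sad"}
example_probs = [0.10, 0.45, 0.05, 0.40]
for name, p in top_k_emotions(example_probs, example_emotions, k=2):
    print(f"{name:8}: {p:.2f}")
```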


In [None]:

# ========================= STEP 19: Understanding Model Behavior =========================
print("\n🔬 STEP 19: Understanding What the Model Learned")
print("-" * 50)
print("Let's analyze some patterns in our model's behavior.")

# Analyze common misclassifications
print("🔍 Analyzing common misclassifications:")
misclassified_pairs = {}

for actual, predicted in zip(y_test, y_pred_classes):
    if actual != predicted:
        pair = (EMOTIONS[actual], EMOTIONS[predicted])
        misclassified_pairs[pair] = misclassified_pairs.get(pair, 0) + 1

# Show top 5 most common misclassifications
if misclassified_pairs:
    sorted_pairs = sorted(misclassified_pairs.items(), key=lambda x: x[1], reverse=True)
    print("\nMost common misclassifications:")
    for i, ((actual, predicted), count) in enumerate(sorted_pairs[:5]):
        print(f"   {i+1}. {actual} → {predicted}: {count} times")

    print("\n💡 Insights:")
    print("   • Look for emotions that are commonly confused")
    print("   • Similar emotions (like 'calm' and 'neutral') might be hard to distinguish")
    print("   • This helps us understand model limitations")

# Analyze confidence by emotion
print(f"\n📊 Confidence analysis by emotion:")
confidence_by_emotion = {}
for actual, pred_prob in zip(y_test, y_pred):
    emotion_name = EMOTIONS[actual]
    confidence = np.max(pred_prob)
    if emotion_name not in confidence_by_emotion:
        confidence_by_emotion[emotion_name] = []
    confidence_by_emotion[emotion_name].append(confidence)

print("Average confidence per emotion:")
for emotion, confidences in confidence_by_emotion.items():
    avg_confidence = np.mean(confidences)
    print(f"   {emotion:10}: {avg_confidence:.3f}")
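The manual dictionary counting of misclassifications can be written more compactly with `collections.Counter`, whose `most_common` method also handles the sorting. A sketch with invented label lists; in the notebook you would zip `y_test` with `y_pred_classes` and map the IDs through `EMOTIONS`.

```python
from collections import Counter

actual = ["calm", "calm", "sad", "happy", "neutral", "calm"]
predicted = ["neutral", "calm", "fearful", "happy", "calm", "neutral"]

# Count only (actual, predicted) pairs where the prediction was wrong
confusions = Counter((a, p) for a, p in zip(actual, predicted) if a != p)

for (a, p), count in confusions.most_common(3):
    print(f"{a} -> {p}: {count} times")
```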

In [None]:

# ========================= STEP 20: Conclusion and Next Steps =========================
print("\n🎓 STEP 20: Tutorial Conclusion and Next Steps")
print("=" * 60)
print("🎉 Congratulations! You have successfully built an emotion recognition system!")
print("\n📚 What you learned:")
print("   ✅ Audio feature extraction (MFCC, Mel-spectrogram, Chroma, etc.)")
print("   ✅ Data preprocessing and augmentation techniques")
print("   ✅ LSTM neural network architecture for sequence classification")
print("   ✅ Model training with callbacks and optimization")
print("   ✅ Performance evaluation and visualization")
print("   ✅ Individual prediction and model interpretation")

print(f"\n📊 Your model's final performance:")
print(f"   🎯 Test Accuracy: {test_accuracy:.1%}")
print(f"   📈 Best Validation Accuracy: {max(val_acc):.1%}")
print(f"   🧠 Total Parameters: {total_params:,}")
print(f"   ⏱️ Training Epochs: {len(train_acc)}")

print(f"\n🚀 Next steps to improve your model:")
print("   1. 🎵 Try different audio features (spectral rolloff, tonnetz, etc.)")
print("   2. 🏗️ Experiment with different architectures (CNN+LSTM, Transformers)")
print("   3. 📊 Use more data augmentation techniques")
print("   4. ⚙️ Tune hyperparameters (learning rate, batch size, dropout)")
print("   5. 🎭 Add more emotion classes or intensity levels")
print("   6. 🔄 Try transfer learning with pre-trained models")
print("   7. 📱 Deploy your model to a web app or mobile app")

print(f"\n🔧 How to use your saved model:")
print("   ```python")
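print("   from tensorflow.keras.models import load_model")
print("   model = load_model('/content/emotion_recognition_model.h5')")
print("   # Requires predict_emotion, its feature helpers and max_length from this notebook")
print("   emotion, confidence, _ = predict_emotion('new_audio.wav', model, EMOTIONS)")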
print("   ```")

print(f"\n🌟 Applications of emotion recognition:")
print("   • 🎭 Interactive entertainment systems")
print("   • 🏥 Mental health monitoring")
print("   • 📞 Customer service analysis")
print("   • 🤖 Human-computer interaction")
print("   • 📚 Educational technology")
print("   • 🎬 Content recommendation systems")

print("\n" + "=" * 60)
print("🎊 Thank you for completing the RAVDESS Emotion Recognition Tutorial!")
print("Keep experimenting and learning! 🚀")
print("=" * 60)