# Project: Emotion Detection

In this project, you will adapt the original Cat vs. Dog project to work with the **RAVDESS_Emotional_speech_audio** dataset for emotion detection. The sections below summarize the key modifications and indicate which parts you need to complete:

---

#### 1. Dataset and File Structure

- **Original Project:**  
  - Used a dataset with files labeled as either "cat" or "dog".

- **New Emotion Dataset (RAVDESS):**  
  - The dataset is organized into 24 subdirectories (e.g., `Actor_01`, `Actor_02`, …, `Actor_24`), each containing 60 `.wav` files.
  - **Filename Format:**  
    The file names follow this format:  
    ```
    Modality-VocalChannel-Emotion-EmotionalIntensity-Statement-Repetition-Actor.wav
    ```
    For example, `03-01-06-01-02-01-12.wav` means:  
    - **Emotion Code:** The 3rd part (`06`) indicates the emotion (in this case, "Fearful").

- **Your Task:**  
  - Adapt the code to recursively load `.wav` files from the RAVDESS dataset.
  - Parse the filename (split by `'-'`) to extract the emotion code.
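
The parsing step can be sketched as follows. This is a minimal illustration, not the required solution: the helper name `parse_ravdess_filename` and the `FIELDS` list are hypothetical, and the field names are taken from the filename format shown above.

```python
import os

# Field names follow the filename format described above.
FIELDS = ["modality", "vocal_channel", "emotion", "intensity",
          "statement", "repetition", "actor"]

def parse_ravdess_filename(file_path):
    """Split a RAVDESS filename into its seven two-digit fields (hypothetical helper)."""
    stem = os.path.splitext(os.path.basename(file_path))[0]  # drop folder and ".wav"
    parts = stem.split("-")
    if len(parts) != len(FIELDS):
        raise ValueError(f"Unexpected filename format: {file_path}")
    return dict(zip(FIELDS, parts))

info = parse_ravdess_filename("Actor_12/03-01-06-01-02-01-12.wav")
print(info["emotion"])  # the third field is the emotion code
```

Checking the number of fields up front makes malformed filenames fail loudly instead of silently producing a wrong label.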
  
---

#### 2. Label Mapping

- **Original Project:**  
  - Labels were assigned based on the filename prefix ("cat" or "dog").

- **New Emotion Labels:**  
  - Each two-digit emotion code in the filename identifies one emotion, which you will map to a numerical label:
    - `"01"` → Neutral  
    - `"02"` → Calm  
    - `"03"` → Happy  
    - `"04"` → Sad  
    - `"05"` → Angry  
    - `"06"` → Fearful  
    - `"07"` → Disgust  
    - `"08"` → Surprised

- **Your Task:**  
  - Use a dictionary (e.g., `emotion_map`) to map these codes to labels.
  - Convert the labels to integers (e.g., 0 to 7).

---

#### 3. Multi-class Classification

- **Original Project:**  
  - Solved a binary classification problem (cat vs. dog).

- **New Emotion Detection:**  
  - You now have 8 emotion classes.
  - **Random Forest Classifier:**  
    - Continue using statistical summaries of MFCC features, but ensure that the classifier is trained with multi-class labels.
  - **Convolutional Neural Network (CNN):**  
    - Change the output layer to have **8 neurons** with a **softmax activation** function.
    - Update the loss function to `sparse_categorical_crossentropy` for multi-class classification.

- **Your Task:**  
  - Modify the CNN architecture accordingly.
  - Verify that your evaluation metrics and confusion matrix display the 8 classes.
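
One possible shape for the multi-class CNN is sketched below. Only the 8-neuron softmax output and the `sparse_categorical_crossentropy` loss come from the requirements above; the helper name `build_emotion_cnn`, the number of layers, the filter counts, and the optimizer are assumptions chosen for illustration. The input shape `(13, 216, 1)` assumes 13 MFCC coefficients, 216 frames, and one channel, as used later in this notebook.

```python
from tensorflow.keras import layers, models

def build_emotion_cnn(input_shape=(13, 216, 1), num_classes=8):
    """Small CNN for MFCC images (hypothetical architecture, for illustration only)."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        # One output neuron per emotion class; softmax yields class probabilities.
        layers.Dense(num_classes, activation="softmax"),
    ])
    # The sparse loss expects integer labels (0 to 7), so no one-hot encoding is needed.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_emotion_cnn()
model.summary()
```

Because the loss is the sparse variant, the integer labels produced by `emotion_map` can be passed to `model.fit` directly.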

---

#### 4. Feature Extraction and Visualization

- **Similarities:**  
  - The process for extracting MFCC features (including deltas) remains largely the same.
  - Visualization of the audio waveform and MFCC heatmap (using `imshow` or similar) is still applicable.

- **Your Task:**  
  - Adapt the feature extraction code to work with the RAVDESS dataset.
  - Ensure that the visualization parts still help you verify the quality of the extracted features.

---

#### 5. Additional Notebook for Inference on New Audio

- **New Requirement:**  
  - Create a **separate notebook** where you:
    - Record your own voice using the microphone.
    - Apply the trained model (which you saved from this project) to predict the emotion in your recording.
  
- **Your Task:**  
  - Build a new notebook that includes:
    - Code to record audio from the microphone.
    - Feature extraction code (to compute MFCCs from your recording).
    - Code to load the pre-trained model (Random Forest and/or CNN) and output the predicted emotion.
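
A minimal sketch of the save-and-reload round trip for the Random Forest is shown below. It trains on synthetic data so that it runs on its own; the file name `rf_emotion.joblib` is hypothetical, and the 78-dimensional input assumes the 13 MFCCs with six statistics each that this notebook computes.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 78))      # synthetic stand-in for the 78-dim MFCC statistics
y = np.repeat(np.arange(8), 10)    # 8 emotion classes, 10 samples each

rf_clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "rf_emotion.joblib")  # hypothetical file name
    joblib.dump(rf_clf, model_path)       # done once, in the training notebook
    rf_loaded = joblib.load(model_path)   # done in the inference notebook

same_predictions = np.array_equal(rf_clf.predict(X), rf_loaded.predict(X))
print("Loaded model reproduces predictions:", same_predictions)
```

A Keras model can be handled the same way with its `save` method and `load_model`. In both cases the inference notebook must extract features exactly as the training notebook did (same `n_mfcc`, same `max_len`).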
  

## Imports and Helper Functions

In [None]:
import os
import glob
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
import sounddevice as sd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.models import load_model

#### Emotion mapping: convert emotion code (string) to integer label (0-based)

In [None]:
emotion_map = {
    "01": 0,  # Neutral
    "02": 1,  # Calm
    "03": 2,  # Happy
    "04": 3,  # Sad
    "05": 4,  # Angry
    "06": 5,  # Fearful
    "07": 6,  # Disgust
    "08": 7   # Surprised
}

emotion_labels = ["Neutral", "Calm", "Happy", "Sad", "Angry", "Fearful", "Disgust", "Surprised"]

#### Function to extract statistical MFCC features (for Random Forest)

In [None]:
def extract_statistical_features(file_path, n_mfcc=13):
    try:
        y, sr = librosa.load(file_path, sr=None)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        mfcc_delta = librosa.feature.delta(mfcc)
        mfcc_delta2 = librosa.feature.delta(mfcc, order=2)
        mfcc_mean = np.mean(mfcc, axis=1)
        mfcc_std = np.std(mfcc, axis=1)
        delta_mean = np.mean(mfcc_delta, axis=1)
        delta_std = np.std(mfcc_delta, axis=1)
        delta2_mean = np.mean(mfcc_delta2, axis=1)
        delta2_std = np.std(mfcc_delta2, axis=1)
        features = np.concatenate([mfcc_mean, mfcc_std, delta_mean, delta_std, delta2_mean, delta2_std])
        return features, y, sr, mfcc
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
        return None, None, None, None

#### Function to extract fixed-size MFCC image for CNN

In [None]:
def get_mfcc_image(file_path, n_mfcc=13, max_len=216):
    try:
        y, sr = librosa.load(file_path, sr=None)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        # Pad or truncate the MFCC to have a fixed number of frames (max_len)
        mfcc_fixed = librosa.util.fix_length(mfcc, size=max_len, axis=1)
        return mfcc_fixed
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
        return None
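
The effect of the pad-or-truncate step can be illustrated without audio files. The NumPy sketch below mirrors the default behaviour of `librosa.util.fix_length` (zero-padding at the end, or cutting off extra frames); `pad_or_truncate` is a hypothetical helper and the input arrays are synthetic.

```python
import numpy as np

def pad_or_truncate(mfcc, max_len=216):
    """Return an array with exactly max_len frames along axis 1 (hypothetical helper)."""
    n_frames = mfcc.shape[1]
    if n_frames >= max_len:
        return mfcc[:, :max_len]                      # keep only the first max_len frames
    pad_width = ((0, 0), (0, max_len - n_frames))     # pad only at the end of the time axis
    return np.pad(mfcc, pad_width)                    # default mode pads with zeros

short_clip = np.ones((13, 150))   # fewer frames than max_len
long_clip = np.ones((13, 300))    # more frames than max_len

print(pad_or_truncate(short_clip).shape)
print(pad_or_truncate(long_clip).shape)
```

Both results share the same shape, which is what allows the MFCC matrices to be stacked into a single array for the CNN.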

## Load Dataset and Extract Features

#### Define dataset path and recursively get all .wav files from the dataset

In [None]:
dataset_path = '../../Codes/datasets/RAVDESS_Emotional_speech_audio'

audio_files = glob.glob(os.path.join(dataset_path, '**', '*.wav'), recursive=True)
print("Number of audio files found:", len(audio_files))

#### Initialize lists for features and labels

In [None]:
features_list = []
labels_list = []
waveforms = []   # For visualization
sample_rates = []  # For visualization
mfccs = []       # For visualization
file_names = []  # To store file paths

# For CNN images
cnn_images = []


#### Process each audio file

In [None]:
for file_path in audio_files:
    # Parse the filename to extract the emotion code (3rd part of the filename)
    base_name = os.path.basename(file_path)
    parts = base_name.split('-')
    label = emotion_map.get(parts[2], -1) if len(parts) >= 3 else -1
    if label == -1:
        # An invalid label would break training and the emotion_labels lookup
        print(f"Skipping {file_path}: unrecognized emotion code")
        continue

    # Extract statistical features for Random Forest and the MFCC image for the CNN
    features, y, sr, mfcc = extract_statistical_features(file_path, n_mfcc=13)
    mfcc_img = get_mfcc_image(file_path, n_mfcc=13, max_len=216)
    if features is None or mfcc_img is None:
        continue  # Skip the file entirely so that all lists stay aligned

    features_list.append(features)
    labels_list.append(label)
    waveforms.append(y)
    sample_rates.append(sr)
    mfccs.append(mfcc)
    file_names.append(file_path)
    cnn_images.append(mfcc_img)

#### Convert lists to numpy arrays

In [None]:
features_array = np.array(features_list)
labels_array = np.array(labels_list)
cnn_images = np.array(cnn_images)

print("Features array shape (Random Forest):", features_array.shape)
print("Labels array shape:", labels_array.shape)
print("CNN images shape:", cnn_images.shape)  # Expected shape: (num_samples, 13, 216)
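
With features and labels available as arrays, a stratified split followed by an 8-class evaluation could look like the sketch below. It uses synthetic data in place of `features_array` and `labels_array` so that it runs on its own; the test size, the number of trees, and the random seeds are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

emotion_labels = ["Neutral", "Calm", "Happy", "Sad",
                  "Angry", "Fearful", "Disgust", "Surprised"]

rng = np.random.default_rng(42)
X = rng.normal(size=(240, 78))     # synthetic stand-in for features_array
y = np.repeat(np.arange(8), 30)    # synthetic stand-in for labels_array

# stratify=y keeps the class proportions the same in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

rf = RandomForestClassifier(n_estimators=50, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

# Passing labels explicitly guarantees an 8x8 matrix even if a class is never predicted.
all_classes = np.arange(len(emotion_labels))
cm = confusion_matrix(y_test, y_pred, labels=all_classes)
print(cm.shape)
print(classification_report(y_test, y_pred, labels=all_classes,
                            target_names=emotion_labels, zero_division=0))
```

Since the synthetic features are random noise, the reported accuracy is meaningless here; the point is only the shape of the workflow and of the outputs.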

## Visualize an Example Audio File and its MFCC Heatmap

#### Plot the audio waveform

In [None]:
# CODE

In [None]:
# CODE

#### Plot the MFCC heatmap using imshow (as an alternative to specshow)

In [None]:
# CODE

## Random Forest classifier

#### Train-Test Split

In [None]:
# CODE

#### Train a Random Forest classifier

In [None]:
# CODE

#### Evaluate on the test set

In [None]:
# CODE

#### Plot the confusion matrix

In [None]:
# CODE

## Prepare Data and Train the CNN Classifier

#### Expand dimensions of CNN images to add a channel dimension (required by CNNs)

In [18]:
import tensorflow as tf
import numpy as np

test = [[28,28],[28,28]]
test_tensor = np.array(test)

test_tensor = tf.expand_dims(test_tensor, axis=1)

test_tensor.shape

TensorShape([2, 1, 2])

In [None]:
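# Add a trailing channel axis so that each sample has shape (13, 216, 1).
# Do not use tf.expand_dims(cnn_images, axis=1) here: it would give (num_samples, 1, 13, 216).
cnn_images_exp = cnn_images[..., np.newaxis]  # New shape: (num_samples, 13, 216, 1)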
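
Keras `Conv2D` layers expect channels-last input by default, so the position of the new axis matters. The NumPy sketch below compares the two placements, using a synthetic array of zeros as a stand-in for `cnn_images`.

```python
import numpy as np

batch = np.zeros((5, 13, 216))               # synthetic stand-in for cnn_images

channels_last = batch[..., np.newaxis]       # new axis appended at the end
axis_one = np.expand_dims(batch, axis=1)     # new axis inserted right after the batch axis

print(channels_last.shape)  # (5, 13, 216, 1)
print(axis_one.shape)       # (5, 1, 13, 216)
```

Only the first result matches an input shape of `(13, 216, 1)` per sample.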

#### Split data for the CNN classifier

In [None]:
# CODE

#### Build the CNN model for multi-class classification

In [None]:
# CODE


In [None]:
# CODE

#### Compile the model

In [None]:
# CODE

#### Train the CNN model

In [None]:
# CODE

#### Evaluate the CNN model on the test set

In [None]:
# CODE

## Save the Trained Models

In [None]:
# CODE


## Load the saved models

In [None]:
# CODE

## Inference on a New Audio File or via Microphone Recording

In [None]:
# Set source = 'file' to use an external audio file (e.g., 'new_audio.wav')
# Set source = 'mic' to record audio from the microphone

source = 'mic'  # Change to 'file' if you want to use an external file

if source == 'file':
    # Inference using an external audio file
    new_audio_path = 'new_audio.wav'  # Provide the path to your audio file
    
    # Extract features for the Random Forest model
    features_new, y_new, sr_new, mfcc_new = extract_statistical_features(new_audio_path, n_mfcc=13)
    if features_new is not None:
        features_new = features_new.reshape(1, -1)
    
    # Extract MFCC image for the CNN model
    mfcc_img_new = get_mfcc_image(new_audio_path, n_mfcc=13, max_len=216)
    if mfcc_img_new is not None:
        mfcc_img_new = mfcc_img_new[np.newaxis, ..., np.newaxis]

elif source == 'mic':
    duration = 3  # seconds to record
    fs = 48000   # Sampling rate
    print(f"Recording audio for {duration} seconds...")
    recording = sd.rec(int(duration * fs), samplerate=fs, channels=1, dtype='float32')
    sd.wait() 
    y_new = recording.flatten()  
    sr_new = fs
    print("Recording complete.")

    mfcc_record = librosa.feature.mfcc(y=y_new, sr=sr_new, n_mfcc=13)
    mfcc_delta = librosa.feature.delta(mfcc_record)
    mfcc_delta2 = librosa.feature.delta(mfcc_record, order=2)
    mfcc_mean = np.mean(mfcc_record, axis=1)
    mfcc_std = np.std(mfcc_record, axis=1)
    delta_mean = np.mean(mfcc_delta, axis=1)
    delta_std = np.std(mfcc_delta, axis=1)
    delta2_mean = np.mean(mfcc_delta2, axis=1)
    delta2_std = np.std(mfcc_delta2, axis=1)
    
    # Concatenate statistical features for Random Forest prediction
    features_new = np.concatenate([mfcc_mean, mfcc_std, delta_mean, delta_std, delta2_mean, delta2_std]).reshape(1, -1)

    
    # For the CNN model, create a fixed-size MFCC image
    max_len = 216
    mfcc_img_new = librosa.util.fix_length(mfcc_record, size=max_len, axis=1)
    mfcc_img_new = mfcc_img_new[np.newaxis, ..., np.newaxis]
else:
    raise ValueError("Invalid source selected. Please set source to 'file' or 'mic'.")

#### Make predictions using the trained models

In [None]:
emotion_labels = ["Neutral", "Calm", "Happy", "Sad", "Angry", "Fearful", "Disgust", "Surprised"]

rf_pred_new = rf_clf.predict(features_new)[0]
cnn_pred_probs_new = cnn_model.predict(mfcc_img_new)[0]
cnn_pred_new = np.argmax(cnn_pred_probs_new)

print("Random Forest Prediction:", emotion_labels[rf_pred_new])
print("CNN Prediction:", emotion_labels[cnn_pred_new])

## Compile a report detailing the challenges you faced and the performance of your emotion detection models.

Your report here