# Facial expression detection

#### This notebook captures emotional states based on facial expressions using real-time webcam capture. It is used to test the models trained in the 'expression_model.ipynb' notebook in a real environment with variance in camera placement, lighting, etc. 

##### Each model is tested for its predictions in the following order
1. Neutral
2. Happy
3. Sad
4. Angry
5. Surprise
6. Fear
7. Disgust


### Import libraries

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import os
import cv2
import dlib
import tensorflow as tf
from tensorflow.keras.models import load_model
import pickle
import timeit
import datetime

In [2]:
# Libraries and versions
libraries = {
    'pandas': pd.__version__,
    'numpy': np.__version__,
    'opencv-python': cv2.__version__,
    'dlib': dlib.__version__,
    'tensorflow': tf.__version__,
}

# Write to the requirements file and print
with open('requirements.txt', 'w') as f:
    for lib, version in libraries.items():
        f.write(f"{lib}=={version}\n")
        print(f"{lib}: {version}") 

pandas: 2.2.3
numpy: 1.23.5
opencv-python: 4.10.0
dlib: 19.24.6
tensorflow: 2.12.0


## Helper functions

#### Metadata in pickle file

In [3]:
# Saves loaded model metadata to a pickle file.
def pickle_metadata(model_path, emotions_map, pickle_path):
    metadata = {
        "model_path": model_path,
        "emotion_labels": emotions_map
    }
    
    with open(pickle_path, 'wb') as pkl_file:
        pickle.dump(metadata, pkl_file)
    
    print(f"Model metadata saved to: {pickle_path}")
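
To check that the metadata survives a round trip, the file can be written and read back. The following is a minimal sketch using only the standard library. The helper is repeated in condensed form so the block runs on its own; `load_metadata`, `example_model.h5` and `example_metadata.pkl` are hypothetical names introduced for illustration, and the model file is never opened here.

```python
import os
import pickle
import tempfile


def pickle_metadata(model_path, emotions_map, pickle_path):
    """Condensed copy of the helper above, repeated so this sketch runs on its own."""
    metadata = {"model_path": model_path, "emotion_labels": emotions_map}
    with open(pickle_path, "wb") as pkl_file:
        pickle.dump(metadata, pkl_file)


def load_metadata(pickle_path):
    """Hypothetical counterpart: read the metadata dictionary back."""
    with open(pickle_path, "rb") as pkl_file:
        return pickle.load(pkl_file)


labels = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_metadata.pkl")
    pickle_metadata("example_model.h5", labels, path)  # Placeholder model path
    restored = load_metadata(path)

print(restored["model_path"])
print(restored["emotion_labels"])
```

Because `pickle.load` can execute arbitrary code while deserializing, metadata files should only be loaded from a trusted source.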

### Real-time emotion detection from webcam capture

In [4]:
# Performs real-time emotion detection with user feedback using pre-trained model and video capture.
def emotion_detection(pickle_path, img_size, emotion_frames):
    # Load Dlib's face detector
    detector = dlib.get_frontal_face_detector()
    
    # Load model and metadata from the pickle file
    with open(pickle_path, 'rb') as pickle_file:
        model_metadata = pickle.load(pickle_file)

    # Initialize model and emotion labels from pickle metadata
    model = load_model(model_metadata["model_path"])   # Load the pre-trained model 
    emotions = model_metadata["emotion_labels"]  # Load the emotion classes
    
    # Initialize emotion count dictionary and variable for tracking total no. of emotions observed
    # Used to return a summary of all observed emotions without relying on emotion stabilization 
    emotion_count = {label: 0 for label in emotions} # Emotion label dictionary 
    emotion_count_total = 0 
    
    # Initialize user key-press feedback (i.e. True or False emotion observed)
    user_feedback = []             # List for emotion feedback
    detected_emotion = None        # Track the last emotion that was logged

    # Initialize variable for stabilizing emotions predicted by the model
    # Used to ensure that predicted emotions are stable for a certain amount of frames (i.e. 'emotion_frames' input argument)
    # Helps to provide the users with enough time to identify and provide feedback for the observed emotion
    emotion_queue = []   # FIFO queue for tracking observed emotions in the input frame count threshold, used as a buffer to stabilize emotion over time
    emotion_prev = None  # Previous stabilized emotion from queue
    emotion_curr = None  # Current stabilized emotion from queue
    stable_frames = 0    # Counter for how many frames the emotion has been stable

    # Initialize video capture from webcam
    cap = cv2.VideoCapture(0)

    # Initial color for rectangle around detected face: Used to revert color change of user input key-press 
    # True: Green, False: Red, No input: Blue (OpenCV colors are in BGR order)
    rect_color = (255, 0, 0) # Blue color

    # Guide text for first 10 seconds
    start_time = timeit.default_timer() # Start timer: Used to clear the text
    text, font, scale, thickness = "Press 'T' if the detected emotion is correct and 'F' if incorrect", cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1
    text_size = cv2.getTextSize(text, font, scale, thickness)[0]
    text_pos = ((int(cap.get(cv2.CAP_PROP_FRAME_WIDTH )) - text_size[0]) // 2, int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT )) - 20) # Display in the middle at bottom

    # Terminal start-up prints
    print("Press 'T' if the detected emotion is correct and 'F' if incorrect")
    print("Press 'Q' to quit.")
    
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        # Print user input instructions 
        if start_time is not None:
            # Get elapsed time since starting video capture
            elapsed_time = timeit.default_timer() - start_time
            # Show the guide text for the first 10 seconds
            if elapsed_time  < 10:
                cv2.putText(frame, text, text_pos, font, scale, (180, 190, 180), thickness)
            else:
                # Clear start_time variable afterwards 
                start_time = None

        # Convert the frame to grayscale
        gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Detect faces using Dlib's face detector (performs better than the Haar Cascade Classifier previously used)
        faces = detector(gray_frame)

        # For each detected face (i.e. coordinates of detected face)
        for face in faces:
            # Capture face region and prepare image for model classification
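            x, y = max(face.left(), 0), max(face.top(), 0)           # Clamp: Dlib can return coordinates outside the frame
            w, h = face.right() - x + 1, face.bottom() - y + 1       # Width and height of the clamped region
            if w <= 0 or h <= 0:
                continue                                             # Skip faces that lie outside the frame
            face_region = gray_frame[y:y+h, x:x+w]                   # Capture face region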
            face_resized = cv2.resize(face_region, img_size)         # Resize region to input image size
            face_normalized = face_resized / 255.0                   # Normalize
            face_reshaped = np.expand_dims(face_normalized, axis=-1) # Add channel dimension
            face_input = np.expand_dims(face_reshaped, axis=0)       # Add batch dimension

            # Predict the emotion using the pre-trained model
            prediction = model.predict(face_input, verbose=0)        # Pre-trained model loaded from the pickle file
            prediction_max = np.argmax(prediction)                   # Index of emotion with the highest probability
            prediction_label = emotions[prediction_max]              # Map emotion with the highest probability to the labels of the pickle file 

            # Populate the emotion queue to obtain the most frequently observed emotion over the desired frames (i.e. the 'emotion_frames' input argument)
            # Stabilizes emotion prediction to enable user feedback
            emotion_queue.append(prediction_label)   # Appends predicted emotions
            if len(emotion_queue) > emotion_frames:   
                emotion_queue.pop(0)                 # Pop first emotion label if queue exceeds input frames
                
            # Get the current emotion from the emotion prediction queue
            emotion_curr = max(set(emotion_queue), key=emotion_queue.count)  # Max observed emotion label is the current (stable) emotion
    
            # Track changes in observed emotion
            if emotion_curr == emotion_prev:
                # Increase stable frames value if the emotion labels remains the same
                stable_frames += 1          # Increment frames
            else:
                # Reset stable frames value and previous label when a new emotion is observed. 
                emotion_prev = emotion_curr # Update previous observed emotion label
                stable_frames = 1           # Reset frames
  
            # Detect stabilized emotions (i.e. when an emotion has been observed for longer than the emotion frame input threshold argument)
            if stable_frames >= emotion_frames:
                if emotion_curr != detected_emotion:
                    detected_emotion = emotion_curr  # Define or update the detected emotion

            # Display the predicted emotion top left corner
            if detected_emotion:
                cv2.putText(frame, detected_emotion, (265, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (100, 255, 100), 2)
                
            # Display rectangle around the face region with current color
            cv2.rectangle(frame, (x, y), (x+w, y+h), rect_color, 2)
            
            # Reset rectangle color to blue
            if stable_frames > 5:
                rect_color = (255, 0, 0)

            # Update emotion label and total counters
            emotion_count[prediction_label] += 1  # Emotion label individual counts
            emotion_count_total += 1              # Total count across all emotion labels

        # Display detection text
        cv2.putText(frame, "Detect emotion:", (10, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (100, 255, 100), 2)

        # Display image with expression classification annotation
        cv2.imshow('Facial Expression Detection', frame)

        # Handle key presses
        key = cv2.waitKey(1) & 0xFF
        # Press 'T' to mark the current stable emotion as true
        if key == ord('t'):  
            if len(faces) > 0:
                rect_color = (0, 255, 0)                         # Update rectangle color to green shortly
                print(f"Set emotion '{emotion_curr}' as true.")  # Print choice in terminal
                # Append to user feedback list
                user_feedback.append({
                    "timestamp": datetime.datetime.now(),
                    "emotion": emotion_curr,
                    "true": True
                })
        # Press 'F' to mark the current stable emotion as false
        elif key == ord('f'): 
            if len(faces) > 0:
                rect_color = (0, 0, 255)                          # Update rectangle color to red shortly
                print(f"Set emotion '{emotion_curr}' as false.")  # Print choice in terminal
                # Append to user feedback list
                user_feedback.append({
                    "timestamp": datetime.datetime.now(),
                    "emotion": emotion_curr,
                    "true": False
                })
        # Press 'Q' to exit the loop       
        elif key == ord('q'):  
            break

    # Release the video capture object and close all OpenCV windows
    cap.release()
    cv2.destroyAllWindows()

    # Calculate percentages and determine the overall state
    if emotion_count_total > 0:
        print("\nEmotion Summary:")
        overall_state = max(emotion_count, key=emotion_count.get)  # Get most frequently detected emotion
        # Print percentage for each emotional state
        for emotion, count in emotion_count.items():
            percentage = (count / emotion_count_total) * 100
            print(f"{emotion}: {percentage:.2f}% ")
        # Print the most frequently detected emotion
        print(f"\nOverall State: {overall_state} ")
    else:
        print("\nNo emotions were detected.")

    # Save user feedback data as a DataFrame
    user_feedback_df = pd.DataFrame(user_feedback)

    # Calculate and print overall true score from the df
    if not user_feedback_df.empty:
        true_score = user_feedback_df["true"].mean() * 100
        print(f"\nOverall True Score: {true_score:.2f}%")
    else:
        print("\nNo feedback recorded.")

    # Return user feedback dataframe
    return user_feedback_df
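
The stabilization step inside `emotion_detection` is a majority vote over a sliding window of the most recent predictions. The sketch below isolates that idea using `collections.deque` and `collections.Counter` instead of the list-based queue in the function; `smooth_labels` and the sample frame labels are hypothetical and exist only for illustration.

```python
from collections import Counter, deque


def smooth_labels(labels, window):
    """Return, for each frame, the majority label of the last `window` predictions."""
    queue = deque(maxlen=window)  # The oldest label is dropped automatically
    smoothed = []
    for label in labels:
        queue.append(label)
        # most_common(1) returns the label with the highest count in the window;
        # with equal counts, the label that entered the window first wins (Python 3.7+)
        smoothed.append(Counter(queue).most_common(1)[0][0])
    return smoothed


# Hypothetical per-frame predictions containing a single-frame flicker to 'Fear'
frames = ["Sad", "Sad", "Fear", "Sad", "Sad", "Happy", "Happy", "Happy"]
smoothed = smooth_labels(frames, window=3)
print(smoothed)
```

With a window of three frames, the isolated 'Fear' prediction never becomes the majority, and the switch to 'Happy' only shows up once it fills most of the window. A larger window (the notebook uses 15 frames) suppresses more flicker at the cost of a slower reaction to genuine changes.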


## Emotion detection using model pre-trained on the FER-2013 dataset

In [5]:
emotions_map_fer = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
pickle_metadata("expression_model_FER.h5", emotions_map_fer, "expression_model_FER_metadata.pkl")

Model metadata saved to: expression_model_FER_metadata.pkl


In [7]:
# Start webcam capture and emotion detection for FER-2013 model
emotion_detection("expression_model_FER_metadata.pkl", (48, 48), 15)


Press 'T' if the detected emotion is correct and 'F' if incorrect
Press 'Q' to quit.
Set emotion 'Fear' as false.
Set emotion 'Neutral' as true.
Set emotion 'Happy' as true.
Set emotion 'Sad' as true.
Set emotion 'Angry' as true.
Set emotion 'Fear' as false.
Set emotion 'Fear' as true.
Set emotion 'Sad' as false.

Emotion Summary:
Angry: 6.33% 
Disgust: 0.00% 
Fear: 19.49% 
Happy: 6.43% 
Sad: 42.93% 
Surprise: 0.00% 
Neutral: 24.83% 

Overall State: Sad 

Overall True Score: 62.50%


| | timestamp | emotion | true |
|---|---|---|---|
| 0 | 2024-11-22 12:04:02.901527 | Fear | False |
| 1 | 2024-11-22 12:04:06.290528 | Neutral | True |
| 2 | 2024-11-22 12:04:18.140090 | Happy | True |
| 3 | 2024-11-22 12:04:22.896089 | Sad | True |
| 4 | 2024-11-22 12:04:36.528089 | Angry | True |
| 5 | 2024-11-22 12:04:43.004089 | Fear | False |
| 6 | 2024-11-22 12:05:52.823273 | Fear | True |
| 7 | 2024-11-22 12:06:35.654849 | Sad | False |
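
The overall true score is a single average, so it hides which emotions were rejected. As a rough sketch, the feedback can also be tallied per emotion. The entries below are copied from the FER-2013 run above with timestamps omitted; `score_by_emotion` is a hypothetical helper that works on the plain list of dictionaries collected in `user_feedback`.

```python
from collections import defaultdict

# Feedback from the FER-2013 run above (timestamps omitted)
feedback = [
    {"emotion": "Fear", "true": False},
    {"emotion": "Neutral", "true": True},
    {"emotion": "Happy", "true": True},
    {"emotion": "Sad", "true": True},
    {"emotion": "Angry", "true": True},
    {"emotion": "Fear", "true": False},
    {"emotion": "Fear", "true": True},
    {"emotion": "Sad", "true": False},
]


def score_by_emotion(entries):
    """Return {emotion: (confirmed, total)} from a list of feedback dictionaries."""
    tally = defaultdict(lambda: [0, 0])
    for entry in entries:
        tally[entry["emotion"]][0] += int(entry["true"])  # Confirmed key presses
        tally[entry["emotion"]][1] += 1                   # All key presses
    return {emotion: tuple(counts) for emotion, counts in tally.items()}


scores = score_by_emotion(feedback)
overall = 100 * sum(entry["true"] for entry in feedback) / len(feedback)

for emotion, (confirmed, total) in scores.items():
    print(f"{emotion}: {confirmed}/{total} confirmed")
print(f"Overall True Score: {overall:.2f}%")
```

With only eight key presses in this run, the per-emotion counts are too small to be more than an indication.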


### Summary

Based on the evaluation of the model, we expect it to have problems predicting most emotional states, except for 'happy' and 'surprise'. 

The model failed to predict 'surprise' and 'disgust' (neither was detected in any frame), and it was difficult to capture the correct emotional states. There also seems to be a strong bias towards 'sad', which accounts for 42.93% of all per-frame predictions.

Overall, it performed poorly.

## Emotion detection using model pre-trained on the RAF-DB dataset

In [14]:
emotions_map_raf = ['Surprise', 'Fear', 'Disgust', 'Happiness', 'Sadness', 'Anger', 'Neutral']
pickle_metadata("expression_model_RAF.h5", emotions_map_raf, "expression_model_RAF_metadata.pkl")

Model metadata saved to: expression_model_RAF_metadata.pkl
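
The label list stored in the pickle file has to follow the class order the model was trained with, because `emotion_detection` looks up the label by the index of the highest probability. The sketch below illustrates why the order matters, using the two label lists defined in this notebook. `top_label` and the probability vector are hypothetical, and whether each list matches its model's training order is assumed here rather than verified.

```python
# Label lists exactly as defined in this notebook
emotions_map_fer = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
emotions_map_raf = ['Surprise', 'Fear', 'Disgust', 'Happiness', 'Sadness', 'Anger', 'Neutral']


def top_label(probabilities, labels):
    """Map the index of the highest probability to its label (pure-Python argmax)."""
    if len(probabilities) != len(labels):
        raise ValueError("Model output size and label list length differ")
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return labels[best_index]


# Hypothetical softmax output with most of the probability mass on index 0
probs = [0.70, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]

fer_label = top_label(probs, emotions_map_fer)  # Index 0 in the FER-2013 order
raf_label = top_label(probs, emotions_map_raf)  # Index 0 in the RAF-DB order
print(fer_label, raf_label)
```

The same output vector maps to a different emotion under each list, so pairing a model with the wrong metadata file would silently mislabel every prediction.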


In [17]:
# Start webcam capture and emotion detection for RAF-DB model
emotion_detection("expression_model_RAF_metadata.pkl", (100, 100), 15)

Press 'T' if the detected emotion is correct and 'F' if incorrect
Press 'Q' to quit.
Set emotion 'Neutral' as true.
Set emotion 'Happiness' as true.
Set emotion 'Sadness' as true.
Set emotion 'Anger' as true.
Set emotion 'Surprise' as true.
Set emotion 'Fear' as false.
Set emotion 'Fear' as true.
Set emotion 'Disgust' as true.

Emotion Summary:
Surprise: 21.73% 
Fear: 2.69% 
Disgust: 8.64% 
Happiness: 7.06% 
Sadness: 32.87% 
Anger: 17.64% 
Neutral: 9.38% 

Overall State: Sadness 

Overall True Score: 87.50%


| | timestamp | emotion | true |
|---|---|---|---|
| 0 | 2024-11-22 12:25:11.955855 | Neutral | True |
| 1 | 2024-11-22 12:25:16.009923 | Happiness | True |
| 2 | 2024-11-22 12:25:20.006924 | Sadness | True |
| 3 | 2024-11-22 12:25:27.385924 | Anger | True |
| 4 | 2024-11-22 12:25:31.782925 | Surprise | True |
| 5 | 2024-11-22 12:25:44.267923 | Fear | False |
| 6 | 2024-11-22 12:25:45.820925 | Fear | True |
| 7 | 2024-11-22 12:26:55.292496 | Disgust | True |


### Summary

Based on the evaluation of the model, we expect it to have problems predicting only the emotional state 'fear'.

The model correctly predicted all emotional states. However, correctly predicting 'fear' took a little longer than the rest, as it mostly overlapped with 'sad'. No other bias was observed during the experiment.

Overall it performed really well and it was easy to get it to make the correct predictions.


## Emotion detection using model pre-trained on FER-2013 and RAF-DB combined

In [18]:
emotions_map_comb = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
pickle_metadata("expression_model_COMB.h5", emotions_map_comb, "expression_model_COMB_metadata.pkl")

Model metadata saved to: expression_model_COMB_metadata.pkl


In [20]:
# Start webcam capture and emotion detection for the combined FER-2013 and RAF-DB model
emotion_detection("expression_model_COMB_metadata.pkl", (100, 100), 15)

Press 'T' if the detected emotion is correct and 'F' if incorrect
Press 'Q' to quit.
Set emotion 'Sad' as false.
Set emotion 'Happy' as true.
Set emotion 'Sad' as true.
Set emotion 'Angry' as true.
Set emotion 'Surprise' as true.
Set emotion 'Surprise' as true.
Set emotion 'Sad' as false.

Emotion Summary:
Angry: 12.76% 
Disgust: 0.00% 
Fear: 7.09% 
Happy: 10.70% 
Sad: 48.84% 
Surprise: 19.20% 
Neutral: 1.42% 

Overall State: Sad 

Overall True Score: 71.43%


| | timestamp | emotion | true |
|---|---|---|---|
| 0 | 2024-11-22 12:38:38.198864 | Sad | False |
| 1 | 2024-11-22 12:38:42.788864 | Happy | True |
| 2 | 2024-11-22 12:38:48.124863 | Sad | True |
| 3 | 2024-11-22 12:38:53.257864 | Angry | True |
| 4 | 2024-11-22 12:39:02.670865 | Surprise | True |
| 5 | 2024-11-22 12:39:48.431016 | Surprise | True |
| 6 | 2024-11-22 12:40:04.249034 | Sad | False |


### Summary

Based on the evaluation of the model, we expect it to have problems predicting 'fear' and 'disgust'. 

The model is not able to predict 'neutral' and 'disgust', but the other emotional states were fairly easy to capture. It retains much of the bias towards 'sad' from the model trained using only the FER-2013 dataset. 

Overall, it performed reasonably well.

## Conclusion

Based on the results of real-time facial expression prediction, both models trained on FER-2013 data (FER-2013 alone and FER-2013 combined with RAF-DB) performed below the required standard in this real-life situation, with overall true scores of 62.50% and 71.43%. The model trained on the RAF-DB dataset alone performed really well, reaching an overall true score of 87.50% under the test conditions.
