# 🎮 Real-Time Emotion Detection for Game Integration

This notebook takes our trained emotion recognition model and applies it to a live webcam feed. The goal is to detect faces, predict their emotions in real-time, and prepare the output for integration with our adventure game.

**Key Steps:**
1.  **Load our trained model** (`emotion_model.pth`)
2.  **Access the webcam** for live video
3.  **Detect faces** in each frame using OpenCV
4.  **Preprocess** each face for our model
5.  **Predict the emotion** and display it on the screen
6.  **Discuss game integration** strategies

In [1]:
# =============================================================================
# IMPORT LIBRARIES
# =============================================================================
import cv2
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from PIL import Image
import pytorch_lightning as pl
import os
import requests  # For downloading the face detector

print("✅ All libraries imported successfully!")



✅ All libraries imported successfully!


In [2]:
# =============================================================================
# DEFINE THE CNN MODEL ARCHITECTURE
# =============================================================================
# We need to define the model architecture exactly as it was during training
# so we can load the saved weights correctly.


class EmotionCNN(pl.LightningModule):
    """Compact CNN for emotion recognition using PyTorch Lightning"""

    def __init__(self, num_classes=6, learning_rate=0.001):
        super().__init__()
        self.save_hyperparameters()
        self.learning_rate = learning_rate
        self.num_classes = num_classes

        # Convolutional layers with batch normalization
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # 48x48 -> 24x24
        )

        self.conv2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # 24x24 -> 12x12
        )

        self.conv3 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # 12x12 -> 6x6
        )

        # Global Average Pooling (more efficient than flattening)
        self.global_avg_pool = nn.AdaptiveAvgPool2d(1)  # 6x6x128 -> 1x1x128

        # Classification head
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        # Forward pass through the network
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.global_avg_pool(x)
        x = x.view(x.size(0), -1)  # Flatten
        x = self.classifier(x)
        return x


print("✅ EmotionCNN model class defined.")

✅ EmotionCNN model class defined.


# 🤔 Do We Really Need to Rebuild the Model to Run It?

**Yes, in this case we do.** Here's why:

When we saved our model in the first notebook, we stored its `state_dict()` inside a checkpoint dictionary (under the key `model_state_dict`, alongside metadata such as `num_classes` and `emotion_names`). A state dict holds only the model's **parameters** (the learned weights and biases), keyed by layer name. It **does not** save the model's architecture: the Python code that defines the layers (`Conv2d`, `Linear`, etc.) and the `forward` pass.

### The "Scaffold" Analogy 🏗️

Think of it like this:
1.  The `EmotionCNN` class is the **blueprint** for a building.
2.  `model = EmotionCNN()` creates an empty **scaffold** or skeleton of that building.
3.  `model.load_state_dict(...)` takes the saved materials (the weights) and **fills in the scaffold**.

Without the blueprint (`EmotionCNN` class), PyTorch wouldn't know where to put the saved weights.

### Why is this the recommended way?

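-   **Portability & Safety**: This is the approach the PyTorch documentation recommends. The file contains named tensors rather than a pickled model object, so it does not break when the class is moved or renamed. A plain state dict can also be loaded with `torch.load(..., weights_only=True)`, which avoids executing arbitrary pickled code.
-   **Flexibility**: You can change the model's code and still load older weights, as long as the parameter names and shapes still match. `load_state_dict(..., strict=False)` additionally tolerates missing or unexpected keys.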

So, redefining the class here isn't "rebuilding" the model from scratch; it's simply providing the necessary structure for PyTorch to load our pre-trained weights.
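
The round trip described above can be sketched in a few lines. This is a minimal illustration rather than the training notebook's actual code: `TinyNet` is a hypothetical stand-in for `EmotionCNN`, and an in-memory buffer stands in for `emotion_model.pth`. It assumes a PyTorch version that supports the `weights_only` argument.

```python
import io

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for EmotionCNN, kept tiny for illustration."""

    def __init__(self, num_classes=6):
        super().__init__()
        self.fc = nn.Linear(4, num_classes)

    def forward(self, x):
        return self.fc(x)


# "Training" side: bundle the weights with the metadata needed to rebuild the model.
trained = TinyNet(num_classes=6)
trained.eval()
checkpoint = {"model_state_dict": trained.state_dict(), "num_classes": 6}
buffer = io.BytesIO()  # in-memory stand-in for emotion_model.pth
torch.save(checkpoint, buffer)

# "Inference" side: the file holds tensors keyed by layer name, not the class itself.
buffer.seek(0)
loaded = torch.load(buffer, map_location="cpu", weights_only=True)
print(list(loaded["model_state_dict"].keys()))  # ['fc.weight', 'fc.bias']

# The class definition supplies the structure; load_state_dict fills in the weights.
restored = TinyNet(num_classes=loaded["num_classes"])
restored.load_state_dict(loaded["model_state_dict"])
restored.eval()

x = torch.ones(1, 4)
with torch.no_grad():
    same_output = torch.equal(trained(x), restored(x))
print(same_output)  # True
```

Because this toy checkpoint contains only tensors and an integer, it loads with `weights_only=True`. A checkpoint that stores other Python objects may need `weights_only=False`, which is only appropriate for files you trust.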

In [4]:
# =============================================================================
# LOAD THE TRAINED MODEL AND FACE DETECTOR
# =============================================================================

# --- Load Emotion Recognition Model ---
MODEL_PATH = "emotion_model.pth"


def load_emotion_model(model_path):
    """Load the trained emotion recognition model"""
    if not os.path.exists(model_path):
        print(f"❌ Error: Model file not found at {model_path}")
        print(
            "Please make sure you have run the training notebook (01_full_of_emotion.ipynb) first!"
        )
        return None, None

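    # Set weights_only=False because the checkpoint contains non-tensor data (like accuracy).
    # Note: weights_only=False unpickles arbitrary Python objects and can execute code,
    # so only load checkpoint files you created yourself or otherwise trust.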
    checkpoint = torch.load(
        model_path, map_location=torch.device("cpu"), weights_only=False
    )

    # Create model instance
    model = EmotionCNN(num_classes=checkpoint["num_classes"])
    model.load_state_dict(checkpoint["model_state_dict"])
    model.eval()  # Set model to evaluation mode

    print(f"✅ Model loaded successfully from {model_path}")
    return model, checkpoint["emotion_names"]


model, emotion_names = load_emotion_model(MODEL_PATH)

# --- Load Face Detector (Haar Cascade) ---
HAAR_CASCADE_URL = "https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/haarcascade_frontalface_default.xml"
HAAR_CASCADE_FILENAME = "haarcascade_frontalface_default.xml"

# Download the file if it doesn't exist
if not os.path.exists(HAAR_CASCADE_FILENAME):
    print(f"📥 Downloading face detector model: {HAAR_CASCADE_FILENAME}...")
    try:
        # A timeout keeps the notebook from hanging if the network is unavailable
        response = requests.get(HAAR_CASCADE_URL, timeout=30)
        response.raise_for_status()  # Treat HTTP error codes as failures
        with open(HAAR_CASCADE_FILENAME, "wb") as f:
            f.write(response.content)
        print("✅ Download complete.")
    except requests.RequestException as e:
        print(f"❌ Failed to download face detector: {e}")

# CascadeClassifier can return an empty classifier instead of raising,
# so check empty() before trusting it.
face_cascade = None
if os.path.exists(HAAR_CASCADE_FILENAME):
    face_cascade = cv2.CascadeClassifier(HAAR_CASCADE_FILENAME)
    if face_cascade.empty():
        face_cascade = None

if face_cascade is not None:
    print("✅ Face detector loaded successfully.")
else:
    print("❌ Face detector could not be loaded.")

✅ Model loaded successfully from emotion_model.pth
📥 Downloading face detector model: haarcascade_frontalface_default.xml...
✅ Download complete.
✅ Face detector loaded successfully.


# 🤔 Should We Use a Larger Bounding Box?

The face detection box can be quite tight around the face. There are pros and cons to expanding it.

### The Argument for a **Tighter** Box (What we had before)
-   **Consistency with Training Data**: Our model was trained on the FER-2013 dataset, which contains tightly cropped 48x48 images of faces. By feeding it a similarly tight crop, we are giving it input that most closely matches what it learned from.
-   **Reduces Noise**: A tight crop eliminates distracting background elements, clothing, or other objects that are not relevant to the facial expression.

### The Argument for a **Wider** Box (What we'll do now)
-   **More Context**: A slightly larger box might capture more of the head and hairline, which can sometimes provide subtle cues about an expression.
-   **Better Visuals**: A padded box often looks more natural and less claustrophobic on the screen.
-   **Robustness**: It can help if the initial face detection is slightly off-center, ensuring the full face is still captured.

### The Verdict? Let's Experiment!

The only way to know for sure is to try it. We'll add a `padding` factor to the code below. This will expand the area we crop for the model and the box we draw on the screen. You can easily adjust this factor to see how it impacts both the visual result and the model's predictions.
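
To make the effect of the factor concrete, here is the padding arithmetic in isolation. It is a small sketch that mirrors the calculation used in the detection loop; the helper name `pad_box` and the 640x480 frame size are assumptions made for illustration. Note that the padding is split between both sides, so a factor of 0.2 grows each edge by roughly 10% of the box size.

```python
def pad_box(x, y, w, h, frame_w, frame_h, padding_factor=0.2):
    """Expand a (x, y, w, h) box by padding_factor and clamp it to the frame.

    Returns the corner coordinates (x1, y1, x2, y2) of the padded box.
    """
    pad_w = int(w * padding_factor)
    pad_h = int(h * padding_factor)

    # Half of the padding goes on each side; max/min keep the box inside the frame.
    x1 = max(0, x - pad_w // 2)
    y1 = max(0, y - pad_h // 2)
    x2 = min(frame_w, x + w + pad_w // 2)
    y2 = min(frame_h, y + h + pad_h // 2)
    return x1, y1, x2, y2


# A 200x200 face in the middle of a 640x480 frame grows by 20 px on every side.
print(pad_box(100, 100, 200, 200, 640, 480))  # (80, 80, 320, 320)

# Near the top-left corner the box is clamped at 0 instead of going negative.
print(pad_box(10, 5, 200, 200, 640, 480))  # (0, 0, 230, 225)

# A factor of 0 reproduces the original tight box.
print(pad_box(100, 100, 200, 200, 640, 480, padding_factor=0.0))  # (100, 100, 300, 300)
```

Clamping means a face at the edge of the frame gets an asymmetric crop, which is then squashed to 48x48 by the resize step. That is worth keeping in mind when comparing predictions across padding factors.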

In [6]:
# =============================================================================
# REAL-TIME EMOTION DETECTION
# =============================================================================


def predict_emotion(model, image_array, emotion_names):
    """
    Predict emotion from a face image array (grayscale, 48x48)
    """
    # Preprocess image: Convert to tensor, normalize
    transform = transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize(mean=[0.5], std=[0.5])]
    )

    image = Image.fromarray(image_array)
    image_tensor = transform(image).unsqueeze(0)

    with torch.no_grad():
        output = model(image_tensor)
        probabilities = F.softmax(output, dim=1)
        pred_idx = torch.argmax(output, dim=1).item()
        confidence = probabilities[0][pred_idx].item()

    return emotion_names[pred_idx], confidence


# --- Main Loop ---
if model is not None and face_cascade is not None:
    cap = cv2.VideoCapture(0)  # 0 is the default webcam

    if not cap.isOpened():
        print("❌ Error: Could not open webcam.")
    else:
        print("🚀 Starting webcam feed... Press 'q' to quit.")

        while True:
            # Read frame from webcam
            ret, frame = cap.read()
            if not ret:
                break

            # Convert to grayscale for face detection
            gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

            # Detect faces
            faces = face_cascade.detectMultiScale(
                gray_frame, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30)
            )

            # Define a padding factor to expand the bounding box
            # (0.2 adds 20% to the width and height in total, i.e. about 10% per side)
            padding_factor = 0.2

            # Process each detected face
            for x, y, w, h in faces:
                # Calculate padding
                pad_w = int(w * padding_factor)
                pad_h = int(h * padding_factor)

                # Calculate new, padded coordinates, ensuring they are within frame bounds
                x1 = max(0, x - pad_w // 2)
                y1 = max(0, y - pad_h // 2)
                x2 = min(frame.shape[1], x + w + pad_w // 2)
                y2 = min(frame.shape[0], y + h + pad_h // 2)

                # Extract the face ROI (Region of Interest) with padding
                face_roi = gray_frame[y1:y2, x1:x2]

                # Skip if the ROI is empty (can happen at the edge of the screen)
                if face_roi.size == 0:
                    continue

                # Resize to 48x48 for the model
                resized_face = cv2.resize(face_roi, (48, 48))

                # Predict emotion
                emotion, confidence = predict_emotion(
                    model, resized_face, emotion_names
                )

                # --- Display results on the frame ---
                # Draw the padded rectangle around the face
                cv2.rectangle(
                    frame, (x1, y1), (x2, y2), (255, 0, 0), 2
                )  # Blue rectangle

                # Create text for display
                label = f"{emotion} ({confidence:.2f})"

                # Put text above the rectangle (clamped so it stays visible near the top edge)
                cv2.putText(
                    frame,
                    label,
                    (x1, max(y1 - 10, 20)),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.9,
                    (255, 0, 0),
                    2,
                )

            # Display the resulting frame
            cv2.imshow("Real-Time Emotion Detection", frame)

            # Break the loop if 'q' is pressed
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break

        # Release resources
        cap.release()
        cv2.destroyAllWindows()
        print("✅ Webcam feed stopped.")
else:
    print("⚠️ Skipping real-time detection due to missing model or face detector.")

🚀 Starting webcam feed... Press 'q' to quit.
✅ Webcam feed stopped.


> **Note on Display:** The `cv2.imshow()` function will open a new window to display the webcam feed. This is normal behavior. To close the feed, make sure the new window is active and press the 'q' key.

# 🔌 Game Integration: What's Next?

We now have a live emotion label! The final step is to send this information to our game engine.

**Current Output:**
- `emotion` (string): e.g., "Happy", "Sad", "Angry"
- `confidence` (float): e.g., 0.95

## 📡 Communication Methods

How can the game receive this data from our Python script?

1.  **Local Sockets (Recommended):**
    - Our Python script acts as a **server**.
    - The game acts as a **client**.
    - The script sends the emotion string over a local network connection (e.g., `localhost:12345`).
    - **Pros:** Real-time, low latency.
    - **Cons:** Requires network programming in the game engine.

2.  **Writing to a File:**
    - The Python script continuously writes the current emotion to a text file (e.g., `emotion.txt`).
    - The game continuously reads this file.
    - **Pros:** Very simple to implement.
    - **Cons:** Slower, potential for file access conflicts (less robust).

3.  **Web Server (Advanced):**
    - The Python script runs a lightweight web server (e.g., using Flask).
    - The game makes HTTP requests to get the latest emotion.
    - **Pros:** Flexible, can be used remotely.
    - **Cons:** More complex setup.
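
For option 2, the file access conflicts mentioned above can be reduced by writing to a temporary file and then renaming it over the target, so the game never reads a half-written file. The following is a minimal sketch under stated assumptions: the helper names `write_emotion_atomic` and `read_emotion` and the JSON payload format are hypothetical choices, not part of the project.

```python
import json
import os
import tempfile


def write_emotion_atomic(path, emotion, confidence):
    """Write the latest prediction to `path` via a temp file plus rename."""
    payload = json.dumps({"emotion": emotion, "confidence": round(confidence, 3)})
    directory = os.path.dirname(os.path.abspath(path))

    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(payload)
        os.replace(tmp_path, path)  # Swap the finished file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_emotion(path):
    """What the game side would do: read and parse the latest prediction."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "emotion.txt")
    write_emotion_atomic(demo_path, "Happy", 0.95)
    latest = read_emotion(demo_path)

print(latest)  # {'emotion': 'Happy', 'confidence': 0.95}
```

`os.replace` is atomic on POSIX systems. On Windows it can fail if the reader has the file open at that moment, so the game side may still need a retry.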

**For our project, a simple socket connection offers the best balance between real-time performance and implementation effort.**
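
The socket approach could look roughly like the sketch below. Everything here is an assumption made for illustration: the newline-delimited JSON message format, the helper names `encode_message` and `serve_once`, and the in-process client that stands in for the game. The demo binds to port 0 so the OS picks a free port; a real setup would agree on a fixed one such as `12345`.

```python
import json
import socket
import threading

HOST = "127.0.0.1"  # Loopback only: the game runs on the same machine


def encode_message(emotion, confidence):
    """Encode one prediction as a single line of JSON (assumed wire format)."""
    message = {"emotion": emotion, "confidence": round(confidence, 3)}
    return (json.dumps(message) + "\n").encode("utf-8")


def serve_once(server_sock, predictions):
    """Accept one client and send it each (emotion, confidence) pair."""
    conn, _addr = server_sock.accept()
    with conn:
        for emotion, confidence in predictions:
            conn.sendall(encode_message(emotion, confidence))


# --- Server side (the Python script) ---
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind((HOST, 0))  # Port 0 lets the OS choose a free port for this demo
server.listen(1)
port = server.getsockname()[1]

# In the real loop these would come from predict_emotion(); here they are fixed.
demo_predictions = [("Happy", 0.95), ("Sad", 0.61)]
worker = threading.Thread(target=serve_once, args=(server, demo_predictions))
worker.start()

# --- Client side (stand-in for the game) ---
with socket.create_connection((HOST, port), timeout=5) as client:
    with client.makefile("r", encoding="utf-8") as reader:
        received = [json.loads(line) for line in reader]  # Ends when the server closes

worker.join()
server.close()
print(received)
```

Sending one JSON object per line keeps parsing simple on the game side, since most engines can read a line and decode JSON. In the live loop, the `sendall` call would sit right after the prediction step, and it may be worth sending only when the emotion changes to avoid flooding the game.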