# GESCAM Inference Demo

This notebook demonstrates how to use a trained GESCAM model for gaze prediction in classroom settings.

## Setup

First, let's import the necessary libraries:

In [None]:


import os
import sys
import torch
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import cv2
from tqdm.notebook import tqdm
import urllib.request

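Next, make sure the inference script is available. The cell below does not download anything: if `gescam_inference.py` is missing, it only writes a placeholder file, so paste the script's contents into it (or copy the script into this directory) before running the import cell: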


In [None]:
# Check if the inference script exists, if not create it
inference_script_path = 'gescam_inference.py'

if not os.path.exists(inference_script_path):
    # Nothing is downloaded here: this only writes a placeholder file.
    # Replace its contents with the real gescam_inference.py before importing.
    print("Creating placeholder inference script...")
    with open(inference_script_path, 'w') as f:
        f.write("""
# Paste the entire content of the gescam_inference.py script here
""")
    print("Placeholder created. Paste the script contents into it before continuing.")


Now import the GazeInference class from the script:

In [None]:
# Add the current directory to path if needed
if '.' not in sys.path:
    sys.path.append('.')

# Import the GazeInference class
from gescam_inference import GazeInference


## Load Model

Load your trained model:



In [None]:
# Path to your model
model_path = 'path/to/your/model.pt'  # Update this with your model path

# Initialize inference module
inference = GazeInference(model_path)


## Process a Single Image

Let's test the model with a single image:

In [None]:
# Path to test image (update with your image)
test_image_path = 'path/to/your/test/image.jpg'

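# Load the image as RGB (grayscale or RGBA files would break the
# RGB2GRAY conversion used for face detection) and display it
img = Image.open(test_image_path).convert('RGB')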
plt.figure(figsize=(10, 6))
plt.imshow(np.array(img))
plt.title("Test Image")
plt.axis('off')
plt.show()


Now let's detect faces in the image:

In [None]:
# Load face detector
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

# Detect faces
img_array = np.array(img)
gray = cv2.cvtColor(img_array, cv2.COLOR_RGB2GRAY)
faces = face_cascade.detectMultiScale(gray, 1.1, 4)

# Convert to normalized coordinates
height, width = img_array.shape[:2]
head_boxes = []

for (x, y, w, h) in faces:
    # Convert to normalized coordinates
    x1, y1 = x / width, y / height
    x2, y2 = (x + w) / width, (y + h) / height
    
    head_boxes.append([x1, y1, x2, y2])

# Display image with face detection boxes
plt.figure(figsize=(10, 6))
plt.imshow(img_array)
for head_bbox in head_boxes:
    x1, y1, x2, y2 = head_bbox
    # Convert to pixel coordinates for plotting
    x1, x2 = x1 * width, x2 * width
    y1, y2 = y1 * height, y2 * height
    plt.gca().add_patch(plt.Rectangle((x1, y1), x2-x1, y2-y1,
                               fill=False, edgecolor='green', linewidth=2))
plt.title(f"Detected {len(head_boxes)} faces")
plt.axis('off')
plt.show()

# If no faces detected, use a default box
if not head_boxes:
    print("No faces detected, using default box")
    head_boxes = [[0.4, 0.4, 0.6, 0.6]]  # Default box in the center
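
The pixel-to-normalized conversion above is repeated later in the video tracker, so it can be factored into a small helper. The following is a minimal sketch; `normalize_boxes` is a hypothetical name introduced here for illustration and is not part of `gescam_inference.py`.

```python
def normalize_boxes(faces, width, height):
    """Convert (x, y, w, h) pixel boxes to [x1, y1, x2, y2] in the 0-1 range."""
    boxes = []
    for (x, y, w, h) in faces:
        boxes.append([
            float(x) / width,       # left edge
            float(y) / height,      # top edge
            float(x + w) / width,   # right edge
            float(y + h) / height,  # bottom edge
        ])
    return boxes


# One hypothetical detection in a 640x480 image
print(normalize_boxes([(160, 120, 320, 240)], width=640, height=480))
# [[0.25, 0.25, 0.75, 0.75]]
```

With this helper, the loop over `faces` reduces to `head_boxes = normalize_boxes(faces, width, height)`.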


Now let's predict the gaze for each detected face:

In [None]:
for i, head_bbox in enumerate(head_boxes):
    # Predict gaze
    gaze_heatmap, in_frame_prob, visualization = inference.predict(
        img, head_bbox
    )
    
    # Display the visualization
    plt.figure(figsize=(12, 8))
    plt.imshow(visualization)
    plt.title(f"Face {i+1}: Gaze Prediction (In-frame probability: {in_frame_prob:.2f})")
    plt.axis('off')
    plt.show()
    
    # You can also access the raw heatmap separately
    plt.figure(figsize=(6, 6))
    plt.imshow(gaze_heatmap, cmap='jet')
    plt.title(f"Raw Gaze Heatmap for Face {i+1}")
    plt.axis('off')
    plt.colorbar()
    plt.show()
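
If you need a single gaze point rather than a full heatmap, one option is to take the location of the heatmap maximum. The sketch below assumes `gaze_heatmap` is a 2D array (or something `np.asarray` can convert to one); `heatmap_peak` is a hypothetical helper, not part of the inference script.

```python
import numpy as np


def heatmap_peak(heatmap):
    """Return the (x, y) location of the heatmap maximum, normalized to 0-1."""
    heatmap = np.asarray(heatmap)
    rows, cols = heatmap.shape
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    # Use the center of the peak cell so the point lies inside the image
    return float((col + 0.5) / cols), float((row + 0.5) / rows)


# Synthetic 64x64 heatmap with a single hot cell at row 16, column 48
demo = np.zeros((64, 64))
demo[16, 48] = 1.0
print(heatmap_peak(demo))
# (0.7578125, 0.2578125)
```

The returned point uses the same normalized convention as the head boxes, so it can be scaled by the image width and height for plotting.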


## Process a Video

Let's also try processing a video:

In [None]:

# Path to test video (update with your video)
test_video_path = 'path/to/your/test/video.mp4'

# Output video path
output_video_path = 'gaze_prediction_output.mp4'

# Create a face tracker function
def face_tracker(frame_cascade):
    def tracker(frame):
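        # Convert to grayscale for face detection.
        # This assumes process_video passes RGB frames; if it passes raw
        # OpenCV (BGR) frames, use cv2.COLOR_BGR2GRAY instead.
        gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)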
        
        # Detect faces
        faces = frame_cascade.detectMultiScale(gray, 1.1, 4)
        
        # Convert to normalized coordinates
        height, width = frame.shape[:2]
        head_boxes = []
        
        for (x, y, w, h) in faces:
            # Convert to normalized coordinates
            x1, y1 = x / width, y / height
            x2, y2 = (x + w) / width, (y + h) / height
            
            head_boxes.append([x1, y1, x2, y2])
        
        # If no faces detected, use a default box
        if not head_boxes:
            head_boxes = [[0.4, 0.4, 0.6, 0.6]]  # Default box
            
        return head_boxes
    
    return tracker

# Process video
inference.process_video(
    test_video_path,
    output_path=output_video_path,
    head_tracker=face_tracker(face_cascade),
    detector=None,  # No object detector for simplicity
    sample_rate=5  # Process every 5th frame for speed
)

# Report where the video was saved and show it inline if possible
if os.path.exists(output_video_path):
    print(f"Video saved to: {output_video_path}")
    # Display the video inline when running in a notebook
    try:
        from IPython.display import Video, display
        display(Video(output_video_path, width=640))
    except ImportError:
        print("IPython is not available. Cannot display the video inline.")
else:
    print("Error: Output video was not created.")
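
To get a feel for how much work `sample_rate` saves, the sketch below lists which frames would be processed. It assumes sampling starts at frame 0 and takes every `sample_rate`-th frame after that; the actual rule inside `process_video` may differ, and `sampled_frame_indices` is a hypothetical helper used only for illustration.

```python
def sampled_frame_indices(total_frames, sample_rate):
    """Frame indices processed under the assumed 'every Nth frame' rule."""
    if sample_rate < 1:
        raise ValueError("sample_rate must be at least 1")
    return list(range(0, total_frames, sample_rate))


# A hypothetical 10-second clip at 30 fps (300 frames)
indices = sampled_frame_indices(total_frames=300, sample_rate=5)
print(len(indices), indices[:4])
# 60 [0, 5, 10, 15]
```

Under that assumption, `sample_rate=5` cuts the number of model calls to one fifth, at the cost of coarser temporal resolution.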




## Conclusion

This notebook demonstrated how to use the GESCAM model for gaze prediction. The model can be used with both images and videos, and can be integrated with face detection for automatic head tracking.

For best results, consider:
1. Using a specialized head detector instead of a face detector
2. Integrating an object detector for classroom objects
3. Fine-tuning the model on your specific classroom environment
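
Until a dedicated head detector is in place, a rough stopgap for the first point is to enlarge each face box so that it covers more of the head. This is only a heuristic sketch, not a substitute for a head detector: `expand_box` and its default `margin` are assumptions chosen for illustration.

```python
def expand_box(box, margin=0.3):
    """Grow a normalized [x1, y1, x2, y2] box by `margin` of its size per side."""
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * margin
    dy = (y2 - y1) * margin
    # Clip so the expanded box stays inside the image
    return [
        max(0.0, x1 - dx),
        max(0.0, y1 - dy),
        min(1.0, x2 + dx),
        min(1.0, y2 + dy),
    ]


# Expand the default center box used earlier in this notebook
print(expand_box([0.4, 0.4, 0.6, 0.6], margin=0.5))
```

The expanded boxes can be passed to `inference.predict` in place of the raw face boxes; the margin that works best will depend on your camera angle and face detector.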