# Eye-Tracking Data Processing and Visualization Pipeline

Scripts were developed with the help of Claude 3.5 Sonnet.

This collection of scripts processes eye-tracking data from driving simulations:

Data Processing:
- Normalizes raw gaze coordinates from various screen sizes to the videos' native 1280x960 resolution
- Handles letterboxing/pillarboxing offsets across different displays
- Filters out low-quality data in which the gaze sat at the screen edges more than 50% of the time
- Separates data into preprocessedAndNormalized.csv and badgazedata.csv
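A minimal sketch of how the >50% edge rule could be expressed with pandas. It assumes coordinates are already normalized to 1280x960 and uses the `userId`, `videoId`, `x`, `y` columns seen elsewhere in this notebook. The 10 px `margin` that counts as "at the edge" and the per-(user, video) grouping are hypothetical choices, not the notebook's confirmed definition:

```python
import pandas as pd

def split_edge_heavy_sessions(df, width=1280, height=960, margin=10, threshold=0.5):
    """Split gaze rows into (kept, rejected) using a per-session edge rule.

    A (userId, videoId) session is rejected when more than `threshold` of its
    samples lie within `margin` pixels of the frame border. `margin` and the
    session grouping are hypothetical, chosen for illustration only.
    """
    near_edge = (
        (df["x"] <= margin) | (df["x"] >= width - margin)
        | (df["y"] <= margin) | (df["y"] >= height - margin)
    ).astype(float)
    # Share of edge samples in each row's session, broadcast back to every row
    edge_share = near_edge.groupby([df["userId"], df["videoId"]]).transform("mean")
    rejected = edge_share > threshold
    return df[~rejected], df[rejected]

# Toy data: user "a" is at the left edge 3 of 4 times, user "b" only once
demo = pd.DataFrame({
    "userId": ["a"] * 4 + ["b"] * 4,
    "videoId": ["v1"] * 8,
    "x": [0, 0, 0, 640, 640, 640, 640, 1280],
    "y": [480] * 8,
})
kept, rejected = split_edge_heavy_sessions(demo)
```

Because normalization clips off-screen gaze to exactly 0 or the frame size, even a small margin also catches samples that fell outside the video area.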

Visualization Features:
- Creates one video per user and driving clip, animating that user's gaze
- Uses cubic interpolation for smooth transitions between gaze samples
- Displays the gaze as a glowing hot-pink dot overlaid on the driving footage
- Color-codes points by the viewer's original screen size, with a legend
- Labels each normalized point with its original screen coordinates for verification

Video Players:
- General player: plays a randomly chosen driving video with the gaze overlay
- Hazard-specific player: draws only from videos in which at least one participant reported a hazard
- Displays metadata such as hazard severity, video ID, and timestamps
- Includes playback controls (pause/play/quit)

File Generation:
- Creates separate MP4 files named {user_email}_{video_id}.mp4
- Preserves original coordinate information and metadata
- Provides detailed processing summaries and progress tracking
- Includes error handling and data validation

Usage:
- Press 'p' to pause video playback; press any key to resume
- Press 'q' to quit video playback


# Dependencies 

In [167]:
!pip install pandas numpy opencv-python scipy



# Driving Video Gaze Visualization Player 
Randomly selects a driving video and plays it with the recorded eye-tracking data overlaid, showing where participants were looking as the clip plays.

In [113]:
import pandas as pd
import cv2
import numpy as np
import random
import os
from pathlib import Path

class DrivingVideoPlayer:
    def __init__(self, video_folder_path, csv_path):
        """
        Initialize the video player with paths to video folder and gaze data CSV
        
        Args:
            video_folder_path (str): Path to folder containing driving videos
            csv_path (str): Path to CSV file containing gaze data
        """
        self.video_folder = Path(video_folder_path)
        self.gaze_data = pd.read_csv(csv_path)
        self.available_videos = list(self.video_folder.glob('*.mp4'))  # Adjust extension if needed
        
    def get_random_video(self):
        """Select a random video from the folder"""
        if not self.available_videos:
            raise FileNotFoundError("No videos found in specified folder")
        return random.choice(self.available_videos)
    
    def get_gaze_data_for_video(self, video_id):
        """
        Get all gaze data points for a specific video
        
        Args:
            video_id (str): ID of the video to get gaze data for
            
        Returns:
            pd.DataFrame: Filtered gaze data for the specified video
        """
        return self.gaze_data[self.gaze_data['videoId'] == video_id].sort_values('time')
    
    def draw_gaze_points(self, frame, gaze_points, current_time, time_window=0.5):
        """
        Draw gaze points on the frame
        
        Args:
            frame: Current video frame
            gaze_points (pd.DataFrame): Gaze data for the current video
            current_time (float): Current time in the video
            time_window (float): Time window to show gaze points (in seconds)
        """
        # Filter gaze points within the time window
        current_points = gaze_points[
            (gaze_points['time'] >= current_time - time_window) & 
            (gaze_points['time'] <= current_time)
        ]
        
        # Draw each gaze point
        for _, point in current_points.iterrows():
            x, y = int(point['x']), int(point['y'])
            # Draw point with user ID
            cv2.circle(frame, (x, y), 5, (0, 255, 0), -1)
            cv2.putText(frame, f"User: {point['userId']}", 
                       (x + 10, y - 10), 
                       cv2.FONT_HERSHEY_SIMPLEX, 
                       0.5, (0, 255, 0), 1)
    
    def play_random_video(self):
        """Play a random video with overlaid gaze data"""
        # Get random video
        video_path = self.get_random_video()
        video_id = video_path.stem  # Assumes filename matches video ID in CSV
        
        # Get gaze data for this video
        video_gaze_data = self.get_gaze_data_for_video(video_id)
        
        # Open video
        cap = cv2.VideoCapture(str(video_path))
        if not cap.isOpened():
            raise ValueError(f"Could not open video: {video_path}")
        
        fps = cap.get(cv2.CAP_PROP_FPS)
        frame_count = 0
        
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
                
            current_time = frame_count / fps
            
            # Draw gaze points for current time
            self.draw_gaze_points(frame, video_gaze_data, current_time)
            
            # Add time indicator
            cv2.putText(frame, f"Time: {current_time:.2f}s", 
                       (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 
                       1, (255, 255, 255), 2)
            
            # Show frame
            cv2.imshow('Driving Video with Gaze Data', frame)
            
            # Handle keyboard input
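            # Wait roughly one frame period so playback runs near the video's native speed
            key = cv2.waitKey(max(1, int(1000 / fps))) & 0xFF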
            if key == ord('q'):  
                break
            elif key == ord('p'):  
                cv2.waitKey(0)
            
            frame_count += 1
        
        cap.release()
        cv2.destroyAllWindows()

def main():
    
    video_folder = "/Users/lennoxanderson/Documents/machineLearning/data/TeslaRawDrivingFootage/SplitData/1-4BatchSplits"  # Replace with your video folder path
    csv_path = "/Users/lennoxanderson/Documents/Research/Human-Alignment-Hazardous-Driving-Detection/final_user_survey_data.csv"  # Replace with your CSV file path
    
    try:
        player = DrivingVideoPlayer(video_folder, csv_path)
        player.play_random_video()
    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    main()
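`draw_gaze_points` re-filters the whole gaze DataFrame with a boolean mask on every frame. Since `get_gaze_data_for_video` already sorts by `time`, a binary search over the time column should return the same rows far more cheaply. A minimal sketch; `window_bounds` is a hypothetical helper, not part of the notebook:

```python
import numpy as np
import pandas as pd

def window_bounds(times, current_time, time_window=0.5):
    """Row positions [start, stop) of samples with
    current_time - time_window <= time <= current_time.

    `times` must be sorted ascending, which get_gaze_data_for_video
    guarantees via sort_values('time').
    """
    start = np.searchsorted(times, current_time - time_window, side="left")
    stop = np.searchsorted(times, current_time, side="right")
    return start, stop

# Toy gaze table (column names follow the CSV used above)
gaze = pd.DataFrame({
    "time": [0.0, 0.2, 0.4, 0.6, 1.0],
    "x": [100, 200, 300, 400, 500],
    "y": [50, 60, 70, 80, 90],
})
times = gaze["time"].to_numpy()  # extract once per video, not once per frame
start, stop = window_bounds(times, current_time=0.6)
current_points = gaze.iloc[start:stop]
```

Using `side="left"` for the lower bound and `side="right"` for the upper bound keeps both ends inclusive, matching the `>=` / `<=` mask in the original method.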

# Hazardous Driving Video Analyzer 
Randomly selects a driving video that at least one participant flagged as hazardous, then plays it with the hazard severity and the participants' recorded eye-tracking data overlaid.

In [143]:
import pandas as pd
import cv2
import numpy as np
import random
import os
from pathlib import Path

class DrivingVideoPlayer:
    def __init__(self, video_folder_path, csv_path):
        """
        Initialize the video player with paths to video folder and gaze data CSV
        
        Args:
            video_folder_path (str): Path to folder containing driving videos
            csv_path (str): Path to CSV file containing gaze data
        """
        self.video_folder = Path(video_folder_path)
        self.gaze_data = pd.read_csv(csv_path)
        # Get unique videos that had hazards detected
        hazard_videos = self.gaze_data[self.gaze_data['hazardDetected'] == True]['videoId'].unique()
        # Filter available videos to only those with hazards
        self.available_videos = [
            video for video in self.video_folder.glob('*.mp4')
            if video.stem in hazard_videos
        ]
        print(f"Found {len(self.available_videos)} videos with hazards")
        
    def get_random_hazard_video(self):
        """Select a random video from the folder that contains a hazard"""
        if not self.available_videos:
            raise FileNotFoundError("No videos with hazards found in specified folder")
        return random.choice(self.available_videos)
    
    def get_gaze_data_for_video(self, video_id):
        """
        Get all gaze data points for a specific video
        
        Args:
            video_id (str): ID of the video to get gaze data for
            
        Returns:
            pd.DataFrame: Filtered gaze data for the specified video
        """
        return self.gaze_data[self.gaze_data['videoId'] == video_id].sort_values('time')
    
    def draw_gaze_points(self, frame, gaze_points, current_time, time_window=0.5):
        """
        Draw gaze points on the frame
        
        Args:
            frame: Current video frame
            gaze_points (pd.DataFrame): Gaze data for the current video
            current_time (float): Current time in the video
            time_window (float): Time window to show gaze points (in seconds)
        """
        # Filter gaze points within the time window
        current_points = gaze_points[
            (gaze_points['time'] >= current_time - time_window) & 
            (gaze_points['time'] <= current_time)
        ]
        
        # Draw each gaze point
        for _, point in current_points.iterrows():
            x, y = int(point['x']), int(point['y'])
            # Draw point with user ID
            cv2.circle(frame, (x, y), 5, (0, 255, 0), -1)
            cv2.putText(frame, f"User: {point['userId']}", 
                       (x + 10, y - 10), 
                       cv2.FONT_HERSHEY_SIMPLEX, 
                       0.5, (0, 255, 0), 1)
    
    def play_random_hazard_video(self):
        """Play a random video containing a hazard with overlaid gaze data"""
        # Get random video with hazard
        video_path = self.get_random_hazard_video()
        video_id = video_path.stem
        
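        # Get video metadata from a row that reported the hazard, so the severity
        # shown is not taken from a participant who saw no hazard
        video_rows = self.gaze_data[self.gaze_data['videoId'] == video_id]
        hazard_rows = video_rows[video_rows['hazardDetected'] == True]
        video_metadata = (hazard_rows if not hazard_rows.empty else video_rows).iloc[0]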
        hazard_severity = video_metadata['hazardSeverity']
        
        # Get gaze data for this video
        video_gaze_data = self.get_gaze_data_for_video(video_id)
        
        # Open video
        cap = cv2.VideoCapture(str(video_path))
        if not cap.isOpened():
            raise ValueError(f"Could not open video: {video_path}")
        
        fps = cap.get(cv2.CAP_PROP_FPS)
        frame_count = 0
        
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
                
            current_time = frame_count / fps
            
            # Draw gaze points for current time
            self.draw_gaze_points(frame, video_gaze_data, current_time)
            
            # Add time indicator and hazard info
            cv2.putText(frame, f"Time: {current_time:.2f}s", 
                       (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 
                       1, (255, 255, 255), 2)
            cv2.putText(frame, f"Video ID: {video_id}", 
                       (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 
                       1, (255, 255, 255), 2)
            cv2.putText(frame, f"Hazard Severity: {hazard_severity}", 
                       (10, 90), cv2.FONT_HERSHEY_SIMPLEX, 
                       1, (255, 255, 255), 2)
            
            # Show frame
            cv2.imshow('Driving Video with Hazard and Gaze Data', frame)
            
            # Handle keyboard input
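            # Wait roughly one frame period so playback runs near the video's native speed
            key = cv2.waitKey(max(1, int(1000 / fps))) & 0xFF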
            if key == ord('q'):  # Press 'q' to quit
                break
            elif key == ord('p'):  # Press 'p' to pause/unpause
                cv2.waitKey(0)
            
            frame_count += 1
        
        cap.release()
        cv2.destroyAllWindows()

def main():
    video_folder = "/Users/lennoxanderson/Documents/machineLearning/data/TeslaRawDrivingFootage/SplitData/1-4BatchSplits"
    csv_path = "/Users/lennoxanderson/Documents/Research/Human-Alignment-Hazardous-Driving-Detection/final_user_survey_data.csv"
    
    try:
        player = DrivingVideoPlayer(video_folder, csv_path)
        player.play_random_hazard_video()
    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    main()

Found 130 videos with hazards
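The hazard filter above keeps any video in which at least one row has `hazardDetected == True`, so a clip flagged by a single participant counts the same as one flagged by everyone. A sketch that summarizes the reports per video, assuming the `userId`, `videoId`, `hazardDetected` and `hazardSeverity` columns used by the players; `summarize_hazard_reports` is a hypothetical helper:

```python
import pandas as pd

def summarize_hazard_reports(gaze_data):
    """One row per video: distinct users who flagged a hazard and the
    highest severity among those reports.

    Assumes one row per gaze sample with userId, videoId,
    hazardDetected (bool) and hazardSeverity columns.
    """
    flagged = gaze_data[gaze_data["hazardDetected"] == True]
    return (
        flagged.groupby("videoId")
        .agg(users_reporting=("userId", "nunique"),
             max_severity=("hazardSeverity", "max"))
        .sort_values("users_reporting", ascending=False)
    )

demo = pd.DataFrame({
    "userId": ["u1", "u1", "u2", "u3", "u3"],
    "videoId": ["v1", "v1", "v1", "v2", "v3"],
    "hazardDetected": [True, True, True, True, False],
    "hazardSeverity": [3, 3, 5, 2, 0],
})
summary = summarize_hazard_reports(demo)
```

Counting `nunique` users rather than rows matters here because each user contributes many gaze samples per video.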


# Normalized Hazard Video Gaze Visualizer
Plays hazardous driving videos with color-coded eye-tracking data that has been normalized from each viewer's screen size to the video's native resolution (1280x960), so gaze patterns are directly comparable regardless of the viewer's screen dimensions.

In [163]:
import pandas as pd
import cv2
import numpy as np
import random
import os
from pathlib import Path

class DrivingVideoPlayer:
    def __init__(self, video_folder_path, csv_path):
        """
        Initialize the video player with paths to video folder and gaze data CSV
        """
        self.video_folder = Path(video_folder_path)
        self.gaze_data = pd.read_csv(csv_path)
        
        # Video dimensions (both original and target are the same)
        self.video_width = 1280
        self.video_height = 960
        
        # Normalize all gaze coordinates
        self.normalize_gaze_coordinates()
        
        # Get unique videos that had hazards detected
        hazard_videos = self.gaze_data[self.gaze_data['hazardDetected'] == True]['videoId'].unique()
        self.available_videos = [
            video for video in self.video_folder.glob('*.mp4')
            if video.stem in hazard_videos
        ]
        print(f"Found {len(self.available_videos)} videos with hazards")
    
    def calculate_video_display_size(self, screen_width, screen_height):
        """
        Calculate how the video would be displayed on a given screen size
        while maintaining aspect ratio
        """
        video_aspect = self.video_width / self.video_height
        screen_aspect = screen_width / screen_height
        
        if screen_aspect > video_aspect:
            # Height limited
            display_height = screen_height
            display_width = display_height * video_aspect
        else:
            # Width limited
            display_width = screen_width
            display_height = display_width / video_aspect
            
        return display_width, display_height
    
    def normalize_gaze_coordinates(self):
        """
        Normalize all gaze coordinates to the original video resolution (1280x960)
        accounting for how the video was displayed on different screen sizes
        """
        normalized_coordinates = []
        
        for _, row in self.gaze_data.iterrows():
            # Calculate how video was displayed on user's screen
            display_width, display_height = self.calculate_video_display_size(
                row['width'], row['height']
            )
            
            # Calculate the black bars (letterbox/pillarbox) offset
            x_offset = (row['width'] - display_width) / 2
            y_offset = (row['height'] - display_height) / 2
            
            # Remove the offset from the gaze coordinates
            adjusted_x = row['x'] - x_offset
            adjusted_y = row['y'] - y_offset
            
            # Convert from display coordinates to video coordinates
            video_x = (adjusted_x / display_width) * self.video_width
            video_y = (adjusted_y / display_height) * self.video_height
            
            # Store original values for reference
            row['original_x'] = row['x']
            row['original_y'] = row['y']
            
            # Update coordinates with normalized values
            row['x'] = np.clip(video_x, 0, self.video_width)
            row['y'] = np.clip(video_y, 0, self.video_height)
            
            normalized_coordinates.append(row)
        
        self.gaze_data = pd.DataFrame(normalized_coordinates)
        print("Normalized gaze coordinates to video resolution (1280x960)")

    def get_random_hazard_video(self):
        """Select a random video from the folder that contains a hazard"""
        if not self.available_videos:
            raise FileNotFoundError("No videos with hazards found in specified folder")
        return random.choice(self.available_videos)
    
    def get_gaze_data_for_video(self, video_id):
        """Get all gaze data points for a specific video"""
        return self.gaze_data[self.gaze_data['videoId'] == video_id].sort_values('time')
    
    def draw_gaze_points(self, frame, gaze_points, current_time, time_window=0.5):
        """Draw gaze points on the frame"""
        # Filter gaze points within the time window
        current_points = gaze_points[
            (gaze_points['time'] >= current_time - time_window) & 
            (gaze_points['time'] <= current_time)
        ]
        
        # Draw each gaze point
        for _, point in current_points.iterrows():
            x, y = int(point['x']), int(point['y'])
            
            # Color based on original screen size
            original_size = f"{int(point['width'])}x{int(point['height'])}"
            color_hash = hash(original_size) % 255
            point_color = (color_hash, 255, (color_hash + 125) % 255)
            
            # Draw point
            cv2.circle(frame, (x, y), 5, point_color, -1)
            
            # Draw information about the point
            cv2.putText(frame, f"User: {point['userId']}", 
                       (x + 10, y - 25), 
                       cv2.FONT_HERSHEY_SIMPLEX, 
                       0.5, point_color, 1)
            cv2.putText(frame, f"Screen: {original_size}", 
                       (x + 10, y - 10), 
                       cv2.FONT_HERSHEY_SIMPLEX, 
                       0.5, point_color, 1)
            
            # Draw original coordinates for verification
            if 'original_x' in point:
                cv2.putText(frame, 
                           f"Orig: ({int(point['original_x'])},{int(point['original_y'])})", 
                           (x + 10, y + 5), 
                           cv2.FONT_HERSHEY_SIMPLEX, 
                           0.5, point_color, 1)

    def play_random_hazard_video(self):
        """Play a random video containing a hazard with overlaid gaze data"""
        video_path = self.get_random_hazard_video()
        video_id = video_path.stem
        
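        # Get video metadata from a row that reported the hazard, so the severity
        # shown is not taken from a participant who saw no hazard
        video_rows = self.gaze_data[self.gaze_data['videoId'] == video_id]
        hazard_rows = video_rows[video_rows['hazardDetected'] == True]
        video_metadata = (hazard_rows if not hazard_rows.empty else video_rows).iloc[0]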
        hazard_severity = video_metadata['hazardSeverity']
        
        # Get gaze data for this video
        video_gaze_data = self.get_gaze_data_for_video(video_id)
        
        # Get unique screen sizes for legend
        screen_sizes = video_gaze_data[['width', 'height']].drop_duplicates()
        
        # Open video
        cap = cv2.VideoCapture(str(video_path))
        if not cap.isOpened():
            raise ValueError(f"Could not open video: {video_path}")
        
        fps = cap.get(cv2.CAP_PROP_FPS)
        frame_count = 0
        
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
                
            current_time = frame_count / fps
            
            # Draw gaze points for current time
            self.draw_gaze_points(frame, video_gaze_data, current_time)
            
            # Add information overlay
            cv2.putText(frame, f"Time: {current_time:.2f}s", 
                       (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 
                       1, (255, 255, 255), 2)
            cv2.putText(frame, f"Video ID: {video_id}", 
                       (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 
                       1, (255, 255, 255), 2)
            cv2.putText(frame, f"Hazard Severity: {hazard_severity}", 
                       (10, 90), cv2.FONT_HERSHEY_SIMPLEX, 
                       1, (255, 255, 255), 2)
            
            # Add screen size legend
            y_offset = 120
            cv2.putText(frame, "Original Screen Sizes:", 
                       (10, y_offset), cv2.FONT_HERSHEY_SIMPLEX, 
                       0.7, (255, 255, 255), 2)
            for _, size in screen_sizes.iterrows():
                y_offset += 25
                size_str = f"{int(size['width'])}x{int(size['height'])}"
                color_hash = hash(size_str) % 255
                color = (color_hash, 255, (color_hash + 125) % 255)
                cv2.putText(frame, size_str, 
                           (30, y_offset), cv2.FONT_HERSHEY_SIMPLEX, 
                           0.6, color, 2)
            
            # Show frame
            cv2.imshow('Driving Video with Normalized Gaze Data', frame)
            
            # Handle keyboard input
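            # Wait roughly one frame period so playback runs near the video's native speed
            key = cv2.waitKey(max(1, int(1000 / fps))) & 0xFF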
            if key == ord('q'):
                break
            elif key == ord('p'):
                cv2.waitKey(0)
            
            frame_count += 1
        
        cap.release()
        cv2.destroyAllWindows()

def main():
    video_folder = "/Users/lennoxanderson/Documents/machineLearning/data/TeslaRawDrivingFootage/SplitData/1-4BatchSplits"
    csv_path = "/Users/lennoxanderson/Documents/Research/Human-Alignment-Hazardous-Driving-Detection/final_user_survey_data.csv"
    
    try:
        player = DrivingVideoPlayer(video_folder, csv_path)
        player.play_random_hazard_video()
    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    main()

Normalized gaze coordinates to video resolution (1280x960)
Found 130 videos with hazards
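Python salts `str` hashes per interpreter process (unless `PYTHONHASHSEED` is set), so the `hash(original_size) % 255` colors in this visualizer agree between points and legend within one run but change from run to run. A sketch of a stable alternative that keeps the same `(h, 255, (h + 125) % 255)` color style; `screen_size_color` is a hypothetical helper:

```python
import hashlib

def screen_size_color(width, height):
    """Stable BGR color for a screen size such as 1920x1080.

    hashlib digests do not depend on the interpreter's hash seed, so the
    same size maps to the same color in every session.
    """
    size_str = f"{int(width)}x{int(height)}"
    digest = hashlib.sha256(size_str.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big") % 255
    return (h, 255, (h + 125) % 255)

legend_color = screen_size_color(1920, 1080)
point_color = screen_size_color(1920.0, 1080.0)  # float CSV values map to the same key
```

Calling the same helper from both `draw_gaze_points` and the legend loop would also remove the duplicated color formula.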


# Gaze Data Screen Size Normalizer
Reads raw eye-tracking data collected from various screen sizes and normalizes all gaze coordinates to match the original video dimensions (1280x960), accounting for letterboxing and pillarboxing while preserving a detailed record of the transformations applied.
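Written out, the mapping applied by `calculate_video_display_size` and `normalize_gaze_data` is as follows, for a screen of $W_s \times H_s$ pixels and the 1280x960 video (aspect $4/3$), under the code's own assumption that the video was shown as large as possible and centered:

$$
(d_w, d_h) =
\begin{cases}
\left(\tfrac{4}{3} H_s,\; H_s\right) & \text{if } W_s / H_s > \tfrac{4}{3} \\
\left(W_s,\; \tfrac{3}{4} W_s\right) & \text{otherwise}
\end{cases}
\qquad
o_x = \frac{W_s - d_w}{2}, \quad o_y = \frac{H_s - d_h}{2}
$$

$$
x_v = \operatorname{clip}\!\left(\frac{x - o_x}{d_w}\cdot 1280,\ 0,\ 1280\right), \qquad
y_v = \operatorname{clip}\!\left(\frac{y - o_y}{d_h}\cdot 960,\ 0,\ 960\right)
$$

Worked example for a 1920x1080 screen: $1920/1080 \approx 1.78 > 4/3$, so $d_h = 1080$, $d_w = 1440$, $o_x = 240$, $o_y = 0$. A gaze at $(960, 540)$ maps to $x_v = \tfrac{720}{1440}\cdot 1280 = 640$ and $y_v = \tfrac{540}{1080}\cdot 960 = 480$, i.e. screen center to video center. A gaze at $x = 100$ lies on the left pillarbox bar and is clipped to $x_v = 0$.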

In [161]:
import pandas as pd
import numpy as np
from pathlib import Path

def calculate_video_display_size(screen_width, screen_height, video_width=1280, video_height=960):
    """
    Calculate how the video would be displayed on a given screen size
    while maintaining aspect ratio
    """
    video_aspect = video_width / video_height
    screen_aspect = screen_width / screen_height
    
    if screen_aspect > video_aspect:
        # Height limited
        display_height = screen_height
        display_width = display_height * video_aspect
    else:
        # Width limited
        display_width = screen_width
        display_height = display_width / video_aspect
        
    # Calculate letterbox/pillarbox offsets
    x_offset = (screen_width - display_width) / 2
    y_offset = (screen_height - display_height) / 2
    
    return display_width, display_height, x_offset, y_offset

def normalize_gaze_data(input_csv, output_csv):
    """
    Read the input CSV, normalize gaze coordinates, and save to a new CSV
    """
    print(f"Reading data from {input_csv}")
    df = pd.read_csv(input_csv)
    
    # Original video dimensions
    VIDEO_WIDTH = 1280
    VIDEO_HEIGHT = 960
    
    # Lists to store normalized coordinates and additional info
    normalized_data = []
    
    print("Normalizing gaze coordinates...")
    for idx, row in df.iterrows():
        if idx % 1000 == 0:  # Progress indicator
            print(f"Processing row {idx}/{len(df)}")
            
        # Calculate display dimensions and offsets
        display_width, display_height, x_offset, y_offset = calculate_video_display_size(
            row['width'], row['height']
        )
        
        # Remove the offset from the gaze coordinates
        adjusted_x = row['x'] - x_offset
        adjusted_y = row['y'] - y_offset
        
        # Convert from display coordinates to video coordinates
        video_x = (adjusted_x / display_width) * VIDEO_WIDTH
        video_y = (adjusted_y / display_height) * VIDEO_HEIGHT
        
        # Clip coordinates to video boundaries
        normalized_x = np.clip(video_x, 0, VIDEO_WIDTH)
        normalized_y = np.clip(video_y, 0, VIDEO_HEIGHT)
        
        # Create new row with additional normalization info
        new_row = row.copy()
        
        # Store original coordinates and dimensions
        new_row['original_x'] = row['x']
        new_row['original_y'] = row['y']
        new_row['original_width'] = row['width']
        new_row['original_height'] = row['height']
        
        # Store display calculations
        new_row['display_width'] = display_width
        new_row['display_height'] = display_height
        new_row['x_offset'] = x_offset
        new_row['y_offset'] = y_offset
        
        # Update main coordinates and dimensions to video space
        new_row['x'] = normalized_x
        new_row['y'] = normalized_y
        new_row['width'] = VIDEO_WIDTH
        new_row['height'] = VIDEO_HEIGHT
        
        # Add normalization metadata
        new_row['normalized_to_width'] = VIDEO_WIDTH
        new_row['normalized_to_height'] = VIDEO_HEIGHT
        
        normalized_data.append(new_row)
    
    # Convert to DataFrame
    normalized_df = pd.DataFrame(normalized_data)
    
    # Save to CSV
    print(f"Saving normalized data to {output_csv}")
    normalized_df.to_csv(output_csv, index=False)
    
    # Print summary statistics
    print("\nNormalization Summary:")
    print(f"Total rows processed: {len(normalized_df)}")
    print(f"Original screen sizes: {len(normalized_df.groupby(['original_width', 'original_height']))}")
    print(f"All rows normalized to: {VIDEO_WIDTH}x{VIDEO_HEIGHT}")
    print("\nColumns modified:")
    print("- x, y: Normalized to video space coordinates")
    print("- width, height: Set to video dimensions")
    print("\nNew columns added:")
    new_columns = ['original_x', 'original_y', 'original_width', 'original_height',
                  'display_width', 'display_height', 'x_offset', 'y_offset',
                  'normalized_to_width', 'normalized_to_height']
    for col in new_columns:
        print(f"- {col}")

def main():
    # File paths
    input_csv = "/Users/lennoxanderson/Documents/Research/Human-Alignment-Hazardous-Driving-Detection/final_user_survey_data.csv"
    output_csv = "normalized_gaze_data.csv"
    
    try:
        normalize_gaze_data(input_csv, output_csv)
    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    main()

Reading data from /Users/lennoxanderson/Documents/Research/Human-Alignment-Hazardous-Driving-Detection/final_user_survey_data.csv
Normalizing gaze coordinates...
Processing row 0/40653
Processing row 1000/40653
Processing row 2000/40653
Processing row 3000/40653
Processing row 4000/40653
Processing row 5000/40653
Processing row 6000/40653
Processing row 7000/40653
Processing row 8000/40653
Processing row 9000/40653
Processing row 10000/40653
Processing row 11000/40653
Processing row 12000/40653
Processing row 13000/40653
Processing row 14000/40653
Processing row 15000/40653
Processing row 16000/40653
Processing row 17000/40653
Processing row 18000/40653
Processing row 19000/40653
Processing row 20000/40653
Processing row 21000/40653
Processing row 22000/40653
Processing row 23000/40653
Processing row 24000/40653
Processing row 25000/40653
Processing row 26000/40653
Processing row 27000/40653
Processing row 28000/40653
Processing row 29000/40653
Processing row 30000/40653
Processing row
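The per-row `iterrows` loop is what makes this cell slow on 40,653 rows. A minimal vectorized sketch that should produce the same columns using NumPy array operations; `normalize_gaze_vectorized` is a hypothetical name, and it keeps the loop's assumption of a centered, aspect-preserving display:

```python
import numpy as np
import pandas as pd

VIDEO_WIDTH, VIDEO_HEIGHT = 1280, 960

def normalize_gaze_vectorized(df):
    """Column-wise equivalent of the per-row loop in normalize_gaze_data."""
    out = df.copy()
    w = df["width"].to_numpy(dtype=float)
    h = df["height"].to_numpy(dtype=float)
    x = df["x"].to_numpy(dtype=float)
    y = df["y"].to_numpy(dtype=float)

    video_aspect = VIDEO_WIDTH / VIDEO_HEIGHT
    height_limited = (w / h) > video_aspect  # wider screen -> pillarbox bars
    display_w = np.where(height_limited, h * video_aspect, w)
    display_h = np.where(height_limited, h, w / video_aspect)
    x_offset = (w - display_w) / 2
    y_offset = (h - display_h) / 2

    # Keep raw values and intermediate geometry, as the loop does
    out["original_x"] = df["x"]
    out["original_y"] = df["y"]
    out["original_width"] = df["width"]
    out["original_height"] = df["height"]
    out["display_width"] = display_w
    out["display_height"] = display_h
    out["x_offset"] = x_offset
    out["y_offset"] = y_offset

    # Map into video pixels and clip to the frame
    out["x"] = np.clip((x - x_offset) / display_w * VIDEO_WIDTH, 0, VIDEO_WIDTH)
    out["y"] = np.clip((y - y_offset) / display_h * VIDEO_HEIGHT, 0, VIDEO_HEIGHT)
    out["width"] = VIDEO_WIDTH
    out["height"] = VIDEO_HEIGHT
    out["normalized_to_width"] = VIDEO_WIDTH
    out["normalized_to_height"] = VIDEO_HEIGHT
    return out

demo = pd.DataFrame({
    "x": [960.0, 240.0, 640.0, 540.0],
    "y": [540.0, 0.0, 480.0, 960.0],
    "width": [1920, 1920, 1280, 1080],
    "height": [1080, 1080, 960, 1920],
})
result = normalize_gaze_vectorized(demo)
```

The demo covers a pillarboxed widescreen, an exact 4:3 screen and a letterboxed portrait screen; the center of each maps to (640, 480).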

# Smooth Gaze Path Video Generator
Creates one video per user and driving clip, showing that user's eye movements as a smoothly animated hot-pink dot. Cubic interpolation fills in positions between recorded gaze samples, and each viewing session is written to its own MP4 file.
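As a point of comparison for `interpolate_gaze_points`, a NumPy-only sketch that computes exactly one gaze position per output frame on the same clock as `current_time = frame_count / fps`, so the frame loop can index an array instead of filtering a DataFrame. It plainly swaps in linear interpolation (`np.interp`) for the notebook's cubic `interp1d`, which avoids overshoot between sparse samples at the cost of less smooth motion. `resample_gaze_per_frame` is a hypothetical helper:

```python
import numpy as np

def resample_gaze_per_frame(times, xs, ys, fps, n_frames):
    """One gaze position per video frame, timed at frame_index / fps.

    Linear interpolation (np.interp), not cubic. Frames outside the
    recorded gaze interval get NaN so the caller can skip drawing.
    """
    times = np.asarray(times, dtype=float)
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)

    # Sort by time, then keep the first sample at each repeated timestamp
    order = np.argsort(times, kind="stable")
    t_unique, first = np.unique(times[order], return_index=True)
    x_u = xs[order][first]
    y_u = ys[order][first]

    frame_times = np.arange(n_frames) / fps
    fx = np.interp(frame_times, t_unique, x_u)
    fy = np.interp(frame_times, t_unique, y_u)

    # np.interp holds the end values constant; mark those frames as missing
    outside = (frame_times < t_unique[0]) | (frame_times > t_unique[-1])
    fx[outside] = np.nan
    fy[outside] = np.nan
    return frame_times, fx, fy

frame_times, fx, fy = resample_gaze_per_frame(
    times=[0.0, 0.5, 0.5, 1.0],
    xs=[0.0, 100.0, 999.0, 200.0],
    ys=[0.0, 50.0, 999.0, 100.0],
    fps=4, n_frames=6,
)
```

Inside the frame loop the lookup then becomes `fx[frame_count]`, skipping the draw when it is NaN; `n_frames` could come from `cap.get(cv2.CAP_PROP_FRAME_COUNT)`, though that count can be approximate for some containers.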

In [175]:
import pandas as pd
import cv2
import numpy as np
from pathlib import Path
from scipy.interpolate import interp1d

class UserGazeVideoCreator:
    def __init__(self, video_folder_path, csv_path, output_folder_path):
        """
        Initialize the video creator with paths
        """
        self.video_folder = Path(video_folder_path)
        self.output_folder = Path(output_folder_path)
        self.output_folder.mkdir(parents=True, exist_ok=True)
        
        # Hot pink color in BGR format
        self.dot_color = (147, 20, 255)  # RGB(255, 20, 147) in BGR
        
        # Read and normalize the gaze data
        print("Reading and normalizing gaze data...")
        self.gaze_data = self.read_normalized_gaze_data(csv_path)
        
        # Get unique users
        self.users = self.gaze_data['userId'].unique()
        print(f"Found {len(self.users)} unique users")

    def read_normalized_gaze_data(self, csv_path):
        """Read and normalize the gaze data"""
        df = pd.read_csv(csv_path)
        VIDEO_WIDTH = 1280
        VIDEO_HEIGHT = 960
        
        # Normalize coordinates if they haven't been normalized yet
        if 'normalized_to_width' not in df.columns:
            print("Normalizing coordinates...")
            for idx, row in df.iterrows():
                # Calculate display dimensions
                video_aspect = VIDEO_WIDTH / VIDEO_HEIGHT
                screen_aspect = row['width'] / row['height']
                
                if screen_aspect > video_aspect:
                    display_height = row['height']
                    display_width = display_height * video_aspect
                else:
                    display_width = row['width']
                    display_height = display_width / video_aspect
                
                # Calculate offsets
                x_offset = (row['width'] - display_width) / 2
                y_offset = (row['height'] - display_height) / 2
                
                # Normalize coordinates and clip them to the video frame
                df.at[idx, 'x'] = np.clip(((row['x'] - x_offset) / display_width) * VIDEO_WIDTH, 0, VIDEO_WIDTH)
                df.at[idx, 'y'] = np.clip(((row['y'] - y_offset) / display_height) * VIDEO_HEIGHT, 0, VIDEO_HEIGHT)
        
        return df

    def interpolate_gaze_points(self, video_gaze_data, fps):
        """
        Create smooth interpolation between gaze points
        """
        # Drop repeated timestamps: cubic interpolation needs strictly increasing time values
        video_gaze_data = video_gaze_data.drop_duplicates(subset='time')
        times = video_gaze_data['time'].values
        x_coords = video_gaze_data['x'].values
        y_coords = video_gaze_data['y'].values
        
        # Cubic needs at least 4 distinct samples; fall back to linear otherwise
        kind = 'cubic' if len(times) >= 4 else 'linear'
        x_interp = interp1d(times, x_coords, kind=kind, bounds_error=False, fill_value='extrapolate')
        y_interp = interp1d(times, y_coords, kind=kind, bounds_error=False, fill_value='extrapolate')
        
        # Create timestamps for every frame
        frame_times = np.arange(times[0], times[-1], 1/fps)
        
        # Interpolate positions for every frame
        x_smooth = x_interp(frame_times)
        y_smooth = y_interp(frame_times)
        
        return pd.DataFrame({
            'time': frame_times,
            'x': x_smooth,
            'y': y_smooth
        })

    def create_video_for_user(self, user_id, sample_mode=False):
        """
        Create videos for a specific user with their gaze overlay
        
        Args:
            user_id (str): User ID (email) to process
            sample_mode (bool): If True, only process one video for this user
        """
        # Get all videos for this user
        user_data = self.gaze_data[self.gaze_data['userId'] == user_id]
        unique_videos = user_data['videoId'].unique()
        
        if sample_mode:
            unique_videos = unique_videos[:1]  # Only process first video in sample mode
        
        for video_id in unique_videos:
            # Get gaze data for this video
            video_gaze_data = user_data[user_data['videoId'] == video_id].sort_values('time')
            
            if len(video_gaze_data) < 4:  # Need at least 4 points for cubic interpolation
                print(f"Not enough gaze points for video {video_id}")
                continue
                
            # Find video file (a default for next() avoids StopIteration when nothing matches)
            video_path = next(self.video_folder.glob(f"{video_id}.*"), None)
            if video_path is None:
                print(f"Video file not found for {video_id}")
                continue
            
            print(f"Processing video {video_id} for user {user_id}")
            
            # Open video
            cap = cv2.VideoCapture(str(video_path))
            if not cap.isOpened():
                print(f"Could not open video: {video_path}")
                continue
            
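            # Get video properties
            fps = cap.get(cv2.CAP_PROP_FPS)
            frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
            frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
            
            # OpenCV returns 0 when the FPS cannot be read; skip to avoid dividing by zero below
            if fps <= 0:
                print(f"Invalid FPS for video: {video_path}")
                cap.release()
                continue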
            
            # Create interpolated gaze points
            smooth_gaze_data = self.interpolate_gaze_points(video_gaze_data, fps)
            
            # Create output video
            output_filename = f"{user_id}_{video_id}.mp4"
            output_path = self.output_folder / output_filename
            
            fourcc = cv2.VideoWriter_fourcc(*'mp4v')
            out = cv2.VideoWriter(str(output_path), fourcc, fps, (frame_width, frame_height))
            
            frame_count = 0
            last_gaze_time = smooth_gaze_data['time'].max()
            
            while cap.isOpened():
                ret, frame = cap.read()
                if not ret:
                    break
                
                current_time = frame_count / fps
                
                # Stop if we've passed the last gaze point
                if current_time > last_gaze_time:
                    break
                
                # Get interpolated gaze point for current time
                current_point = smooth_gaze_data[
                    (smooth_gaze_data['time'] >= current_time) &
                    (smooth_gaze_data['time'] < current_time + 1/fps)
                ]
                
                if not current_point.empty:
                    x = int(current_point.iloc[0]['x'])
                    y = int(current_point.iloc[0]['y'])
                    
                    # Draw dot with anti-aliasing
                    cv2.circle(frame, (x, y), 8, self.dot_color, -1, cv2.LINE_AA)
                    
                    # Add subtle glow effect
                    cv2.circle(frame, (x, y), 12, self.dot_color, 2, cv2.LINE_AA)
                
                out.write(frame)
                frame_count += 1
            
            cap.release()
            out.release()
            print(f"Saved video to {output_path}")

def main():
    # File paths
    video_folder = "/Users/lennoxanderson/Documents/machineLearning/data/TeslaRawDrivingFootage/SplitData/1-4BatchSplits"
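    csv_path = "/Users/lennoxanderson/Documents/Research/Human-Alignment-Hazardous-Driving-Detection/ETL/badgazedata.csv"  # NOTE: badgazedata.csv holds the filtered-out sessions; point this at preprocessedAndNormalized.csv to render the normalized data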
    output_folder = "user_gaze_videos"
    
    try:
        creator = UserGazeVideoCreator(video_folder, csv_path, output_folder)
        
        # First create a sample video for one user
        sample_user = creator.users[0]
        print(f"\nCreating sample video for user {sample_user}")
        creator.create_video_for_user(sample_user, sample_mode=True)
        
        # Ask if user wants to process all videos
        response = input("\nDo you want to process all videos for all users? (y/n): ")
        if response.lower() == 'y':
            for user_id in creator.users:
                print(f"\nProcessing videos for user {user_id}")
                creator.create_video_for_user(user_id)
                
    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    main()
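The `interpolate_gaze_points` method above fits cubic splines with SciPy's `interp1d`, which rejects repeated timestamps for spline kinds, and it samples starting at the first gaze time rather than on the video's frame grid, so `create_video_for_user` has to search a floating-point window on every frame. A minimal sketch of a frame-aligned variant follows. `interpolate_gaze_to_frames` is a hypothetical helper, and it assumes a DataFrame with `time` (seconds from clip start), `x`, and `y` columns; those column names are an assumption, not necessarily the ones in the normalized CSV.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d


def interpolate_gaze_to_frames(gaze_df, fps, n_frames):
    """Sample a cubic gaze trajectory at every frame time k / fps.

    Hypothetical helper; assumes 'time', 'x', 'y' columns with time in seconds.
    """
    # Cubic interp1d rejects repeated x values, so average samples that share
    # a timestamp (groupby also sorts by time).
    df = gaze_df.groupby('time', as_index=False)[['x', 'y']].mean()
    if len(df) < 4:
        raise ValueError("cubic interpolation needs at least 4 distinct timestamps")

    t = df['time'].to_numpy()
    x_interp = interp1d(t, df['x'].to_numpy(), kind='cubic')
    y_interp = interp1d(t, df['y'].to_numpy(), kind='cubic')

    # Frame k is shown at k / fps; keep only frames inside the gaze window
    # so the splines are never evaluated outside their data range.
    frame_idx = np.arange(n_frames)
    frame_times = frame_idx / fps
    inside = (frame_times >= t[0]) & (frame_times <= t[-1])

    return pd.DataFrame({
        'frame': frame_idx[inside],
        'time': frame_times[inside],
        'x': x_interp(frame_times[inside]),
        'y': y_interp(frame_times[inside]),
    })
```

Because each output row carries its frame number, the drawing loop can match rows to frames by integer index instead of comparing floating-point times.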

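Inside the frame loop, `create_video_for_user` filters the whole `smooth_gaze_data` DataFrame on every frame, which costs time proportional to the number of samples per frame and relies on a half-open floating-point window that can occasionally miss a frame. One alternative, sketched here under the assumption that the smoothed data has the `time`, `x`, `y` columns returned by `interpolate_gaze_points`, is to build a per-frame position array once before the loop. `gaze_positions_per_frame` and `gaze_point_for_frame` are hypothetical helper names.

```python
import numpy as np
import pandas as pd


def gaze_positions_per_frame(smooth_df, n_frames, fps):
    """Return an (n_frames, 2) array of gaze positions, NaN where there is no gaze.

    Hypothetical helper; assumes 'time', 'x', 'y' columns with time in seconds.
    """
    positions = np.full((n_frames, 2), np.nan)
    # Round each sample time to the nearest frame index; if two samples land on
    # the same frame, the later one overwrites the earlier one.
    idx = np.rint(smooth_df['time'].to_numpy() * fps).astype(int)
    valid = (idx >= 0) & (idx < n_frames)
    positions[idx[valid], 0] = smooth_df['x'].to_numpy()[valid]
    positions[idx[valid], 1] = smooth_df['y'].to_numpy()[valid]
    return positions


def gaze_point_for_frame(positions, frame_index):
    """Return integer (x, y) pixel coordinates for a frame, or None if no gaze."""
    if not 0 <= frame_index < len(positions):
        return None
    x, y = positions[frame_index]
    if np.isnan(x) or np.isnan(y):
        return None
    return int(round(x)), int(round(y))
```

In the loop, `point = gaze_point_for_frame(positions, frame_count)` would replace the DataFrame filter, and the two `cv2.circle` calls stay the same whenever `point` is not `None`. The frame count for sizing the array could come from `cv2.CAP_PROP_FRAME_COUNT`, though that value is an estimate for some containers.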
Reading and normalizing gaze data...
Found 20 unique users

Creating sample video for user Avimarzini123@gmail.com
Processing video video423 for user Avimarzini123@gmail.com
Saved video to user_gaze_videos/Avimarzini123@gmail.com_video423.mp4

Do you want to process all videos for all users? (y/n):  y

Processing videos for user Avimarzini123@gmail.com
Processing video video423 for user Avimarzini123@gmail.com
Saved video to user_gaze_videos/Avimarzini123@gmail.com_video423.mp4

Processing videos for user abdouliejaye7@gmail.com
Processing video video227 for user abdouliejaye7@gmail.com
Saved video to user_gaze_videos/abdouliejaye7@gmail.com_video227.mp4
Processing video video383 for user abdouliejaye7@gmail.com
Saved video to user_gaze_videos/abdouliejaye7@gmail.com_video383.mp4
Processing video video8 for user abdouliejaye7@gmail.com
Saved video to user_gaze_videos/abdouliejaye7@gmail.com_video8.mp4

Processing videos for user amiterez93@gmail.com
Processing video video129 for user amiterez93@gmail.com
Saved video to user_gaze_videos/amiterez93@gmail.com_video129.mp4
Processing video video483 for user amiterez93@gmail.com
Saved video to user_gaze_videos/amiterez93@gmail.com_video483.mp4

Processing videos for user andersonlennox381@outlook.com
Processing video video442 for user an