# Idea:

## Using an XGBoost Model

1.   Start by collecting a dataset of labeled images, where the label is the direction of the gaze.

2.   Extract the features (e.g. pupil_sizes, eye_corners, eyebrow_contours, faces, and eye positions) from each image with OpenCV and use them as input to the XGBoost model.

3. Train the XGBoost model on the labeled data, using the extracted features as input and the gaze direction as the output label.

4. Use XGBoost's feature importance together with SHAP to identify which features contribute most to the predictions, i.e. which features the model relies on to make its decisions.

(Justification: XGBoost is a powerful and efficient machine learning model that is widely used for classification. It is particularly effective on structured (tabular) data such as the features extracted here, and it combines good accuracy with interpretability through feature importance and SHAP.)

---


## Input:
- pupil_sizes, eye_corners, eyebrow_contours, faces, and eye positions

(The **justification** for each input feature is given in its own section below.)

## Output:
- One of 9 eye-gaze directions (up, down, left, right, etc.)

(**Justification**: Predefining 9 gaze directions turns the task into a classification problem with a small, fixed number of classes for the model to distinguish between, which makes the model easier to train and high accuracy easier to reach. A prediction that is one of 9 distinct classes may also be easier for humans to interpret than a set of coordinates. However, predicting continuous coordinates could be useful if the problem requires more precision or if the class boundaries are not clearly defined.)
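As a concrete illustration of the 9-class formulation, the sketch below bins a continuous gaze direction, given as yaw and pitch angles in degrees, into a 3×3 grid of labels. The helper name `gaze_to_class`, the sign convention (positive yaw = looking right, positive pitch = looking up) and the 10° dead zone around straight ahead are assumptions for illustration; they are not part of the original pipeline.

```python
# Hypothetical helper: bin continuous gaze angles into one of 9 classes.
# Assumed convention: yaw > 0 means looking right, pitch > 0 means looking up.
DIRECTIONS = [
    "up-left",   "up",     "up-right",
    "left",      "center", "right",
    "down-left", "down",   "down-right",
]


def gaze_to_class(yaw_deg: float, pitch_deg: float, dead_zone_deg: float = 10.0) -> int:
    """Return a class index 0-8 (row-major over the 3x3 grid above)."""
    # Column: 0 = left, 1 = center, 2 = right
    if yaw_deg < -dead_zone_deg:
        col = 0
    elif yaw_deg > dead_zone_deg:
        col = 2
    else:
        col = 1
    # Row: 0 = up, 1 = center, 2 = down
    if pitch_deg > dead_zone_deg:
        row = 0
    elif pitch_deg < -dead_zone_deg:
        row = 2
    else:
        row = 1
    return 3 * row + col


print(DIRECTIONS[gaze_to_class(15.0, 2.0)])    # right
print(DIRECTIONS[gaze_to_class(-20.0, 12.0)])  # up-left
```

The returned integers 0-8 are the kind of consecutive class ids that `XGBClassifier` expects as labels. Widening or narrowing `dead_zone_deg` changes how many samples fall into the center class.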


---




# Procedures:


1.   Locate and extract faces in the image
2.   Extract eyes and eyebrows from the faces detected
3.   Extract eye corners from the eyes
4.   Extract pupil size from the eyes
5.   Combine all the information we have into a feature matrix
6.   Perform data preprocessing
7.   Train the model
8.   Identify and understand how each feature contributes



## Downloading Data, Importing Libraries and Defining Variables

In [94]:
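!pip install shap

import cv2
import numpy as np
from google.colab.patches import cv2_imshow  # Colab-only replacement for cv2.imshow
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
import shap

# Download the pre-trained Haar cascades
!wget "https://github.com/opencv/opencv/raw/master/data/haarcascades/haarcascade_frontalface_default.xml"
!wget "https://github.com/opencv/opencv/raw/master/data/haarcascades/haarcascade_eye.xml"
!wget "https://github.com/npinto/opencv/raw/master/data/haarcascades/haarcascade_mcs_eyepair_big.xml"

prefix = "MPIIGaze/"    # dataset root (not used yet in this notebook)
file_path = "test.jpg"  # single image used to try out the pipeline

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')
# This cascade detects a pair of eyes, not eyebrows, and is not used below;
# eyebrows are located with color analysis above each detected eye instead.
eyebrow_cascade = cv2.CascadeClassifier('haarcascade_mcs_eyepair_big.xml')

# empty() is True if an XML file failed to load
assert not face_cascade.empty() and not eye_cascade.empty()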


## Image Preprocessing - Removing Noise

Noise in images can come from the camera sensor, from compression artifacts, or from poor lighting conditions. Removing noise can improve an image's visual quality and make it more useful for further analysis.

In order to remove noise from an image, we could use spatial filters like Gaussian smoothing to blur the image and reduce high-frequency noise. 

Another approach is to use thresholding or adaptive thresholding to isolate regions of the image that are most likely to contain the features of interest, while suppressing noise in other regions.

---

### Justification:
- I chose a Gaussian filter because it suppresses high-frequency noise and is computationally efficient. It does not preserve edges, though: it blurs them slightly along with the noise, so a small kernel should be used.

- I chose adaptive thresholding because the lighting is often not consistent across the image. Instead of a single global threshold, it computes a separate threshold for each pixel from the (Gaussian-weighted) mean of its neighborhood minus a constant, which makes it better suited to segmentation and edge detection under uneven lighting.

In [99]:
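def image_preprocessing(img_path):
    # Load input image (BGR); cv2.imread returns None if the file cannot be read
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {img_path}")

    # Convert image to grayscale
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Gaussian smoothing (5x5 kernel) to suppress high-frequency noise
    img_gray = cv2.GaussianBlur(img_gray, (5, 5), 0)

    # Adaptive threshold: Gaussian-weighted local mean, block size 9, constant 3
    img_thresh = cv2.adaptiveThreshold(img_gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 9, 3)

    return img, img_gray, img_thresh

img, img_gray, img_thresh = image_preprocessing(file_path)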

## Extracting Faces, Eyes, and Eyebrows

- A cascade classifier based on Haar-like features is used to detect faces and eyes, as it is fast and effective.

- Once the face and eye regions have been identified, a region of interest (ROI) can be defined above each eye to isolate the area where the eyebrows are most likely to be found.

- Within that ROI, I use color analysis to find the eyebrows: the ROI is converted to HSV, pixels whose values fall within a chosen color range are kept, and the contours of the resulting regions are taken as the eyebrows. (Analyzing color histograms within the ROI would be an alternative.)

---
### Justification of extracting eyebrows:
The position of the eyebrows relative to the eyes can provide information about the gaze direction: when a person looks up or down, their eyebrows tend to move accordingly, and these changes can be detected by tracking the position of the eyebrows.
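One way to turn this into a single number is to measure how far the eyebrow sits above the eye, normalized by the eye height so that the value does not depend on how large the face appears in the image. The function `brow_raise_ratio` and the example boxes are hypothetical; boxes follow OpenCV's (x, y, w, h) convention with y increasing downwards.

```python
# Hypothetical feature: vertical gap between the eyebrow box center and the
# eye box center, divided by the eye height (larger = brow higher above the eye).
def brow_raise_ratio(eye_box, brow_box):
    ex, ey, ew, eh = eye_box
    bx, by, bw, bh = brow_box
    eye_center_y = ey + eh / 2
    brow_center_y = by + bh / 2
    return (eye_center_y - brow_center_y) / eh


eye = (40, 60, 30, 20)          # eye center at y = 70
brow_relaxed = (38, 45, 34, 6)  # brow center at y = 48
brow_raised = (38, 39, 34, 6)   # brow center at y = 42

print(brow_raise_ratio(eye, brow_relaxed))  # 1.1
print(brow_raise_ratio(eye, brow_raised))   # 1.4
```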


### Improvements I could make:
Extracted eyebrow regions can be further processed using techniques such as edge detection or morphological operations to improve their shape and position accuracy. 

In [64]:
def extract_faces(img, img_gray):
    # Detect faces on the grayscale image (scaleFactor=1.3, minNeighbors=5),
    # reusing the face_cascade loaded in the setup cell
    faces = face_cascade.detectMultiScale(img_gray, 1.3, 5)

    # Crop each detected face (x, y, w, h) from the color and grayscale images
    imgs_with_faces_only = [img[y:y+h, x:x+w] for (x, y, w, h) in faces]
    imgs_gray_with_faces_only = [img_gray[y:y+h, x:x+w] for (x, y, w, h) in faces]

    return faces, imgs_with_faces_only, imgs_gray_with_faces_only

faces, imgs_with_faces_only, imgs_gray_with_faces_only = extract_faces(img, img_gray)

In [91]:
def extract_eyes(face_gray):
    # Return the bounding box (x, y, w, h) of each detected eye, relative to the face crop
    return [tuple(int(v) for v in box) for box in eye_cascade.detectMultiScale(face_gray)]

eye_positions = [extract_eyes(face) for face in imgs_gray_with_faces_only]

In [65]:
def extract_eyebrows(img):
    # Detect eyes on a grayscale copy of the (color) face crop
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    eyes = eye_cascade.detectMultiScale(gray)

    # Color range to keep, in OpenCV HSV (H: 0-179, S and V: 0-255).
    # Intended as brown shades; these hand-picked bounds likely need tuning.
    lower = np.array([28, 11, 5], dtype=np.uint8)
    upper = np.array([50, 20, 20], dtype=np.uint8)

    response = []

    for (x, y, w, h) in eyes:
        # ROI above the eye: a band half an eye-height tall (assumed size)
        roi = img[max(0, y - h // 2):y, x:x+w]
        if roi.size == 0:
            response.append([])
            continue

        # Convert ROI to HSV color space
        hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)

        # Threshold the ROI to keep pixels inside the color range
        mask = cv2.inRange(hsv, lower, upper)

        # Find contours of the matching regions (eyebrow candidates)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

        response.append(contours)

    return response

eyebrow_contours = [extract_eyebrows(face) for face in imgs_with_faces_only]

## Extracting eye corners

Eye corners are extracted to measure the position and movement of the eyes: eye tracking relies on detecting landmarks such as the eye corners to determine where a person is looking and how their gaze is moving.
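The corners give the eye a reference frame. For example, the pupil center can be expressed as a fraction of the distance between the two corners, which does not change when the eye moves within the image or appears larger or smaller. The feature `horizontal_gaze_ratio` below is a hypothetical illustration, not something the notebook computes; both corners and the pupil center must be in the same pixel coordinate frame.

```python
import numpy as np


# Hypothetical gaze feature: position of the pupil center along the line
# between the two eye corners (0.0 = at corner_a, 1.0 = at corner_b).
def horizontal_gaze_ratio(corner_a, corner_b, pupil_center):
    a = np.asarray(corner_a, dtype=float)
    b = np.asarray(corner_b, dtype=float)
    p = np.asarray(pupil_center, dtype=float)
    axis = b - a
    # Scalar projection of (p - a) onto the corner-to-corner axis, divided by its length
    return float(np.dot(p - a, axis) / np.dot(axis, axis))


left_corner, right_corner = (10, 30), (50, 30)
print(horizontal_gaze_ratio(left_corner, right_corner, (30, 31)))  # 0.5 (centered)
print(horizontal_gaze_ratio(left_corner, right_corner, (40, 29)))  # 0.75 (shifted right)
```

Projecting onto the corner axis, rather than just using x coordinates, keeps the feature meaningful when the head is slightly tilted.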

In [79]:
def detect_eye_corners(img_gray):
    all_corners = []
    pad = 15  # extra margin around each eye box, in pixels

    # Detect eyes in the face crop
    eyes = eye_cascade.detectMultiScale(img_gray, scaleFactor=1.2, minNeighbors=5)

    # Iterate through the detected eyes
    for (ex, ey, ew, eh) in eyes:
        # Extract the padded eye region; clamp at 0 so negative indices don't wrap around
        y0, x0 = max(0, ey - pad), max(0, ex - pad)
        eye_gray = img_gray[y0:ey + eh + pad, x0:ex + ew + pad]
        eye_thresh = cv2.adaptiveThreshold(eye_gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 9, 3)

        # Keep the 2 strongest corners, scored with the Harris detector (k=0.04)
        corners = cv2.goodFeaturesToTrack(eye_thresh, 2, 0.005, 25, blockSize=45,
                                          useHarrisDetector=True, k=0.04)
        if corners is None:
            continue

        # Integer pixel coordinates, shifted to be relative to the face crop
        corners = corners.astype(int) + np.array([x0, y0])

        # Append the detected corners to the list of all corners
        all_corners.append(corners)

    return all_corners


eye_corners = [detect_eye_corners(face) for face in imgs_gray_with_faces_only]

## Extracting pupil size

The pupil size is extracted because the pupil changes in response to changes in lighting and focus. The pupil can be detected with contour detection or Hough circle detection; here, contour detection is used:

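1.   Threshold the eye region so that the dark pupil stands out, and detect the contours.
2.   Take the contour with the maximum area, as it is likely to be the pupil, and fit a minimum enclosing circle to get its center and radius.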


In [None]:
def extract_pupil(img_gray):
    all_pupil_sizes = []

    eyes = eye_cascade.detectMultiScale(img_gray, scaleFactor=1.2, minNeighbors=5)

    for (ex, ey, ew, eh) in eyes:
        eye_roi = img_gray[ey:ey+eh, ex:ex+ew]

        # Binarize: Otsu picks the threshold, INV makes the dark pupil white (non-zero)
        _, eye_bin = cv2.threshold(eye_roi, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

        # Find the contours of the dark regions
        contours, _ = cv2.findContours(eye_bin, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            continue

        # The contour with the maximum area is likely the pupil
        max_contour = max(contours, key=cv2.contourArea)

        # Fit the smallest circle enclosing the pupil contour
        (x, y), radius = cv2.minEnclosingCircle(max_contour)

        # Store the center (relative to the face crop) and the radius of the pupil
        all_pupil_sizes.append(((int(x) + int(ex), int(y) + int(ey)), int(radius)))

    return all_pupil_sizes

pupil_sizes = [extract_pupil(face) for face in imgs_gray_with_faces_only]

# Data Processing

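We combine all the extracted features into a feature matrix, then split the data into training and test sets and normalise it before passing it to the algorithm for training. Because the detectors return a variable number of items (e.g. 0, 1 or 2 eyes per face), each feature group is flattened and zero-padded to a fixed length so that every row has the same size. Tree-based models such as XGBoost do not strictly need normalisation, since their splits depend only on the order of the values, but it does no harm.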

In [92]:
def flatten(obj):
    """Recursively flatten nested lists/tuples/arrays of numbers into a flat list of floats."""
    if isinstance(obj, np.ndarray):
        return obj.astype(float).ravel().tolist()
    if isinstance(obj, (list, tuple)):
        out = []
        for item in obj:
            out.extend(flatten(item))
        return out
    return [float(obj)]


def pad_or_truncate(values, length):
    """Zero-pad (or cut) a flat list to exactly `length` entries."""
    values = list(values)[:length]
    return values + [0.0] * (length - len(values))


def largest_box(contours):
    """Bounding box (x, y, w, h) of the largest contour, or zeros if there is none."""
    if len(contours) == 0:
        return (0, 0, 0, 0)
    return cv2.boundingRect(max(contours, key=cv2.contourArea))


def build_feature_vector(i):
    """Fixed-length feature vector for face i (assumes at most 2 eyes per face)."""
    brow_boxes = [largest_box(c) for c in eyebrow_contours[i]]
    parts = [
        pad_or_truncate(flatten(pupil_sizes[i]), 6),    # 2 eyes x (cx, cy, radius)
        pad_or_truncate(flatten(eye_corners[i]), 8),    # 2 eyes x 2 corners x (x, y)
        pad_or_truncate(flatten(brow_boxes), 8),        # 2 brows x (x, y, w, h)
        pad_or_truncate(flatten(faces[i]), 4),          # face box (x, y, w, h)
        pad_or_truncate(flatten(eye_positions[i]), 8),  # 2 eyes x (x, y, w, h)
    ]
    return np.concatenate(parts)


# Available variables: pupil_sizes, eye_corners, eyebrow_contours, faces, eye_positions
# One row per detected face; for training, stack one row per labelled image of the dataset
feature_matrix = np.vstack([build_feature_vector(i) for i in range(len(faces))])

X = feature_matrix  # feature matrix (pupil sizes, eye corners, eyebrow contours, faces, and eye positions)
y = ...  # label vector (gaze direction), one integer class 0-8 per row

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Normalize the data (fit on the training set only, then apply to the test set)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


# Model Training

In [75]:
# Train an XGBoost model

params = {
    'objective': 'multi:softmax',
    'num_class': 9,  # assuming 9 eye gazing directions
    'max_depth': 3,
    'learning_rate': 0.1,
    'n_estimators': 100,
    'gamma': 0,
    'min_child_weight': 1,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'reg_alpha': 0,
    'reg_lambda': 1,
    'random_state': 42
}

model = xgb.XGBClassifier(**params)
# y_train must hold integer class ids 0-8
model.fit(X_train, y_train)

# Evaluate the model on the test set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy: {:.2f}%'.format(accuracy * 100))

## Understand Feature Importance and Contribution to prediction

In [None]:
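import shap

# Explain the trained gaze model on the held-out test set
explainer = shap.Explainer(model)
shap_values = explainer(X_test)

# For a multi-class model the SHAP values have shape (samples, features, classes);
# plot the feature contributions for one class at a time (class 0 here)
shap.plots.beeswarm(shap_values[:, :, 0])

# Print the feature importance (names must follow the column order of the feature matrix)
feature_importance = model.feature_importances_
feature_names = ['pupil_size', 'eye_corner_x', 'eye_corner_y', 'eyebrow_contour_x', 'eyebrow_contour_y', 'face_x', 'face_y', 'eye_x', 'eye_y']
if len(feature_names) != len(feature_importance):
    feature_names = [f'feature_{i}' for i in range(len(feature_importance))]
for name, importance in zip(feature_names, feature_importance):
    print('{}: {:.2f}%'.format(name, importance * 100))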