# Attention Prediction in a non-calibrated system

The purpose of this notebook is to predict a user's attention, that is, to build a model that decides whether a user is looking at the laptop's screen or not. We want to achieve this without calibrating the system, using only 2D frames from the laptop camera in real time.

For the thesis, we have created a dataset of labelled frames (1: looking, 0: not looking) and performed a couple of preprocessing steps to extract 68 facial landmarks and the 2 iris points calculated by Loceye's eye-tracking algorithms.
Below, we try out different data architectures and classifiers to find the best-performing combination.
To achieve real-time performance and unbiased behaviour, we do not want to include any calibration process. Therefore we approximate a mapping between the 2D camera points and the 3D world by solving the [Perspective-n-Point](https://en.wikipedia.org/wiki/Perspective-n-Point) problem with the built-in OpenCV method solvePnP. This gives us a rotation vector and a translation vector (3x1 each) for every frame. Afterwards, we use them in different architectures to train a machine learning model that predicts the final output. The architectures presented below are:

 - Feed the two raw vectors to a classifier.
 - Feed the two vectors and the 2 iris points.
 - Feed the two vectors, the 2 iris points and the difference vectors between each iris and the inner corner of its eye.
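
To make the role of the two vectors concrete, the following NumPy-only sketch shows how a rotation vector and a translation vector map a 3D model point to pixel coordinates under the pinhole model. The helper names `rodrigues` and `project_point` and the 480x640 frame size are assumptions made for illustration; in the notebook itself OpenCV performs these steps.

```python
import numpy as np

def rodrigues(rvec):
    """Convert an axis-angle rotation vector (3 values) to a 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    theta = np.linalg.norm(rvec)          # rotation angle in radians
    if theta < 1e-12:
        return np.eye(3)                  # no rotation
    k = rvec / theta                      # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])    # cross-product matrix of the axis
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def project_point(point_3d, rvec, tvec, camera_matrix):
    """Project one 3D model point to pixel coordinates (no lens distortion)."""
    point_cam = rodrigues(rvec) @ np.asarray(point_3d, dtype=float)
    point_cam = point_cam + np.asarray(tvec, dtype=float).reshape(3)
    uvw = camera_matrix @ point_cam
    return uvw[:2] / uvw[2]

# Assumed frame size; focal length approximated by the image width
height, width = 480, 640
camera_matrix = np.array([[width, 0.0, width / 2],
                          [0.0, width, height / 2],
                          [0.0, 0.0, 1.0]])

# A nose tip at the model origin, 1000 units straight in front of the camera
nose_px = project_point((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1000.0), camera_matrix)
print(nose_px)
```

With no rotation and a translation along the optical axis, the model origin lands on the assumed principal point, the image centre. solvePnP solves the inverse problem: given several such 2D/3D pairs, it estimates the two vectors.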

### Import packages

In [None]:
import numpy as np
import cv2
import dlib
from imutils import face_utils
import glob
import pickle
from random import shuffle
from sklearn.model_selection import train_test_split
from sklearn import metrics
import json

### Load Dataset

In [None]:
# Number of training examples to use(0-2806)
DATASET_SIZE = 2758
DEBUG = False

# Load the dataset
with open('data_cleaned.json') as json_file:
    data_all = json.load(json_file)

# Extract the keys in sorted order
keys = sorted(data_all)

# Convert python list to np array
keys = np.asarray(keys)
print(keys.shape)

The function below returns the solvePnP parameters that depend on the frame: the 2D image points and the approximated camera matrix. The parameters *model_points* and *dist_coeffs* are the same for every example, so they are declared globally.

In [None]:
# 3D model points.
model_points = np.array([
                            (0.0, 0.0, 0.0),             # Nose tip
                            (0.0, -330.0, -65.0),        # Chin
                            (-225.0, 170.0, -135.0),     # Left eye left corner
                            (225.0, 170.0, -135.0),      # Right eye right corner
                            (-150.0, -150.0, -125.0),    # Left Mouth corner
                            (150.0, -150.0, -125.0)      # Right mouth corner
                        ])

def generate_solvepnp_parameters(size, landmarks):
    # Approximate camera intrinsic parameters
    focal_length = size[1]
    center = (size[1]/2, size[0]/2)
    camera_matrix = np.array(
                             [[focal_length, 0, center[0]],
                             [0, focal_length, center[1]],
                             [0, 0, 1]], dtype = "double"
                             )

    # Grab the 2D coordinates of our six sample points
    image_points = np.array([
        (landmarks[33]['x'], landmarks[33]['y']) ,     # Nose tip
        (landmarks[8]['x'], landmarks[8]['y']),     # Chin
        (landmarks[36]['x'], landmarks[36]['y']),     # Left eye left corner
        (landmarks[45]['x'], landmarks[45]['y']),     # Right eye right corner
        (landmarks[48]['x'], landmarks[48]['y']),     # Left Mouth corner
        (landmarks[54]['x'], landmarks[54]['y'])      # Right mouth corner
    ], dtype="double")

    return image_points, camera_matrix

dist_coeffs = np.zeros((4,1)) # Assuming no lens distortion

Debugging function used to visualize image points.

In [None]:
def visualize_image(im, rotation_vector, translation_vector, camera_matrix, image_points):
    # Project the 3D point (0, 0, 500.0) onto the image plane.
    # We use this to draw a line sticking out of the nose
    (nose_end_point2D, jacobian) = cv2.projectPoints(
        np.array([(0.0, 0.0, 500.0)]), rotation_vector, translation_vector, camera_matrix, dist_coeffs
        )
    for p in image_points:
        cv2.circle(im, (int(p[0]), int(p[1])), 3, (0,0,255), -1)
    p1 = ( int(image_points[0][0]), int(image_points[0][1]) )
    p2 = ( int(nose_end_point2D[0][0][0]), int(nose_end_point2D[0][0][1]) )

    # Draw a line connecting the two points. This line must show
    # the direction out of the nose
    cv2.line(im, p1, p2, (255,0,0), 2)
    # Display image
    cv2.imshow("Output", im)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    print("Rotation Vector:\n {0}".format(rotation_vector))
    print("Translation Vector:\n {0}".format(translation_vector))

***

# Architecture #1

#### In this data architecture we use the rotation vector and the translation vector as input. Each has shape (3, 1), so a single training example of our dataset has shape (6, 1).

In [None]:
X = np.zeros((DATASET_SIZE, 6, 1))
y = np.zeros(DATASET_SIZE)

# Keep track of the indices where solvePnP crashed
failed_indices = []

for i in range(DATASET_SIZE):
    key = keys[i]

    # Approximate camera intrinsic parameters
    im = cv2.imread('dataset/' + key)   # This imread is time consuming! Another way?
    size = im.shape
    landmarks = data_all[key]['landmarks']
    
    image_points, camera_matrix = generate_solvepnp_parameters(size, landmarks)
    
    # Solve the PnP problem with the parameters specified above
    # and obtain rotation and translation vectors
    (success, rotation_vector, translation_vector) = cv2.solvePnP(
        model_points, image_points, camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_DLS
        )
    
    X[i, :] = np.concatenate((rotation_vector, translation_vector), axis=0)
    
    # Check if solvePnP crashed
    if(X[i, 0] > 10000):
        print(key)
        failed_indices.append(i)
    
    # Check if it is positive or negative example
    output = key.split('/')[1]
    if(output == 'positive'):
        y[i] = 1
    elif(output == 'negative'):
        y[i] = 0
        
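# Remove the examples where solvePnP crashed. keys must be filtered too,
# otherwise keys[i] no longer matches X[i] in the later architectures.
X = np.delete(X, failed_indices, axis=0)
y = np.delete(y, failed_indices, axis=0)
keys = np.delete(keys, failed_indices, axis=0)
DATASET_SIZE = X.shape[0]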

X = X.squeeze()
print(X.shape)
print(y.shape)
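
Later cells index `keys`, `X` and `y` by the same position, so removing rows from one of them only keeps the dataset consistent if the same rows are removed from all three. A minimal sketch of that idea with a boolean mask, on toy data: the helper `drop_failed` is hypothetical, and testing the absolute value and finiteness is an assumption that is slightly stricter than the check in the cell above.

```python
import numpy as np

def drop_failed(X, y, keys, threshold=10000):
    """Drop rows whose first feature is non-finite or implausibly large.

    The same mask is applied to X, y and keys so that row i of each
    array still describes the same frame afterwards.
    """
    X = np.asarray(X, dtype=float)
    keep = np.isfinite(X).all(axis=1) & (np.abs(X[:, 0]) <= threshold)
    return X[keep], np.asarray(y)[keep], np.asarray(keys)[keep]

# Toy data: the second row imitates a diverged solvePnP result
X_toy = np.array([[0.1, 0.2], [5e6, 0.3], [-0.4, 0.1]])
y_toy = [1, 0, 0]
keys_toy = ['a/positive/1.jpg', 'a/negative/2.jpg', 'a/negative/3.jpg']

X_clean, y_clean, keys_clean = drop_failed(X_toy, y_toy, keys_toy)
print(X_clean.shape, y_clean, keys_clean)
```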

#### Data Preprocessing: normalize features to have 0 mean and 1 std
We obtain the mean and std from the first 800 examples and not the whole dataset, because the computation breaks down with too many examples and gives an infinite mean and std. Note that the cell below, as written, computes them over all of `X`.

#####  Before normalization:

In [None]:
m = X.mean(axis=0)
std = X.std(axis=0)

# # Save the mean and std for predictions in other notebooks
# with open('mean.pickle', 'wb') as f:
#     pickle.dump(m, f, pickle.HIGHEST_PROTOCOL)
# with open('std.pickle', 'wb') as f:
#     pickle.dump(std, f, pickle.HIGHEST_PROTOCOL)

Normalize features to have zero mean and unit variance:

In [None]:
X_scaled = (X-m)/std
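
The cell above scales the whole dataset before it is split. One alternative, sketched here under stated assumptions, is to compute the statistics on the training part only and reuse them for the test part, which keeps test-set information out of the preprocessing. The helper names `fit_standardizer` and `apply_standardizer` and the random demo data are hypothetical; the guard against a zero std is an extra precaution, not something the notebook does.

```python
import numpy as np

def fit_standardizer(X_train):
    """Return per-feature mean and std computed from the training rows only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)   # avoid dividing by zero for constant features
    return mean, std

def apply_standardizer(X, mean, std):
    """Scale X with statistics that were fitted elsewhere."""
    return (X - mean) / std

# Demo data: 100 examples with 6 features, the last feature is constant
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(100, 6))
X_demo[:, 5] = 2.0

X_tr, X_te = X_demo[:80], X_demo[80:]
mean, std = fit_standardizer(X_tr)
X_tr_scaled = apply_standardizer(X_tr, mean, std)
X_te_scaled = apply_standardizer(X_te, mean, std)
print(X_tr_scaled.mean(axis=0).round(3), X_tr_scaled.std(axis=0).round(3))
```

The same `mean` and `std` are the values that would have to be saved alongside a classifier for predictions in other notebooks.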

# Train different classifiers
Here we will try out 3 classifiers and compare their results:
 - An SVM with RBF kernel and penalty parameter C=100
 - A Logistic Regression classifier
 - A Random Forest Classifier

In [None]:
# Number of classifiers to train in order to summarize the results
NUM_OF_CLASSIFIERS = 10
Y_SVM, Y_LR, Y_RF = np.zeros((NUM_OF_CLASSIFIERS)), np.zeros((NUM_OF_CLASSIFIERS)), np.zeros((NUM_OF_CLASSIFIERS))
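
Each classifier cell in this notebook repeats the same loop: draw a random 80/20 split, fit on the larger part, and record the accuracy on the held-out part. A sketch of that loop as one reusable function, using NumPy only: `repeated_holdout` is a hypothetical helper, and the nearest-centroid classifier is a simple stand-in chosen to keep the example self-contained, not one of the classifiers used here.

```python
import numpy as np

def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Stand-in classifier: assign each test row to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

def repeated_holdout(X, y, fit_predict, n_splits=10, test_size=0.2, seed=0):
    """Return the test accuracy of fit_predict on n_splits random splits."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_size * len(y)))
    scores = np.zeros(n_splits)
    for i in range(n_splits):
        order = rng.permutation(len(y))              # new shuffle for every split
        test_idx, train_idx = order[:n_test], order[n_test:]
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores[i] = np.mean(y_pred == y[test_idx])
    return scores

# Demo data: two well-separated classes with 6 features each
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(-2, 1, (100, 6)), rng.normal(2, 1, (100, 6))])
y_demo = np.repeat([0.0, 1.0], 100)

scores = repeated_holdout(X_demo, y_demo, nearest_centroid_fit_predict)
print(scores.mean(), scores.std())
```

Passing a fixed `seed` makes the sequence of splits reproducible, so different classifiers can be compared on identical splits.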

### SVM Classifier

In [None]:
from sklearn import svm

for i in range(NUM_OF_CLASSIFIERS):
    # Try a different train/test split each time
    X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2)
    
    svm_classifier = svm.SVC(C=100, kernel='rbf', gamma='auto')
    svm_classifier.fit(X_train, y_train)
    y_pred_svm = svm_classifier.predict(X_test)
    Y_SVM[i] = metrics.accuracy_score(y_test, y_pred_svm)

# Print the last result
print('Training set accuracy for SVM:', svm_classifier.score(X_train, y_train))
print('Test set accuracy for SVM: ', metrics.accuracy_score(y_test, y_pred_svm))

### Logistic Regression Classifier

In [None]:
from sklearn.linear_model import LogisticRegression

for i in range(NUM_OF_CLASSIFIERS):
    # Try a different train/test split each time
    X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2)
    
    lr_classifier = LogisticRegression(solver='lbfgs')
    lr_classifier.fit(X_train, y_train)
    y_pred_lr = lr_classifier.predict(X_test)
    Y_LR[i] = metrics.accuracy_score(y_test, y_pred_lr)

print('Training set accuracy for Logistic Regression:', lr_classifier.score(X_train, y_train))
print('Test set accuracy for Logistic Regression: ', metrics.accuracy_score(y_test, y_pred_lr))

### Random Forest Classifier

In [None]:
from sklearn.ensemble import RandomForestClassifier

for i in range(NUM_OF_CLASSIFIERS):
    # Try a different train/test split each time
    X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2)
    
    rf_classifier = RandomForestClassifier(n_estimators=100, random_state=1)
    rf_classifier.fit(X_train, y_train)
    y_pred_rf = rf_classifier.predict(X_test)
    Y_RF[i] = metrics.accuracy_score(y_test, y_pred_rf)

print('Training set accuracy for Random Forest:', rf_classifier.score(X_train, y_train))
print('Test set accuracy for Random Forest: ', metrics.accuracy_score(y_test, y_pred_rf))

## Visualize & Summarize results for Architecture #1

We trained the 3 classifiers for 10 different train/test splits and the accuracy results are shown below

In [None]:
import matplotlib.pyplot as plt

t = np.linspace(0, NUM_OF_CLASSIFIERS, NUM_OF_CLASSIFIERS)
plt.plot(t, Y_SVM, 'r')
plt.plot(t, Y_LR, 'g')
plt.plot(t, Y_RF, 'b')
plt.xlabel('classifier No')
plt.ylabel('Test set Accuracy')
plt.legend(('SVM', 'LR', 'RF'))

plt.show()

In [None]:
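# Best and mean test accuracy over all splits, in the order SVM, LR, RF
print(np.max(Y_SVM), np.max(Y_LR), np.max(Y_RF))
print(np.mean(Y_SVM), np.mean(Y_LR), np.mean(Y_RF))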

The best out of 10 scores for each classifier is presented below

| Metric                | SVM   | Logistic Regression | Random Forest |
| :-------------------: | :---: | :-----------------: | :-----------: |
| Training set accuracy | 0.743 | 0.619               | 1             |
| Test set accuracy     | 0.699 | 0.612               | 0.779         |

We clearly see that the Random Forest classifier with 100 trees outperforms the other 2, reaching 78% test accuracy.
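
A best-of-10 score can flatter a classifier, so it may help to report the mean and the spread next to it. The sketch below shows one way to do that; the helper `summarize` is hypothetical and the numbers are placeholders chosen for the example, not results from this notebook.

```python
import numpy as np

def summarize(scores):
    """Best, mean and sample standard deviation of a list of accuracies."""
    scores = np.asarray(scores, dtype=float)
    return {"best": scores.max(), "mean": scores.mean(), "std": scores.std(ddof=1)}

# Placeholder accuracies, one value per train/test split
placeholder_scores = {
    "SVM": [0.70, 0.68, 0.69],
    "Random Forest": [0.78, 0.76, 0.77],
}

rows = []
for name, values in placeholder_scores.items():
    stats = summarize(values)
    rows.append(f"{name} | {stats['best']:.3f} | {stats['mean']:.3f} | {stats['std']:.3f} |")
print("\n".join(rows))
```

In the notebook, the arrays `Y_SVM`, `Y_LR` and `Y_RF` would take the place of the placeholder lists.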

In [None]:
with open('classifier.pickle', 'wb') as f:
    pickle.dump(rf_classifier, f, pickle.HIGHEST_PROTOCOL)

***

# Architecture #2
#### In this data architecture we use the rotation vector and the translation vector plus the 2 iris points as input, so a single training example of our dataset has shape (10, 1).
We already have the array X, whose rows hold the stacked rotation and translation vectors (6 values per example), so we only need to extract the iris points from our data_cleaned.json.

In [None]:
X2 = np.zeros((DATASET_SIZE, 10))
for i in range(DATASET_SIZE):
    key = keys[i]
    iris_right = np.asarray(data_all[key]['iris_right'])
    iris_left = np.asarray(data_all[key]['iris_left'])
    X2[i, :] = np.concatenate((X[i, :], iris_left, iris_right), axis = 0)
print(X2.shape)

The following steps are the same as for the first architecture

In [None]:
m = X2.mean(axis=0)
std = X2.std(axis=0)

X2_scaled = (X2-m)/std

# Train different classifiers
Here we will try out 3 classifiers and compare their results:
 - An SVM with RBF kernel and penalty parameter C=100
 - A Logistic Regression classifier
 - A Random Forest Classifier

In [None]:
# Number of classifiers to train in order to summarize the results
NUM_OF_CLASSIFIERS = 10
Y_SVM2, Y_LR2, Y_RF2 = np.zeros((NUM_OF_CLASSIFIERS)), np.zeros((NUM_OF_CLASSIFIERS)), np.zeros((NUM_OF_CLASSIFIERS))

### SVM classifier

In [None]:
from sklearn import svm
for i in range(NUM_OF_CLASSIFIERS):
    # Try a different train/test split each time
    X2_train, X2_test, y2_train, y2_test = train_test_split(X2_scaled, y, test_size=0.2)
    
    svm_classifier2 = svm.SVC(C=100, kernel='rbf', gamma='auto')
    svm_classifier2.fit(X2_train, y2_train)
    y2_pred_svm = svm_classifier2.predict(X2_test)
    Y_SVM2[i] = metrics.accuracy_score(y2_test, y2_pred_svm)

print('Training set accuracy for SVM:', svm_classifier2.score(X2_train, y2_train))
print('Test set accuracy for SVM: ', metrics.accuracy_score(y2_test, y2_pred_svm))

### Logistic Regression Classifier

In [None]:
from sklearn.linear_model import LogisticRegression

for i in range(NUM_OF_CLASSIFIERS):
    # Try a different train/test split each time
    X2_train, X2_test, y2_train, y2_test = train_test_split(X2_scaled, y, test_size=0.2)
    
    lr_classifier2 = LogisticRegression(solver='lbfgs')
    lr_classifier2.fit(X2_train, y2_train)
    y2_pred_lr = lr_classifier2.predict(X2_test)
    Y_LR2[i] = metrics.accuracy_score(y2_test, y2_pred_lr)

print('Training set accuracy for Logistic Regression:', lr_classifier2.score(X2_train, y2_train))
print('Test set accuracy for Logistic Regression: ', metrics.accuracy_score(y2_test, y2_pred_lr))

### Random Forest Classifier

In [None]:
from sklearn.ensemble import RandomForestClassifier

for i in range(NUM_OF_CLASSIFIERS):
    # Try a different train/test split each time
    X2_train, X2_test, y2_train, y2_test = train_test_split(X2_scaled, y, test_size=0.2)
    
    rf_classifier2 = RandomForestClassifier(n_estimators=100, random_state=1)
    rf_classifier2.fit(X2_train, y2_train)
    y2_pred_rf = rf_classifier2.predict(X2_test)
    Y_RF2[i] = metrics.accuracy_score(y2_test, y2_pred_rf)

print('Training set accuracy for Random Forest:', rf_classifier2.score(X2_train, y2_train))
print('Test set accuracy for Random Forest: ', metrics.accuracy_score(y2_test, y2_pred_rf))

## Visualize & Summarize results for Architecture #2


We trained the 3 classifiers for 10 different train/test splits and the accuracy results are shown below

In [None]:
import matplotlib.pyplot as plt

t = np.linspace(0, NUM_OF_CLASSIFIERS, NUM_OF_CLASSIFIERS)
plt.plot(t, Y_SVM2, 'r')
plt.plot(t, Y_LR2, 'g')
plt.plot(t, Y_RF2, 'b')
plt.xlabel('classifier No')
plt.ylabel('Test set Accuracy')
plt.legend(('SVM', 'LR', 'RF'))

plt.show()
print(Y_SVM2.max(axis=0), Y_LR2.max(axis=0), Y_RF2.max(axis=0))
print(Y_SVM2.mean(axis=0), Y_LR2.mean(axis=0), Y_RF2.mean(axis=0))

| Metric                | SVM   | Logistic Regression | Random Forest |
| :-------------------: | :---: | :-----------------: | :-----------: |
| Training set accuracy | 0.752 | 0.533               | 1             |
| Test set accuracy     | 0.716 | 0.588               | 0.770         |

The difference between the first and the second architecture is noticeable.
The SVM is more accurate now and approaches the Random Forest accuracy, which also improved, though not as much as the SVM. The Logistic Regression classifier seems unable to handle our dataset in both architectures; therefore, we will not consider it in the rest of our research. The best accuracy, 77%, is achieved by the Random Forest classifier in our second data architecture.

***

# Architecture #3
In this architecture we use the difference vector between each iris and the inner corner of the same eye, stored as absolute values per component. First, we keep the raw iris coordinates and add the 2 difference vectors to our training data, yielding a shape of (14, 1). Afterwards we try removing the absolute iris coordinates and compare the 2 results.

Earlier in this notebook we loaded the 68 facial landmarks as well as the two iris points. The 2 inner eye points are landmarks 39 and 42.

In [None]:
# Dataset construction

X3 = np.zeros((DATASET_SIZE, 14))
for i in range(DATASET_SIZE):
    key = keys[i]
    
    landmarks = data_all[key]['landmarks']
    iris_right = np.asarray(data_all[key]['iris_right'])
    iris_left = np.asarray(data_all[key]['iris_left'])
    
    left_vector = np.asarray( (abs(iris_left[0] - landmarks[39]['x']), abs(iris_left[1] - landmarks[39]['y'])) )
    right_vector = np.asarray( (abs(iris_right[0] - landmarks[42]['x']), abs(iris_right[1] - landmarks[42]['y'])) )
    X3[i] = np.concatenate((X2[i], left_vector, right_vector), axis=0)
    
    if DEBUG:
        im = cv2.imread('dataset/' + key)
        cv2.circle(im, (int(iris_left[0]), int(iris_left[1])), 3, (0,0,255), -1)
        cv2.circle(im, (int(iris_right[0]), int(iris_right[1])), 3, (0,0,255), -1)
        cv2.circle(im, (int(landmarks[39]['x']), int(landmarks[39]['y'])), 3, (0,0,255), -1)
        cv2.circle(im, (int(landmarks[42]['x']), int(landmarks[42]['y'])), 3, (0,0,255), -1)
        cv2.imshow('Im', im)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

print(X3.shape)
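
The cell above stores each iris offset as a pair of absolute values, which keeps the size of the offset but not its direction. The sketch below computes both variants for a single eye so that they can be compared; the helper `iris_offset`, its `signed` switch and the toy coordinates are assumptions for illustration, and only the unsigned variant matches what the notebook feeds to the classifiers.

```python
import numpy as np

def iris_offset(iris_xy, corner, signed=False):
    """Offset between an iris centre and an eye-corner landmark.

    iris_xy is an (x, y) pair and corner is a landmark dict with 'x' and 'y'
    keys, as in the dataset. With signed=False the absolute value of each
    component is returned, as in the cell above.
    """
    delta = np.array([iris_xy[0] - corner['x'], iris_xy[1] - corner['y']], dtype=float)
    return delta if signed else np.abs(delta)

# Toy values: the iris lies to the left of and slightly below the corner
corner_39 = {'x': 310, 'y': 200}
iris_left = [298.5, 203.0]

unsigned_offset = iris_offset(iris_left, corner_39)
signed_offset = iris_offset(iris_left, corner_39, signed=True)
print(unsigned_offset, signed_offset)
```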

In [None]:
m = X3.mean(axis=0)
std = X3.std(axis=0)
X3_scaled = (X3-m)/std

print(m)
print(std)

In [None]:
# Number of classifiers to train in order to summarize the results
NUM_OF_CLASSIFIERS = 10
Y_SVM3, Y_LR3, Y_RF3 = np.zeros((NUM_OF_CLASSIFIERS)), np.zeros((NUM_OF_CLASSIFIERS)), np.zeros((NUM_OF_CLASSIFIERS))

### SVM Classifier

In [None]:
from sklearn import svm

for i in range(NUM_OF_CLASSIFIERS):
    # Try a different train/test split each time
    X3_train, X3_test, y3_train, y3_test = train_test_split(X3_scaled, y, test_size=0.2)
    
    svm_classifier3 = svm.SVC(C=10, kernel='rbf', gamma='auto')
    svm_classifier3.fit(X3_train, y3_train)
    y3_pred_svm = svm_classifier3.predict(X3_test)
    Y_SVM3[i] = metrics.accuracy_score(y3_test, y3_pred_svm)

# Print the last result
print('Training set accuracy for SVM:', svm_classifier3.score(X3_train, y3_train))
print('Test set accuracy for SVM: ', metrics.accuracy_score(y3_test, y3_pred_svm))

### Random Forest Classifier

In [None]:
from sklearn.ensemble import RandomForestClassifier

for i in range(NUM_OF_CLASSIFIERS):
    # Try a different train/test split each time
    X3_train, X3_test, y3_train, y3_test = train_test_split(X3_scaled, y, test_size=0.2)
    
    rf_classifier3 = RandomForestClassifier(n_estimators=100, random_state=1)
    rf_classifier3.fit(X3_train, y3_train)
    y3_pred_rf = rf_classifier3.predict(X3_test)
    Y_RF3[i] = metrics.accuracy_score(y3_test, y3_pred_rf)

print('Training set accuracy for Random Forest:', rf_classifier3.score(X3_train, y3_train))
print('Test set accuracy for Random Forest: ', metrics.accuracy_score(y3_test, y3_pred_rf))

### Visualize & Summarize results for Architecture #3

In [None]:
import matplotlib.pyplot as plt

t = np.linspace(0, NUM_OF_CLASSIFIERS, NUM_OF_CLASSIFIERS)
plt.plot(t, Y_SVM3, 'r')
plt.plot(t, Y_RF3, 'b')
plt.xlabel('classifier No')
plt.ylabel('Test set Accuracy')
plt.legend(('SVM', 'RF'))

plt.show()
print(Y_SVM3.max(axis=0), Y_RF3.max(axis=0))
print(Y_SVM3.mean(axis=0), Y_RF3.mean(axis=0))

At this point we also tried feeding the difference vectors only without the absolute iris location, reaching an overall accuracy 2% lower than the one above.

It's time to save our trained Random Forest classifier, as it is the most accurate one, and use it in external applications. Note that the cell below saves the last trained classifier, not the best one!

In [None]:
with open('data.pickle', 'wb') as f:
    pickle.dump(rf_classifier3, f, pickle.HIGHEST_PROTOCOL)
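
Since the cell above keeps whichever classifier was trained last, one possible refinement is to remember the best-scoring one while looping over the splits and to store the normalization statistics in the same file. The sketch below illustrates the bookkeeping only: `keep_best` is a hypothetical helper, the strings stand in for fitted classifiers, and the scores and statistics are placeholders.

```python
import pickle
import numpy as np

def keep_best(best, candidate_model, candidate_score):
    """Return the entry with the higher score; best is None on the first call."""
    if best is None or candidate_score > best["score"]:
        return {"model": candidate_model, "score": candidate_score}
    return best

# Placeholder (model, test accuracy) pairs, one per train/test split
best = None
for model, score in [("model_a", 0.74), ("model_b", 0.78), ("model_c", 0.76)]:
    best = keep_best(best, model, score)

# Bundle the model with the statistics needed to scale new inputs
bundle = {"model": best["model"], "mean": np.zeros(14), "std": np.ones(14)}
payload = pickle.dumps(bundle, pickle.HIGHEST_PROTOCOL)

# Round trip: an external application would load the same structure
restored = pickle.loads(payload)
print(restored["model"], restored["mean"].shape)
```

Choosing the model by its test score makes that score an optimistic estimate, so a separate validation split would be the cleaner basis for the selection.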