# Face Recognition Using Neural Network

##1. Dataset Upload and Extract

### 1.1 Dataset Upload

In [1]:
# import dataset
from google.colab import files
uploaded = files.upload()

### 1.2 Dataset Extract

In [2]:
!unzip att_faces_ORG.zip

### 1.3 Verify the Dataset
Verify if our datasets are correclty imported

In [3]:
# to chack the datasets are correctly uploaded and labeled.
import os
print(os.listdir(os.getcwd()))

['.config', '.ipynb_checkpoints', 'att_faces_ORG.zip', 'att_faces', 'sample_data']


## 2. Import of All Required Libraries 

In [4]:
# import required libraries
import matplotlib.pyplot as plt
import numpy as np
import cv2
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

##3. Steps For Training Dataseet

### 3.1 Generating Face Dataset
Each face image is represented in the form a matrix having m rows and n columns,
where each pixel (x,y) such that xm, and yn shows pixel location of the image as
well as the direction.
For the simplicity we are assuming each face image as a column vector, if we have p
images then the size of the face database will be mn*p.<br>
Let’s say face database is denoted as $(Face Db)_{mn*p}$*italicized text*

In [5]:
Face_db = np.zeros(shape=(400, 112*92)) # image input in p*mn format
y = np.zeros(shape=(400, 1), dtype=np.int8) # name of the corresponding image
p = 0; # increment operator to store each data
for i in range(40):
  images = os.listdir('./att_faces/s'+str(i+1)) # retrive every folder of my path
  for image in images:
    img = cv2.imread('./att_faces/s'+str(i+1)+"/"+image, cv2.IMREAD_GRAYSCALE) # read the input image
    img = np.array(img, dtype=np.float64)
    Face_db[p, :] = img.flatten() # flat the image as mentioned in document
    y[p] = i+1 # label for the input data image
    p += 1 
print(Face_db)
# print(y)

[[ 34.  34.  33. ...  37.  40.  33.]
 [ 60.  60.  62. ...  32.  34.  34.]
 [ 48.  49.  45. ...  47.  46.  46.]
 ...
 [119. 120. 120. ...  89.  94.  85.]
 [123. 121. 126. ...  40.  35.  42.]
 [128. 125. 125. ...  85.  90.  84.]]


In [6]:
# OneHotEncode the y samples
enc = OneHotEncoder()
y = enc.fit_transform(y).toarray()
y.shape

(400, 40)

###3.2 Perform Split of Data for Training and Testing Purpose
split the 80% of data for training and remaining for testing.

In [7]:
X_train, X_test, y_train, y_test = train_test_split(Face_db, y, train_size=0.6, random_state=0) # spliting by 60%

print(X_train.shape)
y_train.shape;
X_test.shape;
y_test.shape

(240, 10304)


(160, 40)

In [8]:
X_train = X_train.T # to convert data to required format i.e mn*p
X_test = X_test.T
y_train=  y_train.T 
y_test = y_test.T 
print(X_test.shape)
print(y_test.shape)

(10304, 160)
(40, 160)


###3.3 Calculate Mean
Calculate the mean of each observation<br>
Here mean vector will have the dimension of $(M)_{mn*1}$

In [9]:
mean = np.mean(X_train, axis=1)
mean = mean.reshape(mean.shape[0], 1) # convert pre ccalculated shape of (mn,) to (mn,1)
print(mean.shape)

(10304, 1)


###3.4 Calculate Deviation Matrix
Subtract mean face from each face image, let’s say this mean zero face data as dev.

In [10]:
dev = X_train-mean
print(dev)

[[ 13.78333333 -43.21666667  31.78333333 ...  12.78333333  38.78333333
   33.78333333]
 [ 12.5875     -39.4125      33.5875     ...  17.5875      39.5875
   31.5875    ]
 [ 11.19583333 -36.80416667  31.19583333 ...  15.19583333  38.19583333
   33.19583333]
 ...
 [-26.7375      53.2625       6.2625     ... -34.7375      11.2625
    7.2625    ]
 [-23.66666667  55.33333333   2.33333333 ... -38.66666667   9.33333333
    4.33333333]
 [-23.55833333  49.44166667   4.44166667 ... -38.55833333  13.44166667
   10.44166667]]


###3.5 Calculate Co-Variance of the Mean aligned faces 
Hence here will get covariance matrix of p * p dimension, which is easy to compute and process, the
idea behind computing the surrogate covariance suggested by turk and peterland that,
these are only the valid direction where we will get maximum variances, and rest of the
directions are insignificant to us. Menas these are direction where we will get the
eigenvalues and for rest we will get eigenvalues equal to zero.<br>So, take two column matrix and do vector dot prodect to get a scalar and store that in cov matrix.

In [11]:
# cov = np.zeros(shape=(320, 320))
# for i in range(320):
#   for j in range(320):
#     cov[i, j] = np.dot(dev[:, i], dev[:, j])
# print(cov)
cov = np.dot(dev.T, dev) # calculating covariance matrix by above method
                        # without using any explicit for loops
                        # shape=p*p
print(cov.shape)

(240, 240)


###3.6 Eigenvalue and Eigenvector Decomposition
Determine eigenvalue and eigenvector and  select the best direction from p directions, for this sort the eigenvalues in the
descending order.

In [12]:
lambd, V = np.linalg.eig(cov) # lambd is eigenValue and V is eigenVector
idx = np.argsort(lambd)[::-1] # get the indices in descending order according to data
lambd = lambd[idx] # sort the value
V = V[:, idx] # sort the vector
V

array([[ 0.11387637,  0.03251089, -0.05384041, ..., -0.03594474,
         0.0190055 ,  0.06454972],
       [-0.07040714,  0.12979555, -0.03369453, ..., -0.00905014,
        -0.02242568,  0.06454972],
       [-0.02023225, -0.06010655,  0.01932551, ..., -0.00473831,
        -0.00059041,  0.06454972],
       ...,
       [-0.05523633, -0.0683319 ,  0.00106851, ..., -0.00527197,
         0.00815422,  0.06454972],
       [-0.05721954, -0.07732658, -0.07407796, ..., -0.00470899,
        -0.05796219,  0.06454972],
       [-0.09227545, -0.06221839, -0.03941263, ...,  0.00225416,
        -0.0160272 ,  0.06454972]])

###3.7 Selection of Prominent Features
decide a k value, which represents the number of selected
eigenvectors to extract k direction from all p direction. On the basis of k value we can
generate the $Feature vector_{p*k}$

In [13]:
def n_components(k, V):
  # choosing n-components, k
  # V is eigenVector
  # return k prominent feature i.e our feature vector
  return V[:, :k]

In [14]:
components = 20
feature_vec = n_components(components, V)
print(feature_vec)

[[ 0.11387637  0.03251089 -0.05384041 ...  0.00961343  0.00536039
   0.03266572]
 [-0.07040714  0.12979555 -0.03369453 ... -0.01593011 -0.11272594
  -0.02001582]
 [-0.02023225 -0.06010655  0.01932551 ... -0.05474237 -0.14227077
  -0.08819568]
 ...
 [-0.05523633 -0.0683319   0.00106851 ... -0.08996113 -0.05833638
   0.19310081]
 [-0.05721954 -0.07732658 -0.07407796 ...  0.06885465 -0.02587326
  -0.0355455 ]
 [-0.09227545 -0.06221839 -0.03941263 ... -0.0217031   0.05432152
   0.04413111]]


###3.8 Generating Eigenfaces:
For generating the eigenfaces project the each mean aligned face to the generated feature vector
**$(eigenfaces)_{k*mn}$ = $(featureVector)^{T}_{k*p}$ * $(dev)^{T}_{p*mn}$ **

In [15]:
# projecting each mean aligned face and storing of shape k*mn
def gen_eigenface(feature_vec, dev):
  # feature_vec: Feature Vector generated using n-components
  # dev: Deviation matrix
  # Return: EigenFace
  return np.dot(feature_vec.T, dev.T)

In [16]:

eigen_faces = gen_eigenface(feature_vec, dev) 
eigen_faces.shape

(20, 10304)

###3.9 Generating Projection of Each Train Dataset 

In [17]:
def gen_projection(eigen_faces, dev):
  # eigen_faces: Eigen Face 
  # dev: Deviation matrix
  # Return: projection of dataset
  return np.dot(eigen_faces, dev)

In [18]:
projection_train = gen_projection(eigen_faces, dev) # shape k*p
projection_train.shape

(20, 240)

###3.10 Define Number of Units for each Layer 


In [19]:
n_x = components # size of the input layer, 20
n_h = 1024 # size of the hidden layer
n_y = 40 # size of the output layer

In [20]:
# Initialization of parameters 
def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer
    
    Returns:
    params -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    
    
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros(shape=(n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros(shape=(n_y, 1))
    
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

###3.11 Forward Propagation

In [21]:
# sigmoid
def sigmoid(y):
  x = y.copy()
  return 1/(1+np.exp(-x))
# ReLu
def relu(y, thresold=0.1):
  x = y.copy()
  x[x < thresold] = 0
  return x

In [22]:
def forward_propagation(X, parameters):
    """
    Argument:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters (output of initialization function)
    
    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    # Retrieve each parameter from the dictionary "parameters"
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    
    # Implement Forward Propagation to calculate A2 (probabilities)
    Z1 = np.dot(W1, X) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    
    
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    
    return A2, cache

In [23]:
def compute_cost(A2, Y, parameters):
    """
    Computes the cross-entropy cost given in equation (13)
    
    Arguments:
    A2 -- The sigmoid output of the second activation, of shape (40, number of examples)
    Y -- "true" labels vector of shape (40, number of examples)
    parameters -- python dictionary containing your parameters W1, b1, W2 and b2
    
    Returns:
    cost -- cross-entropy cost
    """
    
    m = Y.shape[1] # number of example
    
    # Retrieve W1 and W2 from parameters
    W1 = parameters['W1']
    W2 = parameters['W2']
    
    # Compute the cross-entropy cost
    logprobs = Y * np.log(A2)
    cost = - np.sum(logprobs) / m
    
    cost = np.squeeze(cost)     # makes sure cost is the dimension we expect. 
                                # E.g., turns [[17]] into 17 
    
    return cost

In [24]:
def backward_propagation(parameters, cache, X, Y):
    """
    Implement the backward propagation using the instructions above.
    
    Arguments:
    parameters -- python dictionary containing our parameters 
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2".
    X -- input data of shape (20, number of examples)
    Y -- "true" labels vector of shape (40, number of examples)
    
    Returns:
    grads -- python dictionary containing your gradients with respect to different parameters
    """
    m = X.shape[1]
    
    # First, retrieve W1 and W2 from the dictionary "parameters".
    W1 = parameters['W1']
    W2 = parameters['W2']
        
    # Retrieve also A1 and A2 from dictionary "cache".
    A1 = cache['A1']
    A2 = cache['A2']
    
    # Backward propagation: calculate dW1, db1, dW2, db2. 
    dZ2= A2 - Y
    dW2 = (1 / m) * np.dot(dZ2, A1.T)
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.multiply(np.dot(W2.T, dZ2), 1 - np.power(A1, 2))
    dW1 = (1 / m) * np.dot(dZ1, X.T)
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)
    
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    
    return grads

In [25]:
def update_parameters(parameters, grads, learning_rate=1.2):
    """
    Updates parameters using the gradient descent update rule given above
    
    Arguments:
    parameters -- python dictionary containing your parameters 
    grads -- python dictionary containing your gradients 
    
    Returns:
    parameters -- python dictionary containing your updated parameters 
    """
    # Retrieve each parameter from the dictionary "parameters"
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    
    # Retrieve each gradient from the dictionary "grads"
    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']
    
    # Update rule for each parameter
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

In [26]:
def nn_model(X, Y, n_h, num_iterations):
    """
    Arguments:
    X -- dataset of shape (20, number of examples)
    Y -- labels of shape (40, number of examples)
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations in gradient descent loop
    
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    n_x = X.shape[0]
    n_y = Y.shape[0]
    
    # Initialize parameters, then retrieve W1, b1, W2, b2. Inputs: "n_x, n_h, n_y". Outputs = "W1, b1, W2, b2, parameters".
    parameters = initialize_parameters(n_x, n_h, n_y)
    
    # Loop (gradient descent)

    for i in range(0, num_iterations):
         
        # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
        A2, cache = forward_propagation(X, parameters)
        
        # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost".
        cost = compute_cost(A2, Y, parameters)
 
        # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
        grads = backward_propagation(parameters, cache, X, Y)
 
        # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
        parameters = update_parameters(parameters, grads)
        
        # Print the cost every 1000 iterations
        if i % 1000 == 0:
            print ("Cost after iteration %i: %f" % (i, cost))

    return parameters

In [27]:
parameters = nn_model(projection_train, y_train, n_h, 5000)
print(parameters)

  after removing the cwd from sys.path.


Cost after iteration 0: 0.708433
Cost after iteration 1000: 1.278157
Cost after iteration 2000: 0.196221
Cost after iteration 3000: 0.441039
Cost after iteration 4000: 0.391378
{'W1': array([[ 4.89997119e+08,  5.53275611e+07,  7.63680998e+07, ...,
         1.51887932e+07, -1.30336750e+07, -1.15107657e+07],
       [-4.89848808e+08, -1.08519597e+08,  1.44064860e+08, ...,
        -3.54201698e+07,  1.36928668e+07, -2.31805523e+07],
       [ 5.32519473e+08,  5.19393401e+07,  8.22500425e+07, ...,
         1.77474485e+07, -1.91860506e+07, -1.88326929e+07],
       ...,
       [ 6.09252554e+08,  2.10705021e+07,  9.05449565e+07, ...,
         2.35652182e+07, -2.76176223e+07, -3.82223321e+07],
       [-5.37683782e+08, -1.50931161e+08,  1.09535812e+08, ...,
        -1.73806278e+07, -5.55914552e+06, -2.72595747e+07],
       [-4.03907243e+08, -7.63712243e+07,  1.01574182e+08, ...,
        -3.45296520e+07,  1.65761408e+06, -1.60245051e+07]]), 'b1': array([[-1.43390398],
       [-2.62761596],
       [

##4. Steps for Testing

In [28]:
print(X_test)
print(X_test.shape)

[[ 50.  37.  86. ... 108. 115. 112.]
 [ 49.  36.  89. ... 111. 118. 109.]
 [ 50.  34.  90. ... 110. 117. 112.]
 ...
 [159. 160. 124. ...  53.  22.  61.]
 [111. 207. 116. ...  48.  25.  62.]
 [117. 210.  83. ...  56.  28.  62.]]
(10304, 160)


### 4.2 DEviation of test set

In [29]:
dev_test = X_test - mean # shape mn*p
print(dev_test.shape)

(10304, 160)


###4.3 Project data of test set

In [30]:
projection = gen_projection(eigen_faces, dev_test) # project each data of test dataset, shape=k*mn
print(projection.shape)

(20, 160)


In [31]:

def predict(parameters, X):
    """
    Using the learned parameters, predicts a class for each example in X
    
    Arguments:
    parameters -- python dictionary containing your parameters 
    X -- input data of size (n_x, m)
    
    Returns
    predictions -- vector of predictions of our model
    """
    
    # Computes probabilities using forward propagation
    A2, cache = forward_propagation(X, parameters)
    print(A2.shape)
    predictions = np.argmax(A2, axis=0)+1
    
    return predictions

In [32]:
def predict_model(X, Y, parameters):
  """
  Arguments:
  X -- dataset of shape (20, number of examples)
  Y -- labels of shape (40, number of examples)
  parameters -- python dictionary containing your parameters 

  Returns
  correctyLabled: correctly matched
  """
  correctyLabled = 0
  predictions = predict(parameters, X)
  print(predictions.shape)
  for i in range(X.shape[1]):
    print("predicted:", predictions[i], "correct Label:", np.argmax(Y[:, i])+1)
    if predictions[i] == np.argmax(Y[:, i])+1:
      correctyLabled += 1
  
  return correctyLabled

In [33]:
correctyLabled = predict_model(projection, y_test, parameters)

(40, 160)
(160,)
predicted: 14 correct Label: 14
predicted: 31 correct Label: 31
predicted: 25 correct Label: 35
predicted: 20 correct Label: 20
predicted: 25 correct Label: 25
predicted: 36 correct Label: 7
predicted: 32 correct Label: 16
predicted: 27 correct Label: 27
predicted: 15 correct Label: 15
predicted: 22 correct Label: 22
predicted: 4 correct Label: 4
predicted: 14 correct Label: 14
predicted: 12 correct Label: 12
predicted: 25 correct Label: 35
predicted: 19 correct Label: 2
predicted: 6 correct Label: 6
predicted: 30 correct Label: 30
predicted: 15 correct Label: 15
predicted: 21 correct Label: 21
predicted: 20 correct Label: 20
predicted: 18 correct Label: 18
predicted: 27 correct Label: 27
predicted: 13 correct Label: 13
predicted: 25 correct Label: 35
predicted: 18 correct Label: 18
predicted: 32 correct Label: 32
predicted: 8 correct Label: 8
predicted: 2 correct Label: 2
predicted: 29 correct Label: 29
predicted: 11 correct Label: 11
predicted: 5 correct Label: 18
pr

  after removing the cwd from sys.path.


In [34]:
accuracy = correctyLabled / y_test.shape[1] * 100

In [35]:
print(accuracy)

73.125
