
Deep Learning and Neural Networks




Name: Syed Muhammad Sameed Hussain

I.D: F2019332008

Instructor: Dr. Saira Osama

Section: D1


First, we need to define the structure of our neural network. Because our dataset (the four input/output pairs of the XOR function) is very small, a network with a single hidden layer is sufficient. So we will have an input layer, a hidden layer and an output layer. Next, we need activation functions. The sigmoid function is a good choice for the output layer because it outputs values between 0 and 1, which can be read as a probability, while tanh (hyperbolic tangent) is a common choice for the hidden layer because its outputs are centered around zero.

Here, the parameters to be learned are the weights W1, W2 and the biases b1, b2. W1 and b1 connect the input layer with the hidden layer, while W2 and b2 connect the hidden layer with the output layer. The activations A1 and A2 are calculated as follows:

A1 = h(W1*X + b1)
A2 = g(W2*A1 + b2)

Here h and g are the two activation functions we chose (tanh for the hidden layer, sigmoid for the output layer), * denotes matrix multiplication, and W1, W2, b1, b2 are in general matrices.
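
To make the shapes in the two formulas concrete, here is a minimal sketch. The layer sizes (2 inputs, 2 hidden units, 1 output, 4 examples) and the random values are assumptions chosen for illustration; the sketch only shows that with one example per column, A1 has one column per example and the bias vectors are broadcast across the columns.

```python
import numpy as np

# Illustrative sizes (assumed): 2 inputs, 2 hidden units, 1 output, 4 examples
n_x, n_h, n_y, m = 2, 2, 1, 4

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(n_x, m))    # one example per column
W1 = rng.standard_normal((n_h, n_x))
b1 = np.zeros((n_h, 1))                  # broadcast across the m columns
W2 = rng.standard_normal((n_y, n_h))
b2 = np.zeros((n_y, 1))

A1 = np.tanh(W1 @ X + b1)                # h = tanh, shape (n_h, m)
A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))   # g = sigmoid, shape (n_y, m)

print(A1.shape, A2.shape)  # (2, 4) (1, 4)
```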

First we will implement our sigmoid activation function, defined as g(z) = 1/(1+e^(-z)), where z is in general a matrix and the function is applied element-wise.

In [None]:
import numpy as np

def sigmoid(z):
  return 1/(1 + np.exp(-z))
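
As a quick sanity check, the sketch below evaluates the same function on a few hand-picked inputs (the inputs are arbitrary illustration values). It shows two properties that matter later: sigmoid(0) is exactly 0.5, which is the decision threshold, and sigmoid(-z) = 1 - sigmoid(z).

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([[-2.0, 0.0, 2.0]])   # arbitrary test inputs
s = sigmoid(z)                     # applied element-wise, same shape as z

print(s.shape)                            # (1, 3)
print(s[0, 1])                            # 0.5
print(np.allclose(sigmoid(-z), 1 - s))    # True
```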

Now we have to initialize our parameters. The weight matrices W1 and W2 are drawn from a standard normal distribution, so that the hidden units start out different from each other, while the biases b1 and b2 are initialized to zero. The function initialize_parameters(n_x, n_h, n_y) takes as input the number of units in each of the 3 layers and returns the initialized parameters in a dictionary.

In [None]:
def initialize_parameters(n_x, n_h, n_y):
  W1 = np.random.randn(n_h, n_x)
  b1 = np.zeros((n_h, 1))
  W2 = np.random.randn(n_y, n_h)
  b2 = np.zeros((n_y, 1))

  parameters = {
    "W1": W1,
    "b1" : b1,
    "W2": W2,
    "b2" : b2
  }
  return parameters

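The next step is to implement forward propagation, which computes Z1, A1, Z2 and A2 from the formulas above and caches the activations A1 and A2 for use in back propagation.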

In [None]:
def forward_prop(X, parameters):
  W1 = parameters["W1"]
  b1 = parameters["b1"]
  W2 = parameters["W2"]
  b2 = parameters["b2"]

  Z1 = np.dot(W1, X) + b1
  A1 = np.tanh(Z1)
  Z2 = np.dot(W2, A1) + b2
  A2 = sigmoid(Z2)

  cache = {
    "A1": A1,
    "A2": A2
  }
  return A2, cache

Next we compute the cost: the binary cross-entropy loss averaged over the m training examples.

In [None]:
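def calculate_cost(A2, Y):
  # Number of training examples (one per column)
  m = Y.shape[1]
  # Binary cross-entropy averaged over the m examples
  cost = -np.sum(np.multiply(Y, np.log(A2)) +  np.multiply(1-Y, np.log(1-A2)))/m
  cost = np.squeeze(cost)

  return cost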
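
To get a feel for the scale of this cost, the sketch below evaluates the same cross-entropy formula on two hand-made prediction vectors. The helper name cross_entropy and the prediction values are assumptions for illustration. A network that outputs 0.5 for every example scores ln(2), so values well below 0.69 indicate the model has learned something.

```python
import numpy as np

def cross_entropy(A2, Y):
    # Hypothetical helper: same formula as calculate_cost
    m = Y.shape[1]
    return float(-np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m)

Y = np.array([[0, 1, 1, 0]])                   # XOR labels
undecided = np.full((1, 4), 0.5)               # made-up: always predicts 0.5
confident = np.array([[0.1, 0.9, 0.9, 0.1]])   # made-up: mostly correct

print(cross_entropy(undecided, Y))   # ln(2), about 0.693
print(cross_entropy(confident, Y))   # -ln(0.9), about 0.105
```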

Now we implement back propagation. This function returns the gradients of the loss function with respect to the 4 parameters of our network (W1, W2, b1, b2):

In [None]:
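def backward_prop(X, Y, cache, parameters):
  # Number of training examples (one per column)
  m = X.shape[1]

  A1 = cache["A1"]
  A2 = cache["A2"]

  W2 = parameters["W2"]

  # Output layer: sigmoid combined with cross-entropy gives dZ2 = A2 - Y
  dZ2 = A2 - Y
  dW2 = np.dot(dZ2, A1.T)/m
  db2 = np.sum(dZ2, axis=1, keepdims=True)/m
  # Hidden layer: the derivative of tanh is 1 - A1^2
  dZ1 = np.multiply(np.dot(W2.T, dZ2), 1-np.power(A1, 2))
  dW1 = np.dot(dZ1, X.T)/m
  db1 = np.sum(dZ1, axis=1, keepdims=True)/m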

  grads = {
    "dW1": dW1,
    "db1": db1,
    "dW2": dW2,
    "db2": db2
  }

  return grads
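
One way to gain confidence in these formulas is a numerical gradient check. The following is a minimal sketch, not part of the original notebook: it re-implements the forward pass and the gradient formulas in compact form, then compares them with central finite differences of the cost. The helper names (forward, cost, analytic_grads, numeric_grad), the step size eps and the random seed are all assumptions.

```python
import numpy as np

def forward(X, p):
    A1 = np.tanh(p["W1"] @ X + p["b1"])
    A2 = 1 / (1 + np.exp(-(p["W2"] @ A1 + p["b2"])))
    return A1, A2

def cost(X, Y, p):
    _, A2 = forward(X, p)
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / Y.shape[1]

def analytic_grads(X, Y, p):
    # Same formulas as backward_prop, keyed by parameter name
    m = X.shape[1]
    A1, A2 = forward(X, p)
    dZ2 = A2 - Y
    dZ1 = (p["W2"].T @ dZ2) * (1 - A1 ** 2)
    return {"W1": dZ1 @ X.T / m, "b1": dZ1.sum(axis=1, keepdims=True) / m,
            "W2": dZ2 @ A1.T / m, "b2": dZ2.sum(axis=1, keepdims=True) / m}

def numeric_grad(X, Y, p, name, eps=1e-6):
    # Central difference: perturb one entry at a time, then restore it
    grad = np.zeros_like(p[name])
    for idx in np.ndindex(p[name].shape):
        old = p[name][idx]
        p[name][idx] = old + eps
        plus = cost(X, Y, p)
        p[name][idx] = old - eps
        minus = cost(X, Y, p)
        p[name][idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = np.array([[0., 0., 1., 1.], [0., 1., 0., 1.]])
Y = np.array([[0., 1., 1., 0.]])
p = {"W1": rng.standard_normal((2, 2)), "b1": np.zeros((2, 1)),
     "W2": rng.standard_normal((1, 2)), "b2": np.zeros((1, 1))}

g = analytic_grads(X, Y, p)
max_err = max(np.max(np.abs(g[k] - numeric_grad(X, Y, p, k))) for k in p)
print(max_err)   # should be tiny if the analytic gradients are right
```

If the largest difference is many orders of magnitude smaller than the gradients themselves, the analytic formulas agree with the numerical estimate.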

Now we have all the gradients of the loss function, so we can proceed to the actual learning! We will use the gradient descent algorithm to update our parameters, with the learning rate passed as a parameter:

In [None]:
def update_parameters(parameters, grads, learning_rate):
  W1 = parameters["W1"]
  b1 = parameters["b1"]
  W2 = parameters["W2"]
  b2 = parameters["b2"]

  dW1 = grads["dW1"]
  db1 = grads["db1"]
  dW2 = grads["dW2"]
  db2 = grads["db2"]

  W1 = W1 - learning_rate*dW1
  b1 = b1 - learning_rate*db1
  W2 = W2 - learning_rate*dW2
  b2 = b2 - learning_rate*db2
  
  new_parameters = {
    "W1": W1,
    "W2": W2,
    "b1" : b1,
    "b2" : b2
  }

  return new_parameters
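
The update rule is easier to see on a one-dimensional problem. The sketch below applies the same rule, w = w - learning_rate * dw, to a toy function f(w) = (w - 3)^2. The function and the number of steps are assumptions chosen for illustration and have nothing to do with the XOR data. With a learning rate of 0.3, each step shrinks the distance to the minimum at w = 3 by a factor of 0.4.

```python
# Toy objective (an assumption for illustration): f(w) = (w - 3)^2
def grad_f(w):
    return 2 * (w - 3)

w = 0.0
learning_rate = 0.3
for step in range(20):
    # Same rule as update_parameters, applied to a single scalar
    w = w - learning_rate * grad_f(w)

print(w)   # very close to 3, the minimum of f
```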

Now we just put them all together inside a function called model() and call model() from the main program.

The model() function takes as input the feature matrix X, the label matrix Y, the layer sizes n_x, n_h, n_y, the number of gradient descent iterations and the learning rate. It combines all the functions above and returns the trained parameters of our model:

In [None]:
def model(X, Y, n_x, n_h, n_y, num_of_iters, learning_rate):
  parameters = initialize_parameters(n_x, n_h, n_y)

  for i in range(0, num_of_iters+1):
    a2, cache = forward_prop(X, parameters)

    cost = calculate_cost(a2, Y)

    grads = backward_prop(X, Y, cache, parameters)

    parameters = update_parameters(parameters, grads, learning_rate)

    if(i%100 == 0):
      print('Cost after iteration# {:d}: {:f}'.format(i, cost))

  return parameters

Now we just have to make our prediction. The function predict(X, parameters) takes as input a 2x1 matrix X holding the two numbers whose XOR we want to compute, together with the trained parameters of the model, and returns the prediction y_predict using a threshold of 0.5:

In [None]:
def predict(X, parameters):
  a2, cache = forward_prop(X, parameters)
  yhat = a2
  yhat = np.squeeze(yhat)
  if(yhat >= 0.5):
    y_predict = 1
  else:
    y_predict = 0

  return y_predict
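
The function above handles one example at a time, because the if statement needs a single number. As a possible variation, the thresholding step can be vectorized so that several examples are classified at once. The sketch below is an assumption, not part of the original notebook: predict_batch is a hypothetical helper, and the probabilities are made-up values standing in for the output A2 of forward_prop.

```python
import numpy as np

def predict_batch(A2, threshold=0.5):
    # Hypothetical helper: map a (1, m) array of probabilities to 0/1 labels
    return (A2 >= threshold).astype(int)

# Made-up output probabilities for four examples
probs = np.array([[0.02, 0.97, 0.5, 0.49]])
print(predict_batch(probs))   # [[0 1 1 0]]
```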

Now let’s go to the main program and declare our matrices X, Y and the hyperparameters n_x, n_h, n_y, num_of_iters, learning_rate:

In [None]:
np.random.seed(2)

# The 4 training examples by columns
X = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])

# The outputs of the XOR for every example in X
Y = np.array([[0, 1, 1, 0]])

# No. of training examples
m = X.shape[1]
# Set the hyperparameters
n_x = 2     #No. of neurons in first layer
n_h = 2     #No. of neurons in hidden layer
n_y = 1     #No. of neurons in output layer
num_of_iters = 1000
learning_rate = 0.3

In [None]:
trained_parameters = model(X, Y, n_x, n_h, n_y, num_of_iters, learning_rate)

We make a prediction for one of the four input pairs, e.g. (1, 1):

In [None]:
# Test 2X1 vector to calculate the XOR of its elements. 
# You can try any of those: (0, 0), (0, 1), (1, 0), (1, 1)
X_test = np.array([[1], [1]])
y_predict = predict(X_test, trained_parameters)
# Print the result
print('Neural Network prediction for example ({:d}, {:d}) is {:d}'.format(
    X_test[0][0], X_test[1][0], y_predict))

Neural Network prediction for example (1, 1) is 0
