<font size=3>Utility Functions for Forward Propagation Through The L Layered Model.</font>

<font size=2>First we import the libraries needed to load the train and test data and to build the model.</font>

In [2]:
import pandas as pd
import numpy as np

<h2>Initializing Parameters:</h2>

Function for initializing weights and biases:<br/>
<ul><li> Takes as argument a list called <b>layer_dims</b> with one entry per layer (including the input layer); each entry gives the number of units in that layer.<br/>
<li> <b> L = len(layer_dims)</b> is the number of layers in the neural net, counting the input layer (layer 0). We have to initialize the weight matrices and bias vectors for the other L - 1 layers. For a layer l the weight matrix is W<sup>[ l ]</sup> ( key in the dict is Wl ) and the bias is b<sup>[ l ]</sup> ( key in the dict is bl ).<br/>
<li> Returns a dictionary called <b>parameters</b> with ( L - 1 )*2 key-value pairs with keys W1, W2, ..., W(L-1) and b1, b2, ..., b(L-1). The value corresponding to the key Wi is the matrix of weights for layer i. <br/>parameters[ "W" + str( i ) ] is a matrix of shape ( n<sup>[ i ]</sup>, n<sup>[ i - 1 ]</sup> ) and parameters[ "b"+str(i) ] is a vector of shape ( n<sup>[ i ]</sup>, 1 ). The weights are initialized to small random values (standard normal samples scaled by 0.01) and the biases to zeros.</ul><br/>

In [2]:
def initialize_parameters(layer_dims):
    np.random.seed(3)
    L = len(layer_dims) #no. of layers in the network, including input and output layers
    parameters = {} #initialize the dict that will be returned
    for l in range(1, L):
        parameters[ "W"+str(l) ] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters[ "b"+str(l) ] = np.zeros((layer_dims[l],1))
    return parameters

In [10]:
layer_dims = [4, 5, 6, 4, 3]

#initializing parameters
parameters = initialize_parameters(layer_dims)

print("shape of W1: ",parameters["W1"].shape,"\n")
print("shape of b1: ",parameters["b1"].shape,"\n")
print("shape of W2: ",parameters["W2"].shape,"\n")
print("shape of b2: ",parameters["b2"].shape,"\n")
print("shape of W3: ",parameters["W3"].shape,"\n")
print("shape of b3: ",parameters["b3"].shape,"\n")
print("shape of W4: ",parameters["W4"].shape,"\n")
print("shape of b4: ",parameters["b4"].shape,"\n")

shape of W1:  (5, 4) 

shape of b1:  (5, 1) 

shape of W2:  (6, 5) 

shape of b2:  (6, 1) 

shape of W3:  (4, 6) 

shape of b3:  (4, 1) 

shape of W4:  (3, 4) 

shape of b4:  (3, 1) 
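
The shapes printed above all follow one rule: W for layer l has shape ( n[l], n[l-1] ) and b for layer l has shape ( n[l], 1 ). A minimal sketch of checking that rule in a loop rather than printing every entry by hand is given below. The helper <b>init_params_sketch</b> is a hypothetical stand-in that mirrors initialize_parameters so that the block runs by itself; it is not part of the notebook.

```python
import numpy as np

def init_params_sketch(layer_dims, seed=3):
    """Hypothetical stand-in that mirrors initialize_parameters."""
    rng = np.random.RandomState(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        params["W" + str(l)] = rng.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params

layer_dims = [4, 5, 6, 4, 3]
params = init_params_sketch(layer_dims)

# one W and one b for every layer except the input layer
assert len(params) == 2 * (len(layer_dims) - 1)

for l in range(1, len(layer_dims)):
    # W maps n[l-1] inputs to n[l] units; b has one entry per unit
    assert params["W" + str(l)].shape == (layer_dims[l], layer_dims[l - 1])
    assert params["b" + str(l)].shape == (layer_dims[l], 1)

print("all", len(params), "parameter arrays have the expected shapes")
```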



<h2>Single Step of Forward Propagation:</h2>

Function for one step of forward propagation:<br/>
<ul><li> This calculates Z<sup>[ l ]</sup> and A<sup>[ l ]</sup> for a single layer l.<br/>
<li> Takes <b>A_prev</b>, <b>W</b>, <b>b</b> and <b>activation</b> as arguments. n<sup>[ l ]</sup> is the number of units in the l<sup>th</sup> layer and m is the number of examples.<br/>
<ul><li><b>A_prev</b> is the activation matrix (a NumPy array) of the previous layer ( [ l - 1 ] ), or the input X if we are at layer one (X is layer zero). <i>Shape of A_prev is ( n<sup>[ l - 1 ]</sup>, m ).</i><br/>
<li><b>W</b> is the weight matrix (a NumPy array) of the current layer ( [ l ] ). <i>Shape of W is ( n<sup>[ l ]</sup>, n<sup>[ l - 1 ]</sup> ).</i><br/>
<li><b>b</b> is the bias vector (a NumPy array) of the current layer ( [ l ] ). <i>Shape of b is ( n<sup>[ l ]</sup>, 1 ).</i><br/>
<li><b>activation</b> is a string holding the name of the activation function, either 'relu' or 'softmax'.</ul><br/>
<li> The function first calculates <b>Z = W . A_prev + b</b> and <i>shape of Z is ( n<sup>[ l ]</sup>, m ).</i> This is same as Z<sup>[ l ]</sup> = W<sup>[ l ]</sup> . A<sup>[ l - 1 ]</sup> + b<sup>[ l ]</sup>.<br/>
<li> Then it calculates the activation <b>A = g( Z )</b> where g() is the activation function. <i>Shape of A is the same as the shape of Z.</i><br/>
<li> Store A_prev, W, b in a tuple called <b>linear_cache</b>. Store Z as <b>activation_cache</b>. These two caches will be required during back propagation.<br/>
<li> The function returns <b>A</b> and <b>cache</b>, which is a tuple containing linear_cache and activation_cache.
</ul>

In [15]:
def Forward_Prop_Single_Step(A_prev, W, b, activation):
    Z = np.dot(W, A_prev) + b #b of shape (n[l], 1) is broadcast across the m columns
    if(activation=='relu'):
        #note: this is a leaky ReLU with slope 0.01 for Z < 0, not a plain ReLU
        #a plain ReLU would be A = np.maximum(0, Z)
        leakyZ = np.multiply(0.01, Z)
        A = np.maximum(Z, leakyZ)
    elif(activation=='softmax'):
        #subtract the max of each col. before exponentiating to avoid overflow; the result is unchanged
        t = np.exp(Z - np.max(Z, axis=0, keepdims=True)) #element wise exponent
        sum_t = np.sum(t, axis=0) #for each col. of t, find sum of all rows in that col.
        A = t / sum_t #calculate the softmax activation
    else:
        raise ValueError("unknown activation: " + str(activation))
    
    linear_cache = (A_prev, W, b)
    activation_cache = Z #(Z) without a trailing comma is just Z, not a tuple
    
    cache = (linear_cache, activation_cache)
    return A, cache
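
To make the shapes in a single step concrete, here is a minimal stand-alone sketch on a toy batch. The helpers <b>leaky_relu</b> and <b>softmax_columns</b> and the sizes n_prev, n_curr and m are assumptions chosen for illustration; they restate the two activations used above so that the block runs without the rest of the notebook.

```python
import numpy as np

def leaky_relu(Z, slope=0.01):
    # for 0 < slope < 1, np.maximum keeps Z where Z > 0 and slope * Z otherwise
    return np.maximum(Z, slope * Z)

def softmax_columns(Z):
    # subtracting the column max does not change the result but avoids overflow in np.exp
    t = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return t / np.sum(t, axis=0, keepdims=True)

rng = np.random.RandomState(0)
n_prev, n_curr, m = 4, 3, 5            # hypothetical layer sizes and batch size
A_prev = rng.randn(n_prev, m)          # shape (n_prev, m)
W = rng.randn(n_curr, n_prev) * 0.01   # shape (n_curr, n_prev)
b = np.zeros((n_curr, 1))              # shape (n_curr, 1)

Z = np.dot(W, A_prev) + b              # b is broadcast across the m columns
A_hidden = leaky_relu(Z)               # what a hidden layer would output
A_out = softmax_columns(Z)             # what the final layer would output

print(Z.shape, A_hidden.shape, A_out.shape)
print(A_out.sum(axis=0))               # each column of the softmax output sums to 1
```

Each column of A_out is a probability distribution over the classes for one example, which is why the sum is taken over axis 0.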

<h2>Building L layers of Forward Propagation: </h2>

Function for building the L-layer model (L steps of forward propagation):<br/>
<ul><li> Takes as input <b>X</b>, <b>Y</b> and <b>parameters</b> which is the output of the initialization function. <br/><b>L = len(parameters) // 2</b><br/>
<li>Layer 0 is the input layer X. Layer L is the final layer. There are a total of L layers ( not counting the input layer ). The number of hidden layers is L-1. So we have to call the function implementing forward propagation L times: L-1 times for the L-1 hidden layers and once for the final layer.<br/>
<li> For each layer l in range( 1, L ) i.e., 1, ..., ( L - 1 ) call Forward_Prop_Single_Step( ) and pass <b>A<sup>[ l - 1 ]</sup></b>, <b>parameters[ "Wl" ]</b> and <b>parameters[ "bl" ]</b> along with the  <b>'relu' activation</b> function as arguments. These are the forward prop steps performed for the L-1 hidden layers. The forward prop step for the final layer will be called separately with <b>'softmax' activation</b> function.<br/>
<li> Each call to Forward_Prop_Single_Step( ) returns A<sup>[ l ]</sup> and a cache. Pass A<sup>[ l ]</sup> as A_prev to the call for layer l + 1, and append the cache to a list called caches. Return <b>A<sup>[ L ]</sup></b> and <b>caches</b>. Shape of A<sup>[ L ]</sup> is ( Y.shape[0], m ), the same as the shape of Y.</ul><br/>

In [13]:
def L_layer_Forward_Prop(X, Y, parameters):
    L = len(parameters) // 2
    A_prev = X
    caches = [] #this is the list to be returned
    for l in range(1, L): #for each of the hidden layers
        A, cache = Forward_Prop_Single_Step(A_prev, parameters[ "W"+str(l) ], parameters[ "b"+str(l) ], activation="relu")
        caches.append(cache)
        A_prev = A
        
    #for the final layer
    AL, cache = Forward_Prop_Single_Step(A_prev, parameters[ "W"+str(L) ], parameters[ "b"+str(L) ], activation="softmax")
    caches.append(cache)
    
    assert(AL.shape == (Y.shape[0], Y.shape[1])) #make sure AL has the right shape
    return AL, caches

In [11]:
def accuracy(AL, Y):
    m = Y.shape[1]
    correct = 0
    #max_in_each_col is an array of length m whose i'th element is the greatest value in the i'th col of AL
    max_in_each_col = np.amax(AL, axis=0)
    #find the one hot encoding for AL (if a col. has tied maxima, every tied entry becomes 1)
    one_hot_AL = np.where(AL<max_in_each_col, 0, 1)
    for i in range(m):
        if(np.array_equal(one_hot_AL[:, i], Y[:, i])):
            correct = correct + 1
    acc = (correct / m) * 100
    return acc
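
The accuracy computation can also be sketched without the Python loop by comparing class indices. The function <b>accuracy_argmax</b> below is a hypothetical alternative, not the notebook's function, and the toy AL and Y are made up for illustration. It assumes Y is one-hot. It differs from the loop version when a column of AL has tied maxima: np.argmax picks the first one, while the one-hot comparison above counts such a column as wrong.

```python
import numpy as np

def accuracy_argmax(AL, Y):
    """Hypothetical variant: percentage of examples whose predicted class matches the label."""
    predicted = np.argmax(AL, axis=0)  # index of the largest probability in each column
    actual = np.argmax(Y, axis=0)      # index of the 1 in each one-hot label column
    return np.mean(predicted == actual) * 100

# toy data: 3 classes, 4 examples
AL = np.array([[0.7, 0.1, 0.2, 0.3],
               [0.2, 0.8, 0.3, 0.3],
               [0.1, 0.1, 0.5, 0.4]])
Y = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 1, 0]])

# the first three examples are classified correctly, the fourth is not
print(accuracy_argmax(AL, Y))
```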

<h2>Calculating the cost:</h2>
<br/>
Cost function:<br/>
<ul><li> Takes <b>A<sup>[ L ]</sup></b> and <b>Y</b> as input and calculates the <b>cost</b>. A<sup>[ L ]</sup> is the matrix of predicted probabilities, with one column per example.<br/>
    <li>Cost for the i<sup>th</sup> example is the negative of the sum of all elements of Y<sup>( i )</sup> * log( A<sup>[ L ] ( i )</sup> ), where the product is taken element-wise and gives a column vector with one element for each class.
<li>Cost of m examples is the average of the per-example costs, i.e. their sum divided by m.
<li> Returns the <b>cost</b> as output.</ul><br/>

In [19]:
def compute_cost(AL, Y):
    m = Y.shape[1]
    #since the activation at the last layer is softmax,
    #AL is a (c, m) matrix where c is the number of classes. Y is of the same shape.
    #np.multiply(Y, np.log(AL)) gives a (c, m) matrix. Summing over the rows (axis=0) outputs an array of length m,
    #where the i'th element is the cost for the i'th example. The outer sum divided by m is the average cost over all m examples.
    single_example_cost = - np.sum(np.multiply(Y, np.log(AL)), axis=0)
    cost = np.sum(single_example_cost) / m
    return cost
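
A small worked example may help to check the cost formula by hand. The numbers below are made up for illustration. With one-hot labels, only the log-probability of the true class survives the element-wise product, so the cost reduces to the average of -log( probability assigned to the true class ).

```python
import numpy as np

# toy data: 3 classes, 2 examples (values chosen only for illustration)
AL = np.array([[0.7, 0.1],
               [0.2, 0.8],
               [0.1, 0.1]])
Y = np.array([[1, 0],
              [0, 1],
              [0, 0]])

m = Y.shape[1]
per_example = -np.sum(Y * np.log(AL), axis=0)  # array of length m
cost = np.sum(per_example) / m                 # average over the m examples

# example 1 has true class 0 (probability 0.7), example 2 has true class 1 (probability 0.8)
by_hand = -(np.log(0.7) + np.log(0.8)) / 2

print(cost, by_hand)
```

Note that np.log returns -inf for a probability of exactly 0, so this sketch assumes every entry of AL is strictly positive.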

<h2>Contents of <b>caches</b>:</h2>
<br/>
caches is a list of length L.<br/>
caches[ l - 1 ] is the cache returned by the call to Forward_Prop_Single_Step( ) for layer l (Python lists are indexed from zero).<br/>
This cache contains a linear_cache and an activation_cache, i.e.,
<br/><b>caches[ l - 1 ] = ( linear_cache, activation_cache )</b><br/>
<b>linear_cache = ( A<sup>[ l - 1 ]</sup> , W<sup>[ l ]</sup> , b<sup>[ l ]</sup> )</b><br/>and<br/>
<b>activation_cache = Z<sup>[ l ]</sup></b><br/>
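
As a sketch of how these caches might be read back during back propagation, the block below builds toy caches for a hypothetical 2-layer network and unpacks them from the last layer to the first. All arrays are placeholders chosen only for their shapes. Because Python lists are indexed from zero, the cache for layer l sits at position l - 1 in the list.

```python
import numpy as np

# toy caches for a hypothetical 2-layer network with 3 examples (placeholder values)
A0 = np.ones((4, 3))
W1, b1 = np.zeros((5, 4)), np.zeros((5, 1))
Z1 = np.dot(W1, A0) + b1
A1 = Z1  # activation omitted; only the shapes matter here
W2, b2 = np.zeros((2, 5)), np.zeros((2, 1))
Z2 = np.dot(W2, A1) + b2

caches = [((A0, W1, b1), Z1), ((A1, W2, b2), Z2)]

# walk the layers L, ..., 1 in the order back propagation would visit them
for l in range(len(caches), 0, -1):
    linear_cache, activation_cache = caches[l - 1]
    A_prev, W, b = linear_cache
    Z = activation_cache
    print("layer", l, "A_prev", A_prev.shape, "W", W.shape, "b", b.shape, "Z", Z.shape)
```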