# Deep Neural Network Implementation

This notebook implements a deep neural network end to end, from scratch, with NumPy.

1. This implementation uses only the sigmoid activation.

2. The optimization techniques implemented are classic gradient descent, gradient descent with momentum, RMSProp, and Adam. All of them run on mini-batches by default.

3. The only regularization implemented is L2.

Note: This notebook closely resembles the programming assignments from the Coursera Deep Learning courses, because that is where I learned the material.

In [2]:
import numpy as np
import math

Sigmoid function: The expected input is a scalar, a vector, or a tensor of any shape.

Returns the sigmoid of each element.

In [4]:
def sigmoid(z):
    act = 1/(1+np.exp(-z))
    
    return act
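
For large negative inputs, `np.exp(-z)` overflows and NumPy emits a RuntimeWarning, although the result still rounds to 0. One possible alternative, shown here only as a sketch, relies on the identity sigmoid(z) = 0.5 * (1 + tanh(z / 2)). The name `sigmoid_stable` is hypothetical and not part of the notebook.

```python
import numpy as np

def sigmoid_stable(z):
    # Identity: sigmoid(z) = 0.5 * (1 + tanh(z / 2)).
    # tanh saturates at +/-1 instead of overflowing for large |z|.
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(z, dtype=float)))

z = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
out = sigmoid_stable(z)
print(out)
```

For moderate inputs this agrees with `1/(1+np.exp(-z))` up to floating-point rounding.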

Sigmoid derivative function: The expected input is a scalar, a vector, or a tensor of any shape. Note that the input is the activation a = sigmoid(z), not the pre-activation z.

Returns the derivative of the sigmoid, g'(z) = a(1 - a), for each element.

In [7]:
def sigmoid_derivative(a):
    return np.multiply(a, (1-a))
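
Because the derivative is written in terms of the activation rather than the pre-activation, it is easy to pass the wrong quantity. The following sketch compares a(1 - a) against a central finite difference of the sigmoid. The step size `h` and the test points are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(a):
    # a is an activation, i.e. a = sigmoid(z)
    return a * (1 - a)

z = np.linspace(-3.0, 3.0, 7)
h = 1e-5
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central difference in z
analytic = sigmoid_derivative(sigmoid(z))              # pass a = sigmoid(z), not z
max_abs_diff = np.max(np.abs(numeric - analytic))
print(max_abs_diff)
```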

Initialize weights: This method is called once per layer. It expects two arguments: the number of input nodes and the number of output nodes. The weights are drawn from a standard normal distribution and scaled by sqrt(1/n_input_nodes) (Xavier initialization), which suits the sigmoid activations used throughout. The biases are initialized to zero.

Returns both the weights and the biases.

In [9]:
def initialize_parameters(n_input_nodes, n_output_nodes):
    W = np.random.randn(n_output_nodes, n_input_nodes)*np.sqrt(1/n_input_nodes)
    b = np.zeros(shape=[n_output_nodes, 1])
    
    return W, b
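
As a quick sanity check on the scaling, the sketch below draws one weight matrix and compares its empirical standard deviation with sqrt(1/n_input_nodes). The seeded `default_rng` generator and the layer sizes are assumptions made for reproducibility. The notebook itself uses the global `np.random.randn`.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator (assumption, for reproducibility)
n_in, n_out = 400, 300           # hypothetical layer sizes

W = rng.standard_normal((n_out, n_in)) * np.sqrt(1 / n_in)
b = np.zeros((n_out, 1))

expected_std = np.sqrt(1 / n_in)  # 0.05 for n_in = 400
observed_std = W.std()
print(W.shape, b.shape, expected_std, observed_std)
```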

Creating layers: This method creates the layers by initializing weights with the appropriate dimensions. The expected argument is a list of layer sizes that includes the number of units in the input layer and in the output layer; train_nn_model adds these two to the hidden-layer list before calling this method.

Returns a dictionary of parameters with keys Wi and bi holding the weights and biases of the i-th layer.



In [11]:
def create_layers(layers_list):
    parameters = {}
    for idx in range(1, len(layers_list)):
        parameters['W'+str(idx)], parameters['b'+str(idx)] = initialize_parameters(layers_list[idx-1], layers_list[idx])
    
    return parameters
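
To make the key and shape convention concrete, here is a small sketch that lists the shapes the dictionary would hold for a hypothetical `layers_list` of `[4, 3, 2, 1]`. It only computes shapes and does not allocate any weights.

```python
layers_list = [4, 3, 2, 1]  # hypothetical sizes: 4 inputs, two hidden layers, 1 output

expected_shapes = {}
for idx in range(1, len(layers_list)):
    # Wi maps layer i-1 (columns) to layer i (rows); bi is a column vector
    expected_shapes['W' + str(idx)] = (layers_list[idx], layers_list[idx - 1])
    expected_shapes['b' + str(idx)] = (layers_list[idx], 1)

# Two entries per layer, so the number of layers is half the dictionary size
n_layers = len(expected_shapes) // 2
print(expected_shapes, n_layers)
```

The `len(parameters)//2` expression that appears throughout the notebook recovers the layer count in the same way.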

Forward propagation: The arguments X and parameters have to follow these shape requirements.

- 'X' is expected to have shape [n_features, n_observations]
- parameters holds two kinds of entries, W and b
- 'Wi' is expected to have shape [n[i], n[i-1]]
- 'bi' is expected to have shape [n[i], 1]
- Each returned activation 'Ai' has shape [n[i], n_observations], where n[i] is the number of units in the i-th layer.

Forward propagation equations: The input layer is treated as layer 0, so A[0] = X. For each subsequent layer l:

Z[l] = W[l] A[l-1] + b[l]

A[l] = sigmoid(Z[l])

where l ranges from 1 to L, and L is the index of the output layer.

In [13]:
def forward_propagation(X, parameters):
    activations = {}
    activations['A0'] = X
    limit = 1 + len(parameters)//2
    for idx in range(1, limit):
        Z = np.dot(parameters['W'+str(idx)], activations['A'+str(idx-1)]) + parameters['b'+str(idx)]
        activations['A'+str(idx)] = sigmoid(Z)

    return activations, activations['A'+str(limit-1)]
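
The shape bookkeeping can be traced on a tiny example. This sketch runs a forward pass with hypothetical layer sizes and randomly drawn weights, recording the shape of every activation. Each one has as many columns as there are observations, because the bias column vector is broadcast across them.

```python
import numpy as np

rng = np.random.default_rng(1)
layers = [4, 3, 1]   # hypothetical layer sizes
m = 5                # number of observations
X = rng.standard_normal((layers[0], m))

A = X
shapes = [A.shape]
for l in range(1, len(layers)):
    W = rng.standard_normal((layers[l], layers[l - 1])) * np.sqrt(1 / layers[l - 1])
    b = np.zeros((layers[l], 1))
    Z = W @ A + b                # b broadcasts across the m columns
    A = 1 / (1 + np.exp(-Z))     # sigmoid activation
    shapes.append(A.shape)

print(shapes)
```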

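Compute cost: The expected arguments are the predicted probabilities, the actual labels, the parameters, and the regularization strength lambd. Both the predictions and the labels must have shape [1, n_observations].

Returns the average log loss over the given observations, plus the L2 penalty (lambd/(2m)) times the sum of squared weights when regularization is 'l2'.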

In [15]:
def compute_cost(act, y, parameters, lambd, regularization='l2'):
    m = y.shape[1]
    logprobs = np.multiply(np.log(act), y) + np.multiply(np.log(1-act), 1-y)
    
    if regularization == 'l2':
        reg_term = 0
        for i in range(1, 1+ len(parameters)//2):
            reg_term += np.sum(np.square(parameters['W'+str(i)]))
        reg_term = (lambd/(2*m))*reg_term
    else:
        reg_term = 0
    
    return (-1/m)*np.sum(logprobs) + reg_term
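
A small worked example may help separate the two terms of the cost. All values below are hypothetical. The clipping with `eps` is an assumption added here to keep `log` finite if a prediction reaches exactly 0 or 1, and is not something the notebook does.

```python
import numpy as np

act = np.array([[0.9, 0.2, 0.6]])  # hypothetical predicted probabilities
y = np.array([[1, 0, 1]])          # matching labels
W1 = np.array([[1.0, -2.0]])       # a single hypothetical weight matrix
lambd = 0.5
m = y.shape[1]

eps = 1e-12                        # assumption: clip so log() never sees 0
act_safe = np.clip(act, eps, 1 - eps)

# Average log loss: only log(0.9), log(1 - 0.2) and log(0.6) contribute
cross_entropy = -np.sum(y * np.log(act_safe) + (1 - y) * np.log(1 - act_safe)) / m
# L2 penalty: (lambd / 2m) * (1^2 + (-2)^2)
l2_penalty = (lambd / (2 * m)) * np.sum(np.square(W1))
cost = cross_entropy + l2_penalty
print(cross_entropy, l2_penalty, cost)
```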

Backward propagation: Performs backward propagation using the parameters and the activations produced by forward propagation. The expected arguments are the actual labels y, the parameters (Wi and bi), the activations (Ai), and the regularization strength lambd. All shapes must match those described for forward propagation.

Returns a dictionary of gradients with keys dWi and dbi for each layer i, which can be used to perform gradient descent.

In [17]:
def backward_propagation(y, parameters, activations, lambd, regularization='l2'):
    m = y.shape[1]  # number of observations; y has shape [1, n_observations]
    last = len(parameters)//2
    gradients = {}
    dz_dict = {}
    dz_dict['dZ'+str(last)] = activations['A'+str(last)] - y
    for l in range(last-1, 0, -1):
        dz_dict['dZ'+str(l)] = np.multiply(sigmoid_derivative(activations['A'+str(l)]), np.dot(parameters['W'+str(l+1)].T, dz_dict['dZ'+str(l+1)]))
    for i in range(last, 0, -1):
        gradients['dW'+str(i)] = (1/m) * np.dot(dz_dict['dZ'+str(i)], activations['A'+str(i-1)].T)
        gradients['db'+str(i)] = (1/m) * np.sum(dz_dict['dZ'+str(i)], axis=1, keepdims=True)

    if regularization == 'l2':
        for i in range(1, 1+ len(parameters)//2):
            gradients['dW'+str(i)] = gradients['dW'+str(i)] + (lambd/m)*parameters['W'+str(i)]
            
    return gradients
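
Backpropagation code is easy to get subtly wrong, so a numerical gradient check is a useful companion. The sketch below is self-contained and uses a hypothetical two-layer sigmoid network with compact helper functions (`forward`, `cost`, `backward`) that mirror the equations above but are not the notebook's functions. It compares analytic gradients with central finite differences of the regularized cost.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, params):
    A1 = sigmoid(params['W1'] @ X + params['b1'])
    A2 = sigmoid(params['W2'] @ A1 + params['b2'])
    return A1, A2

def cost(X, y, params, lambd):
    m = y.shape[1]
    _, A2 = forward(X, params)
    ce = -np.sum(y * np.log(A2) + (1 - y) * np.log(1 - A2)) / m
    l2 = (lambd / (2 * m)) * (np.sum(params['W1'] ** 2) + np.sum(params['W2'] ** 2))
    return ce + l2

def backward(X, y, params, lambd):
    m = y.shape[1]                                   # number of observations
    A1, A2 = forward(X, params)
    dZ2 = A2 - y                                     # sigmoid output + log loss
    dZ1 = (A1 * (1 - A1)) * (params['W2'].T @ dZ2)   # chain rule through layer 1
    return {
        'W1': dZ1 @ X.T / m + (lambd / m) * params['W1'],
        'b1': np.sum(dZ1, axis=1, keepdims=True) / m,
        'W2': dZ2 @ A1.T / m + (lambd / m) * params['W2'],
        'b2': np.sum(dZ2, axis=1, keepdims=True) / m,
    }

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 6))
y = (rng.random((1, 6)) > 0.5).astype(float)
params = {'W1': rng.standard_normal((4, 3)), 'b1': np.zeros((4, 1)),
          'W2': rng.standard_normal((1, 4)), 'b2': np.zeros((1, 1))}
lambd, h = 0.7, 1e-6

analytic = backward(X, y, params, lambd)
max_err = 0.0
for key, value in params.items():
    for index in np.ndindex(value.shape):
        original = value[index]
        value[index] = original + h
        cost_plus = cost(X, y, params, lambd)
        value[index] = original - h
        cost_minus = cost(X, y, params, lambd)
        value[index] = original                      # restore the entry
        numeric = (cost_plus - cost_minus) / (2 * h)
        max_err = max(max_err, abs(numeric - analytic[key][index]))

print(max_err)
```

A large `max_err` usually points to a wrong normalization constant (for example dividing by the wrong dimension of `y`) or a missing regularization term.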

Update parameters: Updates the parameters from their current values and their gradients, given a learning rate. The expected arguments are parameters, gradients, and learning_rate.

Returns the dictionary of updated parameters in the same format as the input parameters.

In [19]:
# Classical gradient descent batch/mini-batch
def update_parameters(parameters, gradients, learning_rate):
    lim = 1 + len(parameters)//2
    for i in range(1, lim):
        parameters['W'+str(i)] -= learning_rate*gradients['dW'+str(i)]
        parameters['b'+str(i)] -= learning_rate*gradients['db'+str(i)]
        
    return parameters

In [21]:
# Initialization for Gradient descent with momentum
def initialize_momentum_params(parameters):
    lim = 1 + len(parameters)//2
    v = {}
    for i in range(1, lim):
        v['dW'+str(i)] = np.zeros(parameters['W'+str(i)].shape)
        v['db'+str(i)] = np.zeros(parameters['b'+str(i)].shape)
        
    return v

# Gradient descent with momentum
def update_parameters_momentum(parameters, gradients, learning_rate, v, beta=0.9):
    lim = 1 + len(parameters)//2
    for i in range(1, lim):
        v['dW'+str(i)] = beta*v['dW'+str(i)] + (1-beta)*gradients['dW'+str(i)]
        v['db'+str(i)] = beta*v['db'+str(i)] + (1-beta)*gradients['db'+str(i)]
        parameters['W'+str(i)] -= learning_rate*v['dW'+str(i)]
        parameters['b'+str(i)] -= learning_rate*v['db'+str(i)]
        
    return parameters
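
The velocity term is an exponentially weighted average of past gradients, which damps gradients that keep flipping sign. The sketch below applies the same recurrence to a hypothetical scalar gradient that alternates between +1 and -1.

```python
beta = 0.9
gradients = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # hypothetical oscillating gradient

v = 0.0
velocities = []
for g in gradients:
    # Same recurrence as update_parameters_momentum, for a single scalar
    v = beta * v + (1 - beta) * g
    velocities.append(v)

largest = max(abs(value) for value in velocities)
print(velocities, largest)
```

The raw gradient has magnitude 1 at every step, while the velocity stays at roughly a tenth of that, so the oscillation contributes much less to the parameter update.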

In [23]:
# Initialization for Gradient descent with RMSProp
def initialize_rmsprop_params(parameters):
    lim = 1 + len(parameters)//2
    s={}
    for i in range(1, lim):
        s['dW'+str(i)] = np.zeros(parameters['W'+str(i)].shape)
        s['db'+str(i)] = np.zeros(parameters['b'+str(i)].shape)
        
    return s

# Gradient descent with RMSProp implementation
def update_parameters_rmsprop(parameters, gradients, learning_rate, s, t, beta=0.999, epsilon=1e-8):
    s_corrected={}
    lim = 1 + len(parameters)//2
    for i in range(1, lim):
        s['dW'+str(i)] = beta*s['dW'+str(i)] + (1-beta)*gradients['dW'+str(i)]**2
        s['db'+str(i)] = beta*s['db'+str(i)] + (1-beta)*gradients['db'+str(i)]**2
        
        s_corrected["dW" + str(i)] = s["dW"+str(i)]/(1-beta**t)
        s_corrected["db" + str(i)] = s["db"+str(i)]/(1-beta**t)
        
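        # Scale each gradient by the running RMS of its recent magnitudes
        parameters['W'+str(i)] -= learning_rate*gradients['dW'+str(i)]/(epsilon + np.sqrt(s_corrected['dW'+str(i)]))
        parameters['b'+str(i)] -= learning_rate*gradients['db'+str(i)]/(epsilon + np.sqrt(s_corrected['db'+str(i)]))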
        
    return parameters
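
In the usual formulation of RMSProp, the step is the gradient divided by the square root of the running average of squared gradients. With bias correction, the very first step therefore has magnitude close to the learning rate regardless of the gradient's scale. The sketch below illustrates this on two hypothetical gradient entries of very different size.

```python
import numpy as np

learning_rate, beta, epsilon = 0.01, 0.999, 1e-8
g = np.array([[100.0, -0.001]])   # hypothetical gradients of very different scale
s = np.zeros_like(g)
t = 1                             # first update

s = beta * s + (1 - beta) * g ** 2
s_corrected = s / (1 - beta ** t)  # equals g**2 at t = 1
step = learning_rate * g / (epsilon + np.sqrt(s_corrected))
print(step)
```

Both entries move by about `learning_rate` in the direction opposite to the sign of their gradient once the step is subtracted from the parameters.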

In [25]:
# Initialization for Adam optimization
def initialize_adam_params(parameters):
    lim = 1 + len(parameters)//2
    v = {};s={}
    for i in range(1, lim):
        v['dW'+str(i)] = np.zeros(parameters['W'+str(i)].shape)
        s['dW'+str(i)] = np.zeros(parameters['W'+str(i)].shape)
        v['db'+str(i)] = np.zeros(parameters['b'+str(i)].shape)
        s['db'+str(i)] = np.zeros(parameters['b'+str(i)].shape)
    return v, s

# Adam optimization implementation
def update_parameters_adam(parameters, gradients, learning_rate, v, s, t, beta1=0.9, beta2=0.999, epsilon=1e-8):
    v_corrected={};s_corrected={}
    lim = 1 + len(parameters)//2
    for i in range(1, lim):
        v['dW'+str(i)] = beta1*v['dW'+str(i)] + (1-beta1)*gradients['dW'+str(i)]
        v['db'+str(i)] = beta1*v['db'+str(i)] + (1-beta1)*gradients['db'+str(i)]

        s['dW'+str(i)] = beta2*s['dW'+str(i)] + (1-beta2)*gradients['dW'+str(i)]**2
        s['db'+str(i)] = beta2*s['db'+str(i)] + (1-beta2)*gradients['db'+str(i)]**2
        
        v_corrected["dW" + str(i)] = v["dW"+str(i)]/(1-beta1**t)
        v_corrected["db" + str(i)] = v["db"+str(i)]/(1-beta1**t)
        
        s_corrected["dW" + str(i)] = s["dW"+str(i)]/(1-beta2**t)
        s_corrected["db" + str(i)] = s["db"+str(i)]/(1-beta2**t)
        
        parameters['W'+str(i)] -= learning_rate*(v_corrected["dW"+str(i)]/(epsilon + np.sqrt(s_corrected["dW"+str(i)])))
        parameters['b'+str(i)] -= learning_rate*(v_corrected["db"+str(i)]/(epsilon + np.sqrt(s_corrected["db"+str(i)])))
        
    return parameters
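
The bias-correction terms depend on the update counter `t`, which in the standard algorithm starts at 1 and increases by one with every update. As a sketch of that behaviour, the loop below runs Adam on a hypothetical quadratic f(w) = sum(w**2). The objective, starting point, and learning rate are all illustrative assumptions.

```python
import numpy as np

learning_rate, beta1, beta2, epsilon = 0.1, 0.9, 0.999, 1e-8
w = np.array([5.0, -3.0])   # hypothetical parameters; minimise f(w) = sum(w**2)
v = np.zeros_like(w)
s = np.zeros_like(w)

first_step = None
for t in range(1, 201):      # t counts updates, starting at 1
    g = 2 * w                # gradient of sum(w**2)
    v = beta1 * v + (1 - beta1) * g
    s = beta2 * s + (1 - beta2) * g ** 2
    v_hat = v / (1 - beta1 ** t)   # bias-corrected first moment
    s_hat = s / (1 - beta2 ** t)   # bias-corrected second moment
    step = learning_rate * v_hat / (epsilon + np.sqrt(s_hat))
    if t == 1:
        first_step = step.copy()
    w = w - step

print(first_step, w)
```

On the first update the corrected moments reduce to g and g**2, so the step is approximately `learning_rate * sign(g)`.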

Predict output: This method returns the predictions for the given data. The data may contain any number of observations, with shape [n_features, n_observations].

Returns an array of shape [1, n_observations] containing 0 where the predicted probability is less than or equal to 0.5 and 1 otherwise.

In [27]:
def predict(parameters, X):
    _, op = forward_propagation(X, parameters)
    predictions = np.zeros(op.shape)
    predictions[op > 0.5] = 1
    
    return predictions

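Generate mini batches: This method shuffles the input features and labels with the same random permutation and returns batches of the given size. The last batch may be smaller when the number of observations is not a multiple of the batch size.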

In [29]:
def generate_random_mini_batches(X, Y, mini_batch_size = 64):
    m = X.shape[1]
    mini_batches = []

    permutation = list(np.random.permutation(m))
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation].reshape((1,m))

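    # ceil() covers the leftover observations without producing an empty
    # batch when m is an exact multiple of mini_batch_size
    num_minibatches = math.ceil(m/mini_batch_size)
    for k in range(0, num_minibatches):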
        mini_batch_X = shuffled_X[:, k*mini_batch_size:(k+1)*mini_batch_size]
        mini_batch_Y = shuffled_Y[:, k*mini_batch_size:(k+1)*mini_batch_size]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    
    return mini_batches
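
The number of batches needed to cover m observations is ceil(m / mini_batch_size). Using 1 + floor(m / mini_batch_size) instead yields an empty trailing batch whenever m is an exact multiple of the batch size, and an empty batch leads to a division by zero in the cost. The sketch below uses a hypothetical helper, `batch_slices`, to show the column ranges for both cases.

```python
import math

def batch_slices(m, mini_batch_size):
    # Hypothetical helper: (start, stop) column ranges covering m observations
    # with no empty batch.
    n_batches = math.ceil(m / mini_batch_size)
    return [(k * mini_batch_size, min((k + 1) * mini_batch_size, m))
            for k in range(n_batches)]

exact = batch_slices(128, 64)    # m divisible by the batch size
ragged = batch_slices(130, 64)   # last batch holds the 2 leftover columns
print(exact, ragged)
```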

Train neural network: This method expects at least X and y; hidden_layers is optional. If no hidden layers are given, logistic regression is performed. X must have shape [n_features, n_observations] and y must have shape [1, n_observations]. hidden_layers is a list with the number of neurons in each hidden layer, excluding the input and output layers.

Returns the parameters (weights and biases) of the trained model.

In [31]:
def train_nn_model(X, y, hidden_layers=[], epochs=1500, learning_rate=0.0001, lambd=2.0, regularization='l2', optimization='adam', beta1=0.9, beta2=0.999, epsilon=1e-8, t=2):

    layers = [X.shape[0]]
    layers.extend(hidden_layers)
    layers.append(1)
    
    # Initializing parameters
    parameters = create_layers(layers_list=layers)
    
    if optimization == 'rmsprop':
        s = initialize_rmsprop_params(parameters)
    elif optimization == 'adam':
        v, s = initialize_adam_params(parameters)
    elif optimization == 'momentum':
        v = initialize_momentum_params(parameters)
    
    mini_batch_size = 64
    for epoch in range(1, 1+epochs):
        
        cost = 0
        minibatches = generate_random_mini_batches(X, y, mini_batch_size)

        for minibatch in minibatches:

            # Select a minibatch
            (minibatch_X, minibatch_y) = minibatch
        
            # forward propagation
            activations, prediction = forward_propagation(minibatch_X, parameters)

            # backward propagation
            grads = backward_propagation(minibatch_y, parameters, activations, lambd, regularization)

            # calculate cost
            cost += compute_cost(prediction, minibatch_y, parameters, lambd, regularization)
            
            # updating parameters
            if optimization == 'rmsprop':
                parameters = update_parameters_rmsprop(parameters, grads, learning_rate, s, t, beta2, epsilon)
            elif optimization == 'adam':
                parameters = update_parameters_adam(parameters, grads, learning_rate, v, s, t, beta1, beta2, epsilon)
            elif optimization == 'momentum':
                parameters = update_parameters_momentum(parameters, grads, learning_rate, v, beta1)
            else:
                parameters = update_parameters(parameters, grads, learning_rate)

            # advance the step counter used for RMSProp/Adam bias correction
            t += 1
        
        if epoch % 100 == 0:
            print ("Average training cost after Epoch {}: {}".format(epoch, cost/len(minibatches)))        
            
    return parameters

Accuracy: This method expects input features, labels, and trained parameters of appropriate shapes.

Returns the fraction of observations that are classified correctly.

In [35]:
def accuracy(X, y, parameters):
    m = X.shape[1]
    p = np.zeros((1,m))
    _, act = forward_propagation(X, parameters)
    for i in range(0, act.shape[1]):
        if act[0,i] > 0.5:
            p[0,i] = 1
        else:
            p[0,i] = 0
   
    return np.mean((p[0,:] == y[0,:]))
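
The thresholding and the comparison can also be written without an explicit loop. This is a minimal vectorized sketch on hypothetical probabilities and labels. It is an alternative formulation, not the notebook's code.

```python
import numpy as np

probabilities = np.array([[0.1, 0.5, 0.51, 0.9]])  # hypothetical network outputs
labels = np.array([[0, 1, 1, 1]])

# Strict comparison: a probability of exactly 0.5 maps to class 0
predictions = (probabilities > 0.5).astype(int)
accuracy_value = np.mean(predictions == labels)
print(predictions, accuracy_value)
```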

In [39]:
import h5py

def test():
    train_dataset = h5py.File('C:\\Users\\barca\\Desktop\\Andrew NG\\A\\train_catvnoncat.h5', "r")
    train_x = np.array(train_dataset["train_set_x"][:])
    train_x = train_x.reshape(train_x.shape[0], -1).T
    train_y = np.array(train_dataset["train_set_y"][:])
    train_y = train_y.reshape(1, train_y.shape[0])
    
    test_dataset = h5py.File('C:\\Users\\barca\\Desktop\\Andrew NG\\A\\test_catvnoncat.h5', "r")
    test_x = np.array(test_dataset["test_set_x"][:])
    test_x = test_x.reshape(test_x.shape[0], -1).T
    test_y = np.array(test_dataset["test_set_y"][:])
    test_y = test_y.reshape(1, test_y.shape[0])
    
    hidden_layers = [10]
    train_x = train_x/255
    test_x = test_x/255
    parameters = train_nn_model(train_x, train_y, hidden_layers=hidden_layers)

    print ("Accuracy on the training set:", accuracy(train_x, train_y, parameters))
    print ("Accuracy on the test set:", accuracy(test_x, test_y, parameters))

test()



