## Implementing a Neural Network

In this notebook, I implement a neural network from scratch to understand its architecture. It is a simple network with one hidden layer made up of two units.

Here is what the network will look like (of course, we will be dealing with more inputs!).

<img src="img/implementing_nnetwork/one_hidden_layer.png" width="300px">

In [55]:
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split

In [10]:
# Note: load_digits() returns scikit-learn's small 8x8 digits dataset, not the full MNIST dataset
mnist = datasets.load_digits()

In [50]:
data = mnist.data
targets = mnist.target

In [52]:
data.shape

(1797, 64)

In [49]:
def image_standarization(x):
    """
    Standardize one flattened image to zero mean and unit variance
    : x: 1-D Numpy array with the pixel values of a single image (64 values for the 8x8 digits)
    : return: Numpy array of standardized data
    """
    x_demean = x - np.mean(x)
    # Floor the standard deviation so that a constant image does not cause a division by zero
    adjusted_sd = np.maximum(np.std(x), 1.0 / np.sqrt(np.prod(x.shape)))
    return x_demean / adjusted_sd

In [54]:
#Standardizing data: each row (one image) is standardized independently
data = np.apply_along_axis(image_standarization, 1, data)
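
As a quick sanity check of the per-image standardization, here is a minimal sketch on made-up data. The helper `standardize_row` and the array `toy` are hypothetical names used only for this illustration; the formula is the same one used above, repeated so that the snippet runs on its own.

```python
import numpy as np

def standardize_row(x):
    # Same formula as above, re-declared so the sketch is self-contained
    x_demean = x - np.mean(x)
    adjusted_sd = np.maximum(np.std(x), 1.0 / np.sqrt(np.prod(x.shape)))
    return x_demean / adjusted_sd

# Hypothetical toy "images": two rows of 4 pixels each (the second row is constant)
toy = np.array([[0.0, 4.0, 8.0, 16.0],
                [5.0, 5.0, 5.0, 5.0]])
toy_std = np.apply_along_axis(standardize_row, 1, toy)

row_means = toy_std.mean(axis=1)  # mean of each standardized row
row_sds = toy_std.std(axis=1)     # standard deviation of each standardized row
print(row_means)
print(row_sds)
```

A row with some variation should end up with mean 0 and standard deviation 1, while the constant row stays at all zeros because the floor on the standard deviation prevents a division by zero.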

In [58]:
#One-hot encoding the targets
encoder = LabelBinarizer()
encoder.fit(targets)
targets = encoder.transform(targets)
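
To make the encoding concrete, the sketch below builds one-hot rows for a few made-up labels with plain NumPy and recovers the labels with `np.argmax`, which is how the one-hot targets are turned back into class indices later in the notebook. The use of `np.eye` here is an illustrative stand-in, not what `LabelBinarizer` does internally.

```python
import numpy as np

# Hypothetical labels for four samples, with 10 possible classes (digits 0-9)
labels = np.array([0, 3, 9, 3])

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(10, dtype=int)[labels]

# argmax along each row gives back the original class index
recovered = np.argmax(one_hot, axis=1)

print(one_hot)
print(recovered)
```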

In [60]:
#Separating train / validation / test (60% / 20% / 20% of the data)
image, image_test, y, y_test = train_test_split(data, targets, test_size=0.2, train_size=0.8)
image_train, image_val, y_train, y_val = train_test_split(image, y, test_size=0.25, train_size=0.75)

### Creating useful functions for our network

In [75]:
def gen_weights(input_size, output_size):
    "Generates a weight matrix of shape (input_size, output_size) drawn from a standard normal distribution"
    return np.random.randn(input_size, output_size)

In [22]:
def apply_relu(matrix):
    "Applies relu to matrix"
    return np.maximum(0, matrix)

In [38]:
def apply_relu_der(matrix):
    "Derivative of relu: 1 where the input is positive, 0 elsewhere"
    return 1. * (matrix > 0)

In [96]:
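def apply_softmax(matrix):
    "Applies a row-wise softmax to matrix"
    # Subtract each row's max before exponentiating to avoid overflow; the result is unchanged
    exp = np.exp(matrix - np.max(matrix, axis=1, keepdims=True))
    soft_max = exp / np.sum(exp, axis=1, keepdims=True)
    return soft_max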
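
A softmax that exponentiates raw scores can overflow when the scores are large. A common remedy is to subtract each row's maximum before calling `np.exp`, which leaves the probabilities unchanged. The sketch below illustrates this on made-up scores; `softmax_stable` and `scores` are hypothetical names for this example only.

```python
import numpy as np

def softmax_stable(matrix):
    # Shifting every row by its maximum does not change the softmax output,
    # but keeps the largest exponent at exp(0) = 1
    shifted = matrix - np.max(matrix, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

# Two rows that differ only by a constant offset of 999
scores = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])
probs = softmax_stable(scores)

print(probs)
print(probs.sum(axis=1))
```

Both rows should produce the same probabilities, since softmax only depends on the differences between scores within a row.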

In [61]:
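def d_apply_softmax(soft_res, n_training):
    "Gradient of the cross-entropy loss with respect to the softmax inputs (relies on the global y_train)"
    grad = np.copy(soft_res)  # copy so that the caller's probabilities are not modified in place
    grad[range(n_training), np.argmax(y_train, axis=1)] -= 1
    return grad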

## Building the neural network

Our neural network will have two activation functions. First, we multiply the inputs by the first-layer weights and apply a ReLU. Then we multiply the ReLU output by a second set of weights and apply a softmax layer, which produces the probability of each of the 10 labels that we are trying to predict.
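
Written as equations, the forward pass described above can be sketched as follows. The symbols are notation introduced here for illustration: $X$ is the $N \times 64$ input matrix, $W_1, W_2$ are $64 \times 10$, $W_3, W_4$ are $10 \times 10$, and $b_1, b_2$ are $1 \times 10$ biases, assumed to be added to the pre-activations of the two units.

$$
\begin{aligned}
h_1 &= X W_1 + b_1, & r_1 &= \max(0, h_1) \\
h_2 &= X W_2 + b_2, & r_2 &= \max(0, h_2) \\
h_3 &= r_1 W_3 + r_2 W_4, & p &= \operatorname{softmax}(h_3)
\end{aligned}
$$

Each row of $p$ then holds the 10 class probabilities for one image.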

In [62]:
image_train.shape

(1077, 64)

In [154]:
n_training = len(image_train)
n_classes = 10
learning_rate = 0.001
#Creating forward pass
bias_1 = np.zeros((1,10))
bias_2 = np.zeros((1,10))
W1 = gen_weights(64, 10)
W2 = gen_weights(64, 10)
#The bias is added to the pre-activation, not to the weight matrix
h1 = image_train.dot(W1) + bias_1
h2 = image_train.dot(W2) + bias_2
relu_1 = apply_relu(h1)
relu_2 = apply_relu(h2)
W3 = gen_weights(10,10)
W4 = gen_weights(10,10)
h3 = relu_1.dot(W3) + relu_2.dot(W4) # Final class scores are a weighted sum of the two units
out_soft = apply_softmax(h3)

#Calculate Loss
correct_labels_prob = out_soft[range(n_training), np.argmax(y_train, axis=1)]
loss = np.sum(-np.log(correct_labels_prob)) / n_training

## Backpropagation

#Calculating gradient
g_probs = np.copy(out_soft)
g_probs[range(n_training), np.argmax(y_train, axis=1)] -= 1

#Get the average of the gradients
g_probs = g_probs / n_training

#Calculating gradients for W3 and W4
dW3 = relu_1.T.dot(g_probs)
dW4 = relu_2.T.dot(g_probs)

#Calculating gradients for ReLU's: push g_probs back through W3 / W4, then mask with the ReLU derivative
dr1 = g_probs.dot(W3.T) * apply_relu_der(h1)
dr2 = g_probs.dot(W4.T) * apply_relu_der(h2)

#Calculating gradients for initial weights and biases
dW1 = image_train.T.dot(dr1)
db1 = np.sum(dr1, axis=0, keepdims=True)

dW2 = image_train.T.dot(dr2)
db2 = np.sum(dr2, axis=0, keepdims=True)


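#Updating weights and biases with one gradient descent step
W1 -= learning_rate * dW1
W2 -= learning_rate * dW2
W3 -= learning_rate * dW3
W4 -= learning_rate * dW4
bias_1 -= learning_rate * db1
bias_2 -= learning_rate * db2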
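
One way to gain confidence in hand-derived gradients is a numerical gradient check: nudge each parameter slightly, measure how the loss changes, and compare that with the analytic gradient. The sketch below does this for the same two-unit architecture on a tiny made-up problem. The names `forward`, `backward`, `numerical_grad` and `params`, and the toy sizes, are assumptions for this illustration. The backward pass here propagates through each ReLU by multiplying the upstream gradient (`g.dot(W3.T)`) by the ReLU mask.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: 5 samples, 4 features, 3 hidden values per unit, 3 classes
X = rng.normal(size=(5, 4))
labels = rng.integers(0, 3, size=5)
n = X.shape[0]

params = {
    "W1": rng.normal(size=(4, 3)), "b1": np.zeros((1, 3)),
    "W2": rng.normal(size=(4, 3)), "b2": np.zeros((1, 3)),
    "W3": rng.normal(size=(3, 3)), "W4": rng.normal(size=(3, 3)),
}

def forward(p):
    # Two parallel ReLU units, combined by a weighted sum, then softmax
    h1 = X.dot(p["W1"]) + p["b1"]
    h2 = X.dot(p["W2"]) + p["b2"]
    r1, r2 = np.maximum(0, h1), np.maximum(0, h2)
    h3 = r1.dot(p["W3"]) + r2.dot(p["W4"])
    exp = np.exp(h3 - h3.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(n), labels]).mean()
    return loss, (h1, h2, r1, r2, probs)

def backward(p, cache):
    h1, h2, r1, r2, probs = cache
    g = probs.copy()
    g[np.arange(n), labels] -= 1   # gradient of the loss w.r.t. h3 ...
    g /= n                         # ... averaged over the samples
    dr1 = g.dot(p["W3"].T) * (h1 > 0)  # back through W3, then the ReLU mask
    dr2 = g.dot(p["W4"].T) * (h2 > 0)
    return {
        "W3": r1.T.dot(g), "W4": r2.T.dot(g),
        "W1": X.T.dot(dr1), "b1": dr1.sum(axis=0, keepdims=True),
        "W2": X.T.dot(dr2), "b2": dr2.sum(axis=0, keepdims=True),
    }

def numerical_grad(p, name, eps=1e-5):
    # Central finite differences, one parameter entry at a time
    grad = np.zeros_like(p[name])
    for idx in np.ndindex(grad.shape):
        old = p[name][idx]
        p[name][idx] = old + eps
        loss_plus, _ = forward(p)
        p[name][idx] = old - eps
        loss_minus, _ = forward(p)
        p[name][idx] = old
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

loss, cache = forward(params)
analytic = backward(params, cache)
max_diff = max(np.abs(analytic[k] - numerical_grad(params, k)).max() for k in params)
print(max_diff)
```

If the analytic gradients are right, `max_diff` should be very small compared with the gradient values themselves. A large difference for one parameter usually points to the backward step for that parameter.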

In [119]:
h1.shape

(1077, 10)

In [125]:
np.sum(np.array([[1,2], [3,4]]))

10