## Implementing a Neural Network

In this notebook, I implement a neural network from scratch to understand its architecture. The network is simple: a single hidden layer with two units (in the code below, each unit is a block of 20 ReLU neurons).

Here is what the network looks like (of course, we will be dealing with more inputs!).

<img src="img/implementing_nnetwork/one_hidden_layer.png" width="300px">

In [55]:
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split

In [10]:
# load_digits is scikit-learn's 8x8 digits dataset (1797 samples), not the full MNIST
mnist = datasets.load_digits()

In [50]:
data = mnist.data
targets = mnist.target

In [52]:
data.shape

(1797, 64)

In [49]:
def image_standarization(x):
    """
    Standardize one image to zero mean and unit variance
    : x: Numpy array with the pixel values of one image (here a flattened 8x8 digit, shape (64,))
    : return: Numpy array of standardized data
    """
    x_demean = x - np.mean(x)
    # Lower-bound the standard deviation so a constant image does not divide by zero
    adjusted_sd = np.maximum(np.std(x), 1.0/np.sqrt(np.prod(x.shape)))
    return x_demean / adjusted_sd

In [54]:
# Standardize each image (row) independently
data = np.apply_along_axis(image_standarization, 1, data)
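
To check that the standardization does what I expect, here is a minimal sketch on random stand-in data. `standardize_rows` is a hypothetical vectorized variant written only for this check (it is not used elsewhere in the notebook), and the sample values are made up.

```python
import numpy as np

def standardize_rows(x):
    # Hypothetical vectorized variant: standardize every row (image) at once
    mean = x.mean(axis=1, keepdims=True)
    # Same lower bound on the standard deviation: 1 / sqrt(number of pixels)
    sd = np.maximum(x.std(axis=1, keepdims=True), 1.0 / np.sqrt(x.shape[1]))
    return (x - mean) / sd

rng = np.random.default_rng(0)
sample = rng.uniform(0, 16, size=(5, 64))    # stand-in for five flattened 8x8 digits
standardized = standardize_rows(sample)
row_means = standardized.mean(axis=1)        # should be close to 0 for every row
row_stds = standardized.std(axis=1)          # should be close to 1 for every row

# A constant image has zero standard deviation; the lower bound keeps the result finite
constant_image = standardize_rows(np.ones((1, 64)))
```

Each row ends up with a mean of about 0 and a standard deviation of about 1, and the constant image maps to all zeros instead of producing NaNs.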

In [58]:
# One-hot encode the targets
encoder = LabelBinarizer()
encoder.fit(targets)
targets = encoder.transform(targets)
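
To see what the one-hot encoding produces, here is a small sketch on a toy label vector. The labels are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

toy_targets = np.array([0, 2, 1, 2])          # hypothetical labels, three classes
toy_encoder = LabelBinarizer().fit(toy_targets)
one_hot = toy_encoder.transform(toy_targets)  # one row per sample, one column per class

# argmax returns the column of the 1 in each row; this equals the label
# here only because the classes are 0, 1, 2 in order
recovered = np.argmax(one_hot, axis=1)
```

This is why the training loop later uses `np.argmax(y_train, axis=1)` to get back the integer class of each sample.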

In [60]:
# Split into train / validation / test (60% / 20% / 20% of the data)
image, image_test, y, y_test = train_test_split(data, targets, test_size=0.2, train_size=0.8)
image_train, image_val, y_train, y_val = train_test_split(image, y,test_size = 0.25, train_size =0.75)
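
The two-step split can be sanity-checked on arrays of the same shape as the digits data. In this sketch the array contents are placeholders, the variable names are hypothetical, and `random_state=0` is my addition to make the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

feats = np.zeros((1797, 64))     # same shape as the digits data; contents are irrelevant here
labels = np.zeros((1797, 10))

# First hold out 20% for testing, then 25% of the remainder for validation
rest_x, test_x, rest_y, test_y = train_test_split(feats, labels, test_size=0.2, random_state=0)
train_x, val_x, train_y, val_y = train_test_split(rest_x, rest_y, test_size=0.25, random_state=0)

sizes = (len(train_x), len(val_x), len(test_x))
print(sizes)
```

Since 25% of the remaining 80% is 20% of the whole, the result is roughly a 60/20/20 split.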

### Creating useful functions for our network

In [211]:
def gen_weights(input_size, output_size):
    "Generates a weight matrix of shape (input_size, output_size), sampled uniformly from [0, 1)"
    return  np.random.uniform(size=(input_size, output_size))
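
The scale of the initial weights affects the starting loss. The sketch below is an illustration under my own assumptions, not part of the training code: the `scale` factor, the helper `initial_loss`, and the random stand-in data are all hypothetical. It compares the initial cross-entropy of a single-branch network for two weight scales against `np.log(10)`, the loss of a model that assigns equal probability to all 10 classes.

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.normal(size=(100, 64))          # stand-in for standardized images
labels = rng.integers(0, 10, size=100)       # random stand-in labels

def initial_loss(scale):
    # Hypothetical helper: loss of an untrained one-branch network for a given weight scale
    W_hidden = rng.uniform(size=(64, 20)) * scale
    W_out = rng.uniform(size=(20, 10)) * scale
    scores = np.maximum(0, inputs.dot(W_hidden)).dot(W_out)
    scores = scores - scores.max(axis=1, keepdims=True)            # for numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])

loss_small = initial_loss(0.01)   # weights in [0, 0.01)
loss_large = initial_loss(1.0)    # weights in [0, 1), as in gen_weights
uniform_loss = np.log(10)         # loss of a uniform guess over 10 classes
```

With small weights the scores are nearly equal and the loss starts close to `uniform_loss`; with weights in [0, 1) the untrained network is confidently wrong on many samples and the loss starts higher.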

In [22]:
def apply_relu(matrix):
    "Applies relu to matrix"
    return np.maximum(0, matrix)

In [96]:
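def apply_softmax(matrix):
    "Applies a row-wise softmax to matrix"
    # Subtracting the row maximum leaves the result unchanged but avoids overflow in np.exp
    exp = np.exp(matrix - np.max(matrix, axis=1, keepdims=True))
    soft_max = exp / np.sum(exp, axis=1, keepdims=True)
    return soft_max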
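
A softmax computed as `exp / sum(exp)` can overflow when the scores are large. A common remedy is to subtract each row's maximum before exponentiating, which does not change the result. Here is a minimal sketch; `stable_softmax` is a hypothetical name and the scores are made up.

```python
import numpy as np

def stable_softmax(scores):
    # Shifting each row by its maximum keeps np.exp from overflowing
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

scores = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])   # np.exp(1000.0) alone would overflow to inf
probs = stable_softmax(scores)
row_sums = probs.sum(axis=1)                    # every row is a probability distribution
```

The second row comes out as three equal probabilities, which the unshifted version could not compute.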

In [61]:
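def d_apply_softmax(soft_res, n_training):
    "Gradient of the cross-entropy loss w.r.t. the softmax inputs (relies on the global y_train)"
    soft_res = np.copy(soft_res)  # avoid modifying the caller's array in place
    soft_res[range(n_training), np.argmax(y_train, axis=1)] -= 1
    return soft_res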
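
The "softmax output minus one at the correct class" rule is the gradient of the cross-entropy loss with respect to the scores that go into the softmax. A quick way to convince myself is a numerical gradient check. This is a sketch on made-up scores and labels, and the helper names are hypothetical.

```python
import numpy as np

def softmax(scores):
    exp = np.exp(scores - scores.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def mean_cross_entropy(scores, labels):
    probs = softmax(scores)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 3))      # 4 samples, 3 classes
labels = np.array([0, 2, 1, 1])       # integer class of each sample

# Analytic gradient: probabilities, minus 1 at the correct class, averaged over samples
analytic = softmax(scores)
analytic[np.arange(4), labels] -= 1
analytic /= 4

# Numerical gradient by central differences, one score at a time
eps = 1e-6
numeric = np.zeros_like(scores)
for i in range(scores.shape[0]):
    for j in range(scores.shape[1]):
        up = scores.copy()
        up[i, j] += eps
        down = scores.copy()
        down[i, j] -= eps
        numeric[i, j] = (mean_cross_entropy(up, labels) - mean_cross_entropy(down, labels)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))   # should be tiny if the rule is right
```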

## Building the neural network

Our neural network will have two activation functions. First, we multiply the inputs by the first-layer weights and apply a ReLU. Then we apply a second set of weights to the ReLU output, followed by a softmax layer that produces the probability of each of the 10 labels we are trying to predict.
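
The steps above can be sketched on a small random batch to follow the shapes. This is a simplified illustration with a single hidden block; the variable names and the random data are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.normal(size=(5, 64))           # 5 stand-in images with 64 pixels each
W_hidden = rng.uniform(size=(64, 20))      # inputs -> 20 hidden neurons
b_hidden = np.zeros((1, 20))
W_out = rng.uniform(size=(20, 10))         # hidden neurons -> 10 class scores

hidden = np.maximum(0, batch.dot(W_hidden) + b_hidden)    # ReLU, shape (5, 20)
scores = hidden.dot(W_out)                                # shape (5, 10)
scores = scores - scores.max(axis=1, keepdims=True)       # stabilize before exponentiating
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # softmax, shape (5, 10)
```

Each row of `probs` holds the 10 class probabilities for one image and sums to 1.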

In [62]:
image_train.shape

(1077, 64)

In [213]:
n_training = len(image_train)
n_classes = 10
step_size = 0.0001
epochs = 10000

# Initialize biases and weights: two parallel hidden units (W1, W2), each feeding the output layer (W3, W4)
bias_1 = np.zeros((1,20))
bias_2 = np.zeros((1,20))
W1 = gen_weights(64, 20) 
W2 = gen_weights(64, 20)  
W3 = gen_weights(20,10)
W4 = gen_weights(20,10)

for e in range(epochs): 
    # Forward pass: the biases are added to the pre-activations, not to the weights
    h1 = image_train.dot(W1) + bias_1
    h2 = image_train.dot(W2) + bias_2
    relu_1 = apply_relu(h1)
    relu_2 = apply_relu(h2)
    h3 = relu_1.dot(W3) + relu_2.dot(W4) # Class scores are the sum of both units' contributions
    out_soft = apply_softmax(h3)
    
    
    #Calculate Loss
    correct_labels_prob = out_soft[range(n_training), np.argmax(y_train, axis=1)]
    loss = np.sum(-np.log(correct_labels_prob)) / n_training
    
    if e % 100 == 0:
        print("Loss: {}".format(loss))
    
    ## Backpropagation
    
    # Gradient of the loss with respect to the class scores: softmax output minus one-hot labels
    g_probs = np.copy(out_soft)
    g_probs[range(n_training), np.argmax(y_train, axis=1)] -= 1
    
    # Divide by the number of samples because the loss is an average
    g_probs = g_probs / n_training
    
    # Gradients for W3 and W4
    dW3 = relu_1.T.dot(g_probs)
    dW4 = relu_2.T.dot(g_probs)
    
    # Gradients flowing back into the hidden activations
    dhidden1 = np.dot(g_probs, W3.T)
    dhidden2 = np.dot(g_probs, W4.T)
    
    # The ReLU passes gradient only where its output was positive
    dhidden1[relu_1 <= 0] = 0
    dhidden2[relu_2 <= 0] = 0
    
    # Gradients for the first-layer weights and biases
    dW1 = image_train.T.dot(dhidden1)
    db1 = np.sum(dhidden1, axis=0, keepdims=True)
    
    dW2 = image_train.T.dot(dhidden2)
    db2 = np.sum(dhidden2, axis=0, keepdims=True)
    
    # Gradient descent update
    W3 += -step_size*dW3
    W4 += -step_size*dW4
    
    W1 += -step_size*dW1
    bias_1 += -step_size*db1
    W2 += -step_size*dW2
    bias_2 += -step_size*db2




Loss: 6.333350786928165
Loss: 6.041106890495742
Loss: 5.774669561004851
Loss: 5.531172408236043
Loss: 5.307641849105211
Loss: 5.101382557714504
Loss: 4.911126902147293
Loss: 4.735656890547585
Loss: 4.573294784712072
Loss: 4.42286764280514
Loss: 4.283652122444916
Loss: 4.154932314978516
Loss: 4.035845631603817
Loss: 3.925631980881595
Loss: 3.8236855085736976
Loss: 3.729455774348733
Loss: 3.642242221422941
Loss: 3.561163833207815
Loss: 3.4854955981524194
Loss: 3.414636648024571
Loss: 3.347899967393138
Loss: 3.284723763829461
Loss: 3.224756217131422
Loss: 3.167557697361559
Loss: 3.1125994900671987
Loss: 3.0597425484675003
Loss: 3.0088030456640302
Loss: 2.959600896325475
Loss: 2.911950264952808
Loss: 2.865556154533466
Loss: 2.8204112140722093
Loss: 2.7763357485355664
Loss: 2.733440566319764
Loss: 2.6915214321390444
Loss: 2.6506386769905483
Loss: 2.6106988296458034
Loss: 2.5716455204503523
Loss: 2.5334660278300696
Loss: 2.4960925123222926
Loss: 2.4595503801064655
Loss: 2.423804468175298
Loss: 2.388855109934094
Loss: 2.35468163889650
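
The loss only tells part of the story, and the validation split has not been used so far. Below is a minimal sketch of how accuracy could be measured. `predict_labels` and `accuracy` are hypothetical helpers, not part of the notebook above, and they assume the biases are added to the hidden pre-activations (`images.dot(W1) + bias_1`). The softmax is skipped because it does not change which class has the highest score. The demo at the end runs on random stand-in data, so its accuracy is meaningless.

```python
import numpy as np

def predict_labels(images, W1, bias_1, W2, bias_2, W3, W4):
    """Hypothetical helper: forward pass followed by an argmax over the class scores."""
    relu_1 = np.maximum(0, images.dot(W1) + bias_1)
    relu_2 = np.maximum(0, images.dot(W2) + bias_2)
    scores = relu_1.dot(W3) + relu_2.dot(W4)
    return np.argmax(scores, axis=1)

def accuracy(images, one_hot_labels, *params):
    """Hypothetical helper: fraction of images whose predicted class matches the label."""
    predicted = predict_labels(images, *params)
    return np.mean(predicted == np.argmax(one_hot_labels, axis=1))

# Smoke test on random stand-in data with untrained weights
rng = np.random.default_rng(0)
demo_images = rng.normal(size=(8, 64))
demo_labels = np.eye(10)[rng.integers(0, 10, size=8)]        # random one-hot labels
demo_params = (rng.uniform(size=(64, 20)), np.zeros((1, 20)),
               rng.uniform(size=(64, 20)), np.zeros((1, 20)),
               rng.uniform(size=(20, 10)), rng.uniform(size=(20, 10)))
demo_predictions = predict_labels(demo_images, *demo_params)
demo_accuracy = accuracy(demo_images, demo_labels, *demo_params)
```

After training, the same helper could be called as `accuracy(image_val, y_val, W1, bias_1, W2, bias_2, W3, W4)` to compare validation and training performance.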

In [202]:
gen_weights(64, 100)

array([[ 0.00045548,  0.00585098,  0.00780963, ...,  0.00614043,
         0.00200482,  0.00694802],
       [ 0.00433465,  0.00356555,  0.00332302, ...,  0.0026552 ,
         0.00188209,  0.00321819],
       [ 0.00129413,  0.00867749,  0.0051469 , ...,  0.00723382,
         0.00159762,  0.00026414],
       ..., 
       [ 0.00672997,  0.00992511,  0.00703627, ...,  0.00641349,
         0.00477615,  0.00601461],
       [ 0.00598383,  0.00745225,  0.00247687, ...,  0.00969738,
         0.00339087,  0.00784775],
       [ 0.0036742 ,  0.00792733,  0.00743393, ...,  0.00898041,
         0.00773551,  0.00505815]])

In [125]:
np.sum(np.array([[1,2], [3,4]]))

10