# Implementing a Simple Two-Layer Neural Network

This example is an exercise from the open course CS231n: Convolutional Neural Networks for Visual Recognition, offered by Stanford University: http://cs231n.stanford.edu/

Credit goes to the authors of the course material.



In this work we will develop a neural network with fully connected layers to perform classification.

The NN model consists of two layers and a final classifier/loss calculation:

1 A fully connected layer with a ReLU nonlinearity:

Fully connected layer: output_vector = input_vector * weight_matrix + bias_vector;

ReLU nonlinearity: output_vector = max(0, input_vector);

In mathematical notation, the output vector $h1$ of this layer is:
$$ h1 = \max(0, x * W1 + b1) $$
where $x$ is the input vector (a sample), and $W1$ and $b1$ are the weight matrix and bias vector, respectively.

2 A fully connected layer:
$$ h2 = h1 * W2 + b2 $$


3 A softmax classifier, whose loss measures how INCORRECTLY the model predicts. The softmax classifier turns the scores $h2$ into probabilities: the element at index $i$ of the output vector $h3$ is the exponential of $h2_i$ divided by the sum of the exponentials of all elements of $h2$:
$$ h3_i = \frac{exp(h2_{i})} {\sum\limits_{j} exp(h2_j)} $$
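
The three steps above can be sketched in NumPy for a single sample. This is only a minimal illustration, not the code in `neural_net.py`: the helper name `forward_single` and the random toy values are assumptions made here. The maximum of $h2$ is subtracted before exponentiating, which leaves the softmax unchanged but avoids overflow.

```python
import numpy as np

def forward_single(x, W1, b1, W2, b2):
    """Hypothetical helper: forward pass for one sample x of shape (D,)."""
    h1 = np.maximum(0, x.dot(W1) + b1)               # step 1: fully connected + ReLU
    h2 = h1.dot(W2) + b2                             # step 2: fully connected (scores)
    shifted = h2 - np.max(h2)                        # shift for numerical stability
    h3 = np.exp(shifted) / np.sum(np.exp(shifted))   # step 3: softmax probabilities
    return h1, h2, h3

# Toy sizes chosen only for illustration: 4 inputs, 10 hidden units, 3 classes.
rng = np.random.RandomState(0)
x = rng.randn(4)
W1, b1 = 0.1 * rng.randn(4, 10), np.zeros(10)
W2, b2 = 0.1 * rng.randn(10, 3), np.zeros(3)

h1, h2, h3 = forward_single(x, W1, b1, W2, b2)
print(h3)        # three probabilities
print(h3.sum())  # they sum to 1 (up to floating-point error)
```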

The final loss is the negative logarithm of $h3$ at the index of the correct class.
For a sample $x$ whose correct class is $y$, the loss is:

 $$ L =  - log(h3_y) = -   log(    \frac{exp(h2_{y})} {\sum\limits_{j} exp(h2_j)}      )           $$  


As an example, if an input sample vector $x$ has correct class $y=1$ and the output $h3$ is a 5-element vector, then the loss on this sample is the negative logarithm of $h3$ at index 1:
$$ L =  - log(h3_y) = -log(h3_1) $$
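
To make the indexing concrete, here is a small numeric sketch of that example. The probabilities in `h3` are made up for illustration; only the index `y = 1` comes from the text.

```python
import numpy as np

h3 = np.array([0.1, 0.6, 0.1, 0.1, 0.1])  # made-up softmax output, sums to 1
y = 1                                     # index of the correct class

loss = -np.log(h3[y])
print(loss)  # about 0.51, i.e. -log(0.6)

# A less confident prediction for the correct class gives a larger loss.
h3_uniform = np.full(5, 0.2)
loss_uniform = -np.log(h3_uniform[y])
print(loss_uniform)  # about 1.61, i.e. log(5)
```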


Note further that when multiple samples are considered, the final loss is the average of the per-sample losses.
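
For a batch, the averaging can be sketched as follows. The helper name `softmax_loss` is an assumption for this illustration and is not taken from `neural_net.py`; the all-zero scores are chosen so that the result is easy to check by hand.

```python
import numpy as np

def softmax_loss(h2, y):
    """Hypothetical helper: per-sample and mean softmax loss.

    h2: (N, C) array of scores, y: (N,) array of integer class labels.
    """
    shifted = h2 - h2.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_sample = -log_probs[np.arange(h2.shape[0]), y]  # pick the correct class per row
    return per_sample, per_sample.mean()

# All-zero scores give uniform probabilities, so every sample has loss log(5).
h2 = np.zeros((4, 5))
y = np.array([1, 0, 4, 2])
per_sample, mean_loss = softmax_loss(h2, y)
print(per_sample)
print(mean_loss)
```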

        
    


In [14]:
# A bit of setup

# __future__ imports must precede all other statements in a module
from __future__ import print_function

import numpy as np
import matplotlib.pyplot as plt

from neural_net import TwoLayerNet

def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
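
A short usage sketch of `rel_error` may help in reading its values later. The function is repeated so that the snippet runs on its own; the test arrays are made up. Because the denominator is the sum of the two magnitudes, identical arrays give 0 and the value never exceeds 1.

```python
import numpy as np

def rel_error(x, y):
    """Maximum relative error, as defined in the setup cell."""
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

a = np.array([1.0, 2.0, 3.0])
b = a * (1 + 1e-9)   # tiny relative perturbation
c = a * 1.1          # 10% perturbation

print(rel_error(a, a))  # 0.0
print(rel_error(a, b))  # roughly 5e-10
print(rel_error(a, c))  # roughly 0.048
```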

We will use the class `TwoLayerNet` in the file `neural_net.py` to represent instances of our network. The network parameters are stored in the instance variable `self.params`, where keys are string parameter names and values are numpy arrays. Below, we initialize toy data and a toy model that we will use to develop the implementation.

In [15]:
# Create a small net and some toy data to check your implementations.
# Note that we set the random seed for repeatable experiments.

input_size = 4
hidden_size = 10
num_classes = 3
num_inputs = 5

def init_toy_model():
    np.random.seed(0)
    return TwoLayerNet(input_size, hidden_size, num_classes, std=1e-1)

def init_toy_data():
    np.random.seed(1)
    X = 10 * np.random.randn(num_inputs, input_size)
    y = np.array([0, 1, 2, 2, 1])
    return X, y

net = init_toy_model()
X, y = init_toy_data()

In [16]:
# X is the input matrix.
# each row is a sample
# multiple rows means we test multiple samples simultaneously
print(X)

[[ 16.24345364  -6.11756414  -5.28171752 -10.72968622]
 [  8.65407629 -23.01538697  17.44811764  -7.61206901]
 [  3.19039096  -2.49370375  14.62107937 -20.60140709]
 [ -3.22417204  -3.84054355  11.33769442 -10.99891267]
 [ -1.72428208  -8.77858418   0.42213747   5.82815214]]


In [17]:
# y is the correct class label for each sample
print(y)

[0 1 2 2 1]


In [18]:
print(net.params['W1'])

[[ 0.17640523  0.04001572  0.0978738   0.22408932  0.1867558  -0.09772779
   0.09500884 -0.01513572 -0.01032189  0.04105985]
 [ 0.01440436  0.14542735  0.07610377  0.0121675   0.04438632  0.03336743
   0.14940791 -0.02051583  0.03130677 -0.08540957]
 [-0.25529898  0.06536186  0.08644362 -0.0742165   0.22697546 -0.14543657
   0.00457585 -0.01871839  0.15327792  0.14693588]
 [ 0.01549474  0.03781625 -0.08877857 -0.19807965 -0.03479121  0.0156349
   0.12302907  0.12023798 -0.03873268 -0.03023028]]


In [19]:
print(net.params['b1'])

[ 0.   0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9]


In [20]:
print(net.params['W2'])

[[-0.1048553  -0.14200179 -0.17062702]
 [ 0.19507754 -0.05096522 -0.04380743]
 [-0.12527954  0.07774904 -0.16138978]
 [-0.02127403 -0.08954666  0.03869025]
 [-0.05108051 -0.11806322 -0.00281822]
 [ 0.04283319  0.00665172  0.03024719]
 [-0.06343221 -0.03627412 -0.06724604]
 [-0.03595532 -0.08131463 -0.17262826]
 [ 0.01774261 -0.04017809 -0.16301983]
 [ 0.04627823 -0.09072984  0.00519454]]


In [21]:
print(net.params['b2'])

[ 0.   0.2  0.4]


# Forward pass: compute scores
Open the file `neural_net.py` and look at the first part of the forward pass, which uses the weights and biases to compute the scores for all inputs.
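
A batched version of the score computation might look like the sketch below. It is not the code in `neural_net.py`: the function name `compute_scores` and the random stand-in values are assumptions, and only the array sizes match the toy problem above.

```python
import numpy as np

def compute_scores(X, W1, b1, W2, b2):
    """Hypothetical batched forward pass: X is (N, D), the result is (N, C)."""
    hidden = np.maximum(0, X.dot(W1) + b1)  # (N, H); b1 broadcasts over the rows
    return hidden.dot(W2) + b2              # (N, C); b2 broadcasts over the rows

# Same sizes as the toy problem; the values are random stand-ins.
N, D, H, C = 5, 4, 10, 3
rng = np.random.RandomState(0)
X_demo = 10 * rng.randn(N, D)
W1, b1 = 0.1 * rng.randn(D, H), np.zeros(H)
W2, b2 = 0.1 * rng.randn(H, C), np.zeros(C)

scores_demo = compute_scores(X_demo, W1, b1, W2, b2)
print(scores_demo.shape)  # (5, 3)

# Row i of the batched result equals the single-sample computation for X_demo[i].
row0 = np.maximum(0, X_demo[0].dot(W1) + b1).dot(W2) + b2
print(np.allclose(scores_demo[0], row0))  # True
```

Processing all samples as one matrix product avoids a Python loop over rows, which is why each row of `X` holds one sample.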

In [22]:
scores = net.loss(X)
print('Your scores:')
print(scores)

Your scores:
[[-0.82172636 -1.2186243  -0.328118  ]
 [-0.1673226  -1.16037191 -0.22064339]
 [-0.50381418 -0.9880024  -0.5997831 ]
 [-0.15021874 -0.45863519 -0.27655848]
 [ 0.03443199 -0.07557897  0.08156292]]


# Forward pass: compute loss
In the same function, the second part computes the data and regularization loss.

In [23]:
loss, _ = net.loss(X, y, reg=0.05)
print("loss =", loss)
print("loss for each sample =", net.loss_eachsample)

loss = 1.26619342637
loss for each sample = [ 1.19713533  1.83397888  1.02208699  1.08795739  1.18980855]
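
As a sanity check on these numbers, the per-sample data loss can be recomputed from the printed scores and labels with the softmax formula. The sketch below copies those values by hand; its results should agree with the printed `loss for each sample` up to the rounding of the printed scores, and their mean with the printed `loss`. It covers the data loss only; how the `reg` term enters `TwoLayerNet.loss` is not shown in this document.

```python
import numpy as np

# Scores and labels copied from the printed outputs above.
scores = np.array([[-0.82172636, -1.2186243,  -0.328118  ],
                   [-0.1673226,  -1.16037191, -0.22064339],
                   [-0.50381418, -0.9880024,  -0.5997831 ],
                   [-0.15021874, -0.45863519, -0.27655848],
                   [ 0.03443199, -0.07557897,  0.08156292]])
y = np.array([0, 1, 2, 2, 1])

# Softmax probabilities, then the negative log-probability of the correct class.
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
per_sample = -np.log(probs[np.arange(len(y)), y])

print(per_sample)
print(per_sample.mean())
```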


# Backward pass
Compute the gradient of the loss with respect to the variables `W1`, `b1`, `W2`, and `b2`.
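
One way to derive these gradients is to apply the chain rule layer by layer and then compare the result against centered finite differences. The sketch below is an illustration under stated assumptions, not the code in `neural_net.py`: the names `loss_and_grads` and `numerical_grad` are hypothetical, the data is random, and regularization is left out to keep it short.

```python
import numpy as np

def loss_and_grads(params, X, y):
    """Hypothetical sketch: mean softmax loss and gradients (no regularization)."""
    W1, b1, W2, b2 = params['W1'], params['b1'], params['W2'], params['b2']
    N = X.shape[0]

    # Forward pass
    pre = X.dot(W1) + b1
    h1 = np.maximum(0, pre)
    h2 = h1.dot(W2) + b2
    shifted = h2 - h2.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), y]).mean()

    # Backward pass
    dh2 = probs.copy()
    dh2[np.arange(N), y] -= 1          # softmax gradient: probs - one_hot(y)
    dh2 /= N                           # the loss is an average over samples
    grads = {'W2': h1.T.dot(dh2), 'b2': dh2.sum(axis=0)}
    dpre = dh2.dot(W2.T) * (pre > 0)   # ReLU passes gradient only where pre > 0
    grads['W1'] = X.T.dot(dpre)
    grads['b1'] = dpre.sum(axis=0)
    return loss, grads

def numerical_grad(f, x, h=1e-5):
    """Centered finite differences of the scalar f() with respect to array x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + h
        f_plus = f()
        x[idx] = old - h
        f_minus = f()
        x[idx] = old                   # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.RandomState(0)
X_demo = rng.randn(5, 4)
y_demo = np.array([0, 1, 2, 2, 1])
params = {'W1': 0.1 * rng.randn(4, 10), 'b1': 0.1 * rng.randn(10),
          'W2': 0.1 * rng.randn(10, 3), 'b2': 0.1 * rng.randn(3)}

_, grads = loss_and_grads(params, X_demo, y_demo)
errors = {}
for name in sorted(params):
    num = numerical_grad(lambda: loss_and_grads(params, X_demo, y_demo)[0],
                         params[name])
    errors[name] = np.max(np.abs(num - grads[name]))
    print(name, 'max abs difference:', errors[name])
```

Small differences indicate that the analytic and numerical gradients agree; a large difference for one parameter usually points to a mistake in that step of the chain rule.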

In [27]:
loss, grads = net.loss(X, y, reg=0.05)


# Train the network
To train the network we will use stochastic gradient descent (SGD), similar to the SVM and Softmax classifiers. Look at the function `TwoLayerNet.train` and fill in the missing sections to implement the training procedure. This should be very similar to the training procedure you used for the SVM and Softmax classifiers. You will also have to implement `TwoLayerNet.predict`, as the training process periodically performs prediction to keep track of accuracy over time while the network trains.

Once you have implemented the method, run the code below to train a two-layer network on toy data. You should achieve a training loss less than 0.2.
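
The overall shape of such a training loop can be sketched as below. This is not `TwoLayerNet.train`: the names `sgd_train`, `linear_softmax_loss`, and `predict` are hypothetical, the data is made up, and a one-layer softmax classifier stands in for `net.loss` only to keep the example short. The loop itself (sample a minibatch, compute loss and gradients, step against the gradient) is plain minibatch SGD.

```python
import numpy as np

def linear_softmax_loss(params, X, y):
    """Stand-in for net.loss: mean softmax loss of a linear classifier."""
    scores = X.dot(params['W']) + params['b']
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    N = X.shape[0]
    loss = -np.log(probs[np.arange(N), y]).mean()
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1
    dscores /= N
    return loss, {'W': X.T.dot(dscores), 'b': dscores.sum(axis=0)}

def sgd_train(params, loss_fn, X, y, learning_rate=1e-1, num_iters=100,
              batch_size=10, seed=0):
    """Hypothetical minibatch SGD loop; updates params in place."""
    rng = np.random.RandomState(seed)
    loss_history = []
    for _ in range(num_iters):
        idx = rng.choice(X.shape[0], batch_size, replace=True)  # sample a minibatch
        loss, grads = loss_fn(params, X[idx], y[idx])
        for name in params:
            params[name] -= learning_rate * grads[name]         # vanilla SGD step
        loss_history.append(loss)
    return loss_history

def predict(params, X):
    """Predicted class is the index of the highest score."""
    return np.argmax(X.dot(params['W']) + params['b'], axis=1)

# Toy, well-separated two-class data (made up for this sketch).
rng = np.random.RandomState(1)
X_demo = np.vstack([rng.randn(20, 2) + 4, rng.randn(20, 2) - 4])
y_demo = np.array([0] * 20 + [1] * 20)
params = {'W': np.zeros((2, 2)), 'b': np.zeros(2)}

history = sgd_train(params, linear_softmax_loss, X_demo, y_demo)
accuracy = np.mean(predict(params, X_demo) == y_demo)
print('first loss:', history[0], 'last loss:', history[-1])
print('training accuracy:', accuracy)
```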

In [12]:
net = init_toy_model()
stats = net.train(X, y, X, y,
            learning_rate=1e-1, reg=5e-6,
            num_iters=100, verbose=False)

print('Final training loss: ', stats['loss_history'][-1])

Final training loss:  0.0155149278736
