# Implementing a Neural Network
In this exercise we will develop a neural network with fully-connected layers to perform classification, and test it out on the CIFAR-10 dataset.

### Babysitting Training

1. Overfit on very small data
2. Loss not going down
    - The learning rate is probably too low
    - A very high learning rate usually shows up as a NaN cost
3. If cost is ever > 3*initial cost, break out early.
4. Coarse to fine hyperparameter search.
5. Random search usually works better than grid search.
6. Large difference in training and validation accuracy = **overfitting**
7. Track the ratio of weight-update magnitude to weight magnitude
    - Around 0.001 is a good ratio
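
Items 3 and 7 above can be sketched as two small helpers. This is a minimal sketch assuming plain SGD updates; the function names `update_ratio` and `should_stop_early` and the random `W` and `dW` arrays are hypothetical and are not part of the network used later in this notebook.

```python
import numpy as np

def update_ratio(W, dW, learning_rate):
    """Ratio of update magnitude to weight magnitude for a plain SGD step."""
    update = -learning_rate * dW
    # Small constant guards against division by zero for all-zero weights.
    return np.linalg.norm(update) / (np.linalg.norm(W) + 1e-12)

def should_stop_early(loss, initial_loss, factor=3.0):
    """True if the loss is NaN or has grown beyond factor * initial_loss."""
    return bool(np.isnan(loss) or loss > factor * initial_loss)

# Hypothetical weights and gradient of similar scale, for illustration only.
rng = np.random.default_rng(0)
W = rng.standard_normal((3072, 50)) * 1e-2
dW = rng.standard_normal((3072, 50)) * 1e-2
print("update/weight ratio:", update_ratio(W, dW, learning_rate=1e-3))
```

A ratio far above 0.001 hints that the learning rate may be too high, and a ratio far below it hints that it may be too low.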

In [4]:
from __future__ import print_function  # __future__ imports belong before all other imports

import numpy as np
import matplotlib.pyplot as plt

# from cs231n.classifiers.neural_net import TwoLayerNet

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [5]:
from cs231n.data_utils import load_CIFAR10

# Load the raw CIFAR-10 data
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'

In [6]:
from cs231n.data_utils import load_CIFAR10

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
        
    # Subsample the data
    mask = list(range(num_training, num_training + num_validation))
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    # Reshape data to rows
    X_train = X_train.reshape(num_training, -1)
    X_val = X_val.reshape(num_validation, -1)
    X_test = X_test.reshape(num_test, -1)

    return X_train, y_train, X_val, y_val, X_test, y_test


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Train data shape:  (49000, 3072)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3072)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3072)
Test labels shape:  (1000,)


In [12]:
#### NEURAL NET ##########

from Functions.classifiers import two_layer_net

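# 50 hidden units, 3072 input features, 10 classes.
nn = two_layer_net(50, X_train.shape[1], 10)
# Positional arguments, presumably: learning rate (5e-4), regularization
# strength (1e1), number of iterations (1000), batch size (64), verbose (True).
loss = nn.SGD(X_train, y_train, 5e-4, 1e1, 1000, 64, True)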

print("Final Loss : ", loss)

In the 1th iteration, loss : 176.853645
In the 11th iteration, loss : 218.765800
In the 21th iteration, loss : 116.722197
In the 31th iteration, loss : 112.086983
In the 41th iteration, loss : 90.681610
In the 51th iteration, loss : 89.931758
In the 61th iteration, loss : 66.552978
In the 71th iteration, loss : 64.587511
In the 81th iteration, loss : 60.801729
In the 91th iteration, loss : 52.725108


  loss = (- np.sum(np.log(expfy))) / X.shape[0]


In the 101th iteration, loss : 44.307454
In the 111th iteration, loss : 43.852887
In the 121th iteration, loss : 37.192822
In the 131th iteration, loss : 42.315814
In the 141th iteration, loss : 38.761599
In the 151th iteration, loss : 31.538834
In the 161th iteration, loss : 29.486593
In the 171th iteration, loss : 28.440400
In the 181th iteration, loss : 26.233926
In the 191th iteration, loss : 23.566439
In the 201th iteration, loss : 22.706628
In the 211th iteration, loss : 20.875978
In the 221th iteration, loss : 19.353643
In the 231th iteration, loss : 18.503162
In the 241th iteration, loss : 18.283325
In the 251th iteration, loss : 16.718199
In the 261th iteration, loss : 15.386514
In the 271th iteration, loss : 14.962014
In the 281th iteration, loss : 14.326262
In the 291th iteration, loss : 14.103403
In the 301th iteration, loss : 13.129803
In the 311th iteration, loss : 12.580491
In the 321th iteration, loss : 12.079309
In the 331th iteration, loss : 11.440547
In the 341th ite
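
The stray source line `loss = (- np.sum(np.log(expfy))) / X.shape[0]` in the output above looks like the remainder of a NumPy runtime warning, which would be consistent with the NaN costs noted elsewhere in these notes. One common cause is taking the log of a softmax probability that has underflowed to zero. Below is a minimal sketch of a numerically stable softmax cross-entropy loss. It is an assumption about what might help here, not the implementation in `Functions.classifiers`, and `softmax_loss` is a hypothetical name.

```python
import numpy as np

def softmax_loss(scores, y):
    """Mean cross-entropy loss for raw class scores of shape (N, C)."""
    # Shift each row so its maximum is 0; exp() can then no longer overflow.
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    # Compute log-softmax directly, so log() is never applied to a
    # probability that has underflowed to exactly zero.
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(scores.shape[0]), y])

# Extreme scores that would break a naive exp-then-log implementation.
scores = np.array([[1000.0, 0.0, -1000.0],
                   [0.0, 0.0, 0.0]])
y = np.array([2, 1])
print("loss:", softmax_loss(scores, y))
```

The loss stays finite even when the correct class has a vanishingly small probability.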

In [13]:
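# Evaluate on the validation set. Note that predict() labels its output
# "Training Accuracy" even though X_val and y_val are passed in here.
predicted, acc = nn.predict(X_val, y_val)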

('Training Accuracy : ', 0.39000000000000001)


### Hyperparameter Tuning

1. Small Random Valued Initialization


Learning Rate | Regularization | Final Loss | Acc on Val Set
--- | --- | --- | ---
1e-4| 2.5e-1 | 2.027 | 0.357
1e-4| 1e-1 | 1.944 | 0.357

2. He et al. (2015) initialization: unit-variance outputs under ReLU activations

Learning Rate | Regularization | Final Loss | Acc on Val Set | Comments
--- | --- | --- | --- | ---
1e-4| 2.5e-1 | 25.898 | 0.246 |
1e-4| 1e-1 | 18.767 | 0.277 |
2.5e-4| 1e-1 | 10.124 | 0.337 |
5e-4| 1e-1 | 2.468 | 0.3589 | Cost function frequently NaN
5e-4| 1e0 | 4.678 | 0.400 | Cost function NaN once
5e-4| 1e1 | 2.679 | 0.373 | Cost function NaN once
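
The two initialization schemes compared above can be sketched as follows. This is a minimal sketch: the function names and the `1e-4` scale for the small random initialization are assumptions, since the notes do not state the values actually used.

```python
import numpy as np

def init_small_random(fan_in, fan_out, scale=1e-4, rng=None):
    """Small random Gaussian weights; `scale` is an assumed value."""
    rng = np.random.default_rng() if rng is None else rng
    return scale * rng.standard_normal((fan_in, fan_out))

def init_he(fan_in, fan_out, rng=None):
    """He et al. (2015): Gaussian weights with standard deviation sqrt(2 / fan_in)."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)

rng = np.random.default_rng(0)
W_small = init_small_random(3072, 50, rng=rng)
W_he = init_he(3072, 50, rng=rng)
print("small random std:", W_small.std())
print("He std:", W_he.std(), "target:", np.sqrt(2.0 / 3072))
```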

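For the coarse-to-fine random search listed under Babysitting Training, learning rate and regularization strength are commonly sampled uniformly on a log scale. The sketch below only draws candidate pairs; the ranges and the name `sample_hyperparameters` are assumptions, and each pair would still have to be passed to a training run.

```python
import numpy as np

def sample_hyperparameters(rng, n_trials, lr_range=(1e-5, 1e-3), reg_range=(1e-2, 1e1)):
    """Draw (learning_rate, regularization) pairs log-uniformly from the given ranges."""
    lr = 10 ** rng.uniform(np.log10(lr_range[0]), np.log10(lr_range[1]), n_trials)
    reg = 10 ** rng.uniform(np.log10(reg_range[0]), np.log10(reg_range[1]), n_trials)
    return list(zip(lr, reg))

rng = np.random.default_rng(0)
for lr, reg in sample_hyperparameters(rng, n_trials=5):
    print("lr = %.2e, reg = %.2e" % (lr, reg))
```

Once a promising region shows up, the ranges can be narrowed around it and the search repeated with more iterations per trial.
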
### To-Do

1. Implement batch normalization
2. Iterate over mini-batches in order instead of sampling them at random (to implement epochs)
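
One way the second item could look is a generator that walks through the training set once per epoch. This is a minimal sketch; `iterate_minibatches` is a hypothetical helper and is not part of the existing `two_layer_net` class. Passing an `rng` shuffles the order once per epoch, which is an optional addition to the in-order scheme described above.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng=None):
    """Yield (X_batch, y_batch) so that one full pass covers every example once."""
    num_train = X.shape[0]
    # Keep the stored order by default; shuffle once per pass if an rng is given.
    order = np.arange(num_train) if rng is None else rng.permutation(num_train)
    for start in range(0, num_train, batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

# Toy data: 10 examples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
for epoch in range(2):
    for X_batch, y_batch in iterate_minibatches(X, y, batch_size=4):
        print("epoch", epoch, "labels", y_batch)
```

The last batch of each epoch is smaller when the number of examples is not a multiple of the batch size.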