# Implementing a Neural Network
In this exercise we will develop a neural network with fully-connected layers to perform classification, and test it out on the CIFAR-10 dataset.

### Babysitting Training

1. Sanity check: make sure the network can overfit a very small subset of the data
2. Loss not going down
    - Learning rate is probably too low
    - A very high learning rate usually shows up as a NaN cost
3. If cost is ever > 3*initial cost, break out early.
4. Coarse to fine hyperparameter search.
5. Random search is usually more efficient than grid search.
6. Large difference in training and validation accuracy = **overfitting**
7. Track ratios of weight update/weight magnitudes
    - Around 0.001 is a good ratio
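
Items 3, 5, and 7 above can be sketched in a few lines. This is a minimal illustration, not code from this notebook: the helper names (`sample_log_uniform`, `should_abort`, `update_ratio`), the search ranges, and the random stand-in weights are all assumptions, and the ratio is computed for a plain SGD step.

```python
import numpy as np


def sample_log_uniform(rng, low_exp, high_exp, size):
    """Random search: draw 10**u with u uniform in [low_exp, high_exp]."""
    return 10.0 ** rng.uniform(low_exp, high_exp, size)


def should_abort(cost, initial_cost, factor=3.0):
    """True when the cost is NaN or exceeds factor * initial_cost."""
    return bool(np.isnan(cost) or cost > factor * initial_cost)


def update_ratio(W, dW, learning_rate):
    """Norm of one vanilla-SGD update divided by the norm of the weights."""
    update = -learning_rate * dW
    return np.linalg.norm(update) / (np.linalg.norm(W) + 1e-12)


rng = np.random.RandomState(0)

# Hypothetical coarse search ranges; narrow them around the best results later.
learning_rates = sample_log_uniform(rng, -5, -3, size=5)
regs = sample_log_uniform(rng, -2, 1, size=5)
for lr, reg in zip(learning_rates, regs):
    print("lr=%.2e  reg=%.2e" % (lr, reg))

# Random stand-ins for a weight matrix and its gradient.
W = rng.randn(3072, 50)
dW = rng.randn(3072, 50)
ratio = update_ratio(W, dW, learning_rate=1e-3)
print("update/weight ratio: %.1e" % ratio)
```

Sampling the exponent uniformly, rather than the value itself, spreads the trials evenly across orders of magnitude, which is usually what matters for learning rate and regularization strength.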

In [7]:
# __future__ imports must come before any other statement in the cell
from __future__ import print_function

import numpy as np
import matplotlib.pyplot as plt

# from cs231n.classifiers.neural_net import TwoLayerNet

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [10]:
from cs231n.data_utils import load_CIFAR10

# Load the raw CIFAR-10 data
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'

In [22]:
from cs231n.data_utils import load_CIFAR10

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
        
    # Subsample the data
    mask = list(range(num_training, num_training + num_validation))
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    # Reshape data to rows
    X_train = X_train.reshape(num_training, -1)
    X_val = X_val.reshape(num_validation, -1)
    X_test = X_test.reshape(num_test, -1)

    return X_train, y_train, X_val, y_val, X_test, y_test


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Train data shape:  (49000, 3072)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3072)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3072)
Test labels shape:  (1000,)


In [100]:
#### NEURAL NET ##########

from classifiers import two_layer_net_BN

nn = two_layer_net_BN(50, X_train.shape[1], 10)
loss = nn.SGD(X_train, y_train, 5e-4, 1e1, 1000, 64, True)

print("Final Loss : ", loss)

('y1 : ', (64, 50))
('f : ', (64, 10))
('dLdf : ', (64, 10))
('dLdy1 : ', (64, 50))
('dLdh1_hat : ', (64, 50))
((64, 50), (64, 3072))
('dLdsigma_bsq : ', (), (50, 3072))


ValueError: operands could not be broadcast together with shapes (50,3072) (64,3072) 
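
The source of `two_layer_net_BN` is not shown here, so the following is not a fix for it. It is a standalone sketch of a batch-norm forward and backward pass over a `(64, 50)` hidden activation, written to show the shape each gradient is expected to have: per-feature statistics and their gradients are `(50,)`, the gradient with respect to the activations is `(64, 50)`, and the first-layer weight gradient is `(3072, 50)`. The function names and the random stand-in data are assumptions.

```python
import numpy as np


def batchnorm_forward(h, gamma, beta, eps=1e-5):
    """Normalize each feature (column) of h over the batch dimension."""
    mu = h.mean(axis=0)                  # (H,)
    var = h.var(axis=0)                  # (H,)
    inv_std = 1.0 / np.sqrt(var + eps)   # (H,)
    h_hat = (h - mu) * inv_std           # (N, H)
    out = gamma * h_hat + beta           # (N, H)
    cache = (h, h_hat, mu, inv_std, gamma)
    return out, cache


def batchnorm_backward(dout, cache):
    """Backpropagate dout (N, H) through the normalization above."""
    h, h_hat, mu, inv_std, gamma = cache
    N = h.shape[0]
    dgamma = np.sum(dout * h_hat, axis=0)                           # (H,)
    dbeta = np.sum(dout, axis=0)                                    # (H,)
    dh_hat = dout * gamma                                           # (N, H)
    dvar = np.sum(dh_hat * (h - mu), axis=0) * -0.5 * inv_std ** 3  # (H,)
    dmu = (-np.sum(dh_hat, axis=0) * inv_std
           + dvar * np.mean(-2.0 * (h - mu), axis=0))               # (H,)
    dh = dh_hat * inv_std + dvar * 2.0 * (h - mu) / N + dmu / N     # (N, H)
    return dh, dgamma, dbeta


rng = np.random.RandomState(0)
N, D, H = 64, 3072, 50                 # batch size, input dim, hidden units
X = rng.randn(N, D)                    # stand-in input batch
W1 = 0.01 * rng.randn(D, H)            # stand-in first-layer weights
gamma, beta = np.ones(H), np.zeros(H)

h = X.dot(W1)                          # (64, 50)
out, cache = batchnorm_forward(h, gamma, beta)
dout = rng.randn(N, H)                 # stand-in upstream gradient
dh, dgamma, dbeta = batchnorm_backward(dout, cache)
dW1 = X.T.dot(dh)                      # (3072, 50)
print(dh.shape, dgamma.shape, dbeta.shape, dW1.shape)
```

A numerical gradient check on a small batch is a cheap way to confirm a backward pass like this before wiring it into the training loop.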

In [89]:
predicted, acc = nn.predict(X_val, y_val)

('Training Accuracy : ', 0.373)


### Hyperparameter Tuning

1. Small Random Valued Initialization


Learning Rate | Regularization | Final Loss | Acc on Val Set
--- | --- | --- | ---
1e-4| 2.5e-1 | 2.027 | 0.357
1e-4| 1e-1 | 1.944 | 0.357

2. He et al. (2015) initialization: unit-variance outputs with ReLU activations

Learning Rate | Regularization | Final Loss | Acc on Val Set | Comments
--- | --- | --- | --- | ---
1e-4| 2.5e-1 | 25.898 | 0.246
1e-4| 1e-1 | 18.767 | 0.277
2.5e-4| 1e-1 | 10.124 | 0.337
5e-4| 1e-1 | 2.468 | 0.3589 | Cost function frequently NaN
5e-4| 1e0 | 4.678 | 0.400 | Cost function NaN once
5e-4| 1e1 | 2.679 | 0.373 | Cost function NaN once
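
The two initialization schemes compared above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's classifier code: the function names, the `1e-2` scale for the small random weights, and the unit-variance stand-in inputs are all hypothetical. He et al. scale the weights by `sqrt(2 / fan_in)` so that the mean squared ReLU output stays near 1 when the inputs have unit variance.

```python
import numpy as np


def init_small_random(fan_in, fan_out, rng, scale=1e-2):
    """Small random values: Gaussian noise times a fixed scale."""
    return scale * rng.randn(fan_in, fan_out)


def init_he(fan_in, fan_out, rng):
    """He et al. (2015): Gaussian noise scaled by sqrt(2 / fan_in)."""
    return rng.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)


rng = np.random.RandomState(0)
fan_in, hidden = 3072, 50
X = rng.randn(512, fan_in)  # stand-in batch with zero mean, unit variance

stats = {}
for name, init in [("small", init_small_random), ("he", init_he)]:
    W = init(fan_in, hidden, rng)
    relu_out = np.maximum(0, X.dot(W))
    stats[name] = np.mean(relu_out ** 2)  # mean squared ReLU output
    print("%-5s init: mean squared ReLU output = %.3f" % (name, stats[name]))
```

Note that the data in this notebook is only mean-subtracted, not divided by its standard deviation, so the real pre-activations are larger than in this stand-in. That may be one reason for the large losses in the second table, though the notebook does not confirm it.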

### To-Do

1. Implement Batch Norm
2. Draw mini-batches in order rather than at random (to implement epochs)
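
The second to-do item might look like the sketch below. The generator name `iterate_minibatches` and its signature are assumptions, not part of the existing `SGD` method. It walks the data in order so that one full pass is exactly one epoch, with an optional once-per-epoch shuffle.

```python
import numpy as np


def iterate_minibatches(X, y, batch_size, rng=None):
    """Yield (X_batch, y_batch) so that every example is seen exactly once.

    With rng=None the batches follow the stored order; passing a
    np.random.RandomState shuffles the order once before the pass.
    """
    num_examples = X.shape[0]
    indices = np.arange(num_examples)
    if rng is not None:
        rng.shuffle(indices)
    for start in range(0, num_examples, batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], y[batch]


# Tiny stand-in dataset: 10 examples, 3 features.
X_demo = np.arange(30, dtype=float).reshape(10, 3)
y_demo = np.arange(10)

num_epochs = 2
for epoch in range(num_epochs):
    sizes = [len(yb) for _, yb in iterate_minibatches(X_demo, y_demo, 4)]
    print("epoch %d batch sizes: %s" % (epoch, sizes))
```

The last batch of an epoch is smaller when the dataset size is not a multiple of the batch size; whether to keep or drop it is a design choice left open here.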

In [5]:
# ############################# DATA PREPROCESSING ###############################
# # X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# X_mean = np.mean(X_train, axis=0).reshape(3072)
# X_std = np.std(X_train, axis=0).reshape(3072)

# X_trial = X_train[np.random.choice(X_train.shape[0], 64),:]

# X_plot = X_trial.reshape(X_trial.shape[0],32,32,3)

# #### Flatten #####
# X_trial = np.reshape(X_trial, (X_trial.shape[0], -1))

# #### Zero Mean ####
# X_trial -= X_mean

# #### Normalization ####
# X_trial /= X_std

# #### Principal Components Analysis ####
# cov = np.dot(X_trial.T, X_trial)/X_trial.shape[0]

# U, S, V = np.linalg.svd(cov)

# #X_plot = X_train.reshape(X_train.shape[0], 32, 32, 3)


# for i in range(64):
#     plt.subplot(8,8,i+1)
#     #plt.imshow(X_plot[i,:,:,:].astype('uint8'))
#     plt.imshow((X_trial[i,:]*X_std + X_mean).reshape(32,32,3).astype('uint8'))
#     plt.axis('off')
    
# plt.show()

In [6]:
# print(U.shape, S.shape, V.shape)

# #U_red = U[:,:100] ######Keeping first 100 eigenvectors

# X_rot = np.dot(X_trial, U)
# X_rec = np.dot(X_rot, U.T)

# X_rot_reduced = np.dot(X_trial, U[:,:144])

# U_red = U.copy()  # copy so that zeroing columns does not modify U itself
# U_red[:,100:] = 0

# X_rec_reduced = np.dot(X_rot_reduced, U.T[:144,:])
# #X_plot = np.dot(X_rot, U.T[:100,:])

# for i in range(64):
#     plt.subplot(8,8,i+1)
#     plt.imshow((X_rec_reduced[i,:]*X_std + X_mean).reshape(32,32,3).astype('uint8'))
#     plt.axis('off')
    
# plt.show()