# Building a simple neural network with Keras

This is a quick-start tutorial on image recognition with a neural network in Keras. Parts of the tutorial are adapted from Xavier Snelgrove at the University of Toronto.

<a href="https://www.tensorflow.org">TensorFlow</a> is a software library for efficient numerical computation that is particularly well suited to deep neural networks. Among other features, it can run code on multiple CPUs/GPUs and compute gradients automatically for training.

TensorFlow's computation model is a bit idiosyncratic. A computation is represented as a graph, where the nodes are operations, and edges are connections between operations. A graph must be created first, and then launched to "lazily" execute the computations. 
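
To make the "build first, run later" idea concrete, here is a toy sketch in plain Python. It is not TensorFlow's API: the `Node` class and the `const`, `add`, and `mul` helpers are hypothetical names invented for this illustration. Building the graph only records operations; values are computed when `run()` is called.

```python
class Node:
    """A node in a toy computation graph: an operation plus its input nodes."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op          # name of the operation, e.g. "add"
        self.inputs = inputs  # upstream nodes feeding this operation
        self.value = value    # only used by constant nodes

    def run(self):
        """Evaluate this node by first evaluating everything it depends on."""
        if self.op == "const":
            return self.value
        args = [node.run() for node in self.inputs]
        if self.op == "add":
            return args[0] + args[1]
        if self.op == "mul":
            return args[0] * args[1]
        raise ValueError("unknown op: " + self.op)


def const(v):
    return Node("const", value=v)


def add(a, b):
    return Node("add", (a, b))


def mul(a, b):
    return Node("mul", (a, b))


# Building the graph computes nothing; it only wires operations together.
graph = add(mul(const(2.0), const(3.0)), const(1.0))

# Launching the graph triggers the actual computation: 2 * 3 + 1.
result = graph.run()
print(result)
```

Separating construction from execution is what lets a real library inspect the whole graph before running it, for example to derive gradients or place operations on different devices.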

This computation model is very powerful, but has a steep learning curve. Therefore, instead of using TensorFlow directly, we'll use <a href="https://keras.io">Keras</a>, a high-level interface to TF that is sufficiently powerful for the kind of neural nets we want to build.

## Install prerequisites

Type 

`pip install keras`

to install Keras. Test it out:

In [None]:
import keras

If you have issues, follow the instructions below to install in a virtual environment.

### Configure a virtual environment

Ignore this part if `keras` is installed.

The `virtualenv` tool creates a fully isolated Python environment. Install it with:

    pip install virtualenv

Navigate to your home directory (`cd ~`) and create a virtual environment. We'll call it `kerasenv`:

    virtualenv kerasenv

Now, to switch your shell environment to be within the env:

    source kerasenv/bin/activate
    
Great: now you can install keras and tensorflow.

    pip install tensorflow keras 

## Time to build a neural network!
First, let's import some prerequisites.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (7,7) # Make the figures a bit bigger

from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras import regularizers
from keras import optimizers

## Load training data

First, we load the image data. Keras will download it and parse it for you.

In [None]:
C = 10

# the data, shuffled and split between train and test sets
(trainX, trainy), (testX, testy) = cifar10.load_data()
print("trainX original shape", trainX.shape)
print("trainy original shape", trainy.shape)

Let's look at some examples of the training data:

In [None]:
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.axis('off')
    plt.imshow(trainX[i], interpolation='none')
    plt.title("Class {}".format(trainy[i][0]))  # trainy has shape (n, 1)

Our neural network takes a single vector for each training example, so we need to reshape the input so that each 32x32x3 image becomes a single 3072-dimensional vector.

In [None]:
trainX = trainX.reshape(trainX.shape[0], -1)
testX = testX.reshape(testX.shape[0], -1)
print("Training matrix shape", trainX.shape)
print("Testing matrix shape", testX.shape)
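
As a small illustration of what this reshape does, the sketch below flattens a made-up batch of image-shaped arrays and then restores it. The `images` array is random stand-in data, not CIFAR-10, and the variable names are chosen only for this example.

```python
import numpy as np

# Hypothetical stand-in for a batch of 4 colour images, 32x32 pixels, 3 channels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Keep the first axis (one row per image) and let -1 infer the rest: 32 * 32 * 3.
flat = images.reshape(images.shape[0], -1)
print("Flattened shape:", flat.shape)

# Reshaping is lossless: the original layout can be recovered exactly.
restored = flat.reshape(-1, 32, 32, 3)
print("Round trip preserved the data:", np.array_equal(restored, images))
```

Flattening throws away no pixel values, only the explicit notion of which pixels are neighbours, which is why the dense layers below treat every pixel as an unrelated input feature.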

Modify the target matrices to be in the one-hot format, i.e.

```
0 -> [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
1 -> [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
2 -> [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
etc.
```

This is a common representation for many neural net packages. Each label becomes a vector in 10 dimensions (since 10 is the number of classes), with a 1 on the index corresponding to the label and zero everywhere else.

The one-hot representation is nice for two reasons: it matches the shape of the output layer, and it's flexible enough to accommodate a *distribution* over labels for a data point rather than a single label -- a departure from the paradigms we've used in class!

While our images only have one label each, you can imagine problems where multiple labels per data point are given.

**Exercise**: What is an example of a dataset and task where each data point may have more than one label?

In [None]:
trainy_onehot = keras.utils.to_categorical(trainy, num_classes=10)  # convert to one-hot
testy_onehot = keras.utils.to_categorical(testy, num_classes=10)  

print('First three labels of training')
print(trainy[:3])  # show first three labels of training set
print('First three labels of training in one-hot representation')
print(trainy_onehot[:3, :])
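
For intuition, the same encoding can be sketched in plain NumPy. This is an illustrative equivalent rather than how Keras implements it, and the `labels` array below is made-up data shaped like `trainy`.

```python
import numpy as np

num_classes = 10
labels = np.array([[6], [9], [9]])  # hypothetical labels with shape (n, 1)

# Row k of the identity matrix is the one-hot vector for class k,
# so indexing the identity matrix by label performs the encoding.
onehot = np.eye(num_classes)[labels.ravel()]
print(onehot)

# argmax inverts the encoding and recovers the integer labels.
print(onehot.argmax(axis=1))

# The same shape can hold a distribution over labels instead of a single 1.
soft_label = np.zeros(num_classes)
soft_label[3] = 0.7
soft_label[5] = 0.3
print(soft_label.sum())
```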

## Linear Classification with Keras

The code below implements multi-class logistic (softmax) regression. In other words, this is only the top layer of your PS5 network, with no hidden layer. This means it's a linear classifier (albeit with 10 hyperplanes, each represented by a neuron).

The training method is `fit` (just like in `scikit-learn`). 

In [None]:
def nn_softmax_model(eta, epochs, batch_size):
    model = Sequential()   
    model.add(Dense(10, 
                activation='softmax', 
                input_dim=trainX.shape[1]))  # the dim of the data that feeds to this layer
                
    sgd = optimizers.SGD(lr=eta)   # lr is the learning rate (eta)
    model.compile(optimizer=sgd,
              loss='categorical_crossentropy',  # data loss function at output
              metrics=['accuracy'])  # how to evaluate correctness

    # Train the model using a given batch size and some number of epochs
    model.fit(trainX, trainy_onehot, epochs=epochs, batch_size=batch_size, verbose = 0) 
    # return the trained model
    return model

softmax_classifier = nn_softmax_model(0.001, 10, 2000)
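
To show what the single softmax layer computes, here is a minimal NumPy sketch of its forward pass and the categorical cross-entropy loss. It uses random stand-in data and zero-initialised weights (both assumptions made for the illustration) and omits training entirely.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, onehot):
    """Mean categorical cross-entropy between predictions and one-hot targets."""
    return -np.mean(np.sum(onehot * np.log(probs + 1e-12), axis=1))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3072))               # 5 hypothetical flattened images
Y = np.eye(10)[rng.integers(0, 10, size=5)]  # hypothetical one-hot labels

W = np.zeros((3072, 10))  # one weight column (hyperplane) per class
b = np.zeros(10)

probs = softmax(X @ W + b)  # the linear map followed by softmax
loss = cross_entropy(probs, Y)
print(probs[0])
print(loss)
```

With all-zero weights every class gets probability 1/10, so the loss equals ln(10), roughly 2.30. That value is a handy reference point: a model whose loss stays near it has learned nothing yet.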

Evaluate this trained model on the test data. With one metric configured, `evaluate` returns the loss on the test set followed by the accuracy.

In [None]:
test_loss, test_accuracy = softmax_classifier.evaluate(testX, testy_onehot)
print('Accuracy on test set:', test_accuracy)
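
As a rough sketch of what the accuracy metric measures, the snippet below compares the most probable class of each prediction with the true class. The `probs` and `onehot` arrays are small made-up examples with three classes, not output from the model above.

```python
import numpy as np

# Hypothetical predicted class probabilities for 4 examples and 3 classes.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.5, 0.3, 0.2],
                  [0.2, 0.2, 0.6],
                  [0.6, 0.3, 0.1]])

# Hypothetical one-hot targets for the same 4 examples.
onehot = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

predicted = probs.argmax(axis=1)    # most probable class per example
true = onehot.argmax(axis=1)        # class encoded by each one-hot row
accuracy = np.mean(predicted == true)
print(accuracy)
```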

## Single-Hidden Layer Model

Now let's replicate your PS5 model (a hidden layer with rectified linear units) using Keras. Keras gives us a bit more flexibility than your model: we could specify a different alpha value for the regularizer at each layer. For simplicity, the code below attaches no regularizer at all.

The layers must be added in order.

In [None]:
def nn_hidden1_model(H, eta, epochs, batch_size):
    model = Sequential()  

    # hidden relu layer
    model.add(Dense(H,   # number of neurons H in this hidden layer
                activation='relu', 
                name='hidden',  # can name layers for our convenience
                input_dim=trainX.shape[1]))  # dim of the data that feeds to this layer
                
    # output softmax layer
    model.add(Dense(10, 
                activation='softmax'))                

    sgd = optimizers.SGD(lr=eta)    
    model.compile(optimizer=sgd,
              loss='categorical_crossentropy',  # data loss function at output
              metrics=['accuracy'])  # how to evaluate correctness

    # Train the model using a given batch size and some number of epochs
    model.fit(trainX, trainy_onehot, epochs=epochs, batch_size=batch_size, verbose=0)  
    
    return model

hidden1_classifier = nn_hidden1_model(100, 0.001, 10, 2000)
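
The model above does not attach a regularizer. As a sketch of what per-layer alpha values would mean, the NumPy snippet below computes an L2 penalty in which each layer's squared weights are scaled by that layer's own alpha. The helper `l2_penalty` and the tiny weight matrices are hypothetical. In Keras, the corresponding hook is the `kernel_regularizer` argument of `Dense`, for example `kernel_regularizer=regularizers.l2(alpha)`.

```python
import numpy as np

def l2_penalty(weights, alphas):
    """Sum over layers of alpha * (sum of squared weights in that layer)."""
    return sum(alpha * np.sum(W ** 2) for W, alpha in zip(weights, alphas))

# Hypothetical tiny weight matrices standing in for a hidden and an output layer.
W_hidden = np.full((4, 3), 0.5)  # 12 entries of 0.25 when squared: sum is 3.0
W_output = np.full((3, 2), 1.0)  # 6 entries of 1.0 when squared: sum is 6.0

# Same alpha for every layer.
same = l2_penalty([W_hidden, W_output], [0.01, 0.01])

# A stronger penalty on the hidden layer only.
different = l2_penalty([W_hidden, W_output], [0.1, 0.01])

print(same, different)
```

The penalty is added to the data loss during training, so a larger alpha on one layer pushes that layer's weights toward zero more strongly than the others.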

Evaluate this new model on the test data...

In [None]:
test_loss, test_accuracy = hidden1_classifier.evaluate(testX, testy_onehot)
print('Accuracy on test set:', test_accuracy)

## Two-Hidden-Layer Network with Dropout

Now we'll build a three-layer fully connected network: two hidden layers plus the output layer.

With three layers and so many neurons, even L2 regularization isn't enough to prevent overfitting. Instead, we'll use a technique called "dropout" for regularization,
which zeroes the outputs of a random subset of neurons on each gradient descent step, so the network cannot rely too heavily on any single neuron. This simple technique turns out to be an extremely effective form of regularization, so much so that we can even do away with the L2 regularizer.
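
The sketch below illustrates the mechanics of dropout on a made-up activation matrix, using the common "inverted dropout" formulation in which surviving activations are scaled up by 1 / (1 - rate) during training. The function `dropout` here is a hypothetical helper written for this example, not the Keras layer itself.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True where a unit is kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((2, 1000))  # hypothetical hidden-layer outputs, all equal to 1

h_dropped = dropout(h, rate=0.2, rng=rng)

kept_fraction = (h_dropped > 0).mean()
print("Fraction of units kept:", kept_fraction)
print("Mean activation after dropout:", h_dropped.mean())
```

Because the survivors are rescaled, the expected activation is unchanged, which is why no extra correction is needed at test time when dropout is switched off.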

In [None]:
def nn_hidden2_model(H1, H2, eta, epochs, batch_size):
    model = Sequential()  

    # hidden relu layer
    model.add(Dense(H1,   # number of neurons H1 in the first hidden layer
                activation='relu', 
                name='hidden1',  # can name layers for our convenience
                input_dim=trainX.shape[1]))  # dim of the data that feeds to this layer
    model.add(Dropout(0.2))
    
    # hidden relu layer
    model.add(Dense(H2,   # number of neurons H2 in the second hidden layer
                activation='relu', 
                name='hidden2'))  # can name layers for our convenience
    model.add(Dropout(0.2)) 
                
    # output softmax layer
    model.add(Dense(10, 
                activation='softmax'))                
    
    sgd = optimizers.SGD(lr=eta)    
    model.compile(optimizer=sgd,
              loss='categorical_crossentropy',  # data loss function at output
              metrics=['accuracy'])  # how to evaluate correctness

    # Train the model using a given batch size and some number of epochs
    model.fit(trainX, trainy_onehot, epochs=epochs, batch_size=batch_size, verbose=0)  
    
    return model

hidden2_classifier = nn_hidden2_model(100, 100, 0.001, 10, 2000)

Evaluate this new model on the test data...

In [None]:
test_loss, test_accuracy = hidden2_classifier.evaluate(testX, testy_onehot)
print('Accuracy on test set:', test_accuracy)

## Convolutional Neural Network

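Here is a convolutional network for the same task. Unlike the dense models above, convolutional layers operate on image-shaped input, so the flattened vectors are first reshaped back to 32x32x3.

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D

batch_size = 32
num_classes = 10
epochs = 200
eta = 0.001  # learning rate; assumed equal to the earlier models, since this cell never defined it

# Undo the earlier flattening so each example is a 32x32x3 image again
trainX = trainX.reshape(-1, 32, 32, 3)
testX = testX.reshape(-1, 32, 32, 3)

model = Sequential()

model.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=trainX.shape[1:]))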
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

sgd = optimizers.SGD(lr=eta)  

model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(trainX, trainy_onehot,
              batch_size=batch_size,
              epochs=epochs,
              shuffle=True)

test_loss, test_accuracy = model.evaluate(testX, testy_onehot)
print('Accuracy on test set:', test_accuracy)
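
To see how the image shrinks on its way through this network, the short sketch below traces the spatial size layer by layer. It assumes stride-1 convolutions and non-overlapping 2x2 pooling, which are the Keras defaults for the layers as configured above. The helpers `conv_out` and `pool_out` are hypothetical names used only here.

```python
def conv_out(size, kernel, padding):
    """Output width of a stride-1 convolution along one spatial axis."""
    if padding == "same":
        return size            # zero-padding keeps the size unchanged
    return size - kernel + 1   # "valid": the kernel must fit entirely inside

def pool_out(size, pool):
    """Output width of non-overlapping pooling (any remainder is discarded)."""
    return size // pool

size = 32                          # CIFAR-10 images are 32x32
size = conv_out(size, 3, "same")   # Conv2D(32, padding='same')
size = conv_out(size, 3, "valid")  # Conv2D(32)
size = pool_out(size, 2)           # MaxPooling2D(2, 2)
size = conv_out(size, 3, "same")   # Conv2D(64, padding='same')
size = conv_out(size, 3, "valid")  # Conv2D(64)
size = pool_out(size, 2)           # MaxPooling2D(2, 2)

flattened = size * size * 64       # 64 feature maps enter Flatten()
print(size, flattened)
```

Under these assumptions, the `Flatten` layer hands the 512-unit dense layer a vector of 6 x 6 x 64 = 2304 features.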