# [LEGALST-190] Lab 4/3: Neural Nets

In this lab, we'll learn how to implement neural networks that recognize handwritten digits in images.


*Estimated Time: 30-40 minutes*

---
### Table of Contents

[The Data](#data)<br>

1 - [Visualizing Data](#section 1)<br>

2 - [Neural Network](#section 2)<br>

3 - [Multi-Layer Perceptrons](#section 3)<br>

4 - [A Simple MLP](#section 4)<br>

5 - [Convolutional Neural Networks](#section 5)<br>

**Dependencies:**

In [1]:
!pip install tensorflow keras

---

## The Data <a id='data'></a>

In this notebook, you'll be working with the MNIST handwriting dataset, often considered the "hello, world" of object recognition in machine learning. It contains images of handwritten digits that have been centered and normalized. MNIST (Modified NIST) is constructed from scanned documents made available by the National Institute of Standards and Technology (NIST).

---

## Visualizing Data  <a id='section 1'></a>

The Keras deep learning library provides a convenience method for loading the MNIST dataset.

In [None]:
from keras.datasets import mnist
import matplotlib.pyplot as plt

# load the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

As you can see above, the dataset is split into one set to train our model and one to test it. `X_train` and `X_test` are the inputs (the images), while `y_train` and `y_test` are the outputs (the digit labels).

Let's visualize the MNIST dataset by running the cell below.

In [None]:
# plot 4 images as gray scale
plt.subplot(221)
plt.imshow(X_train[0], cmap=plt.get_cmap('gray'))
plt.subplot(222)
plt.imshow(X_train[1], cmap=plt.get_cmap('gray'))
plt.subplot(223)
plt.imshow(X_train[2], cmap=plt.get_cmap('gray'))
plt.subplot(224)
plt.imshow(X_train[3], cmap=plt.get_cmap('gray'))
# show the plot
plt.show()

---

## Neural Network <a id='section 2'></a>

Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns.

Neural networks help us cluster and classify. Clustering groups unlabeled data according to similarities among the example inputs.

Classification problems depend on labeled datasets, i.e. humans need to label the data for a neural network to learn the correlation between labels and data. This is <i>supervised learning</i>. Below, we'll dive into the MLP, which is trained with a supervised learning technique.

## Multi-Layer Perceptrons <a id='section 3'></a>

A multi-layer perceptron (MLP) is a class of neural network that consists of at least three layers of nodes (the first layer being the inputs and the last layer being the outputs). Except for the input layer, the activation of the nodes in a layer depends on which nodes are activated in the previous layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. Below is an image that represents an MLP.

<img src="https://www.safaribooksonline.com/library/view/getting-started-with/9781786468574/graphics/B05474_04_05.jpg" style="width: 500px;"/>

### MLP

A multilayer perceptron (MLP) is a deep, artificial neural network. It is composed of an input layer that receives the signal, an output layer that makes a decision or prediction about the input, and, in between those two, an arbitrary number of hidden layers that are the true computational engine of the MLP.
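To make the layer structure concrete, here is a minimal NumPy sketch of a signal passing from an input layer through one hidden layer to an output layer. The layer sizes and the random weights are assumptions chosen for illustration; this is not the Keras model built later in the lab.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(4)                # input layer: 4 values (hypothetical size)
W1 = rng.normal(size=(4, 3))     # weights from the input layer to a 3-node hidden layer
W2 = rng.normal(size=(3, 2))     # weights from the hidden layer to a 2-node output layer

hidden = np.maximum(0, x @ W1)   # each hidden node: weighted sum of inputs, then ReLU
output = hidden @ W2             # each output node: weighted sum of hidden activations

print(hidden.shape, output.shape)  # (3,) (2,)
```

Each layer only sees the activations of the layer before it, which is why the hidden layers do most of the computational work.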

MLPs train on a set of input-output pairs and learn to model the correlation between those inputs and outputs by adjusting parameters to minimize error.
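As a rough illustration of "adjusting parameters to minimize error", the sketch below fits a single weight to toy input-output pairs with plain gradient descent. The data, the learning rate, and the use of mean squared error are assumptions made for this example; a real MLP adjusts many weights at once and, in this lab, uses a different loss.

```python
import numpy as np

# Toy input-output pairs that follow y = 2x (made-up data for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

w = 0.0               # the single parameter we adjust
learning_rate = 0.01
for step in range(200):
    pred = w * x
    error = np.mean((pred - y) ** 2)      # mean squared error for the current w
    grad = np.mean(2 * (pred - y) * x)    # derivative of the error with respect to w
    w -= learning_rate * grad             # step in the direction that lowers the error

print(round(w, 3))    # close to 2.0, the slope that generated the data
```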

## A Simple MLP <a id='section 4'></a>

Let's try to build a simple MLP to identify digits.

### Input vs. Output

Each pixel of an image will be an input to our MLP.

<b>Question:</b> How many nodes would our MLP's first layer have for a 20x20 image?

<i>Your answer here</i>  <b>Answer: 400</b>

The output layer will produce what we are looking for.

<b>Question:</b> What will each node in the output layer represent?

<i>Your answer here</i> <b>Answer: Each node represents a digit from 0-9</b>

### Fixing MNIST

The training dataset is a 3-D array with dimensions (instance, image height, image width). To make an image usable as input to our MLP, we need to vectorize the MNIST images, i.e. we need to flatten each image into a vector.

In [None]:
num_pixels = X_train.shape[1] * X_train.shape[2]
num_pixels

In [None]:
# NumPy's reshape function can help flatten multi-dimensional arrays
# As all images are 28x28 pixels, we flatten all into vectors of length 784
X_train = X_train.reshape(X_train.shape[0], num_pixels)
X_test = X_test.reshape(X_test.shape[0], num_pixels)
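To see what this reshape does without loading MNIST, here is a small sketch on a made-up array. The two 3x3 "images" are stand-ins for the 28x28 digits, and the names `images` and `flat` are hypothetical.

```python
import numpy as np

# Two made-up 3x3 "images" standing in for the 28x28 MNIST digits
images = np.arange(18).reshape(2, 3, 3)
flat = images.reshape(images.shape[0], images.shape[1] * images.shape[2])

print(images.shape)  # (2, 3, 3)
print(flat.shape)    # (2, 9)
print(flat[0])       # [0 1 2 3 4 5 6 7 8]
```

Each row of `flat` is one image with its rows laid end to end, so no pixel values are lost or changed.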

The pixel values are gray scale and range from 0 to 255. Neural network models generally work better with normalized inputs, so we scale the training and test datasets to the range 0-1 below.

In [None]:
X_train = X_train / 255
X_test = X_test / 255
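A quick check on a few sample values shows the effect of this division. The array below is made up for illustration; it assumes the pixels are stored as 8-bit unsigned integers.

```python
import numpy as np

pixels = np.array([0, 51, 255], dtype=np.uint8)   # sample gray-scale values
scaled = pixels / 255                             # true division returns floats

print(scaled.min(), scaled.max())  # 0.0 1.0
```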

The outputs (y_train and y_test) are integers from 0 to 9, and we can think of each integer as a class. We'll use a one-hot encoding of the class values, which transforms the vector of class integers into a binary matrix with one column per class.

<i>One-hot encoding is a process by which categorical variables are converted into a form that could be provided to ML algorithms to do a better job in prediction.</i>

In [None]:
# use built-in function in np_utils
from keras.utils import np_utils

y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
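For intuition, one-hot encoding can be sketched in plain NumPy as below. This is an illustrative equivalent on made-up labels, not the Keras implementation, and the names `labels`, `n_classes`, and `one_hot` are hypothetical.

```python
import numpy as np

labels = np.array([0, 2, 1, 2])       # made-up class labels
n_classes = 3
one_hot = np.eye(n_classes)[labels]   # row i of the identity matrix encodes class i

print(one_hot)
```

Each row contains a single 1 in the column of its class and 0 everywhere else.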

Let's make sure that we still have 10 classes (0-9) after one-hot encoding.

In [None]:
num_classes = y_test.shape[1]
num_classes

### Create the Model

Keras has useful methods we can utilize to develop an MLP.

In [None]:
# importing the methods we need
from keras.models import Sequential
from keras.layers import Dense

Create an empty MLP, i.e. an empty linear stack of layers.

In [None]:
model = Sequential()

Create the first layer using `Dense` (refer to <a href="https://keras.io/layers/core/">this doc</a>), then add the layer to our model using `model.add(...)`. The layer should use the 'relu' activation, a 'normal' kernel_initializer, and the number of pixels for input_dim, which tells Keras how many inputs to expect.

In [None]:
"""
Dense(...) creates a regular densely connected layer. 
relu is a rectifier activation function.
"""
model.add(
    Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu')
)

Add in the output layer to our model.

In [None]:
"""
A softmax activation function is used on the output layer to turn the outputs into 
probability-like values
"""
model.add(
    Dense(num_classes, kernel_initializer='normal', activation='softmax')
)
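The softmax function mentioned in the cell above can be sketched in a few lines of NumPy. The scores below are made up, and the helper name `softmax` is our own; Keras applies its own implementation inside the layer.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into positive values that sum to 1."""
    shifted = scores - np.max(scores)   # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])      # made-up raw outputs for three classes
probs = softmax(scores)
print(probs.round(3))
print(probs.argmax())                   # index of the most likely class
```

The largest raw score always receives the largest probability, so the predicted digit is simply the index of the maximum.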

Compile our model.

In [None]:
"""
categorical_crossentropy is a logarithmic loss function and
adam is an adaptive variant of stochastic gradient descent
"""
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
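As a simplified sketch of what a categorical cross-entropy loss measures, the function below averages the negative log-probability assigned to the correct class. The labels and predictions are made up, and the details (such as the clipping constant `eps`) are assumptions that may differ from Keras' own implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Mean negative log-probability assigned to the correct class."""
    y_pred = np.clip(y_pred, eps, 1.0)                 # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[0, 1, 0], [1, 0, 0]])              # one-hot labels (made up)
confident = np.array([[0.05, 0.9, 0.05], [0.8, 0.1, 0.1]])
unsure = np.array([[0.3, 0.4, 0.3], [0.4, 0.3, 0.3]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

Both sets of predictions pick the right class, but the confident ones earn a lower loss, which is what pushes the model toward sharper, correct probabilities.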

Let's now fit our model on the training dataset. We fit our model over 10 epochs and update its weights after every batch of 200 images. It might take a few minutes.

In [None]:
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)
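To get a feel for these settings, the arithmetic below counts how many weight updates the call above performs. It assumes the standard MNIST training split of 60,000 images.

```python
import math

n_train = 60000      # size of the standard MNIST training set (assumed here)
batch_size = 200
epochs = 10

updates_per_epoch = math.ceil(n_train / batch_size)
total_updates = updates_per_epoch * epochs
print(updates_per_epoch, total_updates)  # 300 3000
```

A smaller batch size means more updates per epoch but noisier ones, which is one of the trade-offs to keep in mind when reading the training log.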

Looking at the training log produced above, what can you conclude?

<i>Your Answer Here</i>

Let's evaluate our model on the test dataset.

In [None]:
scores = model.evaluate(X_test, y_test, verbose=0)
print("Error: %.2f%%" % (100-scores[1]*100))

## Convolutional Neural Networks <a id='section 5'></a>

MLPs did give us pretty good accuracy. However, as each pixel of an image is a separate input, a large image produces too many inputs. Convolutional neural nets, on the other hand, can take in 3-D inputs (2-D image + color channels) instead of just 1-D vectors.

### How a CNN works <a id='conv_image'></a>

<img src="https://www.mathworks.com/content/mathworks/www/en/discovery/convolutional-neural-network/jcr:content/mainParsys/image_copy.adapt.full.high.jpg/1517522275430.jpg" style="width: 600px;"/>

#### Filtering

ConvNets have "features" and match them with parts of an image rather than the whole thing. 

<img src="./imgs/match_pieces.png" style="width: 500px;"/>

In the example above, we have three features: a left-to-right diagonal, a central X, and a right-to-left diagonal. To match features, we use a process called filtering.

1. Assign each black pixel -1 and each white pixel 1. 
2. Line up a feature with an image patch (for example, line up left-to-right diagonal with the green box above)
3. Multiply each feature pixel with the corresponding image patch pixel
4. Take the average of the products
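The four steps above can be sketched in NumPy as follows. The 3x3 feature size and both image patches are assumptions made for illustration, and `filter_score` is a hypothetical helper name.

```python
import numpy as np

# Left-to-right diagonal feature: white (1) on the diagonal, black (-1) elsewhere
feature = np.array([[ 1, -1, -1],
                    [-1,  1, -1],
                    [-1, -1,  1]])

matching_patch = feature.copy()            # an image patch identical to the feature
partial_patch = np.array([[ 1, -1, -1],
                          [-1,  1, -1],
                          [-1, -1, -1]])   # one pixel differs from the feature

def filter_score(feature, patch):
    """Multiply pixel by pixel, then average the products."""
    return np.mean(feature * patch)

print(filter_score(feature, matching_patch))  # 1.0
print(filter_score(feature, partial_patch))   # lower, because one product is -1
```

A perfect match scores exactly 1, and every mismatched pixel pulls the average down.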

<b>Question:</b> In the example above when we match the left-to-right diagonal feature with the green box, what would the process output?

<i>Your Answer Here</i> <b>Answer: 1</b>

#### Convolutional Layer
 
A convolutional layer repeats the filtering process for every possible image patch with every feature.

<img src="./imgs/convolution.png" style="width: 500px;"/>

As you can see in the image above, after applying our left-to-right diagonal filter, we get higher scores along the left-to-right diagonal.
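A rough sketch of this sliding process for a single feature is shown below. The 5x5 image is made up, the helper name `convolve2d_valid` is hypothetical, and the sketch only visits patches that fit entirely inside the image (no padding), which may differ from the figure.

```python
import numpy as np

def convolve2d_valid(image, feature):
    """Filter every patch of `image` that fully contains `feature`."""
    fh, fw = feature.shape
    out_h = image.shape[0] - fh + 1
    out_w = image.shape[1] - fw + 1
    scores = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + fh, j:j + fw]
            scores[i, j] = np.mean(patch * feature)
    return scores

# A made-up 5x5 image with a white (1) left-to-right diagonal on black (-1)
image = -np.ones((5, 5))
np.fill_diagonal(image, 1)

# The 3x3 left-to-right diagonal feature
feature = -np.ones((3, 3))
np.fill_diagonal(feature, 1)

scores = convolve2d_valid(image, feature)
print(scores.shape)      # (3, 3)
print(np.diag(scores))   # [1. 1. 1.]
```

The scores peak wherever the patch lines up with the diagonal, so the output acts as a map of where the feature occurs.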

<b>Question:</b> If we apply the central X filter in this example, where will the highest score occur?

<i>Your Answer Here</i> <b>Answer: The very center of the entire image</b>

#### ReLU (Rectified Linear Units)

This process is applied to the filtered images. It simply changes every negative value to 0 and leaves positive values unchanged.
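In NumPy this is a one-liner. The filter scores below are made up for illustration.

```python
import numpy as np

filtered = np.array([[ 0.77, -0.11],
                     [-0.33,  1.00]])   # made-up filter scores
rectified = np.maximum(filtered, 0)     # negatives become 0, positives stay
print(rectified)
```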

#### Pooling Layer

In this layer, we shrink the filtered images by the following process:
1. Pick a window size and a stride size
2. Walk our window on the filtered image, each time shifting by the stride size
3. For each step, take the maximum score contained in the window

<img src="./imgs/pooling.png" style="width: 500px;"/>

This layer helps because it does not care about where in the window the maximum value occurs, i.e. it's less sensitive to specific positioning of pixels.
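The pooling steps can be sketched as below. The input values are made up, `max_pool` is a hypothetical helper, and this version simply skips windows that would run past the edge, which may differ from how the figure or Keras treats borders.

```python
import numpy as np

def max_pool(image, window=2, stride=2):
    """Keep the largest value in each window as it steps across the image."""
    out_h = (image.shape[0] - window) // stride + 1
    out_w = (image.shape[1] - window) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            pooled[i, j] = image[r:r + window, c:c + window].max()
    return pooled

filtered = np.array([[0.1, 0.9, 0.2, 0.0],
                     [0.3, 0.4, 0.8, 0.1],
                     [0.0, 0.2, 0.5, 0.6],
                     [0.7, 0.1, 0.3, 0.2]])
pooled = max_pool(filtered)
print(pooled)   # a 2x2 array: [[0.9, 0.8], [0.7, 0.6]]
```

The 4x4 input shrinks to 2x2, and moving a maximum to a different position inside its window would leave the output unchanged.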

<b>Question:</b> How would the pooling layer help in classifying digits? 

<i>Your Answer Here</i> <b>Answer: Handwritten digits follow a general pattern but where each pixel lands can be nondeterministic. Matching parts of an image can tell us which general pattern occurs without caring too much about where it occurs.</b>

#### Fully Connected Layer

This layer flattens the pooled images and each value gets a vote, which is how strongly that value suggests a certain outcome.
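The voting idea can be sketched as a weighted sum, as below. The pooled values, the two outcomes, and the voting weights are all made up for illustration; in a trained network the weights are learned.

```python
import numpy as np

pooled = np.array([[0.9, 0.1],
                   [0.2, 0.8]])         # made-up pooled image
values = pooled.flatten()               # 4 values, one per pooled pixel

# Hypothetical voting weights: one row per outcome, one column per value
weights = np.array([[ 1.0, -1.0, -1.0,  1.0],    # first outcome
                    [-1.0,  1.0,  1.0, -1.0]])   # second outcome

votes = weights @ values
print(votes)             # about 1.4 for the first outcome and -1.4 for the second
print(votes.argmax())    # 0, the index of the winning outcome
```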

One question you might have now is where the initial features and voting weights come from. They are obtained by a process called <a href="https://brilliant.org/wiki/backpropagation/">backpropagation</a>, which we don't have enough time to explain here.

Now we've learned that the layers are combined together: <b>convolutional + ReLU + pooling</b> constitute the first part while <b>fully connected</b> the second. One note is that each part can be applied multiple times, as we can see in [the overview of a CNN at the beginning](#conv_image).

### Create a CNN

Keras also provides useful methods to create a CNN.

In [None]:
# importing dependencies
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.utils import np_utils
from keras import backend as K
# use channels-first ordering, i.e. the channel dimension comes before the image dimensions
K.set_image_data_format('channels_first')

In Keras, the layers used for two-dimensional convolutions expect inputs with the dimensions [pixels][width][height] under the channels-first ordering set above, where the first dimension is the number of color channels. We'll set the pixels dimension to 1 because the pixel values in MNIST are gray scale. As a reminder, all images in MNIST are 28x28.

In [None]:
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# reshape to [samples][pixels][width][height]
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)

Below, fill in the blanks to normalize the datasets and one-hot encode the output values.

In [None]:
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]

Let's define a function that creates our CNN. Read through the comments to understand what each line does.

In [None]:
def CNN_model():
    # create model
    model = Sequential()
    
    # A convolutional layer that has 32 features of size 5x5
    model.add(Conv2D(32, (5, 5), input_shape=(1, 28, 28), activation='relu'))
    
    # A pooling layer with a window size of 2x2
    model.add(MaxPooling2D(pool_size=(2, 2)))
    
    # A dropout layer that randomly excludes 20% of neurons in the layer 
    model.add(Dropout(0.2))
    
    # A flatten layer
    model.add(Flatten())
    
    # A fully connected layer with 128 neurons
    model.add(Dense(128, activation='relu'))
    
    # An output layer with softmax as in MLP
    model.add(Dense(num_classes, activation='softmax'))
    
    # Compile model as before in MLP
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
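To follow how the data changes shape as it moves through this model, the arithmetic below traces the layer sizes and parameter counts by hand. It assumes the Keras defaults of no padding and a stride of 1 for the convolution, a pooling stride equal to the window size, and 10 output classes; calling `model.summary()` on the built model is a way to check these numbers.

```python
# Shape arithmetic for the layers above (assumes no padding, stride-1 convolution)
in_size, n_filters, kernel, pool = 28, 32, 5, 2

conv_size = in_size - kernel + 1          # 24: positions where a 5x5 filter fits
pooled_size = conv_size // pool           # 12: each 2x2 window becomes one value
flat_len = n_filters * pooled_size ** 2   # 4608 values enter the dense layer

conv_params = n_filters * (kernel * kernel * 1 + 1)   # weights plus one bias per filter
dense_params = flat_len * 128 + 128                   # fully connected layer
out_params = 128 * 10 + 10                            # output layer for 10 classes

print(conv_size, pooled_size, flat_len)               # 24 12 4608
print(conv_params + dense_params + out_params)        # 592074
```

Almost all of the parameters sit in the fully connected layer, so changing the number of filters or the pooling window has a large effect on model size.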

In [None]:
model = CNN_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)

# Evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("CNN Error: %.2f%%" % (100-scores[1]*100))

Now let's experiment with the parameters: number of features, size of features, size of the window in pooling, dropout percentage, and so on. Modify the parameters to your best judgment in the function below.

In [None]:
def diff_CNN_model():
    # create model
    model = Sequential()
    
    # A convolutional layer that has 32 features of size 5x5
    model.add(Conv2D(32, (5, 5), input_shape=(1, 28, 28), activation='relu'))
    
    # A pooling layer with a window size of 2x2
    model.add(MaxPooling2D(pool_size=(2, 2)))
    
    # A dropout layer that randomly excludes 20% of neurons in the layer 
    model.add(Dropout(0.2))
    
    # A flatten layer
    model.add(Flatten())
    
    # A fully connected layer with 128 neurons
    model.add(Dense(128, activation='relu'))
    
    # An output layer with softmax as in MLP
    model.add(Dense(num_classes, activation='softmax'))
    
    # Compile model as before in MLP
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

In [None]:
model = diff_CNN_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)

# Evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("CNN Error: %.2f%%" % (100-scores[1]*100))

Do you get different accuracies for the two models? Draw some conclusions about the effect of the different parameters.

<i>Your Answer Here</i>

### Bibliography

Image source:
- How Convolutional Neural Networks work, https://www.youtube.com/watch?v=FmpDIaiMIeA&t=1070s

Code source:
- Handwritten Digit Recognition using Convolutional Neural Networks in Python with Keras, https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/

---
Notebook developed by: Tian Qin

Data Science Modules: http://data.berkeley.edu/education/modules
