## Our first Neural Network

In [1]:
import numpy as np
import warnings; warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
from tensorflow.keras import models, layers, utils, optimizers, losses, metrics

### Data

We start with a familiar task: classifying grayscale images of handwritten digits into 10 classes (0 to 9). We will use the MNIST dataset, which includes 60,000 training images and 10,000 test images, each of size 28x28 pixels.

**Loading the data**

In [None]:
from tensorflow.keras.datasets import mnist
(Xtrain, Ytrain), (Xtest, Ytest) = mnist.load_data()

Let's take a look at the data.

In [None]:
# shape
print('train data size is ', Xtrain.shape)
print('test  data size is ', Xtest.shape)

# min/max values (printed in that order)
print('min, max values in the train data are ', str(Xtrain.min()) + ', ' + str(Xtrain.max()))

Label values

In [None]:
print('The labels in the data are ' + str(np.unique(Ytrain)) + '\n')

Label distribution

In [None]:
for u in np.unique(Ytrain):
    print('label', u, 'appears', np.sum(Ytrain == u), 'times in the training set, and', np.sum(Ytest == u), 'times in the test set')

Plotting some samples of the data

In [None]:
fig, axes = plt.subplots(3, 8, figsize=(18, 7), subplot_kw={'xticks':[], 'yticks':[]}, gridspec_kw=dict(hspace=0.3, wspace=0.1))
for n, ax in enumerate(axes.flat):
    ax.imshow(Xtrain[n], cmap='gist_yarg')

**Data preparation**

We reshape the data, and rescale it so that all its values are in the `[0, 1]` interval.

In [None]:
Xtrain = Xtrain.reshape((60000, 784))/255.0
Xtest = Xtest.reshape((10000, 784))/255.0
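
The reshape-and-rescale step can be checked on a small synthetic array before applying it to the full dataset. The sketch below assumes a hypothetical stand-in array `fake_images` (not part of the notebook) with the same layout as MNIST, and uses `-1` so that NumPy infers the number of samples instead of hard-coding it.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 4 "images" of 28x28 uint8 pixels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Flatten each image into a 784-vector; -1 lets NumPy infer the sample count.
# Dividing by 255.0 converts to float and maps pixel values into [0, 1].
flat = fake_images.reshape((-1, 28 * 28)) / 255.0

print(flat.shape)
print(flat.min(), flat.max())
```

Writing the shape as `(-1, 28 * 28)` gives the same result as `(60000, 784)` on the training set, but it keeps working if the number of samples changes.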

We one-hot encode the labels, so that each label becomes a 10-dimensional vector with a single 1 at the index of its class (we'll discuss this encoding later).

In [None]:
def OneHot(labels, dim=10):
    # one row per label, one column per class
    OneHotLabels = np.zeros((len(labels), dim), dtype=int)
    for n, label in enumerate(labels):
        # set the entry of the true class to 1
        OneHotLabels[n, label] = 1
    return OneHotLabels

In [None]:
Ytrain_1hot = OneHot(Ytrain)
Ytest_1hot  = OneHot(Ytest)
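
As an aside, the same encoding can be written without an explicit loop. The following is a minimal sketch, assuming integer labels in the range `0` to `dim - 1`; the helper name `one_hot_eye` is hypothetical and not part of the notebook.

```python
import numpy as np

def one_hot_eye(labels, dim=10):
    # Row k of the identity matrix is the one-hot vector for class k,
    # so indexing the identity by the label array builds all rows at once.
    return np.eye(dim, dtype=int)[np.asarray(labels)]

example_labels = np.array([3, 0, 9])
encoded = one_hot_eye(example_labels)
print(encoded)
```

Keras also provides `utils.to_categorical` for this purpose; the loop version above is kept in the notebook because it makes the encoding explicit.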

### Training a network

**Network architecture**

Defining our neural network

In [None]:
# select a model 
network = models.Sequential()

# add a single (hidden) fully connected layer
Layer1Activation = 'relu'
network.add(layers.Dense(16, activation=Layer1Activation, input_shape=(28 * 28,)))

# add an output layer
Layer2Activation = 'softmax'
network.add(layers.Dense(10, activation=Layer2Activation))
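
To make the two layers concrete, here is a rough NumPy sketch of the forward pass that such a network computes: a ReLU hidden layer with 16 units followed by a softmax output layer with 10 units. The weights here are random placeholders (an assumption for illustration), not the values Keras would initialize or learn.

```python
import numpy as np

def relu(z):
    # Element-wise max(z, 0).
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.random((2, 784))                      # two fake flattened images

# Placeholder weights and biases with the shapes used by the two Dense layers.
W1 = rng.normal(scale=0.05, size=(784, 16))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.05, size=(16, 10))
b2 = np.zeros(10)

hidden = relu(x @ W1 + b1)                    # shape (2, 16)
probs = softmax(hidden @ W2 + b2)             # shape (2, 10)

print(probs.shape)
print(probs.sum(axis=1))
```

Each row of `probs` is non-negative and sums to 1, which is why the softmax output can be read as a probability distribution over the 10 digit classes.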

In [None]:
network.summary()
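
The parameter counts reported by the summary can be reproduced by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The helper `dense_param_count` below is a hypothetical name used only for this sketch.

```python
def dense_param_count(n_inputs, n_units):
    # One weight for every (input, unit) pair, plus one bias per unit.
    return n_inputs * n_units + n_units

hidden_params = dense_param_count(28 * 28, 16)   # 784 * 16 + 16
output_params = dense_param_count(16, 10)        # 16 * 10 + 10
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)
```

For this architecture the sketch gives 12,560 parameters in the hidden layer and 170 in the output layer, 12,730 in total.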

**Network training**

In [None]:
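# select an optimizer (`learning_rate` replaces the deprecated `lr` argument;
# `decay` is omitted because 0 was the default, i.e. no decay)
rms = optimizers.RMSprop(learning_rate=0.0005, rho=0.9)

# select a loss
loss = losses.categorical_crossentropy

# select metrics to monitor during training
# (named so that it does not shadow the imported `metrics` module)
monitored_metrics = ['accuracy']

# compile the network
network.compile(optimizer=rms, loss=loss, metrics=monitored_metrics)

# train the network
network.fit(Xtrain, Ytrain_1hot, epochs=10, batch_size=128);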
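
For intuition about the loss selected above, the following NumPy sketch computes categorical cross-entropy on a tiny made-up batch. It illustrates the quantity being minimized and is not Keras's own implementation; the function name `categorical_crossentropy_np` and the `eps` clipping value are assumptions of this sketch.

```python
import numpy as np

def categorical_crossentropy_np(y_true, y_pred, eps=1e-7):
    # Clip predictions away from 0 and 1 so that log() stays finite.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # With one-hot targets, only the log-probability of the true class survives.
    return -np.sum(y_true * np.log(y_pred), axis=1)

# Made-up one-hot targets and predicted probabilities for 2 samples, 3 classes.
y_true = np.array([[0, 1, 0],
                   [1, 0, 0]])
y_pred = np.array([[0.10, 0.80, 0.10],
                   [0.50, 0.25, 0.25]])

per_sample = categorical_crossentropy_np(y_true, y_pred)
mean_loss = per_sample.mean()

print(per_sample)
print(mean_loss)
```

The first sample, where the true class gets probability 0.8, has a smaller loss than the second, where it gets only 0.5: the loss shrinks as the network puts more probability on the correct class.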

**Performance evaluation**

In [None]:
_, accuracy = network.evaluate(Xtrain, Ytrain_1hot, verbose=0)
print('train accuracy =', accuracy)

In [None]:
_, accuracy  = network.evaluate(Xtest, Ytest_1hot, verbose=0)
print('test accuracy =', accuracy)
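
The accuracy metric itself is simple: the predicted class is the index of the largest output probability, and accuracy is the fraction of samples where it matches the true class. A minimal sketch on made-up probabilities (the helper name `accuracy_from_probs` is hypothetical):

```python
import numpy as np

def accuracy_from_probs(probs, one_hot_labels):
    # Predicted class = index of the highest probability in each row.
    predicted = np.argmax(probs, axis=1)
    # True class = position of the 1 in each one-hot row.
    actual = np.argmax(one_hot_labels, axis=1)
    return np.mean(predicted == actual)

# Made-up outputs for 4 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.4, 0.5, 0.1],
                  [0.2, 0.2, 0.6]])
labels = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [1, 0, 0],
                   [0, 0, 1]])

acc = accuracy_from_probs(probs, labels)
print(acc)
```

Here three of the four samples are classified correctly, so the accuracy is 0.75. For the trained network, the same function could presumably be applied to `network.predict(Xtest)` and `Ytest_1hot`.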

#### Exercise

1. Repeat the above with 64, 128, and 512 neurons in the hidden layer. Discuss the effect of this change.
2. Train the above network with 128 units in the hidden layer for 5 epochs and for 20 epochs. Discuss the effect of this change.

#### Solution