1. Run MNIST
- Add a text cell and comment on network training and test accuracy.
- Train for 20 epochs and evaluate. Comment on your findings.
- The first layer transforms the 784-element image vector into a 512-dimensional intermediate representation. Experiment with different intermediate dimensions. Make a markdown table of network performance on the test set for varying intermediate dimension. Comment on your results.
- Replace the network compilation with
```
from tensorflow.keras import optimizers
network.compile(optimizer=optimizers.RMSprop(learning_rate=0.001, momentum=0.0),
                loss='categorical_crossentropy',
                metrics=['accuracy'])
```
This is equivalent to `optimizer='rmsprop'` (0.001 and 0.0 are the defaults), but we can now adjust the learning rate and momentum. `learning_rate` replaces the `lr` keyword, which is deprecated in TensorFlow 2.x. Experiment with different learning rates. Tabulate your results and interpret.
- Experiment with different momentum values. Tabulate and interpret.
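Each experiment above asks for a markdown table. The helper below is a hypothetical convenience function (not part of Keras) that turns result rows into markdown text you can paste into a text cell. The row values must come from your own `network.evaluate` runs; none are filled in here.

```python
def markdown_table(header, rows):
    """Format a header and a list of rows as a markdown table string (hypothetical helper)."""
    # Header row, then the separator row that markdown tables require
    lines = ["| " + " | ".join(str(h) for h in header) + " |",
             "|" + "|".join(" --- " for _ in header) + "|"]
    for row in rows:
        if len(row) != len(header):
            raise ValueError("each row must have one value per column")
        lines.append("| " + " | ".join(str(v) for v in row) + " |")
    return "\n".join(lines)

# Hypothetical usage, with `results` collected from your own training runs:
# rows = [(units, f"{acc:.4f}") for units, acc in results]
# print(markdown_table(["Intermediate dimension", "Test accuracy"], rows))
```

The same helper works for the learning-rate and momentum sweeps; only the header and rows change.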

In [1]:
# MNIST

# load
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

print('tensor shape')
print('\ttraining images:', train_images.shape)
print('\ttraining labels:', train_labels.shape)
print('\ttest images:', test_images.shape)
print('\ttest labels:', test_labels.shape)


tensor shape
	training images: (60000, 28, 28)
	training labels: (60000,)
	test images: (10000, 28, 28)
	test labels: (10000,)


### Preprocess

1. Reshape to flatten each 28x28 array into a vector of 784 elements
2. Cast the vectors as floats
3. Rescale pixel values from [0, 255] to [0, 1]
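The three preprocessing steps can be checked on a small synthetic batch before touching MNIST. This NumPy sketch uses random `uint8` "images" as an assumed stand-in for the real data. It also uses `images.shape[0]` instead of a hard-coded 60000 so that the same line works for any batch size.

```python
import numpy as np

# Synthetic stand-in for MNIST: 3 "images" of 28x28 uint8 pixels with values 0-255
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# 1. Flatten each 28x28 array into a 784-element vector
flat = images.reshape((images.shape[0], 28 * 28))
# 2 and 3. Cast to float32, then rescale from [0, 255] to [0, 1]
scaled = flat.astype('float32') / 255

print(images.shape, '->', scaled.shape)          # (3, 28, 28) -> (3, 784)
print(scaled.min() >= 0.0, scaled.max() <= 1.0)  # True True
```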

In [2]:
# preprocess

# reshape flattens 28x28 array to a vector of 784 elements
train_images = train_images.reshape((60000, 28 * 28))
test_images = test_images.reshape((10000, 28 * 28))


# cast as floats and rescale from [0, 255] to [0, 1]
train_images = train_images.astype('float32') / 255
test_images = test_images.astype('float32') / 255



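### One-Hot Encoding
The network requires categorically (one-hot) encoded labels: each digit becomes a 10-element vector with a 1 at the digit's index and 0 everywhere else.

Encode with the ```to_categorical``` function.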
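The encoding itself is simple. This NumPy sketch produces the same kind of float rows as `to_categorical` for integer labels 0-9. The function name `one_hot` is an assumption for illustration and is not a Keras API.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return float32 one-hot rows: row i has a 1.0 in column labels[i] (hypothetical helper)."""
    labels = np.asarray(labels, dtype=int)
    # Indexing the identity matrix with the labels picks out one unit row per label
    return np.eye(num_classes, dtype='float32')[labels]

encoded = one_hot([7, 0])
print(encoded[0])  # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
```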

In [3]:
# encode with the to_categorical function
from tensorflow.keras.utils import to_categorical

orig_label = test_labels[0]
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

print(f"'{orig_label}' as one-hot vector:\t{test_labels[0]}")

'7' as one-hot vector:	[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]


## Building the Network

`Sequential`: the model is a linear stack of transformational layers. Data moves in a single forward direction through the network, which makes it a feedforward neural network.

Each layer has specific attributes, which we'll investigate later.
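To make "a single forward direction" concrete, here is a NumPy sketch of what the two `Dense` layers built below compute: each applies `activation(x @ W + b)`, and the output of one layer feeds the next with no loops back. The weights are random stand-ins (an assumption for illustration); Keras initializes its own weights and then learns them during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in weights with the same shapes as Dense(512) on 784 inputs and Dense(10)
W1, b1 = rng.normal(0, 0.05, size=(784, 512)), np.zeros(512)
W2, b2 = rng.normal(0, 0.05, size=(512, 10)), np.zeros(10)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtracting the row maximum avoids overflow and does not change the result
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    """Input -> hidden layer -> output layer, always in one direction."""
    h = relu(x @ W1 + b1)        # (batch, 784) -> (batch, 512)
    return softmax(h @ W2 + b2)  # (batch, 512) -> (batch, 10)

x = rng.random((2, 784))  # two fake flattened, rescaled images
probs = forward(x)
print(probs.shape)  # (2, 10)
```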

The second (```softmax```) layer outputs a 10-element vector whose elements form a **probability distribution**
- the numbers are nonnegative and sum to one
- the outputs are interpreted as probabilities of membership in each class
    - i.e. the probability that the input sample is a 0, 1, 2, etc.
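A short derivation shows why softmax outputs always qualify as a probability distribution. For the 10 raw scores $z_0, \dots, z_9$ reaching the last layer:

$$
p_i = \mathrm{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=0}^{9} e^{z_j}}, \qquad i = 0, \dots, 9
$$

Every $e^{z_i} > 0$, so each $p_i > 0$, and

$$
\sum_{i=0}^{9} p_i = \frac{\sum_{i=0}^{9} e^{z_i}}{\sum_{j=0}^{9} e^{z_j}} = 1.
$$

Because $e^{x}$ is increasing, the index of the largest $p_i$ is also the index of the largest $z_i$; that index is the predicted class.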

In [4]:
# build
from tensorflow.keras import models, layers

# create empty network
network = models.Sequential()

# add 2 layers
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28, )))
network.add(layers.Dense(10, activation='softmax'))
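Each `Dense` layer has one weight per input-output pair plus one bias per output unit, so the parameter count follows directly from the intermediate dimension. This sketch (the function names are hypothetical) is handy for the intermediate-dimension experiment; `network.summary()` should report the same totals.

```python
def dense_params(n_in, n_out):
    """Weights (n_in x n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

def network_params(hidden_units, n_inputs=28 * 28, n_classes=10):
    # Dense(hidden_units) on 784 inputs, followed by Dense(10)
    return dense_params(n_inputs, hidden_units) + dense_params(hidden_units, n_classes)

print(dense_params(784, 512))  # 401920
print(dense_params(512, 10))   # 5130
print(network_params(512))     # 407050

# Parameter counts for a few other intermediate dimensions
for units in (32, 128, 1024):
    print(units, network_params(units))
```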




## Set Up for Training
We must specify:
1. **A loss function**: quantifies how far off the network's prediction is from the target
    - ```categorical_crossentropy```: the preferred loss for single-label, multiclass problems
2. **An optimizer**: makes parameter adjustments in the training loop
    - ```rmsprop```
3. One or more **training metrics**: report on progress
    - ```accuracy```: fraction of correctly classified samples
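For one sample, categorical cross-entropy is $-\sum_k y_k \log p_k$. Because the label $y$ is one-hot, only the true-class term survives, so the loss is $-\log$ of the probability that the network assigns to the correct digit. This pure-Python sketch is for illustration only; the small clipping constant `eps` is an assumption to keep `log` finite, and Keras's own implementation may differ in detail.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for one sample: -sum_k y_true[k] * log(y_pred[k])."""
    # Clip predictions into [eps, 1 - eps] so log() never sees 0
    return -sum(t * math.log(min(max(p, eps), 1 - eps)) for t, p in zip(y_true, y_pred))

y_true = [0.0] * 10
y_true[7] = 1.0                  # one-hot label for the digit 7

confident = [0.01] * 10
confident[7] = 0.91              # most of the probability on the right class
uniform = [0.1] * 10             # no idea which digit it is

print(round(categorical_crossentropy(y_true, confident), 4))  # 0.0943
print(round(categorical_crossentropy(y_true, uniform), 4))    # 2.3026
```

A confident, correct prediction gives a small loss, while a uniform guess gives $\ln 10 \approx 2.30$.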



In [5]:
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])


## Train Network
The optimizer (RMSprop) tweaks the layer parameters: the weights and biases.  
Like sliders, these parameters are adjusted step by step in an attempt to lower the loss.  
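The sketch below shows how one "slider" is moved. It minimizes a one-parameter function with an RMSprop-style update: gradients are divided by a running root-mean-square of recent gradients, and an optional momentum buffer accumulates the steps. This follows the general form of RMSprop with momentum and is not Keras's source code; in particular, the placement of `epsilon` is an assumption and may differ between versions. The parameters `learning_rate`, `rho`, `momentum`, and `epsilon` mirror the names of the Keras `RMSprop` arguments.

```python
import math

def rmsprop_minimize(grad_fn, w, learning_rate=0.001, rho=0.9, momentum=0.0,
                     epsilon=1e-7, steps=1000):
    """Minimize a 1-D function with an RMSprop-style update (illustrative sketch)."""
    v = 0.0  # running average of squared gradients
    m = 0.0  # momentum buffer, only used when momentum > 0
    for _ in range(steps):
        g = grad_fn(w)
        v = rho * v + (1 - rho) * g * g
        step = learning_rate * g / math.sqrt(v + epsilon)
        if momentum > 0:
            m = momentum * m + step  # accumulate past steps
            w -= m
        else:
            w -= step
    return w

# Minimize f(w) = w**2 (gradient 2w), starting from w = 5; the minimum is at w = 0
print(rmsprop_minimize(lambda w: 2 * w, 5.0, learning_rate=0.01, steps=2000))
```

Trying different `learning_rate` and `momentum` values here gives a quick feel for the experiments in the assignment: larger learning rates move faster but jitter more around the minimum.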

We have to decide on:
1. **Mini-Batch Size**: Number of samples processed before each parameter update
2. **Number of Epochs**: Number of complete passes through entire training set

Suppose a training set has 32,768 samples. How many samples are processed during training by the command:  
```network.fit(train_images, train_labels, epochs=5, batch_size=128)```?  
*32,768 × 5 = 163,840*

Suppose a training set has 32,768 samples. How many mini-batches are passed through the network by the command:   
```network.fit(train_images, train_labels, epochs=5, batch_size=128)```?  
*32,768 / 128 = 256 mini-batches per epoch, so 256 × 5 = 1,280*



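For MNIST, 60,000 / 128 = 468.75, so each epoch has 469 mini-batches (the last one holds the remaining 96 samples). Five epochs therefore give 469 × 5 = 2,345 gradient updates.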
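The arithmetic above can be written as a small function. The names `training_counts` and `minibatch_slices` are hypothetical. The slices are produced in order for clarity, whereas Keras's `fit` shuffles the training data before each epoch by default.

```python
import math

def training_counts(n_samples, batch_size, epochs):
    """Return (samples processed, mini-batches i.e. gradient updates) over all epochs."""
    # The last mini-batch of an epoch is smaller when batch_size does not divide n_samples
    batches_per_epoch = math.ceil(n_samples / batch_size)
    return n_samples * epochs, batches_per_epoch * epochs

def minibatch_slices(n_samples, batch_size):
    """Yield (start, stop) index pairs that cover one epoch in order."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

print(training_counts(32768, 128, 5))          # (163840, 1280)
print(training_counts(60000, 128, 5))          # (300000, 2345)
print(list(minibatch_slices(60000, 128))[-1])  # (59904, 60000)
```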

In [6]:

# train
network.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x295703b65c0>

## Test Network
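Neural networks generally perform somewhat worse on new data than on the data they were trained on; a large gap between training and test accuracy signals overfitting.  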

In [7]:
# evaluate on the test set
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test accuracy:', test_acc)



In [8]:
# network prediction for a single sample
# ten numbers, each a probability of class membership
network.predict(test_images[:1])



array([[3.0968488e-09, 5.2965782e-10, 7.4819457e-07, 8.3084633e-06,
        3.9866253e-12, 3.3456447e-09, 9.7639360e-15, 9.9998999e-01,
        1.7995653e-07, 7.7392184e-07]], dtype=float32)

### What is the most probable class?

In [9]:
import numpy as np
np.argmax(network.predict(test_images[:1]))



7

In [10]:
print(test_labels[0])

[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
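Applying the same `argmax` comparison to every sample gives the accuracy metric: the fraction of samples whose most probable class matches the position of the 1 in the one-hot label. The inputs below are made-up toy values, not MNIST results. On the real data, `accuracy(network.predict(test_images), test_labels)` should agree with `test_acc` from `network.evaluate`.

```python
import numpy as np

def accuracy(probs, one_hot_labels):
    """Fraction of samples whose most probable class matches the true class."""
    predicted = np.argmax(probs, axis=1)        # most probable class per row
    actual = np.argmax(one_hot_labels, axis=1)  # index of the 1 in each one-hot row
    return float(np.mean(predicted == actual))

# Toy example with 3 samples and 3 classes; the second prediction is wrong
probs = np.array([[0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.1, 0.7, 0.2]])
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 0]])
print(accuracy(probs, labels))  # 0.6666666666666666
```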


In [None]:
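import matplotlib.pyplot as plt
# test_images was flattened to shape (10000, 784), so reshape a sample back to 28x28 before plotting
plt.imshow(test_images[0].reshape(28, 28), cmap=plt.cm.binary)
plt.show()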