# The MNIST Dataset

Remember that you can connect this notebook to your Google Drive using this piece of code:

```python
from google.colab import drive
drive.mount('/content/drive')
```

Then you can `import os` and use `os.listdir()`, `os.mkdir()` and `os.chdir()` to point the notebook to the directory you want ([tutorial](https://www.geeksforgeeks.org/os-module-python-examples/)).
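
As a minimal sketch of those three calls, the snippet below creates a folder, lists its parent and moves into it. The folder name `mnist-lab` is made up for illustration, and a temporary directory stands in for your Drive folder so that the example runs anywhere.

```python
import os
import tempfile

# A temporary directory stands in for the mounted Drive folder (assumption:
# on Colab you would start from your own folder under the mount point instead).
base_dir = tempfile.mkdtemp()

# Hypothetical project folder name, for illustration only.
project_dir = os.path.join(base_dir, "mnist-lab")

if not os.path.exists(project_dir):
    os.mkdir(project_dir)      # create the folder once

print(os.listdir(base_dir))    # the new folder is now listed
os.chdir(project_dir)          # relative paths now resolve inside it
print(os.getcwd())
```

After `os.chdir()`, anything the notebook saves with a relative path lands in that folder.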

---

## 1. Theory

Make sure you understand the second video of 3Blue1Brown's introduction to neural nets, and ask questions if there's anything unclear.

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('IHZwWFHWa-w', width=853, height=480) # 3Blue1Brown 2

---


## 2. Hello world

Let's import TensorFlow.

In [None]:
import tensorflow as tf

Let's do some preprocessing first.

In [None]:
# MNIST

# load
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

# preprocess
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

train_labels_one_hot = tf.keras.utils.to_categorical(train_labels)
test_labels_one_hot = tf.keras.utils.to_categorical(test_labels)

Our labels are now encoded as one-hot vectors (all zeros, except for a one at the index of the correct class).

To see that, try and print out the label at the same index in `train_labels` and `train_labels_one_hot`.
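
If the encoding is unclear, here is a small NumPy-only sketch of the same idea. The three labels are made up and merely stand in for a few entries of `train_labels`; indexing the identity matrix is one way to build one-hot rows by hand, not necessarily how `to_categorical` is implemented.

```python
import numpy as np

# Made-up integer labels standing in for a few entries of `train_labels`.
labels = np.array([5, 0, 4])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
labels_one_hot = np.eye(num_classes, dtype="float32")[labels]

index = 0
print(labels[index])          # the integer class
print(labels_one_hot[index])  # a vector with a single 1 at that class index

# argmax recovers the integer label from its one-hot encoding.
recovered = labels_one_hot.argmax(axis=1)
```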

In [None]:
# build
model = tf.keras.models.Sequential()
model.add(tf.keras.Input((28 * 28,)))
model.add(tf.keras.layers.Dense(512, activation='relu'))
model.add(tf.keras.layers.Dense(10, activation='softmax'))
model.compile(
    optimizer='rmsprop',
    loss='categorical_crossentropy', 
    metrics=['accuracy']
)

How does our 'raw' model perform on our test set?

Apply the network's `.evaluate()` method to `test_images` and `test_labels_one_hot`, collect the result into `test_loss` and `test_acc`, and see how well your network performs.

Now we can train the model.

In [None]:
# train
model.fit(
    train_images,
    train_labels_one_hot,
    epochs=5,
    batch_size=128
)

Now apply the network's `.evaluate()` method again to `test_images` and `test_labels_one_hot`. Any change?

## 3. Research

- Add a text cell and comment on network training and test accuracy;
- Train for 20 epochs and evaluate. Comment on your findings;
- The first layer transforms the 784-element image vector into a 512-dimensional intermediate representation:
  - Experiment with different intermediate dimensions;
  - Make a markdown table of network performance on the test set for varying intermediate dimensions;
  - Comment on your results;
- Replace network compilation with:
```python
from tensorflow.keras import optimizers
model.compile(
    optimizer=optimizers.RMSprop(learning_rate=0.001, momentum=0.0),
    loss='categorical_crossentropy', 
    metrics=['accuracy']
)
```
This code is exactly equivalent to the original compilation step, but we are now able to adjust the learning rate and momentum. `learning_rate=0.001` is the default value: experiment with different learning rates. Tabulate your results and interpret them.
- Experiment with different momentum values. Tabulate and interpret.
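
For the tables requested above, a small helper can turn your measurements into markdown that you paste into a text cell. This is only a sketch: the function name, the column headers and the zeros in the example are placeholders, not real results.

```python
def markdown_table(rows, headers=("Hidden units", "Test accuracy")):
    """Format (setting, accuracy) pairs as a markdown table."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for setting, acc in rows:
        lines.append(f"| {setting} | {acc:.4f} |")
    return "\n".join(lines)

# Placeholder values only, NOT real measurements: replace with your own results.
placeholder_results = [(128, 0.0), (512, 0.0)]
table = markdown_table(placeholder_results)
print(table)
```

The same helper works for the learning rate and momentum experiments if you pass different `headers`.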

### Two playgrounds to keep in mind

- [TensorFlow playground](https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.29184&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false)
- [Why Momentum Really Works](https://distill.pub/2017/momentum/)
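
To build intuition for the learning rate and momentum settings from the Research section, here is a textbook-style RMSprop update applied to the toy function f(w) = w². It is an illustrative sketch under simplifying assumptions (one scalar weight, hypothetical function name and starting point), not the Keras implementation.

```python
import math

def rmsprop_minimise(w, learning_rate=0.01, momentum=0.0, rho=0.9,
                     epsilon=1e-7, steps=100):
    """Minimise f(w) = w**2 with a textbook RMSprop update (illustrative only)."""
    mean_square = 0.0   # running average of squared gradients
    velocity = 0.0      # momentum buffer
    for _ in range(steps):
        grad = 2.0 * w  # derivative of w**2
        mean_square = rho * mean_square + (1.0 - rho) * grad ** 2
        # Scale the gradient by the root of its recent mean square.
        step = learning_rate * grad / math.sqrt(mean_square + epsilon)
        # With momentum=0.0 the velocity is just the current step.
        velocity = momentum * velocity + step
        w -= velocity
    return w

w_plain = rmsprop_minimise(5.0, momentum=0.0)
w_momentum = rmsprop_minimise(5.0, momentum=0.9)
print(w_plain, w_momentum)
```

Try changing `learning_rate`, `momentum` and `steps` and compare where the two runs end up.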

---

## 4. Extra: use your model

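Can you select an image from the training set and display it using `plt.imshow()`? You need to do `import matplotlib.pyplot as plt` before that, and since the images were flattened during preprocessing, reshape the one you pick back to `(28, 28)` before displaying it.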

Can you then use the same image (don't forget it should be of shape [1, 784], the 1 being the batch size), pass it to `model.predict()`, collect the predictions, and find which class was predicted using the `.argmax()` method? Note again the shape of the prediction tensor: it contains a batch dimension at the front!
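
The shape handling can be sketched in NumPy alone. Below, a blank vector stands in for a preprocessed image and a made-up array stands in for the output of `model.predict()`, so the predicted class here is an assumption of the example, not a real prediction.

```python
import numpy as np

# Stand-in for one preprocessed image: a flat vector of 784 pixel values.
image = np.zeros(28 * 28, dtype="float32")

# Add the batch dimension: (784,) -> (1, 784).
batch = image[np.newaxis, :]
print(batch.shape)

# Made-up output standing in for `model.predict(batch)`: one row of 10 class
# probabilities per image in the batch.
fake_predictions = np.full((1, 10), 0.02, dtype="float32")
fake_predictions[0, 7] = 0.82

# argmax over the class axis gives one predicted class per batch entry.
predicted_classes = fake_predictions.argmax(axis=-1)
predicted_class = int(predicted_classes[0])
print(predicted_class)
```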

Try with different images, or even take a picture of a number you've drawn, open it in Python, resize it to 28x28 (using Pillow for instance), and try the network on it!
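
One possible way to prepare your own picture is sketched below with Pillow and NumPy. It assumes a dark digit on light paper, which is why the image is inverted: MNIST digits are light on a dark background. The function name is hypothetical, and a synthetic image stands in for your photo so that the example runs on its own.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageOps

def prepare_digit(img):
    """Turn a PIL image of a dark digit on light paper into a (1, 784) array."""
    img = img.convert("L")        # greyscale, one channel
    img = ImageOps.invert(img)    # light digit on dark background, like MNIST
    img = img.resize((28, 28))    # same resolution as the training images
    pixels = np.asarray(img, dtype="float32") / 255  # scale to [0, 1]
    return pixels.reshape(1, 28 * 28)  # flatten and add the batch dimension

# Synthetic stand-in for a photo: a dark vertical stroke on white paper.
# With a real picture you would use Image.open("your-file.png") instead
# (hypothetical file name).
photo = Image.new("RGB", (200, 200), color="white")
ImageDraw.Draw(photo).line([(100, 30), (100, 170)], fill="black", width=20)

digit = prepare_digit(photo)
print(digit.shape, digit.min(), digit.max())
```

The resulting array has the shape expected by `model.predict()`. Real photos usually also need cropping and centring around the digit to get sensible predictions.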

---

## 5. Saving/Reloading

You can save and reload your model like so:

In [None]:
model.save("dense.mnist.keras")

In [None]:
network_reloaded = tf.keras.models.load_model("dense.mnist.keras")

It is possible to call `fit()` on the loaded model and continue training.

In [None]:
network_reloaded.fit(train_images, train_labels_one_hot, epochs=5, batch_size=128)

---

## Reminder

Check out the numerical dojo if you haven't done so:
- #### [`1.first-steps-tensorflow.ipynb`](https://github.com/jchwenger/AI/blob/main/labs/1-lab/1.first-steps-tensorflow.ipynb)
- #### [`1.first-steps-tensorflow.QUIZ.ipynb`](https://github.com/jchwenger/AI/blob/main/labs/1-lab/1.first-steps-tensorflow.QUIZ.ipynb)