<a href="https://colab.research.google.com/github/munich-ml/MLPy2021/blob/main/31_fMNIST_classifier_keras.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro

## References
Resources used to create this notebook:
- [scikit-learn website](https://scikit-learn.org)
- [Matplotlib website](https://matplotlib.org/)
- [Wikipedia](https://en.wikipedia.org/wiki/Main_Page)
- Hands-on Machine Learning with Scikit-learn, Keras & TensorFlow, Aurelien Geron, [Book on Amazon](https://www.amazon.de/Aur%C3%A9lien-G%C3%A9ron/dp/1492032646/ref=sr_1_3?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=Hands-on+Machine+Learning+with+Scikit-learn%2C+Keras+%26+TensorFlow%2C+Aurelien+Geron%2C&qid=1589875241&sr=8-3)
- Introduction to Machine Learning with Python, Andreas Mueller, [Book on Amazon](https://www.amazon.de/Introduction-Machine-Learning-Python-Scientists/dp/1449369413)


## Setup

First, do the common imports.

TensorFlow must be 2.x, because its API changed substantially compared to 1.x.

In [None]:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Common imports
import os
import numpy as np
import pandas as pd

# to make this notebook's output stable across runs
np.random.seed(42)

# Setup matplotlib
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Ignore useless warnings (see SciPy issue #5998)
import warnings
warnings.filterwarnings(action="ignore", message="^internal gelsd")

in_colab = 'google.colab' in sys.modules   # check if the notebook is executed within Colab

# Force TensorFlow 2.x (only in Colab); this magic must run before TensorFlow is imported
if in_colab:
    %tensorflow_version 2.x

# TensorFlow ≥2.0 is required
import tensorflow as tf
from tensorflow import keras
assert int(tf.__version__.split(".")[0]) >= 2

# Clone the repository if executed in Google Colab
if in_colab:  
    if "MLPy2021" in os.listdir():
        !git -C MLPy2021 pull
    else:
        !git clone https://github.com/munich-ml/MLPy2021/

# Import pickle_out from lib/helper_funcs.py; the import path depends on Colab vs. local execution
if in_colab:
    from MLPy2021.lib.helper_funcs import pickle_out
else: 
    from lib.helper_funcs import pickle_out


# Get the data

**MNIST** is probably **THE classical dataset for image recognition**. 

A more challenging dataset is the **[fashion MNIST](https://github.com/zalandoresearch/fashion-mnist)** from Zalando.

``tf.keras`` already includes [fashion MNIST](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/fashion_mnist/load_data) and some other popular datasets in `keras.datasets`. 

The fashion MNIST dataset is already split into a training set and a test set, but it can be useful to split the training set further to have a validation set:


In [None]:
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

From the [dataset documentation](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/fashion_mnist/load_data) we know that the **labels** are **class IDs** that correspond to the following **`class_names`**:

In [None]:
class_names = ["T-shirt", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

In [None]:
y_train_full[0]

In [None]:
class_names[y_train_full[0]]
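Since the labels are class IDs, a whole array of labels can be translated at once by indexing a NumPy array of the class names with it. A minimal sketch, using a few made-up label values rather than the real dataset:

```python
import numpy as np

class_names = ["T-shirt", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
labels = np.array([9, 0, 0, 3])        # made-up class IDs

# Fancy indexing: each class ID selects the matching name
names = np.array(class_names)[labels]
print(names)   # ['Ankle boot' 'T-shirt' 'T-shirt' 'Dress']
```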

Plot the first 60 images of the training set to get an overview:


In [None]:
# code for viewing, not for teaching
n_rows, n_cols = 5, 12
plt.figure(figsize=(n_cols*1.2, n_rows*1.2))
for row in range(n_rows):
    for col in range(n_cols):
        index = n_cols * row + col
        plt.subplot(n_rows, n_cols, index+1)
        plt.imshow(X_train_full[index], cmap=plt.cm.binary, interpolation="nearest")
        plt.title("{} {}".format(index, class_names[y_train_full[index]]), fontsize=12)
        plt.axis('off')
plt.tight_layout()

## Inspect the data


The full training set contains 60,000 grayscale images, each of 28×28 pixels:

In [None]:
X_train_full.shape

In [None]:
sample_img = X_train_full[100, :, :]
sample_img.shape

One may plot an image using Matplotlib's `imshow()` function:

In [None]:
plt.imshow(sample_img, cmap=plt.cm.binary);

Each pixel intensity is an 8-bit unsigned integer (`uint8`) value between 0 and 255. With the `plt.cm.binary` colormap used above:
- 0 is white
- 255 is black
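The 8-bit range, and one pitfall of integer pixel values, can be checked with NumPy. A small sketch on a made-up 2×2 "image" (not from the dataset): arithmetic on `uint8` arrays silently wraps around at 256, which is one reason to convert pixels to floats before computing with them.

```python
import numpy as np

# Range of an 8-bit unsigned integer, the dtype of the Fashion MNIST pixels
info = np.iinfo(np.uint8)
print(info.min, info.max)                 # 0 255

img = np.array([[0, 128], [200, 255]], dtype=np.uint8)   # made-up pixel values

# uint8 arithmetic wraps around: 255 + 100 = 355, stored as 355 - 256 = 99
wrapped = img + np.uint8(100)
print(wrapped.dtype, wrapped[1, 1])       # uint8 99

# Converting to float first gives the mathematically expected result
as_float = img.astype(np.float32) + 100
print(as_float[1, 1])                     # 355.0
```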

In [None]:
sample_img.dtype

In [None]:
pd.Series(sample_img.flatten()).value_counts().sort_index()

In [None]:
sample_img[:,:13]

## Scale the data

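Since we are going to train the neural network using **Gradient Descent**, we must scale the input features. Gradient Descent converges much faster and more reliably when the inputs are small values, e.g. in the range 0 to 1 instead of 0 to 255.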

Scikit-learn's ``MinMaxScaler`` expects a 2D array of shape `(n_samples, n_features)`, so it doesn't work directly on our 3D array of 2D images. Therefore, let's implement a simple max scaler with a scikit-learn compliant interface:

In [None]:
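from sklearn.base import BaseEstimator, TransformerMixin

class MaxScaler(TransformerMixin, BaseEstimator):
    """Divides all values by the maximum seen during fit (works for any array shape)."""

    def fit(self, X, y=None):
        self.scale_ = X.max()    # one global maximum, e.g. 255 for uint8 images
        return self

    def transform(self, X):
        return X / self.scale_   # returns a float array; the training data ends up in [0, 1]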

In [None]:
scaler = MaxScaler()
X_train_full = scaler.fit_transform(X_train_full)
X_test = scaler.transform(X_test)

In [None]:
X_train_full.max(), X_test.max()
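Another way around the 2D restriction, sketched here with plain NumPy on random stand-in images (not the real dataset): flatten each 28×28 image into a row of 784 features, which is the `(n_samples, n_features)` layout that scikit-learn scalers such as `MinMaxScaler` expect, scale per feature, then reshape back.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-in for a batch of six 28x28 uint8 images
images = rng.integers(0, 256, size=(6, 28, 28)).astype(np.uint8)

# Flatten each image into one row: shape (n_samples, n_features) = (6, 784)
flat = images.reshape(len(images), -1).astype(np.float64)

# Per-feature (per-pixel) min-max scaling, the formula MinMaxScaler uses by default
col_min = flat.min(axis=0)
col_range = flat.max(axis=0) - col_min
col_range[col_range == 0] = 1.0          # avoid division by zero for constant pixels
flat_scaled = (flat - col_min) / col_range

# Restore the original image shape
images_scaled = flat_scaled.reshape(images.shape)
print(images_scaled.shape)               # (6, 28, 28)
print(images_scaled.min(), images_scaled.max())   # 0.0 1.0
```

Note that this scales each pixel position independently, whereas `MaxScaler` divides every pixel by the same global maximum.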

## Split a validation set

Let's split the *full training set* into a *validation set* and a (smaller) *training set*. 

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full, 
                                                      test_size=5000)

In [None]:
X_train.shape

In [None]:
X_valid.shape

In [None]:
y_train.shape

In [None]:
plt.figure(figsize=(12, 3))
label_sets = {"y_train_full": y_train_full, "y_train": y_train, "y_valid": y_valid}
for i, (name, y) in enumerate(label_sets.items()):
    plt.subplot(1, 3, i+1)
    plt.plot(y[:500], "d")   # class IDs of the first 500 instances
    plt.yticks(range(len(class_names)), labels=class_names)
    plt.title(name)
plt.tight_layout();
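Conceptually, `train_test_split` shuffles the instance indices and cuts off `test_size` of them for the second set. A minimal NumPy sketch of that idea on made-up data; the helper `split_off` is hypothetical and not part of scikit-learn:

```python
import numpy as np

def split_off(X, y, n_valid, seed=42):
    """Hypothetical helper: randomly move n_valid instances into a validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))            # shuffled instance indices
    valid_idx, train_idx = idx[:n_valid], idx[n_valid:]
    return X[train_idx], X[valid_idx], y[train_idx], y[valid_idx]

X = np.arange(20).reshape(10, 2)             # 10 made-up instances, 2 features each
y = np.arange(10) % 3                        # made-up labels
X_tr, X_va, y_tr, y_va = split_off(X, y, n_valid=3)
print(X_tr.shape, X_va.shape)                # (7, 2) (3, 2)
```

Features and labels are indexed with the same shuffled indices, so each instance keeps its label.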

# Build a model

The following code creates a **classification MLP** (multilayer perceptron) with 2 hidden layers:

In [None]:
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28, 28]))
model.add(keras.layers.Dense(300, activation="relu"))
model.add(keras.layers.Dense(100, activation="relu"))
model.add(keras.layers.Dense(10, activation="softmax"))

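Instead of calling `model.add()`, we can also pass a list of layers to the `Sequential()` constructor. Before rebuilding the model, clear the Keras session and reset the random seeds so that the results are reproducible: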

In [None]:
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(47)

In [None]:
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation=keras.activations.relu),
    keras.layers.Dense(100, activation=keras.activations.relu),
    keras.layers.Dense(10, activation=keras.activations.softmax)
])

In [None]:
keras.activations.relu([-2, -1, 0, 1, 2])
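To see what the layers of this MLP compute, here is a minimal NumPy sketch of a forward pass through the same architecture (Flatten, Dense 300 + ReLU, Dense 100 + ReLU, Dense 10 + softmax). It is not how Keras implements it internally. The weights are simply drawn from a normal distribution with standard deviation 0.05 (an arbitrary choice, not Keras's default initializer) and are untrained, so the predicted probabilities are meaningless apart from their shape and the fact that each row sums to 1:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)     # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(42)
layer_sizes = [784, 300, 100, 10]            # same sizes as the Keras model
# One (weights, biases) pair per Dense layer; biases start at zero
params = [(rng.normal(0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(images):
    a = images.reshape(len(images), -1)      # Flatten: (batch, 28, 28) -> (batch, 784)
    for i, (W, b) in enumerate(params):
        z = a @ W + b                        # Dense: weighted sum plus bias
        a = softmax(z) if i == len(params) - 1 else relu(z)
    return a

batch = rng.random((4, 28, 28))              # 4 made-up images with values in [0, 1)
proba = forward(batch)
print(proba.shape)                           # (4, 10)
print(proba.sum(axis=1))                     # each row sums to 1 (up to rounding)
```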

## Model summary

Let's print a summary of the model, using `model.summary()`


In [None]:
model.summary()

Note the huge number of (trainable) parameters. For example, the first hidden layer has 784 × 300 connection weights plus 300 bias terms, i.e. 235,500 parameters. This gives the model a lot of flexibility to fit the training data, but it also makes it prone to overfitting, especially when there is little training data.
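Each `Dense` layer has one weight per input/output pair plus one bias per output, i.e. n_in × n_out + n_out parameters. A quick check for the three `Dense` layers of this model (the `Flatten` layer has no parameters):

```python
layer_sizes = [784, 300, 100, 10]            # input size followed by the Dense layer sizes
counts = [n_in * n_out + n_out               # weights + biases of each Dense layer
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
print(counts)        # [235500, 30100, 1010]
print(sum(counts))   # 266610
```

The total should match the "Total params" line of `model.summary()`.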

In [None]:
keras.utils.plot_model(model, show_shapes=True)

The layers of the model can be accessed with `model.layers`

In [None]:
for i, layer in enumerate(model.layers):
    print("layer {}: {}".format(i, layer.name))

In [None]:
weights, biases = model.layers[2].get_weights()

In [None]:
weights.shape

In [None]:
weights

In [None]:
biases.shape

In [None]:
biases

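Note that the biases are initialized with zeros, while the weights are initialized randomly. Random weights are required to break the symmetry between the neurons of a layer: if all weights started with the same value, all neurons in a layer would compute the same output and receive the same gradient updates, so they would never learn different features.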
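A tiny NumPy illustration of this symmetry problem, using made-up inputs and a hidden layer of 3 neurons with zero biases: with identical weights, all neurons produce the same pre-activations (and would therefore receive the same gradients), whereas random weights make them differ.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((5, 4))                       # 5 made-up instances with 4 features

W_same = np.full((4, 3), 0.1)                # all weights identical
W_rand = rng.normal(0, 0.1, size=(4, 3))     # random weights

z_same = x @ W_same                          # pre-activations of 3 hidden neurons (zero biases)
z_rand = x @ W_rand

# Identical weights: every neuron computes the same value for each instance
print(np.allclose(z_same[:, 0], z_same[:, 1]), np.allclose(z_same[:, 1], z_same[:, 2]))  # True True
# Random weights: the neurons compute different values
print(np.allclose(z_rand[:, 0], z_rand[:, 1]))   # False
```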

# Train the model

In [None]:
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])

In [None]:
model.compile?

This is equivalent to:

```
model.compile(loss=keras.losses.sparse_categorical_crossentropy,
              optimizer=keras.optimizers.SGD(),
              metrics=[keras.metrics.sparse_categorical_accuracy])
```

Some explanation of the `compile` parameters:
- The loss must be **sparse_**categorical_crossentropy because the labels are sparse: each training instance has a single class ID (0 to 9) rather than a one-hot vector of length 10. For one-hot labels, `categorical_crossentropy` would be used instead.
- The optimizer is plain **Stochastic Gradient Descent** (Keras's default learning rate for `SGD` is 0.01)
- Since this is a **classifier**, it's useful to measure **accuracy** during training
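The difference between sparse and one-hot labels, and the fact that both loss variants compute the same value, can be sketched in NumPy. This mirrors the formulas only, not Keras's implementation, and the predicted probabilities come from random made-up logits, not from the model:

```python
import numpy as np

y_sparse = np.array([9, 0, 3])               # one class ID per instance
y_onehot = np.eye(10)[y_sparse]              # same labels as one-hot rows, shape (3, 10)

rng = np.random.default_rng(1)
logits = rng.normal(size=(3, 10))            # made-up model outputs before softmax
proba = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)   # softmax

# Sparse categorical crossentropy: -log of the probability assigned to the true class
loss_sparse = -np.log(proba[np.arange(len(y_sparse)), y_sparse]).mean()
# Categorical crossentropy: the one-hot vector picks out the same probability
loss_onehot = -(y_onehot * np.log(proba)).sum(axis=1).mean()
print(np.isclose(loss_sparse, loss_onehot))  # True
```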

Now, let's kick off training using `model.fit()`:

In [None]:
history = model.fit(X_train, y_train, epochs=30, verbose=1,
                    validation_data=(X_valid, y_valid))
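Since no `batch_size` is passed, `model.fit` uses Keras's default mini-batch size of 32, so every epoch performs one gradient step per mini-batch of the 55,000 training images. A quick calculation of the number of steps per epoch shown in the progress bar:

```python
import math

n_train = 60_000 - 5_000      # full training set minus the 5,000 validation images
batch_size = 32               # Keras default when batch_size is not passed to fit()
steps_per_epoch = math.ceil(n_train / batch_size)   # the last batch is smaller
print(steps_per_epoch)        # 1719
```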

In [None]:
history.params

In [None]:
history.history.keys()

In [None]:
plt.figure(figsize=[10, 3]) 
for i, word in enumerate(["loss", "accuracy"]):
    plt.subplot(1, 2, i+1)
    for key, vals in history.history.items():
        if word in key:
            plt.plot(vals, label=key)
    plt.grid(), plt.legend(), plt.title(word)

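## Mount Google Drive

The following cells only work in Google Colab. Mounting Google Drive lets us store the trained model persistently, because files saved on the Colab runtime itself are lost when the runtime is recycled.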

In [None]:
mount_dir = os.path.join(os.getcwd(), "drive")
mount_dir

In [None]:
from google.colab import drive
drive.mount(mount_dir)

## model.save()


In [None]:
save_dir = os.path.join(mount_dir, "My Drive", "Colab Notebooks", "models")
save_dir

In [None]:
os.makedirs(save_dir, exist_ok=True)   # create the folder if it doesn't exist yet
os.path.isdir(save_dir)

In [None]:
fn = "fMNIST_NN_v1_ageron"
model.save(os.path.join(save_dir, fn + ".h5"))

### Save validation and test data along with the model

In [None]:
pickle_out(os.path.join(save_dir, fn+"_data.pkl"), locals(),
           X_valid, y_valid, X_test, y_test, class_names)
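`pickle_out` comes from this repository's `lib/helper_funcs.py`, and its implementation is not shown here. A minimal standard-library sketch of the same idea, assuming one simply wants several named objects in a single pickle file (the functions `save_named` and `load_named` are hypothetical, and the arrays are made-up placeholders):

```python
import os
import pickle
import tempfile
import numpy as np

def save_named(path, **objects):
    """Hypothetical helper: pickle several named objects into one file."""
    with open(path, "wb") as f:
        pickle.dump(objects, f)

def load_named(path):
    """Hypothetical helper: load the dict of named objects back."""
    with open(path, "rb") as f:
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), "demo_data.pkl")
save_named(path, X_valid=np.zeros((2, 28, 28)), y_valid=np.array([3, 7]),
           class_names=["T-shirt", "Trouser"])
data = load_named(path)
print(sorted(data))   # ['X_valid', 'class_names', 'y_valid']
```

Storing a dict keeps the variable names together with the data, so the loading side does not depend on the order in which objects were saved.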