# Fashion MNIST: A Multi-Class Classification Problem
We will build a multilayer perceptron (MLP) to solve a multi-class classification problem. Fashion MNIST is intended as a drop-in replacement for the classic MNIST dataset, a handwritten-digit dataset often used as the "Hello World" of machine learning. Fashion MNIST contains images of fashion items, which turn out to be more challenging to classify than the MNIST digits.

Fashion MNIST contains 60,000 training images and 10,000 test images, 28 x 28 pixels each, with 10 categories. 

<img src="w2-fashionMnist.png">


## 1. Load the dataset
Keras provides utility functions to fetch and load several commonly used datasets, including Fashion MNIST. The `load_data()` function returns the data already split into a training set and a test set.

Since the class names are not included with the dataset, we store them here to use later when plotting the images.

We will explore the format of the dataset and the data type of the input images, and display a few images to get a first impression of the dataset.

In [16]:
from keras.datasets import fashion_mnist  # Requires both keras and tensorflow installed in the virtual environment (pip install keras tensorflow)
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

n_classes = 10
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Inspect data
print(f" There are {X_train.shape[0]} images which are {X_train.shape[1]} x {X_train.shape[2]} pixels. These are for training.")
print(f" We also have {y_train.shape[0]} labels, one for each image.")
print(f" An example of a label for the first image is {y_train[0]} which corresponds to {class_names[y_train[0]]}")

print(f" There are {X_test.shape[0]} images which are {X_test.shape[1]} x {X_test.shape[2]} pixels. These are for testing.")


 There are 60000 images which are 28 x 28 pixels. These are for training.
 We also have 60000 labels, one for each image.
 An example of a label for the first image is 9 which corresponds to Ankle boot
 There are 10000 images which are 28 x 28 pixels. These are for testing.
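
Beyond the shapes, it can help to look at the data type, the value range, and how many images each class has. The sketch below is one possible way to collect these; `describe_images` is a hypothetical helper, not a Keras function, and it runs here on a small synthetic array so that it works without downloading anything. With the real data you would call it as `describe_images(X_train, y_train, class_names)`.

```python
import numpy as np

def describe_images(X, y, class_names):
    """Summarise an image array and its integer class labels."""
    counts = np.bincount(y, minlength=len(class_names))
    return {
        "shape": X.shape,
        "dtype": str(X.dtype),
        "min": float(X.min()),
        "max": float(X.max()),
        "per_class": dict(zip(class_names, counts.tolist())),
    }

# Synthetic stand-in for the dataset: 20 random 28 x 28 uint8 images, 2 per class.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(20, 28, 28), dtype=np.uint8)
y_demo = np.arange(20) % 10
demo_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
              'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

summary = describe_images(X_demo, y_demo, demo_names)
print(summary["shape"], summary["dtype"])
print(summary["per_class"])
```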


## 2. Prepare the data
Since all pixel values in an image lie in the same range, [0, 255], we don't need to standardize each feature separately as we did for the Indian Diabetes dataset. The only thing we need to do for this dataset is scale the pixel values down to the [0, 1] range by dividing them by 255.0 (this also converts them to floats).

In [23]:
# Scale every pixel from [0, 255] down to [0, 1].
# Run this cell only once: every re-run divides the already-scaled values by 255 again.
X_train = X_train.astype("float32") / 255.0
X_test  = X_test.astype("float32") / 255.0

# Verify this worked
print(f"After rescaling, an example row of training pixels is: {X_train[5][0]}")


After rescaling, an example row of training pixels is: [0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 1.4263261e-17
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 3.1379167e-16
 1.2551667e-15 2.6814924e-15 2.4532804e-15 1.8827503e-15 1.7829074e-15
 2.0111196e-15 2.8383883e-15 2.0396460e-15 1.2836934e-16 0.0000000e+00
 0.0000000e+00 0.0000000e+00 1.4263261e-17 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00]
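
The values printed above are on the order of 1e-17 to 1e-15, whereas a single division by 255 maps the smallest non-zero pixel value, 1, to 1/255 ≈ 0.0039. The smallest non-zero value in the output, 1.4263261e-17, matches 1/255^7, which is consistent with the division having been applied seven times, as happens when the scaling cell is re-run in a notebook. The sketch below shows one possible way to make the step safe to re-run. `scale_pixels` is a hypothetical helper, not part of Keras, and it assumes the raw images arrive as `uint8`, which is how `fashion_mnist.load_data()` returns them.

```python
import numpy as np

def scale_pixels(images):
    """Scale raw uint8 images to [0, 1]; arrays of any other dtype are returned unchanged."""
    if images.dtype == np.uint8:
        return images.astype("float32") / 255.0
    return images

raw = np.array([[0, 1, 22, 255]], dtype=np.uint8)

once = scale_pixels(raw)
twice = scale_pixels(once)  # second call is a no-op because the dtype is already float32
print(once)
print(np.array_equal(once, twice))

# For comparison: dividing seven times without a guard, as repeated cell runs would.
naive = raw.astype("float32")
for _ in range(7):
    naive = naive / 255.0
print(naive[0, 1])  # on the order of 1e-17
```

After loading fresh data, `X_train = scale_pixels(X_train)` can then be run any number of times without shrinking the values further.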


## 3. Build your network
As with the previous network you created, first create a `Sequential` model, then add `Dense` layers one by one. The only difference here is that you need to add a `Flatten` layer before the first `Dense` layer. The `Flatten` layer converts each 2-D image (28 x 28) into a 1-D array of 784 values. This layer has no parameters; it only reshapes its input.

For the output layer, the number of nodes equals the number of classes, and the activation function for a multi-class problem is typically `softmax`.

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Flatten  # import from the same package as Sequential

# Create a model
model = Sequential()

# 1st layer: flatten each 28 x 28 image into a vector of 784 values (no trainable parameters).
model.add(Flatten(input_shape=(28, 28)))

# 2nd layer: Dense + ReLU, 256 nodes.
# (784 + 1) x 256 = 200,960 trainable parameters (weights plus one bias per node).
model.add(Dense(256, activation='relu'))

# 3rd layer: Dense + ReLU, 128 nodes.
# (256 + 1) x 128 = 32,896 trainable parameters.
model.add(Dense(128, activation='relu'))

# Output layer: 10 nodes, one for each class.
# (128 + 1) x 10 = 1,290 trainable parameters.
model.add(Dense(10, activation='softmax'))

# Because of the softmax, the output is a vector of 10 probabilities that sum to 1, something like
# [0.02, 0.01, 0.85, 0.03, 0.01, 0.02, 0.01, 0.01, 0.02, 0.02]

model.summary()

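
The parameter counts quoted in the code comments above can be checked by hand: a `Dense` layer with `n_in` inputs and `n_out` nodes has `(n_in + 1) * n_out` trainable parameters, the `+ 1` accounting for the bias of each node. The sketch below recomputes them in plain Python, independently of Keras; `dense_params` is a hypothetical helper. The totals should agree with what `model.summary()` reports for this architecture.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer: weights plus one bias per node."""
    return (n_in + 1) * n_out

# Flattened input size followed by the width of each Dense layer.
layer_sizes = [784, 256, 128, 10]

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [200960, 32896, 1290]
print(sum(per_layer))  # 235146
```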


## 4. Compile the model
The typical loss function for a multi-class problem is the multi-class cross-entropy loss. In Keras, there are two options. One is to use the `sparse_categorical_crossentropy` loss with the original sparse labels (i.e., for each image there is just one class index, from 0 to 9 in this case). The other is to use the `categorical_crossentropy` loss when the target is a one-hot vector (e.g., [0, 0, 1, 0, ..., 0]). In that case, we would first need to convert the current sparse labels (class indices) to one-hot vectors with the `keras.utils.to_categorical()` function.
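
The two options describe the same quantity in different label formats. The NumPy sketch below computes the loss for a single made-up softmax output both ways to show that they agree; the probabilities and the label are invented for illustration, and the code mirrors the textbook formula rather than Keras's internal implementation.

```python
import numpy as np

# A made-up softmax output for one image (10 probabilities summing to 1).
probs = np.array([0.02, 0.01, 0.85, 0.03, 0.01, 0.02, 0.01, 0.01, 0.02, 0.02])
label = 2  # sparse label: the index of the true class

# Sparse form: pick the probability of the true class directly.
sparse_loss = -np.log(probs[label])

# One-hot form: multiply by the one-hot vector and sum, which selects the same term.
one_hot = np.zeros(10)
one_hot[label] = 1.0
categorical_loss = -np.sum(one_hot * np.log(probs))

print(sparse_loss, categorical_loss)  # both about 0.1625
```

The sparse form avoids building a 10-element vector for every label, which is why it is convenient when the labels are already class indices.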

In [30]:
# Remember that y_train holds integer labels such as y_train[0] = 9. With sparse categorical cross-entropy,
# Keras takes the softmax output, looks up the probability assigned to class 9, and computes the loss as -log of that probability.

model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer="adam",  # Unlike plain SGD with a fixed learning rate, Adam adapts the step size of each parameter using running averages of its gradient and squared gradient
    metrics=["accuracy"],  # accuracy = (number of correct predictions) / (total predictions)
)


## 5. Train and validate the model
We use a validation set to monitor the model. We also draw the learning curves on the training and validation sets to see how the model learns and how well it generalises to new data, then adjust the model and add regularization techniques accordingly until we are satisfied.

In [33]:
# First, let's create a checkpoint that stores the best model obtained during training.
# We watch the validation accuracy and save the model whenever it improves.
from keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint(
    'fashion_mnist_best.h5', # Where to save the model
    monitor='val_accuracy', # Which metric to watch
    save_best_only=True, # Only save when the metric improves
    mode='max', # An improvement means the metric increased
    verbose=1 # Display a message each time the callback acts
)

callbacks = [checkpoint] # fit() expects its callbacks as a list

# Now we train the model

print('Starting training...')
# train the model, store the results for plotting
history = model.fit(
    X_train, y_train,
    validation_split=0.1,   # 10% of the training data (6,000 images) is held out for validation
    epochs=100, # One epoch = one full pass through the training data. Too few epochs risks underfitting, too many risks overfitting
    batch_size=100, # Instead of processing all images at once, we group them into batches of 100. The 54,000 images left after the validation split give 540 batches, so the model updates its weights 540 times per epoch
    callbacks=callbacks,
    verbose = "auto"
)

Starting training...
Epoch 1/100
506/540 - 0s 1ms/step - accuracy: 0.1020 - loss: 2.3026
Epoch 1: val_accuracy improved from None to 0.09250, saving model to fashion_mnist_best.h5
Epoch 1: finished saving model to fashion_mnist_best.h5
540/540 - 1s 1ms/step - accuracy: 0.0988 - loss: 2.3027 - val_accuracy: 0.0925 - val_loss: 2.3029
Epoch 2/100
518/540 - 0s 1ms/step - accuracy: 0.1020 - loss: 2.3026
Epoch 2: val_accuracy improved from 0.09250 to 0.09417, saving model to fashion_mnist_best.h5
Epoch 2: finished saving model to fashion_mnist_best.h5
540/540 - 1s 1ms/step - accuracy: 0.1004 - loss: 2.3027 - val_accuracy: 0.0942 - val_loss: 2.3028
Epoch 3/100
510/540 - 0s 1ms/step - accuracy: 0.1014 - loss: 2.3026
Epoch 3: val_accuracy did not improve from 0.09417
540/540 - 1s 1ms/step - accuracy: 0.0988 - loss: 2.3027 - val_accuracy: 0.0925 - val_loss: 2.3027
Epoch 4/100
509/540 - 0s 1ms/step - accuracy: 0.0994 - loss: 2.3027
Epoch 4: val_accuracy did not improve from 0.09417
540/540 - 1s 1ms/step - accuracy: 0.0992 - loss: 2.3027 - val_accuracy: 0.0942 - val_loss: 2.3028
Epoch 5/100
521/540 - 0s 2ms/step - accuracy: 0.1002 - loss: 2.3026
Epoch 5: val_accuracy did
[...]
Epoch 8: finished saving model to fashion_mnist_best.h5
540/540 - 1s 1ms/step - accuracy: 0.0998 - loss: 2.3027 - val_accuracy: 0.0973 - val_loss: 2.3027
Epoch 9/100
532/540 - 0s 1ms/step - accuracy: 0.0960 - loss: 2.3027
Epoch 9: val_accuracy did not improve from 0.09733
540/540 - 1s 1ms/step - accuracy: 0.0981 - loss: 2.3027 - val_accuracy: 0.0925 - val_loss: 2.3029
Epoch 10/100
499/540 - 0s 1ms/step - accuracy: 0.0977 - loss: 2.3027
Epoch 10: val_accuracy improved from 0.09733 to 0.09850, saving model to fashion_mnist_best.h5
Epoch 10: finished saving model to fashion_mnist_best.h5
540/540 - 1s 1ms/step - accuracy: 0.0982 - loss: 2.3027 - val_accuracy: 0.0985 - val_loss: 2.3028
Epoch 11/100
514/540 - 0s 1ms/step - accuracy: 0.1026 - loss: 2.3026
Epoch 11: val_accuracy improved from 0.09850 to 0.10033, saving model to fashion_mnist_best.h5
Epoch 11: finished saving model to fashion_mnist_best.h5
540/540 - 1s 1ms/step - accuracy: 0.0997 - loss: 2.3027 - val_accuracy: 0.1003 - val_loss: 2.3028
Epoch 12/100
529/540 - 0s 2ms/step - accuracy: 0.1011 - loss: 2.3026
Epoch 12: val_accuracy did not improve from 0.10033
540/540 - 1s 2ms/step - accuracy: 0.0989 - loss: 2.3027 - val_accuracy: 0.0942 - val_loss: 2.3029
Epoch 13/100
540/540 - 0s 1ms/step - accuracy: 0.0985 - loss: 2.3027
Epoch 13: val_accuracy did not improve from 0.10033
540/540 - 1s 2ms/step - accuracy: 0.0985 - loss: 2.3027 - val_accuracy: 0.0925 - val_loss: 2.3029
Epoch 14/100
528/540 - 0s 1ms/step - accuracy: 0.1001 - loss: 2.3026
Epoch 14: val_accur
[...]
Epoch 17: finished saving model to fashion_mnist_best.h5
540/540 - 1s 2ms/step - accuracy: 0.0982 - loss: 2.3027 - val_accuracy: 0.1032 - val_loss: 2.3029
Epoch 18/100
512/540 - 0s 1ms/step - accuracy: 0.0990 - loss: 2.3027
Epoch 18: val_accuracy did not improve from 0.10317
540/540 - 1s 1ms/step - accuracy: 0.0976 - loss: 2.3027 - val_accuracy: 0.0925 - val_loss: 2.3029
Epoch 19/100
511/540 - 0s 1ms/step - accuracy: 0.0977 - loss: 2.3027
Epoch 19: val_accuracy did not improve from 0.10317
540/540 - 1s 2ms/step - accuracy: 0.0978 - loss: 2.3027 - val_accuracy: 0.0925 - val_loss: 2.3029
Epoch 20/100
513/540 - 0s 1ms/step - accuracy: 0.1042 - loss: 2.3027
Epoch 20: val_accur
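
The log above shows the loss staying near 2.3026 and both accuracies near 0.10 in every epoch. For a 10-class problem these are the values a classifier produces when it assigns the same probability to every class, so the log is consistent with a network that has not learned anything from its inputs. The sketch below computes that chance-level baseline in plain Python; `chance_level` is a hypothetical helper, and the accuracy figure assumes the classes are balanced.

```python
import math

def chance_level(n_classes):
    """Cross-entropy loss and expected accuracy of a uniform prediction on balanced classes."""
    uniform_prob = 1.0 / n_classes
    loss = -math.log(uniform_prob)  # equals ln(n_classes)
    accuracy = uniform_prob
    return loss, accuracy

loss, accuracy = chance_level(10)
print(f"loss = {loss:.4f}, accuracy = {accuracy:.2f}")  # loss = 2.3026, accuracy = 0.10
```

When training sits at this baseline, a useful first check is the range of the inputs actually fed to the network, for example by printing `X_train.min()` and `X_train.max()`.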

## 6. Evaluate the model
First, evaluate the model on the test set and report the test accuracy. Then use the model's `predict()` method to make predictions on new instances. Display a few images and compare their predicted classes with their actual classes.
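
For a softmax classifier, `predict()` returns one row of class probabilities per image, and the predicted class is the index of the largest probability. The sketch below illustrates that step on a small made-up probability array rather than on real model output; `probs_demo` and `y_true_demo` are invented for illustration. With the trained model, the corresponding Keras calls would be `model.evaluate(X_test, y_test)` for the test loss and accuracy and `model.predict(X_test[:3])` for the probabilities.

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Made-up softmax outputs for three images (each row sums to 1).
probs_demo = np.array([
    [0.01, 0.01, 0.02, 0.01, 0.02, 0.03, 0.02, 0.05, 0.03, 0.80],  # most mass on class 9
    [0.05, 0.02, 0.70, 0.03, 0.10, 0.01, 0.06, 0.01, 0.01, 0.01],  # most mass on class 2
    [0.10, 0.05, 0.05, 0.05, 0.05, 0.05, 0.50, 0.05, 0.05, 0.05],  # most mass on class 6
])
y_true_demo = np.array([9, 2, 0])  # made-up true labels; the third one disagrees

# Predicted class = index of the largest probability in each row.
y_pred = probs_demo.argmax(axis=1)
accuracy = float((y_pred == y_true_demo).mean())

for pred, true in zip(y_pred, y_true_demo):
    print(f"predicted: {class_names[pred]:<12} actual: {class_names[true]}")
print(f"accuracy = {accuracy:.2f}")
```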

In [6]:
# Add your code here
