# Fashion MNIST: A Multi-Class Classification Problem
We will build a multilayer perceptron (MLP) network to solve a multi-class classification problem. Fashion MNIST is intended as a drop-in replacement for the classic MNIST dataset, a handwritten-digit dataset often used as the "Hello World" of machine learning. Fashion MNIST contains images of fashion items, and it turns out to be more challenging than MNIST.

Fashion MNIST contains 60,000 training images and 10,000 test images. Each is a 28 x 28 pixel grayscale image belonging to one of 10 categories.

<img src="w2-fashionMnist.png">


## 1. Load the dataset
Keras provides utility functions to fetch and load several commonly used datasets, including Fashion MNIST. The `load_data()` function returns the data already split into a training set and a test set.

Since the class names are not included with the dataset, store them here to use later when plotting the images.

We will explore the format of the dataset and the data type of the input images, and display a few images to get a first impression of the dataset.

In [16]:
from keras.datasets import fashion_mnist  # Requires both keras and tensorflow to be pip-installed in the venv
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

n_classes = 10
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Inspect data
print(f" There are {X_train.shape[0]} images which are {X_train.shape[1]} x {X_train.shape[2]} pixels. These are for training.")
print(f" We also have {y_train.shape[0]} labels, one for each image.")
print(f" An example of a label for the first image is {y_train[0]} which corresponds to {class_names[y_train[0]]}")

print(f" There are {X_test.shape[0]} images which are {X_test.shape[1]} x {X_test.shape[2]} pixels. These are for testing.")



 There are 60000 images which are 28 x 28 pixels. These are for training.
 We also have 60000 labels, one for each image.
 An example of a label for the first image is 9 which corresponds to Ankle boot
 There are 10000 images which are 28 x 28 pixels. These are for testing.
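
As a further check on the format, the data type, value range and number of examples per class can be summarised with NumPy. The sketch below is only an illustration: `summarize_dataset` is a hypothetical helper (not part of Keras), and small random arrays stand in for `X_train` and `y_train` so that the snippet runs on its own.

```python
import numpy as np

def summarize_dataset(images, labels, n_classes=10):
    """Return basic facts about an image dataset (hypothetical helper)."""
    return {
        "shape": images.shape,
        "dtype": str(images.dtype),
        "min": int(images.min()),
        "max": int(images.max()),
        # Number of examples in each class, indexed by class id
        "class_counts": np.bincount(labels, minlength=n_classes).tolist(),
    }

# Stand-in data with the same layout as the loaded arrays (uint8, 28 x 28)
rng = np.random.default_rng(0)
demo_images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
demo_labels = rng.integers(0, 10, size=100)

summary = summarize_dataset(demo_images, demo_labels)
print(summary)
```

In the notebook, the same call would be `summarize_dataset(X_train, y_train)`, made before the pixel values are rescaled.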


## 2. Prepare the data
Since all pixel values in an image share the same range [0, 255], we don't need to standardize or normalize each input feature as we did for the Indian Diabetes dataset. The only thing we need to do for this dataset is scale the pixel values down to the [0, 1] range by dividing them by 255.0 (this also converts them to floats).

In [23]:
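# Scale pixel values from [0, 255] to [0, 1] and convert them to float32.
# Run this cell only once per load: re-running it divides by 255 again.
X_train = X_train.astype("float32") / 255.0
X_test  = X_test.astype("float32") / 255.0

# Verify this worked
print(f"After rescaling, the first pixel row of training image 5 is: {X_train[5][0]}")


After rescaling, the first pixel row of training image 5 is: [0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 1.4263261e-17
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 3.1379167e-16
 1.2551667e-15 2.6814924e-15 2.4532804e-15 1.8827503e-15 1.7829074e-15
 2.0111196e-15 2.8383883e-15 2.0396460e-15 1.2836934e-16 0.0000000e+00
 0.0000000e+00 0.0000000e+00 1.4263261e-17 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00]

Note: the values shown above are on the order of 1e-15 rather than between 0 and 1. This is consistent with the cell having been executed several times in the same session, since each run divides by 255 again. After reloading the data and running the cell once, every value lies in [0, 1].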
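
Dividing by 255 is not idempotent, so running the scaling cell more than once shrinks the values each time. One way to guard against this is sketched below: the hypothetical helper `scale_pixels` (not part of Keras) scales only arrays that still have an integer dtype, on the assumption that a float array has already been scaled.

```python
import numpy as np

def scale_pixels(images):
    """Scale integer pixel data to float32 in [0, 1]; leave float arrays untouched."""
    if np.issubdtype(images.dtype, np.integer):
        return images.astype("float32") / 255.0
    # Already a float array: assume it was scaled by an earlier call
    return images

raw = np.array([[0, 128, 255]], dtype=np.uint8)
once = scale_pixels(raw)
twice = scale_pixels(once)  # second call changes nothing

print(once)
print(twice)
```

With this helper, `X_train = scale_pixels(X_train)` can be re-run safely in the same session.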


## 3. Build your network
Similar to the previous network you created, first create a `Sequential` model, then add `Dense` layers one by one. The only difference here is that you need to add a `Flatten` layer before the first `Dense` layer. The `Flatten` layer converts each 2-D image (28 x 28) into a 1-D array of 784 values. This layer has no parameters, as it only does simple preprocessing.
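
The effect of flattening can be reproduced with a plain NumPy reshape. This sketch only illustrates the shape change and does not use the Keras layer itself; the array `batch` is made-up data.

```python
import numpy as np

# A made-up batch of 3 "images", each 28 x 28
batch = np.arange(3 * 28 * 28, dtype=np.float32).reshape(3, 28, 28)

# Keep the batch axis and merge the two image axes into one
flat = batch.reshape(batch.shape[0], -1)

print(batch.shape, "->", flat.shape)  # (3, 28, 28) -> (3, 784)
```

The rows of each image are laid end to end, so the pixel at row 1, column 0 ends up at position 28 of the flattened vector.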

The output layer has one node per class (10 here), and the activation function for a multi-class problem is typically `softmax`.
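
To make the role of `softmax` concrete, here is a minimal NumPy sketch of the function applied to made-up scores for one image. It is an illustration of the formula, not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Made-up raw scores for one image over 10 classes
logits = np.array([[2.0, 1.0, 0.1, 0.0, -1.0, 0.5, 0.3, 3.0, -0.5, 1.5]])
probs = softmax(logits)

print(probs.sum())          # the 10 probabilities add up to 1
print(int(probs.argmax()))  # index of the most likely class
```

Each output node therefore holds the estimated probability of one class, and the predicted class is the node with the largest value.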

In [28]:
from keras.models import Sequential
from keras.layers import Input, Dense, Flatten

# Create a model
model = Sequential()

# Declare the input shape: one 28 x 28 image per sample
model.add(Input(shape=(28, 28)))

# 1st layer: FLATTEN the image. 784 nodes.
model.add(Flatten())

# 2nd layer: Dense + ReLU. 256 nodes
model.add(Dense(256, activation='relu'))

# 3rd layer: Dense + ReLU. 128 nodes
model.add(Dense(128, activation='relu'))

# Output layer. 10 nodes, 1 for each class
model.add(Dense(10, activation='softmax'))

model.summary()
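
The parameter counts reported by `model.summary()` can be checked by hand: a `Dense` layer with `n_in` inputs and `n_out` nodes has `n_in * n_out` weights plus `n_out` biases, and `Flatten` has none. The short calculation below assumes the layer sizes used above (784, 256, 128, 10); `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

layer_sizes = [28 * 28, 256, 128, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [200960, 32896, 1290]
print(sum(per_layer))  # 235146
```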



## 4. Compile the model
The typical loss function for a multi-class problem is the multi-class cross-entropy loss. In Keras, there are two options. One is to use the `sparse_categorical_crossentropy` loss with the original sparse labels (i.e., for each image, there is just one actual class index, from 0 to 9 in this case). The other is to use the `categorical_crossentropy` loss if the actual output is a one-hot vector (e.g., [0, 0, 1, 0, ..., 0]). In that case, we first need to convert the current sparse labels (i.e., class indices) to one-hot vector labels by using the `keras.utils.to_categorical()` function.
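
The two label formats lead to the same loss value. The NumPy sketch below illustrates this on made-up probabilities for two images and three classes; `np.eye` is used here as a stand-in for the one-hot conversion, and the loss formulas are written out by hand rather than taken from Keras.

```python
import numpy as np

# Made-up predicted probabilities for 2 images over 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
sparse_labels = np.array([0, 2])           # class indices
one_hot_labels = np.eye(3)[sparse_labels]  # rows [1, 0, 0] and [0, 0, 1]

# Sparse form: pick the probability of the true class directly
rows = np.arange(len(sparse_labels))
sparse_loss = -np.log(probs[rows, sparse_labels]).mean()

# One-hot form: sum over classes of y * log(p)
categorical_loss = -(one_hot_labels * np.log(probs)).sum(axis=1).mean()

print(sparse_loss, categorical_loss)  # the two values are equal
```

The sparse form avoids building the one-hot matrix, which is why it is convenient when the labels are already class indices.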

In [4]:
# Add your code here
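
One possible way to fill in this cell is sketched below; it is a suggestion, not the only valid answer. It keeps the sparse integer labels as loaded, so it uses `sparse_categorical_crossentropy`. The `adam` optimizer and the `accuracy` metric are assumptions, not choices stated in this notebook. The model is rebuilt here only so that the snippet runs on its own.

```python
from keras.models import Sequential
from keras.layers import Input, Dense, Flatten

# Same architecture as in section 3, rebuilt so this snippet is self-contained
model = Sequential([
    Input(shape=(28, 28)),
    Flatten(),
    Dense(256, activation="relu"),
    Dense(128, activation="relu"),
    Dense(10, activation="softmax"),
])

model.compile(
    loss="sparse_categorical_crossentropy",  # labels are class indices 0..9
    optimizer="adam",                        # assumed choice of optimizer
    metrics=["accuracy"],                    # assumed metric to report
)
```

If the labels were converted to one-hot vectors instead, the loss would be `categorical_crossentropy` and the rest of the call would stay the same.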


## 5. Train and validate the model
We use a validation set to monitor the model during training. We also draw the learning curves on the training and validation sets to see how the model learns and how well it generalises to new data, then adjust the model and add regularization techniques as needed until we are satisfied.

In [5]:
# Add your code here
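
A minimal sketch of training with a validation set follows. Several things in it are assumptions made for illustration: random arrays stand in for `X_train` and `y_train` so that the snippet runs without downloading data, the hidden layer is smaller than in section 3 to keep it fast, and the values for `epochs`, `batch_size` and `validation_split` are arbitrary.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Input, Dense, Flatten

# Stand-in data with the notebook's shapes; use X_train / y_train in practice
rng = np.random.default_rng(0)
X_demo = rng.random((200, 28, 28), dtype=np.float32)
y_demo = rng.integers(0, 10, size=200)

model = Sequential([
    Input(shape=(28, 28)),
    Flatten(),
    Dense(32, activation="relu"),  # smaller than section 3, to keep the demo fast
    Dense(10, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam", metrics=["accuracy"])

# Hold out the last 10% of the samples as a validation set
history = model.fit(X_demo, y_demo, epochs=3, batch_size=32,
                    validation_split=0.1, verbose=0)

# One value per epoch for each curve; plot these to get the learning curves
for name in ("loss", "val_loss", "accuracy", "val_accuracy"):
    print(name, history.history[name])
```

Plotting `loss` against `val_loss` per epoch gives the learning curves. A validation loss that rises while the training loss keeps falling is the usual sign of overfitting.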


## 6. Evaluate the model
First evaluate the model on the test set and report its test accuracy. Then use the `model`'s `predict()` method to make predictions on new instances. Display a few images and compare their predicted classes with their actual classes.

In [6]:
# Add your code here
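
In Keras, `model.evaluate(X_test, y_test)` returns the test loss followed by the compiled metrics, and `model.predict(X_test)` returns one row of class probabilities per image. The sketch below shows only the post-processing step: `probs` and `y_true` are made-up stand-ins for real `predict()` output and test labels.

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Made-up output for 3 images: each row holds 10 class probabilities
probs = np.full((3, 10), 0.02)
probs[0, 9] = 0.82  # most mass on "Ankle boot"
probs[1, 2] = 0.82  # most mass on "Pullover"
probs[2, 6] = 0.82  # most mass on "Shirt"
y_true = np.array([9, 2, 0])  # made-up actual labels

y_pred = probs.argmax(axis=1)         # predicted class index per image
accuracy = (y_pred == y_true).mean()  # fraction of correct predictions

for pred, actual in zip(y_pred, y_true):
    print(f"predicted: {class_names[pred]:<12} actual: {class_names[actual]}")
print(f"accuracy: {accuracy:.2f}")
```

In the notebook, `probs` would come from `model.predict(X_test[:3])` and `y_true` from `y_test[:3]`.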
