<a href="https://colab.research.google.com/github/schwartz-cnl/Computational-Neuroscience-Class/blob/main/Convolutional%20Neural%20Network/cnn_fashion_keras.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Convolutional neural network (CNN) for Fashion MNIST dataset**


by Md Nasir Uddin Laskar; 
Edited 2022: Odelia Schwartz and Xu Pan

To speed up training, change the Colab runtime type to GPU: "Runtime" --> "Change runtime type" --> under "Hardware accelerator" select "GPU".


-  We will implement a simple CNN model with 2 convolutional layers and 2 fully connected layers, as shown in the following figure: **conv1 --> relu1 --> pool1 ----> conv2 --> relu2 --> pool2 ----> fc1 --> fc2**




![picture](https://nasirml.files.wordpress.com/2019/01/simple_convnet-3.png?w=580&h=182)







## **Common deep learning steps**

* Step 1: Generate training and test data (and preprocess)
* Step 2: Initialize the network parameters
* Step 3: Forward propagation
* Step 4: Compute the cost/loss
* Step 5: Backpropagation or create an optimizer to minimize the cost from step 4
* Step 6: Evaluate the model with your test set
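
To make the six steps concrete before moving to the CNN, here is a minimal NumPy sketch that applies them to a toy problem: logistic regression on two synthetic 2-D blobs. It is not part of the Fashion MNIST pipeline; the data, the names (`make_blobs`, `forward`, `binary_cross_entropy`, `toy_*`) and the hyper-parameters are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

def make_blobs(n_per_class):
    """Step 1: toy data, two 2-D Gaussian blobs with labels 0 and 1."""
    x0 = rng.normal(loc=-2.0, scale=1.0, size=(n_per_class, 2))
    x1 = rng.normal(loc=2.0, scale=1.0, size=(n_per_class, 2))
    x = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return x, y

toy_x_train, toy_y_train = make_blobs(200)
toy_x_test, toy_y_test = make_blobs(50)

# Step 2: initialize the parameters (weights and bias)
w = np.zeros(2)
b = 0.0

def forward(x, w, b):
    """Step 3: forward propagation (a single sigmoid unit)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def binary_cross_entropy(p, y, eps=1e-12):
    """Step 4: the cost/loss."""
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

learning_rate = 0.1
toy_losses = []
for step in range(100):
    p = forward(toy_x_train, w, b)
    toy_losses.append(binary_cross_entropy(p, toy_y_train))
    # Step 5: gradients of the loss, followed by a gradient-descent update
    grad_w = toy_x_train.T @ (p - toy_y_train) / len(toy_y_train)
    grad_b = np.mean(p - toy_y_train)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Step 6: evaluate on the held-out test set
toy_pred = forward(toy_x_test, w, b) > 0.5
toy_test_acc = np.mean(toy_pred == toy_y_test)
print('first loss %.3f, last loss %.3f, test accuracy %.2f'
      % (toy_losses[0], toy_losses[-1], toy_test_acc))
```

In the rest of this notebook, Keras takes care of steps 2 to 5 for us: we only describe the layers, the loss and the optimizer.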


In [None]:
# imports
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
import os   # to save the checkpoint
#sess = tf.InteractiveSession()

# **Load the dataset**
* **Fashion-MNIST** is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes. It has the same image size and the same training/test split structure as MNIST.


* The images are 28x28 NumPy arrays, with pixel values ranging between 0 and 255. The labels are an array of integers, ranging from 0 to 9. These correspond to the class of clothing the image represents.

* MNIST is relatively easy. Classic machine learning algorithms can reach about 97% accuracy, as you have seen in the last lab, and convolutional nets can reach 99.7% on MNIST. That is why we are trying a more serious dataset, which is a bit more challenging but still gives us the flexibility and ease of MNIST.

* The following figure shows sample images from the Fashion MNIST data (each class takes three rows).





<table>
  <tr><td>
    <img src="https://tensorflow.org/images/fashion-mnist-sprite.png"
         alt="Fashion MNIST sprite"  width="600">
  </td></tr>
  <tr><td align="center">
    <b>Figure 1.</b> <a href="https://github.com/zalandoresearch/fashion-mnist">Fashion-MNIST samples</a> (by Zalando, MIT License).<br/>&nbsp;
  </td></tr>
</table>

Loading the dataset returns four NumPy arrays:

* The `x_train` and `y_train` arrays are the *training set*: the data the model uses to learn.
* The `x_test` and `y_test` arrays are the *test set*, against which the model is tested.

Each label is an integer from 0 to 9 that corresponds to the *class* of clothing the image represents:

<table>
  <tr>
    <th>Label</th>
    <th>Class</th> 
  </tr>
  <tr>
    <td>0</td>
    <td>T-shirt/top</td> 
  </tr>
  <tr>
    <td>1</td>
    <td>Trouser</td> 
  </tr>
    <tr>
    <td>2</td>
    <td>Pullover</td> 
  </tr>
    <tr>
    <td>3</td>
    <td>Dress</td> 
  </tr>
    <tr>
    <td>4</td>
    <td>Coat</td> 
  </tr>
    <tr>
    <td>5</td>
    <td>Sandal</td> 
  </tr>
    <tr>
    <td>6</td>
    <td>Shirt</td> 
  </tr>
    <tr>
    <td>7</td>
    <td>Sneaker</td> 
  </tr>
    <tr>
    <td>8</td>
    <td>Bag</td> 
  </tr>
    <tr>
    <td>9</td>
    <td>Ankle boot</td> 
  </tr>
</table>

* Each image is mapped to a single label. Since the *class names* are not included with the dataset, we store them in a list (`fashion_mnist_labels`, defined further down) to use later when plotting the images.

* **Loading the dataset is very easy, as it is already included in the TensorFlow/Keras library. We just have to access `tf.keras.datasets.fashion_mnist` and call its `load_data()` function.**
* It returns the training and test datasets, including the labels.


In [None]:
def read_dataset():
  '''
  Read the Fashion MNIST dataset bundled with Keras.
  Returns x_train, y_train, x_test, y_test as NumPy arrays.
  '''
  fashion_mnist = tf.keras.datasets.fashion_mnist
  (x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
  
  return x_train, y_train, x_test, y_test


In [None]:
# call the function to read fashion mnist
x_train, y_train, x_test, y_test = read_dataset()

# **Look at the data**

* **Training data**: used for training the model.

* **Test data**: images the network has not seen yet. A totally new set to test the generalization capability of the network.



In [None]:
print('Training data size: ' + str(x_train.shape))
print('Training labels size: ' + str(y_train.shape))

print('Test data size: ' + str(x_test.shape))
print('Test label size: ' + str(y_test.shape))

In [None]:
fashion_mnist_labels = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
                        'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
print(fashion_mnist_labels)
print(y_train[100], fashion_mnist_labels[y_train[100]])

In [None]:
# look at the first 10 images
for i in range(10):
    plt.subplot(2, 5, i+1); plt.axis('off')
    plt.imshow(np.reshape(x_train[i], [28, 28]), cmap='gray')
    plt.title(fashion_mnist_labels[y_train[i]])    
plt.show()

# **Preprocess data**
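* **One-hot encoding:** convert each integer label into a length-10 vector with a 1 at the label's index and 0 everywhere else.
* Reshape each image to `28 x 28 x 1`.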
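
As a small illustration of these two preprocessing steps, the following NumPy sketch builds one-hot vectors by hand and adds a channel axis to a dummy image batch. The arrays `toy_labels` and `toy_images` are made-up stand-ins, not the real dataset, and the notebook itself uses `tf.keras.utils.to_categorical` rather than this manual indexing.

```python
import numpy as np

toy_labels = np.array([9, 0, 3])   # integer class labels (made-up examples)
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
toy_one_hot = np.eye(num_classes)[toy_labels]
print(toy_one_hot.shape)           # (3, 10)
print(toy_one_hot[0])              # a single 1 at index 9

# argmax undoes the encoding and recovers the integer labels
recovered = np.argmax(toy_one_hot, axis=1)

# Adding a trailing channel axis: (n, 28, 28) -> (n, 28, 28, 1)
toy_images = np.zeros((3, 28, 28), dtype=np.uint8)
toy_images_4d = toy_images.reshape(toy_images.shape[0], 28, 28, 1)
print(toy_images_4d.shape)         # (3, 28, 28, 1)
```

The reshape does not change any pixel values; it only tells the convolution layers that each image has a single (grayscale) channel.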


In [None]:
# Optional: normalize the images (left commented out here).
# It is often standard to scale pixel values to the range [0, 1].
#x_train = x_train.astype('float32')/255.0
#x_test = x_test.astype('float32')/255.0

# Convert the labels to one-hot vectors
print(y_train[0:5])
y_train = tf.keras.utils.to_categorical(y_train, 10)  # 10 classes
y_test = tf.keras.utils.to_categorical(y_test, 10)
print(y_train[0:5])

In [None]:
# Reshape each image from (28, 28) to (28, 28, 1), i.e. to 3D:
# height x width x channel. The images are grayscale, so channel = 1
print(np.shape(x_test))
w, h = 28, 28
x_train = x_train.reshape(x_train.shape[0], w, h, 1)
x_test = x_test.reshape(x_test.shape[0], w, h, 1)
print(np.shape(x_test))

# **Build the CNN model**
- Make sure to define the input shape in the first layer of the neural network.
- The remaining layers infer their input and output shapes automatically.





---

![picture](https://nasirml.files.wordpress.com/2019/01/simple_convnet-3.png?w=580&h=182)






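*   Convolutional layer (conv): slides a set of small learned filters (here 3x3) across the input to produce feature maps.
*   ReLU layer: applies `max(0, x)` elementwise, setting negative activations to zero.
*   Pooling layer: downsamples each feature map, here by taking the maximum over 2x2 windows.
*   Dropout layer: randomly sets a fraction of the activations (here 0.5) to zero during training to reduce overfitting.
*   Fully connected layer (FC): connects every input to every output unit; the last FC layer outputs one value per class.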




* We use `model.compile()`  to configure the learning process before training the model. 

* This is where we define the type of loss function, optimizer and the metrics evaluated by the model during training and testing.

* **Important:** before we apply the fully-connected layers, we need to flatten the current output into a 1D vector (the `Flatten` layer).
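
To see what the conv, ReLU and pooling layers listed above do to a single image, here is a rough NumPy sketch for one channel and one filter. It is only an illustration under simplifying assumptions: the functions `conv2d_same`, `relu` and `max_pool_2x2` are hypothetical helpers, the filter is a fixed edge detector rather than a learned one, and Keras uses far more efficient implementations.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding ('same' size).

    Assumes an odd kernel size such as 3x3.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))   # zero padding
    out = np.zeros_like(image, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    """Elementwise max(0, x)."""
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    """Maximum over non-overlapping 2x2 windows (halves height and width)."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]                  # drop an odd last row/column
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

demo_image = np.arange(28 * 28, dtype=float).reshape(28, 28)
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=float)  # hand-picked, not learned

feature_map = relu(conv2d_same(demo_image, edge_kernel))
pooled_once = max_pool_2x2(feature_map)
pooled_twice = max_pool_2x2(pooled_once)
print(feature_map.shape, pooled_once.shape, pooled_twice.shape)
flattened = pooled_twice.reshape(-1)               # what Flatten does
print(flattened.shape)
```

With `padding='same'` the convolution keeps the 28x28 size, and each 2x2 pooling halves it, so two conv/pool stages take 28x28 down to 7x7 before flattening.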

In [None]:
def create_model():
  model = tf.keras.Sequential()

  model.add(tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding='same', activation='relu', input_shape=(28,28,1))) 
  model.add(tf.keras.layers.MaxPooling2D(pool_size=2))

  model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding='same', activation='relu'))
  model.add(tf.keras.layers.MaxPooling2D(pool_size=2))

  model.add(tf.keras.layers.Flatten())
  model.add(tf.keras.layers.Dense(256, activation='relu')) # 7*7*32 = 1568 inputs; 1568*256+256 = 401664 params in this layer
  model.add(tf.keras.layers.Dropout(0.5))
  model.add(tf.keras.layers.Dense(10, activation='softmax'))
  
  model.compile(loss='categorical_crossentropy',
             optimizer='adam',
             metrics=['accuracy'])
  
  return model

In [None]:
# create the model 

model = create_model()

## **Model summary**

* `conv2d` is a 2D convolution layer (here using 3x3 filters).
* The 640 parameters of the first layer come from `3*3*1*64 + 64`: 64 filters of size 3x3 over 1 input channel, plus 64 biases.
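
The same counting rule can be applied to every layer to check the numbers reported by `model.summary()`. The sketch below is plain Python arithmetic; the helper names `conv2d_params` and `dense_params` are hypothetical, and the layer sizes are read off `create_model()` above.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights (k * k * in_channels per filter) plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(n_inputs, n_units):
    """One weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units

conv1 = conv2d_params(3, 1, 64)      # 3*3*1*64 + 64
conv2 = conv2d_params(3, 64, 32)     # 3*3*64*32 + 32
flat_size = 7 * 7 * 32               # 28 -> 14 -> 7 after two 2x2 poolings
fc1 = dense_params(flat_size, 256)
fc2 = dense_params(256, 10)
total = conv1 + conv2 + fc1 + fc2

for name, n in [('conv1', conv1), ('conv2', conv2),
                ('fc1', fc1), ('fc2', fc2), ('total', total)]:
    print(name, n)
```

Pooling, flatten and dropout layers have no trainable parameters, so they do not appear in the sum. Note that most of the parameters sit in the first fully connected layer.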


In [None]:

# Take a look at the model summary
model.summary()


# **Now train our CNN model**



*  Train the model. Saving a checkpoint lets us avoid retraining every time, although the cell below does not save one.

*   `shuffle=True` is the default in `model.fit()`. It is still passed explicitly here to emphasize that shuffling the training data is very important.

*  Training takes around 20 minutes for 10 epochs.

* During training, the loss should decrease and the accuracy should increase.
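
To illustrate what shuffling means for one epoch with `batch_size=64`, here is a small NumPy sketch that splits 60,000 sample indices into shuffled mini-batches. It is an assumption-laden illustration, not how Keras implements batching internally; `shuffled_batches` is a hypothetical helper.

```python
import numpy as np

n_samples = 60000     # size of the Fashion MNIST training set
batch_size = 64
rng = np.random.default_rng(0)

def shuffled_batches(n_samples, batch_size, rng):
    """Yield index arrays that cover every sample once, in random order."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

epoch_batches = list(shuffled_batches(n_samples, batch_size, rng))
print(len(epoch_batches))        # 938 batches: 937 full ones plus a last one
print(len(epoch_batches[-1]))    # the last batch holds the remaining 32
```

Calling `shuffled_batches` again for the next epoch gives a different order, so the network does not see the samples in the same sequence every time.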



In [None]:
# More parameters for model.fit():
#   validation_split=0.1 would hold out 10% of the training data as a validation set
n_iteration = 10  # number of epochs
history = model.fit(x_train, y_train, batch_size=64, epochs=n_iteration, shuffle=True)


# **See how the training has gone**

* Let's evaluate our model on the test set.
* Our CNN model does a pretty good job of classifying unseen images, with almost 90% accuracy after 3 epochs and about 92% after 10 epochs!
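
For intuition about the accuracy metric, the sketch below computes it by hand from predicted probabilities and one-hot labels. The arrays are tiny made-up examples with 3 classes, and the names `toy_probs` and `toy_true` are assumptions; `model.evaluate` reports the same kind of quantity for the real test set.

```python
import numpy as np

# Made-up predicted probabilities for 4 samples and 3 classes
toy_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.3, 0.4, 0.3],
                      [0.2, 0.2, 0.6]])
# Made-up one-hot ground-truth labels
toy_true = np.array([[1, 0, 0],
                     [0, 1, 0],
                     [0, 0, 1],
                     [0, 0, 1]])

pred_idx = np.argmax(toy_probs, axis=1)   # most probable class per sample
true_idx = np.argmax(toy_true, axis=1)    # undo the one-hot encoding
toy_accuracy = np.mean(pred_idx == true_idx)
print(pred_idx, true_idx, toy_accuracy)   # 3 of 4 correct -> 0.75
```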

In [None]:
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test accuracy: %f' %test_acc)

## **See the loss**

* Loss should decrease over time.
* If the loss in your program does not decrease over time, most likely something went wrong, and you should check your input, network initialization, and hyper-parameters.
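
To get a feel for the numbers on the loss curve, here is a hand-rolled NumPy version of categorical cross-entropy applied to a few made-up predictions. It is a sketch in the spirit of the `'categorical_crossentropy'` loss used in `model.compile()`, not Keras's exact implementation; the function name and the `eps` clipping value are assumptions.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean over samples of -sum_k y_true[k] * log(y_prob[k])."""
    y_prob = np.clip(y_prob, eps, 1.0)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

y_true = np.array([[0.0, 0.0, 1.0]])     # the true class is index 2

loss_confident = categorical_cross_entropy(y_true, np.array([[0.1, 0.2, 0.7]]))
loss_unsure = categorical_cross_entropy(y_true, np.full((1, 3), 1.0 / 3.0))
loss_wrong = categorical_cross_entropy(y_true, np.array([[0.7, 0.2, 0.1]]))
print(loss_confident, loss_unsure, loss_wrong)
```

The loss only depends on the probability assigned to the true class: it is small when that probability is high and grows quickly as it approaches zero. For 10 equally likely classes an untrained network would sit near `log(10)`, about 2.3.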


In [None]:
# history.history['loss'] holds the training loss, one value per epoch
all_loss = history.history['loss']
print(len(all_loss))

# show the training loss
plt.plot(np.squeeze(all_loss))
plt.xlabel('epochs');plt.ylabel('cost')
#plt.title('Learning rate %f' %learning_rate)
plt.show()



## **Visualize some predictions**

* Let's randomly pick some **test** images and visualize the predictions of the model we just trained. First we get the model's predictions for the test data. Then we plot some images from the test set and set each title to the prediction (with the ground truth label in parentheses). If the prediction matches the true label, the title is green; otherwise it is red.

* Since the accuracy is around 90%, the network occasionally predicts the wrong class.

* You can convince yourself that the mistakes are somewhat reasonable.
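
One way to judge whether a mistake is reasonable is to look at the two most probable classes for a misclassified image. The sketch below does this on two made-up probability vectors; `toy_y_hat`, `toy_true_idx` and the helper `top_k_labels` are assumptions for illustration, while the label list is the one used in this notebook.

```python
import numpy as np

label_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Made-up softmax outputs for two test images
toy_y_hat = np.zeros((2, 10))
toy_y_hat[0, [6, 0, 2]] = [0.6, 0.3, 0.1]   # mostly Shirt, then T-shirt/top
toy_y_hat[1, [9, 7]] = [0.9, 0.1]           # mostly Ankle boot
toy_true_idx = np.array([0, 9])             # made-up true labels

def top_k_labels(probs, k=2):
    """Names of the k most probable classes, most probable first."""
    order = np.argsort(probs)[::-1][:k]
    return [label_names[i] for i in order]

toy_pred_idx = np.argmax(toy_y_hat, axis=1)
wrong = np.flatnonzero(toy_pred_idx != toy_true_idx)   # misclassified samples
for i in wrong:
    print('sample', i, 'true:', label_names[toy_true_idx[i]],
          'top-2 predictions:', top_k_labels(toy_y_hat[i]))
```

In this made-up case the true class is the network's second choice, which is the kind of near miss between similar garments that you may also find in the real predictions.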

In [None]:
# find the probability of the 10 classes and then assign the class label with 
# the higher probability.  model.predict(x_test) gives directly the probabilities
# predicted for each of the class labels.
y_hat = model.predict(x_test)
print(y_hat.shape)
print (y_hat[500])
print (np.round(y_hat[500],2))

# Plot a random sample of 16 test images, their predicted labels and ground truth
figure = plt.figure(figsize=(20, 8))
for i, index in enumerate(np.random.choice(x_test.shape[0], size=16, replace=False)):
    ax = figure.add_subplot(4, 4, i + 1, xticks=[], yticks=[])
    # Display each image
    ax.imshow(np.squeeze(x_test[index]),cmap='gray')
    predict_index = np.argmax(y_hat[index])                 # The fashion label number that is predicted 
    true_index = np.argmax(y_test[index])                   # The true label number
    # Set the title for each image
    ax.set_title("{} ({})".format(fashion_mnist_labels[predict_index], 
                                  fashion_mnist_labels[true_index]),
                                  color=("green" if predict_index == true_index else "red"))

## **TODO**


1. **Add one convolution layer** to the network and train again. See if it increases the accuracy. Optional: you could further try changing other aspects of the network architecture and computations to see if you can improve the result.


2. Find the **Confusion Matrix** and see how many samples in each class the network predicts correctly and how many it does not. To do so, create a 10 by 10 matrix initialized to zeros, with the X axis corresponding to the true index and the Y axis to the predicted index. Loop over the 10,000 test samples and each time add 1 to the matrix position associated with the true index on the X axis and the predicted index on the Y axis. Use `plt.imshow` to display your resulting matrix. Which fashion items are most confused? What would the confusion matrix look like for a perfect network that always predicts correctly?

3. Try training on the regular MNIST dataset (replace `fashion_mnist` with `mnist`). How do the test accuracy and confusion matrix compare to those for Fashion MNIST?
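
As a starting point for item 2, the counting loop can be sketched on a toy problem with 3 classes and 7 samples. Everything here is made up for illustration (`toy_true_idx`, `toy_pred_idx`, `n_classes`); for the exercise you would use the true and predicted indices of the 10,000 test images and a 10 by 10 matrix instead.

```python
import numpy as np

# Made-up true and predicted class indices for 7 samples
toy_true_idx = np.array([0, 0, 1, 1, 2, 2, 2])
toy_pred_idx = np.array([0, 1, 1, 1, 2, 0, 2])
n_classes = 3

confusion = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(toy_true_idx, toy_pred_idx):
    # Rows are drawn along the Y axis by imshow and columns along the X axis,
    # so row = predicted index and column = true index
    confusion[p, t] += 1

print(confusion)
n_correct = np.trace(confusion)   # the diagonal counts correct predictions
print(n_correct, 'of', confusion.sum(), 'correct')
```

The resulting array can be passed directly to `plt.imshow`, and the off-diagonal entries show which pairs of classes get mixed up.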


