In [64]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split

import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from keras.optimizers import Adam, RMSprop
from keras.callbacks import TensorBoard, ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
from keras.utils.np_utils import to_categorical

In [65]:
from keras.datasets import fashion_mnist

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

I normalized the images to the range [0, 1], since the CNN converges faster in this range than in [0, 255]. I also reshaped the images to (60000, 28, 28, 1) and (10000, 28, 28, 1); the trailing 1 is the number of channels for grayscale images (it would be 3 for RGB). I also tried mean subtraction and variance normalization, but this did not work out, so I dropped it.

In [66]:
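# Convert to float and scale pixel values from [0, 255] to [0, 1].
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
# Mean subtraction and variance normalization (tried, did not help, left disabled):
#mean=(x_train.mean()+x_test.mean())/2
#std=(x_train.std()+x_test.std())/2
#x_train=(x_train-mean)/std
#x_test=(x_test-mean)/std
# Add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1).
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)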

Here x holds the image data and y holds the labels. I split off 10% of the training data as a separate validation set; train_test_split shuffles the data before splitting.

In [67]:
x_train, x_validation, y_train, y_validation = train_test_split(x_train, y_train, test_size = 0.1, random_state = 19052005)
print('x_train shape: ', x_train.shape)
print('y_train shape: ', y_train.shape)
print('x_validation shape: ', x_validation.shape)
print('y_validation shape: ', y_validation.shape)
print('x_test shape: ', x_test.shape)
print('y_test shape: ', y_test.shape)

x_train shape:  (54000, 28, 28, 1)
y_train shape:  (54000,)
x_validation shape:  (6000, 28, 28, 1)
y_validation shape:  (6000,)
x_test shape:  (10000, 28, 28, 1)
y_test shape:  (10000,)
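
As a rough illustration of what a shuffled 90/10 split does, here is a plain-Python sketch that shuffles indices with a fixed seed and holds out 10% of them. The helper `shuffle_split` is hypothetical; its shuffle order will not match scikit-learn's, so it only shows the idea and the resulting set sizes.

```python
import random

def shuffle_split(n_samples, holdout_fraction, seed):
    """Hypothetical helper: shuffle indices, then hold out a fraction of them."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # seeded, so the split is reproducible
    n_holdout = int(round(n_samples * holdout_fraction))
    return indices[n_holdout:], indices[:n_holdout]

train_idx, val_idx = shuffle_split(60000, 0.1, seed=19052005)
print(len(train_idx), len(val_idx))        # 54000 6000
```

Fixing the seed plays the same role as `random_state` above: rerunning the cell gives the same validation set.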


I found that models on this dataset overfit easily, so I added multiple Dropout layers to counter it.
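
For intuition, a dropout layer zeroes a random fraction of activations during training and rescales the rest, and passes values through unchanged at inference. The function below is a minimal plain-Python sketch of that idea (so-called inverted dropout); the name `dropout` and the flat-list input are illustrative assumptions, not the Keras implementation.

```python
import random

def dropout(values, rate, training, rng):
    """Inverted dropout on a flat list of activations (illustrative only)."""
    if not training:
        return list(values)                # inference: pass-through
    keep_scale = 1.0 / (1.0 - rate)        # rescale so the expected sum is unchanged
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

rng = random.Random(0)
activations = [1.0] * 10000
dropped = dropout(activations, rate=0.35, training=True, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
print(round(zero_fraction, 2))             # close to the rate, 0.35
print(dropout([1.0, 2.0], 0.35, training=False, rng=rng))  # [1.0, 2.0]
```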

In [68]:
cnn_model = Sequential()
cnn_model.add(Conv2D(filters = 32, kernel_size = (3,3), padding = 'Same', 
                 activation ='relu', input_shape = (28,28,1)))
cnn_model.add(Conv2D(filters = 64, kernel_size = (3,3), padding = 'Same', 
                 activation ='relu'))
cnn_model.add(MaxPooling2D(pool_size=(3,3)))
cnn_model.add(Dropout(0.35))
cnn_model.add(Conv2D(filters = 64, kernel_size = (3,3), padding = 'Same', 
                 activation ='relu'))
cnn_model.add(Conv2D(filters = 64, kernel_size = (3,3), padding = 'Same', 
                 activation ='relu'))
cnn_model.add(MaxPooling2D(pool_size=(2,2)))
cnn_model.add(Dropout(0.35))
cnn_model.add(Flatten())
cnn_model.add(Dense(units = 256, activation = 'relu'))
cnn_model.add(Dropout(0.4))
cnn_model.add(Dense(units = 10, activation = 'softmax'))
cnn_model.summary()

Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_26 (Conv2D)           (None, 28, 28, 32)        320       
_________________________________________________________________
conv2d_27 (Conv2D)           (None, 28, 28, 64)        18496     
_________________________________________________________________
max_pooling2d_13 (MaxPooling (None, 9, 9, 64)          0         
_________________________________________________________________
dropout_14 (Dropout)         (None, 9, 9, 64)          0         
_________________________________________________________________
conv2d_28 (Conv2D)           (None, 9, 9, 64)          36928     
_________________________________________________________________
conv2d_29 (Conv2D)           (None, 9, 9, 64)          36928     
_________________________________________________________________
max_pooling2d_14 (MaxPooling (None, 4, 4, 64)         
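
The numbers visible in the summary can be checked by hand. The sketch below assumes the standard parameter count for a convolution (one weight per kernel cell per input channel, plus one bias per filter) and pooling with stride equal to the pool size and no padding; the helper names are hypothetical. With 'Same' padding the convolutions keep the spatial size, so only the pooling layers shrink it.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable parameters of a conv layer: weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def pooled_size(size, pool):
    """Output side length for pooling with stride == pool size and no padding."""
    return (size - pool) // pool + 1

print(conv_params(3, 3, 1, 32))    # 320
print(conv_params(3, 3, 32, 64))   # 18496
print(conv_params(3, 3, 64, 64))   # 36928
print(pooled_size(28, 3))          # 9
print(pooled_size(9, 2))           # 4
```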

I chose the Adam optimizer. For the learning rate I use a technique called annealing. With a high learning rate the model converges faster, whereas a low learning rate is better at settling into a minimum of the loss. Hence, the learning rate starts high and is halved whenever the validation accuracy has not improved for 3 epochs. <em>This is the particularly tricky part of my model.</em>

In [69]:
cnn_model.compile(loss ='sparse_categorical_crossentropy', optimizer=Adam(lr=0.001),metrics =['accuracy'])

learning_rate_reduction = ReduceLROnPlateau(monitor='val_acc', 
                                            patience=3, 
                                            verbose=1, 
                                            factor=0.5, 
                                            min_lr=0.00001)
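
To make the schedule concrete, here is a simplified plain-Python simulation of a reduce-on-plateau rule using the same patience, factor and min_lr values as above. It is a sketch, not the Keras callback: it ignores the callback's min_delta and cooldown options, and the validation accuracies are made up rather than taken from the training log.

```python
def reduce_on_plateau(val_accs, lr=0.001, patience=3, factor=0.5, min_lr=0.00001):
    """Return the learning rate used in each epoch under a simplified plateau rule."""
    best = float("-inf")
    wait = 0
    lrs = []
    for acc in val_accs:
        lrs.append(lr)                     # learning rate in effect for this epoch
        if acc > best:
            best, wait = acc, 0            # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:           # no improvement for `patience` epochs
                lr = max(lr * factor, min_lr)
                wait = 0
    return lrs

# Made-up validation accuracies for eight epochs.
fake_val_acc = [0.88, 0.90, 0.91, 0.91, 0.905, 0.909, 0.912, 0.911]
schedule = reduce_on_plateau(fake_val_acc)
print(schedule)
```

In this toy run the accuracy stalls after the third epoch, so the rate is halved once the sixth epoch ends and the last two epochs run at 0.0005.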

I set up data augmentation on the training data right before training. Of feature standardization, ZCA whitening, and random zoom, I kept only feature standardization, since it was the only one that improved the model.

In [70]:
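# Feature standardization: center and scale by dataset-wide statistics.
datagen = ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True)
# fit() only computes the mean and std from x_train; they are applied to
# batches drawn through the generator (e.g. datagen.flow(...)).
datagen.fit(x_train)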
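
For reference, feature standardization on single-channel images amounts to subtracting the dataset-wide mean and dividing by the dataset-wide standard deviation. The sketch below shows that arithmetic in plain Python on two made-up 2x2 images; the function name and the small epsilon guarding against division by zero are assumptions for illustration, not the generator's code.

```python
import statistics

def featurewise_standardize(images):
    """images: list of 2-D lists (grayscale). Returns scaled copies, mean and std."""
    pixels = [p for img in images for row in img for p in row]
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)        # population standard deviation
    eps = 1e-6                             # avoids division by zero for constant data
    scaled = [[[(p - mean) / (std + eps) for p in row] for row in img]
              for img in images]
    return scaled, mean, std

tiny = [[[0.0, 1.0], [0.5, 0.5]], [[0.25, 0.75], [0.0, 1.0]]]
scaled, mean, std = featurewise_standardize(tiny)
flat = [p for img in scaled for row in img for p in row]
print(mean, std)                           # 0.5 0.375
print(statistics.fmean(flat), statistics.pstdev(flat))
```

After scaling, the pixels have a mean of about 0 and a standard deviation of about 1 across the whole dataset.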

In [71]:
epochs = 50
batch_size = 64
history = cnn_model.fit(x_train,
                        y_train,
                        batch_size = batch_size,
                        epochs = epochs,
                        verbose = 2,
                        validation_data = (x_validation, y_validation),
                        callbacks = [learning_rate_reduction])

Train on 54000 samples, validate on 6000 samples
Epoch 1/50
 - 252s - loss: 0.5729 - acc: 0.7863 - val_loss: 0.3088 - val_acc: 0.8838
Epoch 2/50
 - 241s - loss: 0.3466 - acc: 0.8731 - val_loss: 0.2925 - val_acc: 0.8880
Epoch 3/50
 - 243s - loss: 0.2981 - acc: 0.8914 - val_loss: 0.2338 - val_acc: 0.9118
Epoch 4/50
 - 245s - loss: 0.2717 - acc: 0.9006 - val_loss: 0.2254 - val_acc: 0.9153
Epoch 5/50
 - 238s - loss: 0.2549 - acc: 0.9060 - val_loss: 0.2135 - val_acc: 0.9202
Epoch 6/50
 - 240s - loss: 0.2408 - acc: 0.9114 - val_loss: 0.2116 - val_acc: 0.9197
Epoch 7/50
 - 237s - loss: 0.2305 - acc: 0.9156 - val_loss: 0.2072 - val_acc: 0.9245
Epoch 8/50
 - 246s - loss: 0.2199 - acc: 0.9191 - val_loss: 0.2023 - val_acc: 0.9248
Epoch 9/50
 - 227s - loss: 0.2122 - acc: 0.9211 - val_loss: 0.1931 - val_acc: 0.9238
Epoch 10/50
 - 856s - loss: 0.2034 - acc: 0.9252 - val_loss: 0.1856 - val_acc: 0.9308
Epoch 11/50
 - 264s - loss: 0.1968 - acc: 0.9277 - val_loss: 0.1854 - val_acc: 0.9288
Epoch 12/50
 -

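Finally, I evaluate the model on the held-out test set. evaluate returns the loss followed by the accuracy metric, so the accuracy is at index 1.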

In [72]:
evaluation = cnn_model.evaluate(x_test, y_test)
print('Test Accuracy : {:.3f}'.format(evaluation[1]))

Test Accuracy : 0.940
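
The accuracy metric here is the fraction of samples whose highest-probability class matches the integer label. A minimal plain-Python sketch of that computation, with made-up softmax outputs and a hypothetical `accuracy` helper:

```python
def accuracy(probabilities, labels):
    """Fraction of samples whose argmax class equals the integer label."""
    correct = 0
    for probs, label in zip(probabilities, labels):
        predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax
        correct += int(predicted == label)
    return correct / len(labels)

# Made-up softmax outputs for four samples and three classes.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
labels = [0, 1, 2, 1]
print('Accuracy : {:.3f}'.format(accuracy(probs, labels)))  # Accuracy : 0.750
```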


# Small Report:

I trained the model on Google Colab to use its GPU. The GPU made it practical to revise my approach several times. In the Docker image, the program runs on a CPU, and one epoch takes almost thirty times as long as it would on a GPU. On the other hand, when my computer broke, I was able to finish the work on my sister's computer without any further installation. That is the advantage of Docker.

The current model reaches an accuracy of 0.9414. I attribute this result to overfitting avoidance, an adaptive learning rate, and data augmentation.