# Fashion-MNIST classification
**This notebook is a demo of classifying the Fashion-MNIST dataset.**

1. importing CNN models (defined in `image_models.py`) and acquiring data.
2. defining necessary functions, such as loss function or train function.
3. comparing hyperparameters (weight decay, learning rate) and train methods.
4. comparing different models with visualization.
5. summarizing.

## 1. Importing CNN models and Acquiring data

In this project, I used MXNet's Gluon API to define my CNN models. Please check their implementation in `image_models.py`. The following models are pre-defined.
- Basic MLP (no convolutional layer)
- LeNet
- AlexNet
- VGG 11
- ResNet 18
- ResNet 34

I used the Fashion-MNIST dataset (https://github.com/zalandoresearch/fashion-mnist), a popular image classification dataset for benchmarking machine learning models. It contains 28x28 grayscale images of clothing items in 10 classes.

In [13]:
import numpy as np
import matplotlib.pyplot as plt
import time
import mxnet as mx
from mxnet import autograd, gluon, init, nd
from mxnet.gluon import data as gdata, loss as gloss, nn, utils
from image_models import *

train_data = gdata.vision.FashionMNIST(train=True)
test_data = gdata.vision.FashionMNIST(train=False)
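
The labels in the dataset are integers from 0 to 9. As a small convenience, the sketch below maps them to the class names given in the Fashion-MNIST README. The helper name `label_name` is my own and is not part of MXNet or `image_models.py`.

```python
# Class names from the Fashion-MNIST README, indexed by integer label.
FASHION_MNIST_LABELS = [
    'T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
    'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot',
]


def label_name(label):
    """Return the text name for an integer label (hypothetical helper)."""
    return FASHION_MNIST_LABELS[int(label)]


print(label_name(0))
print(label_name(9))
```

Assuming each dataset item is an `(image, label)` pair, something like `label_name(train_data[0][1])` should give the name of the first training example.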

## 2. Defining necessary functions.

- loss function
- train function
- accuracy function
- data loader & batch size

In [18]:
# loss function
# since this is multi-class classification, we use softmax cross-entropy loss.
loss = gloss.SoftmaxCrossEntropyLoss()

# device context: the default below assumes a GPU is available.
# on a machine without a GPU, use `context = mx.cpu()` instead of `mx.gpu()`.
# context = mx.cpu()
context = mx.gpu()

# train function
def train(net, train_iter, test_iter, batch_size, trainer, num_epochs, loss):
    # iterate through epochs
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()
        for X, y in train_iter:
            y = y.as_in_context(context)
            with autograd.record():
                y_hat = net(X.as_in_context(context))
                l = loss(y_hat, y).sum()
            l.backward()
            trainer.step(batch_size)
            y = y.astype('float32')
            train_l_sum += l.asscalar()
            train_acc_sum += (y_hat.argmax(axis=1) == y).sum().asscalar()
            n += y.size
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d --> loss %.4f, train acc %.3f, test acc %.3f, '
              'time %.1f sec'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc,
                 time.time() - start))

# accuracy function (that returns average accuracy)
def evaluate_accuracy(data_iter, net):
    # we have to use nd array for cumulative accuracy
    cum_acc = nd.array([0], ctx=context)
    cum_size = 0
    for X, y in data_iter:
        X = X.as_in_context(context) 
        y = y.as_in_context(context).astype('float32')
        cum_acc += (net(X).argmax(axis=1) == y).sum()
        cum_size += y.size
    
    # return the average accuracy as scalar
    return cum_acc.asscalar() / cum_size

# data loader & batch size
batch_size = 128
transformer = gdata.vision.transforms.ToTensor()
train_iter = gdata.DataLoader(train_data.transform_first(transformer),
                              batch_size, shuffle=True)
test_iter = gdata.DataLoader(test_data.transform_first(transformer),
                             batch_size, shuffle=False)
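
To make the loss function less of a black box, here is a minimal pure-Python sketch of what softmax cross-entropy computes for a single example. It is not MXNet's implementation; the function names are my own. It also shows that a model giving equal scores to all 10 classes has a loss of ln(10), about 2.30, which is roughly where the training logs in this notebook start.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(logits, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[label])


# An untrained 10-class model with identical scores for every class.
uniform_loss = softmax_cross_entropy([0.0] * 10, label=3)
print(round(uniform_loss, 4))  # 2.3026, i.e. ln(10)
```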

## 3. Comparing hyperparameters and train methods.

We have the following hyperparameters and training methods to tune.

- learning rate
    - 1, 0.1, 0.01
- trainer function
    - SGD
    - Adam (https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam)
- weight decay
    - 0.3 ~ 0.5
- batch normalization
- activation layer
    - Sigmoid
    - ReLU (https://en.wikipedia.org/wiki/Rectifier_(neural_networks))
- pooling type
    - average pooling
    - max pooling
    
**We will tune the hyperparameter with the base model of LeNet.**

*The `get_LeNet` function has the following default parameters, as shown in `image_models.py`:*
- *pooling='avg'*
- *activation='sigmoid'*
- *batch_norm=False*


*For dropout, I set the rate to 0.5 (the default).*
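
To illustrate the two activation options, here is a small pure-Python sketch of sigmoid and ReLU together with their derivatives. The function names are my own and this is not how `image_models.py` defines them. The point it shows is that the sigmoid gradient never exceeds 0.25 and shrinks quickly away from zero, while the ReLU gradient is 1 for any positive input.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of sigmoid: s * (1 - s), at most 0.25 (at x = 0)."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


for x in (-4.0, 0.0, 4.0):
    print(x, round(sigmoid_grad(x), 4), relu_grad(x))
```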

### 3-1. Learning rate

In [19]:
lrs = [1, 0.1, 0.01]
for lr in lrs:
    print("trying lr: {}".format(lr))
    print("-------------------------")
    # with default parameters 
    LeNet = get_LeNet()
    num_epochs = 10
    LeNet.initialize(force_reinit=True, ctx=context, init=init.Xavier())
    trainer = gluon.Trainer(LeNet.collect_params(), 'sgd', {'learning_rate': lr})
    train(LeNet, train_iter, test_iter, batch_size, trainer, num_epochs, loss)
    print("-------------------------\n")

trying lr: 1
-------------------------
epoch 1 --> loss 2.3129, train acc 0.100, test acc 0.100, time 6.2 sec
epoch 2 --> loss 2.0652, train acc 0.196, test acc 0.583, time 5.9 sec
epoch 3 --> loss 0.8443, train acc 0.664, test acc 0.714, time 6.1 sec
epoch 4 --> loss 0.6299, train acc 0.754, test acc 0.772, time 5.8 sec
epoch 5 --> loss 0.5433, train acc 0.791, test acc 0.791, time 5.8 sec
epoch 6 --> loss 0.4899, train acc 0.813, test acc 0.821, time 5.8 sec
epoch 7 --> loss 0.4520, train acc 0.830, test acc 0.840, time 5.9 sec
epoch 8 --> loss 0.4235, train acc 0.842, test acc 0.856, time 5.9 sec
epoch 9 --> loss 0.4021, train acc 0.850, test acc 0.859, time 5.9 sec
epoch 10 --> loss 0.3811, train acc 0.858, test acc 0.864, time 6.0 sec
-------------------------

trying lr: 0.1
-------------------------
epoch 1 --> loss 2.3079, train acc 0.101, test acc 0.100, time 6.0 sec
epoch 2 --> loss 2.3044, train acc 0.105, test acc 0.100, time 5.9 sec
epoch 3 --> loss 2.2693, train acc 0.162

**Learning rates of 1 and 0.1 both seem usable, but since we plan to train for more epochs later, we will use 0.1 as our learning rate.**

In [23]:
lr = 0.1
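
To make the role of the learning rate concrete, here is a minimal sketch of one plain SGD update for a single scalar weight. It assumes the behavior I understand `trainer.step(batch_size)` to have in the `train` function: the gradient of the summed batch loss is divided by `batch_size`, and weight decay adds a `wd * weight` term. The helper `sgd_step` and the numbers are made up for illustration.

```python
def sgd_step(weight, grad_sum, lr, batch_size, wd=0.0):
    """One plain SGD update for a single scalar weight (hypothetical helper).

    grad_sum is the gradient of the summed batch loss, so it is divided by
    batch_size to get the mean gradient; wd adds an L2 penalty term.
    """
    mean_grad = grad_sum / batch_size
    return weight - lr * (mean_grad + wd * weight)


w = 0.5
for step_lr in (1, 0.1, 0.01):
    # Same gradient, different step sizes.
    print(step_lr, sgd_step(w, grad_sum=25.6, lr=step_lr, batch_size=128))
```

With the same gradient, a learning rate of 1 moves the weight 100 times further than 0.01, which is why large rates learn faster in early epochs but can become unstable later.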

### 3-2. Trainer function

In [24]:
# trainer types
t_types = ['sgd', 'adam']
for t_type in t_types:
    print("trying trainer type: ", t_type)
    print("-------------------------")
    # with default parameters 
    LeNet = get_LeNet()
    num_epochs = 10
    LeNet.initialize(force_reinit=True, ctx=context, init=init.Xavier())
    trainer = gluon.Trainer(LeNet.collect_params(), t_type, 
                            {'learning_rate': lr})
    train(LeNet, train_iter, test_iter, batch_size, trainer, num_epochs, loss)
    print("-------------------------\n")

trying trainer type:  sgd
-------------------------
epoch 1 --> loss 2.3105, train acc 0.099, test acc 0.100, time 5.8 sec
epoch 2 --> loss 2.3074, train acc 0.100, test acc 0.100, time 5.8 sec
epoch 3 --> loss 2.3020, train acc 0.111, test acc 0.170, time 5.8 sec
epoch 4 --> loss 2.1054, train acc 0.277, test acc 0.514, time 5.8 sec
epoch 5 --> loss 1.2565, train acc 0.548, test acc 0.585, time 5.9 sec
epoch 6 --> loss 1.0079, train acc 0.611, test acc 0.654, time 5.9 sec
epoch 7 --> loss 0.8955, train acc 0.666, test acc 0.686, time 5.8 sec
epoch 8 --> loss 0.8332, train acc 0.691, test acc 0.683, time 5.8 sec
epoch 9 --> loss 0.7834, train acc 0.711, test acc 0.721, time 6.0 sec
epoch 10 --> loss 0.7340, train acc 0.727, test acc 0.726, time 6.7 sec
-------------------------

trying trainer type:  adam
-------------------------
epoch 1 --> loss 2.4372, train acc 0.100, test acc 0.100, time 6.8 sec
epoch 2 --> loss 2.4034, train acc 0.098, test acc 0.100, time 6.8 sec
epoch 3 --> los

**We will use SGD as the trainer in this demo.**
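
One thing to keep in mind when reading this comparison: Adam is normally run with a much smaller learning rate than SGD (MXNet's default for Adam is 0.001), while both trainers here use 0.1. The sketch below uses the textbook Adam update, which may differ in small details from MXNet's implementation, to show that the first Adam step has a size of about `lr` regardless of how large the gradient is. This is one plausible reason, not a confirmed one, for the flat accuracy in the Adam log above.

```python
import math


def adam_first_step(grad, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """Size of the first Adam update for one scalar weight (textbook form)."""
    m = (1 - beta1) * grad        # first-moment estimate after step 1
    v = (1 - beta2) * grad ** 2   # second-moment estimate after step 1
    m_hat = m / (1 - beta1)       # bias correction with t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + eps)


# A tiny gradient and a large gradient give nearly the same step size.
for grad in (0.001, 5.0):
    print(grad, adam_first_step(grad, lr=0.1))
```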

### 3-3. Weight Decay

## 4. Comparing different models with visualization.

We will compare the following neural network models, using the hyperparameters chosen in part 3.

- Basic MLP (no convolutional layer)
- LeNet
- AlexNet
- VGG 11
- ResNet 18
- ResNet 34

I will import the models from `image_models.py`.