# MNIST fashion classification
**This notebook is a demo of classifying the Fashion-MNIST dataset.**

1. importing CNN models (defined in `image_models.py`) and acquiring data.
2. defining necessary functions, such as the loss function and the train function.
3. comparing hyperparameters (weight decay, learning rate) and training methods.
4. comparing different models with visualization.
5. summarizing.

## 1. Importing CNN models and Acquiring data

In this project, I used MXNet to define my CNN models. Their implementation is in `image_models.py`. The following models are pre-defined:
- Basic MLP (no convolutional layer)
- LeNet
- AlexNet
- VGG 11
- ResNet 18
- ResNet 34

I used the Fashion-MNIST dataset (https://github.com/zalandoresearch/fashion-mnist), a popular image classification dataset for benchmarking machine learning models.

In [11]:
import time  # used by the train function to report the time per epoch
import numpy as np
import matplotlib.pyplot as plt
import mxnet as mx
from mxnet import autograd, gluon, init, nd
from mxnet.gluon import data as gdata, loss as gloss, nn, utils
from image_models import *

train = gdata.vision.FashionMNIST(train=True)
test = gdata.vision.FashionMNIST(train=False)
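
Each image in the dataset carries an integer label from 0 to 9. A small helper like the one below can translate those labels into the class names listed in the Fashion-MNIST repository. `FASHION_MNIST_LABELS` and `label_name` are hypothetical names introduced here for illustration; they are not part of `image_models.py` or MXNet.

```python
# Class names in label order (0-9), as listed in the Fashion-MNIST README.
FASHION_MNIST_LABELS = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def label_name(label):
    """Return the class name for an integer label in the range 0-9."""
    return FASHION_MNIST_LABELS[int(label)]


print(label_name(9))
```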

## 2. Defining necessary functions.

- loss function
- train function
- accuracy function
- data loader & batch size

In [12]:
# loss function
loss = gloss.SoftmaxCrossEntropyLoss()

# If your machine has a GPU, keep the context set to the GPU.
# If it does not, uncomment the CPU line below and comment out the GPU line.
# context = mx.cpu()
context = mx.gpu()

# train function
def train(net, train_iter, test_iter, batch_size, trainer, num_epochs, loss):
    # iterate through epochs
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()
        for X, y in train_iter:
            y = y.as_in_context(context)
            # record the forward pass so that gradients can be computed
            with autograd.record():
                y_hat = net(X.as_in_context(context))
                l = loss(y_hat, y).sum()
            l.backward()
            # update the parameters; the gradient is rescaled by 1 / batch_size
            trainer.step(batch_size)
            y = y.astype('float32')
            train_l_sum += l.asscalar()
            train_acc_sum += (y_hat.argmax(axis=1) == y).sum().asscalar()
            n += y.size
        # evaluate_accuracy takes (data_iter, net) and uses the global context
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d --> loss %.4f, train acc %.3f, test acc %.3f, '
              'time %.1f sec'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc,
                 time.time() - start))

# accuracy function
def evaluate_accuracy(data_iter, net):
    acc_sum, n = nd.array([0], ctx=context), 0
    for X, y in data_iter:
        X = X.as_in_context(context)
        y = y.as_in_context(context).astype('float32')
        acc_sum += (net(X).argmax(axis=1) == y).sum()
        n += y.size
    return acc_sum.asscalar() / n

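# data loader, batch size, num_epochs
batch_size = 256
# Assumption: the num_epochs value is missing in the original; 5 is a placeholder.
num_epochs = 5
# Assumption: num_workers is undefined in the original; 0 loads data in the main process.
num_workers = 0
# `train` is now the training function defined above, so `train.transform_first`
# raised "AttributeError: 'function' object has no attribute 'transform_first'".
# Load the datasets again under names that do not collide with it.
mnist_train = gdata.vision.FashionMNIST(train=True)
mnist_test = gdata.vision.FashionMNIST(train=False)
transformer = gdata.vision.transforms.ToTensor()
train_iter = gdata.DataLoader(mnist_train.transform_first(transformer),
                              batch_size, shuffle=True,
                              num_workers=num_workers)
test_iter = gdata.DataLoader(mnist_test.transform_first(transformer),
                             batch_size, shuffle=False,
                             num_workers=num_workers)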
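
To make the quantities tracked in `train` concrete, here is a minimal pure-Python sketch of what softmax cross-entropy and accuracy compute for a tiny batch. It does not use MXNet. `softmax_cross_entropy` and `accuracy` are hypothetical helper names, and the logits are made-up numbers for illustration.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def accuracy(batch_logits, labels):
    """Fraction of rows whose arg-max index matches the label."""
    correct = sum(
        1 for logits, y in zip(batch_logits, labels)
        if max(range(len(logits)), key=logits.__getitem__) == y
    )
    return correct / len(labels)


# Made-up logits for three examples and three classes.
batch_logits = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0], [1.0, 1.5, 0.3]]
labels = [0, 2, 0]

losses = [softmax_cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
print(sum(losses) / len(losses), accuracy(batch_logits, labels))
```

As in `train`, the per-example losses are summed over the batch and divided by the number of examples to report an average. Here two of the three predictions match their labels.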

## 3. Comparing hyperparameters and train methods.

We have the following hyperparameters and training methods to tune.

- learning rate
    - 1, 0.1, 0.01
- trainer function
    - SGD
    - Adam
- weight decay
    - 0.3 ~ 0.5
- batch normalization
- activation layer
    - Sigmoid
    - ReLU (https://en.wikipedia.org/wiki/Rectifier_(neural_networks))
- pooling type
    - average pooling
    - max pooling
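
The last two choices in the list can be illustrated without any framework. The sketch below is plain Python, not the MXNet layers used by the models; `sigmoid`, `relu`, `pool2x2` and `mean` are hypothetical helpers and the 4x4 input is made up.

```python
import math


def sigmoid(x):
    """Squash x into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Pass positive values through and clamp negatives to zero."""
    return max(0.0, x)


def mean(values):
    return sum(values) / len(values)


def pool2x2(image, reduce_fn):
    """Apply non-overlapping 2x2 pooling to a list-of-lists image."""
    return [
        [reduce_fn([image[r][c], image[r][c + 1],
                    image[r + 1][c], image[r + 1][c + 1]])
         for c in range(0, len(image[0]) - 1, 2)]
        for r in range(0, len(image) - 1, 2)
    ]


image = [[1.0, 2.0, 0.0, 4.0],
         [3.0, 4.0, 0.0, 0.0],
         [5.0, 1.0, 2.0, 2.0],
         [1.0, 1.0, 2.0, 2.0]]

print(pool2x2(image, max))   # max pooling keeps the largest value per window
print(pool2x2(image, mean))  # average pooling keeps the window mean
```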
    
**We will tune the hyperparameters using LeNet as the base model.**

*For dropout, I set the rate to 0.5 as the default.*
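
One way to keep track of the combinations listed above is to enumerate them as a grid. The sketch below only builds the configurations and does not train anything. The dictionary keys, the option strings, and the three weight-decay values sampled from the 0.3 ~ 0.5 range are assumptions made for illustration.

```python
from itertools import product

# Search space taken from the list above; key names are hypothetical.
search_space = {
    "lr": [1, 0.1, 0.01],
    "optimizer": ["sgd", "adam"],
    "wd": [0.3, 0.4, 0.5],  # assumed samples from the 0.3 ~ 0.5 range
    "batch_norm": [False, True],
    "activation": ["sigmoid", "relu"],
    "pooling": ["avg", "max"],
}


def grid(space):
    """Yield one dict per combination of the values in `space`."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(grid(search_space))
print(len(configs))  # 3 * 2 * 3 * 2 * 2 * 2 = 144
```

A full grid grows quickly, which is one reason to vary a single setting at a time while holding the others fixed, as section 3-1 does for the learning rate.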

### 3-1. Learning rate

In [None]:
lrs = [1, 0.1, 0.01]
for lr in lrs:
    