# Fashion MNIST image recognition with cross validation
## Yijing Xiao

## https://www.kaggle.com/zalando-research/fashionmnist

In the last blog post, I applied a CNN model to Fashion MNIST. Not surprisingly, the model did not achieve as high an accuracy as it did on the MNIST handwritten digit recognition task.

In this post, I apply [Cross Validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)), hoping to get a better model for the Fashion MNIST task.

In [1]:
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

- Data preparation

In [2]:
data = pd.read_csv("fashion-mnist_train.csv")
data.head()
image_size = data.iloc[:, 1:].values.shape[1]
image_width = image_height = np.ceil(np.sqrt(image_size)).astype(np.uint8)
image_width
num_data = data.shape[0]

## Standard Convolutional Neural Network 

borrowed from https://github.com/pytorch/examples/blob/master/mnist/main.py


The CNN model is exactly the same as before.

In [4]:
BATCH_SIZE = 32
def eval(model, data):
    '''
        args:
            data: DataFrame with one image per row (label in column 0, then 784 pixels)
        returns:
            (number of correct predictions, number of examples)
    '''
    model.eval()  # switch off dropout
    correct_count = 0.
    total_count = 0.
    with torch.no_grad():  # no gradients are needed for evaluation
        for i in range(0, data.shape[0], BATCH_SIZE):
            batch_data = data.iloc[i:i+BATCH_SIZE, :].values
            x = torch.from_numpy(batch_data[:, 1:]).float() # B * 784
            y = torch.from_numpy(batch_data[:, 0]) # B
            pred = model(x)
            # predicted class = argmax over the 10 log-probabilities
            correct_count += (pred.argmax(dim=1) == y).sum().item()
            total_count += batch_data.shape[0]
    return correct_count, total_count
        

In [3]:
# define a neural network, convolutional neural network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = x.view(x.size(0), 1, int(image_height), int(image_width)) # B * 1 * 28 * 28
        x = F.relu(F.max_pool2d(self.conv1(x), 2)) # B * 10 * 12 * 12
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2)) # B * 20 * 4 * 4
        x = x.view(-1, 320) 
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)  # normalize over the class dimension



## Cross Validation

Cross validation happens here. I apply k-fold cross validation with k = 10: I split the training set evenly into 10 folds. In each training run, I hold out one fold as the dev set and train on the remaining 9 folds. As before, I do early stopping on the dev set while training, keeping the checkpoint with the best dev accuracy. This process is repeated 10 times, once per held-out fold, so we get 10 models.
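As a standalone illustration of the split, here is a minimal sketch that produces the train/dev row indices for contiguous, unshuffled folds. The helper name `kfold_indices` and the toy sizes are assumptions made for this example; the notebook cell that follows builds the same kind of split directly on the DataFrame index.

```python
import numpy as np

def kfold_indices(num_rows, num_folds=10):
    """Yield (train_idx, dev_idx) pairs for contiguous, unshuffled folds."""
    # array_split also copes with num_rows not being divisible by num_folds
    folds = np.array_split(np.arange(num_rows), num_folds)
    for k in range(num_folds):
        dev_idx = folds[k]
        # every fold except the k-th one goes into the training set
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        yield train_idx, dev_idx

# Toy usage with hypothetical sizes: 25 rows, 5 folds
splits = list(kfold_indices(25, num_folds=5))
for k, (train_idx, dev_idx) in enumerate(splits):
    print(k, len(train_idx), dev_idx.tolist())
```

Every row lands in the dev set exactly once, so each of the resulting models is validated on data it never trained on.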

In [6]:
num_dev_data = int(num_data * 0.1)
for split in range(10):
    dev_index = data.index.isin(list(range(num_dev_data*split, num_dev_data*(split+1))))
    train_data = data[~dev_index]
    dev_data = data[dev_index]

    # instantiate the model
    model = Net()
    
    learning_rate = 0.005
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    best_acc = 0.
    # start training
    for epoch in range(20):
        model.train()
        for i in range(0, train_data.shape[0], BATCH_SIZE):
            batch_data = train_data.iloc[i:i+BATCH_SIZE, :].values
            x = batch_data[:, 1:] # 32 * 784
            y = batch_data[:, 0] # 32
            x = Variable(torch.from_numpy(x)).float()
            y = Variable(torch.from_numpy(y))
            pred = model(x)
            optimizer.zero_grad()
            loss = F.nll_loss(pred, y)
            loss.backward()
            optimizer.step()
        correct_count, total_count = eval(model, dev_data)
        acc = correct_count / total_count
        print("dev acc: {}".format(acc))
        if acc > best_acc:
            best_acc = acc
            print("save the model")
            torch.save(model.state_dict(), "model-cross-validate/model-{}.th".format(split))
        else:
            learning_rate *= 0.8
            optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate) 

dev acc: 0.5945
save the model
dev acc: 0.6318333333333334
save the model
dev acc: 0.6103333333333333
dev acc: 0.6651666666666667
save the model
dev acc: 0.6885
save the model
dev acc: 0.5778333333333333
dev acc: 0.6818333333333333
dev acc: 0.7003333333333334
save the model
dev acc: 0.6788333333333333
dev acc: 0.6648333333333334
dev acc: 0.7033333333333334
save the model
dev acc: 0.7135
save the model
dev acc: 0.7331666666666666
save the model
dev acc: 0.7343333333333333
save the model
dev acc: 0.725
dev acc: 0.739
save the model
dev acc: 0.7388333333333333
dev acc: 0.7133333333333334
dev acc: 0.7321666666666666
dev acc: 0.7421666666666666
save the model
dev acc: 0.7543333333333333
save the model
dev acc: 0.7735
save the model
dev acc: 0.8011666666666667
save the model
dev acc: 0.7731666666666667
dev acc: 0.7756666666666666
dev acc: 0.7836666666666666
dev acc: 0.7848333333333334
dev acc: 0.7823333333333333
dev acc: 0.792
dev acc: 0.8071666666666667
save the model
dev acc: 0.805
dev acc
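The checkpoint selection and learning-rate decay used in the training loop can be isolated into a small function. The sketch below replays the first six dev accuracies from the log above; `track_best` is a hypothetical helper written for this illustration, not part of the notebook. Note that the loop never stops early: it runs all epochs and keeps only the checkpoint with the best dev accuracy.

```python
def track_best(dev_accs, lr=0.005, decay=0.8):
    """Replay the keep-best / decay-on-no-improvement rule of the training loop."""
    best_acc = 0.0
    saved_epochs = []  # epochs at which a checkpoint would be written
    for epoch, acc in enumerate(dev_accs):
        if acc > best_acc:
            best_acc = acc
            saved_epochs.append(epoch)
        else:
            lr *= decay  # shrink the learning rate when dev accuracy does not improve
    return best_acc, saved_epochs, lr

# First six dev accuracies of the first fold, copied from the log
accs = [0.5945, 0.6318333333333334, 0.6103333333333333,
        0.6651666666666667, 0.6885, 0.5778333333333333]
best_acc, saved_epochs, final_lr = track_best(accs)
print(best_acc, saved_epochs, final_lr)
```

The saved epochs line up with the "save the model" messages in the log, and the learning rate has been decayed twice after these six epochs.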

## Load the models for testing

At test time, I load all ten models and keep them in a list. To make a prediction on a single instance, I apply each of the 10 models to get its scores (log-probabilities, since the network ends in `log_softmax`), add the scores up, and take the argmax over the summed scores as the final prediction.

In [5]:
models = []
for split in range(10):
    model = Net()
    model.load_state_dict(torch.load("model-cross-validate/model-{}.th".format(split)))
    models.append(model)

In [6]:
def test(models, data):
    '''
        args:
            data: DataFrame with one image per row (label in column 0, then 784 pixels)
    '''
    for model in models:
        model.eval()  # switch off dropout in every model
    correct_count = 0.
    total_count = 0.
    with torch.no_grad():  # no gradients are needed at test time
        for i in range(0, data.shape[0], BATCH_SIZE):
            batch_data = data.iloc[i:i+BATCH_SIZE, :].values
            x = torch.from_numpy(batch_data[:, 1:]).float() # B * 784
            y = torch.from_numpy(batch_data[:, 0]) # B
            # sum the log-probabilities of all models: B * 10
            preds = sum(model(x) for model in models)
            correct_count += (preds.argmax(dim=1) == y).sum().item()
            total_count += batch_data.shape[0]
    return correct_count, total_count
        

In [7]:

test_data = pd.read_csv("fashion-mnist_test.csv")
correct_count, total_count = test(models, test_data)
print("test acc: {}".format(correct_count/total_count))

test acc: 0.8267
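Because each network ends in `log_softmax`, the scores being summed are log-probabilities, so the argmax of the sum picks the class with the largest product of per-model probabilities. The sketch below checks this with NumPy on random logits; the three "models", the batch size, and the `log_softmax` helper are made up for the illustration and are not the trained networks.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

rng = np.random.default_rng(0)
# Hypothetical setup: 3 models, a batch of 4 examples, 10 classes
logits = rng.normal(size=(3, 4, 10))
log_probs = log_softmax(logits)           # shape (3, 4, 10)

# Ensemble by summing log-probabilities over the model axis
by_sum = log_probs.sum(axis=0).argmax(axis=1)

# Same decision expressed as a product of probabilities
by_product = np.exp(log_probs).prod(axis=0).argmax(axis=1)

print(by_sum, by_product)
```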


There is another way to do it: we can apply each model to the prediction task and take the argmax to get a single prediction from each model. Then we hold a vote among the ten models, and the majority wins.

In [19]:
from scipy.stats import mode


def test_vote(models, data):
    '''
        args:
            data: DataFrame with one image per row (label in column 0, then 784 pixels)
    '''
    for model in models:
        model.eval()  # switch off dropout in every model
    correct_count = 0.
    total_count = 0.
    with torch.no_grad():  # no gradients are needed at test time
        for i in range(0, data.shape[0], BATCH_SIZE):
            batch_data = data.iloc[i:i+BATCH_SIZE, :].values
            x = torch.from_numpy(batch_data[:, 1:]).float() # B * 784
            y = batch_data[:, 0] # B
            # one column of predicted labels per model: B * num_models
            preds = torch.stack([model(x).argmax(dim=1) for model in models], dim=1).numpy()
            # most frequent label in each row
            votes = mode(preds, axis=-1)[0].reshape(-1)
            correct_count += (y == votes).sum()
            total_count += batch_data.shape[0]
    return correct_count, total_count

In [20]:
test_data = pd.read_csv("fashion-mnist_test.csv")
correct_count, total_count = test_vote(models, test_data)
print("test acc with majority vote: {}".format(correct_count/total_count))

test acc with majority vote: 0.8172


In this case, majority vote (0.8172) is slightly worse than summing the scores (0.8267).
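One way the two schemes can disagree is when a confident model is outvoted by several unsure ones. The toy example below uses made-up probabilities for three hypothetical models on a single two-class example; it is only an illustration, not a measurement on the Fashion MNIST models.

```python
import numpy as np

# Made-up class probabilities: rows are models, columns are classes
probs = np.array([
    [0.55, 0.45],   # model 0 weakly prefers class 0
    [0.55, 0.45],   # model 1 weakly prefers class 0
    [0.01, 0.99],   # model 2 strongly prefers class 1
])

# Scheme 1: sum the log-probabilities, then argmax
sum_choice = int(np.log(probs).sum(axis=0).argmax())

# Scheme 2: each model votes for its own argmax, the majority wins
votes = probs.argmax(axis=1)
vote_choice = int(np.bincount(votes, minlength=probs.shape[1]).argmax())

print(sum_choice, vote_choice)
```

Summing keeps the information about how confident each model is, while voting throws it away, which is one plausible reason for the gap between the two test accuracies.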