****Part III**** - Finally, take the following
[dataset](https://www.dropbox.com/s/otc12z2w7f7xm8z/mnistTask3.zip),
train on this dataset and provide test accuracy on the MNIST test set,
using the same test split from Part 2. Train both from scratch (random
initialization) and starting from the pretrained network from Part 1. Do
the same analysis as in Part 2 and report what happens this time. Try to
do a qualitative analysis of what is different in this dataset. Please
save your model checkpoints.

****Dataset Overview****

At first glance the dataset appears to be MNIST itself: it contains 60k
images and the classes \[0-9\]. On closer inspection, however, the
images have been placed in the wrong folders, so that each class folder
contains images of every digit except its own. For example, the folder
for `0` contains images of the digits 1-9, the folder for `1` contains
0 and 2-9, and so on.
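To make the structure of this corruption concrete, the following sketch simulates it on synthetic labels. It assumes that each image was moved to one of the nine wrong folders uniformly at random; the observation above only says that each folder holds the other nine digits, so the uniform choice and the helper names `corrupt_label` and `build_folders` are illustrative assumptions, not the procedure used to build the provided dataset.

```python
import random

NUM_CLASSES = 10

def corrupt_label(true_label: int, rng: random.Random) -> int:
    """Return a label drawn uniformly from every class except `true_label` (assumed scheme)."""
    # An offset in 1..9 with wrap-around can never map a label onto itself.
    offset = rng.randint(1, NUM_CLASSES - 1)
    return (true_label + offset) % NUM_CLASSES

def build_folders(true_labels, seed=0):
    """Map each folder name (assigned label) to the true digits that end up in it."""
    rng = random.Random(seed)
    folders = {k: [] for k in range(NUM_CLASSES)}
    for y in true_labels:
        folders[corrupt_label(y, rng)].append(y)
    return folders

# 600 synthetic samples per digit stand in for the real MNIST labels.
labels = [d for d in range(NUM_CLASSES) for _ in range(600)]
folders = build_folders(labels)
for name, contents in folders.items():
    print(name, sorted(set(contents)))  # every digit except `name`
```

Each printed line lists the nine digits other than the folder's own name, matching the structure described above.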

Since I had already built a framework for plugging in different
datasets, I reused it and wrote a DataModule (`MNISTWrongModule`) to
load the provided dataset. As required, the test set is the MNIST test
split, and the network is first trained from scratch (random
initialization).

In [None]:
import os
import argparse

import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

from numbers_and_letters import NumbersAndLettersCNN, NumbersAndLettersModule
from mnist import MNISTModule
from mnist_wrong import MNISTWrongModule

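parser = argparse.ArgumentParser()
parser.add_argument('--dataset', default='numbers_and_letters')
# store_true turns --pretrained into a real boolean switch
parser.add_argument('--pretrained', action='store_true')
# parse_known_args ignores the extra arguments Jupyter passes to the kernel
args, _ = parser.parse_known_args()

# Part III always uses the mislabeled dataset
args.dataset = 'mnist_wrong'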

SAVE_PATH = "models/"
MODEL_NAME = '5conv1fc_mnist_wrong'
LOAD_MODEL_NAME = '5conv1fc_numbers'
BATCH_SIZE = 32
NUMBERS_ONLY = True

if args.dataset == 'numbers_and_letters':
    BASE_DIR = "train"
    INPUT_DIM = torch.tensor([3, 900, 1200])

    # Create DataModule to handle loading of dataset
    data_module = NumbersAndLettersModule(BASE_DIR, BATCH_SIZE, NUMBERS_ONLY)
    model = NumbersAndLettersCNN(INPUT_DIM, len(data_module.img_labels),
                                 data_module.img_labels, NUMBERS_ONLY)
elif args.dataset == 'mnist':
    data_module = MNISTModule(BATCH_SIZE)
    INPUT_DIM = torch.tensor([1, 28, 28])
    model = NumbersAndLettersCNN(INPUT_DIM, 10, ['0','1','2','3','4',
                                                 '5','6','7','8','9'], NUMBERS_ONLY)
elif args.dataset == 'mnist_wrong':
    data_module = MNISTWrongModule(BATCH_SIZE)
    INPUT_DIM = torch.tensor([1, 28, 28])
    model = NumbersAndLettersCNN(INPUT_DIM, 10, ['0','1','2','3','4',
                                                 '5','6','7','8','9'], NUMBERS_ONLY)
else:
    raise ValueError(f"Invalid dataset choice: {args.dataset}")

# Log metrics to WandB
wandb_logger = pl.loggers.WandbLogger(save_dir='logs/',
                                      name=MODEL_NAME,
                                      project='midas-task-2')
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=1
)

if args.pretrained:
    model.load_state_dict(torch.load(os.path.join(SAVE_PATH, LOAD_MODEL_NAME),
                                     map_location=torch.device('cuda')))

trainer = pl.Trainer(gpus=1, logger=wandb_logger,
                     callbacks=[early_stopping])
trainer.fit(model, data_module)
trainer.test(model=model, datamodule=data_module)

# Save model
torch.save(model.state_dict(), os.path.join(SAVE_PATH, MODEL_NAME))
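A note on the `--pretrained` flag: an argparse option declared with only `default=False` and no `type` or `action` receives whatever string is passed on the command line, so `--pretrained False` yields the non-empty, and therefore truthy, string `'False'`. This standalone check illustrates the pitfall and the usual `action='store_true'` alternative; the parser names `naive` and `fixed` are only for illustration.

```python
import argparse

# Declared with only a default: command-line values arrive as strings.
naive = argparse.ArgumentParser()
naive.add_argument('--pretrained', default=False)

# Boolean switch: True when the flag is present, False otherwise.
fixed = argparse.ArgumentParser()
fixed.add_argument('--pretrained', action='store_true')

print(bool(naive.parse_args(['--pretrained', 'False']).pretrained))  # True: 'False' is a non-empty string
print(fixed.parse_args([]).pretrained)                               # False
print(fixed.parse_args(['--pretrained']).pretrained)                 # True
```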


****Summary****

| Metric                  | Result     |
|-------------------------|------------|
| Test Accuracy (%)       | 0.20       |
| Validation Accuracy (%) | 11.55      |
| Epochs trained          | 4          |
| Training time           | 6 min 16 s |

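As expected, the model performs very poorly on the MNIST test set.
Every folder contains images of the nine other digits and none of its
own, so no training image carries its correct label and the network has
no way to learn the correct digit-to-class mapping. Even a powerful
model such as a CNN cannot learn a useful classifier from labels that
are systematically wrong, and this task illustrates that.

The validation accuracy, measured against the provided (wrong) labels,
sits at about 11%, close to 1/9 ≈ 11.1%: the best achievable when each
image's label is one of its nine wrong classes chosen at random. The
test accuracy of 0.20%, however, is far below the 10% expected from
random guessing among 10 classes. This suggests that the network does
learn to tell the digits apart, but learns to predict any class except
the correct one.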
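These reference points can be checked with a short exact calculation. The sketch assumes each image's wrong label is uniform over the nine other classes and that the ten digits are equally frequent, and compares two idealized predictors: a uniform random guess, and a predictor that recognizes the digit perfectly but outputs some other class. The function names are hypothetical.

```python
from fractions import Fraction

NUM_CLASSES = 10

def wrong_label_dist(d):
    """Assumed label distribution for an image of digit d: uniform over the 9 other classes."""
    return [Fraction(0) if k == d else Fraction(1, NUM_CLASSES - 1)
            for k in range(NUM_CLASSES)]

def expected_accuracy(predict_dist, against_wrong_labels):
    """Exact expected accuracy, averaged over the 10 digits (assumed equally frequent).

    predict_dist(d) gives the predictor's probability for each class on an image of digit d.
    With wrong labels (validation) the target is drawn from wrong_label_dist(d);
    with true labels (MNIST test set) the target is d itself.
    """
    total = Fraction(0)
    for d in range(NUM_CLASSES):
        p = predict_dist(d)
        if against_wrong_labels:
            q = wrong_label_dist(d)
            total += sum(pk * qk for pk, qk in zip(p, q))
        else:
            total += p[d]
    return total / NUM_CLASSES

def uniform_guess(d):
    # Ignores the image entirely.
    return [Fraction(1, NUM_CLASSES)] * NUM_CLASSES

def avoid_true_digit(d):
    # Recognises the digit, then spreads its guesses over the other 9 classes
    # (also the cross-entropy optimum under the assumed wrong labels).
    return wrong_label_dist(d)

for name, predictor in [("uniform guess", uniform_guess),
                        ("avoid true digit", avoid_true_digit)]:
    val = float(expected_accuracy(predictor, against_wrong_labels=True))
    test = float(expected_accuracy(predictor, against_wrong_labels=False))
    print(f"{name:>16}: validation {val:.2%}, test {test:.2%}")
```

It reports 10.00% on both splits for the uniform guess, and 11.11% validation / 0.00% test for the second predictor. The measured 11.55% validation / 0.20% test accuracy is much closer to the second predictor than to random guessing.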

Finally, the training is repeated, this time starting from the
pretrained model from Part 1.

In [None]:

MODEL_NAME = '5conv1fc_mnist_wrong_pretrained'
LOAD_MODEL_NAME = '5conv1fc_numbers'
data_module = MNISTWrongModule(BATCH_SIZE)
INPUT_DIM = torch.tensor([1, 28, 28])
model = NumbersAndLettersCNN(INPUT_DIM, 10, ['0', '1', '2', '3', '4',
                                             '5', '6', '7', '8', '9'],
                             NUMBERS_ONLY)
# Log metrics to WandB
wandb_logger = pl.loggers.WandbLogger(save_dir='logs/',
                                      name=MODEL_NAME,
                                      project='midas-task-2')
# patience is left at Lightning's default (3) here, unlike the
# patience=1 used for the run from scratch
early_stopping = EarlyStopping(
    monitor='val_loss',
)

# Load pretrained weights
model.load_state_dict(torch.load(os.path.join(SAVE_PATH, LOAD_MODEL_NAME),
                                 map_location=torch.device('cuda')))

trainer = pl.Trainer(gpus=1, logger=wandb_logger,
                     callbacks=[early_stopping])
trainer.fit(model, data_module)
trainer.test(model=model, datamodule=data_module)

# Save model
torch.save(model.state_dict(), os.path.join(SAVE_PATH, MODEL_NAME))


****Summary****

| Metric                  | Result     |
|-------------------------|------------|
| Test Accuracy (%)       | 0.19       |
| Validation Accuracy (%) | 11.03      |
| Epochs trained          | 4          |
| Training time           | 6 min 57 s |

Pretraining had no significant effect: the test accuracy is still below
1% (0.19%), and the validation accuracy again sits at about 11%,
essentially the same as for the model trained from scratch.

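In summary, because every training label is wrong, fine-tuning actively
pushes the network away from the correct class. Whatever the pretrained
weights contributed is overwritten by the end of training (4 epochs),
and the accuracy falls to the same level as that of the randomly
initialized model.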
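The way the pretraining gets overwritten can be reproduced in a deliberately tiny toy model. Here the "network" is just a table of logits with one row per digit (i.e. it assumes a perfect feature extractor), and "pretraining" sets the true class's logit to 5. Full-batch gradient descent on softmax cross-entropy is then run against the assumed wrong-label distribution (uniform over the nine other classes). This is an illustrative sketch, not the CNN or training loop used above; the learning rate and step count are arbitrary.

```python
import math

NUM_CLASSES = 10

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def wrong_label_dist(d):
    # Assumed: an image of digit d carries a label drawn uniformly from the other 9 classes.
    return [0.0 if k == d else 1.0 / (NUM_CLASSES - 1) for k in range(NUM_CLASSES)]

def predict(W, d):
    # Index of the highest logit in the row for digit d.
    return max(range(NUM_CLASSES), key=lambda k: W[d][k])

def true_accuracy(W):
    # Accuracy against the correct digits, as on the MNIST test set.
    return sum(predict(W, d) == d for d in range(NUM_CLASSES)) / NUM_CLASSES

def wrong_label_accuracy(W):
    # Expected accuracy against the wrong labels, as on the validation split.
    return sum(wrong_label_dist(d)[predict(W, d)] for d in range(NUM_CLASSES)) / NUM_CLASSES

def fine_tune(W, lr=1.0, steps=100):
    """Full-batch gradient descent on softmax cross-entropy against the wrong labels.

    For softmax + cross-entropy the gradient w.r.t. logit k is p_k - q_k,
    where q is the label distribution.
    """
    for _ in range(steps):
        for d in range(NUM_CLASSES):
            p = softmax(W[d])
            q = wrong_label_dist(d)
            W[d] = [w - lr * (pk - qk) for w, pk, qk in zip(W[d], p, q)]
    return W

# Toy "pretrained" model: logit 5 on the true class, 0 elsewhere.
W = [[5.0 if k == d else 0.0 for k in range(NUM_CLASSES)] for d in range(NUM_CLASSES)]
print("before:", true_accuracy(W), wrong_label_accuracy(W))
W = fine_tune(W)
print("after: ", true_accuracy(W), wrong_label_accuracy(W))
```

The true-label accuracy drops from 1.0 to 0.0 while the expected accuracy against the wrong labels rises from 0 to about 0.111 (1/9). Because the true class never appears as a label, the gradient on its logit, p_d, is always positive, so gradient descent keeps lowering it: a good starting point only delays the collapse rather than preventing it.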