# Classifier Diagnostics

Task: plot a confusion matrix and find images that were misclassified.

## Setup

You do not need to read or modify the code in this section to successfully complete this assignment.

In [2]:
# Import fastai code.
from fastai.vision.all import *

# Set a seed for reproducibility.
set_seed(12345, reproducible=True)

Monkey-patch `plot_top_losses` because of a bug.

In [17]:
def _plot_top_losses(self, k, largest=True, **kwargs):
    losses,idx = self.top_losses(k, largest)
    if not isinstance(self.inputs, tuple): self.inputs = (self.inputs,)
    if isinstance(self.inputs[0], Tensor): inps = tuple(o[idx] for o in self.inputs)
    else: inps = self.dl.create_batch(self.dl.before_batch([tuple(o[i] for o in self.inputs) for i in idx]))
    b = inps + tuple(o[idx] for o in (self.targs if is_listy(self.targs) else (self.targs,)))
    x,y,its = self.dl._pre_show_batch(b, max_n=k)
    b_out = inps + tuple(o[idx] for o in (self.decoded if is_listy(self.decoded) else (self.decoded,)))
    x1,y1,outs = self.dl._pre_show_batch(b_out, max_n=k)
    if its is not None:
        plot_top_losses(x, y, its, outs.itemgot(slice(len(inps), None)), self.preds[idx], losses,  **kwargs)
ClassificationInterpretation.plot_top_losses = _plot_top_losses

### Set up the dataset

In [6]:
path = untar_data(URLs.PETS)/'images'

In [7]:
image_files = get_image_files(path).sorted()

In [8]:
# Cat images have filenames that start with a capital letter.
def is_cat(filename):
    return filename[0].isupper()
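
As a quick sanity check of the naming rule, the sketch below applies the same function to two illustrative filenames. The filenames are assumed examples written in the dataset's `Breed_N.jpg` style; they are not read from disk.

```python
def is_cat(filename):
    # Same rule as above: cat breeds are capitalized, dog breeds are lowercase.
    return filename[0].isupper()

# Hypothetical filenames, used only for illustration.
examples = ["Abyssinian_1.jpg", "beagle_1.jpg"]
labels = {name: is_cat(name) for name in examples}
print(labels)
```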

### Deliberately mislabel some of the images

In [None]:
FLIP_PROB = 0.1
correct_labels = [is_cat(image_file.name) for image_file in image_files]
scrambled_labels = [
    not correct_label if random.random() < FLIP_PROB else correct_label
    for correct_label in correct_labels]

Check how many labels are still correct.

In [12]:
sum(correct_label == scrambled_label for correct_label, scrambled_label in zip(correct_labels, scrambled_labels)) / len(correct_labels)

0.8972936400541272
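
Each label is flipped independently with probability `FLIP_PROB`, so the expected fraction of unchanged labels is `1 - FLIP_PROB`, which matches the value near 0.9 above. The sketch below reproduces the flipping step on synthetic labels. It uses a local `random.Random` instance, so it leaves the global random state alone; `flip_labels` and the toy label list are hypothetical names introduced only for illustration.

```python
import random

def flip_labels(labels, flip_prob, rng):
    """Return a copy of `labels` where each entry is negated with probability `flip_prob`."""
    return [(not label) if rng.random() < flip_prob else label for label in labels]

# Local generator: independent of the global seed used elsewhere in the notebook.
rng = random.Random(0)

# Synthetic boolean labels (about one third True), standing in for cat/dog labels.
toy_labels = [i % 3 == 0 for i in range(10_000)]
noisy_labels = flip_labels(toy_labels, flip_prob=0.1, rng=rng)

# Fraction of labels that survived unchanged; should be close to 1 - flip_prob.
agreement = sum(a == b for a, b in zip(toy_labels, noisy_labels)) / len(toy_labels)
print(f"fraction unchanged: {agreement:.3f}")
```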

### Train the classifier on the scrambled labels

In [15]:
dataloaders = ImageDataLoaders.from_lists(
    path=path, fnames=image_files, labels=scrambled_labels,
    valid_pct=0.2,
    seed=42,
    item_tfms=RandomResizedCrop(224)
)
true_dataloaders = ImageDataLoaders.from_lists(
    path=path, fnames=image_files, labels=correct_labels,
    valid_pct=0.2,
    seed=42,
    item_tfms=RandomResizedCrop(224)
)

In [16]:
learn = cnn_learner(
    dls=dataloaders,
    arch=resnet34,
    metrics=error_rate
)
learn.fine_tune(epochs=1) # TODO: train for more epochs so the model starts to fit the mislabeled training examples.


## Task

We've given you a classifier that was trained on a dataset in which roughly 10% of the labels were deliberately flipped.
Starting with the classifier trained above:

1. Show one batch from each of the training and validation sets. (`dataloaders.train.show_batch()`)
2. Compute the accuracy of this classifier on the validation set. (First create `interp = ClassificationInterpretation.from_learner(learn)`, then call `accuracy(interp.preds, interp.targs)`.)
3. Plot the confusion matrix on the validation set. (see chapter 2)
4. Compute the accuracy on the *training* set. (`interp = ClassificationInterpretation.from_learner(learn, dl=dataloaders.train)`)
5. Find some images in the validation set that were misclassified by plotting the top losses.
6. Do the same for the training set.
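
For intuition about what steps 2 and 3 report, here is a minimal sketch in plain Python on made-up predictions. It is not the fastai solution: the `targets` and `predictions` lists and the helper `confusion_counts` are hypothetical and exist only to show how accuracy and the confusion-matrix cells relate.

```python
from collections import Counter

def confusion_counts(targets, predictions):
    """Count (actual, predicted) pairs; each key is one cell of the confusion matrix."""
    return Counter(zip(targets, predictions))

# Made-up labels: True means "cat", False means "dog".
targets     = [True, True, True, False, False, False, False, True]
predictions = [True, False, True, False, True, False, False, True]

counts = confusion_counts(targets, predictions)

# Accuracy is the share of the diagonal cells (actual == predicted).
accuracy = sum(t == p for t, p in zip(targets, predictions)) / len(targets)

print("cats predicted as dogs:", counts[(True, False)])
print("dogs predicted as cats:", counts[(False, True)])
print("accuracy:", accuracy)
```

The off-diagonal cells are the ones to inspect when answering the analysis questions, since accuracy alone does not say which direction the mistakes go.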

## Solution

In [2]:
# Your code here

## Analysis

**How many dogs were misclassified as cats? Vice versa?**

**For the following scenarios, specify whether the loss would be *low* or *high*:**

- Classifier identified the image correctly, and the dataset had 

**If we had only looked at the accuracy on the training set, how mistaken would we have been?**