# `000-train-basic-classifier`

Task: fine-tune a ResNet classifier on the Oxford pets dataset.

## Setup

In [23]:
# setup fastai if needed
try: import fastbook
except ImportError: import subprocess; subprocess.run(['pip','install','-Uq','fastbook'])

# Import fastai code.
from fastai.vision.all import *
from fastbook import widgets

# Set a seed for reproducibility.
set_seed(12345, reproducible=True)

In [2]:
# On macOS (CPU only), limit OpenMP to one thread; see https://stackoverflow.com/a/64855500/69707
import os; os.environ['OMP_NUM_THREADS']='1'

## Task

Train a classifier to distinguish between images of cats and dogs.

* Use the [Oxford-IIIT Pet Dataset](http://www.robots.ox.ac.uk/~vgg/data/pets/).
* Fine-tune a 34-layer ResNet model for 1 epoch.
* Report the error rate on a held-out validation set of 20% of the data.

The first code block from chapter 1 accomplishes this task. Retype or copy-paste it here, but add some comments if you haven't already.

## Solution

In [3]:
# Get the data
path = untar_data(URLs.PETS) / "images"

In [4]:
# Cat images have filenames that start with a capital letter.
def is_cat(filename):
    return filename[0].isupper()

image_files = get_image_files(path)

# Construct the data loader
dataloaders = ImageDataLoaders.from_name_func(
    path=path,
    fnames=image_files,
    # Use a 20% validation split, with a seed of 42 for reproducibility
    valid_pct=0.2,
    seed=42,
    # label files using their name
    label_func=is_cat,
    # Resize all images to 224x224 pixels
    item_tfms=Resize(224)
)
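The labeling rule can be checked on its own, without loading any images. The sketch below is a minimal illustration; the filenames are hypothetical examples written in the dataset's `breed_index.jpg` naming style, not a listing of actual files.

```python
# Hypothetical filenames in the dataset's naming style (assumed for illustration).
sample_names = ["Abyssinian_1.jpg", "Bengal_12.jpg", "beagle_7.jpg", "pug_3.jpg"]

def is_cat(filename):
    # Cat breeds are capitalized in this dataset; dog breeds are lowercase.
    return filename[0].isupper()

# Map each filename to its label: True for cat, False for dog.
labels = {name: is_cat(name) for name in sample_names}
print(labels)
```

Note that `is_cat` expects a plain filename string, which is what `from_name_func` passes to `label_func`.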

In [5]:
# Construct a model by starting with the ResNet34 pretrained model.
learner = cnn_learner(dls=dataloaders, arch=resnet34, metrics=error_rate)

In [6]:
# Fine-tune the model for 1 epoch.
learner.fine_tune(epochs=1)

epoch,train_loss,valid_loss,error_rate,time


KeyboardInterrupt: 

## Analysis

**How many images were in the training set? Validation set?**

In [7]:
len(image_files)

7390

You can either do the multiplication here (7390 * 0.2), or:

In [9]:
print("Training set has", dataloaders.train.n, "images")
print("Valid set has", dataloaders.valid.n, "images")

Training set has 5912 images
Valid set has 1478 images
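For comparison, the multiplication can be written out by hand. This is a sketch of the arithmetic only; how fastai rounds the split internally is an assumption here, although for these numbers the product is a whole number either way.

```python
n_total = 7390     # len(image_files), from the cell above
valid_pct = 0.2    # the same value passed as valid_pct

# Validation set is 20% of the data; the training set is the remainder.
n_valid = round(n_total * valid_pct)
n_train = n_total - n_valid

print("Training set has", n_train, "images")
print("Valid set has", n_valid, "images")
```

The result agrees with the sizes reported by `dataloaders.train.n` and `dataloaders.valid.n`.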


**How many dogs were there in the dataset? How many cats?**

Note: `plot_confusion_matrix` only covers the validation set, so its counts would be misleading for a question about the whole dataset.

If you know about `collections.Counter`, you can do:

In [10]:
Counter(is_cat(path.name) for path in image_files)

Counter({True: 2400, False: 4990})

Or you can use the sum-as-count pattern:

In [11]:
sum(is_cat(path.name) for path in image_files)

2400
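The sum-as-count pattern works because Python's `bool` is a subclass of `int`, so `True` adds 1 and `False` adds 0. A minimal sketch on a toy list (the `flags` values are made up), followed by the dog count derived from the totals reported above:

```python
# Toy data: made-up flags, only to show how booleans sum.
flags = [True, False, True, True, False]
num_true = sum(flags)              # each True contributes 1
num_false = len(flags) - num_true  # everything else is False

# With the notebook's numbers: dogs are whatever is not a cat.
n_total = 7390
num_cats = 2400
num_dogs = n_total - num_cats
print(num_true, num_false, num_dogs)
```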

Or you can write an accumulator, 108-style:

In [13]:
num_cats = 0
# Use a fresh loop variable so the dataset `path` defined earlier is not overwritten.
for image_file in image_files:
    if is_cat(image_file.name):
        num_cats += 1
num_cats

2400

Or you can use a `fastcore` shortcut that does the same thing in one expression:

In [19]:
Counter(image_files.attrgot('name').map(is_cat))

Counter({True: 2400, False: 4990})

**About how many of those images were classified correctly?**

This will depend on the error rate you got above, which measures the error on the *validation set*. If you got 0.004060:

In [24]:
round(1478 * (1 - 0.004060))

1472
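The same calculation can be split into named steps, which also gives the number of misclassified images. This is a sketch that uses the example error rate from the text; your own value will differ.

```python
n_valid = 1478          # size of the validation set
error_rate = 0.004060   # example value from the text, not a guaranteed result

# Error rate is the fraction of validation images classified incorrectly.
n_wrong = round(n_valid * error_rate)
n_correct = n_valid - n_wrong
print(n_correct, "correct,", n_wrong, "wrong")
```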

## Extension

**Test your classifier on a new image of a dog or a cat. How well does it do?**

In [22]:
uploader = widgets.FileUpload()

NameError: name 'widgets' is not defined