> Image classification

Image classification is one of the core applications of machine learning: it is a large part of what allows computers to “see.” From recognizing handwritten digits to identifying animals in photos, the goal is to teach a model to assign labels to images based on what it learns from examples. In this notebook, you’ll build and test a simple image classifier using a small dataset, exploring how machines can learn to tell apart two different kinds of objects (here, muffins and chihuahuas).

# Setup

All the required `import`s go here.

In [None]:
import pathlib
import zipfile

import fastai
from fastai.vision.all import *
from fastai.torch_core import set_seed

import torch

In principle, you could run the notebook either in *Colab* or locally. Is the notebook running in *Colab*?

In [None]:
try:
    import google.colab
    running_in_colab = True
except ImportError:
    running_in_colab = False

running_in_colab

If not running in *Colab*, you might want to choose a GPU if several are available. Ignore this if you are running in *Colab*.

In [None]:
if not running_in_colab:

    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

Is GPU acceleration available?

In [None]:
torch.cuda.is_available()

# Data

We'll pull the [Muffin vs chihuahua](https://www.kaggle.com/datasets/samuelcortinhas/muffin-vs-chihuahua-image-classification/data) dataset from [Kaggle](https://www.kaggle.com/). Let us download the corresponding *.zip* file and uncompress it.

In [None]:
# data will live inside this directory
data_dir = pathlib.Path('muffin_chihuahua')

# if it doesn't exist (from a previous run)...
if not data_dir.exists():

    # ...it is created
    data_dir.mkdir(exist_ok=True)

    # data is downloaded as a zip file, that will be named
    zip_file = data_dir / 'muffin-vs-chihuahua-image-classification.zip'

    # actual download
    !curl -L -o {zip_file} https://www.kaggle.com/api/v1/datasets/download/samuelcortinhas/muffin-vs-chihuahua-image-classification

    # data is unzipped inside the data directory
    with zipfile.ZipFile(zip_file, 'r') as zf:
        zf.extractall(data_dir)

What's in the directory?

In [None]:
list(data_dir.iterdir())

Hence, besides the *.zip* file we have just downloaded, we have the usual *train* and *test* sets, each with one folder per class. How many images are there in each one?

In [None]:
for e in ['train', 'test']:
    print(f"{e}:")
    for c in ['chihuahua', 'muffin']:
        n = len(list((data_dir / e / c).iterdir()))
        print(f"  {c}: {n}")    

We'll focus on the training data only

In [None]:
data_root = data_dir / 'train'

## Data loading

We'll make use of two well-known deep learning libraries: [fastai](https://github.com/fastai/fastai) and [PyTorch](https://pytorch.org/).

We'll let *fastai* read images from the class folders and create a train/validation split automatically. What the code below does is essentially:

- Define the transformation(s) to be applied to every individual *item* (i.e., image). We'll resize every image to a manageable size (e.g., 256×256).

- Define the transformation(s) to be applied to every *batch*. These include, crucially, *augmentations* (check the [docs](https://docs.fast.ai/vision.augment.html#aug_transforms) for what's possible).

- Instantiate a `DataBlock` object, which is a "blueprint" specifying how to access the dataset.

- Use the `DataBlock` to obtain the actual `DataLoader`s (*PyTorch* objects intended to loop through the whole dataset one batch at a time), one for the *training* set, and one for the *validation* set.
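
To make the "blueprint" idea more concrete, here is a minimal sketch in plain Python of two of the steps involved: labeling each image by the name of its folder and randomly splitting the items into training and validation sets. The helper names (`parent_folder_label`, `random_split`) and the file names are hypothetical, made up for illustration; this is not *fastai*'s implementation, only the general idea.

```python
import pathlib
import random


def parent_folder_label(path):
    """Label an image by the name of the folder that contains it."""
    return pathlib.PurePath(path).parent.name


def random_split(n_items, valid_pct=0.2, seed=None):
    """Shuffle the indices 0..n_items-1 and cut them into (train, valid)."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    cut = int(valid_pct * n_items)
    return indices[cut:], indices[:cut]


# Hypothetical file names, for illustration only (not the real dataset listing)
files = [f"train/chihuahua/img_{i}.jpg" for i in range(6)]
files += [f"train/muffin/img_{i}.jpg" for i in range(4)]

labels = [parent_folder_label(f) for f in files]
train_idx, valid_idx = random_split(len(files), valid_pct=0.2, seed=42)

print(labels[0], labels[-1])           # chihuahua muffin
print(len(train_idx), len(valid_idx))  # 8 2
```

Every item ends up in exactly one of the two sets, and the label never has to be stored anywhere other than in the folder structure itself.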

Notice that two *resize* operations are applied: one in `item_tfms` and another in `batch_tfms`. The latter determines the size of the images the model is actually trained on; the former is just a *presizing* step that helps avoid artifacts in the transformed images (the details are not needed here, but the full story can be found in the [fastai book](https://nbviewer.org/github/fastai/fastbook/blob/master/05_pet_breeds.ipynb#Presizing)).

In [None]:
batch_size = 128

# a transformation to apply to every item
item_tfms = [Resize(256)]

# a `list` of transformations to apply to every *batch*
batch_tfms = [*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)]

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    get_y=parent_label,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # TODO: try a different seed
    item_tfms=item_tfms,
    batch_tfms=batch_tfms
)

dls = dblock.dataloaders(data_root, bs=batch_size)
dls.show_batch(max_n=12, figsize=(8,8))

Among the *batch* transformations (those passed through `batch_tfms`) are the ones returned by the function `aug_transforms`, which performs [data augmentation](https://en.wikipedia.org/wiki/Data_augmentation). The idea is that a picture of a dog is still a picture of a dog if you rotate it a little, warp it a little, or crop it a little (e.g., keeping only the head). Each of these *versions* of the same picture can be used to train the model, so we are *artificially* generating training data. Since the amount of rotation, warping, etc. is selected at random at every iteration of the training procedure, the supply of such versions is practically unlimited. *Data augmentation* is a very common trick in computer vision because it is both effective and computationally cheap.
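
As a rough illustration of the idea (not of what `aug_transforms` does internally), the sketch below applies a random horizontal flip and a random crop to a toy "image" stored as a list of rows. The function name `random_flip_and_crop` and the 4×4 image are assumptions made up for this example.

```python
import random


def random_flip_and_crop(image, crop_size, rng):
    """Return a randomly flipped and randomly cropped copy of `image`.

    `image` is a list of rows, each row being a list of pixel values.
    """
    height, width = len(image), len(image[0])

    # flip horizontally half of the time
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]

    # choose the top-left corner of the crop at random
    top = rng.randint(0, height - crop_size)
    left = rng.randint(0, width - crop_size)
    return [row[left:left + crop_size] for row in image[top:top + crop_size]]


# a toy 4x4 "image" whose pixels are just the numbers 0..15
image = [[4 * r + c for c in range(4)] for r in range(4)]

rng = random.Random(0)
for _ in range(3):
    # every call may yield a different 3x3 version of the same image
    print(random_flip_and_crop(image, crop_size=3, rng=rng))
```

The original image is never modified; each call just produces another plausible variant of it.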

<font color='red'>TO-DO</font>: Notice there is an extra (besides that implied by the use of *data augmentation*) source of *randomness* here: the `seed` passed to `RandomSplitter`. What's that for?

# Model training

We will leverage [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning) and exploit a pre-trained model such as *mobilenet_v3_small*. More on this in future courses, but *transfer learning* allows us to reuse a model trained on one task for a different (related) task. In this case, we are leveraging a model trained to classify images in [ImageNet](https://www.image-net.org/), which has 1,000 categories.

The *metric* we are interested in is *accuracy*.

In [None]:
learn = vision_learner(dls, mobilenet_v3_small, metrics=accuracy)
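
For reference, accuracy is just the fraction of predictions that match the true labels. A minimal sketch in plain Python follows; the function name `accuracy_score` and the labels are made up for illustration and are independent of the `accuracy` metric passed to the learner above.

```python
def accuracy_score(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# made-up labels, for illustration only
actual = ["muffin", "chihuahua", "muffin", "chihuahua", "muffin"]
predicted = ["muffin", "chihuahua", "chihuahua", "chihuahua", "muffin"]

print(accuracy_score(predicted, actual))  # 0.8
```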

Let us train the model for a few *epochs*. Every *epoch* loops through the entire dataset.

In [None]:
learn.fine_tune(
    epochs=3,
    base_lr=3e-3
)
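
To relate *epochs* to *batches*: in one epoch the data loader serves the training set one batch at a time, so the number of batches per epoch follows from the dataset size and the batch size. The sketch below uses a hypothetical dataset size (`n_train = 1000`), and whether the last, incomplete batch is kept or dropped depends on how the data loader is configured, so both cases are shown.

```python
import math


def batches_per_epoch(n_items, batch_size, drop_last=False):
    """Number of batches needed to go through `n_items` once."""
    if drop_last:
        # the final, incomplete batch is discarded
        return n_items // batch_size
    # the final, incomplete batch is kept
    return math.ceil(n_items / batch_size)


n_train = 1000  # hypothetical number of training images

print(batches_per_epoch(n_train, 128))                  # 8
print(batches_per_epoch(n_train, 128, drop_last=True))  # 7
```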

<font color='red'>TO-DO</font>: What percentage of the time does the model get it wrong?

<font color='red'>TO-DO</font>: What happens if you train again (essentially, by re-running the last couple of cells)? Do you get the same results? Why not? (*Hint*: how are the parameters of the model initialized every time you call `vision_learner`?)

# Results

Let us show some predictions

In [None]:
learn.show_results(max_n=9, figsize=(8,8))

Even though the performance is really good, it's interesting to see where the model struggles.
Let's look at the [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix) to see whether the model makes more mistakes when dealing with a certain class.

In [None]:
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(4,4), dpi=120)
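
For intuition about what such a plot contains, a confusion matrix simply counts how often each true class is predicted as each class. Below is a minimal sketch in plain Python; the function name `confusion_counts` and the labels are made up for illustration, with rows standing for the true class and columns for the predicted one.

```python
from collections import Counter


def confusion_counts(actual, predicted, classes):
    """Count (true class, predicted class) pairs.

    Rows are the true classes, columns the predicted ones.
    """
    pairs = Counter(zip(actual, predicted))
    return [[pairs[(a, p)] for p in classes] for a in classes]


classes = ["chihuahua", "muffin"]

# made-up labels, for illustration only
actual = ["chihuahua", "chihuahua", "chihuahua", "muffin", "muffin", "muffin"]
predicted = ["chihuahua", "chihuahua", "muffin", "muffin", "muffin", "chihuahua"]

for name, row in zip(classes, confusion_counts(actual, predicted, classes)):
    print(name, row)
# chihuahua [2, 1]
# muffin [1, 2]
```

The diagonal holds the correct predictions; everything off the diagonal is a mistake.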

<font color='red'>TO-DO</font>: Judging by this confusion matrix, does the performance of the model depend on whether the input is from one class or the other?

Let us look at some of the images the model had the most trouble with.

In [None]:
interp.plot_top_losses(4, nrows=2)
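
"Top losses" means the images on which the loss was largest. Assuming the loss is cross-entropy (a common choice for classification), the loss for one image is the negative logarithm of the probability the model assigned to the correct class, so confident mistakes are penalized the most. The file names and probabilities in this sketch are made up for illustration.

```python
import math


def cross_entropy(prob_of_true_class):
    """Loss for one item: -log of the probability given to the correct class."""
    return -math.log(prob_of_true_class)


# made-up data: (file name, probability the model gave to the correct class)
predictions = [
    ("img_a.jpg", 0.99),
    ("img_b.jpg", 0.60),
    ("img_c.jpg", 0.05),
    ("img_d.jpg", 0.30),
]

losses = [(name, cross_entropy(p)) for name, p in predictions]

# keep the two items with the largest loss
top = sorted(losses, key=lambda item: item[1], reverse=True)[:2]
for name, loss in top:
    print(f"{name}: loss={loss:.2f}")
# img_c.jpg: loss=3.00
# img_d.jpg: loss=1.20
```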

<font color='red'>TO-DO</font>: Could you guess the class on your own?

# The model on your own images

Pick a photo (muffin, chihuahua...or something else) and see what the model predicts. **The GUI requires *Colab***.

In [None]:
if running_in_colab:

    try:
        from google.colab import files
        up = files.upload()
        for fn in up:
            img = PILImage.create(fn)
            pred,pred_idx,probs = learn.predict(img)
            print(f'{fn} -> {pred}; probs={probs.tolist()}')
            display(img.to_thumb(256,256))
    except Exception as e:
        print('Local environment or no file uploaded:', e)

else:

    print("The GUI requires Colab.")

The probabilities are listed in the same order as the labels (see `learn.dls.vocab`), so the first one is that of a *chihuahua*.
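
To read the output more easily, each probability can be paired with its class name. The sketch below does this in plain Python; the function name `label_probabilities` and the numbers are made up, and the label order is an assumption that matches the alphabetical order of the class folders.

```python
def label_probabilities(vocab, probs):
    """Pair each class name with its probability, keeping the label order."""
    return dict(zip(vocab, probs))


vocab = ["chihuahua", "muffin"]  # assumed label order (alphabetical)
probs = [0.93, 0.07]             # made-up probabilities, for illustration only

by_label = label_probabilities(vocab, probs)
best = max(by_label, key=by_label.get)

print(by_label)  # {'chihuahua': 0.93, 'muffin': 0.07}
print(best)      # chihuahua
```

In the notebook, the same pairing could be obtained with something like `dict(zip(learn.dls.vocab, probs.tolist()))`, assuming `learn` and `probs` come from the cell above.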

<font color='red'>TO-DO</font>: What happens if you give the model something that is neither a muffin nor a chihuahua?

# Hyperparameter tuning

## Augmentation

Let us try and play around with *data augmentation*...

<font color='red'>TO-DO</font>: Explore other types of augmentation by passing different parameters to the `aug_transforms` function.

In [None]:
# aug = aug_transforms(...

We need to make another (different) `DataBlock`, which can be built by modifying the old one. Updated `DataLoaders` are obtained from it.

In [None]:
aug_dblock = dblock.new(item_tfms=item_tfms, batch_tfms=[*aug, Normalize.from_stats(*imagenet_stats)])
aug_dls = aug_dblock.dataloaders(data_root, bs=batch_size)

Let us visualize a batch with the new *augmentations* to see what the "new" data looks like...

In [None]:
aug_dls.show_batch(max_n=12, figsize=(8,8))

...before doing the training

In [None]:
aug_learn = vision_learner(aug_dls, mobilenet_v3_small, metrics=accuracy)
aug_learn.fine_tune(epochs=2, base_lr=3e-3)  # quick test
print('Accuracy:', aug_learn.validate()[1])

<font color='red'>TO-DO</font>: Do you get better results?

## Different architecture

<font color='red'>TO-DO</font>: Try a *larger* model, such as one from the *resnet* family (e.g., `resnet50`). You can stick with the original `DataBlock`. Is it worth it, considering the increase in training time?

It should be possible to obtain an incomplete list of available models by using
```
import timm
timm.list_models()
```

# Sample questions

## What is the main goal of the image classification model described here?
- [ ] To make image files smaller so they fit on disk
- [ ] To decide which of two labels best matches a picture (for example, which kind of object it shows)
- [ ] To turn all color pictures into black-and-white
- [ ] To draw new pictures from scratch

## Why are the images split into a training set and a validation set?
- [ ] So the model can be trained on one part and then checked on images it has not seen
- [ ] So that each image gets two different labels
- [ ] So that half of the images can be safely deleted
- [ ] So that the images can be sorted by file name