Assignment 7: Transfer Learning
===============================


Microsoft Forms Document: https://forms.office.com/r/MvPiCwh6jR


This assignment uses part of the Fruits and Vegetables dataset, which can be downloaded from Kaggle: https://www.kaggle.com/datasets/kritikseth/fruit-and-vegetable-image-recognition

For this small example, I have selected a subset of the images, which is available here: https://seafile.ifi.uzh.ch/f/72e1d9c4ef20420eb1d9/?dl=1

First, we need to download and extract all our data.

In [None]:
import os
import urllib.request
import zipfile

dataset_zip_file = "fruits.zip"
if not os.path.exists(dataset_zip_file):
  # download the archive only if it is not already present
  urllib.request.urlretrieve("https://seafile.ifi.uzh.ch/f/72e1d9c4ef20420eb1d9/?dl=1", dataset_zip_file)
  print("Downloaded datafile", dataset_zip_file)
  # extract all images into the current working directory
  with zipfile.ZipFile(dataset_zip_file) as archive:
    archive.extractall()

Task 1: Data Transformation
---------------------------

We need to instantiate a suitable `torchvision.transforms` pipeline that produces the same input structure as was used for training our network.
This pipeline combines four transforms, which are described on the PyTorch website: https://pytorch.org/vision/stable/models.html

1. We need to resize the image such that the shorter side has size 256.
2. We need to take the center crop of size $224\times224$ from the image.
3. We need to convert the image into a tensor (including pixel value scaling).
4. We need to normalize the pixel values with mean $(0.485, 0.456, 0.406)$ and standard deviation $(0.229, 0.224, 0.225)$.

Since we will use networks pre-trained on ImageNet, we need to perform the exact same transform as used for ImageNet testing.

In [None]:
import torch
import torchvision

imagenet_transform = ...

Task 2: Dataset Loading
-----------------------

Here, we use the `torchvision.datasets.ImageFolder` dataset interface to process the images.
You can use its documented `is_valid_file` parameter to separate the training set from the test set.
All training files are called `gallery.jpg`, while all test files are called `probe.jpg`.

Create two datasets, one for the training set, one for the test set. Use the transform defined above.

In [None]:
trainset = torchvision.datasets.ImageFolder(
  root = "fruits",
  ...
)

testset = torchvision.datasets.ImageFolder(
  root = "fruits",
  ...
)

Test 1: Data Size and Types
---------------------------

Check that each dataset contains exactly as many images as there are classes.
Check that all input images are `torch.Tensor`s of size $3\times224\times224$ and of type `torch.float`.
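
One possible shape for such checks is sketched below on stand-in data. The list `demo_dataset` and the names in `demo_classes` are hypothetical; a real `ImageFolder` dataset can be iterated in the same way and yields `(image, target)` pairs.

```python
import torch

# Hypothetical stand-ins: three class names and one blank image per class.
demo_classes = ["apple", "banana", "carrot"]
demo_dataset = [(torch.zeros(3, 224, 224), index) for index in range(len(demo_classes))]

# one image per class
assert len(demo_dataset) == len(demo_classes)

checked = 0
for image, target in demo_dataset:
    assert isinstance(image, torch.Tensor)
    assert image.shape == (3, 224, 224)
    assert image.dtype == torch.float
    checked += 1
print("Checked", checked, "samples")
```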


In [None]:
assert len(trainset) == len(testset)
...

Task 3: Pre-trained Network
---------------------------

Instantiate a pre-trained network of type ResNet-18. 
Modify the network such that we extract the deep features from before the last fully-connected layer of the network.
For your reference, the implementation of the `forward` function of ResNet networks (including ResNet-18) can be found here: https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py#L264

You can also check if other networks perform better, especially deeper ResNet topologies.
Be aware that our strategy of removing the last fully-connected layer is specific to residual networks and might not work for other network topologies.

Please note: although we modify the `forward` function, we still call the network through `__call__` to extract our features.

In [None]:
# instantiate pre-trained ResNet-18 network
network = ...

# make sure that deep features can be extracted from the network
...

Task 4: Extract Features
------------------------

Implement a function that extracts all features for a given dataset.
Store the results in a dictionary: `target : feature`.
Extract the features for the training and the test set.
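
A minimal sketch of such an extraction loop is shown below. It assumes that each target occurs only once in the dataset (otherwise later samples would overwrite earlier ones in the dictionary). The function name `extract_sketch`, the tiny `toy_network`, and the random `toy_dataset` are hypothetical stand-ins for the real network and data.

```python
import torch

def extract_sketch(network, dataset):
    """Return a dictionary that maps each target to its feature vector."""
    features = {}
    network.eval()             # inference behaviour for batch norm and dropout
    with torch.no_grad():      # gradients are not needed for feature extraction
        for image, target in dataset:
            # add a batch dimension for the network, remove it from the result
            features[target] = network(image.unsqueeze(0)).squeeze(0)
    return features

# Hypothetical stand-ins: a tiny network and a list of (image, target) pairs.
toy_network = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 5))
toy_dataset = [(torch.rand(3, 8, 8), target) for target in range(4)]

toy_features = extract_sketch(toy_network, toy_dataset)
print(len(toy_features), toy_features[0].shape)
```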

In [None]:
def extract(network, dataset):
  features = {}
  ...
  return features

train_features = ...
test_features = ...

Test 2: Check your Features
---------------------------

Check that all your features are of dimension 512 and of datatype `torch.float` (larger ResNet topologies might have 2048-dimensional features).

In [None]:
# check features
...

Task 5: Similarity Computation
------------------------------

Iterate over all samples in the test set.
Compute the cosine similarities to all samples in the training set.
Store the similarity values in a matrix.
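
For reference, the cosine similarity of two feature vectors $a$ and $b$ is $\frac{a \cdot b}{\|a\|\,\|b\|}$. The sketch below fills a small matrix with `torch.nn.functional.cosine_similarity`; the two-dimensional toy features and the `toy_` names are invented for illustration, and rows are assumed to index test samples while columns index training samples.

```python
import torch

# Hypothetical features: one vector per target for the training and test set.
toy_train = {0: torch.tensor([1.0, 0.0]), 1: torch.tensor([0.0, 2.0])}
toy_test = {0: torch.tensor([3.0, 0.0]), 1: torch.tensor([1.0, 1.0])}

# rows: test samples, columns: training samples
toy_similarities = torch.empty((len(toy_test), len(toy_train)))
for test_target, test_feature in toy_test.items():
    for train_target, train_feature in toy_train.items():
        toy_similarities[test_target, train_target] = torch.nn.functional.cosine_similarity(
            test_feature, train_feature, dim=0
        )
print(toy_similarities)
```

The first test vector is parallel to the first training vector and orthogonal to the second, so the first row contains 1 and 0.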

In [None]:
O = ...
similarities = torch.empty((O, O))

# compute similarities
...

Task 6: Plot Similarity Values
------------------------------

Plot the similarity matrix as an image.
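
One way to do this is `imshow`, sketched here on a made-up $2\times2$ matrix. The axis labels assume that rows correspond to test samples and columns to training samples; `toy_matrix` is a placeholder for the real similarity matrix.

```python
import torch
from matplotlib import pyplot

# Hypothetical similarity values standing in for the real matrix.
toy_matrix = torch.tensor([[1.0, 0.2], [0.3, 0.9]])

figure, axes = pyplot.subplots()
image = axes.imshow(toy_matrix.numpy(), cmap="viridis")
figure.colorbar(image, ax=axes, label="cosine similarity")
axes.set_xlabel("training class index")
axes.set_ylabel("test class index")
```

In a notebook the figure is typically rendered at the end of the cell; in a script, `pyplot.show()` or `figure.savefig(...)` would be needed.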

In [None]:
from matplotlib import pyplot

# plot similarities
...

Task 7: Classification Accuracy
-------------------------------

Compute the classification accuracy by checking if the class of highest similarity for a test sample is the correct class.
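
The idea can be sketched on a made-up $3\times3$ similarity matrix. The sketch assumes that row $i$ belongs to the test sample of class $i$ and column $j$ to the training sample of class $j$; the values in `toy_scores` are invented.

```python
import torch

# Hypothetical similarities: rows are test samples, columns are training samples.
toy_scores = torch.tensor([
    [0.9, 0.1, 0.2],
    [0.3, 0.4, 0.8],
    [0.1, 0.2, 0.7],
])

# index of the most similar training sample for each test sample
predicted = toy_scores.argmax(dim=1)
# under the stated assumption, the correct class of row i is i
correct = torch.arange(toy_scores.shape[0])

toy_accuracy = (predicted == correct).float().mean().item()
print("Accuracy is", toy_accuracy)
```

Here the second test sample is most similar to the third training sample, so two of three predictions are correct.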

In [None]:
# compute accuracy for our small test set
accuracy = ...
print("Accuracy is", accuracy)

Task 8: Find Misclassified Images and Classes
----------------------------------------------

Find the test samples that are incorrectly classified.
Get the class names (not only the indexes) and print the name of each misclassified test sample's class as well as the class it was classified as.

What are the two most dissimilar classes?
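
Both questions can be sketched on toy data as follows. The class names and similarity values are invented, and the sketch again assumes that row $i$ and column $i$ belong to the same class; for a real `ImageFolder` dataset, the class names are available through its `classes` attribute.

```python
import torch

# Hypothetical class names and similarities (rows: test, columns: training).
toy_names = ["apple", "banana", "carrot"]
toy_scores = torch.tensor([
    [0.90, 0.15, 0.20],
    [0.30, 0.40, 0.80],
    [0.05, 0.20, 0.70],
])

# misclassified test samples: predicted index differs from the row index
predicted = toy_scores.argmax(dim=1).tolist()
misclassified = [
    (toy_names[true_index], toy_names[predicted_index])
    for true_index, predicted_index in enumerate(predicted)
    if true_index != predicted_index
]
for true_name, predicted_name in misclassified:
    print(true_name, "was classified as", predicted_name)

# most dissimilar pair: position of the smallest entry in the matrix
flat_index = toy_scores.argmin().item()
test_index, train_index = divmod(flat_index, toy_scores.shape[1])
print("Most dissimilar:", toy_names[test_index], "(test) and", toy_names[train_index], "(training)")
```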

In [None]:
classnames = ...

# find all misclassified test images and print their real and predicted class name
...

# find the most dissimilar pair of training and test classes and print their names
...