# Module 8 - Using plankton ResNets across domains

How well do classifiers that were trained on other plankton data work on new data? On different regions of the taxonomic tree? On images from different instruments?

This module is about experimenting with what you have already learned.

Eric and Martin already trained a model on the ZooScan dataset before the workshop and saved it using [`torch.save`](https://pytorch.org/tutorials/beginner/saving_loading_models.html#save-load-entire-model). Load it and have a look at the structure of the model:

In [None]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
from utilities.custom_torch_utils import ImageFolderWithPaths
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
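from utilities.display_utils import make_confmat
# Progress bar used in the feature extraction loop further down
from tqdm.notebook import tqdm as tqdm_notebook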

MODEL_FN = "zooscan.pth"

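# torch.load unpickles the complete model object, so the model's class
# definition must be importable in this notebook.
# Recent PyTorch releases default to weights_only=True; if loading fails,
# pass weights_only=False (only for files you trust).
model = torch.load(MODEL_FN)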

print(model)
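
For reference, the save/load round trip for a complete model can be sketched as follows. The small `nn.Sequential` and the temporary file are hypothetical stand-ins for the ZooScan model and `zooscan.pth`. The `weights_only=False` argument assumes a reasonably recent PyTorch release and should only be used with files you trust.

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical stand-in for the trained ZooScan model
toy_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Save the complete model object (architecture and weights)
path = os.path.join(tempfile.mkdtemp(), "toy_model.pth")
torch.save(toy_model, path)

# Load onto the CPU first; weights_only=False permits unpickling a full model
restored = torch.load(path, map_location="cpu", weights_only=False)

# Printing a module lists its layers in order
print(restored)
```

Loading with `map_location="cpu"` avoids errors on machines without a GPU; the model can be moved to the GPU afterwards.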

Strip away the classifier layer (*fc*) to obtain a feature extractor (like we did in [Module 5](mod5_resnet_feature_extractor.ipynb)), activate [evaluation mode](https://pytorch.org/docs/stable/nn.html#torch.nn.Module.eval) and [move the model to the GPU](https://pytorch.org/docs/stable/nn.html#torch.nn.Module.cuda).

In [None]:
feature_extractor = ...
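
One possible way to do this is to replace the *fc* layer with `nn.Identity`, so that the model returns the pooled features instead of class scores. The sketch below is not necessarily the approach taken in Module 5. `TinyResNetLike` is a hypothetical stand-in for the loaded model; only the attribute names `avgpool` and `fc` mirror a ResNet.

```python
import torch
from torch import nn


class TinyResNetLike(nn.Module):
    """Hypothetical stand-in for the loaded model (not the real ZooScan ResNet)."""

    def __init__(self, n_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(8, n_classes)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)


model = TinyResNetLike()

# Replace the classifier by a pass-through layer
model.fc = nn.Identity()

# Fall back to the CPU if no GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# eval() switches layers such as BatchNorm and Dropout to inference behaviour
feature_extractor = model.eval().to(device)

with torch.no_grad():
    out = feature_extractor(torch.zeros(2, 3, 32, 32, device=device))

print(out.shape)  # torch.Size([2, 8]): one 8-dimensional feature vector per image
```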

Let's apply the feature extractor (that was trained on ZooScan data) to the SPC data.

First, load the SPC dataset.
You can use the regular `ImageFolder` or the custom `ImageFolderWithPaths`.
Keep in mind that you will need to apply some *transformations* so that the images match the input the network expects.

In [None]:
dataset = ...

Then PyTorch needs a `DataLoader` that prepares the batches that are sent to the GPU.

In [None]:
loader = DataLoader(...)
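
As a rough illustration of what a `DataLoader` does, the sketch below batches a small synthetic dataset. The `TensorDataset`, the batch size of 4 and `num_workers=0` are assumptions chosen to keep the example small; for real image data a larger batch size and several workers are usually sensible.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the image dataset: 10 images with 2 classes
images = torch.zeros(10, 3, 8, 8)
labels = torch.arange(10) % 2
toy_dataset = TensorDataset(images, labels)

# shuffle=False keeps the order of the dataset, which is convenient when
# features, labels and paths must stay aligned
loader = DataLoader(toy_dataset, batch_size=4, shuffle=False, num_workers=0)

batch_shapes = [tuple(batch_images.shape) for batch_images, _ in loader]
print(batch_shapes)  # [(4, 3, 8, 8), (4, 3, 8, 8), (2, 3, 8, 8)]
```

The last batch is smaller because 10 is not divisible by 4.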

Now we are ready to send the data from the loader through the network. Keep in mind that (besides the features themselves) you will need to store the label of each image. 

In [None]:
# We will collect the calculated features in this list
features = []

# You may need more lists like these for possible other data

# We don't need to calculate gradients
with torch.no_grad():
    # Show a nice progress bar
    with tqdm_notebook(loader, desc="Evaluating") as progress:
        for ... in progress:
            # Copy the input data to GPU
            ...

            # Apply the feature_extractor to the input data
            batch_features = feature_extractor(...)
            
            # Copy the batch_features back to the cpu and convert to a numpy array
            batch_features = batch_features.cpu().numpy()
            
            # Do the same for possible other data (labels, paths, ...)
            ...
            
            # Append the batch data to the list
            features.extend(batch_features)
            
            # Do the same for possible other data
            ...
        
# Convert the collected values to numpy arrays
features = np.array(features)

# Do the same for possible other data
...

print("Shape of features (N_images, N_features):", features.shape)
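
The loop above could be filled in roughly as follows. This is a sketch on synthetic data: the small `nn.Sequential` feature extractor and the random `TensorDataset` are hypothetical stand-ins, and the progress bar is left out to keep the example self-contained.

```python
import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-ins for the real feature extractor and the SPC loader
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
).eval().to(device)
loader = DataLoader(
    TensorDataset(torch.rand(10, 3, 16, 16), torch.arange(10) % 3),
    batch_size=4,
)

features, labels = [], []

# No gradients are needed for inference
with torch.no_grad():
    for batch_images, batch_labels in loader:
        # Copy the input data to the device the model lives on
        batch_images = batch_images.to(device)
        batch_features = feature_extractor(batch_images)

        # Collect features and labels batch by batch, in the same order
        features.extend(batch_features.cpu().numpy())
        labels.extend(batch_labels.numpy())

features = np.array(features)
labels = np.array(labels)

print("Shape of features (N_images, N_features):", features.shape)
print("Shape of labels (N_images,):", labels.shape)
```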

[`sklearn.model_selection.train_test_split`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) is another way of splitting data into distinct sets. Use it to split the features (and any other data you collected):

In [None]:
features_train, features_test, ... = train_test_split(features, ...)

print("Training features shape:", features_train.shape)
print("Testing features shape:", features_test.shape)
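
A small sketch of such a split on synthetic data is shown below. The 80/20 ratio, the stratification by label and the fixed `random_state` are assumptions, not requirements of the exercise. All arrays passed to `train_test_split` are split with the same indices, so features and labels stay aligned.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 feature vectors, 4 classes with 25 samples each
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 8))
labels = np.repeat(np.arange(4), 25)

features_train, features_test, labels_train, labels_test = train_test_split(
    features,
    labels,
    test_size=0.2,     # keep 20 % of the samples for testing
    stratify=labels,   # preserve the class proportions in both sets
    random_state=0,    # make the split reproducible
)

print("Training features shape:", features_train.shape)  # (80, 8)
print("Testing features shape:", features_test.shape)    # (20, 8)
```

Stratification is especially useful for plankton data, where some classes are typically much rarer than others.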

Instantiate a [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) and train it on the training features. Decide on reasonable parameters.

In [None]:
rf = RandomForestClassifier(...)
...
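
As a starting point, training a random forest might look like the sketch below. The synthetic features and the parameter values (`n_estimators=100`, `n_jobs=-1`, `random_state=0`) are assumptions for illustration, not recommended settings for the SPC data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data: 3 classes whose feature means differ
rng = np.random.default_rng(0)
labels_train = np.repeat(np.arange(3), 30)
features_train = rng.normal(size=(90, 8)) + labels_train[:, None]

rf = RandomForestClassifier(
    n_estimators=100,  # number of trees
    n_jobs=-1,         # use all available CPU cores
    random_state=0,    # reproducible results
)
rf.fit(features_train, labels_train)

print("Number of trees:", len(rf.estimators_))
print("Classes:", rf.classes_)
```

More trees usually give more stable predictions at the cost of training time. For imbalanced classes, the `class_weight` parameter may also be worth exploring.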

Now, evaluate the Random Forest Classifier. Look at [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html), [precision and recall](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html) and the confusion matrix.

In [None]:
predictions = ...

acc = ...

# show a confusion matrix
make_confmat(..., predictions, acc)
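
The evaluation step can be sketched with scikit-learn alone, as below. The data and the classifier are synthetic stand-ins. Because the signature of the workshop helper `make_confmat` is not documented here, the sketch uses `sklearn.metrics.confusion_matrix` instead, which returns the raw counts rather than a plot.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical data: 3 well-separated classes with 40 samples each
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 40)
features = rng.normal(size=(120, 8)) + 3 * labels[:, None]

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, stratify=labels, random_state=0
)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Predict the classes of the held-out test features
predictions = rf.predict(X_test)

# Fraction of correctly classified test samples
acc = accuracy_score(y_test, predictions)

# Per-class precision, recall and F1 score as a text table
report = classification_report(y_test, predictions)

# Rows are true classes, columns are predicted classes
confmat = confusion_matrix(y_test, predictions)

print("Accuracy:", acc)
print(report)
print(confmat)
```

The diagonal of the confusion matrix counts the correct predictions, so accuracy equals the trace divided by the total number of test samples.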

## Exercises

- What happens if you also strip away the AdaptiveAvgPool2d layer (*avgpool*)? Play around with the number of layers that are retained in the feature extractor.
- Try other classifiers (e.g. [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)).
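
As a hint for the first exercise, the sketch below shows one way to drop both *avgpool* and *fc* by rebuilding the model from its child modules. `TinyResNetLike` is a hypothetical stand-in for the loaded model. Without the pooling layer the output is a full feature map, so it is flattened here, and the number of features then depends on the input resolution.

```python
import torch
from torch import nn


class TinyResNetLike(nn.Module):
    """Hypothetical stand-in for the loaded model (not the real ZooScan ResNet)."""

    def __init__(self, n_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(8, n_classes)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)


model = TinyResNetLike()

# children() yields the sub-modules in definition order: conv, avgpool, fc.
# Functional calls inside forward() (here: relu and flatten) are not children,
# so they are lost when the model is rebuilt this way.
truncated = nn.Sequential(*list(model.children())[:-2], nn.Flatten()).eval()

with torch.no_grad():
    small = truncated(torch.zeros(2, 3, 16, 16))
    large = truncated(torch.zeros(2, 3, 32, 32))

print(small.shape)  # torch.Size([2, 2048]) = 8 channels * 16 * 16
print(large.shape)  # torch.Size([2, 8192]) = 8 channels * 32 * 32
```

Such long feature vectors make the downstream classifier slower to train, which is one reason why pooled features are a common default.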