<a href="https://colab.research.google.com/github/majsylw/detect-waste-workshop/blob/main/Detecting_trash_in_a_wild_Part_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<p><img alt="Colaboratory logo" height="45px" src="https://colab.research.google.com/img/colab_favicon.ico" align="left" hspace="10px" vspace="0px"></p>
Author: Sylwia Majchrowska


<h1>Welcome to the workshop notebook "Detecting trash in a wild"!</h1>

<img src="https://raw.githubusercontent.com/majsylw/detect-waste-workshop/main/imgs/graphic.jpg" alt="logo" width="700"/>

This workshop gives an overview of AI applications in waste detection and introduces a pipeline for waste classification and detection based on the TACO dataset. It explores strategies for improving model accuracy, such as data augmentation and various learning approaches, and showcases successful waste management projects in diverse settings.

# Notebook handling

## Data Access

To avoid installing all libraries and dependencies, you can use [Google Colaboratory](https://colab.research.google.com/notebooks/welcome.ipynb) - a Python workspace in the cloud. To do this, you need to move all data (as well as this notebook) to your Google Drive.

You can access files on Google Drive by connecting (mapping) Google Drive in the virtual machine of the execution environment (notebook). To do this, execute the two code cells below.

**NOTE:** Before executing the script below, make sure that you have uploaded the necessary data to your Google Drive and edited the access paths.

**NOTE:** If you prefer to work on your own device, you need to install (or make sure you have installed) a [Python interpreter](https://docs.anaconda.com/anaconda/install/windows/) and the modules used in this notebook - you can find them by looking at all the statements that use the keyword *import*.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Clone the TACO repo
!git clone https://github.com/pedropro/TACO

# Move the folder that contains the JSON annotations to pwd
#!cp -r TACO/data .
# Download the images
#!python TACO/download.py

# For simplicity we will use provided data at google drive
HOME_FOLDER = '/content/drive/MyDrive/detect-waste-workshop/'
TACO_DATA_FOLDER = HOME_FOLDER + 'TACO/'
MODELS_FOLDER = HOME_FOLDER + 'models/'
CLS_FOLDER = HOME_FOLDER + 'images_square/'

## Python libraries

**Useful imports** (select the below cell and press shift-enter to execute it)

In [None]:
# import libraries

import matplotlib.pyplot as plt  # Graphic library for drawing charts
# Special command for Jupyter notebooks to display images between cells, not in a new window
%matplotlib inline

import torch                                            # PyTorch - deep learning library
from torchvision import datasets, models, transforms    # PyTorch extension for data management
import torch.nn as nn                                   # Special PyTorch module for managing neural networks
from torch.nn import functional as F                    # Special functions
import torch.optim as optim                             # A set of algorithms that updates weights

import requests

import numpy as np

import ipywidgets as widgets
from IPython.display import display, clear_output

from PIL import Image

from torchvision.models import resnet50
import torchvision.transforms as T
# NOTE: gradients must stay enabled for the training below, so we do not call
# torch.set_grad_enabled(False) here; that call is only appropriate for inference-only code

import sklearn


# Paper or Plastics? Classifying Objects!

We will create a neural network, train it, and then check how well it handles data it has never seen. To define a neural network, we need to give it a well-defined task: here, image classification. More specifically, we will recognize the type of waste present in the pictures.

## Detect-waste categories

The [Detect-waste](https://detectwaste.netlify.app/) dataset is inspired by the waste segregation principles in Gdańsk, Poland, and it categorizes waste according to these principles, aiming to align with Polish recycling standards. The dataset proposes seven well-defined categories for sorting litter, which are reflective of the broader recycling categories recognized in Poland. These categories are:

- **Bio**: food waste such as fruit, vegetables, herbs;
- **Glass**: glass objects such as glass bottles, jars, or broken glass;
- **Metal and Plastic**: metal and plastic rubbish such as beverage cans, beverage bottles, plastic shards, plastic food packaging, or plastic straws;
- **Non-recyclable**: non-recyclable rubbish such as disposable diapers, pieces of string, polystyrene packaging, polystyrene elements, blankets, clothing, or used paper cups;
- **Other**: construction and demolition, large-size waste (e.g. tires), used electronics and household appliances, batteries, paint and varnish cans, or expired medicines;
- **Paper**: paper items such as receipts, food packaging, newspapers, or cartons;
- **Unknown**: hard to recognize, obscured objects.

These categories are based on the recycling rules in Gdańsk, Poland, and aim to provide a comprehensive framework for waste classification that can be used for automatic waste detection and sorting.


<img src="https://raw.githubusercontent.com/majsylw/detect-waste-workshop/main/imgs/dw-cat.jpg" alt="DW-cat"/>

## First Steps with PyTorch - Building a Neural Network

Generally, a neural network looks like the image below. The input to the network is on the left side, and the output is on the right. The number of output neurons corresponds to the number of classes.

<img src="https://raw.githubusercontent.com/majsylw/detect-waste-workshop/main/imgs/what_is_nn_slide.jpg" alt="net"/>

So let's define a similar architecture for our classifier:

In [None]:
# Extends PyTorch's neural network baseclass
class MyNet(nn.Module):
    """
    A very basic neural network.
    """
    def __init__(self, input_dim=(3, 224, 224)):
        """
        Constructs a neural network.

        input_dim: a tuple that represents "channel x height x width" dimensions of the input
        """
        super().__init__()
        # the total number of RGB pixels in an image is the tensor's volume
        num_in_features = input_dim[0] * input_dim[1] * input_dim[2]
        # input layer
        self.layer_0 = nn.Linear(num_in_features, 128)
        # hidden layers
        self.layer_1 = nn.Linear(128, 64)
        self.layer_2 = nn.Linear(64, 32)
        # output layer, one output per waste category (7 classes)
        self.layer_3 = nn.Linear(32, 7)

    def forward(self, x):
        """
        Define the forward pass through our network.
        """
        batch_size = x.shape[0]
        # convert our RGB tensor into one long vector
        x = x.view(batch_size, -1)

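        # pass through our hidden layers, each followed by a ReLU activation
        x = F.relu(self.layer_0(x))
        x = F.relu(self.layer_1(x))
        x = F.relu(self.layer_2(x))
        # the output layer returns raw scores (logits): nn.CrossEntropyLoss, which we
        # train with, applies log-softmax internally, so we add neither ReLU nor softmax here
        x = self.layer_3(x)

        return x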

In [None]:
# cuda:0 means the first cuda device found
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = MyNet().to(device)  # load our simple neural network
model

Essentially, our network looks like this:

![simple-network](https://raw.githubusercontent.com/majsylw/detect-waste-workshop/main/imgs/architecture.jpg)
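
To get a feel for the size of this network, the number of trainable parameters can be worked out by hand: a fully connected layer with `n_in` inputs and `n_out` outputs holds `n_in * n_out` weights plus `n_out` biases. The sketch below applies that rule to the layer sizes of `MyNet`; the helper name `linear_params` is made up for this illustration.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

# layer sizes of MyNet: flattened 3 x 224 x 224 image -> 128 -> 64 -> 32 -> 7
layer_sizes = [3 * 224 * 224, 128, 64, 32, 7]

per_layer = [linear_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)

for i, count in enumerate(per_layer):
    print(f"layer_{i}: {count:,} parameters")
print(f"total: {total:,} parameters")
```

Almost all parameters sit in the first layer, because every input pixel value is connected to each of its 128 neurons.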

## Data and Data Loading

We should make two separate subsets to train and test our model. That way, we know our model learns more than rote memorization.


### Data inspection
Let's look in our data folder to see what's there.

In [None]:
import os  # interact with the os. in our case, we want to view the file system

class_dict = {}

print("Data contents:", os.listdir(CLS_FOLDER))
for cl in os.listdir(CLS_FOLDER):
  class_dict[cl] = len(os.listdir(CLS_FOLDER + cl))
  print(f"{cl} contents: {len(os.listdir(CLS_FOLDER + cl))} images")

In [None]:
# Let's also look at some of the images

from PIL import Image  # import our image opening tool

_, ax = plt.subplots(2, 4, figsize=(15,15))  # to show up to 8 images, make a "2 row x 4 column" grid of axes
ax[0, 0].set_title("metals_and_plastics")
ax[0, 0].imshow(Image.open(CLS_FOLDER + "/metals_and_plastics/34.jpg"))  # show the metals_and_plastics in the first column

ax[0, 1].set_title("other")
ax[0, 1].imshow(Image.open(CLS_FOLDER + "/other/206207.jpg"))            # show the other in the second column

ax[0, 2].set_title("non-recyclable")
ax[0, 2].imshow(Image.open(CLS_FOLDER + "/non-recyclable/12.jpg"))       # show the non-recyclable in the third column

ax[0, 3].set_title("bio")
ax[0, 3].imshow(Image.open(CLS_FOLDER + "/bio/162163.jpg"))              # show the bio in the fourth column

ax[1, 0].set_title("glass")
ax[1, 0].imshow(Image.open(CLS_FOLDER + "/glass/01.jpg"))                # show the glass in the first column

ax[1, 1].set_title("paper")
ax[1, 1].imshow(Image.open(CLS_FOLDER + "/paper/23.jpg"))                # show the paper in the second column

ax[1, 2].set_title("unknown")
ax[1, 2].imshow(Image.open(CLS_FOLDER + "/unknown/204205.jpg"))          # show the unknown in the third column

ax[1, 3].set_axis_off()


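Our dataset is highly imbalanced. For the purpose of this workshop, one option is to merge it into 2 classes: `metals_and_plastics` and `non_metals_and_plastics`. The cell below does this, but it is wrapped in a triple-quoted string and therefore disabled; remove the surrounding triple quotes to run it.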

In [None]:
"""
import os
import shutil

CLS_FOLDER2 = HOME_FOLDER + 'images_square2/'

if not os.path.exists(CLS_FOLDER2):
  os.mkdir(CLS_FOLDER2)
if not os.path.exists(os.path.join(CLS_FOLDER2, 'metals_and_plastics')):
  os.mkdir(os.path.join(CLS_FOLDER2, 'metals_and_plastics'))
if not os.path.exists(os.path.join(CLS_FOLDER2, 'non_metals_and_plastics')):
  os.mkdir(os.path.join(CLS_FOLDER2, 'non_metals_and_plastics'))

class_dict = {}
class_dict['non_metals_and_plastics'] = 0

print("Data contents:", os.listdir(CLS_FOLDER))
for cl in os.listdir(CLS_FOLDER):
  if cl == 'metals_and_plastics':
    class_dict[cl] = len(os.listdir(CLS_FOLDER + cl))
    for f in (os.listdir(CLS_FOLDER + cl)):
      shutil.copyfile(os.path.join(CLS_FOLDER, 'metals_and_plastics', f),
                      os.path.join(CLS_FOLDER2, 'metals_and_plastics', f))
  else:
    class_dict['non_metals_and_plastics'] += len(os.listdir(CLS_FOLDER + cl))
    for f in (os.listdir(CLS_FOLDER + cl)):
      shutil.copyfile(os.path.join(CLS_FOLDER, cl, f),
                      os.path.join(CLS_FOLDER2, 'non_metals_and_plastics', f))

print(f"metals_and_plastics contents: {class_dict['metals_and_plastics']} images")
print(f"non_metals_and_plastics contents: {class_dict['non_metals_and_plastics']} images")
"""

### Load our data
It's great that we have data, but we still have to load all the images and convert them into a form that our neural network understands. Specifically, PyTorch works with Tensor objects. (A tensor is just a multidimensional matrix, i.e. an N-d array.)

![image-tensors](https://raw.githubusercontent.com/majsylw/detect-waste-workshop/main/imgs/image_to_tensor.jpg)

To easily convert our image data into tensors, we use the help of a "dataloader." The dataloader packages data into convenient boxes for our model to use. You can think of it like one person passing boxes (tensors) to another.

![dataloader](https://raw.githubusercontent.com/majsylw/detect-waste-workshop/main/imgs/dataloader_box_analogy.jpg)

First, we define some "transforms" to convert images to tensors. We must do so for both our train and validation datasets.

For more information about transforms, check out the link here: https://pytorch.org/docs/stable/torchvision/transforms.html

In [None]:
img_size = 224

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# transforms for our training data
train_transforms = transforms.Compose([
    # resize to resnet input size
    transforms.Resize((img_size, img_size)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    # transform image to PyTorch tensor object
    transforms.ToTensor(),
    normalize
])

# the validation transforms match the train transforms, except that they skip the random flips
validation_transforms = transforms.Compose([
    transforms.Resize((img_size, img_size)),
    transforms.ToTensor(),
    normalize
])

print("Train transforms:", train_transforms)
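
`ToTensor` rescales 8-bit pixel values from 0..255 to 0..1, and `Normalize` then computes `(x - mean) / std` per channel. Below is a plain-Python sketch of that arithmetic for a single pixel, using the mean and standard deviation from the cell above. The helper `normalize_pixel` is hypothetical and written only for this illustration.

```python
MEAN = [0.485, 0.456, 0.406]  # per-channel mean (R, G, B)
STD = [0.229, 0.224, 0.225]   # per-channel standard deviation

def normalize_pixel(rgb):
    """Mimic ToTensor + Normalize for one 8-bit RGB pixel."""
    scaled = [value / 255.0 for value in rgb]  # ToTensor: 0..255 -> 0..1
    return [(x - m) / s for x, m, s in zip(scaled, MEAN, STD)]  # Normalize

for name, pixel in [("black", (0, 0, 0)), ("white", (255, 255, 255))]:
    print(name, [round(v, 3) for v in normalize_pixel(pixel)])
```

Normalized values are no longer confined to 0..1, which matters when an image tensor is displayed directly.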

Second, we create the datasets, by passing the transforms to the ImageFolder constructor.

Our classification problem exhibits a large imbalance in the distribution of the target classes: for instance, there are several times more non-recyclable than bio samples. In such cases it is recommended to use stratified sampling, so that the relative class frequencies are approximately preserved in both the train and the validation split.

In [None]:
import sklearn.model_selection

img_dataset = datasets.ImageFolder(root=CLS_FOLDER,
                                   transform=train_transforms)

validation_dataset = datasets.ImageFolder(root=CLS_FOLDER,
                                          transform=validation_transforms)

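print(class_dict)
# ImageFolder keeps one integer label per image in `targets`, in dataset order.
# Stratifying on it avoids any mismatch between the os.listdir() folder order
# and ImageFolder's alphabetical class order.
class_array = img_dataset.targets

train_idx, test_idx = sklearn.model_selection.train_test_split(
    np.arange(len(class_array)), test_size=0.2, shuffle=True, stratify=class_array)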
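
To see what stratification does, here is a minimal pure-Python sketch of the idea: split each class separately, so that every class contributes the same fraction to the held-out part. It is not scikit-learn's implementation; the function `stratified_split` and the toy labels are assumptions made for this illustration.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.2, seed=0):
    """Split indices so that each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # shuffle within the class
        n_test = round(len(indices) * test_size)  # per-class share of held-out items
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

toy_labels = [0] * 50 + [1] * 10  # hypothetical imbalanced labels (5:1)
train_idx, test_idx = stratified_split(toy_labels)

print("train:", Counter(toy_labels[i] for i in train_idx))
print("test: ", Counter(toy_labels[i] for i in test_idx))
```

Both parts keep the 5:1 class ratio of the toy labels, which a purely random split does not guarantee.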

In [None]:
image_datasets = {
    'train':
        torch.utils.data.Subset(img_dataset, train_idx),
    'validation':
        torch.utils.data.Subset(validation_dataset, test_idx)}

print("==Dataset==\n", image_datasets["train"].dataset)

And finally, form dataloaders from the datasets:

In [None]:
dataloaders = {
    'train':
        torch.utils.data.DataLoader(
            image_datasets['train'],
            batch_size=8,
            shuffle=True,
            num_workers=4),
    'validation':
        torch.utils.data.DataLoader(
            image_datasets['validation'],
            batch_size=8,
            shuffle=False,
            num_workers=4)}

print("Train loader:", dataloaders["train"])
print("Validation loader:", dataloaders["validation"])

We can see that a dataloader outputs 2 things: a BIG tensor holding a batch of images, and a vector holding their labels (integers from 0 to 6).

In [None]:
next(iter(dataloaders["train"]))
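
The integer labels come from the folder names: `ImageFolder` sorts the class folders alphabetically and numbers them from 0. The sketch below reproduces that mapping in plain Python for the folder names used in the image-inspection cell earlier. It is an illustration, not a call into torchvision.

```python
# folder names as used in the image-inspection cell (in arbitrary order)
folders = ["metals_and_plastics", "other", "non-recyclable", "bio",
           "glass", "paper", "unknown"]

# sort the names and number them from 0, as ImageFolder does
class_to_idx = {name: idx for idx, name in enumerate(sorted(folders))}

for name, idx in class_to_idx.items():
    print(idx, name)
```

On the real dataset object, the same mapping is available as its `class_to_idx` attribute.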

## Train the model!
Hurray! We've built a neural network and have data to give it. Now we repeatedly iterate over the data to train the model.

Every time the network gets a new example, it looks something like this. Note the forward pass and the corresponding backward pass.

![backpropagation](https://raw.githubusercontent.com/majsylw/detect-waste-workshop/main/imgs/backpropagation.gif)

### Define the train loop
We want the network to learn from every example in our training dataset. However, the best performance comes from more practice. Therefore, we run through our dataset for multiple epochs.

After each epoch, we'll check how our model performs on the validation set to monitor its progress.

In [None]:
# import progress bars to show train progress (tqdm.notebook replaces the deprecated tnrange/tqdm_notebook)
from tqdm.notebook import trange as tnrange, tqdm as tqdm_notebook

def train_model(model, dataloaders, loss_function, optimizer, num_epochs):
    """
    Trains a model using the given loss function and optimizer, for a certain number of epochs.

    model: a PyTorch neural network
    loss_function: a mathematical function that compares predictions and labels to return an error
    num_epochs: the number of times to run through the full training dataset
    """
    epoch_loss_list = []
    epoch_acc_list = []

    # train for n epochs. an epoch is a full iteration through our dataset
    for epoch in tnrange(num_epochs, desc="Total progress", unit="epoch"):
        # print a header
        print('Epoch {}/{}'.format(epoch+1, num_epochs))
        print('----------------')

        # first train over the dataset and update weights; at the end, calculate our validation performance
        for phase in ['train', 'validation']:
            if phase == 'train':
                model.train()
            else:
                model.eval()

            # keep track of the overall loss and accuracy for this phase of the epoch
            running_loss = 0.0
            running_corrects = 0

            # iterate through the inputs and labels in our dataloader
            # (the tqdm_notebook part is to display a progress bar)
            for inputs, labels in tqdm_notebook(dataloaders[phase], desc=phase, unit="batch", leave=False):
                # move inputs and labels to appropriate device (GPU or CPU)
                inputs = inputs.to(device)
                labels = labels.to(device)

                # record gradients only while training (this also overrides any
                # earlier global torch.set_grad_enabled(False) call)
                with torch.set_grad_enabled(phase == 'train'):
                    # FORWARD PASS
                    outputs = model(inputs)
                    # compute the error of the model's predictions
                    loss = loss_function(outputs, labels)

                    if phase == 'train':
                        # BACKWARD PASS
                        optimizer.zero_grad()  # clear the previous gradients
                        loss.backward()        # backpropagate the current error gradients
                        optimizer.step()       # update the weights (i.e. do the learning)

                # track our accumulated loss
                running_loss += loss.item() * inputs.size(0)
                # track number of correct to compute accuracy
                _, preds = torch.max(outputs, 1)
                running_corrects += torch.sum(preds == labels.data)

            # print our progress
            epoch_loss = running_loss / len(image_datasets[phase])
            epoch_acc = running_corrects.double() / len(image_datasets[phase])
            print(f'{phase} error: {epoch_loss:.4f}, Accuracy: {epoch_acc:.4f}')
            if phase == 'train':
              epoch_loss_list.append(epoch_loss)
            if phase == 'validation':
              epoch_acc_list.append(epoch_acc.item())

    return epoch_loss_list, epoch_acc_list

### Loss function and optimizer
One last thing: we must define a function that gives feedback for how well the model performs. This is the loss, or "error" function, that compares model predictions to the true labels.

Once we calculate the error, we also need to define how the model should react to that feedback. The optimizer determines how the network learns from feedback.

![gradients](https://raw.githubusercontent.com/majsylw/detect-waste-workshop/main/imgs/gradient_descent.gif)

In [None]:
loss_function = nn.CrossEntropyLoss()               # the most common error function in deep learning
optimizer = optim.SGD(model.parameters(), lr=1e-6)  # Stochastic Gradient Descent, with a learning rate of 1e-6
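
To make the two ideas concrete, here is a small plain-Python sketch (not PyTorch's implementation) of cross-entropy on raw scores and of one gradient descent step. The example scores, the label and the learning rate are made-up values. For simplicity the step is applied to the scores themselves, whereas a real network updates its weights.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[label])

logits = [2.0, 0.5, -1.0]  # hypothetical raw scores for 3 classes
label = 1                  # index of the true class
loss_before = cross_entropy(logits, label)

# gradient of the loss w.r.t. the scores: softmax(logits) - one_hot(label)
grad = [p - (1.0 if i == label else 0.0) for i, p in enumerate(softmax(logits))]

lr = 0.1  # learning rate: how far we move against the gradient
logits_after = [z - lr * g for z, g in zip(logits, grad)]
loss_after = cross_entropy(logits_after, label)

print(f"loss before: {loss_before:.4f}, loss after one step: {loss_after:.4f}")
```

The step moves the score of the true class up and the others down, so the loss shrinks. A smaller learning rate takes smaller steps in the same direction.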

### Run training
Let's put everything together and TRAIN OUR MODEL! =D

In [None]:
loss, acc = train_model(model, dataloaders, loss_function, optimizer, num_epochs=3)

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16,5))
ax[0].plot(loss)
ax[0].set_title("Loss")
ax[1].plot(acc)
ax[1].set_title("Accuracy")

### Can You Do Better?
Now that we've shown you how to train a neural network, can you improve the validation accuracy by tweaking the parameters?

Some parameters to play with:

- Number of epochs
- The learning rate "lr" parameter in the optimizer
- The type of optimizer (https://pytorch.org/docs/stable/optim.html)
- Type of neural network (Number of layers and layer dimensions)
- Image size
- Data augmentation transforms (https://pytorch.org/vision/stable/transforms.html)

#### Transfer Learning using EfficientNet PyTorch

Now we will train a model with transfer learning. I propose using the EfficientNet-B0 model, which is part of the EfficientNet model family. The models available in the `torchvision` package have already been trained on the ImageNet dataset. The power of these pre-trained models really shines when we have only a small dataset to train on. In such situations, training from scratch does not help much. Transfer learning addresses this challenge by leveraging knowledge gained from pre-training on large datasets. Here's how transfer learning proves useful for litter classification with the TACO dataset:

**Leveraging Pre-trained Models**
- Rapid Development: Starting with a model pre-trained on a large dataset (like ImageNet) allows for quicker convergence on the specific task of litter classification, reducing the time and computational resources required for training from scratch
- Improved Performance: Pre-trained models have learned rich feature representations that can be effectively transferred to the litter classification task, often resulting in improved accuracy and robustness compared to models trained from scratch, especially when the available training data is limited

**Addressing Data Scarcity and Diversity**
- Data Scarcity: While the TACO dataset is comprehensive, it may not cover the wide variety of litter types and conditions found in real-world scenarios. Transfer learning allows models to leverage learned features from broader contexts, making them more adaptable to new or unseen litter types.
- Variability and Complexity: Litter can vary greatly in appearance, and its detection is complicated by diverse backgrounds and conditions. Models pre-trained on large and varied datasets have encountered a wide range of objects and scenarios, equipping them with a better understanding of complex visual patterns

**Enhancing Generalization**
- Generalization to New Environments: The ability of transfer learning models to generalize from one task to another is particularly valuable for litter classification, where the model needs to perform well across different environments, such as urban areas, beaches, and natural landscapes
- Robustness to Overfitting: By fine-tuning pre-trained models on the TACO dataset, the risk of overfitting is reduced, as the model has already learned generalizable features. This is especially important when working with a limited amount of labeled data

**Facilitating Innovation**
- Innovation in Model Architecture: Transfer learning encourages experimentation with different model architectures. Researchers can explore how various pre-trained models perform on the TACO dataset, leading to insights that drive further innovation in litter detection technology
- Cross-Domain Applications: The success of transfer learning in litter classification can inspire similar approaches in related domains, such as recycling sorting or environmental monitoring, broadening the impact of machine learning solutions in sustainability efforts

In conclusion, transfer learning is invaluable for enhancing the performance, efficiency, and applicability of machine learning models for litter classification based on the TACO dataset. It leverages the strengths of pre-trained models to address the unique challenges of litter detection, making it a cornerstone technique in the development of advanced waste management solutions.

In [None]:
import torchvision.models as models
import torch.nn as nn

def build_model(pretrained=True, fine_tune=True, num_classes=7):
    if pretrained:
        print('[INFO]: Loading pre-trained weights')
    else:
        print('[INFO]: Not loading pre-trained weights')
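    # torchvision >= 0.13 selects pre-trained weights via `weights`
    # (the older `pretrained=` flag is deprecated)
    weights = models.EfficientNet_B0_Weights.DEFAULT if pretrained else None
    model = models.efficientnet_b0(weights=weights)
    if fine_tune:
        print('[INFO]: Fine-tuning all layers...')
    else:
        print('[INFO]: Freezing hidden layers...')
    for params in model.parameters():
        params.requires_grad = fine_tune
    # Change the final classification head (a newly created layer is trainable by default).
    in_features = model.classifier[1].in_features
    model.classifier[1] = nn.Linear(in_features=in_features, out_features=num_classes)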
    return model

model = build_model(
    pretrained=True,
    fine_tune=True,
    num_classes=len(class_dict)).to(device)

# Total parameters and trainable parameters.
total_params = sum(p.numel() for p in model.parameters())
print(f"{total_params:,} total parameters.")
total_trainable_params = sum(
    p.numel() for p in model.parameters() if p.requires_grad)
print(f"{total_trainable_params:,} training parameters.")
# Optimizer.
optimizer = optim.Adam(model.parameters(), lr=1e-6)
# Loss function.
loss_function = nn.CrossEntropyLoss()

loss, acc = train_model(model, dataloaders, loss_function, optimizer,
                        num_epochs=3)
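
As a rough guide to what freezing would change: with `fine_tune=False`, only the replaced classification head should remain trainable. Its size follows from the 1280 input features used in `build_model` and the 7 categories. The sketch below is just that arithmetic, not a measurement of the actual model.

```python
in_features = 1280  # input width of the classification head (from build_model)
num_classes = 7     # Detect-waste categories

# a linear layer holds one weight per (input, output) pair plus one bias per output
head_params = in_features * num_classes + num_classes
print(f"{head_params:,} parameters in the new classification head")
```

Comparing this number with the "total parameters" printed above shows how small the trainable part becomes when the rest of the network is frozen.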

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16,5))
ax[0].plot(loss)
ax[0].set_title("Loss")
ax[1].plot(acc)
ax[1].set_title("Accuracy")

In [None]:
!pip install timm

In [None]:
import timm

model = timm.create_model('swin_tiny_patch4_window7_224',
                          pretrained=True,
                          num_classes=len(class_dict)).to(device)

# Total parameters and trainable parameters.
total_params = sum(p.numel() for p in model.parameters())
print(f"{total_params:,} total parameters.")
total_trainable_params = sum(
    p.numel() for p in model.parameters() if p.requires_grad)
print(f"{total_trainable_params:,} training parameters.")
# Optimizer.
optimizer = optim.Adam(model.parameters(), lr=1e-6)
# Loss function.
loss_function = nn.CrossEntropyLoss()

loss, acc = train_model(model, dataloaders, loss_function, optimizer,
                        num_epochs=5)

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16,5))
ax[0].plot(loss)
ax[0].set_title("Loss")
ax[1].plot(acc)
ax[1].set_title("Accuracy")

## Examine model performance

How do we examine our model's predictions? Let's visualize the classes predicted by the model.

In [None]:
import random

def move_to_gpu(data, device):
    if isinstance(data, (list, tuple)):
        return [move_to_gpu(x, device) for x in data]
    return data.to(device, non_blocking=True)

def predict(image, model):
    xb = move_to_gpu(image.unsqueeze(0), device)
    yb = model(xb)
    _, preds = torch.max(yb, dim=1)
    return validation_dataset.classes[preds[0].item()]

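# pick a random image from the validation subset (not from the training images)
img, label = random.choice(image_datasets['validation'])
# undo the normalization so that the image displays with natural colors
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
plt.imshow((img * std + mean).clamp(0, 1).permute(1, 2, 0))  # matplotlib expects height x width x channel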
plt.title(f'Actual Class: {validation_dataset.classes[label]}, Predicted Class: {predict(img, model)}')
plt.show()

A confusion matrix is a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known. It allows you to visualize the performance of a model.
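
A minimal pure-Python sketch of how such a table is filled, assuming made-up labels for 3 classes: each (true, predicted) pair adds one count to the cell in row `true` and column `predicted`. The helper `confusion_counts` is hypothetical and exists only for this illustration.

```python
def confusion_counts(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

y_true = [0, 0, 1, 1, 2, 2]  # hypothetical true labels
y_pred = [0, 1, 1, 1, 2, 0]  # hypothetical predictions

cm = confusion_counts(y_true, y_pred, num_classes=3)
for row in cm:
    print(row)
```

Correct predictions land on the diagonal. Every off-diagonal cell shows which class was mistaken for which.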

In [None]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

def plot_confusion_matrix(model, dataloader, classes):
    model.eval()  # Set model to evaluation mode
    all_preds = []
    all_labels = []

    with torch.no_grad():
        for batch in dataloader:
            images, labels = batch
            images = move_to_gpu(images, device)  # Move images to GPU if available
            labels = labels.to(device)            # Move labels to GPU if available

            outputs = model(images)
            _, preds = torch.max(outputs, dim=1)

            all_preds.extend(preds.cpu().numpy())    # Collect predictions
            all_labels.extend(labels.cpu().numpy())  # Collect true labels

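    # Generate confusion matrix (rows: true classes, columns: predicted classes);
    # passing `labels` keeps every class in the matrix even if it never occurs
    cm = confusion_matrix(all_labels, all_preds, labels=list(range(len(classes))))
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=classes)

    # Plot confusion matrix on our own axes so that no empty extra figure appears
    fig, ax = plt.subplots(figsize=(10, 10))
    disp.plot(ax=ax, cmap=plt.cm.Blues, xticks_rotation='vertical')
    ax.set_title("Confusion Matrix")
    plt.show()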

plot_confusion_matrix(model, dataloaders["validation"], validation_dataset.classes)

In machine learning, especially in classification tasks, evaluating the performance of a model is crucial. The scikit-learn library provides several metrics for this purpose, including `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`. Here's a brief overview of each:

`accuracy_score`
- Definition: Accuracy is the simplest and most intuitive performance metric. It calculates the ratio of the number of correct predictions to the total number of predictions.
- Formula: $\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$
- Use Case: While accuracy is straightforward, it may not be the best metric for imbalanced datasets, where the number of instances in each class varies significantly.

`precision_score`
- Definition: Precision measures the accuracy of positive predictions. It is the ratio of true positive predictions to the total number of positive predictions (including false positives).
- Formula: $\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$
- Use Case: Precision is crucial when the cost of a false positive is high. For example, in spam detection, a false positive (marking a legitimate email as spam) is usually seen as more problematic than a false negative (failing to detect a spam email).

`recall_score`
- Definition: Recall (also known as sensitivity) measures the ability of a model to find all the relevant cases within a dataset. It is the ratio of true positive predictions to the total number of actual positives.
- Formula: $\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$
- Use Case: Recall is important when the cost of a false negative is high. For instance, in medical diagnosis, failing to detect a disease (false negative) could be more critical than incorrectly diagnosing a disease (false positive).

`f1_score`
- Definition: The F1 score is the harmonic mean of precision and recall, providing a balance between the two metrics. It is particularly useful when you need to balance precision and recall.
- Formula: $\text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
- Use Case: The F1 score is valuable in situations where neither false positives nor false negatives can be favored over the other, such as in document classification or customer support ticket categorization.

Each of these metrics offers a different perspective on the performance of a classification model, and the choice of which metric(s) to use depends on the specific requirements and constraints of the task at hand.
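
The formulas above can be checked by hand on a tiny example. The sketch below computes all four metrics in plain Python from the counts of true/false positives and negatives. The labels are made up, and the helper `binary_metrics` is hypothetical rather than part of scikit-learn.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 0, 0]  # hypothetical: 3 positives, 5 negatives
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]  # one positive missed, one false alarm

accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Here accuracy (0.75) looks better than precision and recall (both 2/3), because the many correctly rejected negatives inflate it. This is the imbalance effect described above.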

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def export_classification_metrics(model, dataloader, classes):
    model.eval()  # Set model to evaluation mode
    all_preds = []
    all_labels = []

    with torch.no_grad():
        for batch in dataloader:
            images, labels = batch
            images = move_to_gpu(images, device)  # Move images to GPU if available
            labels = labels.to(device)  # Move labels to GPU if available

            outputs = model(images)
            _, preds = torch.max(outputs, dim=1)

            all_preds.extend(preds.cpu().numpy())  # Collect predictions
            all_labels.extend(labels.cpu().numpy())  # Collect true labels

    for i, class_name in enumerate(classes):
        class_labels = [1 if label == i else 0 for label in all_labels]
        class_preds = [1 if pred == i else 0 for pred in all_preds]

        accuracy = accuracy_score(class_labels, class_preds) * 100
        precision = precision_score(class_labels, class_preds, zero_division=0) * 100
        recall = recall_score(class_labels, class_preds, zero_division=0) * 100
        f1 = f1_score(class_labels, class_preds, zero_division=0) * 100

        print(f'{class_name}, accuracy: {accuracy:.2f}%, precision: {precision:.2f}%, recall: {recall:.2f}%, F1 score: {f1:.2f}%')

export_classification_metrics(model, dataloaders["validation"], validation_dataset.classes)

# Additional exercise - Peekaboo, I Found You! Localization and classification in one.

Detection, i.e., simultaneous localization and classification of objects, is a much more challenging task. For this reason, in this case, we will not create and train our own network, but we will use an already available model provided by the [Facebook team](https://github.com/facebookresearch/detr).

In [None]:
# import libraries
import requests

import matplotlib.pyplot as plt

import ipywidgets as widgets
from IPython.display import display, clear_output

from PIL import Image

import torch
from torchvision.models import resnet50
import torchvision.transforms as T
# inference only: disable gradient tracking globally for this notebook session
torch.set_grad_enabled(False)


%matplotlib inline
from pycocotools.coco import COCO
import numpy as np
import skimage.io as io
import pylab
from shutil import copyfile
import os, sys
pylab.rcParams['figure.figsize'] = (8.0, 10.0)

In [None]:
# classes in COCO
CLASSES = [
    'N/A', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A',
    'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',
    'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack',
    'umbrella', 'N/A', 'N/A', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
    'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
    'skateboard', 'surfboard', 'tennis racket', 'bottle', 'N/A', 'wine glass',
    'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
    'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake',
    'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', 'N/A',
    'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
    'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A',
    'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier',
    'toothbrush'
]

# colors for visualization purposes
COLORS = [[0.000, 0.447, 0.741], [0.850, 0.325, 0.098], [0.929, 0.694, 0.125],
          [0.494, 0.184, 0.556], [0.466, 0.674, 0.188], [0.301, 0.745, 0.933]]

In [None]:
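# preprocessing for the detector: resize, convert to tensor,
# and normalize with the ImageNet channel means and standard deviations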
transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# functions to convert bounding box positions for the found objects
def box_cxcywh_to_xyxy(x):
    x_c, y_c, w, h = x.unbind(1)
    b = [(x_c - 0.5 * w), (y_c - 0.5 * h),
         (x_c + 0.5 * w), (y_c + 0.5 * h)]
    return torch.stack(b, dim=1)

def rescale_bboxes(out_bbox, size):
    img_w, img_h = size
    b = box_cxcywh_to_xyxy(out_bbox)
    b = b * torch.tensor([img_w, img_h, img_w, img_h], dtype=torch.float32)
    return b
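
The two helpers above convert boxes from the normalized (center x, center y, width, height) format into pixel corner coordinates. As a rough illustration, the same arithmetic for a single box can be sketched in plain Python. The function name `cxcywh_to_xyxy_pixels`, the sample box and the 640x480 image size are assumptions made up for this example.

```python
def cxcywh_to_xyxy_pixels(box, image_size):
    """Convert one normalized (cx, cy, w, h) box to pixel (xmin, ymin, xmax, ymax)."""
    cx, cy, w, h = box
    img_w, img_h = image_size  # same order as PIL's Image.size: (width, height)
    return (
        (cx - 0.5 * w) * img_w,  # left edge
        (cy - 0.5 * h) * img_h,  # top edge
        (cx + 0.5 * w) * img_w,  # right edge
        (cy + 0.5 * h) * img_h,  # bottom edge
    )

# A box centered in the image, covering 20% of its width and 40% of its height.
corners = cxcywh_to_xyxy_pixels((0.5, 0.5, 0.2, 0.4), (640, 480))
print([round(v, 1) for v in corners])
```

For this sample box the rounded corners are `[256.0, 144.0, 384.0, 336.0]`, i.e. a 128x192 pixel rectangle centered at (320, 240).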

In [None]:
def plot_results(pil_img, prob, boxes):
    plt.figure(figsize=(16,10))
    plt.imshow(pil_img)
    ax = plt.gca()
    colors = COLORS * 100
    for p, (xmin, ymin, xmax, ymax), c in zip(prob, boxes.tolist(), colors):
        ax.add_patch(plt.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
                                   fill=False, color=c, linewidth=3))
        cl = p.argmax()
        text = f'{CLASSES[cl]}: {p[cl]:0.2f}'
        ax.text(xmin, ymin, text, fontsize=15,
                bbox=dict(facecolor='yellow', alpha=0.5))
    plt.axis('off')
    plt.show()

In [None]:
# We will download the pre-trained detector model from the Internet.
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
model.eval();

In [None]:
# loading an image from a specific website
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
im = Image.open(requests.get(url, stream=True).raw)

# loading an image from disk
# im = Image.open('./images/000000039769.jpg')

In [None]:
# image normalization
img = transform(im).unsqueeze(0)

# passing through the network
outputs = model(img)

# keeping only the predictions for which the network was at least 90% confident
probas = outputs['pred_logits'].softmax(-1)[0, :, :-1]
keep = probas.max(-1).values >= 0.9

# calculating the object locations in pixels
bboxes_scaled = rescale_bboxes(outputs['pred_boxes'][0, keep], im.size)

In [None]:
plot_results(im, probas[keep], bboxes_scaled)
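
The filtering step used above (softmax over the class logits, dropping the trailing "no object" column, then thresholding at 0.9) can be sketched without PyTorch. The logits below are made-up values for three hypothetical queries with three object classes. Only the position of the "no object" class as the last column mirrors the DETR output used in this notebook.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits: 3 object classes followed by 1 trailing "no object" class.
query_logits = [
    [8.0, 0.5, 0.2, 1.0],  # confident about class 0
    [0.1, 0.3, 0.2, 6.0],  # mostly "no object"
    [1.0, 1.2, 0.9, 1.1],  # undecided
]

THRESHOLD = 0.9
kept = []
for idx, logits in enumerate(query_logits):
    probs = softmax(logits)[:-1]  # drop the "no object" column
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= THRESHOLD:
        kept.append((idx, best, probs[best]))

print(kept)
```

Only the first query survives: the second puts almost all of its probability on "no object", and the third spreads it evenly, so neither reaches the threshold. Lowering `THRESHOLD` returns more boxes at the cost of more false detections.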

## Fine-tuning DETR on TACO

In [None]:
!git clone https://github.com/wimlds-trojmiasto/detect-waste.git

# add modules from detr path
sys.path.append('detect-waste/detr/')

In [None]:
# Set up path to annotations
# os.path.join works whether or not TACO_DATA_FOLDER ends with a slash
annFile = os.path.join(TACO_DATA_FOLDER, 'annotations.json')

# initialize COCO api for instance annotations
coco = COCO(annFile)

# display the category names stored in the annotation file (COCO format)
cats = coco.loadCats(coco.getCatIds())
nms = [cat['name'] for cat in cats]
print('COCO categories: \n{}\n'.format(', '.join(nms)))

In [None]:
# load and display image
catIds = coco.getCatIds(catNms=['non recyclable'])
imgIds = coco.getImgIds(catIds=catIds)
img_id = imgIds[np.random.randint(0,len(imgIds))]
print('Image n°{}'.format(img_id))

img = coco.loadImgs(img_id)[0]

img_name = '%s/%s'%(TACO_DATA_FOLDER, img['file_name'])
print('Image name: {}'.format(img_name))

I = io.imread(img_name)
plt.figure()
plt.imshow(I)
plt.axis('off')

In [None]:
# load and display instance annotations
plt.imshow(I); plt.axis('off')
annIds = coco.getAnnIds(imgIds=img['id'], catIds=catIds)
anns = coco.loadAnns(annIds)
coco.showAnns(anns, draw_bbox=True)

In [None]:
OUTPUTS = HOME_FOLDER + "outputs/"

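# {VARIABLE} makes IPython substitute the Python variable into the shell command;
# without the braces the literal text "TACO_DATA_FOLDER" would be passed instead
!python detect-waste/detr/main.py \
  --dataset_file "coco" \
  --coco_path {TACO_DATA_FOLDER} \
  --output_dir {OUTPUTS} \
  --resume https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth \
  --epochs 10 \
  --num_classes 7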
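
If you run the training outside a notebook, the same call can be assembled in plain Python and passed to `subprocess`. The sketch below only builds and prints the argument list. The helper name `build_detr_command` and the placeholder paths are assumptions for illustration, while the flags are the ones used in the cell above.

```python
import subprocess
import sys

def build_detr_command(coco_path, output_dir, epochs=10, num_classes=7):
    """Assemble the training command as an argument list (no shell quoting needed)."""
    return [
        sys.executable, "detect-waste/detr/main.py",
        "--dataset_file", "coco",
        "--coco_path", coco_path,
        "--output_dir", output_dir,
        "--resume", "https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth",
        "--epochs", str(epochs),
        "--num_classes", str(num_classes),
    ]

# Hypothetical placeholder paths; substitute your own TACO_DATA_FOLDER and OUTPUTS values.
cmd = build_detr_command("/path/to/TACO/data/", "/path/to/outputs/")
print(" ".join(cmd))

# Uncomment to actually launch training (requires the cloned repository and the data):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces, which is common for folders on Google Drive.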