<a href="https://colab.research.google.com/github/kristupas-g/deep_learning_course/blob/main/resnet50_inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep learning course first task

Student: **Kristupas Gaidys** *(2015973)*

Model: **resnet50**

Classes: **broccoli**, **pizza**, **banana**

--- 

## Configuration

In [29]:
!pip install openimages

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [30]:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## Downloading data

In [31]:
from os import path, makedirs
from math import ceil
from openimages.download import download_dataset

In [32]:
amount_to_classify = 1000
data_dir = "data"
images_per_class = ceil(amount_to_classify/3)
classes = ["Broccoli", "Pizza", "Banana"]
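Because 1000 is not divisible by 3, rounding up requests slightly more than `amount_to_classify` images in total. A quick check of the arithmetic in plain Python, reusing the names from the cell above (`num_classes` and `total_requested` are introduced here only for illustration):

```python
from math import ceil

amount_to_classify = 1000
num_classes = 3  # Broccoli, Pizza, Banana

# Round up so that every class gets the same per-class limit
images_per_class = ceil(amount_to_classify / num_classes)
total_requested = images_per_class * num_classes

print(images_per_class)  # 334
print(total_requested)   # 1002
```

This is only the number requested per class; the number of images that actually ends up on disk may be lower.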

In [33]:
if not path.exists(data_dir):
    makedirs(data_dir)

In [34]:
download_dataset(data_dir, classes, limit=images_per_class)

100%|██████████| 161/161 [00:10<00:00, 15.28it/s]
100%|██████████| 334/334 [00:20<00:00, 16.02it/s]
100%|██████████| 334/334 [00:19<00:00, 17.10it/s]
100%|██████████| 10/10 [00:02<00:00,  4.27it/s]
100%|██████████| 39/39 [00:04<00:00,  9.17it/s]


{'broccoli': {'images_dir': 'data/broccoli/images'},
 'pizza': {'images_dir': 'data/pizza/images'},
 'banana': {'images_dir': 'data/banana/images'}}
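The progress bars above show different totals, so it can be useful to count what actually landed on disk before running inference. The sketch below assumes the `data/<class>/images/*.jpg` layout shown in the output; `count_images` is a hypothetical helper, and the demo runs on a throwaway directory rather than the real download.

```python
import tempfile
from pathlib import Path

def count_images(data_dir):
    """Map each class directory under data_dir to its number of .jpg files."""
    counts = {}
    for class_dir in sorted(Path(data_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = len(list(class_dir.glob("**/*.jpg")))
    return counts

# Demo on a temporary directory that mimics the data/<class>/images layout
with tempfile.TemporaryDirectory() as tmp:
    for class_name, n in [("broccoli", 2), ("pizza", 3)]:
        images_dir = Path(tmp) / class_name / "images"
        images_dir.mkdir(parents=True)
        for i in range(n):
            (images_dir / f"{i}.jpg").touch()
    demo_counts = count_images(tmp)

print(demo_counts)
```

On the real data the call would be `count_images(data_dir)`.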

## Custom Dataset class

In [35]:
from torchvision.io import read_image
from torch.utils.data.dataset import Dataset
from glob import glob

In [36]:
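class ClassificationDataset(Dataset):
    def __init__(self, image_dir, transforms=None):
        self.transforms = transforms
        self.image_dir = image_dir

        # Collect every .jpg below image_dir (expects a trailing slash, e.g. "data/")
        self.files = glob(image_dir + "**/*.jpg", recursive=True)

    def __getitem__(self, index):
        image_path = self.files[index]
        # NOTE: .float() changes only the dtype; pixel values stay in the 0-255 range.
        # torchvision's preset transforms rescale uint8 input to [0, 1] but leave float input as-is.
        image_tensor = read_image(image_path).float()

        # Repeat the single channel of grayscale images to get a 3-channel tensor
        if image_tensor.size(0) == 1:
            image_tensor = image_tensor.repeat(3, 1, 1)

        if self.transforms is not None:
            image_tensor = self.transforms(image_tensor)

        # Paths look like "data/<class>/images/<file>.jpg", so the class is the second component
        class_name = image_path.split("/")[1]
        label = torch.tensor([x.lower() for x in classes].index(class_name))

        image_tensor, label = image_tensor.to(device), label.to(device)

        return image_tensor, label

    def __len__(self):
        return len(self.files)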
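The dataset derives the label from the second component of the file path, which depends on the path starting with `data/`. A slightly more tolerant sketch is shown below; `label_from_path` is a hypothetical helper that assumes the class name appears as one of the parent directories, as in `data/<class>/images/<file>.jpg`. `PurePosixPath` is used so the example behaves the same on every platform.

```python
from pathlib import PurePosixPath

def label_from_path(image_path, class_names):
    """Return the index of the class whose lower-cased name is a parent directory of image_path."""
    lowered = [name.lower() for name in class_names]
    # Only look at directories, not at the file name itself
    for part in PurePosixPath(image_path).parent.parts:
        if part in lowered:
            return lowered.index(part)
    raise ValueError(f"no known class directory in {image_path!r}")

classes = ["Broccoli", "Pizza", "Banana"]
print(label_from_path("data/pizza/images/abc.jpg", classes))  # 1
```

Unlike `split("/")[1]`, this still finds the class when the data directory is nested one level deeper.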

## Model initialization


In [37]:
from torchvision.models import resnet50, ResNet50_Weights 

In [38]:
weights = ResNet50_Weights.DEFAULT

model = resnet50(weights = weights)
model.eval().to(device)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

## Dataloader


In [39]:
from torch.utils.data import DataLoader
import torchvision.transforms as transforms

In [40]:
transform = weights.transforms()

In [41]:
dataset = ClassificationDataset("data/", transforms = transform)
batchsize = 32
workers = 3

dataloader = DataLoader(
    dataset,
    batch_size = batchsize,
    #num_workers = workers 
)

## Performing inference

*class_idx* is the list of indices of our chosen classes among the model's output categories

In [42]:
models_classes = weights.meta["categories"]

class_idx = [models_classes.index(chosen_class.lower()) for chosen_class in classes]

In [43]:
results_as_probabilities_with_target = []

# Inference only, so gradient tracking is switched off
with torch.no_grad():
    for data, target in dataloader:
        # Independent per-category sigmoid scores for the whole batch
        prediction = model(data).sigmoid()

        for image_idx, class_predictions in enumerate(prediction):
            # Keep only the scores of our chosen classes
            chosen_class_predictions = [class_predictions[idx].item() for idx in class_idx]
            actual_class = target[image_idx].item()
            results_as_probabilities_with_target.append((chosen_class_predictions, actual_class))

*results_as_probabilities_with_target* is a list of tuples: the first element of each tuple is the list of probabilities for our chosen classes, and the second element is the index of the actual class.
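The loop applies `sigmoid` to every output independently, so the scores of one image do not have to sum to 1; this is why each class later gets its own threshold. For comparison, softmax normalizes the scores across categories. A minimal pure-Python sketch with made-up logits (the values are illustrative and do not come from the model):

```python
import math

def sigmoid(logits):
    """Score each logit independently in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def softmax(logits):
    """Normalize logits into a distribution that sums to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 0.5, -1.0]  # hypothetical raw outputs for three classes
sig = sigmoid(logits)
soft = softmax(logits)

print(sum(sig))   # not constrained to 1
print(sum(soft))  # 1 up to floating-point rounding
```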

## Result interpretation

### Baseline thresholds

In [44]:
# List of class probabilities without the target
results_as_probabilities = [x[0] for x in results_as_probabilities_with_target]    

Calculating a baseline threshold for each class. Each threshold is the sum of the predicted probabilities for that class divided by `images_per_class`.

In [45]:
import numpy as np

In [46]:
thresholds = np.zeros(len(classes))

for class_probabilities in results_as_probabilities: 
    for idx, value in enumerate(class_probabilities):
        thresholds[idx] += value

thresholds = [prob_sum / images_per_class for prob_sum in thresholds] 

print(thresholds)

[0.8897708259392235, 8.947452093471244e-12, 0.0993574814020403]
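The cell above divides each sum by `images_per_class` (334). That gives a true mean only if exactly that many images were classified. In this run the report at the end gives TP + FN = 81 + 797 = 878, and since every image has exactly one target class, that is the number of images. A sketch of a per-class mean over however many results exist follows; `mean_thresholds` is a hypothetical helper and the example rows are made up.

```python
def mean_thresholds(results):
    """Column-wise mean of a list of equally long probability lists."""
    if not results:
        raise ValueError("results must not be empty")
    num_classes = len(results[0])
    sums = [0.0] * num_classes
    for class_probabilities in results:
        for idx, value in enumerate(class_probabilities):
            sums[idx] += value
    # Divide by the number of rows actually present
    return [s / len(results) for s in sums]

example = [[0.9, 0.1, 0.2], [0.7, 0.3, 0.4], [0.2, 0.5, 0.0], [0.2, 0.1, 0.2]]
print(mean_thresholds(example))
```

With the notebook's data the call would be `mean_thresholds(results_as_probabilities)`.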


### Comparing results to threshold values

*results_as_booleans* is a list with one entry per image; each entry is a list of 0/1 flags that record whether the probability of each class reached its threshold

In [47]:
results_as_booleans = []

for class_probabilities in results_as_probabilities:
    image_booleans = []
    # `idx` is used here so that the `class_idx` list from the inference step is not overwritten
    for idx, class_probability in enumerate(class_probabilities):
        verdict = class_probability >= thresholds[idx]
        image_booleans.append(int(verdict))
    results_as_booleans.append(image_booleans)

*results_as_booleans_with_target* is a list of tuples where the first element is an array of booleans and the second element is the index of the class that we are expecting

In [48]:
results_as_booleans_with_target = []

for booleans, (_, target) in zip(results_as_booleans, results_as_probabilities_with_target):
    results_as_booleans_with_target.append((booleans, target))
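The two steps above (thresholding, then re-attaching the target) can also be done in a single pass. A minimal sketch, where `apply_thresholds` is a hypothetical helper and the inputs are made up:

```python
def apply_thresholds(results_with_target, thresholds):
    """Turn (probabilities, target) pairs into (0/1 flags, target) pairs."""
    flagged = []
    for probabilities, target in results_with_target:
        # A class counts as detected when its probability reaches its threshold
        flags = [int(p >= t) for p, t in zip(probabilities, thresholds)]
        flagged.append((flags, target))
    return flagged

example = [([0.9, 0.1, 0.2], 0), ([0.2, 0.5, 0.0], 1)]
print(apply_thresholds(example, [0.5, 0.25, 0.2]))
# [([1, 0, 1], 0), ([0, 1, 0], 1)]
```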

### Calculating TP, FP, TN, FN

In [49]:
true_positives = 0
false_positives = 0
true_negatives = 0
false_negatives = 0

In [50]:
for predictions, target_idx in results_as_booleans_with_target:
    for idx, prediction in enumerate(predictions):
        # Positives
        if prediction == 1 and idx == target_idx:
            true_positives += 1
        if prediction == 1 and idx != target_idx:
            false_positives += 1
            
        # Negatives
        if prediction == 0 and idx != target_idx:
            true_negatives += 1
        if prediction == 0 and idx == target_idx:
            false_negatives += 1
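Each image contributes one decision per class, so the four counters always add up to the number of images times the number of classes. The same counting, wrapped in a function for reuse, could look like this (`confusion_counts` is a hypothetical name; the example input is made up):

```python
def confusion_counts(results_with_target):
    """Count TP, FP, TN, FN over (0/1 flags, target index) pairs."""
    tp = fp = tn = fn = 0
    for predictions, target_idx in results_with_target:
        for idx, prediction in enumerate(predictions):
            is_target = idx == target_idx
            if prediction == 1 and is_target:
                tp += 1
            elif prediction == 1:
                fp += 1
            elif is_target:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

example = [([1, 0, 1], 0), ([0, 1, 0], 1), ([0, 0, 1], 0)]
tp, fp, tn, fn = confusion_counts(example)
print(tp, fp, tn, fn)  # 2 2 4 1
```

Here 2 + 2 + 4 + 1 = 9, which matches 3 images times 3 classes.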

### Calculating *accuracy*

**Accuracy = (TP + TN) / (TP + TN + FP + FN)**

In [51]:
accuracy = (true_positives + true_negatives) / \
    (true_positives + true_negatives + false_positives + false_negatives)

### Calculating *precision*

**Precision = TP / (TP + FP)**

In [52]:
precision = true_positives / (true_positives + false_positives)

### Calculating *recall*

**Recall = TP / (TP + FN)**

In [53]:
recall = true_positives / (true_positives + false_negatives)

### Calculating *F1 score*

**F1 score = 2 * (precision * recall) / (precision + recall)**

In [54]:
f1 = 2 * (precision * recall) / (precision + recall)
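The four formulas divide by sums that can be zero, for example when no class is ever predicted. The sketch below bundles them with a guard; returning 0.0 for an empty denominator is an assumed convention, and `classification_metrics` is a hypothetical helper. It is evaluated on the counts from the report at the end of this notebook.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0

def classification_metrics(tp, fp, tn, fn):
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Counts taken from the report below
metrics = classification_metrics(tp=81, fp=235, tn=1521, fn=797)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals this gives 0.6082, 0.2563, 0.0923 and 0.1357, in line with the reported values.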

## Generating report

In [55]:
print("Chosen classes: \t\t\t", classes)
print("\n")

print("Amount of predictions done: \t\t", amount_to_classify)
print("Images per class: \t\t\t", images_per_class)
print("\n")

print("TP: \t\t\t\t\t", true_positives)
print("FP: \t\t\t\t\t", false_positives)
print("TN: \t\t\t\t\t", true_negatives)
print("FN: \t\t\t\t\t", false_negatives)
print("\n")

print("Accuracy: \t\t\t\t", accuracy)
print("Precision: \t\t\t\t", precision)
print("Recall: \t\t\t\t", recall)
print("F1 score: \t\t\t\t", f1)

Chosen classes: 			 ['Broccoli', 'Pizza', 'Banana']


Amount of predictions done: 		 1000
Images per class: 			 334


TP: 					 81
FP: 					 235
TN: 					 1521
FN: 					 797


Accuracy: 				 0.6082004555808656
Precision: 				 0.2563291139240506
Recall: 				 0.09225512528473805
F1 score: 				 0.135678391959799
