# Design and Train an object detector to detect objects

You have to design and implement a Training Pipeline that can train, test and visualize the model using the dataset provided.

## Assignment Protocols

- We expect it to take ~4 hours, with an extra 15 min for clear loom explanation(s)
  - The assessment is timeboxed at 5 hours total in a single block. So please plan accordingly
- You need to use Google Collaboratory to run and edit this notebook
- You can only use Python as a programming Language
- You cannot take help from any other person
- You can use Google to search for references
- You can not search on google for design-related things, like what should be loss function, or what should be model architecture.
  - But you can use pre-trained backbones from PyTorch
- Record a 5-10 mins of code walkthrough of the work you have done. You can use Loom Platform (https://www.loom.com) to record the video.
  - Design Decisions
    - Model Design which layers and activation functions you used and why
    - Loss function, which loss functions you used and why
    - Metrics, which metrics and why
  - Any optimizations you have made to the codebase
  - How you implemented resume functionality, what were the things you thought would be needed to resume training from exact same point
  - Explain what parts of the assessment are completed and what is missing?
  - Make sure to submit the screen recording link in the submission after you are done recording
  - Please note that the free plan on Loom only allows for videos up to 5 minutes in length. As such, you may need to record two separate 5-minute videos.
- [NO SUBMISSION WILL BE ACCEPTED WITHOUT]
  - Trained best model weights
  - Visualize Function in the Notebook
  - Code Walk-through video

## Task Details
Design a Training Pipeline to train a object detector with following specs or assumptions:
- Implement & Design Model
  - You can use any backbone
    - Either from PyTorch (torhvision) or any resource online
    - But you need to design head your self (head means how you will use features of the back bone and get the desired outputs)
  - Model needs to detect one object in each image
  - Model should output following for each image passed as input:
    - Whether we have an object or not
    - Where is the object?
      - The bounding box output format should be xmin, ymin, xmax, ymax
      - It is not necessary the model is trained to output exactly this format but the visualize function which shows output should output in this format
    - Either the object is a cat or dog?
    - And which specie the object belongs to? There are in total 9 species: 
      - Cat [3 species]:
        - Abyssinian
        - Birman
        - Persian
      - Dog [6 species]
        - american_bulldog
        - american_pit_bull_terrier
        - basset_hound
        - beagle
        - chihuahua
        - pomeranian
- Implement Custom Dataloader
  - This is obvious as dataset is in a unique format any predifined dataloader wont work
  - Follow best practices of writing custom dataloaders
  - Details of the format of the dataset are defined in the Dataset Details section below
  - Add needed pre-processing that you think would help train a better model or would help as we are using pre-trained weights as starting point
  - Add augmentations that you think would help train a better model
- Implement Loss Function
  - Design and implement a loss function that can handle all of the outputs we have
  - You can use pytorch built-in loss functions
  - There are many scenarios which you need to handle, which one can understand from the dataset details and the model design
- Implement Test Function
  - The test function should be able to run the model on the validation set and output the metrics for all the outputs of the model
  - Select the metrics carefully, there are many scenarios which can change the selection of a metric
  - Keep in mind there are multiple outputs, you would need a metric for each output
  - [NOTE] You don't need to implement metrics for the bounding box output as it can take more time than provided for this assessment. But please add details of the metrics you would have implemented in your code-walk through loom video.
- Update Resume Training Functionality using the best weights
  - Current script does not have save best weights functionality
  - The code should be able to resume training from exactly same point from where the training was stopped if model weights file is passed
  - Keep in mind you can not resume training from same point by just loading weights of the model
- Implement a visualize function [Most important, without this no submission will be accepted]
  - The input of the function should be path of a folder with images and the weight file
    - Also the output folder path to save outputs
  - This function should return a dictionary of dictionaries with following details for each image:
    - {
        "has_object": True,
        "cat_or_dog": "cat",
        "specie": "persian",
        "xmin": 10,
        "ymin": 10,
        "xmax": 10,
        "ymax": 10
    }
  - And in case there is no object it should have 0 for bbox values, "NA" for "cat_or_dog" and "specie", and False for "has_object".
  - Values of the returned dictionary should be like explained above and keys should be image names including the extension ".jpg" or ".jpeg"
  - Should save output image with bounding box drawn on it, with same name input image but place in the output folder 
- Try to train the best model


## Dataset Details
The dataset has in total 1041 images. Each image has a single object which is either a cat or a dog.
- There are multiple species for both cat and dog.
- The number of images falling in each specie is as follows:
  - basset_hound: 93
  - Birman: 93
  - pomeranian: 93
  - american_pit_bull_terrier: 93
  - american_bulldog: 93
  - Abyssinian: 92
  - beagle: 93
  - Persian: 93
  - chihuahua: 93
  - empty: 142
- The dataset has two folders:
  - images
    - Inside images folder we have 986 images in .jpg folder
  - labels
    - Inside labels folder we have 899 .xml files each file with details of image labels
    - For any image that does not have a cat or dog, there is no corresponding xml file

## Deliverable
- Updated Colab Based Jupyter Notebook:
  - With all the required functionality Implemented
  - Which one can train the model without any errors
  - One should achieve same metrics (Almost same metrics) if I run training using this collab notebook
    - Set default values for everything accordingly in the notebook
  - During evaluation we will just run the notebook and use the best weights the notebook saves automatically
- Best weights you have trained
  - We will Evaluate your weights against hold-out test we have and compare results
  - We will use visualize function to generate outputs for each image
  - Upload weights in an easily downloadable location like, Dropbox, Google Drive, Github, etc
- A video code-walk through explaining your design decisions including but not limited to:
  - Model Design which layers and activation functions you used and why
  - Loss function, which loss functions you used and why
  - Metrics, which metrics and why
  - Any optimizations you have made to the codebase
  - How you implemented resume functionality, what were the things you thought would be needed to resume training from exact same point


## Evaluation Criteria
 - Design Decisions
 - Completeness: Did you include all features?
 - Correctness: Does the solution (all deliverables) work in sensible, thought-out ways?
 - Maintainability: Is the code written in a clean, maintainable way?
 - Testing: Is the solution adequately tested?
 - Documentation: Is the codebase well-documented and has proper steps to run any of the deliverables?

## Extra Points
- Add metrics for the Bounding Box Output
- Any Updates in the notebook (Bugs/Implementation Mistakes etc)

## How to submit
- Please upload the Notebook for this project to GitHub, and post a link to your repository below [repo link box, on the left of submit button].
  - Create a new GitHub repository from scratch
  - Add the final Colab/Jupyter notebook to the repository
- Please upload video and your final best weights on Google Drive or any other platform, and paste the link to the folder with both video and model in the text box just above the submit button.
- Please paste the commit Id of the latest commit of your Github Repo, which should not be later than 5 hours of time when the repo was created.
  - Please note the submission without the commit id will not be considered.

# Install Required Modules

In [1]:
! pip install bs4 lxml kaggle

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting bs4
  Downloading bs4-0.0.1.tar.gz (1.1 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py) ... [?25l[?25hdone
  Created wheel for bs4: filename=bs4-0.0.1-py3-none-any.whl size=1257 sha256=4ab5c1223c75acaf5805e58dabf61aa583b266ac43055c40f2150dd8d110a063
  Stored in directory: /root/.cache/pip/wheels/25/42/45/b773edc52acb16cd2db4cf1a0b47117e2f69bb4eb300ed0e70
Successfully built bs4
Installing collected packages: bs4
Successfully installed bs4-0.0.1


# Download Dataset from Kaggle

In [2]:
import os
os.environ['KAGGLE_USERNAME'] = 'bilalyousaf0014'
os.environ['KAGGLE_KEY'] = '11031bc21c5e3ec23585dbe17dc4267d'

In [3]:
!kaggle datasets download -d bilalyousaf0014/ml-engineer-assessment-dataset

Downloading ml-engineer-assessment-dataset.zip to /content
 97% 76.0M/78.6M [00:03<00:00, 28.4MB/s]
100% 78.6M/78.6M [00:03<00:00, 24.8MB/s]


In [4]:
! unzip /content/ml-engineer-assessment-dataset.zip

Archive:  /content/ml-engineer-assessment-dataset.zip
  inflating: assessment_dataset/images/00001.jpeg  
  inflating: assessment_dataset/images/00008.jpeg  
  inflating: assessment_dataset/images/00017.jpeg  
  inflating: assessment_dataset/images/00022.jpeg  
  inflating: assessment_dataset/images/00048.jpeg  
  inflating: assessment_dataset/images/00055.jpeg  
  inflating: assessment_dataset/images/001.jpeg  
  inflating: assessment_dataset/images/1.jpeg  
  inflating: assessment_dataset/images/1001524.jpeg  
  inflating: assessment_dataset/images/1005343.jpeg  
  inflating: assessment_dataset/images/1049854.jpeg  
  inflating: assessment_dataset/images/1072860.jpeg  
  inflating: assessment_dataset/images/1120419.jpeg  
  inflating: assessment_dataset/images/1146885.jpeg  
  inflating: assessment_dataset/images/2.jpeg  
  inflating: assessment_dataset/images/3.jpeg  
  inflating: assessment_dataset/images/4.jpeg  
  inflating: assessment_dataset/images/5.jpeg  
  inflating: assessm

# MODEL IMPLEMENTATION:

In [5]:
import os
import torch
import torch.nn as nn
import numpy as np

from torchvision.models import resnet18, ResNet18_Weights
seed=1998
def seed_everything(seed:int):
    os.environ['CUBLAS_WORKSPACE_CONFIG'] = ":4096:8"
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)
    torch.backends.cudnn.enabled = True
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    torch.use_deterministic_algorithms(True,warn_only=True)
    return 
seed_everything(seed)

In [6]:
class Model(nn.Module):

  def __init__(self):
    super(Model, self).__init__()
    pretrained_model = resnet18(weights=ResNet18_Weights.DEFAULT)
    self.backbone = nn.Sequential(*list(pretrained_model.children())[:-2])
    ### Initialize the required Layers
    self.gap=nn.AdaptiveAvgPool2d(1)
    self.have_object = nn.Sequential(
        nn.Linear(512,512),
        nn.LeakyReLU(),
        nn.Linear(512,512),
        nn.LeakyReLU(),
        nn.Linear(512,1),
        nn.Sigmoid()
    )
    self.type_and_species_neck=nn.Sequential(
        nn.Linear(512,512),
        nn.LeakyReLU(),
        nn.Linear(512,512),
        nn.LeakyReLU(),
    )
    self.cat_or_dog_classifier = nn.Sequential(
        nn.Linear(512,512),
        nn.LeakyReLU(),
        nn.Linear(512,512),
        nn.LeakyReLU(),
        nn.Linear(512,1),
        nn.Sigmoid()
    )
    self.specie_classifier = nn.Sequential(
        nn.Linear(512,512),
        nn.LeakyReLU(),
        nn.Linear(512,512),
        nn.LeakyReLU(),
        nn.Linear(512,10)
    )
    self.bbox_reg = nn.Sequential(
        nn.Linear(512,512),
        nn.LeakyReLU(),
        nn.Linear(512,512),
        nn.LeakyReLU(),
        nn.Linear(512,4),
        nn.Sigmoid()
    )
    ### Initialize the required Layers

  def forward(self, input):
      out_backbone = self.backbone(input)
      ### Write Forward Calls for the Model
      global_avg_pool_out=self.gap(out_backbone).reshape((-1,512))
      objectiveness_out=self.have_object(global_avg_pool_out)
      type_and_species_context_out= self.type_and_species_neck(global_avg_pool_out)
      cat_or_dog_out=self.cat_or_dog_classifier(type_and_species_context_out)
      specie_out=self.specie_classifier(type_and_species_context_out)
      pred_bbox=self.bbox_reg(global_avg_pool_out)
      return {
          "bbox": pred_bbox,
          "object": objectiveness_out,
          "cat_or_dog": cat_or_dog_out,
          "specie": specie_out
      }

# CUSTOM DATALOADER IMPLEMENTATION

In [7]:
train_list = np.load('/content/assessment_dataset/train_list.npy', allow_pickle=True).tolist()
val_list = np.load('/content/assessment_dataset/val_list.npy', allow_pickle=True).tolist()

In [8]:
from bs4 import BeautifulSoup

def read_xml_file(path):
    with open(path, 'r') as f:
        data = f.read()
    bs_data = BeautifulSoup(data, 'xml')
    return {
        "cat_or_dog": bs_data.find("name").text,
        "xmin": int(bs_data.find("xmin").text),
        "ymin": int(bs_data.find("ymin").text),
        "xmax": int(bs_data.find("xmax").text),
        "ymax": int(bs_data.find("ymax").text),
        "specie": "_".join(path.split(os.sep)[-1].split("_")[:-1])
    }

In [21]:
from PIL import Image
from albumentations import (
    Compose, ShiftScaleRotate, RandomBrightness, HueSaturationValue, RandomContrast, HorizontalFlip,VerticalFlip,
    Rotate, Resize, CLAHE, ColorJitter, RandomBrightnessContrast, GaussianBlur, Blur, MedianBlur,
    Downscale, ChannelShuffle, Normalize, OneOf, GaussNoise,
    RandomScale,Equalize
)
import torchvision.transforms as transforms
import random
from albumentations.core.transforms_interface import ImageOnlyTransform
import numpy as np
from albumentations.pytorch import ToTensorV2
import cv2
from torch.utils.data import Dataset

class Normalization(ImageOnlyTransform):
    """
    this class normalize the input image by dividing image values by 255

    """
    def __init__(self,mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], always_apply: bool = True, p: float = 1):
        super().__init__(always_apply, p)
        self.mean=np.array(mean)
        self.std=np.array(std)

    def apply(self, img, **params) :
        return (img-self.mean)/self.std
def get_heavy_transform_pipeline(width=256,height=128,is_train=True,aug_bounds=[0.3,0.5]):

    if(is_train):
        return Compose([
          OneOf([
              ShiftScaleRotate(rotate_limit=15, p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
              Rotate(limit=15, p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
              RandomScale(p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
          ], p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
          HorizontalFlip(p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
          VerticalFlip(p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
          GaussNoise(p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
          OneOf([
              Blur(p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
              GaussianBlur(p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
              MedianBlur(p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
          ], p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
          OneOf([
              OneOf([
                  RandomBrightness(p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
                  RandomContrast(p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
              ], p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),#p=0.75),
              OneOf([
                  CLAHE(p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
                  ColorJitter(p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
                  HueSaturationValue(p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
                  ChannelShuffle(p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
              ], p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),#p=0.75),
          ],p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),#0.5),
          Downscale(scale_min=0.25, scale_max=0.25, p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
          Resize(height=height,width=width),
          Normalization(always_apply=True),
          ToTensorV2()


        ])
    else:
        return Compose([
            Resize(height=height,width=width),
            Normalization(always_apply=True),
            ToTensorV2()
        ])

def get_light_transform_pipeline(width=256,height=128,is_train=True,aug_bounds=[0.3,0.5]):
    if(is_train):
        return Compose([
            Rotate(limit=15, p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
            HorizontalFlip(p=random.uniform(low=aug_bounds[0],high=aug_bounds[1])),
            Downscale(),
            Resize(height=height,width=width),
            Normalization(always_apply=True),
            ToTensorV2()

        ])
    else:
        return Compose([
            Resize(height=height,width=width),
            Normalization(always_apply=True),
            ToTensorV2()
        ])
cat_or_dog_to_encoding={
    'cat':0,
    'dog':1
}
species_to_encoding={
    'abyssinian':1,
    'birman':2,
    'persian':3,
    'american_bulldog':4,
    'american_pit_bull_terrier':5,
    'basset_hound':6,
    'beagle':7,
    'chihuahua':8,
    'pomeranian':9,
    'empty':0
}

class CustomDataset(Dataset):

  def __init__(self, dataset_path, images_list, train=False,width=256,height=128):

    self.preprocess = get_light_transform_pipeline(width=width,height=128,is_train=train)

    image_folder_path = os.path.join(dataset_path, "images")
    label_folder_path = os.path.join(dataset_path, "labels")

    self.data=[]
    for path in os.listdir(image_folder_path):
        name = path.split(os.sep)[-1].split(".")[0]
        if name in images_list:
          xml_path = os.path.join(label_folder_path, name+".xml")
          if(os.path.exists(xml_path)):
            xml_data = read_xml_file(xml_path)
            self.data.append((os.path.join(image_folder_path,path),{'is_object':1,**xml_data}))
          else:
            self.data.append((os.path.join(image_folder_path,path),{'is_object':0,'cat_or_dog':'cat','xmin':0,'xmax':0,'ymin':0,'ymax':0,'specie':'empty'}))
  def __len__(self):
    return len(self.data)

  def __getitem__(self, index):
    image_path,label_dict=self.data[index]
    image=cv2.imread(image_path)
    width,height=image.shape[2],image.shape[1]
    is_object=label_dict['is_object']
    cat_or_dog_class=cat_or_dog_to_encoding[str(label_dict['cat_or_dog']).lower()]
    species_class=species_to_encoding[str(label_dict['specie']).lower()]
    xmin,xmax,ymin,ymax=float(label_dict['xmin'])/width,float(label_dict['xmax'])/width,float(label_dict['ymin'])/height,float(label_dict['ymax'])/height

    bbox=[xmin,ymin,xmax-xmin,ymax-ymin]
    transform = self.preprocess(image=image,bbox=bbox)
    image=transform['image']
    bbox=transform['bbox']
    labels = {
        'is_object':torch.tensor(is_object).long(),
        'cat_or_dog':torch.tensor(cat_or_dog_class).long(),
        'species_class':torch.tensor(species_class).long(),
        'bbox': torch.tensor(bbox).float()
    }
    return image, labels
  def collate_fn(self, batch):
    images=[]
    is_objects=[]
    cat_or_dogs=[]
    species_classes=[]
    bboxes=[]
    for image,label in batch:
      images.append(image)
      is_objects.append(label['is_object'])
      cat_or_dogs.append(label['cat_or_dog'])
      species_classes.append(label['species_class'])
      bboxes.append(label['bbox'])
    return torch.stack(images).float(),{
        'is_object':torch.stack(is_objects).unsqueeze(dim=1).float(),
        'cat_or_dog':torch.stack(cat_or_dogs).unsqueeze(dim=1).float(),
        'species_class':torch.stack(species_classes).long(),
        'bbox':torch.stack(bboxes).float()
        }

# TRAINING LOOP IMPLEMENTATION

## Initializations

In [22]:
from torch.utils.data import DataLoader

In [23]:
!pip install torchmetrics

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [24]:
# import our library
import torchmetrics
from tqdm import tqdm
def train(epochs, model_weights):

  # Initialize Model and Optimizer
  LR=0.001
  best_loss=1e9
  start_epoch=0
  model = Model()
  optimizer = torch.optim.Adam(model.parameters(),lr=0.001)
  if(model_weights is not None):
    model_state_dict=torch.load(model_weights)
    optimizer.load_state_dict(model_state_dict['optimizer'])
    model.load_state_dict(model_state_dict['state_dict'])
    best_loss=model_state_dict['best_loss']
    start_epoch=model_state_dict['best_epoch']
    

  # Initialize Loss Functions
  have_object_loss = nn.BCELoss()
  specie_loss = nn.CrossEntropyLoss()
  cat_or_dog_loss = nn.BCELoss()
  bbox_loss = nn.MSELoss() # Not necessary you need to apply function to all coordinates together, You can have separete loss functions for all coordinates too
  # Below or Above
  # xmin_loss = None
  # ymin_loss = None
  # xmax_loss = None
  # ymax_loss = None

  training_dataset = CustomDataset("/content/assessment_dataset", images_list=train_list)
  training_loader = DataLoader(training_dataset,pin_memory=True,batch_size=64,shuffle=True,num_workers=4,collate_fn=training_dataset.collate_fn)

  device='cpu'
  if torch.cuda.is_available():
    device='cuda'
  model = model.to(device)

  def train_one_epoch(epoch_index, tb_writer):
      running_loss = 0.
      last_loss = 0.
      
      model.train()
      # Here, we use enumerate(training_loader) instead of
      # iter(training_loader) so that we can track the batch
      # index and do some intra-epoch reporting
      for i, data in tqdm(enumerate(training_loader),total=len(training_loader),desc="TRAIN LOOP"):
          # Every data instance is an input + label pair
          optimizer.zero_grad()
          inputs, labels = data
          label_is_object= labels['is_object'].to(device)
          label_species_class=labels['species_class'].to(device)
          label_cat_or_dog_class=labels['cat_or_dog'].to(device)
          label_bbox=labels['bbox'].to(device)
          inputs=inputs.to(device)
          # Make predictions for this batch
          outputs = model(inputs)

          pred_is_object=outputs["object"]
          pred_species_class=outputs["specie"]
          pred_cat_or_dog=outputs["cat_or_dog"]
          pred_bbox=outputs["bbox"]
          # Compute the loss and its gradients
          loss_have_object = have_object_loss(pred_is_object,label_is_object)
          loss_specie = specie_loss(pred_species_class, label_species_class)
          loss_cat_or_dog = cat_or_dog_loss(pred_cat_or_dog, label_cat_or_dog_class)
          
          loss_bbox = bbox_loss(pred_bbox, label_bbox)


          loss =  (loss_have_object+loss_cat_or_dog+loss_specie+10*loss_bbox)# Consolidate all individual losses
          loss.backward()
          optimizer.step()
          # Gather data and report
          running_loss += loss.item()

          if i % 10 == 0:
              last_loss = running_loss / 10 # loss per batch
              running_loss = 0.
              
      return last_loss

  
  for i in range(start_epoch,epochs):
    epoch_loss= train_one_epoch(i, None)
    print(f' Epoch {i} Loss : {epoch_loss}')
    if(epoch_loss<best_loss):
        model_state_dict={
            'state_dict':model.state_dict(),
            'optimizer':optimizer.state_dict(),
            'best_epoch':i,
            'best_loss':epoch_loss,
        }
        torch.save(model_state_dict,'best_weights.pt')
        best_loss=epoch_loss
    # torch.save("model.pth", model.state_dict())
    metrics = test(model,val_list)
    print(metrics)

def test(model, val_list):
  device='cpu'
  if torch.cuda.is_available():
    device='cuda'
  model.eval()
  model.to(device)
  def post_process_object(x):
    return (x > 0.5).long()

  def post_process_cat_or_dog(x):
    return (x > 0.5).long()

  def post_process_specie(x):
    return torch.argmax(x,dim=1)

  def post_process_bbox(x):
    return x

  val_dataset = CustomDataset("/content/assessment_dataset", images_list=val_list)
  val_loader = DataLoader(val_dataset,batch_size=1,pin_memory=True,num_workers=4,collate_fn=val_dataset.collate_fn)


  metric_object = torchmetrics.classification.BinaryAccuracy()
  metric_cat_or_dog = torchmetrics.classification.BinaryAccuracy()
  metric_specie = torchmetrics.classification.MulticlassAccuracy(num_classes=len(species_to_encoding.keys()))
  metric_bbox = torchmetrics.regression.MeanSquaredError()
  running_object_acc=0
  running_cat_or_doc_acc=0
  running_specie_acc=0
  running_bbox_mse=0
  for i, data in tqdm(enumerate(val_loader),total=len(val_loader),desc='VAL LOOP'):
    inputs, labels = data
    inputs=inputs.to(device)
    # Make predictions for this batch
    outputs = model(inputs)
    pred_is_object=post_process_object(outputs["object"].cpu())
    pred_species_class=post_process_specie(outputs["specie"].cpu())
    pred_cat_or_dog=post_process_cat_or_dog(outputs["cat_or_dog"].cpu())
    pred_bbox=post_process_bbox(outputs["bbox"].cpu())

    label_is_object= labels['is_object'].cpu()
    label_species_class=labels['species_class'].cpu()
    label_cat_or_dog_class=labels['cat_or_dog'].cpu()
    label_bbox=labels['bbox'].cpu()

    score_object = metric_object(pred_is_object,label_is_object)
    score_cat_or_dog = metric_cat_or_dog(pred_cat_or_dog,label_cat_or_dog_class)
    score_specie = metric_specie(pred_species_class,label_species_class)
    score_bbox = metric_bbox(pred_bbox,label_bbox)
    
    running_object_acc+=score_object.item()
    running_cat_or_doc_acc+=score_cat_or_dog.item()
    running_specie_acc+=score_specie.item()
    running_bbox_mse+=score_bbox.item()

  
  running_object_acc/=len(val_loader)
  running_cat_or_doc_acc/=len(val_loader)
  running_specie_acc/=len(val_loader)
  running_bbox_mse/=len(val_loader)
  return running_object_acc, running_cat_or_doc_acc, running_specie_acc, running_bbox_mse

In [25]:
train(50,'/content/best_weights.pt')

TRAIN LOOP: 100%|██████████| 13/13 [00:04<00:00,  2.68it/s]


 Epoch 0 Loss : 13676.30732421875


VAL LOOP: 100%|██████████| 194/194 [00:06<00:00, 27.80it/s]

(0.8608247422680413, 0.6907216494845361, 0.023195876634305286, 1316.0516324298283)



TRAIN LOOP: 100%|██████████| 13/13 [00:08<00:00,  1.58it/s]


 Epoch 1 Loss : 12539.84267578125


VAL LOOP: 100%|██████████| 194/194 [00:04<00:00, 45.51it/s]

(0.8969072164948454, 0.845360824742268, 0.030927835512407048, 1316.0451386878171)



TRAIN LOOP: 100%|██████████| 13/13 [00:04<00:00,  2.79it/s]


 Epoch 2 Loss : 13128.99404296875


VAL LOOP: 100%|██████████| 194/194 [00:03<00:00, 52.91it/s]

(0.9020618556701031, 0.7938144329896907, 0.03969072224092238, 1316.0476505473716)



TRAIN LOOP: 100%|██████████| 13/13 [00:04<00:00,  2.84it/s]


 Epoch 3 Loss : 13461.642578125


VAL LOOP: 100%|██████████| 194/194 [00:02<00:00, 71.85it/s]

(0.9432989690721649, 0.8608247422680413, 0.03917525831571559, 1316.0434756982572)



TRAIN LOOP: 100%|██████████| 13/13 [00:05<00:00,  2.32it/s]


 Epoch 4 Loss : 12795.92177734375


VAL LOOP: 100%|██████████| 194/194 [00:02<00:00, 73.52it/s]

(0.9020618556701031, 0.8608247422680413, 0.03917525831571559, 1316.0443782763382)



TRAIN LOOP: 100%|██████████| 13/13 [00:04<00:00,  2.98it/s]


 Epoch 5 Loss : 13104.96689453125


VAL LOOP: 100%|██████████| 194/194 [00:03<00:00, 54.99it/s]

(0.9536082474226805, 0.8762886597938144, 0.04948453681985127, 1316.0450882152797)



TRAIN LOOP: 100%|██████████| 13/13 [00:04<00:00,  2.84it/s]


 Epoch 6 Loss : 13303.5248046875


VAL LOOP: 100%|██████████| 194/194 [00:02<00:00, 70.29it/s]

(0.9639175257731959, 0.7783505154639175, 0.047938145044230926, 1316.0448683958077)



TRAIN LOOP: 100%|██████████| 13/13 [00:05<00:00,  2.49it/s]


 Epoch 7 Loss : 13599.24208984375


VAL LOOP: 100%|██████████| 194/194 [00:02<00:00, 67.47it/s]

(0.9226804123711341, 0.845360824742268, 0.04432989756778343, 1316.046956758831)



TRAIN LOOP: 100%|██████████| 13/13 [00:04<00:00,  2.97it/s]


 Epoch 8 Loss : 13049.937109375


VAL LOOP: 100%|██████████| 194/194 [00:02<00:00, 72.01it/s]

(0.9329896907216495, 0.7577319587628866, 0.03144329943761383, 1316.055577935939)



TRAIN LOOP: 100%|██████████| 13/13 [00:05<00:00,  2.28it/s]


 Epoch 9 Loss : 13044.0033203125


VAL LOOP: 100%|██████████| 194/194 [00:02<00:00, 70.84it/s]

(0.6752577319587629, 0.6752577319587629, 0.042783505792163085, 1316.04983119006)



TRAIN LOOP: 100%|██████████| 13/13 [00:04<00:00,  2.87it/s]


 Epoch 10 Loss : 13101.1251953125


VAL LOOP: 100%|██████████| 194/194 [00:03<00:00, 52.69it/s]

(0.9639175257731959, 0.9226804123711341, 0.06597938242646836, 1316.0426351676897)



TRAIN LOOP: 100%|██████████| 13/13 [00:04<00:00,  2.95it/s]


 Epoch 11 Loss : 13663.4982421875


VAL LOOP: 100%|██████████| 194/194 [00:02<00:00, 71.01it/s]

(0.9484536082474226, 0.9020618556701031, 0.054639176071919114, 1316.0428519295049)



TRAIN LOOP: 100%|██████████| 13/13 [00:05<00:00,  2.22it/s]


 Epoch 12 Loss : 13415.98466796875


VAL LOOP: 100%|██████████| 194/194 [00:02<00:00, 71.29it/s]

(0.9381443298969072, 0.8865979381443299, 0.06340206280043445, 1316.0422598347836)



TRAIN LOOP: 100%|██████████| 13/13 [00:05<00:00,  2.25it/s]


 Epoch 13 Loss : 12715.20673828125


VAL LOOP: 100%|██████████| 194/194 [00:03<00:00, 57.70it/s]

(0.9639175257731959, 0.865979381443299, 0.062371134950020876, 1316.0466610217832)



TRAIN LOOP: 100%|██████████| 13/13 [00:04<00:00,  2.88it/s]


 Epoch 14 Loss : 13219.6390625


VAL LOOP: 100%|██████████| 194/194 [00:06<00:00, 29.18it/s]

(0.9175257731958762, 0.8350515463917526, 0.059278351398780176, 1316.043011381454)



TRAIN LOOP: 100%|██████████| 13/13 [00:06<00:00,  2.12it/s]

 Epoch 15 Loss : 13044.6423828125



VAL LOOP: 100%|██████████| 194/194 [00:03<00:00, 54.51it/s]

(0.8350515463917526, 0.8041237113402062, 0.059793815323986955, 1316.047137725599)



TRAIN LOOP: 100%|██████████| 13/13 [00:07<00:00,  1.81it/s]


 Epoch 16 Loss : 12950.1853515625


VAL LOOP: 100%|██████████| 194/194 [00:04<00:00, 48.11it/s]

(0.979381443298969, 0.9329896907216495, 0.07061855775332943, 1316.0460010899096)



TRAIN LOOP: 100%|██████████| 13/13 [00:06<00:00,  2.13it/s]


 Epoch 17 Loss : 13627.0919921875


VAL LOOP: 100%|██████████| 194/194 [00:02<00:00, 71.05it/s]


(0.9587628865979382, 0.8608247422680413, 0.056701031772746255, 1316.049199679463)


TRAIN LOOP: 100%|██████████| 13/13 [00:04<00:00,  2.74it/s]


 Epoch 18 Loss : 13236.30302734375


VAL LOOP: 100%|██████████| 194/194 [00:03<00:00, 53.60it/s]

(0.9432989690721649, 0.8969072164948454, 0.07216494952894978, 1316.0424851944151)



TRAIN LOOP: 100%|██████████| 13/13 [00:04<00:00,  2.89it/s]


 Epoch 19 Loss : 13620.41494140625


VAL LOOP: 100%|██████████| 194/194 [00:02<00:00, 70.38it/s]

(0.9226804123711341, 0.8814432989690721, 0.06082474317440053, 1316.0420530432277)



TRAIN LOOP:   0%|          | 0/13 [00:01<?, ?it/s]
Exception in thread Thread-136 (_pin_memory_loop):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/pin_memory.py", line 51, in _pin_memory_loop
    do_one_step()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/pin_memory.py", line 28, in do_one_step
    r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/reductions.py", line 307, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.10/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_c

KeyboardInterrupt: ignored

In [39]:
from PIL import ImageDraw,ImageFont

def visualize(model_weights, image_folder_path, output_folder="output"):
  encoding_to_species={item:key for key,item in species_to_encoding.items()}
  encoding_to_cat_or_dog={item:key for key,item in cat_or_dog_to_encoding.items()}
  def post_process_object(x):
    return bool((x > 0.5).long().squeeze().item())

  def post_process_cat_or_dog(x):
    return encoding_to_cat_or_dog[int((x > 0.5).long().squeeze().item())]

  def post_process_specie(x):
    return encoding_to_species[int(torch.argmax(x,dim=1).squeeze().item())]

  def post_process_bbox(x,original_width,original_height):
    x=x.squeeze().numpy()
    x_offset,y_offset,width,height=x[0],x[1],x[2],x[3]
    return [(int(x_offset*original_width),int(y_offset*original_height)),(int((x_offset+width)*original_width),int((y_offset+height)*original_height))]
  if(not os.path.exists(output_folder)):
    os.makedirs(output_folder)
  model = Model()
  model_state_dict=torch.load(model_weights)
  model.load_state_dict(model_state_dict['state_dict'])
  preprocess = get_heavy_transform_pipeline(is_train=False)
  model.eval()
  return_dict={}
  for image_name in os.listdir(image_folder_path):

    image=Image.open(os.path.join(image_folder_path,image_name))
    image_np=np.array(image)
    original_height,original_width=image_np.shape[:2]
    input=preprocess(image=image_np)['image']
    input=input.unsqueeze(dim=0).float()
    output=model(input)
    pred_is_object=post_process_object(output["object"].detach().cpu())
    pred_species_class=post_process_specie(output["specie"].detach().cpu())
    pred_cat_or_dog=post_process_cat_or_dog(output["cat_or_dog"].detach().cpu())
    pred_bbox=post_process_bbox(output["bbox"].detach().cpu(),original_height=original_height,original_width=original_width)
    image_draw=ImageDraw.Draw(image)
    image_draw.rectangle(pred_bbox,outline ="red")
    image_draw.text(pred_bbox[0],text=f'{pred_cat_or_dog} : {pred_species_class}')
    image.save(os.path.join(output_folder,image_name))
    return_dict[image_name]={ 
        "has_object": pred_is_object,
        "cat_or_dog": pred_cat_or_dog if(pred_is_object) else 'NA',
        "specie": pred_species_class if(pred_is_object) else 'NA',
        "xmin": pred_bbox[0][0] if(pred_is_object) else 0,
        "ymin": pred_bbox[0][1] if(pred_is_object) else 0,
        "xmax": pred_bbox[1][0] if(pred_is_object) else 0,
        "ymax": pred_bbox[1][1] if(pred_is_object) else 0
        }
  return return_dict

In [40]:
visualize(model_weights='/content/best_weights.pt',image_folder_path='/content/assessment_dataset/images')

Abyssinian_117.jpg
401 500
tensor([[1.0000e+00, 3.6730e-10, 1.0000e+00, 4.3795e-01]])
[(500, 0), (1000, 175)]
{'has_object': True, 'cat_or_dog': 'cat', 'specie': 'abyssinian', 'xmin': 40, 'ymin': 40, 'xmax': 300, 'ymax': 300}


{'Abyssinian_117.jpg': {'has_object': True,
  'cat_or_dog': 'cat',
  'specie': 'abyssinian',
  'xmin': 40,
  'ymin': 40,
  'xmax': 300,
  'ymax': 300}}