## Drive Setup

Mount Google Drive to store the dataset and the trained models.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
drive_dataset_path = '/content/drive/MyDrive/cv_bootcamp_indonesiaai/project_2/dataset'

## Download Data

We use the COCO 2017 dataset (`coco-2017`) from the FiftyOne Dataset Zoo.

In [None]:
!pip install fiftyone
!pip install fiftyone-db-ubuntu2204



In [None]:
import fiftyone as fo
import fiftyone.zoo as foz

As agreed in the team discussion, we sample:
- 4000 training images
- 500 validation images (split off from the COCO train split)
- 500 test images (taken from the COCO validation split, to compare model performance)

In [None]:
# Load the COCO-2017 dataset
# This will download it from the FiftyOne Dataset Zoo if necessary

dataset = foz.load_zoo_dataset("coco-2017", split="train", label_types=["detections"], classes=["person"], max_samples=4500, seed=43, dataset_dir=drive_dataset_path)
dataset_test = foz.load_zoo_dataset("coco-2017", split="validation", label_types=["detections"], classes=["person"], max_samples=500, seed=43, dataset_dir=drive_dataset_path)

# Print summary information about the view
print(dataset)

Downloading split 'train' to '/content/drive/MyDrive/cv_bootcamp_indonesiaai/project_2/dataset/train' if necessary
Found annotations at '/content/drive/MyDrive/cv_bootcamp_indonesiaai/project_2/dataset/raw/instances_train2017.json'
Sufficient images already downloaded
Existing download of split 'train' is sufficient
Loading 'coco-2017' split 'train'
 100% |███████████████| 4500/4500 [39.0s elapsed, 0s remaining, 126.4 samples/s]
Dataset 'coco-2017-train-4500' created
Downloading split 'validation' to '/content/drive/MyDrive/cv_bootcamp_indonesiaai/project_2/dataset/validation' if necessary
Found annotations at '/content/drive/MyDrive/cv_bootcamp_indonesiaai/project_2/dataset/raw/instances_val2017.json'
Sufficient images already downloaded
Existing download of split 'validation' is sufficient
Loading 'coco-2017' split 'validation'
 100% |█████████████████| 500/500 [3.4s elapsed, 0s remaining, 132.9 samples/s]
Dataset 'coco-2017-validation-500' created

Name:        coco-2017-train-4500
Media type:  image
Num samples: 4500
Persistent:  False
Tags:        []
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)


This project only needs the `person` class for person detection, so we filter the ground-truth detections of every sample to keep `person` labels only.

In [None]:
# Iterate over the dataset
for sample in dataset:
    # Get the detections
    detections = sample.ground_truth.detections
    # Filter out non-person detections
    detections = [d for d in detections if d.label == "person"]
    # Update the detections
    sample.ground_truth.detections = detections
    # Save the sample
    sample.save()

In [None]:
# Iterate over the dataset_test
for sample in dataset_test:
    # Get the detections
    detections = sample.ground_truth.detections
    # Filter out non-person detections
    detections = [d for d in detections if d.label == "person"]
    # Update the detections
    sample.ground_truth.detections = detections
    # Save the sample
    sample.save()

Check that the ground-truth detections now contain only the `person` class.

In [None]:
# Classes list
classes = dataset.distinct("ground_truth.detections.label")
print(len(classes))
classes

1


['person']

## Prepare dependencies

In [None]:
!wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/transforms.py
!wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/engine.py
!wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/utils.py
!wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/coco_eval.py
!wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/coco_utils.py

--2023-11-30 14:53:52--  https://raw.githubusercontent.com/pytorch/vision/main/references/detection/transforms.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 23628 (23K) [text/plain]
Saving to: ‘transforms.py.1’


2023-11-30 14:53:52 (128 MB/s) - ‘transforms.py.1’ saved [23628/23628]

--2023-11-30 14:53:52--  https://raw.githubusercontent.com/pytorch/vision/main/references/detection/engine.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4063 (4.0K) [text/plain]
Saving to: ‘engine.py.1’


2023-11-30 14:53:52 (58.2 MB/s) - ‘e

## Import dependencies

In [None]:
import os
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import torchvision
from engine import train_one_epoch, evaluate
import utils
import torch
import json
import fiftyone.utils.coco as fouc
from torchvision.io import read_image, ImageReadMode

## Create data loader

In [None]:
class ObjectDataset(torch.utils.data.Dataset):
    def __init__(
        self,
        fiftyone_dataset,
        transforms=None,
        gt_field="ground_truth",
        classes=None,
    ):
        self.samples = fiftyone_dataset
        self.transforms = transforms
        self.gt_field = gt_field

        self.img_paths = self.samples.values("filepath")

        self.classes = classes
        if not self.classes:
            # Get list of distinct labels that exist in the view
            self.classes = self.samples.distinct(
                "%s.detections.label" % gt_field
            )

        if self.classes[0] != "background":
            self.classes = ["background"] + self.classes

        self.labels_map_rev = {c: i for i, c in enumerate(self.classes)}

    def __getitem__(self, idx):
        img_path = self.img_paths[idx]
        sample = self.samples[img_path]
        metadata = sample.metadata
        img = read_image(img_path, mode=ImageReadMode.RGB)

        boxes = []
        labels = []
        area = []
        iscrowd = []
        detections = sample[self.gt_field].detections
        for det in detections:
            category_id = self.labels_map_rev[det.label]
            coco_obj = fouc.COCOObject.from_label(
                det, metadata, category_id=category_id,
            )
            x, y, w, h = coco_obj.bbox
            boxes.append([x, y, x + w, y + h])
            labels.append(coco_obj.category_id)
            area.append(coco_obj.area)
            iscrowd.append(coco_obj.iscrowd)

        target = {}
        target["boxes"] = torch.as_tensor(boxes, dtype=torch.float32)
        target["labels"] = torch.as_tensor(labels, dtype=torch.int64)
        target["image_id"] = torch.as_tensor([idx])
        target["area"] = torch.as_tensor(area, dtype=torch.float32)
        target["iscrowd"] = torch.as_tensor(iscrowd, dtype=torch.int64)

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.img_paths)

    def get_classes(self):
        return self.classes
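
The box handling in `__getitem__` can be illustrated without FiftyOne or PyTorch. The sketch below assumes that FiftyOne stores a box as relative `[x, y, width, height]` values in `[0, 1]` and that the detection model expects absolute `[x0, y0, x1, y1]` corners; the helper names are hypothetical and are not part of either library.

```python
def relative_to_absolute(box, img_width, img_height):
    """Scale a relative [x, y, w, h] box (values in [0, 1]) to pixel units."""
    x, y, w, h = box
    return [x * img_width, y * img_height, w * img_width, h * img_height]


def xywh_to_xyxy(box):
    """Convert [x, y, width, height] to corner format [x0, y0, x1, y1]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def xyxy_to_xywh(box):
    """Convert corner format [x0, y0, x1, y1] back to [x, y, width, height]."""
    x0, y0, x1, y1 = box
    return [x0, y0, x1 - x0, y1 - y0]


# Hypothetical example: one relative box on a 640x480 image
rel_box = [0.25, 0.5, 0.5, 0.25]
abs_box = relative_to_absolute(rel_box, 640, 480)
corners = xywh_to_xyxy(abs_box)
print(abs_box)
print(corners)
```

The same round trip happens in reverse when predictions are written back to FiftyOne: corner boxes from the model are converted to `[x, y, width, height]` before being stored.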

In [None]:
from torchvision.transforms import v2 as T

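def get_transform(train):
    # `train` is currently unused: no augmentation is applied, so training
    # and evaluation share the same pipeline
    transforms = []
    transforms.append(T.ToPILImage())
    # Convert to a float tensor with values in [0, 1]
    transforms.append(T.ToTensor())
    # ImageNet mean/std. Note that torchvision's FasterRCNN also normalizes
    # its inputs internally (ImageNet statistics by default)
    transforms.append(T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)))
    return T.Compose(transforms)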

In [None]:
train_split = dataset.take(4000)
val_split = dataset.exclude(train_split)
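
The 4000/500 split above draws a random subset and keeps the remainder for validation. As a minimal sketch of the same idea in plain Python, assuming integer IDs stand in for the FiftyOne samples (the `split_ids` helper is hypothetical), a fixed seed makes the split reproducible across runs:

```python
import random


def split_ids(ids, n_train, seed=43):
    """Randomly pick n_train IDs for training; the rest go to validation."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_ids = set(rng.sample(ids, n_train))
    val_ids = [i for i in ids if i not in train_ids]
    return sorted(train_ids), val_ids


all_ids = list(range(4500))  # stand-in for the 4500 loaded samples
train_ids, val_ids = split_ids(all_ids, 4000)
print(len(train_ids), len(val_ids))
```

In FiftyOne itself, `take()` also accepts a `seed` argument, which would serve the same purpose for `train_split`.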

In [None]:
torch_dataset = ObjectDataset(train_split, get_transform(train=True))
torch_dataset_val = ObjectDataset(val_split, get_transform(train=False))
torch_dataset_test = ObjectDataset(dataset_test, get_transform(train=False))
torch_dataset



<__main__.ObjectDataset at 0x78425da15060>

## Prepare Model

In [None]:
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

### Backbone Selection

Choose either `GoogLeNet (Inception V1)` or `Inception V3` as the backbone of Faster R-CNN, and run only the corresponding cell.

GoogLeNet (Inception V1)

In [None]:
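# GoogLeNet backbone: drop the last three child modules (avgpool, dropout, fc)
# so that only the convolutional feature extractor remains
googlenet = torchvision.models.googlenet(weights="DEFAULT")
backbone = torch.nn.Sequential(*list(googlenet.children())[:-3])

# FasterRCNN needs to know the number of output channels of the backbone
backbone.out_channels = 1024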

anchor_generator = AnchorGenerator()

roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names= ['0'], output_size=7, sampling_ratio=2)

model = FasterRCNN(backbone,
                   num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler
                   )

Downloading: "https://download.pytorch.org/models/googlenet-1378be20.pth" to /root/.cache/torch/hub/checkpoints/googlenet-1378be20.pth
100%|██████████| 49.7M/49.7M [00:00<00:00, 56.9MB/s]


Inception V3

In [None]:
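# Inception V3 backbone (`weights="DEFAULT"` replaces the deprecated `pretrained=True`)
inception = torchvision.models.inception_v3(weights="DEFAULT")

# Drop the last three child modules (avgpool, dropout, fc)
modules = list(inception.children())[:-3]
# Drop the auxiliary classifier (AuxLogits), which sits before the last three Mixed blocks
modules.pop(-4)
backbone = torch.nn.Sequential(*modules)

# FasterRCNN needs to know the number of output channels of the backbone
backbone.out_channels = 2048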

anchor_generator = AnchorGenerator()

roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names= ['0'], output_size=7, sampling_ratio=2)

model = FasterRCNN(backbone, num_classes=2, rpn_anchor_generator=anchor_generator, box_roi_pool=roi_pooler)
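
Both backbone cells call `AnchorGenerator()` without arguments. The sketch below reproduces, in plain Python, how anchor shapes follow from sizes and aspect ratios, assuming the torchvision defaults are sizes `(128, 256, 512)` and aspect ratios `(0.5, 1.0, 2.0)` (check the installed version to confirm). The `anchor_shapes` helper is hypothetical.

```python
import math


def anchor_shapes(sizes, aspect_ratios):
    """Return (width, height) for every ratio/size pair.

    The aspect ratio is height / width and each anchor keeps an area of
    roughly size * size.
    """
    shapes = []
    for ratio in aspect_ratios:
        h_ratio = math.sqrt(ratio)
        w_ratio = 1.0 / h_ratio
        for size in sizes:
            shapes.append((round(w_ratio * size), round(h_ratio * size)))
    return shapes


# Assumed defaults of AnchorGenerator()
shapes = anchor_shapes((128, 256, 512), (0.5, 1.0, 2.0))
for width, height in shapes:
    print(width, height)
```

Under these assumptions no anchor has a side much shorter than 90 pixels, which might contribute to weak results on small objects. That is a guess and was not verified in this experiment.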



## Train model

In [None]:
from engine import train_one_epoch, evaluate
import utils

def train(model, torch_dataset, torch_dataset_val, num_epochs=4):
    # train on the GPU or on the CPU, if a GPU is not available
    print("# train on the GPU or on the CPU, if a GPU is not available")
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

    # our dataset has two classes only - background and person
    print("# our dataset has two classes only - background and person")
    num_classes = 2

    # define training and validation data loaders
    print("# define training and validation data loaders")
    data_loader = torch.utils.data.DataLoader(
        torch_dataset,
        batch_size=4,
        shuffle=True,
        num_workers=4,
        collate_fn=utils.collate_fn)

    print("data_loader finished")

    data_loader_val = torch.utils.data.DataLoader(
        torch_dataset_val, batch_size=2, shuffle=False, num_workers=4,
        collate_fn=utils.collate_fn)
    print("data_loader_val finished")

    # move model to the right device
    print("# move model to the right device")
    model.to(device)

    # construct an optimizer
    print("# construct an optimizer")
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.0001, momentum=0.9, weight_decay=0.0001)
    # and a learning rate scheduler
    print("# and a learning rate scheduler")
    lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer,
        milestones=[16, 22],
        gamma=0.1,
    )

    # train for `num_epochs` epochs
    print(f"# let's train it for {num_epochs} epochs")

    for epoch in range(num_epochs):
        # train for one epoch, printing every 100 iterations
        print(f"# train for one epoch, printing every 100 iterations")
        train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=100)
        # update the learning rate
        print("# update the learning rate")
        lr_scheduler.step()
        # evaluate on the validation dataset
        print("# evaluate on the val dataset")
        evaluate(model, data_loader_val, device=device)

    print("That's it!")

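# NOTE: this definition shadows `evaluate` imported from engine.py, so the call
# inside `train` uses this function. It expects the validation data loader.
def evaluate(model, data_loader_val, device):
    cpu_device = torch.device("cpu")
    preds = []
    targets = []
    model.eval()
    for imgs, batch_targets in data_loader_val:
        with torch.no_grad():
            # run inference on every image of the batch
            batch_preds = model([img.to(device) for img in imgs])
        # move predictions back to the CPU before collecting them
        batch_preds = [{k: v.to(cpu_device) for k, v in p.items()} for p in batch_preds]
        preds.extend(batch_preds)
        targets.extend(batch_targets)
    return preds, targets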

In [None]:
train(model, torch_dataset, torch_dataset_val, num_epochs=5)
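
The scheduler in `train` uses `milestones=[16, 22]` with `gamma=0.1`, while the run above lasts 5 epochs. The sketch below computes the resulting learning rate per epoch in plain Python, assuming the usual multi-step rule (multiply by `gamma` once for every milestone already reached). The `multistep_lr` helper is hypothetical.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate at `epoch`: one decay by `gamma` per milestone reached."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


# Values taken from the training cell: lr=0.0001, milestones=[16, 22], gamma=0.1
schedule = [multistep_lr(1e-4, [16, 22], 0.1, epoch) for epoch in range(5)]
print(schedule)
```

With these settings the learning rate stays at `0.0001` for all 5 epochs, because the first decay would only happen at epoch 16.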

## Evaluation

Load the trained model that matches the selected backbone.

In [None]:
# model_path = '/content/drive/MyDrive/cv_bootcamp_indonesiaai/project_2/models/fasterrcnn_inceptionv3_model.pt'
# state_dict = torch.load(model_path)
# model.load_state_dict(state_dict)

<All keys matched successfully>

### Inference Time

Run this once, the first time the evaluation is run.

In [None]:
import time

# Measure inference time on a single test image
def print_inference_time(model):
    # Define the device before it is used
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    model.eval()
    model.to(device)
    for img, targets in torch_dataset_test:
        batch = img.unsqueeze(0).to(device)

        with torch.no_grad():
            # Wait for pending GPU work so that only the forward pass is timed
            if device.type == 'cuda':
                torch.cuda.synchronize()
            start_time = time.perf_counter()
            preds = model(batch)[0]
            if device.type == 'cuda':
                torch.cuda.synchronize()
            end_time = time.perf_counter()

        inference_time = end_time - start_time
        print(f"Inference time: {inference_time} seconds")
        # Only the first image is timed
        break

print_inference_time(model)

AssertionError: ignored
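
A single timed call can be noisy, since the first forward pass often includes one-off setup costs. The sketch below shows a generic way to average latency over several calls after a short warm-up. It is a plain-Python illustration: `average_latency` and `dummy_inference` are hypothetical stand-ins and do not call the model.

```python
import time


def average_latency(fn, warmup=2, repeats=5):
    """Average wall-clock seconds per call of `fn`, after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def dummy_inference():
    # Stand-in workload; replace with a real forward pass
    return sum(i * i for i in range(10_000))


latency = average_latency(dummy_inference)
print(f"Average latency: {latency:.6f} seconds")
```

When timing a model on a GPU, `torch.cuda.synchronize()` should be called before reading the clock, because CUDA kernels run asynchronously.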

### Add predictions

In [None]:
def convert_torch_predictions(preds, det_id, s_id, w, h, classes):
    # Convert the outputs of the torch model into a FiftyOne Detections object
    dets = []

    for bbox, label, score in zip(
        preds["boxes"].cpu().detach().numpy(),
        preds["labels"].cpu().detach().numpy(),
        preds["scores"].cpu().detach().numpy()
    ):
        # Parse prediction into FiftyOne Detection object
        x0,y0,x1,y1 = bbox
        coco_obj = fouc.COCOObject(det_id, s_id, int(label), [x0, y0, x1-x0, y1-y0])
        det = coco_obj.to_detection((w,h), classes)
        det["confidence"] = float(score)
        dets.append(det)
        det_id += 1

    detections = fo.Detections(detections=dets)

    return detections, det_id

def add_detections(model, torch_dataset, view, field_name="predictions"):
    # Run inference on a dataset and add results to FiftyOne
    torch.set_num_threads(1)
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    print("Using device %s" % device)

    model.eval()
    model.to(device)
    image_paths = torch_dataset.img_paths
    classes = torch_dataset.classes
    det_id = 0

    with fo.ProgressBar() as pb:
        for img, targets in pb(torch_dataset):
            # Get FiftyOne sample indexed by unique image filepath
            img_id = int(targets["image_id"][0])
            img_path = image_paths[img_id]
            sample = view[img_path]
            s_id = sample.id
            w = sample.metadata["width"]
            h = sample.metadata["height"]

            # Inference (no gradients are needed)
            with torch.no_grad():
                preds = model(img.unsqueeze(0).to(device))[0]

            detections, det_id = convert_torch_predictions(
                preds,
                det_id,
                s_id,
                w,
                h,
                classes
            )

            sample[field_name] = detections
            sample.save()


# def convert_torch_predictions(preds, det_id, s_id, w, h, classes, nms_threshold=0.5):
#     # Convert the outputs of the torch model into a FiftyOne Detections object
#     dets = []

#     # Apply NMS to the predictions
#     keep = torchvision.ops.nms(preds["boxes"], preds["scores"], iou_threshold=nms_threshold)
#     preds["boxes"] = preds["boxes"][keep]
#     preds["labels"] = preds["labels"][keep]
#     preds["scores"] = preds["scores"][keep]

#     # # Filter detections based on confidence threshold
#     # conf_mask = preds["scores"] >= conf_threshold
#     # preds["boxes"] = preds["boxes"][conf_mask]
#     # preds["labels"] = preds["labels"][conf_mask]
#     # preds["scores"] = preds["scores"][conf_mask]

#     for bbox, label, score in zip(
#         preds["boxes"].cpu().detach().numpy(),
#         preds["labels"].cpu().detach().numpy(),
#         preds["scores"].cpu().detach().numpy()
#     ):
#         # Parse prediction into FiftyOne Detection object
#         x0, y0, x1, y1 = bbox
#         coco_obj = fouc.COCOObject(det_id, s_id, int(label), [x0, y0, x1 - x0, y1 - y0])
#         det = coco_obj.to_detection((w, h), classes)
#         det["confidence"] = float(score)
#         dets.append(det)
#         det_id += 1

#     detections = fo.Detections(detections=dets)

#     return detections, det_id

# def add_detections(model, torch_dataset, view, field_name="predictions", nms_threshold=0.5):
#     # Run inference on a dataset and add results to FiftyOne
#     torch.set_num_threads(1)
#     device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
#     print("Using device %s" % device)

#     model.eval()
#     model.to(device)
#     image_paths = torch_dataset.img_paths
#     classes = torch_dataset.classes
#     det_id = 0

#     with fo.ProgressBar() as pb:
#         for img, targets in pb(torch_dataset):
#             # Get FiftyOne sample indexed by unique image filepath
#             img_id = int(targets["image_id"][0])
#             img_path = image_paths[img_id]
#             sample = view[img_path]
#             s_id = sample.id
#             w = sample.metadata["width"]
#             h = sample.metadata["height"]

#             # Inference
#             preds = model(img.unsqueeze(0).to(device))[0]

#             detections, det_id = convert_torch_predictions(
#                 preds,
#                 det_id,
#                 s_id,
#                 w,
#                 h,
#                 classes,
#                 nms_threshold
#             )

#             sample[field_name] = detections
#             sample.save()
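
The commented-out variant above includes a confidence filter on the predictions. As a minimal sketch of that idea, the example below filters a prediction dictionary by score using plain Python lists instead of tensors. The `filter_by_score` helper, the threshold of `0.5` and the example values are all hypothetical.

```python
def filter_by_score(preds, score_threshold=0.5):
    """Keep only the predictions whose score is at least the threshold."""
    keep = [i for i, score in enumerate(preds["scores"]) if score >= score_threshold]
    return {key: [values[i] for i in keep] for key, values in preds.items()}


# Made-up predictions in the same layout as the model output (boxes, labels, scores)
example_preds = {
    "boxes": [[10, 20, 110, 220], [12, 18, 108, 215], [300, 40, 340, 90]],
    "labels": [1, 1, 1],
    "scores": [0.92, 0.31, 0.07],
}
filtered = filter_by_score(example_preds, score_threshold=0.5)
print(filtered)
```

torchvision's `FasterRCNN` constructor also exposes a `box_score_thresh` argument (default `0.05`), so a similar effect could be obtained when building the model; whether that improves the metrics here was not tested.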

In [None]:
add_detections(model, torch_dataset_test, dataset_test, field_name="predictions")

Using device cuda
inference: 1.6528136730194092
   0% |/----------------|   0/500 [1.7s elapsed, ? remaining, ? samples/s]   




In [None]:
print('maps(0.5-0.95)', np.mean(np.array(maps)))

maps(0.5-0.95) 0.1329818590102886


In [None]:
results = fo.evaluate_detections(
  dataset_test,
  "predictions",
  classes=["person"],
  eval_key="eval",
  classwise=False,
  compute_mAP=True,
)

Evaluating detections...
 100% |█████████████████| 500/500 [1.7m elapsed, 0s remaining, 5.7 samples/s]
Performing IoU sweep...
 100% |█████████████████| 500/500 [31.2s elapsed, 0s remaining, 22.5 samples/s]


### mAP Score

In [None]:
results.mAP()

0.13298185901028858
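
The mAP above is averaged over an IoU sweep (labelled `0.5-0.95` earlier in this notebook). The sketch below shows what an IoU threshold means for a single predicted box, assuming corner-format boxes and a step of `0.05` between thresholds, as in the COCO convention. The `box_iou` helper and the example boxes are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two [x0, y0, x1, y1] boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Thresholds 0.50, 0.55, ..., 0.95 (assumed step of 0.05)
iou_thresholds = [t / 100 for t in range(50, 100, 5)]

gt_box = [0, 0, 100, 100]
pred_box = [0, 0, 100, 78]  # made-up prediction covering 78% of the ground truth
iou = box_iou(gt_box, pred_box)
passed = [t for t in iou_thresholds if iou >= t]
print(iou, passed)
```

A prediction like this one counts as a match at the looser thresholds but not at the stricter ones, which is why a loosely localized detector scores lower on the averaged mAP than on mAP at IoU 0.5 alone.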

### Classification Report

In [None]:
counts = dataset_test.count_values("ground_truth.detections.label")
counts

{'person': 2090}

In [None]:
# Sort the classes by frequency, most common first (only `person` here)
classes_top = sorted(counts, key=counts.get, reverse=True)
results.print_report(classes=classes_top)

              precision    recall  f1-score   support

      person       0.21      0.84      0.34      5012

   micro avg       0.21      0.84      0.34      5012
   macro avg       0.21      0.84      0.34      5012
weighted avg       0.21      0.84      0.34      5012
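
The f1-score in the report is the harmonic mean of precision and recall. As a quick check of the reported values, using a hypothetical `f1_score` helper:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for `person` taken from the report above
f1 = f1_score(0.21, 0.84)
print(round(f1, 2))
```

The result rounds to `0.34`, matching the report. The high recall combined with low precision is consistent with the model producing many extra boxes.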



In [None]:
results.plot_pr_curves(classes=["person"])

In [None]:
results.plot_confusion_matrix()







Save the trained model weights to Drive.

In [None]:
torch.save(model.state_dict(), '/content/drive/MyDrive/cv_bootcamp_indonesiaai/project_2/models/fasterrcnn_inceptionv3_model.pt')

## Test Visualization

In [None]:
from fiftyone import ViewField as F

In [None]:
session = fo.launch_app(dataset_test)


Welcome to

███████╗██╗███████╗████████╗██╗   ██╗ ██████╗ ███╗   ██╗███████╗
██╔════╝██║██╔════╝╚══██╔══╝╚██╗ ██╔╝██╔═══██╗████╗  ██║██╔════╝
█████╗  ██║█████╗     ██║    ╚████╔╝ ██║   ██║██╔██╗ ██║█████╗
██╔══╝  ██║██╔══╝     ██║     ╚██╔╝  ██║   ██║██║╚██╗██║██╔══╝
██║     ██║██║        ██║      ██║   ╚██████╔╝██║ ╚████║███████╗
╚═╝     ╚═╝╚═╝        ╚═╝      ╚═╝    ╚═════╝ ╚═╝  ╚═══╝╚══════╝ v0.22.3

If you're finding FiftyOne helpful, here's how you can get involved:

|
|  ⭐⭐⭐ Give the project a star on GitHub ⭐⭐⭐
|  https://github.com/voxel51/fiftyone
|
|  🚀🚀🚀 Join the FiftyOne Slack community 🚀🚀🚀
|  https://slack.voxel51.com
|





## Conclusion

- In this experiment, Faster R-CNN with either a `GoogLeNet (Inception V1)` or an `Inception V3` backbone produced low metrics, with mAP below `0.2`, and was slow to train (roughly 2 to 2.5 hours for only 5 epochs).
- Inference was also slow, at around 1 second per input image.
- The predictions tend to contain many bounding boxes for a single ground-truth detection.
- The model does reasonably well on single, large ground-truth objects, but performs very poorly on crowds and small ground-truth objects.