<!-- Autogenerated by `scripts/make_examples.py` -->
<table align="left">
    <td>
        <a target="_blank" href="https://colab.research.google.com/github/voxel51/fiftyone-examples/blob/master/examples/pytorch_detection_training.ipynb">
            <img src="https://user-images.githubusercontent.com/25985824/104791629-6e618700-5769-11eb-857f-d176b37d2496.png" height="32" width="32">
            Try in Google Colab
        </a>
    </td>
    <td>
        <a target="_blank" href="https://nbviewer.jupyter.org/github/voxel51/fiftyone-examples/blob/master/examples/pytorch_detection_training.ipynb">
            <img src="https://user-images.githubusercontent.com/25985824/104791634-6efa1d80-5769-11eb-8a4c-71d6cb53ccf0.png" height="32" width="32">
            Share via nbviewer
        </a>
    </td>
    <td>
        <a target="_blank" href="https://github.com/voxel51/fiftyone-examples/blob/master/examples/pytorch_detection_training.ipynb">
            <img src="https://user-images.githubusercontent.com/25985824/104791633-6efa1d80-5769-11eb-8ee3-4b2123fe4b66.png" height="32" width="32">
            View on GitHub
        </a>
    </td>
    <td>
        <a href="https://github.com/voxel51/fiftyone-examples/raw/master/examples/pytorch_detection_training.ipynb" download>
            <img src="https://user-images.githubusercontent.com/25985824/104792428-60f9cc00-576c-11eb-95a4-5709d803023a.png" height="32" width="32">
            Download notebook
        </a>
    </td>
</table>


# PyTorch object detection model training

[PyTorch](https://pytorch.org/) datasets provide a great starting point for loading complex datasets, letting us define a class to load individual samples from disk and then creating data loaders to efficiently supply the data to our model. Problems arise when we want to start iterating over the dataset itself. PyTorch datasets are fairly rigid and require us to either rewrite them or the underlying data on disk if we want to make any changes to the data we are training or testing our model on. That is where [FiftyOne](http://fiftyone.ai) comes in.

PyTorch datasets work well with [FiftyOne datasets](https://voxel51.com/docs/fiftyone/user_guide/using_datasets.html#using-fiftyone-datasets) on hard computer vision problems like [classification, object detection, segmentation, and more](https://voxel51.com/docs/fiftyone/user_guide/using_datasets.html#labels).
The flexibility of FiftyOne datasets lets us easily experiment with and fine-tune the datasets we use for training and testing to create better-performing models, faster.
In this project, we focus on object detection since it is one of the most common vision tasks while also being fairly complex. However, these methods work for most machine learning tasks. Specifically, this notebook covers:
* Loading an unlabeled dataset into FiftyOne
* Generating `ground_truth` labels for the dataset by applying a model from the FiftyOne Model Zoo
* Writing a PyTorch object detection dataset that utilizes the loaded FiftyOne dataset
* Exploring views into our FiftyOne dataset for training and evaluation
* Training a Torchvision object detection model on our FiftyOne dataset views
* Evaluating our models in FiftyOne to refine the dataset

## Setup

**Before we begin:** Since we are running this in Colab, we need to select a GPU instance under `Runtime` > `Change runtime type`, as this notebook contains some model training.

To start, we need to install fiftyone, pytorch, and torchvision, and clone the torchvision GitHub repository to use the training and evaluation utilities provided for the [Torchvision Object Detection Tutorial](https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html#defining-the-dataset), which we follow to train a basic object detection model.

In [1]:
!pip install fiftyone
!pip install torch torchvision
!pip install opencv-python-headless==4.5.4.60 fiftyone

Collecting fiftyone
  Downloading fiftyone-0.15.1-py3-none-any.whl (1.3 MB)
Collecting pymongo<4,>=3.11
  Downloading pymongo-3.12.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (508 kB)
Collecting fiftyone-brain<0.9,>=0.8
  Downloading fiftyone_brain-0.8.2-py3-none-any.whl (47 kB)
Collecting eventlet
  Downloading eventlet-0.33.1-py2.py3-none-any.whl (226 kB)
Collecting opencv-python-headless
  Downloading opencv_python_headless-4.5.5.64-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (47.8 MB)
Collecting Deprecated
  Downloading Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)
Collecting Jinja2>=3
  Downloading Jinja2-3.1.2-py3-none-any.

If you run into a `cv2` error when importing FiftyOne later on, it is caused by an issue with OpenCV in Colab environments. [Follow these instructions to resolve it.](https://github.com/voxel51/fiftyone/issues/1494#issuecomment-1003148448)

In [2]:
%%shell

# Download TorchVision repo to use some files from
# references/detection
git clone https://github.com/pytorch/vision.git
cd vision
git checkout v0.3.0

cp references/detection/utils.py ../
cp references/detection/transforms.py ../
cp references/detection/coco_eval.py ../
cp references/detection/engine.py ../
cp references/detection/coco_utils.py ../


Cloning into 'vision'...
remote: Enumerating objects: 139041, done.
remote: Counting objects: 100% (5258/5258), done.
remote: Compressing objects: 100% (545/545), done.
remote: Total 139041 (delta 4818), reused 5085 (delta 4691), pack-reused 133783
Receiving objects: 100% (139041/139041), 274.18 MiB | 36.62 MiB/s, done.
Resolving deltas: 100% (122322/122322), done.
Note: checking out 'v0.3.0'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at be376084d version check against PyTorch's CUDA version




## Loading our data

Getting our data into FiftyOne is often easier than getting it into a PyTorch dataset. Additionally, once the data is in FiftyOne it is much more flexible, allowing us to easily find and access even the most specific subsets of data that we can then use to train or evaluate our model.

Since our image data follows a standard format on disk, we can load it into FiftyOne in [one line of code](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/index.html).

In this notebook, we are going to work with Food-101 and load it from Google Drive.

In [3]:
import torch

torch.manual_seed(1)

<torch._C.Generator at 0x7f9a6ebe56b0>

In [4]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [5]:
#!unzip -uq "/content/drive/MyDrive/COMP5425 Multimedia Retrieval/dataset/Food_101.zip" -d "/content/drive/MyDrive/COMP5425 Multimedia Retrieval/dataset/Output"

In [6]:
#Iterate through images in the output folder including all its subdirectories and create a new directory to store all images without hierarchy sub-folders

import glob
import os
import shutil

# image location with subdirectories 
data_path = "/content/drive/MyDrive/COMP5425 Multimedia Retrieval/dataset/Output/images"

# Destination location to copy all image files
direct_image_path = "/content/drive/MyDrive/COMP5425 Multimedia Retrieval/dataset/Food_101"
#os.mkdir(direct_image_path)

#for file in glob.iglob('%s/**/*.jpg' % data_path, recursive=True):
    #shutil.move(file, direct_image_path)
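
The commented-out loop above flattens the class subfolders into a single directory. Below is a minimal sketch of the same idea on a throwaway temporary tree. It assumes, which the notebook does not check, that two class folders may contain files with the same name, so each copy is prefixed with its parent folder name. The helper `flatten_images` and the `<class>_<filename>` naming scheme are hypothetical and not part of the original workflow.

```python
import shutil
import tempfile
from pathlib import Path


def flatten_images(src_root, dst_dir):
    """Copy every .jpg under src_root into dst_dir, prefixing the parent folder name."""
    src_root, dst_dir = Path(src_root), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src_root.rglob("*.jpg")):
        # Prefix with the class folder so equal file names cannot overwrite each other
        target = dst_dir / f"{path.parent.name}_{path.name}"
        shutil.copy(path, target)
        copied.append(target.name)
    return copied


# Demo on a temporary tree with a deliberate name clash (hypothetical folder names)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "images"
    for cls in ("apple_pie", "pizza"):
        (root / cls).mkdir(parents=True)
        (root / cls / "1001.jpg").write_bytes(b"fake image bytes")
    flat_names = flatten_images(root, Path(tmp) / "flat")
    print(flat_names)
```

Copying instead of moving keeps the original tree intact, so the step can be re-run safely.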

In [10]:
from PIL import Image

for images in os.listdir(direct_image_path):
  print(images)

Streaming output truncated to the last 5000 lines.
3644842.jpg
3808954.jpg
2662788.jpg
2609869.jpg
3685.jpg
3398820.jpg
2685231.jpg
3430085.jpg
2778205.jpg
2572431.jpg
2646473.jpg
3081039.jpg
368654.jpg
2397908.jpg
3219655.jpg
3164695.jpg
3465676.jpg
3147052.jpg
3367659.jpg
2812281.jpg
3232542.jpg
3668972.jpg
3181758.jpg
2354083.jpg
3550491.jpg
3500162.jpg
2985731.jpg
3622258.jpg
3053839.jpg
3371455.jpg
3629012.jpg
2646314.jpg
3277203.jpg
3426780.jpg
3807968.jpg
3296872.jpg
3401830.jpg
2612209.jpg
3277479.jpg
3297983.jpg
381208.jpg
3529810.jpg
2449908.jpg
3268468.jpg
3504462.jpg
3338128.jpg
3849048.jpg
2941936.jpg
3805698.jpg
2930992.jpg
3488670.jpg
3769634.jpg
230916.jpg
3217707.jpg
3664242.jpg
3333485.jpg
3485612.jpg
3051591.jpg
318343.jpg
3246685.jpg
3662099.jpg
298196.jpg
3741480.jpg
3568778.jpg
3260120.jpg
3710401.jpg
2985957.jpg
319737.jpg
3384112.jpg
2339036.jpg
2497784.jpg
2551321.jpg
3243342.jpg
347874.jpg
2841442.jpg
3704723.jpg
3345281.jpg
2439161.jpg
3484902.j

In [11]:
# Count the total number of images in the dataset
images_list = os.listdir(direct_image_path)

print(len(images_list))

52857


In [12]:
# The full image dataset is very large, which results in extremely long runtimes.
# Create a new directory holding 5000 randomly selected images from the existing directory direct_image_path

#import random

#files_list = []

#for root, dirs, files in os.walk(direct_image_path):
#  for file in files:
#    if file.endswith(".jpg") or file.endswith(".jpeg"):
#      files_list.append(os.path.join(root,file))

#images_path = "/content/drive/MyDrive/COMP5425 Multimedia Retrieval/dataset/Food_101_images"
#if os.path.isdir(images_path) == False:
#  os.mkdir(images_path)

#selected_images = random.sample(files_list,5000)

#for file in selected_images:
#  shutil.copy(file, images_path)
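
The sampling step that is commented out above can be sketched in a reproducible form as follows. This is an illustration on temporary files, not the code that produced the 5000-image subset. The helper `sample_images` and the seed value are assumptions for the example.

```python
import random
import shutil
import tempfile
from pathlib import Path


def sample_images(src_dir, dst_dir, k, seed=0):
    """Copy k randomly chosen .jpg/.jpeg files from src_dir (recursively) to dst_dir."""
    files = sorted(
        p for p in Path(src_dir).rglob("*")
        if p.suffix.lower() in {".jpg", ".jpeg"}
    )
    # A seeded generator makes the chosen subset reproducible across runs
    chosen = random.Random(seed).sample(files, k)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in chosen:
        shutil.copy(path, dst / path.name)
    return [p.name for p in chosen]


# Demo: draw the same 5 of 20 dummy files twice
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    for i in range(20):
        (src / f"{i}.jpg").write_bytes(b"x")
    first = sample_images(src, Path(tmp) / "a", k=5, seed=51)
    second = sample_images(src, Path(tmp) / "b", k=5, seed=51)
    n_copied = len(list((Path(tmp) / "a").iterdir()))
    print(first == second, n_copied)
```

Sorting the file list before sampling matters here: directory traversal order is not guaranteed, so without it the same seed could select different files.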

In [13]:
# As we have unlabeled images, we will first need to generate object labels. We can generate ground truth labels with an existing pretrained model.
# Load unlabeled data into FiftyOne and generate predictions with the FiftyOne Model Zoo, then save dataset in object detection COCO format.

!eta install models

import fiftyone as fo
import fiftyone.zoo as foz

# Load raw images into FiftyOne
fo_dataset = fo.Dataset.from_dir(
    "/content/drive/MyDrive/COMP5425 Multimedia Retrieval/dataset/Food_101_images", 
    dataset_type=fo.types.ImageDirectory, 
    name="Ingredient_Detection_Dataset"
)

Cloning https://github.com/voxel51/models
Cloning into '/usr/local/lib/python3.7/dist-packages/eta/tensorflow/models'...
remote: Enumerating objects: 30876, done.
remote: Total 30876 (delta 0), reused 0 (delta 0), pack-reused 30876
Receiving objects: 100% (30876/30876), 532.07 MiB | 36.07 MiB/s, done.
Resolving deltas: 100% (19430/19430), done.
Installing protobuf
Found protoc
Compiling protocol buffers
Installing tf_slim
Collecting tf_slim
  Downloading tf_slim-1.1.0-py2.py3-none-any.whl (352 kB)
Installing collected packages: tf-slim
Successfully installed tf-slim-1.1.0
Installation complete
NumExpr defaulting to 2 threads.
Migrating database to v0.15.1
 100% |███████████████| 5000/5000 [1.0s elapsed, 0s remaining, 4.8K samples/s]         


In [14]:
# Load an object detection model from the FiftyOne Model Zoo and generate predictions
model = foz.load_zoo_model("ssd-mobilenet-v1-fpn-coco-tf")
fo_dataset.apply_model(model, label_field="predictions")


Downloading model from Google Drive ID '1ZvVQTuDIexyfntq8ajBVp5aSndHW6wQk'...
 100% |████|  391.7Mb/391.7Mb [314.5ms elapsed, 0s remaining, 1.2Gb/s]        
 100% |███████████████| 5000/5000 [7.6m elapsed, 0s remaining, 13.3 samples/s]      


In [15]:
# Export labeled dataset in COCO format
fo_dataset.export(
    export_dir="/content/drive/MyDrive/COMP5425 Multimedia Retrieval/dataset/Food_101_COCO_format",
    dataset_type=fo.types.COCODetectionDataset,
    label_field="predictions",
)

Directory '/content/drive/MyDrive/COMP5425 Multimedia Retrieval/dataset/Food_101_COCO_format' already exists; export will be merged with existing files
 100% |███████████████| 5000/5000 [4.4m elapsed, 0s remaining, 78.5 samples/s]      


In [16]:
# Now we need to import the labeled dataset in COCO format. First we need to provide data_path and labels_path

# The directory containing the source images
coco_data_path = "/content/drive/MyDrive/COMP5425 Multimedia Retrieval/dataset/Food_101_COCO_format/data"

# The path to the COCO labels JSON file
coco_labels_path = "/content/drive/MyDrive/COMP5425 Multimedia Retrieval/dataset/Food_101_COCO_format/labels.json"

# Import the new dataset
fo_dataset_new = fo.Dataset.from_dir(
    dataset_type=fo.types.COCODetectionDataset,
    data_path=coco_data_path,
    labels_path=coco_labels_path,
)

 100% |███████████████| 5000/5000 [20.0s elapsed, 0s remaining, 257.7 samples/s]      


We will need the height and width of each image later in this notebook, so we compute metadata on our dataset.

In [17]:
fo_dataset_new.compute_metadata()

We can create a session and visualize this dataset in the [FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/app.html).

In [18]:
session = fo.launch_app(fo_dataset_new)

Output hidden; open in https://colab.research.google.com to view.

## PyTorch dataset and training setup

A [PyTorch dataset](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html#dataset-class) is a class that defines how to load a static dataset and its labels from disk via a simple iterator interface. PyTorch datasets differ from FiftyOne datasets, which are flexible representations of our data geared towards visualization, querying, and understanding.

Every PyTorch model expects data and labels to pass into it in a certain format. Before being able to write up a PyTorch dataset class, we first need to understand the format that the model requires. Namely, we need to know exactly what format the data loader is expected to output when iterating through the dataset so that we can properly define the `__getitem__` method in the PyTorch dataset.
In this project, we follow the [Torchvision object detection tutorial](https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html#defining-the-dataset) and construct a PyTorch dataset to work with their RCNN-based models.

In [19]:
import torch
import fiftyone.utils.coco as fouc
from PIL import Image


class FiftyOneTorchDataset(torch.utils.data.Dataset):
    """A class to construct a PyTorch dataset from a FiftyOne dataset.
    
    Args:
        fiftyone_dataset: a FiftyOne dataset or view that will be used for training or testing
        transforms (None): a list of PyTorch transforms to apply to images and targets when loading
        gt_field ("ground_truth"): the name of the field in fiftyone_dataset that contains the 
            desired labels to load
        classes (None): a list of class strings that are used to define the mapping between
            class names and indices. If None, it will use all classes present in the given fiftyone_dataset.
    """

    def __init__(
        self,
        fiftyone_dataset,
        transforms=None,
        gt_field="ground_truth",
        classes=None,
    ):
        self.samples = fiftyone_dataset
        self.transforms = transforms
        self.gt_field = gt_field

        self.img_paths = self.samples.values("filepath")

        self.classes = classes
        if not self.classes:
            # Get list of distinct labels that exist in the view
            self.classes = self.samples.distinct(
                "%s.detections.label" % gt_field
            )

        if self.classes[0] != "background":
            self.classes = ["background"] + self.classes

        self.labels_map_rev = {c: i for i, c in enumerate(self.classes)}

    def __getitem__(self, idx):
        img_path = self.img_paths[idx]
        sample = self.samples[img_path]
        metadata = sample.metadata
        img = Image.open(img_path).convert("RGB")

        boxes = []
        labels = []
        area = []
        iscrowd = []
        detections = sample[self.gt_field].detections
        for det in detections:
            category_id = self.labels_map_rev[det.label]
            coco_obj = fouc.COCOObject.from_label(
                det, metadata, category_id=category_id,
            )
            x, y, w, h = coco_obj.bbox
            boxes.append([x, y, x + w, y + h])
            labels.append(coco_obj.category_id)
            area.append(coco_obj.area)
            iscrowd.append(coco_obj.iscrowd)

        target = {}
        target["boxes"] = torch.as_tensor(boxes, dtype=torch.float32)
        target["labels"] = torch.as_tensor(labels, dtype=torch.int64)
        target["image_id"] = torch.as_tensor([idx])
        target["area"] = torch.as_tensor(area, dtype=torch.float32)
        target["iscrowd"] = torch.as_tensor(iscrowd, dtype=torch.int64)

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.img_paths)

    def get_classes(self):
        return self.classes
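
To make the box handling in `__getitem__` concrete: FiftyOne stores each detection's bounding box as relative `[x, y, width, height]` values in `[0, 1]`, while the Torchvision detection models expect absolute `[x0, y0, x1, y1]` pixel coordinates, which is why the image metadata is needed. The class above delegates the first step to `COCOObject.from_label`. The standalone sketch below does the whole conversion in plain Python; the helper name `relative_xywh_to_absolute_xyxy` is ours and not part of either library.

```python
def relative_xywh_to_absolute_xyxy(box, img_width, img_height):
    """Convert a relative [x, y, w, h] box to absolute [x0, y0, x1, y1] pixels."""
    x, y, w, h = box
    x0, y0 = x * img_width, y * img_height
    x1, y1 = (x + w) * img_width, (y + h) * img_height
    return [x0, y0, x1, y1]


# A box covering the central quarter of a 640x480 image
abs_box = relative_xywh_to_absolute_xyxy([0.25, 0.25, 0.5, 0.5], 640, 480)
print(abs_box)  # [160.0, 120.0, 480.0, 360.0]
```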

The following code loads Faster-RCNN with a ResNet50 backbone from Torchvision and modifies the classifier for the number of classes we are training on:

In [20]:
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def get_model(num_classes):
    # load a model pre-trained on COCO
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    
    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model
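
Note that the `num_classes` passed to `get_model()` has to count the background class, which `FiftyOneTorchDataset` places at index 0. The following is a minimal sketch of that mapping in plain Python; `build_label_map` is a hypothetical helper that mirrors the logic in `FiftyOneTorchDataset.__init__`, and the three class names are placeholders.

```python
def build_label_map(class_names):
    """Return (classes, name -> index map) with "background" at index 0."""
    classes = list(class_names)
    if not classes or classes[0] != "background":
        classes = ["background"] + classes
    return classes, {name: idx for idx, name in enumerate(classes)}


classes, label_map = build_label_map(["apple", "bread", "rice"])
num_classes = len(classes)  # the value that would be passed to get_model()
print(num_classes, label_map)
```

With three foreground classes the model head therefore needs four outputs.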

For this example, we are going to write a simple training loop. This function takes a model and our PyTorch datasets as input and uses the [`train_one_epoch()`](https://github.com/pytorch/vision/blob/master/references/detection/engine.py) and [`evaluate()`](https://github.com/pytorch/vision/blob/master/references/detection/engine.py) functions from the Torchvision object detection code to train and evaluate the model.

In [21]:
# Import functions from the torchvision references we cloned
from engine import train_one_epoch, evaluate
import utils

def do_training(model, torch_dataset, torch_dataset_test, num_epochs=4):
    # define training and validation data loaders
    data_loader = torch.utils.data.DataLoader(
        torch_dataset, batch_size=2, shuffle=True, num_workers=2,
        collate_fn=utils.collate_fn)
    
    data_loader_test = torch.utils.data.DataLoader(
        torch_dataset_test, batch_size=1, shuffle=False, num_workers=2,
        collate_fn=utils.collate_fn)

    # train on the GPU or on the CPU, if a GPU is not available
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    print("Using device %s" % device)

    # move model to the right device
    model.to(device)

    # construct an optimizer
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.005,
                                momentum=0.9, weight_decay=0.0005)
    # and a learning rate scheduler
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                    step_size=3,
                                                    gamma=0.1)

    for epoch in range(num_epochs):
        # train for one epoch, printing every 10 iterations
        train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)

        # update the learning rate
        lr_scheduler.step()
        # evaluate on the test dataset
        evaluate(model, data_loader_test, device=device)
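
Both data loaders use `utils.collate_fn` from the cloned references instead of PyTorch's default collation. Detection samples cannot be stacked into a single tensor because images differ in size and in the number of boxes they contain. As far as we can tell, the reference helper simply transposes the batch. The sketch below reimplements that behavior in plain Python, with strings standing in for image tensors.

```python
def collate_fn(batch):
    """Transpose a list of (image, target) pairs into (images, targets)."""
    return tuple(zip(*batch))


# Stand-ins for tensors: two images of different sizes with different box counts
batch = [
    ("image_a_3x480x640", {"boxes": [[10, 20, 110, 220]], "labels": [1]}),
    ("image_b_3x512x512", {"boxes": [], "labels": []}),
]
images, targets = collate_fn(batch)
print(images)
print(len(targets))
```

The model then receives a tuple of images and a tuple of target dictionaries, one per image.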

## FiftyOne views and datasets

One of the primary ways of interacting with our FiftyOne dataset is through different [views](https://voxel51.com/docs/fiftyone/user_guide/using_views.html) into our dataset. Views are constructed by applying operations like filtering, sorting, and slicing that result in a specific view into certain labels/samples of the dataset. These operations make it easier to experiment with different subsets of data and continue to fine-tune our dataset to train better models.



For example, cluttered images make it difficult for models to localize objects. We can use FiftyOne to create a view containing only samples with more than, say, 10 objects. You can perform the same operations on views as datasets, so we can create an instance of our PyTorch dataset from this view:

In [22]:
from fiftyone import ViewField as F

busy_view = fo_dataset_new.match(F("ground_truth.detections").length() > 10)

busy_torch_dataset = FiftyOneTorchDataset(busy_view)

In [23]:
session.view = busy_view

Output hidden; open in https://colab.research.google.com to view.

Another example is if we want to train a model that is used primarily for ingredient detection. We can create training and testing views that only contain the classes of ingredients:

In [24]:
from fiftyone import ViewField as F

ingredients_list = ["apple", "ribs", "bread", "calamari", "crab", "beef", "rice", "noodle", "fish", "carrot", "oyster", "egg", "seaweed", "ice cream", "onion", "chips", 
                    "strawberry","lettuce","lemon","orange","cream","pea","chocolate","donut","sandwich"]
ingredients_view = fo_dataset_new.filter_labels("ground_truth",
        F("label").is_in(ingredients_list))

print(len(ingredients_view))

3108
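
The view holds 3108 samples rather than the full 5000 because `filter_labels()` removes non-matching detections and, with its default `only_matches=True`, also drops samples that end up with no detections. The sketch below imitates these semantics on plain dictionaries. It is a conceptual illustration, not FiftyOne's implementation, and the sample data is invented.

```python
def filter_labels(samples, keep, only_matches=True):
    """Keep only detections whose label is in `keep`; optionally drop empty samples."""
    result = []
    for sample in samples:
        kept = [d for d in sample["detections"] if d["label"] in keep]
        if kept or not only_matches:
            result.append({"filepath": sample["filepath"], "detections": kept})
    return result


samples = [
    {"filepath": "a.jpg", "detections": [{"label": "apple"}, {"label": "person"}]},
    {"filepath": "b.jpg", "detections": [{"label": "person"}]},
    {"filepath": "c.jpg", "detections": [{"label": "bread"}]},
]
view = filter_labels(samples, keep={"apple", "bread"})
print([s["filepath"] for s in view])
```

Here `b.jpg` disappears entirely and `a.jpg` keeps only its `apple` detection.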


In [41]:
session.view = ingredients_view

Output hidden; open in https://colab.research.google.com to view.

In [26]:
# From the torchvision references we cloned
import transforms as T

train_transforms = T.Compose([T.ToTensor(), T.RandomHorizontalFlip(0.5)])
test_transforms = T.Compose([T.ToTensor()])

In [27]:
# split the dataset in train and test set
train_view = ingredients_view.take(2500, seed=51)
test_view = ingredients_view.exclude([s.id for s in train_view])
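
The split sizes follow directly from the view size: 2500 of the 3108 samples go to training and the remaining 608 to testing. With a training batch size of 2, that gives 1250 iterations per epoch. A plain-Python analogue of the `take()`/`exclude()` pattern is sketched below. The IDs are placeholders, and Python's `random` module will not reproduce the exact subset FiftyOne selects for the same seed.

```python
import random

sample_ids = [f"sample_{i}" for i in range(3108)]  # size of ingredients_view

# Analogue of take(2500, seed=51): a seeded random subset
train_ids = random.Random(51).sample(sample_ids, 2500)

# Analogue of exclude(...): everything not selected for training
train_set = set(train_ids)
test_ids = [s for s in sample_ids if s not in train_set]

batch_size = 2
train_iterations = len(train_ids) // batch_size
print(len(test_ids), train_iterations)
```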

In [28]:
# use our dataset and defined transformations
torch_dataset = FiftyOneTorchDataset(train_view, train_transforms,
        classes=ingredients_list)
torch_dataset_test = FiftyOneTorchDataset(test_view, test_transforms, 
        classes=ingredients_list)

## Training and Evaluation

In this section, we use the functions and datasets defined above to initialize, train, and evaluate a model on the ingredient classes.

In [29]:
model = get_model(len(ingredients_list)+1)

Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /root/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth


  0%|          | 0.00/160M [00:00<?, ?B/s]

In [30]:
do_training(model, torch_dataset, torch_dataset_test, num_epochs=4)

Using device cuda


  "MongoClient opened before fork. Create MongoClient only "
  "MongoClient opened before fork. Create MongoClient only "


Epoch: [0]  [   0/1250]  eta: 0:35:30  lr: 0.000010  loss: 3.6097 (3.6097)  loss_classifier: 3.4625 (3.4625)  loss_box_reg: 0.1304 (0.1304)  loss_objectness: 0.0055 (0.0055)  loss_rpn_box_reg: 0.0113 (0.0113)  time: 1.7046  data: 0.4790  max mem: 1625
Epoch: [0]  [  10/1250]  eta: 0:12:07  lr: 0.000060  loss: 3.7445 (3.6711)  loss_classifier: 3.4356 (3.3712)  loss_box_reg: 0.2773 (0.2842)  loss_objectness: 0.0060 (0.0060)  loss_rpn_box_reg: 0.0090 (0.0097)  time: 0.5868  data: 0.0507  max mem: 2823
Epoch: [0]  [  20/1250]  eta: 0:10:51  lr: 0.000110  loss: 3.1018 (3.0550)  loss_classifier: 2.8014 (2.7582)  loss_box_reg: 0.2773 (0.2724)  loss_objectness: 0.0084 (0.0136)  loss_rpn_box_reg: 0.0062 (0.0108)  time: 0.4711  data: 0.0085  max mem: 2823
Epoch: [0]  [  30/1250]  eta: 0:10:27  lr: 0.000160  loss: 1.3755 (2.3985)  loss_classifier: 1.0610 (2.0870)  loss_box_reg: 0.2782 (0.2865)  loss_objectness: 0.0129 (0.0135)  loss_rpn_box_reg: 0.0072 (0.0114)  time: 0.4741  data: 0.0086  max me


Test:  [  0/608]  eta: 0:04:17  model_time: 0.1940 (0.1940)  evaluator_time: 0.0101 (0.0101)  time: 0.4242  data: 0.2186  max mem: 4398
Test:  [100/608]  eta: 0:00:54  model_time: 0.0884 (0.0949)  evaluator_time: 0.0027 (0.0037)  time: 0.1069  data: 0.0050  max mem: 4398
Test:  [200/608]  eta: 0:00:43  model_time: 0.0864 (0.0945)  evaluator_time: 0.0033 (0.0042)  time: 0.1077  data: 0.0062  max mem: 4398
Test:  [300/608]  eta: 0:00:33  model_time: 0.0889 (0.0949)  evaluator_time: 0.0029 (0.0040)  time: 0.1093  data: 0.0056  max mem: 4398
Test:  [400/608]  eta: 0:00:22  model_time: 0.0863 (0.0950)  evaluator_time: 0.0026 (0.0040)  time: 0.1044  data: 0.0050  max mem: 4398
Test:  [500/608]  eta: 0:00:11  model_time: 0.0871 (0.0948)  evaluator_time: 0.0025 (0.0038)  time: 0.1179  data: 0.0162  max mem: 4398
Test:  [600/608]  eta: 0:00:00  model_time: 0.0858 (0.0948)  evaluator_time: 0.0023 (0.0039)  time: 0.1109  data: 0.0110  max mem: 4398
Test:  [607/608]  eta: 0:00:00  model_time: 0.08


Epoch: [1]  [   0/1250]  eta: 0:20:26  lr: 0.005000  loss: 0.1838 (0.1838)  loss_classifier: 0.1093 (0.1093)  loss_box_reg: 0.0554 (0.0554)  loss_objectness: 0.0122 (0.0122)  loss_rpn_box_reg: 0.0070 (0.0070)  time: 0.9811  data: 0.4928  max mem: 4398
Epoch: [1]  [  10/1250]  eta: 0:11:40  lr: 0.005000  loss: 0.2967 (0.3179)  loss_classifier: 0.1454 (0.1634)  loss_box_reg: 0.1336 (0.1333)  loss_objectness: 0.0120 (0.0124)  loss_rpn_box_reg: 0.0076 (0.0087)  time: 0.5647  data: 0.0520  max mem: 4398
Epoch: [1]  [  20/1250]  eta: 0:12:03  lr: 0.005000  loss: 0.2967 (0.3369)  loss_classifier: 0.1460 (0.1851)  loss_box_reg: 0.1328 (0.1288)  loss_objectness: 0.0091 (0.0139)  loss_rpn_box_reg: 0.0073 (0.0091)  time: 0.5683  data: 0.0091  max mem: 4398
Epoch: [1]  [  30/1250]  eta: 0:11:46  lr: 0.005000  loss: 0.3105 (0.3470)  loss_classifier: 0.1706 (0.1821)  loss_box_reg: 0.1313 (0.1419)  loss_objectness: 0.0081 (0.0136)  loss_rpn_box_reg: 0.0073 (0.0093)  time: 0.5864  data: 0.0095  max me


Test:  [  0/608]  eta: 0:04:12  model_time: 0.1914 (0.1914)  evaluator_time: 0.0036 (0.0036)  time: 0.4152  data: 0.2187  max mem: 4398
Test:  [100/608]  eta: 0:00:56  model_time: 0.0941 (0.0950)  evaluator_time: 0.0018 (0.0047)  time: 0.1170  data: 0.0056  max mem: 4398
Test:  [200/608]  eta: 0:00:43  model_time: 0.0880 (0.0945)  evaluator_time: 0.0023 (0.0040)  time: 0.1048  data: 0.0048  max mem: 4398
Test:  [300/608]  eta: 0:00:32  model_time: 0.0904 (0.0950)  evaluator_time: 0.0025 (0.0036)  time: 0.1092  data: 0.0055  max mem: 4398
Test:  [400/608]  eta: 0:00:22  model_time: 0.0877 (0.0951)  evaluator_time: 0.0024 (0.0034)  time: 0.1051  data: 0.0049  max mem: 4398
Test:  [500/608]  eta: 0:00:11  model_time: 0.0868 (0.0951)  evaluator_time: 0.0026 (0.0033)  time: 0.1080  data: 0.0067  max mem: 4398
Test:  [600/608]  eta: 0:00:00  model_time: 0.0861 (0.0950)  evaluator_time: 0.0020 (0.0032)  time: 0.1048  data: 0.0058  max mem: 4398
Test:  [607/608]  eta: 0:00:00  model_time: 0.08


Epoch: [2]  [   0/1250]  eta: 0:19:28  lr: 0.005000  loss: 0.1400 (0.1400)  loss_classifier: 0.0589 (0.0589)  loss_box_reg: 0.0726 (0.0726)  loss_objectness: 0.0072 (0.0072)  loss_rpn_box_reg: 0.0013 (0.0013)  time: 0.9344  data: 0.4799  max mem: 4398
Epoch: [2]  [  10/1250]  eta: 0:11:50  lr: 0.005000  loss: 0.2115 (0.2605)  loss_classifier: 0.1066 (0.1329)  loss_box_reg: 0.0726 (0.1110)  loss_objectness: 0.0078 (0.0085)  loss_rpn_box_reg: 0.0078 (0.0081)  time: 0.5727  data: 0.0520  max mem: 4398
Epoch: [2]  [  20/1250]  eta: 0:11:55  lr: 0.005000  loss: 0.2736 (0.2791)  loss_classifier: 0.1242 (0.1354)  loss_box_reg: 0.1174 (0.1239)  loss_objectness: 0.0042 (0.0065)  loss_rpn_box_reg: 0.0101 (0.0133)  time: 0.5645  data: 0.0086  max mem: 4398
Epoch: [2]  [  30/1250]  eta: 0:11:24  lr: 0.005000  loss: 0.2833 (0.2794)  loss_classifier: 0.1235 (0.1322)  loss_box_reg: 0.1312 (0.1271)  loss_objectness: 0.0034 (0.0070)  loss_rpn_box_reg: 0.0108 (0.0131)  time: 0.5550  data: 0.0084  max me


Test:  [  0/608]  eta: 0:04:44  model_time: 0.1955 (0.1955)  evaluator_time: 0.0033 (0.0033)  time: 0.4672  data: 0.2666  max mem: 4398
Test:  [100/608]  eta: 0:00:54  model_time: 0.0900 (0.0951)  evaluator_time: 0.0020 (0.0029)  time: 0.1061  data: 0.0049  max mem: 4398
Test:  [200/608]  eta: 0:00:43  model_time: 0.0880 (0.0948)  evaluator_time: 0.0026 (0.0031)  time: 0.1063  data: 0.0059  max mem: 4398
Test:  [300/608]  eta: 0:00:32  model_time: 0.0903 (0.0953)  evaluator_time: 0.0021 (0.0029)  time: 0.1092  data: 0.0059  max mem: 4398
Test:  [400/608]  eta: 0:00:21  model_time: 0.0882 (0.0954)  evaluator_time: 0.0021 (0.0028)  time: 0.1047  data: 0.0054  max mem: 4398
Test:  [500/608]  eta: 0:00:11  model_time: 0.0871 (0.0953)  evaluator_time: 0.0019 (0.0027)  time: 0.1076  data: 0.0061  max mem: 4398
Test:  [600/608]  eta: 0:00:00  model_time: 0.0858 (0.0952)  evaluator_time: 0.0021 (0.0027)  time: 0.1061  data: 0.0069  max mem: 4398
Test:  [607/608]  eta: 0:00:00  model_time: 0.08


Epoch: [3]  [   0/1250]  eta: 0:21:27  lr: 0.000500  loss: 0.1322 (0.1322)  loss_classifier: 0.0799 (0.0799)  loss_box_reg: 0.0430 (0.0430)  loss_objectness: 0.0040 (0.0040)  loss_rpn_box_reg: 0.0052 (0.0052)  time: 1.0299  data: 0.4611  max mem: 4398
Epoch: [3]  [  10/1250]  eta: 0:13:04  lr: 0.000500  loss: 0.1695 (0.2527)  loss_classifier: 0.0964 (0.1359)  loss_box_reg: 0.0695 (0.1036)  loss_objectness: 0.0046 (0.0046)  loss_rpn_box_reg: 0.0055 (0.0086)  time: 0.6327  data: 0.0500  max mem: 4398
Epoch: [3]  [  20/1250]  eta: 0:12:15  lr: 0.000500  loss: 0.2133 (0.2676)  loss_classifier: 0.1152 (0.1389)  loss_box_reg: 0.1019 (0.1168)  loss_objectness: 0.0046 (0.0046)  loss_rpn_box_reg: 0.0055 (0.0073)  time: 0.5761  data: 0.0091  max mem: 4398
Epoch: [3]  [  30/1250]  eta: 0:11:38  lr: 0.000500  loss: 0.2354 (0.2658)  loss_classifier: 0.1196 (0.1340)  loss_box_reg: 0.1137 (0.1191)  loss_objectness: 0.0041 (0.0054)  loss_rpn_box_reg: 0.0055 (0.0072)  time: 0.5396  data: 0.0095  max me


Test:  [  0/608]  eta: 0:04:20  model_time: 0.1779 (0.1779)  evaluator_time: 0.0025 (0.0025)  time: 0.4280  data: 0.2458  max mem: 4398
Test:  [100/608]  eta: 0:00:53  model_time: 0.0883 (0.0946)  evaluator_time: 0.0017 (0.0024)  time: 0.1053  data: 0.0049  max mem: 4398
Test:  [200/608]  eta: 0:00:42  model_time: 0.0884 (0.0943)  evaluator_time: 0.0022 (0.0028)  time: 0.1052  data: 0.0049  max mem: 4398
Test:  [300/608]  eta: 0:00:32  model_time: 0.0898 (0.0950)  evaluator_time: 0.0022 (0.0028)  time: 0.1080  data: 0.0051  max mem: 4398
Test:  [400/608]  eta: 0:00:21  model_time: 0.0881 (0.0950)  evaluator_time: 0.0021 (0.0027)  time: 0.1049  data: 0.0057  max mem: 4398
Test:  [500/608]  eta: 0:00:11  model_time: 0.0876 (0.0950)  evaluator_time: 0.0019 (0.0026)  time: 0.1073  data: 0.0055  max mem: 4398
Test:  [600/608]  eta: 0:00:00  model_time: 0.0864 (0.0950)  evaluator_time: 0.0016 (0.0026)  time: 0.1035  data: 0.0048  max mem: 4398
Test:  [607/608]  eta: 0:00:00  model_time: 0.08
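
The learning rates printed in the log match the `StepLR` settings in `do_training()`: the base rate of 0.005 is multiplied by `gamma=0.1` after every `step_size=3` epochs, so epochs 1 and 2 run at 0.005 and epoch 3 at 0.0005. The much smaller values at the start of epoch 0 are consistent with a warm-up phase inside `train_one_epoch()`, which this sketch does not model. The function `step_lr` is a hypothetical plain-Python restatement of the decay rule.

```python
def step_lr(base_lr, epoch, step_size=3, gamma=0.1):
    """Learning rate in effect during `epoch` under a StepLR-style decay."""
    return base_lr * gamma ** (epoch // step_size)


schedule = [step_lr(0.005, epoch) for epoch in range(4)]
formatted = [f"{lr:.6f}" for lr in schedule]
print(formatted)
```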

One of the main draws of FiftyOne is the ability to find failure modes of our model. The [built-in evaluation protocols](https://voxel51.com/docs/fiftyone/user_guide/evaluation.html#evaluating-models) help us find where our model got things right and where it got things wrong. Before we can evaluate the model, we need to run it on our test set and store the results in FiftyOne. Doing this is fairly simple: we run inference on the test images, get their corresponding [FiftyOne samples](https://voxel51.com/docs/fiftyone/user_guide/using_datasets.html#samples), and add a new field called `predictions` to each sample to store the detections.

In [32]:
import torch

import fiftyone as fo
import fiftyone.utils.coco as fouc

def convert_torch_predictions(preds, det_id, s_id, w, h, classes):
    # Convert the outputs of the torch model into a FiftyOne Detections object
    dets = []
    for bbox, label, score in zip(
        preds["boxes"].cpu().detach().numpy(), 
        preds["labels"].cpu().detach().numpy(), 
        preds["scores"].cpu().detach().numpy()
    ):
        # Parse prediction into FiftyOne Detection object
        x0,y0,x1,y1 = bbox
        coco_obj = fouc.COCOObject(det_id, s_id, int(label), [x0, y0, x1-x0, y1-y0])
        det = coco_obj.to_detection((w,h), classes)
        det["confidence"] = float(score)
        dets.append(det)
        det_id += 1
        
    detections = fo.Detections(detections=dets)
        
    return detections, det_id

def add_detections(model, torch_dataset, view, field_name="predictions"):
    # Run inference on a dataset and add results to FiftyOne
    torch.set_num_threads(1)
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    print("Using device %s" % device)

    model.eval()
    model.to(device)
    image_paths = torch_dataset.img_paths
    classes = torch_dataset.classes
    det_id = 0
    
    with fo.ProgressBar() as pb:
        for img, targets in pb(torch_dataset):
            # Get FiftyOne sample indexed by unique image filepath
            img_id = int(targets["image_id"][0])
            img_path = image_paths[img_id]
            sample = view[img_path]
            s_id = sample.id
            w = sample.metadata["width"]
            h = sample.metadata["height"]
            
            # Inference (gradients are not needed at prediction time)
            with torch.no_grad():
                preds = model(img.unsqueeze(0).to(device))[0]
            
            detections, det_id = convert_torch_predictions(
                preds, 
                det_id, 
                s_id, 
                w, 
                h, 
                classes,
            )
            
            sample[field_name] = detections
            sample.save()

In [33]:
add_detections(model, torch_dataset_test, fo_dataset_new, field_name="predictions")

Using device cuda
 100% |█████████████████| 608/608 [1.3m elapsed, 0s remaining, 7.5 samples/s]      
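
`convert_torch_predictions` relies on `COCOObject.to_detection` to turn the model's absolute `[x0, y0, x1, y1]` pixel boxes into the format FiftyOne stores, which is `[top-left-x, top-left-y, width, height]` in coordinates relative to the image size. As a minimal standalone sketch of that conversion (the helper name `xyxy_to_relative_xywh` is hypothetical and not part of FiftyOne):

```python
def xyxy_to_relative_xywh(bbox, img_width, img_height):
    """Convert an absolute [x0, y0, x1, y1] box to relative [x, y, w, h]."""
    x0, y0, x1, y1 = bbox
    return [
        x0 / img_width,
        y0 / img_height,
        (x1 - x0) / img_width,
        (y1 - y0) / img_height,
    ]


# A 128x256 pixel box with its top-left corner at (64, 128) in a 512x512 image
print(xyxy_to_relative_xywh([64, 128, 192, 384], 512, 512))
# prints [0.125, 0.25, 0.25, 0.5]
```

This is why `add_detections` reads `width` and `height` from the sample metadata before converting the predictions.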


In [34]:
results = fo.evaluate_detections(
    test_view, 
    "predictions", 
    classes=["apple", "ribs", "bread", "calamari", "crab", "beef", "rice", "noodle", "fish", "carrot", "oyster", "egg", "seaweed", "ice cream", "onion", "chips", 
                    "strawberry","lettuce","lemon","orange","cream","pea","chocolate","donut","sandwich"], 
    eval_key="eval", 
    compute_mAP=True
)

Evaluating detections...
 100% |█████████████████| 608/608 [11.3s elapsed, 0s remaining, 48.2 samples/s]      
Performing IoU sweep...
 100% |█████████████████| 608/608 [14.9s elapsed, 0s remaining, 39.2 samples/s]      


The [DetectionResults](https://voxel51.com/docs/fiftyone/api/fiftyone.utils.eval.detection.html#fiftyone.utils.eval.detection.DetectionResults) object that is returned stores information like the mAP and contains functions that let you [plot confusion matrices, precision-recall curves, and more](https://voxel51.com/docs/fiftyone/user_guide/evaluation.html). Also, these evaluation runs are tracked in FiftyOne and can be managed through functions like [list_evaluations()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.collections.html#fiftyone.core.collections.SampleCollection.list_evaluations).

In [35]:
results.mAP()

0.3102371242583314

In [36]:
results.print_report()

              precision    recall  f1-score   support

       apple       0.07      0.78      0.13        63
        ribs       0.00      0.00      0.00         0
       bread       0.00      0.00      0.00         0
    calamari       0.00      0.00      0.00         0
        crab       0.00      0.00      0.00         0
        beef       0.00      0.00      0.00         0
        rice       0.00      0.00      0.00         0
      noodle       0.00      0.00      0.00         0
        fish       0.00      0.00      0.00         0
      carrot       0.13      0.91      0.23       358
      oyster       0.00      0.00      0.00         0
         egg       0.00      0.00      0.00         0
     seaweed       0.00      0.00      0.00         0
   ice cream       0.00      0.00      0.00         0
       onion       0.00      0.00      0.00         0
       chips       0.00      0.00      0.00         0
  strawberry       0.00      0.00      0.00         0
     lettuce       0.00    
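
Each row of the report follows the usual definitions: precision is TP / (TP + FP), recall is TP / (TP + FN), F1 is their harmonic mean, and support is the number of ground truth objects of that class. A small sketch with made-up counts (not taken from this run) shows how a class can combine high recall with very low precision when false positives dominate:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from match counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Hypothetical counts: most objects are found, but false positives dominate
p, r, f1 = precision_recall_f1(tp=90, fp=810, fn=10)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# prints precision=0.10 recall=0.90 f1=0.18
```

Rows with a support of 0 and no predictions fall into the zero-division branches, which is why they show 0.00 across the board.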

By default, objects are only matched with other objects of the same class. To get an informative confusion matrix, we need to allow predictions to match ground truth objects of other classes by setting `classwise=False`.

In [37]:
results_interclass = fo.evaluate_detections(
    test_view, 
    "predictions", 
    classes=["apple", "ribs", "bread", "calamari", "crab", "beef", "rice", "noodle", "fish", "carrot", "oyster", "egg", "seaweed", "ice cream", "onion", "chips", 
                    "strawberry","lettuce","lemon","orange","cream","pea","chocolate","donut","sandwich"], 
    compute_mAP=True, 
    classwise=False
)

Evaluating detections...
 100% |█████████████████| 608/608 [9.9s elapsed, 0s remaining, 60.7 samples/s]       
Performing IoU sweep...
 100% |█████████████████| 608/608 [13.9s elapsed, 0s remaining, 38.4 samples/s]      
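
The effect of `classwise` can be illustrated with a simplified greedy IoU matcher. This is only a sketch under stated assumptions: the `iou` and `match` helpers are hypothetical, and FiftyOne's actual COCO-style matching handles more cases (such as confidence ordering and crowd annotations) than shown here.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def match(gts, preds, classwise=True, iou_thresh=0.5):
    """Greedily pair each prediction with the best unmatched ground truth.

    Returns (gt_label, pred_label) pairs; None marks a missing side.
    """
    pairs, used = [], set()
    for p_label, p_box in preds:
        best, best_iou = None, iou_thresh
        for i, (g_label, g_box) in enumerate(gts):
            if i in used or (classwise and g_label != p_label):
                continue
            overlap = iou(g_box, p_box)
            if overlap >= best_iou:
                best, best_iou = i, overlap
        if best is None:
            pairs.append((None, p_label))  # false positive
        else:
            used.add(best)
            pairs.append((gts[best][0], p_label))
    # Ground truth objects that no prediction claimed are false negatives
    pairs += [(g[0], None) for i, g in enumerate(gts) if i not in used]
    return pairs


gts = [("donut", [0.1, 0.1, 0.4, 0.4])]
preds = [("sandwich", [0.1, 0.1, 0.4, 0.4])]
print(match(gts, preds, classwise=True))   # [(None, 'sandwich'), ('donut', None)]
print(match(gts, preds, classwise=False))  # [('donut', 'sandwich')]
```

With class-wise matching, a mislabeled box shows up as one false positive plus one false negative; with `classwise=False`, the same box becomes a single donut/sandwich confusion, which is what populates the off-diagonal cells of a confusion matrix.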


In [38]:
results_interclass.plot_confusion_matrix()


Interactive plots are currently only supported in Jupyter notebooks. Support outside of notebooks and in Google Colab will be included in an upcoming release. In the meantime, you can still use this plot, but note that (i) selecting data will not trigger callbacks, and (ii) you must manually call `plot.show()` to launch a new plot that reflects the current state of an attached session.

See https://voxel51.com/docs/fiftyone/user_guide/plots.html#working-in-notebooks for more information.





Note that the confusion matrix shows many false positives for the carrot, donut, and sandwich classes: the model predicts these classes in places where there is no matching ground truth object.

The [detection evaluation](https://voxel51.com/docs/fiftyone/user_guide/evaluation.html#detections) also added an `eval` attribute to every detection indicating whether it is a true positive, false positive, or false negative, and stored the per-sample counts in the `eval_tp`, `eval_fp`, and `eval_fn` fields.
Let's create a view that sorts the samples by `eval_fp` to find the worst ones, and use the [FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/app.html) to visualize the results.

In [49]:
session.view = test_view.sort_by("eval_fp", reverse=True)

It would be best to get this [data reannotated to fix these mistakes](https://towardsdatascience.com/managing-annotation-mistakes-with-fiftyone-and-labelbox-fc6e87b51102), but in the meantime, we can remedy this by creating a new view that remaps the labels in `ingredients_list`, including `carrot`, `donut`, and `sandwich`, to a single `ingredient` class and then retraining the model with that view. This is only possible because we are backing our data in FiftyOne and loading views into PyTorch as needed. Without FiftyOne, the PyTorch dataset class or the underlying data would need to be changed to remap these classes.

In [40]:
# map labels to single ingredient class 
ingredients_map = {c: "ingredient" for c in ingredients_list}

train_map_view = train_view.map_labels("ground_truth", ingredients_map)
test_map_view = test_view.map_labels("ground_truth", ingredients_map)

# use our dataset and defined transformations
torch_map_dataset = FiftyOneTorchDataset(train_map_view, train_transforms)
torch_map_dataset_test = FiftyOneTorchDataset(test_map_view, test_transforms)
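
`map_labels` applies the mapping when the view is read, so the labels stored in the underlying dataset stay untouched. The idea can be sketched on plain Python lists; the class list and labels below are hypothetical stand-ins for the notebook's `ingredients_list` and ground truth objects.

```python
from collections import Counter

# Hypothetical subset of classes; the notebook defines `ingredients_list` earlier
ingredients_list = ["carrot", "donut", "sandwich"]
ingredients_map = {c: "ingredient" for c in ingredients_list}

# Hypothetical ground truth labels for a handful of objects
labels = ["carrot", "donut", "carrot", "sandwich"]

# Build a remapped copy; labels missing from the map are left as they are
remapped = [ingredients_map.get(label, label) for label in labels]

print(Counter(labels))    # original labels are unchanged
print(Counter(remapped))  # every object now has the label "ingredient"
```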

In [None]:
# Only 2 classes (background and ingredient)
ingredient_model = get_model(2)

In [None]:
do_training(ingredient_model, torch_map_dataset, torch_map_dataset_test)

Using device cuda



MongoClient opened before fork. Create MongoClient only after forking. See PyMongo's documentation for details: https://pymongo.readthedocs.io/en/stable/faq.html#is-pymongo-fork-safe





Epoch: [0]  [   0/1250]  eta: 0:22:55  lr: 0.000010  loss: 0.9718 (0.9718)  loss_classifier: 0.6433 (0.6433)  loss_box_reg: 0.3041 (0.3041)  loss_objectness: 0.0159 (0.0159)  loss_rpn_box_reg: 0.0085 (0.0085)  time: 1.1004  data: 0.4635  max mem: 4398
Epoch: [0]  [  10/1250]  eta: 0:12:34  lr: 0.000060  loss: 0.9718 (0.9853)  loss_classifier: 0.6433 (0.6495)  loss_box_reg: 0.2450 (0.3154)  loss_objectness: 0.0159 (0.0148)  loss_rpn_box_reg: 0.0048 (0.0056)  time: 0.6087  data: 0.0499  max mem: 4398
Epoch: [0]  [  20/1250]  eta: 0:11:41  lr: 0.000110  loss: 0.9068 (0.8736)  loss_classifier: 0.4648 (0.5148)  loss_box_reg: 0.2440 (0.3354)  loss_objectness: 0.0143 (0.0145)  loss_rpn_box_reg: 0.0055 (0.0089)  time: 0.5440  data: 0.0083  max mem: 4398
Epoch: [0]  [  30/1250]  eta: 0:11:20  lr: 0.000160  loss: 0.6045 (0.8001)  loss_classifier: 0.3217 (0.4365)  loss_box_reg: 0.3037 (0.3392)  loss_objectness: 0.0133 (0.0152)  loss_rpn_box_reg: 0.0096 (0.0092)  time: 0.5300  data: 0.0090  max me





Test:  [  0/608]  eta: 0:04:16  model_time: 0.1884 (0.1884)  evaluator_time: 0.0036 (0.0036)  time: 0.4218  data: 0.2282  max mem: 4719
Test:  [100/608]  eta: 0:00:53  model_time: 0.0873 (0.0937)  evaluator_time: 0.0020 (0.0030)  time: 0.1060  data: 0.0057  max mem: 4719
Test:  [200/608]  eta: 0:00:42  model_time: 0.0858 (0.0931)  evaluator_time: 0.0026 (0.0034)  time: 0.1030  data: 0.0046  max mem: 4719
Test:  [300/608]  eta: 0:00:31  model_time: 0.0877 (0.0933)  evaluator_time: 0.0024 (0.0033)  time: 0.1061  data: 0.0046  max mem: 4719
Test:  [400/608]  eta: 0:00:21  model_time: 0.0865 (0.0936)  evaluator_time: 0.0026 (0.0032)  time: 0.1031  data: 0.0045  max mem: 4719
Test:  [500/608]  eta: 0:00:11  model_time: 0.0863 (0.0935)  evaluator_time: 0.0022 (0.0031)  time: 0.1054  data: 0.0053  max mem: 4719
Test:  [600/608]  eta: 0:00:00  model_time: 0.0847 (0.0935)  evaluator_time: 0.0023 (0.0032)  time: 0.1029  data: 0.0055  max mem: 4719
Test:  [607/608]  eta: 0:00:00  model_time: 0.08





Epoch: [1]  [   0/1250]  eta: 0:19:07  lr: 0.005000  loss: 0.3107 (0.3107)  loss_classifier: 0.1280 (0.1280)  loss_box_reg: 0.1601 (0.1601)  loss_objectness: 0.0162 (0.0162)  loss_rpn_box_reg: 0.0065 (0.0065)  time: 0.9180  data: 0.4597  max mem: 4719
Epoch: [1]  [  10/1250]  eta: 0:13:23  lr: 0.005000  loss: 0.2589 (0.2776)  loss_classifier: 0.1147 (0.1233)  loss_box_reg: 0.1233 (0.1307)  loss_objectness: 0.0130 (0.0168)  loss_rpn_box_reg: 0.0065 (0.0068)  time: 0.6477  data: 0.0491  max mem: 4719
Epoch: [1]  [  20/1250]  eta: 0:12:00  lr: 0.005000  loss: 0.2906 (0.3044)  loss_classifier: 0.1330 (0.1379)  loss_box_reg: 0.1165 (0.1393)  loss_objectness: 0.0130 (0.0166)  loss_rpn_box_reg: 0.0051 (0.0106)  time: 0.5690  data: 0.0083  max mem: 4719
Epoch: [1]  [  30/1250]  eta: 0:11:27  lr: 0.005000  loss: 0.2073 (0.2787)  loss_classifier: 0.0921 (0.1221)  loss_box_reg: 0.1036 (0.1332)  loss_objectness: 0.0090 (0.0139)  loss_rpn_box_reg: 0.0051 (0.0095)  time: 0.5168  data: 0.0087  max me





Test:  [  0/608]  eta: 0:04:13  model_time: 0.1738 (0.1738)  evaluator_time: 0.0038 (0.0038)  time: 0.4164  data: 0.2371  max mem: 4719
Test:  [100/608]  eta: 0:00:53  model_time: 0.0865 (0.0933)  evaluator_time: 0.0020 (0.0033)  time: 0.1063  data: 0.0062  max mem: 4719
Test:  [200/608]  eta: 0:00:43  model_time: 0.0856 (0.0930)  evaluator_time: 0.0027 (0.0038)  time: 0.1046  data: 0.0056  max mem: 4719
Test:  [300/608]  eta: 0:00:32  model_time: 0.0878 (0.0933)  evaluator_time: 0.0023 (0.0036)  time: 0.1062  data: 0.0046  max mem: 4719
Test:  [400/608]  eta: 0:00:21  model_time: 0.0866 (0.0935)  evaluator_time: 0.0028 (0.0036)  time: 0.1042  data: 0.0043  max mem: 4719
Test:  [500/608]  eta: 0:00:11  model_time: 0.0867 (0.0935)  evaluator_time: 0.0024 (0.0035)  time: 0.1054  data: 0.0046  max mem: 4719
Test:  [600/608]  eta: 0:00:00  model_time: 0.0846 (0.0935)  evaluator_time: 0.0022 (0.0036)  time: 0.1026  data: 0.0054  max mem: 4719
Test:  [607/608]  eta: 0:00:00  model_time: 0.08





Epoch: [2]  [   0/1250]  eta: 0:18:48  lr: 0.005000  loss: 0.2941 (0.2941)  loss_classifier: 0.1650 (0.1650)  loss_box_reg: 0.0885 (0.0885)  loss_objectness: 0.0367 (0.0367)  loss_rpn_box_reg: 0.0040 (0.0040)  time: 0.9027  data: 0.4561  max mem: 4719
Epoch: [2]  [  10/1250]  eta: 0:11:27  lr: 0.005000  loss: 0.2941 (0.2863)  loss_classifier: 0.1240 (0.1210)  loss_box_reg: 0.1487 (0.1469)  loss_objectness: 0.0065 (0.0103)  loss_rpn_box_reg: 0.0058 (0.0081)  time: 0.5542  data: 0.0485  max mem: 4719
Epoch: [2]  [  20/1250]  eta: 0:11:17  lr: 0.005000  loss: 0.2512 (0.2623)  loss_classifier: 0.1081 (0.1134)  loss_box_reg: 0.1399 (0.1327)  loss_objectness: 0.0065 (0.0087)  loss_rpn_box_reg: 0.0058 (0.0075)  time: 0.5332  data: 0.0084  max mem: 4719
Epoch: [2]  [  30/1250]  eta: 0:11:20  lr: 0.005000  loss: 0.2209 (0.2600)  loss_classifier: 0.0947 (0.1140)  loss_box_reg: 0.0948 (0.1294)  loss_objectness: 0.0057 (0.0080)  loss_rpn_box_reg: 0.0043 (0.0086)  time: 0.5596  data: 0.0090  max me





Test:  [  0/608]  eta: 0:04:22  model_time: 0.1894 (0.1894)  evaluator_time: 0.0040 (0.0040)  time: 0.4322  data: 0.2372  max mem: 4719
Test:  [100/608]  eta: 0:00:53  model_time: 0.0869 (0.0940)  evaluator_time: 0.0019 (0.0026)  time: 0.1045  data: 0.0045  max mem: 4719
Test:  [200/608]  eta: 0:00:42  model_time: 0.0859 (0.0934)  evaluator_time: 0.0021 (0.0029)  time: 0.1139  data: 0.0158  max mem: 4719
Test:  [300/608]  eta: 0:00:32  model_time: 0.0878 (0.0938)  evaluator_time: 0.0021 (0.0028)  time: 0.1063  data: 0.0047  max mem: 4719
Test:  [400/608]  eta: 0:00:21  model_time: 0.0858 (0.0939)  evaluator_time: 0.0023 (0.0028)  time: 0.1025  data: 0.0045  max mem: 4719
Test:  [500/608]  eta: 0:00:11  model_time: 0.0875 (0.0938)  evaluator_time: 0.0021 (0.0027)  time: 0.1053  data: 0.0049  max mem: 4719
Test:  [600/608]  eta: 0:00:00  model_time: 0.0849 (0.0938)  evaluator_time: 0.0019 (0.0027)  time: 0.1017  data: 0.0043  max mem: 4719
Test:  [607/608]  eta: 0:00:00  model_time: 0.08





Epoch: [3]  [   0/1250]  eta: 0:19:06  lr: 0.000500  loss: 0.5591 (0.5591)  loss_classifier: 0.2136 (0.2136)  loss_box_reg: 0.3227 (0.3227)  loss_objectness: 0.0075 (0.0075)  loss_rpn_box_reg: 0.0153 (0.0153)  time: 0.9173  data: 0.4443  max mem: 4719
Epoch: [3]  [  10/1250]  eta: 0:11:57  lr: 0.000500  loss: 0.2276 (0.2396)  loss_classifier: 0.0744 (0.0982)  loss_box_reg: 0.0967 (0.1206)  loss_objectness: 0.0075 (0.0074)  loss_rpn_box_reg: 0.0046 (0.0134)  time: 0.5785  data: 0.0492  max mem: 4719
Epoch: [3]  [  20/1250]  eta: 0:11:44  lr: 0.000500  loss: 0.2120 (0.2594)  loss_classifier: 0.0834 (0.1075)  loss_box_reg: 0.0887 (0.1256)  loss_objectness: 0.0088 (0.0107)  loss_rpn_box_reg: 0.0046 (0.0156)  time: 0.5558  data: 0.0090  max mem: 4719
Epoch: [3]  [  30/1250]  eta: 0:11:46  lr: 0.000500  loss: 0.2058 (0.2506)  loss_classifier: 0.0870 (0.1016)  loss_box_reg: 0.0936 (0.1240)  loss_objectness: 0.0096 (0.0104)  loss_rpn_box_reg: 0.0052 (0.0146)  time: 0.5792  data: 0.0083  max me





Test:  [  0/608]  eta: 0:04:19  model_time: 0.1867 (0.1867)  evaluator_time: 0.0030 (0.0030)  time: 0.4270  data: 0.2335  max mem: 4719
Test:  [100/608]  eta: 0:00:53  model_time: 0.0876 (0.0936)  evaluator_time: 0.0017 (0.0023)  time: 0.1060  data: 0.0064  max mem: 4719
Test:  [200/608]  eta: 0:00:42  model_time: 0.0863 (0.0931)  evaluator_time: 0.0020 (0.0027)  time: 0.1039  data: 0.0061  max mem: 4719
Test:  [300/608]  eta: 0:00:32  model_time: 0.0877 (0.0934)  evaluator_time: 0.0022 (0.0025)  time: 0.1077  data: 0.0066  max mem: 4719
Test:  [400/608]  eta: 0:00:21  model_time: 0.0857 (0.0936)  evaluator_time: 0.0020 (0.0025)  time: 0.1039  data: 0.0068  max mem: 4719
Test:  [500/608]  eta: 0:00:11  model_time: 0.0866 (0.0935)  evaluator_time: 0.0016 (0.0024)  time: 0.1066  data: 0.0069  max mem: 4719
Test:  [600/608]  eta: 0:00:00  model_time: 0.0846 (0.0934)  evaluator_time: 0.0016 (0.0025)  time: 0.1030  data: 0.0053  max mem: 4719
Test:  [607/608]  eta: 0:00:00  model_time: 0.08

In [None]:
add_detections(ingredient_model, torch_map_dataset_test, test_map_view, field_name="ingredient_predictions")

Using device cuda
 100% |█████████████████| 608/608 [1.5m elapsed, 0s remaining, 6.2 samples/s]      


In [None]:
ingredient_results = fo.evaluate_detections(
    test_map_view, 
    "ingredient_predictions", 
    classes=["ingredient"], 
    eval_key="ingredient_eval", 
    compute_mAP=True
)

Evaluating detections...
 100% |█████████████████| 608/608 [10.8s elapsed, 0s remaining, 54.9 samples/s]      
Performing IoU sweep...
 100% |█████████████████| 608/608 [13.3s elapsed, 0s remaining, 40.2 samples/s]      


In [None]:
ingredient_results.mAP()

0.4398916674240741

In [None]:
print(ingredients_view.first())

<SampleView: {
    'id': '6282374b1d1aacdb009565be',
    'media_type': 'image',
    'filepath': '/content/drive/MyDrive/COMP5425 Multimedia Retrieval/dataset/Food_101_COCO_format/data/100076-2.jpg',
    'tags': BaseList([]),
    'metadata': <ImageMetadata: {
        'size_bytes': None,
        'mime_type': None,
        'width': 512,
        'height': 512,
        'num_channels': None,
    }>,
    'ground_truth': <Detections: {
        'detections': BaseList([
            <Detection: {
                'id': '6282374b1d1aacdb009565b8',
                'attributes': BaseDict({}),
                'tags': BaseList([]),
                'label': 'donut',
                'bounding_box': BaseList([
                    0.23790156841278076,
                    0.03763008117675781,
                    0.278160035610199,
                    0.4430285692214966,
                ]),
                'mask': None,
                'confidence': 0.4021521210670471,
                'index': None,
        

In [None]:
# Print recipe links for a detected ingredient using a Google search
try:
    from googlesearch import search
except ImportError as e:
    raise ImportError("No module named 'googlesearch' found") from e

# Search query built from the detected label
query = "donut recipe"
recipe_urls = []
for url in search(query, tld="co.in", num=10, stop=10, pause=2):
    recipe_urls.append(url)
    print(url)

https://cooking.nytimes.com/recipes/1017060-doughnuts
https://sallysbakingaddiction.com/how-to-make-homemade-glazed-doughnuts/
https://sallysbakingaddiction.com/homemade-frosted-doughnuts-3-ways/
https://sallysbakingaddiction.com/baked-cinnamon-sugar-donuts/
https://sallysbakingaddiction.com/crumb-cake-donuts/
https://www.youtube.com/watch?v=ijYtLqIDZtk&vl=en
https://www.delish.com/cooking/recipe-ideas/a24788319/how-to-make-donuts-at-home/
https://www.allrecipes.com/recipe/45921/crispy-and-creamy-doughnuts/
https://joythebaker.com/2021/10/classic-yeast-doughnut-recipe/
https://preppykitchen.com/baked-donut-recipe/


In [None]:
# Open the first result in a new browser tab. A %%javascript cell cannot
# read Python variables, so the URL is passed to the JavaScript explicitly.
import json
from IPython.display import Javascript, display

display(Javascript("window.open(%s, '_blank');" % json.dumps(recipe_urls[0])))


<IPython.core.display.Javascript object>

In [None]:
ingredient_results.print_report()

              precision    recall  f1-score   support

  ingredient       0.19      0.89      0.32      1680

   micro avg       0.19      0.89      0.32      1680
   macro avg       0.19      0.89      0.32      1680
weighted avg       0.19      0.89      0.32      1680



Because FiftyOne makes it easy to visualize and manage our dataset, we were able to spot and act on a dataset issue that would have gone unnoticed had we only looked at dataset-wide evaluation metrics and fixed dataset representations. Through these efforts, we managed to increase the mAP of the model from 31.0% to 44.0%.
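
As a quick check of the size of that improvement, the two `mAP()` values printed earlier can be compared directly, keeping in mind that the two runs are evaluated against different label sets (25 classes versus a single merged class):

```python
map_before = 0.3102371242583314  # results.mAP() for the 25-class model
map_after = 0.4398916674240741   # ingredient_results.mAP() after merging classes

absolute_gain = map_after - map_before
relative_gain = absolute_gain / map_before

print(f"absolute gain: {100 * absolute_gain:.1f} points")
print(f"relative gain: {100 * relative_gain:.1f}%")
```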

Even though this example workflow may not work in all situations, this kind of class-merging strategy can be effective in cases where more fine-grained discrimination is not called for.

## Summary

PyTorch and related frameworks provide quick and easy methods to bootstrap your model development and training pipelines. However, they largely overlook the need to massage and fine-tune datasets to efficiently improve performance. FiftyOne makes it easy to load your datasets into a flexible format that works well with existing tools, allowing you to provide better data for training and testing. As they say, "garbage in, garbage out".