# Training and Evaluating FiftyOne Datasets with Detectron2

FiftyOne has all of the building blocks necessary to develop high-quality datasets to train your models, as well as advanced model evaluation capabilities. To make use of these, FiftyOne integrates with your existing model training and inference pipelines. In this walkthrough, we'll cover how you can use your FiftyOne datasets to train a model with [Detectron2](https://github.com/facebookresearch/detectron2), Facebook AI Research's library for detection and segmentation algorithms.

This walkthrough is based on the [official Detectron2 tutorial](https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5), augmented to load data to and from FiftyOne.


Specifically, this walkthrough covers:

- Loading a dataset from the FiftyOne Zoo, and splitting it into training/validation
- Initializing a segmentation model from the detectron2 model zoo
- Loading ground truth annotations from a FiftyOne dataset into a detectron2 model training pipeline and training the model
- Loading predictions from a detectron2 model into a FiftyOne dataset
- Evaluating model predictions in FiftyOne


**So, what’s the takeaway?**

By writing two simple functions, you can integrate FiftyOne into your Detectron2 model training and inference pipelines.

## Setup

To get started, you need to install [FiftyOne](https://voxel51.com/docs/fiftyone/getting_started/install.html) and [detectron2](https://detectron2.readthedocs.io/en/latest/tutorials/install.html):

In [None]:
!pip install fiftyone

In [1]:
import fiftyone 

Subprocess ['/home/jakhon37/miniconda3/lib/python3.9/site-packages/fiftyone/db/bin/mongod', '--dbpath', '/home/jakhon37/.fiftyone/var/lib/mongo', '--logpath', '/home/jakhon37/.fiftyone/var/lib/mongo/log/mongo.log', '--port', '0', '--nounixsocket'] exited with error 127:
/home/jakhon37/miniconda3/lib/python3.9/site-packages/fiftyone/db/bin/mongod: error while loading shared libraries: libcrypto.so.1.1: cannot open shared object file: No such file or directory


ServiceListenTimeout: fiftyone.core.service.DatabaseService failed to bind to port

In [None]:
import fiftyone as fo
import fiftyone.zoo as foz

In [None]:
!python -m pip install pyyaml==5.1

# Detectron2 has not released pre-built binaries for the latest pytorch (https://github.com/facebookresearch/detectron2/issues/4053)
# so we install from source instead. This takes a few minutes.
!python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

# Install pre-built detectron2 that matches pytorch version, if released:
# See https://detectron2.readthedocs.io/tutorials/install.html for instructions
#!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/{CUDA_VERSION}/{TORCH_VERSION}/index.html

In [2]:
import torch, detectron2
!nvcc --version
TORCH_VERSION = ".".join(torch.__version__.split(".")[:2])
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)
print("detectron2:", detectron2.__version__)

  from .autonotebook import tqdm as notebook_tqdm


nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
torch:  1.13 ; cuda:  cu116
detectron2: 0.6


In [3]:
# Setup detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import numpy as np
import os, cv2

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog, DatasetCatalog

## Train on a FiftyOne dataset

In this section, we show how to use a custom FiftyOne Dataset to train a detectron2 model.
We'll train a license plate segmentation model starting from a model pre-trained on the COCO dataset, which is available in detectron2's model zoo.

Since the COCO dataset doesn't have a "Vehicle registration plates" category, we will be using segmentations of license plates from the Open Images v6 dataset in the [FiftyOne Dataset Zoo](https://voxel51.com/docs/fiftyone/user_guide/dataset_zoo/datasets.html#open-images-v6) to train the model to recognize this new category.


## Prepare the dataset

For this example, we will use only some of the samples from the official "validation" split of the dataset. Adding data from the official "train" split could improve model performance, but training would take longer, so we'll stick to the "validation" split for this walkthrough.

In [None]:
dataset = foz.load_zoo_dataset(
    "open-images-v6", 
    split="validation", 
    classes=["Vehicle registration plate"], 
    label_types=["segmentations"],
)

Specifying `classes` when downloading a dataset from the zoo ensures that only samples containing at least one of the given classes are present. However, these samples may still contain other labels, so we can use the [filtering capability](https://voxel51.com/docs/fiftyone/user_guide/using_views.html#filtering) of FiftyOne to keep only the "Vehicle registration plate" labels.
We will also remove the "validation" tag from these samples and create our own split out of them.

In [None]:
from fiftyone import ViewField as F

# Remove other classes and existing tags
dataset.filter_labels("segmentations", F("label") == "Vehicle registration plate").save()
dataset.untag_samples("validation")

In [None]:
import fiftyone.utils.random as four

four.random_split(dataset, {"train": 0.8, "val": 0.2})
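
Conceptually, a tag-based split shuffles the samples and gives each one exactly one split tag. The sketch below illustrates that idea in plain Python; it is not FiftyOne's implementation, and the function name `tag_random_split` and the `seed` value are hypothetical.

```python
import random

def tag_random_split(sample_ids, fractions, seed=51):
    """Assign each ID exactly one tag, with split sizes proportional to `fractions`."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)

    tags = {}
    names = list(fractions)
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            # The last split absorbs any rounding remainder
            end = len(ids)
        else:
            end = start + round(fractions[name] * len(ids))
        for sample_id in ids[start:end]:
            tags[sample_id] = name
        start = end
    return tags

tags = tag_random_split(range(10), {"train": 0.8, "val": 0.2})
counts = {name: list(tags.values()).count(name) for name in ("train", "val")}
print(counts)  # {'train': 8, 'val': 2}
```

Because each sample carries a single tag, views such as `dataset.match_tags("train")` and `dataset.match_tags("val")` do not overlap.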

Next, we will register the FiftyOne dataset with detectron2, following the [detectron2 custom dataset tutorial](https://detectron2.readthedocs.io/tutorials/datasets.html).
Because the dataset is stored in FiftyOne's own format, we write a function that parses it and converts it into [detectron2's standard format](https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html#standard-dataset-dicts).

Note: In this example, we specifically parse the segmentations into bounding boxes and polylines. This function may require tweaks depending on the model being trained and the data it expects.
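
The coordinate handling involved in this parsing can be tried in isolation. The following minimal sketch assumes FiftyOne's relative `[x, y, width, height]` boxes and relative polyline points, and converts them to absolute pixel values; the helper names and the 200x100 image are hypothetical.

```python
def to_abs_xywh(rel_box, width, height):
    """Convert a relative [x, y, w, h] box in [0, 1] to absolute pixel XYWH."""
    x, y, w, h = rel_box
    return [int(x * width), int(y * height), int(w * width), int(h * height)]

def flatten_polygon(rel_points, width, height):
    """Scale relative (x, y) points to pixels and flatten to [x1, y1, x2, y2, ...]."""
    flat = []
    for x, y in rel_points:
        flat.extend([x * width, y * height])
    return flat

# Hypothetical 200x100 image with a plate in the lower middle
box = to_abs_xywh([0.25, 0.5, 0.5, 0.25], width=200, height=100)
poly = flatten_polygon(
    [(0.25, 0.5), (0.75, 0.5), (0.75, 0.75), (0.25, 0.75)], width=200, height=100
)
print(box)   # [50, 50, 100, 25]
print(poly)  # [50.0, 50.0, 150.0, 50.0, 150.0, 75.0, 50.0, 75.0]
```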


In [None]:
from detectron2.structures import BoxMode

def get_fiftyone_dicts(samples):
    """Convert a FiftyOne sample collection into detectron2's standard dataset dicts."""
    # Populate image width/height metadata for any samples that lack it
    samples.compute_metadata()

    dataset_dicts = []
    for sample in samples.select_fields(["id", "filepath", "metadata", "segmentations"]):
        height = sample.metadata["height"]
        width = sample.metadata["width"]
        record = {}
        record["file_name"] = sample.filepath
        record["image_id"] = sample.id
        record["height"] = height
        record["width"] = width

        objs = []
        for det in sample.segmentations.detections:
            # FiftyOne boxes are relative [top-left-x, top-left-y, width, height] in [0, 1]
            tlx, tly, w, h = det.bounding_box
            bbox = [int(tlx*width), int(tly*height), int(w*width), int(h*height)]
            # Convert the instance mask to a polyline and scale its points to pixels
            fo_poly = det.to_polyline()
            poly = [(x*width, y*height) for x, y in fo_poly.points[0]]
            # Flatten [(x, y), ...] into [x1, y1, x2, y2, ...]
            poly = [p for x in poly for p in x]
            obj = {
                "bbox": bbox,
                "bbox_mode": BoxMode.XYWH_ABS,
                "segmentation": [poly],
                "category_id": 0,  # single class: vehicle registration plate
            }
            objs.append(obj)

        record["annotations"] = objs
        dataset_dicts.append(record)

    return dataset_dicts

for d in ["train", "val"]:
    view = dataset.match_tags(d)
    DatasetCatalog.register("fiftyone_" + d, lambda view=view: get_fiftyone_dicts(view))
    MetadataCatalog.get("fiftyone_" + d).set(thing_classes=["vehicle_registration_plate"])

metadata = MetadataCatalog.get("fiftyone_train")
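
The `lambda view=view: ...` form in the registration loop is deliberate. Python closures look up loop variables when they are called, not when they are defined, so a plain `lambda: get_fiftyone_dicts(view)` would make both registered datasets use the last view. The following standalone sketch shows the difference using strings in place of views:

```python
# Without a default argument, each lambda looks up `name` when it is called,
# by which time the loop has finished and `name` holds the last value.
late = {}
for name in ["train", "val"]:
    late[name] = lambda: name

# Binding the loop variable as a default argument captures its current value.
bound = {}
for name in ["train", "val"]:
    bound[name] = lambda name=name: name

print(late["train"]())   # val
print(bound["train"]())  # train
```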

To verify the dataset is in the correct format, let's visualize the annotations of the training set:



In [None]:
dataset_dicts = get_fiftyone_dicts(dataset.match_tags("train"))
ids = [dd["image_id"] for dd in dataset_dicts]

view = dataset.select(ids)
session = fo.launch_app(view)



In [None]:
session.freeze()  # screenshot the App

## Load the model and train!

Now, let's fine-tune a COCO-pretrained R50-FPN Mask R-CNN model on the FiftyOne dataset. It takes ~2 minutes to train 300 iterations on a P100 GPU.


In [4]:
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("fiftyone_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2  # This is the real "batch size" commonly known to deep learning people
cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
cfg.SOLVER.MAX_ITER = 300    # 300 iterations seems good enough for this toy dataset; you will need to train longer for a practical dataset
cfg.SOLVER.STEPS = []        # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # The "RoIHead batch size". 128 is faster, and good enough for this toy dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (Vehicle registration plate). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg) 
trainer.resume_or_load(resume=False)
trainer.train()

[11/10 14:19:29 d2.engine.defaults]: Model:
GeneralizedRCNN(
  (backbone): FPN(
    (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (top_block): LastLevelMaxPool()
    (bottom_up): ResNet(
      (stem): BasicStem(
        (conv1): Conv2d(
          3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
      )
      (res

KeyError: "Dataset 'fiftyone_train' is not registered! Available datasets are: coco_2014_train, coco_2014_val, coco_2014_minival, coco_2014_valminusminival, coco_2017_train, coco_2017_val, coco_2017_test, coco_2017_test-dev, coco_2017_val_100, keypoints_coco_2014_train, keypoints_coco_2014_val, keypoints_coco_2014_minival, keypoints_coco_2014_valminusminival, keypoints_coco_2017_train, keypoints_coco_2017_val, keypoints_coco_2017_val_100, coco_2017_train_panoptic_separated, coco_2017_train_panoptic_stuffonly, coco_2017_train_panoptic, coco_2017_val_panoptic_separated, coco_2017_val_panoptic_stuffonly, coco_2017_val_panoptic, coco_2017_val_100_panoptic_separated, coco_2017_val_100_panoptic_stuffonly, coco_2017_val_100_panoptic, lvis_v1_train, lvis_v1_val, lvis_v1_test_dev, lvis_v1_test_challenge, lvis_v0.5_train, lvis_v0.5_val, lvis_v0.5_val_rand_100, lvis_v0.5_test, lvis_v0.5_train_cocofied, lvis_v0.5_val_cocofied, cityscapes_fine_instance_seg_train, cityscapes_fine_sem_seg_train, cityscapes_fine_instance_seg_val, cityscapes_fine_sem_seg_val, cityscapes_fine_instance_seg_test, cityscapes_fine_sem_seg_test, cityscapes_fine_panoptic_train, cityscapes_fine_panoptic_val, voc_2007_trainval, voc_2007_train, voc_2007_val, voc_2007_test, voc_2012_trainval, voc_2012_train, voc_2012_val, ade20k_sem_seg_train, ade20k_sem_seg_val"

In [None]:
# Look at training curves in tensorboard:
%load_ext tensorboard
%tensorboard --logdir output

![tensorboard](https://github.com/voxel51/fiftyone/blob/v0.17.2/docs/source/tutorials/images/detectron2_tensorboard.png?raw=1)

## Inference & evaluation using the trained model
Now, let's run inference with the trained model on the license plate validation dataset. First, let's create a predictor using the model we just trained:



In [4]:
from detectron2.data import DatasetCatalog, MetadataCatalog, build_detection_test_loader
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.utils.logger import setup_logger
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg, CfgNode
from detectron2 import model_zoo
import torch
import os

In [7]:
# Inference should use the same config that was used for training, so we rebuild it here
# and then change a few settings for inference:
config_file_path = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"  # same architecture as the trained model
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(config_file_path))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # must match the training config
# Set the weights after merging; merge_from_file() would otherwise overwrite this path
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # path to the model we just trained
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set a custom testing threshold
predictor = DefaultPredictor(cfg)

Next, we generate predictions for each sample in the validation set, convert the detectron2 outputs to FiftyOne format, and add them to our FiftyOne dataset.

In [8]:
def detectron_to_fo(outputs, img_w, img_h):
    # format is documented at https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
    detections = []
    instances = outputs["instances"].to("cpu")
    for pred_box, score, c, mask in zip(
        instances.pred_boxes, instances.scores, instances.pred_classes, instances.pred_masks,
    ):
        x1, y1, x2, y2 = pred_box
        fo_mask = mask.numpy()[int(y1):int(y2), int(x1):int(x2)]
        bbox = [float(x1)/img_w, float(y1)/img_h, float(x2-x1)/img_w, float(y2-y1)/img_h]
        detection = fo.Detection(label="Vehicle registration plate", confidence=float(score), bounding_box=bbox, mask=fo_mask)
        detections.append(detection)

    return fo.Detections(detections=detections)
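
The two conversions in `detectron_to_fo` are cropping the full-image mask to the predicted box and rescaling the absolute `(x1, y1, x2, y2)` box to relative `[x, y, width, height]`. The sketch below reproduces both steps with nested lists instead of tensors; the helper name and the tiny 8x4 image are hypothetical.

```python
def crop_and_normalize(mask, box, img_w, img_h):
    """Crop a full-image mask to an absolute (x1, y1, x2, y2) box and
    return the cropped mask plus the box as relative [x, y, w, h]."""
    x1, y1, x2, y2 = box
    cropped = [row[int(x1):int(x2)] for row in mask[int(y1):int(y2)]]
    rel_box = [x1 / img_w, y1 / img_h, (x2 - x1) / img_w, (y2 - y1) / img_h]
    return cropped, rel_box

# Hypothetical image of width 8 and height 4 containing a 4x2 object
mask = [[False] * 8 for _ in range(4)]
for y in range(1, 3):
    for x in range(2, 6):
        mask[y][x] = True

cropped, rel_box = crop_and_normalize(mask, (2, 1, 6, 3), img_w=8, img_h=4)
print(len(cropped), len(cropped[0]))  # 2 4
print(rel_box)  # [0.25, 0.25, 0.5, 0.5]
```

The cropped mask covers only the box region, which is why the mask and the bounding box are stored together on each detection.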

In [9]:
val_view = dataset.match_tags("val")
dataset_dicts = get_fiftyone_dicts(val_view)
predictions = {}
for d in dataset_dicts:
    img_w = d["width"]
    img_h = d["height"]
    img = cv2.imread(d["file_name"])
    outputs = predictor(img)
    detections = detectron_to_fo(outputs, img_w, img_h)
    predictions[d["image_id"]] = detections

dataset.set_values("predictions", predictions, key_field="id")

NameError: name 'dataset' is not defined

Let's visualize the predictions and take a look at how the model did. We can click the eye icon next to the "val" tag to view all of the validation samples that we ran inference on.

In [None]:
session = fo.launch_app(dataset)

In [None]:
session.freeze()  # screenshot the App

From here, we can use the built-in [evaluation methods](https://voxel51.com/docs/fiftyone/user_guide/evaluation.html#detections) provided by FiftyOne. Passing `use_masks=True` to the `evaluate_detections()` method evaluates the instance segmentation masks rather than the bounding boxes. The same method can also compute mAP, using either the [COCO-style](https://voxel51.com/docs/fiftyone/integrations/coco.html#map-protocol) (default) or [Open Images-style](https://voxel51.com/docs/fiftyone/integrations/open_images.html#map-protocol) mAP protocol.
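
Detection evaluation of this kind matches predictions to ground truth objects by their overlap, measured as intersection over union (IoU). As a rough illustration, the sketch below computes IoU for two boxes in `[x, y, width, height]` format. With `use_masks=True` the overlap is measured on the instance masks instead, so treat this as a simplified stand-in rather than FiftyOne's implementation.

```python
def box_iou(a, b):
    """IoU of two boxes in [x, y, w, h] format (any consistent units)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

# Hypothetical ground truth and prediction that overlap by half a box
gt = [0.0, 0.0, 0.5, 0.5]
pred = [0.25, 0.0, 0.5, 0.5]
iou = box_iou(gt, pred)
print(round(iou, 3))  # 0.333
```

A prediction like this one would not count as a match under an IoU threshold of 0.5.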

In [10]:
from fiftyone import evaluate_detections


In [14]:
import json

def reg_dataset(train_data):
    """Register the train/validation/test COCO-format datasets listed in a JSON file."""
    with open(train_data, 'r') as file:
        line = json.load(file)
    register_coco_instances(name=line["train_dataset_name"], metadata={}, json_file=line["train_label_path"], image_root=line["train_image_path"])
    register_coco_instances(name=line["validation_dataset_name"], metadata={}, json_file=line["validation_label_path"], image_root=line["validation_image_path"])
    register_coco_instances(name=line["test_dataset_name"], metadata={}, json_file=line["test_label_path"], image_root=line["test_image_path"])
    # Return the registered names so they can be used in cfg.DATASETS
    return line["train_dataset_name"], line["validation_dataset_name"], line["test_dataset_name"]

train_data = "./input/data_path.json"
reg_dataset(train_data)

('dataset_train', 'dataset_validation', 'dataset_test')

In [15]:
results = dataset.evaluate_detections(
    "predictions",
    gt_field="segmentations",
    eval_key="eval",
    use_masks=True,
    compute_mAP=True,
)

We can use this results object to view the mAP, print an evaluation report, plot PR curves, plot confusion matrices, and more.

In [None]:
results.mAP()

0.12387340239495186

In [None]:
results.print_report()

                            precision    recall  f1-score   support

Vehicle registration plate       0.72      0.18      0.29       292

                 micro avg       0.72      0.18      0.29       292
                 macro avg       0.72      0.18      0.29       292
              weighted avg       0.72      0.18      0.29       292



In [None]:
results.plot_pr_curves()

![pr-curve](https://github.com/voxel51/fiftyone/blob/v0.17.2/docs/source/tutorials/images/detectron2_pr.png?raw=1)

From the PR curve, we can see that the model is not generating many predictions, which results in many false negatives, but the predictions that it does generate are often fairly accurate.
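
This pattern of high precision and low recall follows directly from the definitions of the metrics. The sketch below uses hypothetical true positive, false positive, and false negative counts, chosen only to be consistent with the rounded values in the report above; they are not the actual counts from this evaluation.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts: few predictions overall, most of them correct
p, r, f1 = precision_recall_f1(tp=52, fp=20, fn=240)
print(f"{p:.2f} {r:.2f} {f1:.2f}")  # 0.72 0.18 0.29
```

Few false positives keep precision high, while the large number of missed plates pulls recall and F1 down.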

We can also create a view into the dataset looking at high-confidence false positive predictions to understand where the model is going wrong and how to potentially improve it in the future.

In [None]:
from fiftyone import ViewField as F

session.view = dataset.filter_labels("predictions", (F("eval") == "fp") & (F("confidence") > 0.8))
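
The filter expression above combines two conditions on each predicted label. As a plain-Python analogy, assuming each prediction is a dict with hypothetical `eval` and `confidence` keys, the same selection could be written as:

```python
def high_confidence_false_positives(predictions, min_confidence=0.8):
    """Keep predictions marked "fp" whose confidence exceeds the threshold."""
    return [
        p for p in predictions
        if p["eval"] == "fp" and p["confidence"] > min_confidence
    ]

# Hypothetical prediction records
preds = [
    {"id": "a", "eval": "tp", "confidence": 0.95},
    {"id": "b", "eval": "fp", "confidence": 0.91},
    {"id": "c", "eval": "fp", "confidence": 0.75},
]
selected = [p["id"] for p in high_confidence_false_positives(preds)]
print(selected)  # ['b']
```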

In [None]:
session.freeze()  # screenshot the App

There are a few samples with false positives like this one that contain plates with characters not from the Latin alphabet, indicating that we may want to introduce images from a wider range of countries into the training set.