<a href="https://colab.research.google.com/github/rahiakela/modern-computer-vision-with-pytorch/blob/main/10-applications-of-object-detection-and-segmentation/1_multi_object_instance_segmentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Multi-object instance segmentation

**Detectron2** is an open-source platform built by the Facebook AI Research (FAIR) team. It includes high-quality implementations of state-of-the-art object detection algorithms, including DensePose and the Mask R-CNN model family. The original Detectron framework was written in Caffe2, whereas **Detectron2** is written in PyTorch.

Detectron2 supports a range of tasks related to object detection. 

Like the original Detectron, it supports object detection with boxes and instance segmentation masks, as well as human pose prediction. 

Beyond that, Detectron2 adds support for semantic segmentation and panoptic segmentation (a task that combines semantic and instance segmentation). With Detectron2, we can build object detection, segmentation, and pose estimation models in a few lines of code.
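
To make the difference between the three segmentation tasks concrete, here is a minimal NumPy sketch on a hand-made 4x4 image. The arrays, the class IDs, and the `class_id * 1000 + instance_id` encoding are assumptions chosen for illustration; they are not Detectron2's output format.

```python
import numpy as np

# Semantic map: one class ID per pixel (0 = background, 1 = person, 2 = dog).
semantic = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
    [1, 1, 0, 2],
])

# Instance map: tells apart objects of the same class (0 = no object).
instance = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [2, 2, 0, 1],
])

# Panoptic map: every pixel carries both a class and an instance identity.
LABEL_DIVISOR = 1000  # assumed encoding, picked for readability
panoptic = semantic * LABEL_DIVISOR + instance

# One ID per segment: background, two persons, one dog.
segment_ids = np.unique(panoptic).tolist()
print(segment_ids)
```

The semantic map alone cannot tell the two people apart, and the instance map alone does not say what each object is; the combined map answers both questions for every pixel.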

## Setup

In [None]:
!pip install -qU openimages torch_snippets

In [2]:
from torch_snippets import *

Exception: No module named 'sklego'


In [3]:
!wget -O train-annotations-object-segmentation.csv -q https://storage.googleapis.com/openimages/v5/train-annotations-object-segmentation.csv
!wget -O classes.csv -q https://raw.githubusercontent.com/openimages/dataset/master/dict.csv

## Fetching and preparing data

We will be working on the images that are available in the Open Images dataset provided by Google at https://storage.googleapis.com/openimages/web/index.html.

We will fetch only the images we need rather than the entire dataset. This step is necessary because the full dataset is too large for a typical user without extensive resources to download and train on.

Let's specify the classes that we want our model to predict.

In [4]:
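required_classes = "person,dog,bird,car,elephant,football,jug,laptop,Mushroom,Pizza,Rocket,Shirt,Traffic sign,Watermelon,Zebra"
# Normalise to lowercase so the names match those in classes.csv
required_classes = [c.strip().lower() for c in required_classes.split(",")]

# classes.csv has no header row: column 0 is the class ID, column 1 is the readable name
classes = pd.read_csv("classes.csv", header=None)
classes.columns = ["class", "class_name"]
# Keep only the rows for the classes we asked for
classes = classes[classes["class_name"].isin(required_classes)]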

In [7]:
classes.head()

Unnamed: 0,class,class_name
43,/m/01226z,football
224,/m/015p6,bird
601,/m/01c648,laptop
753,/m/01g317,person
1022,/m/01mqdt,traffic sign
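
A name filter like this silently drops any requested class that has no exact match in `classes.csv`. Below is a minimal sketch of how one might check for that. It uses a small inline CSV built from the rows printed above instead of the real file, and the names `toy_classes`, `wanted`, and `missing` are hypothetical.

```python
import io
import pandas as pd

# Stand-in for classes.csv: a few (ID, name) rows copied from the output above.
csv_text = """/m/01226z,football
/m/015p6,bird
/m/01c648,laptop
/m/01g317,person
/m/01mqdt,traffic sign
"""
toy_classes = pd.read_csv(
    io.StringIO(csv_text), header=None, names=["class", "class_name"]
)

wanted = [c.strip().lower() for c in "Person,Dog,Traffic sign".split(",")]

# Vectorised membership test on the name column.
matched = toy_classes[toy_classes["class_name"].isin(wanted)]

# Requested names with no matching row would otherwise vanish without notice.
missing = sorted(set(wanted) - set(matched["class_name"]))

print(matched["class_name"].tolist())
print(missing)
```

Running the same check against the real `classes` frame shows whether every entry in `required_classes` was actually found.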


In [20]:
classes_dup = classes[classes.duplicated()]
classes_dup.head()

Unnamed: 0,class,class_name
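
Called without arguments, `duplicated()` flags only rows that repeat in every column. If the concern is one class name mapped to several IDs, the check can be restricted to that column. A small sketch on made-up data (the `toy` frame and its values are hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "class": ["/m/a", "/m/b", "/m/c"],
    "class_name": ["bird", "bird", "car"],
})

# No row repeats in full, so this frame is empty.
full_row_dups = toy[toy.duplicated()]

# keep=False marks every row that shares its class_name with another row.
name_dups = toy[toy.duplicated(subset=["class_name"], keep=False)]

print(len(full_row_dups), name_dups["class"].tolist())
```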


Let's fetch the image IDs and masks corresponding to required_classes.

In [5]:
df = pd.read_csv("train-annotations-object-segmentation.csv")
df.head()

Unnamed: 0,MaskPath,ImageID,LabelName,BoxID,BoxXMin,BoxXMax,BoxYMin,BoxYMax,PredictedIoU,Clicks
0,677c122b0eaa5d16_m04yx4_9a041d52.png,677c122b0eaa5d16,/m/04yx4,9a041d52,0.8875,0.960938,0.454167,0.720833,0.86864,0.95498 0.65197 1;0.89370 0.56579 1;0.94701 0....
1,05529ae018130c68_m09j2d_b1115fd0.png,05529ae018130c68,/m/09j2d,b1115fd0,0.086875,0.254375,0.504708,0.79096,0.8025,0.16388 0.50114 1;0.25069 0.75425 1;0.13478 0....
2,96e7ee70b428a54e_m04yx4_05580497.png,96e7ee70b428a54e,/m/04yx4,05580497,0.45625,0.603125,0.222013,0.903104,0.5585,0.52271 0.46625 0;0.52695 0.70150 0;0.59151 0....
3,76084f166740d78a_m09j2d_557dfcf5.png,76084f166740d78a,/m/09j2d,557dfcf5,0.01875,0.145625,0.313333,0.754167,0.62394,0.08756 0.34082 0;0.03971 0.34195 1;0.06705 0....
4,ebaccfc70c721055_m02p0tk3_b39109c0.png,ebaccfc70c721055,/m/02p0tk3,b39109c0,0.0975,0.2125,0.291667,0.930833,0.84223,0.19847 0.85413 1;0.18916 0.34751 1;0.18636 0....
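
Loading the whole annotation file at once can strain memory on a small machine. One possible alternative, sketched here under assumptions, is to read the CSV in chunks and keep only the rows whose `LabelName` is among the wanted class IDs. The helper name `load_filtered`, the chunk size, and the inline sample are hypothetical; the column names and sample values are taken from the output above.

```python
import io
import pandas as pd

def load_filtered(source, wanted_labels, chunksize=100_000):
    """Read a CSV in chunks, keeping only rows whose LabelName is wanted."""
    kept = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        kept.append(chunk[chunk["LabelName"].isin(wanted_labels)])
    return pd.concat(kept, ignore_index=True)

# Three-row stand-in for train-annotations-object-segmentation.csv.
sample = io.StringIO(
    "MaskPath,ImageID,LabelName\n"
    "677c122b0eaa5d16_m04yx4_9a041d52.png,677c122b0eaa5d16,/m/04yx4\n"
    "05529ae018130c68_m09j2d_b1115fd0.png,05529ae018130c68,/m/09j2d\n"
    "96e7ee70b428a54e_m04yx4_05580497.png,96e7ee70b428a54e,/m/04yx4\n"
)

# A tiny chunk size forces more than one chunk even on this small sample.
filtered = load_filtered(sample, {"/m/09j2d"}, chunksize=2)
print(filtered["ImageID"].tolist())
```

With the real file, `wanted_labels` could be `set(classes["class"])`, so that only annotations for the required classes are ever held in memory.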


In [22]:
df_dup = df[df.duplicated()]
df_dup.head()

Unnamed: 0,MaskPath,ImageID,LabelName,BoxID,BoxXMin,BoxXMax,BoxYMin,BoxYMax,PredictedIoU,Clicks


In [24]:
# Attach the readable class name to each mask annotation;
# the inner join also drops annotations for classes we did not request
data = pd.merge(df, classes, left_on="LabelName", right_on="class")

# Take at most 500 image IDs per class
subset_data = data.groupby("class_name").agg({"ImageID": lambda x: list(x)[:500]})
subset_data = flatten(subset_data.ImageID.tolist())
# Keep every annotation that belongs to one of the selected images
subset_data = data[data["ImageID"].isin(subset_data)]
subset_masks = subset_data["MaskPath"].tolist()
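
To see what the join and the per-class cap do, here is a toy version with a cap of 2 instead of 500. All frames and values below are made up for illustration, and a plain set comprehension stands in for `flatten`.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "MaskPath": ["a.png", "b.png", "c.png", "d.png", "e.png"],
    "ImageID": ["img1", "img2", "img3", "img4", "img1"],
    "LabelName": ["/m/x", "/m/x", "/m/x", "/m/y", "/m/z"],
})
toy_classes = pd.DataFrame({
    "class": ["/m/x", "/m/y"],
    "class_name": ["dog", "car"],
})

# Inner join: rows whose label is absent from toy_classes (here /m/z) are dropped.
toy_data = pd.merge(toy_df, toy_classes, left_on="LabelName", right_on="class")

# Take at most 2 image IDs per class, then flatten them into one set.
per_class = toy_data.groupby("class_name")["ImageID"].apply(lambda s: list(s)[:2])
keep_ids = {image_id for ids in per_class for image_id in ids}

# Keep every annotation whose image was selected.
toy_subset = toy_data[toy_data["ImageID"].isin(keep_ids)]
print(sorted(toy_subset["MaskPath"]))
```

The third dog annotation (`c.png`) is left out because the cap was reached, and `e.png` never enters the join because its label is not a requested class.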