# [Mask2Former](https://arxiv.org/abs/2112.01527) Tutorial

<img src="https://bowenc0221.github.io/images/maskformerv2_teaser.png" width="500"/>

Welcome to [Mask2Former](https://github.com/facebookresearch/Mask2Former) in detectron2! In this tutorial, we will go through some basic usage of Mask2Former, including the following:
* Run inference on images or videos, with an existing Mask2Former model

You can make a copy of this tutorial or use "File -> Open in playground mode" to play with it yourself. **DO NOT** request access to this tutorial.


# Install detectron2

In [2]:
import torch, detectron2
!nvcc --version
TORCH_VERSION = ".".join(torch.__version__.split(".")[:2])
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)
print("detectron2:", detectron2.__version__)

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0
torch:  1.10 ; cuda:  cu111
detectron2: 0.6
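
The cell above reads the CUDA tag with `split("+")[-1]`, which returns the whole version string when the installed build has no `+` suffix. A minimal sketch of a more defensive parse follows; `split_torch_version` is a hypothetical helper written for this tutorial, not part of PyTorch or detectron2, and it works on plain strings so it runs without either library installed.

```python
from typing import Tuple


def split_torch_version(version: str) -> Tuple[str, str]:
    """Split a version string such as '1.10.0+cu111' into (major.minor, build tag)."""
    # Everything after "+" is the local build tag, e.g. "cu111" or "cpu".
    release, _, build = version.partition("+")
    short = ".".join(release.split(".")[:2])
    # Builds without a "+" suffix carry no tag, so report that explicitly.
    return short, build or "unknown"


print(split_torch_version("1.10.0+cu111"))
print(split_torch_version("1.10.0"))
```

In the notebook you would pass `torch.__version__` to the helper instead of a literal string.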


In [3]:
import torchvision
TORCHVISION_VERSION = ".".join(torchvision.__version__.split(".")[:2])
print("torchvision: ", TORCHVISION_VERSION)

torchvision:  0.11


# Install Mask2Former

In [3]:
%pwd

'/home/ipanigra/Documents'

In [5]:
# for local use
%cd Mask2Former/mask2former/modeling/pixel_decoder/ops
!python setup.py build install
%cd ../../../../

/home/ipanigra/Documents/Mask2Former/mask2former/modeling/pixel_decoder/ops
running build
running build_py
running build_ext
running install
running bdist_egg
running egg_info
writing MultiScaleDeformableAttention.egg-info/PKG-INFO
writing dependency_links to MultiScaleDeformableAttention.egg-info/dependency_links.txt
writing top-level names to MultiScaleDeformableAttention.egg-info/top_level.txt
reading manifest file 'MultiScaleDeformableAttention.egg-info/SOURCES.txt'
writing manifest file 'MultiScaleDeformableAttention.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/modules
copying build/lib.linux-x86_64-3.7/modules/__init__.py -> build/bdist.linux-x86_64/egg/modules
copying build/lib.linux-x86_64-3.7/modules/ms_deform_attn.py -> build/bdist.linux-x86_64/egg/modules
copying build/lib.linux-x86_64-3.7/MultiScaleDeformableAttention.cpython-37m-x86_64-linux-gnu.

In [4]:
# You may need to restart your runtime prior to this, to let your installation take effect

# Some basic setup:
# Setup detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
setup_logger(name="mask2former")

# import some common libraries
import numpy as np
import cv2
import torch
# from google.colab.patches import cv2_imshow

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer, ColorMode
from detectron2.data import MetadataCatalog
from detectron2.projects.deeplab import add_deeplab_config
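# NOTE: despite its name, this variable holds Mapillary Vistas metadata, matching the config loaded later
coco_metadata = MetadataCatalog.get("mapillary_vistas_panoptic_val")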

# import Mask2Former project
from mask2former import add_maskformer2_config

ModuleNotFoundError: No module named 'mask2former'

# Run a pre-trained Mask2Former model

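We first load an image from the COCO dataset. The download command in the cell below is commented out, so `input.jpg` must already be present in the working directory: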

In [7]:
window_name = 'Image'

In [8]:
# !wget http://images.cocodataset.org/val2017/000000005477.jpg -q -O input.jpg
im = cv2.imread("./input.jpg")
cv2.imshow(window_name,im)
cv2.waitKey(0)
cv2.destroyAllWindows()

Then, we create a detectron2 config and a detectron2 `DefaultPredictor` to run inference on this image.

In [9]:
cfg = get_cfg()
add_deeplab_config(cfg)
add_maskformer2_config(cfg)
# coco
# cfg.merge_from_file("configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml")
# cfg.MODEL.WEIGHTS = 'https://dl.fbaipublicfiles.com/maskformer/mask2former/coco/panoptic/maskformer2_swin_large_IN21k_384_bs16_100ep/model_final_f07440.pkl'
# mapillary vistas
cfg.merge_from_file("configs/mapillary-vistas/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml")
cfg.MODEL.WEIGHTS = 'https://dl.fbaipublicfiles.com/maskformer/mask2former/mapillary_vistas/panoptic/maskformer2_swin_large_IN21k_384_bs16_300k/model_final_132c71.pkl'
# cityscapes
# cfg.merge_from_file("configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml")
# cfg.MODEL.WEIGHTS = 'https://dl.fbaipublicfiles.com/maskformer/mask2former/cityscapes/semantic/maskformer2_swin_tiny_bs16_90k/model_final_2d58d4.pkl'
cfg.MODEL.MASK_FORMER.TEST.SEMANTIC_ON = True
cfg.MODEL.MASK_FORMER.TEST.INSTANCE_ON = True
cfg.MODEL.MASK_FORMER.TEST.PANOPTIC_ON = True
predictor = DefaultPredictor(cfg)
outputs = predictor(im)

Loading config configs/mapillary-vistas/panoptic-segmentation/swin/../Base-MapillaryVistas-PanopticSegmentation.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats)
  topk_indices = topk_indices // self.sem_seg_head.num_classes


In [29]:
outputs["sem_seg"].argmax(0)==13

tensor([[False, False, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False],
        ...,
        [False, False, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False]], device='cuda:0')
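
The expression above turns per-class scores into a binary mask for one class. As a rough illustration, the sketch below repeats the same two steps on a toy NumPy array standing in for `outputs["sem_seg"]`; the 3-class, 2x2 scores are made up for the example, and class id 2 plays the role of class 13.

```python
import numpy as np

# Toy stand-in for outputs["sem_seg"]: 3 classes on a 2x2 image, shape (C, H, W).
scores = np.array([
    [[0.90, 0.10], [0.20, 0.30]],  # class 0
    [[0.05, 0.80], [0.10, 0.30]],  # class 1
    [[0.05, 0.10], [0.70, 0.40]],  # class 2
])

# argmax over the class axis gives one class id per pixel, shape (H, W).
label_map = scores.argmax(axis=0)

# Comparing against a class id gives a boolean mask of the pixels won by that class.
mask = label_map == 2

print(label_map)
print(mask)
```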

In [38]:
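# Build a boolean selector over the class axis with a single True entry at index 13
stuff_classes = MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).stuff_classes
classes = (np.arange(len(stuff_classes))==13)
# Selecting with the mask yields shape (1, H, W); [0] is the class-13 score map, so len() is its number of rows
len(outputs["sem_seg"][classes][0])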

720
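
The value printed above is the number of rows in the class-13 score map, because boolean selection along the class axis keeps a leading dimension of size one. The sketch below shows the shapes involved on a small NumPy array; the sizes (4 classes, 3x5 image) and the class index 2 are arbitrary assumptions chosen for illustration.

```python
import numpy as np

num_classes, height, width = 4, 3, 5
# Stand-in for outputs["sem_seg"], shape (C, H, W).
scores = np.zeros((num_classes, height, width))

# Boolean selector with exactly one True entry, at class index 2.
selector = np.arange(num_classes) == 2

picked = scores[selector]  # boolean indexing keeps the axis: shape (1, H, W)
plane = picked[0]          # drop the leading axis: shape (H, W)

print(picked.shape, plane.shape, len(plane))
```

Indexing directly with the integer, as in `scores[2]`, yields the same `(H, W)` plane without the intermediate selector.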

In [30]:
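# Show panoptic/instance/semantic predictions.
# Only the semantic result for class 13 is displayed here; uncomment the panoptic and instance lines
# (and the np.concatenate call) to build the stacked view that the printed caption describes.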
# v = Visualizer(im[:, :, ::-1], coco_metadata, scale=1.2, instance_mode=ColorMode.IMAGE_BW)
# panoptic_result = v.draw_panoptic_seg(outputs["panoptic_seg"][0].to("cpu"), outputs["panoptic_seg"][1]).get_image()
# v = Visualizer(im[:, :, ::-1], coco_metadata, scale=1.2, instance_mode=ColorMode.IMAGE_BW)
# instance_result = v.draw_instance_predictions(outputs["instances"].to("cpu")).get_image()
v = Visualizer(im[:, :, ::-1], coco_metadata, scale=1.2, instance_mode=ColorMode.IMAGE_BW)
semantic_result = v.draw_sem_seg((outputs["sem_seg"].argmax(0)==13).to("cpu")*13).get_image()
print("Panoptic segmentation (top), instance segmentation (middle), semantic segmentation (bottom)")
# cv2.imshow(window_name, np.concatenate((panoptic_result, instance_result, semantic_result), axis=0)[:, :, ::-1])
cv2.imshow(window_name, semantic_result[:, :, ::-1])
cv2.waitKey(0)
cv2.destroyAllWindows()

Panoptic segmentation (top), instance segmentation (middle), semantic segmentation (bottom)
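
The visualization cell flips the channel axis twice with `[:, :, ::-1]`: once because OpenCV loads images as BGR while the detectron2 `Visualizer` expects RGB, and once more to hand the rendered RGB image back to `cv2.imshow`. A small sketch of what that slice does, using a made-up two-pixel image:

```python
import numpy as np

# One row, two pixels, three channels, in OpenCV's BGR order.
bgr = np.zeros((1, 2, 3), dtype=np.uint8)
bgr[0, 0] = (255, 0, 0)  # pure blue in BGR
bgr[0, 1] = (0, 0, 255)  # pure red in BGR

# Reversing the last axis swaps the first and third channels: BGR -> RGB.
rgb = bgr[:, :, ::-1]

print(rgb[0, 0])  # blue pixel, now in RGB order
print(rgb[0, 1])  # red pixel, now in RGB order
```

The slice returns a view with a negative stride rather than a copy; if a downstream function needs contiguous memory, `np.ascontiguousarray(rgb)` produces one.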
