# DEtection TRansformer (DETR)

The DETR model is an encoder-decoder transformer on top of a convolutional backbone. Two heads are added on top of the decoder outputs to perform object detection: a linear layer for the class labels and an MLP (multi-layer perceptron) for the bounding boxes. The panoptic checkpoint used below (`DetrForSegmentation`) additionally puts a segmentation (mask) head on top.
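A minimal NumPy sketch of how the two prediction heads are wired, assuming a hidden size of 256, 100 object queries and a hypothetical `NUM_CLASSES`. The weights are random, so only the shapes and structure are meaningful. The class head is a single linear map producing `NUM_CLASSES + 1` logits per query (the extra one is the "no object" class). The box head is a three-layer MLP with ReLU between layers, and its output goes through a sigmoid so the four box coordinates land in [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 256       # decoder hidden size (d_model), assumed
NUM_QUERIES = 100  # number of object queries, assumed
NUM_CLASSES = 91   # hypothetical label count; the real value comes from the model config

def linear(in_dim, out_dim):
    """Randomly initialised weight and bias for a dense layer (illustration only)."""
    return rng.normal(0.0, 0.02, (in_dim, out_dim)), np.zeros(out_dim)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Class head: one linear layer, NUM_CLASSES + 1 outputs (last = "no object").
W_cls, b_cls = linear(HIDDEN, NUM_CLASSES + 1)

# Box head: 3-layer MLP with ReLU between layers, 4 outputs (cx, cy, w, h).
mlp = [linear(HIDDEN, HIDDEN), linear(HIDDEN, HIDDEN), linear(HIDDEN, 4)]

def prediction_heads(decoder_out):
    """decoder_out: (batch, num_queries, hidden) -> (class logits, boxes)."""
    logits = decoder_out @ W_cls + b_cls
    h = decoder_out
    for i, (W, b) in enumerate(mlp):
        h = h @ W + b
        if i < len(mlp) - 1:  # no ReLU after the last layer
            h = relu(h)
    boxes = sigmoid(h)  # box coordinates squashed into [0, 1]
    return logits, boxes

decoder_out = rng.normal(size=(1, NUM_QUERIES, HIDDEN))
logits, boxes = prediction_heads(decoder_out)
print(logits.shape, boxes.shape)  # (1, 100, 92) (1, 100, 4)
```

Both heads are applied independently to every query embedding, which is why the outputs keep the `(batch, num_queries, ...)` layout seen in the model outputs further down.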

In [1]:
!pip install transformers
!pip install timm

Collecting transformers
  Downloading transformers-4.18.0-py3-none-any.whl (4.0 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.5.1-py3-none-any.whl (77 kB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Collecting sacremoses
  Downloading sacremoses-0.0.49-py3-none-any.whl (895 kB)
Installing collected packages: pyyaml, tokenizers, sacremoses, huggingface-hub, transformers
  Attempting uninstall: pyyaml

In [2]:
from transformers import DetrFeatureExtractor, DetrForSegmentation
from PIL import Image
import requests
import timm  # DETR's ResNet-50 backbone is loaded through timm, so it must be installed

In [3]:
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

In [4]:
feature_extractor = DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50-panoptic")
model = DetrForSegmentation.from_pretrained("facebook/detr-resnet-50-panoptic")

Downloading: "https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-rsb-weights/resnet50_a1_0-14fe96d1.pth" to /root/.cache/torch/hub/checkpoints/resnet50_a1_0-14fe96d1.pth


In [5]:
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
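`DetrFeatureExtractor` resizes and normalises the image before it reaches the model. The sketch below approximates those two steps in NumPy. It assumes the defaults commonly used for DETR checkpoints: shorter side scaled to 800, longer side capped at 1333, and ImageNet mean and standard deviation. It is not the library's own code. The real extractor also pads images in a batch and returns a `pixel_mask` alongside `pixel_values`.

```python
import numpy as np

# ImageNet statistics; assumed to match the extractor's image_mean / image_std.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def resize_size(height, width, shortest=800, longest=1333):
    """Target (height, width): scale the shorter side to `shortest` unless that
    would push the longer side past `longest`; aspect ratio is preserved."""
    min_side, max_side = min(height, width), max(height, width)
    size = shortest
    if max_side / min_side * size > longest:
        size = int(round(longest * min_side / max_side))
    if height <= width:
        return size, int(size * width / height)
    return int(size * height / width), size

def normalize(rgb):
    """(H, W, 3) uint8 image -> (3, H, W) float array, channels first."""
    x = rgb.astype(np.float64) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return x.transpose(2, 0, 1)

# A 640 x 480 (width x height) image is scaled so its shorter side becomes 800.
print(resize_size(480, 640))  # (800, 1066)
```

The cap on the longer side only kicks in for elongated images. For example, a 2000 x 500 (height x width) image ends up at 1332 x 333 rather than 3200 x 800.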


In [6]:
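logits = outputs.logits       # class scores per object query (last class = "no object")
bboxes = outputs.pred_boxes   # normalised (center_x, center_y, width, height) per query
masks = outputs.pred_masks    # per-query mask logits at reduced resolution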
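`pred_boxes` holds one box per query in normalised (center_x, center_y, width, height) format, relative to the input image size. `pred_masks` holds per-query mask logits at a reduced resolution. Here is a minimal NumPy sketch of turning both into something drawable: boxes become absolute (x0, y0, x1, y1) pixel corners, and masks go through a sigmoid and a threshold. The 0.5 threshold and the nearest-neighbour upsampling via `np.repeat` are simplifying assumptions. The library's own post-processing may differ.

```python
import numpy as np

def boxes_to_xyxy(boxes, img_w, img_h):
    """(N, 4) normalised (cx, cy, w, h) -> (N, 4) absolute (x0, y0, x1, y1)."""
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return xyxy * np.array([img_w, img_h, img_w, img_h])

def masks_to_binary(mask_logits, scale, threshold=0.5):
    """(N, h, w) mask logits -> (N, h*scale, w*scale) boolean masks.
    Nearest-neighbour upsampling with np.repeat is a simplification."""
    probs = 1.0 / (1.0 + np.exp(-mask_logits))
    up = probs.repeat(scale, axis=1).repeat(scale, axis=2)
    return up > threshold

# One box centred in a 640 x 480 image, covering half of each dimension.
print(boxes_to_xyxy(np.array([[0.5, 0.5, 0.5, 0.5]]), 640, 480))
# [[160. 120. 480. 360.]]
```

With the notebook's tensors, `bboxes[0].detach().numpy()` gives the `(num_queries, 4)` array that `boxes_to_xyxy` expects, and `masks[0].detach().numpy()` gives the `(num_queries, h, w)` input for `masks_to_binary`.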

In [7]:
print(logits)


tensor([[[-18.1565,  -1.7568, -13.5029,  ..., -18.1412, -17.5666,  13.0900],
         [-16.8888,  -1.4138, -14.1028,  ..., -16.3841, -16.2431,  13.3381],
         [-17.5709,  -2.5080, -11.8654,  ..., -16.9370, -16.9015,  13.5036],
         ...,
         [-18.4712,  -2.3563, -15.8813,  ..., -18.4648, -18.1001,  13.1995],
         [-18.8968,  -0.3910, -12.1402,  ..., -18.2435, -18.6408,  13.1864],
         [-16.9737,  -3.5816, -14.8692,  ..., -17.2663, -16.6699,  13.0434]]],
       grad_fn=<AddBackward0>)
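Each row of `logits` is one object query and each column a class, with the last column reserved for "no object". The large values in that final column (around 13) suggest that the queries shown mostly predict no object. A common way to read the output is to apply a softmax over the class dimension, drop the no-object column, and keep the queries whose best remaining class clears a threshold. The NumPy sketch below follows that approach and assumes an illustrative threshold of 0.9.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def keep_confident(logits, threshold=0.9):
    """logits: (num_queries, num_classes + 1), last column = "no object".
    Returns the indices of kept queries and their predicted class ids."""
    probs = softmax(logits)[:, :-1]  # drop the no-object column
    best = probs.max(axis=-1)
    keep = np.nonzero(best > threshold)[0]
    return keep, probs[keep].argmax(axis=-1)

# Toy logits for 3 queries and 2 real classes (+ no-object).
toy = np.array([[8.0, 0.0, 1.0],    # confident class 0
                [0.0, 0.0, 13.0],   # "no object" dominates
                [0.0, 9.0, 0.0]])   # confident class 1
print(keep_confident(toy))  # (array([0, 2]), array([0, 1]))
```

The PyTorch equivalent on the notebook's tensor would start from `logits.softmax(-1)[0, :, :-1]`. The printed tensor carries `grad_fn=<AddBackward0>` because the forward pass ran with gradient tracking enabled. For pure inference, wrapping `model(**inputs)` in `with torch.no_grad():` avoids building the autograd graph.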
