# DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

https://github.com/IDEA-Research/DINO

[Paper (arXiv:2203.03605)](https://arxiv.org/abs/2203.03605)

In [1]:
!pip install --quiet pylance duckdb torch torchvision

## Build and install the [DINO](https://github.com/IDEA-Research/DINO) model

In [2]:
!git -C DINO pull || git clone https://github.com/IDEA-Research/DINO
%cd DINO

!pip install --quiet -r requirements.txt \
  && cd models/dino/ops \
  && python setup.py -q build install

Already up to date.
/content/DINO
zip_safe flag not set; analyzing archive contents...
__pycache__.MultiScaleDeformableAttention.cpython-38: module references __file__


In [3]:
# See https://github.com/IDEA-Research/DINO/blob/main/inference_and_visualization.ipynb
# for instructions on loading the model
from util.slconfig import SLConfig
from main import build_model_main
model_config_path = "config/DINO/DINO_4scale.py"

args = SLConfig.fromfile(model_config_path)
args.device = 'cuda'
model, criterion, postprocessors = build_model_main(args)

Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth


  0%|          | 0.00/97.8M [00:00<?, ?B/s]

In [4]:
# Download the DINO-4scale weights unless they are already cached in /tmp
! [[ -f /tmp/model.pt ]] || gsutil cp gs://eto-public/models/dino/checkpoint0033_4scale.pth /tmp/model.pt
import torch
model_checkpoint_path = "/tmp/model.pt"
# Load onto the CPU first; model.cuda() below moves the weights to the GPU
checkpoint = torch.load(model_checkpoint_path, map_location="cpu")
model.load_state_dict(checkpoint['model'])
_ = model.cuda().eval()

Copying gs://eto-public/models/dino/checkpoint0033_4scale.pth...
\ [1 files][535.8 MiB/535.8 MiB]   48.0 MiB/s                                   
Operation completed over 1 objects/535.8 MiB.                                    
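
The `[[ -f ... ]] || gsutil cp ...` line above downloads the checkpoint only when it is not already on disk. The same idea can be sketched in plain Python, which may be handy outside a notebook shell. `ensure_file` and `gsutil_fetch` are hypothetical helper names introduced here for illustration; the sketch assumes the `gsutil` CLI is on the `PATH`.

```python
from pathlib import Path
import subprocess


def ensure_file(path, fetch):
    """Call fetch(path) only if `path` does not exist yet, then return it as a Path."""
    path = Path(path)
    if not path.is_file():
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(path)
    return path


def gsutil_fetch(uri):
    """Build a fetch callback that copies `uri` to the destination with `gsutil cp`."""
    def fetch(dest):
        # check=True raises CalledProcessError if the copy fails
        subprocess.run(["gsutil", "cp", uri, str(dest)], check=True)
    return fetch
```

For example, `ensure_file("/tmp/model.pt", gsutil_fetch("gs://eto-public/models/dino/checkpoint0033_4scale.pth"))` would mirror the cell above, skipping the copy on later runs.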


In [10]:
!nvidia-smi

Wed Dec  7 18:44:14 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   69C    P0    29W /  70W |   2764MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Prepare COCO validation dataset

In [5]:
! gsutil cp gs://eto-public/datasets/coco/coco_val.lance.tar.gz /tmp/
! tar -C /tmp -xzf /tmp/coco_val.lance.tar.gz && rm /tmp/coco_val.lance.tar.gz

Copying gs://eto-public/datasets/coco/coco_val.lance.tar.gz...
- [1 files][771.6 MiB/771.6 MiB]   80.0 MiB/s                                   
Operation completed over 1 objects/771.6 MiB.                                    


In [None]:
from lance.pytorch import Dataset
import torchvision.transforms as T
import pandas as pd

transform = T.Compose([
    T.Resize(400),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

THRESHOLD = 0.5  # minimum confidence score for a detection to be kept

dataset = Dataset(
  "/tmp/coco_val.lance",
  columns=["image", "split", "image_id"],
  mode="batch",
  batch_size=8
)
results = []
with torch.no_grad():
  for batch in dataset:
    image_ids = batch[2].cpu()
    imgs = [transform(img).cuda() for img in batch[0]]
    output = model(imgs)
    # A target size of (1.0, 1.0) leaves the predicted boxes in relative coordinates
    output = postprocessors['bbox'](
        output, torch.ones(len(imgs), 2, device="cuda"))
    for image_id, out in zip(image_ids, output):
      # Keep only detections scoring above THRESHOLD
      mask = out["scores"] > THRESHOLD
      pred = {
          "image_id": image_id.item(),  # plain Python number instead of a 0-d tensor
          "dino": {
            "boxes": out["boxes"][mask].cpu().numpy(),
            "labels": out["labels"][mask].cpu().numpy(),
            "scores": out["scores"][mask].cpu().numpy(),
          }
      }
      results.append(pred)

df = pd.DataFrame(data=results)
df
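
Because the loop above passes a target size of `(1.0, 1.0)` to the postprocessor, the stored boxes are presumably in relative `[x_min, y_min, x_max, y_max]` form rather than pixels. This assumes DINO's `bbox` postprocessor follows the DETR convention of scaling by the target size. The sketch below shows one way to turn such boxes into pixel coordinates and into COCO-style `[x, y, width, height]`; `to_pixel_boxes` and `xyxy_to_xywh` are hypothetical helpers, and the image width and height must come from elsewhere, since the loop does not record them.

```python
import numpy as np


def to_pixel_boxes(boxes, width, height):
    """Scale relative [x_min, y_min, x_max, y_max] boxes to pixel coordinates."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    scale = np.array([width, height, width, height], dtype=np.float32)
    return boxes * scale


def xyxy_to_xywh(boxes):
    """Convert [x_min, y_min, x_max, y_max] boxes to [x, y, width, height]."""
    out = np.array(boxes, dtype=np.float32).reshape(-1, 4)  # np.array copies the input
    out[:, 2:] -= out[:, :2]
    return out
```

Applied to one row, this might look like `xyxy_to_xywh(to_pixel_boxes(df.loc[0, "dino"]["boxes"], width, height))`, where `width` and `height` are the original image dimensions.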