# DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

https://github.com/IDEA-Research/DINO

[Paper (arXiv:2203.03605)](https://arxiv.org/abs/2203.03605)

## Build and install the [DINO](https://github.com/IDEA-Research/DINO) model

The DINO model requires building custom CUDA ops. After the build finishes, ***restart the runtime*** so that the newly installed ops can be imported.
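Once the build cell below has run and the runtime has restarted, one way to confirm the ops are importable is to look for the compiled extension module. This is a minimal sketch that assumes the ops are installed under the module name `MultiScaleDeformableAttention` (the extension name used by the Deformable DETR ops that DINO builds on); `cuda_ops_available` is a hypothetical helper, not part of DINO.

```python
import importlib.util


def cuda_ops_available(module_name: str = "MultiScaleDeformableAttention") -> bool:
    """Return True if the named module can be found on sys.path.

    Assumption: DINO's ops are installed as an extension module called
    "MultiScaleDeformableAttention".
    """
    # find_spec locates a top-level module without importing it and returns
    # None when it is absent, so a missing build does not raise ImportError.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if cuda_ops_available():
        print("CUDA ops found.")
    else:
        print("CUDA ops not found: rerun the build cell, then restart the runtime.")
```

Checking for the module spec rather than importing it keeps the check cheap and avoids loading CUDA code just to test for its presence.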

In [None]:
!git -C DINO pull || git clone https://github.com/IDEA-Research/DINO
!cd DINO \
  && pip install --quiet -r requirements.txt \
  && cd models/dino/ops \
  && python setup.py -q build install


In [None]:
# torch and torchvision are not upgraded here: changing them after the build can break the compiled CUDA ops
!pip install --quiet -U pylance duckdb numpy pyarrow

In [None]:
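# See https://github.com/IDEA-Research/DINO/blob/main/inference_and_visualization.ipynb
# for instructions on loading the model
import sys
# Put the DINO checkout first on the path so its `util`, `main` and `datasets`
# modules are importable from the parent directory (and shadow same-named packages)
sys.path.insert(0, "DINO")

from util.slconfig import SLConfig
from main import build_model_main

model_config_path = "DINO/config/DINO/DINO_4scale.py"

args = SLConfig.fromfile(model_config_path)
args.device = 'cuda'
model, criterion, postprocessors = build_model_main(args)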
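As we understand it, `SLConfig.fromfile` executes the config as a Python file and exposes its top-level variables as attributes on `args`, which is why `args.device` can be set afterwards. The sketch below illustrates only that core idea with the standard library's `runpy`; `load_py_config` is a hypothetical helper, and it deliberately ignores features the real `SLConfig` supports, such as `_base_` inheritance. The config values in the demo are made-up examples, not DINO's settings.

```python
import os
import runpy
import tempfile
from types import SimpleNamespace


def load_py_config(path: str) -> SimpleNamespace:
    """Run a Python config file and expose its public top-level names as attributes.

    Simplified illustration only: unlike SLConfig, this ignores `_base_`
    inheritance and any other special handling.
    """
    namespace = runpy.run_path(path)  # the file's resulting global variables
    public = {name: value for name, value in namespace.items() if not name.startswith("_")}
    return SimpleNamespace(**public)


if __name__ == "__main__":
    # Hypothetical config contents, written to a temporary file for the demo.
    with tempfile.TemporaryDirectory() as tmp:
        cfg_path = os.path.join(tmp, "example_config.py")
        with open(cfg_path, "w") as f:
            f.write("num_queries = 900\nhidden_dim = 256\n")
        cfg = load_py_config(cfg_path)
        cfg.device = "cuda"  # attributes can be added after loading, like args.device
        print(cfg.num_queries, cfg.hidden_dim, cfg.device)
```

Keeping configs as plain Python files means they can contain arithmetic and comments, at the cost of executing arbitrary code, so only load config files you trust.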

In [None]:
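# Download the DINO-4scale weights (skipped if the file already exists)
! [ -f /tmp/model.pt ] || gsutil cp gs://eto-public/models/dino/checkpoint0033_4scale.pth /tmp/model.pt
import torch
model_checkpoint_path = "/tmp/model.pt"
# Load onto the CPU first. weights_only=False because the checkpoint may contain
# non-tensor objects (e.g. training args), which newer PyTorch rejects by default.
checkpoint = torch.load(model_checkpoint_path, map_location="cpu", weights_only=False)
model.load_state_dict(checkpoint['model'])
_ = model.cuda().eval()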
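If `load_state_dict` fails because the checkpoint and the model disagree (for example, when a different config file was used to build the model), comparing the two sets of parameter names shows what is missing or extra. `diff_state_dict_keys` below is a hypothetical helper that works on any iterables of key names; in this notebook you would pass `model.state_dict().keys()` and `checkpoint['model'].keys()`. PyTorch's `model.load_state_dict(..., strict=False)` also returns the `missing_keys` and `unexpected_keys` directly.

```python
from typing import Iterable, List, NamedTuple


class KeyDiff(NamedTuple):
    missing: List[str]     # expected by the model but absent from the checkpoint
    unexpected: List[str]  # present in the checkpoint but unknown to the model


def diff_state_dict_keys(model_keys: Iterable[str], checkpoint_keys: Iterable[str]) -> KeyDiff:
    """Compare two collections of parameter names and report the differences, sorted."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    return KeyDiff(sorted(model_set - ckpt_set), sorted(ckpt_set - model_set))


if __name__ == "__main__":
    # Toy key names; in the notebook you would pass
    # model.state_dict().keys() and checkpoint['model'].keys().
    print(diff_state_dict_keys(["backbone.0.weight", "head.bias"],
                               ["backbone.0.weight", "head.weight"]))
```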

## Prepare COCO validation dataset

In [None]:
! gsutil cp gs://eto-public/datasets/coco/coco_val.lance.tar.gz /tmp/
! tar -C /tmp -xzf /tmp/coco_val.lance.tar.gz && rm /tmp/coco_val.lance.tar.gz
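The inference cell below preprocesses each image with `T.Resize(400)`, which scales the shorter edge to 400 pixels while keeping the aspect ratio, and then with `T.Normalize`, which maps each channel value to `(x - mean) / std` using the ImageNet statistics. The plain-Python sketch below works through that arithmetic; the helper names are hypothetical, and rounding the longer edge down with `int()` is our reading of torchvision's behavior rather than a documented guarantee.

```python
from typing import Sequence, Tuple

# ImageNet statistics passed to T.Normalize in the inference cell
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def resized_size(width: int, height: int, size: int = 400) -> Tuple[int, int]:
    """Return (width, height) after scaling the shorter edge to `size`, keeping aspect ratio.

    Assumption: the longer edge is rounded down with int().
    """
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def normalize_pixel(
    rgb: Sequence[float],
    mean: Sequence[float] = IMAGENET_MEAN,
    std: Sequence[float] = IMAGENET_STD,
) -> Tuple[float, ...]:
    """Apply (x - mean) / std per channel to an RGB value already scaled to [0, 1] by ToTensor."""
    return tuple((x - m) / s for x, m, s in zip(rgb, mean, std))


if __name__ == "__main__":
    print(resized_size(640, 480))             # a landscape image -> (533, 400)
    print(normalize_pixel((1.0, 1.0, 1.0)))   # a white pixel
```

Because the resize keeps the aspect ratio, images in a batch generally end up with different shapes, which is why the loop passes a list of tensors to the model rather than stacking them.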

In [None]:
from lance.pytorch import Dataset
import torchvision.transforms as T
import pandas as pd

transform = T.Compose([
    T.Resize(400),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

THRESHOLD = 0.5

dataset = Dataset(
  "/tmp/coco_val.lance",
  columns=["image", "image_id"],
  mode="batch",
  batch_size=8
)
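results = []
with torch.no_grad():
  for batch in dataset:
    image_ids = batch[1].cpu()
    imgs = [transform(img).cuda() for img in batch[0]]
    output = model(imgs)
    # A target size of (1.0, 1.0) per image keeps the boxes as (x0, y0, x1, y1)
    # normalized by image width and height, independent of the resized resolution.
    output = postprocessors['bbox'](
        output, torch.tensor([[1.0, 1.0]] * len(imgs)).cuda())
    for image_id, out in zip(image_ids, output):
      mask = out["scores"] > THRESHOLD
      pred = {
          "image_id": image_id.item(),
          "dino": {
            # .tolist() yields plain Python lists, which pyarrow can convert
            # (it does not accept 2-D NumPy arrays such as the boxes).
            "boxes": out["boxes"][mask].cpu().tolist(),
            "labels": out["labels"][mask].cpu().tolist(),
            "scores": out["scores"][mask].cpu().tolist(),
          }
      }
      results.append(pred)
    # Free this batch's GPU tensors only after every image in it has been processed
    del imgs, output
del model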

df = pd.DataFrame(data=results)
df
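Because each image's target size is `(1.0, 1.0)`, the stored boxes are `(x0, y0, x1, y1)` corners expressed as fractions of the image width and height. As we understand DINO's bbox post-processor, it converts the model's `(cx, cy, w, h)` predictions to corners and multiplies them by the target size. The sketch below mirrors those steps in plain Python, together with the score threshold applied in the loop, so that stored boxes can be mapped back to pixel coordinates of the original image; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def cxcywh_to_xyxy(box: Sequence[float]) -> Box:
    """Convert a (center x, center y, width, height) box to (x0, y0, x1, y1) corners."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def to_pixels(box_xyxy: Sequence[float], width: float, height: float) -> Box:
    """Scale a normalized corner box by the width and height of the original image."""
    x0, y0, x1, y1 = box_xyxy
    return (x0 * width, y0 * height, x1 * width, y1 * height)


def keep_confident(
    boxes: Sequence[Sequence[float]],
    labels: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[List[Sequence[float]], List[int], List[float]]:
    """Keep detections whose score is strictly above `threshold` (like `mask` in the loop)."""
    kept = [(b, lab, s) for b, lab, s in zip(boxes, labels, scores) if s > threshold]
    return [b for b, _, _ in kept], [lab for _, lab, _ in kept], [s for _, _, s in kept]


if __name__ == "__main__":
    # Example values only: a centered box covering 20% x 40% of a 640x480 image.
    corners = cxcywh_to_xyxy((0.5, 0.5, 0.2, 0.4))
    print(to_pixels(corners, width=640, height=480))
```

Storing normalized boxes keeps the results independent of the `T.Resize(400)` preprocessing; the original image size is only needed when drawing or evaluating.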

## Add the DINO inference results to the dataset

In [None]:
!pip install -U numpy pyarrow

In [None]:
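# Convert the DINO inference results into an Arrow table so they can be added to the dataset for later reference

import pyarrow as pa
table = pa.Table.from_pandas(
    df,
    schema=pa.schema([
        pa.field("image_id", pa.int64()),
        pa.field("dino", pa.struct([
            # Each box is (x0, y0, x1, y1), normalized by image width and height
            pa.field("boxes", pa.list_(pa.list_(pa.float32(), 4))),
            # Labels are integer class ids, not strings
            pa.field("labels", pa.list_(pa.int64())),
            pa.field("scores", pa.list_(pa.float32())),
        ])),
    ]),
)
table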
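When the Arrow conversion fails, the error can be hard to trace back to a single image. A quick pure-Python check of each record in `results` helps: every record should have an integer `image_id`, and its `dino` entry should hold 4-number boxes with integer labels and numeric scores aligned one-to-one. `validate_prediction` is a hypothetical helper; it uses the `numbers` ABCs so that NumPy scalars pass as well as plain Python numbers.

```python
import math
import numbers
from typing import Any, Dict, List


def validate_prediction(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems with one prediction record; an empty list means it looks valid."""
    problems: List[str] = []
    if not isinstance(record.get("image_id"), numbers.Integral):
        problems.append("image_id must be an integer")
    dino = record.get("dino") or {}
    boxes = dino.get("boxes", [])
    labels = dino.get("labels", [])
    scores = dino.get("scores", [])
    # Every detection needs exactly one box, one label and one score
    if not len(boxes) == len(labels) == len(scores):
        problems.append("boxes, labels and scores must have the same length")
    for i, box in enumerate(boxes):
        if len(box) != 4 or not all(
            isinstance(v, numbers.Real) and math.isfinite(v) for v in box
        ):
            problems.append(f"box {i} must contain 4 finite numbers")
    if not all(isinstance(lab, numbers.Integral) for lab in labels):
        problems.append("labels must be integers")
    if not all(isinstance(s, numbers.Real) for s in scores):
        problems.append("scores must be numbers")
    return problems


if __name__ == "__main__":
    sample = {"image_id": 139, "dino": {"boxes": [[0.1, 0.2, 0.3, 0.4]], "labels": [1], "scores": [0.9]}}
    print(validate_prediction(sample))
```

In the notebook, `[(r["image_id"], validate_prediction(r)) for r in results if validate_prediction(r)]` lists the offending images before calling `pa.Table.from_pandas`.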