# EfficientDet Transfer Learning on TACO Dataset

This notebook demonstrates how to perform transfer learning using EfficientDet-D0 on the TACO (Trash Annotations in Context) dataset.

## 0. Setup and Requirements

First, let's install the required packages.

In [1]:
!pip install pycocotools numpy opencv-python tqdm tensorboard tensorboardX pyyaml webcolors matplotlib

Collecting tensorboardX
  Downloading tensorboardX-2.6.2.2-py2.py3-none-any.whl.metadata (5.8 kB)
Downloading tensorboardX-2.6.2.2-py2.py3-none-any.whl (101 kB)
Installing collected packages: tensorboardX
Successfully installed tensorboardX-2.6.2.2


## 1. Import Dependencies and Set Up Environment

In [6]:
import os
import sys
import torch
from torch.backends import cudnn

# Clone the project on first run, then work from inside it
repo_dir = 'Yet-Another-EfficientDet-Pytorch'
if repo_dir not in os.getcwd():
    if not os.path.isdir(repo_dir):
        !git clone --depth 1 https://github.com/markintoshplus/Yet-Another-EfficientDet-Pytorch.git
    os.chdir(repo_dir)
    sys.path.append('.')
else:
    # Already inside the repository: just fetch the latest changes
    !git pull

remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 6 (delta 5), reused 6 (delta 5), pack-reused 0 (from 0)
Unpacking objects: 100% (6/6), 542 bytes | 36.00 KiB/s, done.
From https://github.com/markintoshplus/Yet-Anoth

## 2. Prepare Dataset and Weights

Before training, the TACO dataset, the pre-trained weights, and the project YAML file all need to be in place. The cell below checks the dataset layout.

In [7]:
# Verify TACO dataset location
print("TACO dataset contents:")
print(os.listdir('datasets/taco'))

# Verify annotations
print("\nAnnotations:")
print(os.listdir('datasets/taco/annotations'))

# Display a sample of image files in train folder
print("\nSample of training images:")
print(os.listdir('datasets/taco/train')[:5])

TACO dataset contents:
['val', 'test', 'annotations', 'train']

Annotations:
['val_annotations.coco.json', 'test_annotations.coco.json', 'train_annotations.coco.json']

Sample of training images:
['batch_5000019_JPG.rf.cdd805cb4d7442374369fc4490bc011e.jpg', 'batch_14000013_jpg.rf.92b58a620abd39c5271fd5e4e889b325.jpg', 'batch_9000063_jpg.rf.5a8cb241c0f4c32dcb6b5d2ae09a7097.jpg', 'batch_4000061_JPG.rf.8172b5b335e2bc90cb5d0f1c0cb2d877.jpg', 'batch_3IMG_4919_JPG.rf.41a41ac787b91b783cef67d54f3d888b.jpg']
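
The cell above only inspects the dataset folders. Here is a minimal sketch for checking the remaining inputs as well. It assumes the weights sit at `pre-trained_weights/efficientdet-d0.pth` (the path passed to `train.py` in the next section) and that the project file is `projects/taco.yml` (an assumption based on the `-p taco` flag; adjust it if your checkout differs). `missing_paths` is a hypothetical helper, not part of the repository.

```python
import os

def missing_paths(paths):
    """Return the subset of `paths` that does not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]

# Assumed locations; adjust them to match your checkout.
required = [
    'datasets/taco/annotations/train_annotations.coco.json',
    'datasets/taco/annotations/val_annotations.coco.json',
    'pre-trained_weights/efficientdet-d0.pth',
    'projects/taco.yml',
]

missing = missing_paths(required)
if missing:
    print("Missing files:", missing)
else:
    print("All required files found.")
```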


## 3. Training

We'll perform transfer learning in two steps:
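1. Train only the head of the network with the backbone frozen (`--head_only True`, learning rate 1e-3, batch size 16, 50 epochs)
2. Fine-tune the entire model at a lower learning rate (1e-4, batch size 8), starting from the head-only checkpoint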

In [8]:
# Train head only (transfer learning)
!python train.py -c 0 -p taco --head_only True --lr 1e-3 --batch_size 16 --load_weights pre-trained_weights/efficientdet-d0.pth --num_epochs 50 --save_interval 5

loading annotations into memory...
Done (t=0.04s)
creating index...
index created!
loading annotations into memory...
Done (t=0.09s)
creating index...
index created!
  ret = model.load_state_dict(torch.load(weights_path), strict=False)
	size mismatch for classifier.header.pointwise_conv.conv.weight: copying a param with shape torch.Size([810, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([540, 64, 1, 1]).
	size mismatch for classifier.header.pointwise_conv.conv.bias: copying a param with shape torch.Size([810]) from checkpoint, the shape in current model is torch.Size([540]).
[Info] loaded weights: efficientdet-d0.pth, resuming checkpoint from step: 0
[Info] freezed backbone
Step: 4. Epoch: 0/50. Iteration: 5/65. Cls loss: 133484.70312. Reg loss: 3.45049. Total loss: 133488.15625:   6% 4/65 [00:15<02:14,  2.20s/it]checkpoint...
Step: 9. Epoch: 0/50. Iteration: 10/65. Cls loss: 100449.68750. Reg loss: 2.41907. Total loss: 100452.10938:  14% 9/65 [00:17<00:41,  1.3
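
The two `size mismatch` messages in the output above are expected here. Assuming the classification head emits one channel per (anchor, class) pair, its width is `num_anchors * num_classes`. With 9 anchors per location (3 ratios × 3 scales, the values passed to `EfficientDetBackbone` in the visualization cell), the checkpoint's 810 channels correspond to 90 classes and the current model's 540 channels to 60 classes. The sketch below only reproduces that arithmetic; both helper names are hypothetical.

```python
def head_out_channels(num_classes, num_ratios=3, num_scales=3):
    """Channels of a classification head that predicts every class for every anchor."""
    num_anchors = num_ratios * num_scales
    return num_anchors * num_classes

def classes_from_channels(channels, num_anchors=9):
    """Invert the relation above; fails loudly if the width is not a multiple."""
    if channels % num_anchors != 0:
        raise ValueError(f"{channels} is not divisible by {num_anchors}")
    return channels // num_anchors

print(classes_from_channels(810))  # head width stored in the checkpoint
print(classes_from_channels(540))  # head width of the current model
```

Only the final classifier layer differs in shape, so the rest of the pre-trained weights can still be reused.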

In [None]:
# Fine-tune the entire model
!python train.py -c 0 -p taco --head_only False --lr 1e-4 --batch_size 8 --load_weights logs/taco/efficientdet-d0_49_3250.pth --num_epochs 100 --save_interval 100

loading annotations into memory...
Done (t=0.05s)
creating index...
index created!
loading annotations into memory...
Done (t=0.11s)
creating index...
index created!
  ret = model.load_state_dict(torch.load(weights_path), strict=False)
[Info] loaded weights: efficientdet-d0_49_3250.pth, resuming checkpoint from step: 3250
Step: 3274. Epoch: 24/100. Iteration: 131/131. Cls loss: 1.71755. Reg loss: 1.74694. Total loss: 3.46449: 100% 131/131 [00:41<00:00,  3.14it/s]
Val. Epoch: 24/100. Classification loss: 1.76011. Regression loss: 1.93943. Total loss: 3.69954
Step: 3299. Epoch: 25/100. Iteration: 25/131. Cls loss: 1.42823. Reg loss: 1.15557. Total loss: 2.58380:  18% 24/131 [00:20<01:02,  1.72it/s]checkpoint...
Step: 3399. Epoch: 25/100. Iteration: 125/131. Cls loss: 0.85621. Reg loss: 0.98388. Total loss: 1.84009:  95% 124/131 [01:20<00:03,  1.80it/s]checkpoint...
Step: 3405. Epoch: 25/100. Iteration: 131/131. Cls loss: 0.90836. Reg loss: 0.61582. Total loss: 1.52418: 100% 131/131 [01:2
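
Note that the fine-tuning log starts at epoch 24 rather than 0. This is consistent with the training script taking the global step from the checkpoint filename (`..._3250.pth`, matching `resuming checkpoint from step: 3250`) and converting it to an epoch using the new number of iterations per epoch (131 at batch size 8). The sketch below reproduces that arithmetic. `resume_position` is a hypothetical helper, and the filename parsing is an assumption about how the step is recovered, not the script's confirmed logic.

```python
import os

def resume_position(weights_path, iters_per_epoch):
    """Derive (step, epoch, iterations already done in that epoch) from a
    checkpoint named like 'efficientdet-d0_<epoch>_<step>.pth'."""
    stem = os.path.splitext(os.path.basename(weights_path))[0]
    step = int(stem.split('_')[-1])
    epoch, done_in_epoch = divmod(step, iters_per_epoch)
    return step, epoch, done_in_epoch

step, epoch, done = resume_position('logs/taco/efficientdet-d0_49_3250.pth', 131)
print(step, epoch, done)
```

If the script does behave this way, `--num_epochs 100` gives roughly 76 additional epochs of fine-tuning rather than 100. Raising `--num_epochs` would compensate for that.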

## 4. Evaluation

Now, let's evaluate the trained model on the TACO dataset.

In [None]:
# Get the latest weight file
%cd logs/taco
weight_file = !ls -Art | grep efficientdet
%cd ../..

# Evaluate the model
!python coco_eval.py -c 0 -p taco -w "logs/taco/{weight_file[-1]}"
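
The `ls -Art | grep` pipeline above depends on a Unix shell. A portable alternative in Python might look like the sketch below. It assumes checkpoints are `.pth` files whose names start with `efficientdet`, and that the most recently modified one is the one to evaluate. `latest_checkpoint` is a hypothetical helper.

```python
import glob
import os

def latest_checkpoint(log_dir, prefix='efficientdet'):
    """Return the most recently modified '<prefix>*.pth' file in log_dir, or None."""
    candidates = glob.glob(os.path.join(log_dir, f'{prefix}*.pth'))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)

weights_path = latest_checkpoint('logs/taco')
print(weights_path)
```

The returned path can then be passed to the evaluation script through its `-w` option.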

## 5. Visualization

Finally, let's visualize the model's predictions on a sample image from the TACO dataset.

In [None]:
import torch
from torch.backends import cudnn
from backbone import EfficientDetBackbone
import cv2
import matplotlib.pyplot as plt
import numpy as np
import json

from efficientdet.utils import BBoxTransform, ClipBoxes
from utils.utils import preprocess, invert_affine, postprocess

compound_coef = 0
force_input_size = None  # set None to use default size
img_path = 'datasets/taco/val/batch_7000008_JPG.rf.14f30006571dfa762fd80920a1f333fe.jpg'

threshold = 0.2
iou_threshold = 0.2

use_cuda = True
use_float16 = False
cudnn.fastest = True
cudnn.benchmark = True

# Load TACO category names
with open('datasets/taco/annotations/test_annotations.coco.json') as f:
    annotations = json.load(f)
obj_list = [category['name'] for category in annotations['categories']]

# tf bilinear interpolation is different from any other's, just make do
input_sizes = [512, 640, 768, 896, 1024, 1280, 1280, 1536]
input_size = input_sizes[compound_coef] if force_input_size is None else force_input_size

model = EfficientDetBackbone(compound_coef=compound_coef, num_classes=len(obj_list),
                             ratios=[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)],
                             scales=[2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)])

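# Load the fine-tuned TACO checkpoint rather than the COCO weights, whose 90-class
# head does not match num_classes above. `weight_file` comes from the evaluation cell.
weights_path = f'logs/taco/{weight_file[-1]}'
model.load_state_dict(torch.load(weights_path, map_location='cpu'))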
model.requires_grad_(False)
model.eval()

if use_cuda:
    model = model.cuda()
if use_float16:
    model = model.half()

ori_imgs, framed_imgs, framed_metas = preprocess(img_path, max_size=input_size)

if use_cuda:
    x = torch.stack([torch.from_numpy(fi).cuda() for fi in framed_imgs], 0)
else:
    x = torch.stack([torch.from_numpy(fi) for fi in framed_imgs], 0)

x = x.to(torch.float32 if not use_float16 else torch.float16).permute(0, 3, 1, 2)

with torch.no_grad():
    features, regression, classification, anchors = model(x)

    regressBoxes = BBoxTransform()
    clipBoxes = ClipBoxes()

    out = postprocess(x,
                      anchors, regression, classification,
                      regressBoxes, clipBoxes,
                      threshold, iou_threshold)

out = invert_affine(framed_metas, out)

for i in range(len(ori_imgs)):
    if len(out[i]['rois']) == 0:
        continue
    ori_imgs[i] = ori_imgs[i].copy()
    for j in range(len(out[i]['rois'])):
        # np.int was removed in NumPy 1.24; the built-in int is the replacement
        (x1, y1, x2, y2) = out[i]['rois'][j].astype(int)
        cv2.rectangle(ori_imgs[i], (x1, y1), (x2, y2), (255, 255, 0), 2)
        obj = obj_list[out[i]['class_ids'][j]]
        score = float(out[i]['scores'][j])

        cv2.putText(ori_imgs[i], '{}, {:.3f}'.format(obj, score),
                    (x1, y1 + 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                    (255, 255, 0), 1)

plt.figure(figsize=(15, 15))
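# Assuming preprocess() reads images with OpenCV (BGR), convert to RGB for matplotlib
plt.imshow(cv2.cvtColor(ori_imgs[0], cv2.COLOR_BGR2RGB))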
plt.axis('off')
plt.show()
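
The cell above sets two thresholds. `threshold` is the minimum score a detection needs to be kept. `iou_threshold` is presumably used for non-maximum suppression, where a box is discarded if its overlap (IoU) with a higher-scoring box exceeds that value. To illustrate the quantity being compared, here is a minimal IoU sketch for `(x1, y1, x2, y2)` boxes. It is not the library's implementation.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes offset by 5 pixels horizontally: overlap 50, union 150.
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

With `iou_threshold = 0.2`, these two boxes (IoU of 1/3) would be treated as duplicates of the same object, so only the higher-scoring one would survive.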