# EfficientDet Transfer Learning on TACO Dataset

This notebook demonstrates how to perform transfer learning using EfficientDet-D0 on the TACO (Trash Annotations in Context) dataset.

## 0. Setup and Requirements

First, let's install the required packages.

In [9]:
!pip install pycocotools numpy opencv-python tqdm tensorboard tensorboardX pyyaml webcolors matplotlib



In [10]:
import torch

# Check if CUDA is available
cuda_available = torch.cuda.is_available()
print(f"CUDA available: {cuda_available}")

if cuda_available:
    # Get the name of the current CUDA device
    print(f"Current CUDA device: {torch.cuda.get_device_name(0)}")
    
    # Get the number of available CUDA devices
    print(f"Number of CUDA devices: {torch.cuda.device_count()}")
    
    # Get CUDA version
    print(f"CUDA version: {torch.version.cuda}")
else:
    print("CUDA is not available. The code will run on CPU.")

# Check PyTorch version
print(f"PyTorch version: {torch.__version__}")

CUDA available: True
Current CUDA device: NVIDIA GeForce GTX 1060
Number of CUDA devices: 1
CUDA version: 12.1
PyTorch version: 2.4.1+cu121


## 1. Import Dependencies and Set Up Environment

In [2]:
import os
import sys
import torch
from torch.backends import cudnn

# Add the project directory to the Python path
if "Yet-Another-EfficientDet-Pytorch" not in os.getcwd():
    !git clone --depth 1 https://github.com/markintoshplus/Yet-Another-EfficientDet-Pytorch.git
    os.chdir('Yet-Another-EfficientDet-Pytorch')
    sys.path.append('.')
else:
    !git pull

Already up to date.


## 2. Prepare Dataset and Weights

Let's verify the locations of the TACO dataset, pre-trained weights, and the project YAML file.

In [3]:
# Verify TACO dataset location
print("TACO dataset contents:")
print(os.listdir('datasets/taco'))

# Verify annotations
print("\nAnnotations:")
print(os.listdir('datasets/taco/annotations'))

# Display a sample of image files in train folder
print("\nSample of training images:")
print(os.listdir('datasets/taco/train')[:5])

TACO dataset contents:
['annotations', 'test', 'train', 'val']

Annotations:
['test_annotations.coco.json', 'train_annotations.coco.json', 'val_annotations.coco.json']

Sample of training images:
['batch_10000000_jpg.rf.47f0ceada07b269e85aadb05c68d7ada.jpg', 'batch_10000002_jpg.rf.abff65efa9e8d307bc6a648896062a81.jpg', 'batch_10000003_jpg.rf.a45a33d2a6359e2c70e8277eca63348a.jpg', 'batch_10000005_jpg.rf.fd06561137fc45e96b955e7f4b560557.jpg', 'batch_10000006_jpg.rf.368dd7e27c67d9edaf5e547032b9843a.jpg']


## 3. Training

We'll perform transfer learning in two steps:
1. Train only the head of the network
2. Fine-tune the entire model

In [15]:
# Train head only (transfer learning)
!python train.py -c 0 -p taco --head_only True --lr 1e-3 --batch_size 16 --load_weights pre-trained_weights/efficientdet-d0.pth --num_epochs 50 --save_interval 5 --debug True

^C
loading annotations into memory...
Done (t=0.07s)
creating index...
index created!
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
	size mismatch for classifier.header.pointwise_conv.conv.weight: copying a param with shape torch.Size([810, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([540, 64, 1, 1]).
	size mismatch for classifier.header.pointwise_conv.conv.bias: copying a param with shape torch.Size([810]) from checkpoint, the shape in current model is torch.Size([540]).
[Info] loaded weights: efficientdet-d0.pth, resuming checkpoint from step: 0
[Info] freezed backbone
checkpoint...
checkpoint...
checkpoint...
checkpoint...
checkpoint...
checkpoint...
checkpoint...
checkpoint...
checkpoint...
checkpoint...
checkpoint...
checkpoint...
checkpoint...
Val. Epoch: 0/50. Classification loss: 3919.46590. Regression loss: 1.84963. Total loss: 3921.31553
checkpoint...
checkpoint...
checkpoint...
checkpoint...
checkpoint...
checkpoi

  ret = model.load_state_dict(torch.load(weights_path), strict=False)

  0%|          | 0/65 [00:00<?, ?it/s]
Step: 0. Epoch: 0/50. Iteration: 1/65. Cls loss: 188307.60938. Reg loss: 1.85111. Total loss: 188309.45312:   0%|          | 0/65 [01:07<?, ?it/s]
Step: 0. Epoch: 0/50. Iteration: 1/65. Cls loss: 188307.60938. Reg loss: 1.85111. Total loss: 188309.45312:   2%|1         | 1/65 [01:07<1:12:01, 67.52s/it]
Step: 1. Epoch: 0/50. Iteration: 2/65. Cls loss: 408237.06250. Reg loss: 1.47369. Total loss: 408238.53125:   2%|1         | 1/65 [01:17<1:12:01, 67.52s/it]
Step: 1. Epoch: 0/50. Iteration: 2/65. Cls loss: 408237.06250. Reg loss: 1.47369. Total loss: 408238.53125:   3%|3         | 2/65 [01:17<35:03, 33.39s/it]  
Step: 2. Epoch: 0/50. Iteration: 3/65. Cls loss: 289193.25000. Reg loss: 2.10847. Total loss: 289195.34375:   3%|3         | 2/65 [01:26<35:03, 33.39s/it]
Step: 2. Epoch: 0/50. Iteration: 3/65. Cls loss: 289193.25000. Reg loss: 2.10847. Total loss: 289195.34375:   5%|4   

In [None]:
# Fine-tune the entire model
!python train.py -c 0 -p taco --head_only False --lr 1e-4 --batch_size 8 --load_weights logs/taco/efficientdet-d0_49_3250.pth --num_epochs 100 --save_interval 50 --debug True

In [None]:
# Install TensorBoard if not already installed
!pip install tensorboard

# Load TensorBoard extension
%load_ext tensorboard

# Start TensorBoard
%tensorboard --logdir logs/taco

## 4. Evaluation

Now, let's evaluate the trained model on the TACO dataset.

In [25]:
# Get the latest weight file
%cd logs/taco
weight_file = !ls -Art | grep efficientdet
%cd ../..

# Evaluate the model
!python coco_eval.py -c 0 -p taco -w "logs/taco/{weight_file[-1]}"

C:\Users\itsd\Yet-Another-EfficientDet-Pytorch\logs\taco
C:\Users\itsd\Yet-Another-EfficientDet-Pytorch
running coco-style evaluation on project taco, weights logs/taco/operable program or batch file....
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
Using weights: logs/taco\efficientdet-d0_49_3250.pth
Processing batch 1/300
out length: 1
Processing image 1/1, image_id: 0
out[0] keys: dict_keys(['rois', 'class_ids', 'scores'])


  model.load_state_dict(torch.load(weights_path, map_location=torch.device('cpu')))

  0%|          | 0/300 [00:00<?, ?it/s]
  0%|          | 0/300 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\itsd\Yet-Another-EfficientDet-Pytorch\coco_eval.py", line 192, in <module>
    evaluate_coco(VAL_IMGS, SET_NAME, image_ids, coco_gt, model)
  File "C:\Users\itsd\Yet-Another-EfficientDet-Pytorch\coco_eval.py", line 113, in evaluate_coco
    preds = invert_affine(batch_metas[j], out[j])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\itsd\Yet-Another-EfficientDet-Pytorch\utils\utils.py", line 22, in invert_affine
    if len(preds[i]['rois']) == 0:
           ~~~~~^^^
KeyError: 0


## 5. Visualization

Finally, let's visualize the model's predictions on a sample image from the TACO dataset.

In [None]:
import torch
from torch.backends import cudnn
from backbone import EfficientDetBackbone
import cv2
import matplotlib.pyplot as plt
import numpy as np
import json

from efficientdet.utils import BBoxTransform, ClipBoxes
from utils.utils import preprocess, invert_affine, postprocess

compound_coef = 0
force_input_size = None  # set None to use default size
img_path = 'datasets/taco/val/batch_7000008_JPG.rf.14f30006571dfa762fd80920a1f333fe.jpg'

threshold = 0.2
iou_threshold = 0.2

use_cuda = True
use_float16 = False
cudnn.fastest = True
cudnn.benchmark = True

# Load TACO category names
with open('datasets/taco/annotations/test_annotations.coco.json') as f:
    annotations = json.load(f)
obj_list = [category['name'] for category in annotations['categories']]

# tf bilinear interpolation is different from any other's, just make do
input_sizes = [512, 640, 768, 896, 1024, 1280, 1280, 1536]
input_size = input_sizes[compound_coef] if force_input_size is None else force_input_size

model = EfficientDetBackbone(compound_coef=compound_coef, num_classes=len(obj_list),
                             ratios=[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)],
                             scales=[2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)])

model.load_state_dict(torch.load(f'weights/efficientdet-d{compound_coef}.pth'))
model.requires_grad_(False)
model.eval()

if use_cuda:
    model = model.cuda()
if use_float16:
    model = model.half()

ori_imgs, framed_imgs, framed_metas = preprocess(img_path, max_size=input_size)

if use_cuda:
    x = torch.stack([torch.from_numpy(fi).cuda() for fi in framed_imgs], 0)
else:
    x = torch.stack([torch.from_numpy(fi) for fi in framed_imgs], 0)

x = x.to(torch.float32 if not use_float16 else torch.float16).permute(0, 3, 1, 2)

with torch.no_grad():
    features, regression, classification, anchors = model(x)

    regressBoxes = BBoxTransform()
    clipBoxes = ClipBoxes()

    out = postprocess(x,
                      anchors, regression, classification,
                      regressBoxes, clipBoxes,
                      threshold, iou_threshold)

out = invert_affine(framed_metas, out)

for i in range(len(ori_imgs)):
    if len(out[i]['rois']) == 0:
        continue
    ori_imgs[i] = ori_imgs[i].copy()
    for j in range(len(out[i]['rois'])):
        (x1, y1, x2, y2) = out[i]['rois'][j].astype(np.int)
        cv2.rectangle(ori_imgs[i], (x1, y1), (x2, y2), (255, 255, 0), 2)
        obj = obj_list[out[i]['class_ids'][j]]
        score = float(out[i]['scores'][j])

        cv2.putText(ori_imgs[i], '{}, {:.3f}'.format(obj, score),
                    (x1, y1 + 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                    (255, 255, 0), 1)

plt.figure(figsize=(15, 15))
plt.imshow(ori_imgs[0])
plt.axis('off')
plt.show()