In [1]:
from PIL import Image
from ultralytics import RTDETR
from torchvision import transforms
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import cv2
import json
import glob
%matplotlib inline

In [3]:
# Check if CUDA (GPU) is available and set the device
if torch.cuda.is_available():
    device = torch.device("cuda:0") # Use the first GPU
    print(f"Training on GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("CUDA not available. Training on CPU.")

Training on GPU: NVIDIA GeForce RTX 2060 SUPER


In [4]:
torch.cuda.empty_cache()

The modifications to the network architecture were made with the goal of creating a lightweight model specifically optimized for "shrimp fry" detection, aiming to improve performance in terms of speed and resource usage. Here's a breakdown of the reasons for each change:

1.  **`nc: 1` (Number of Classes)**:
    *   The original `nc: 80` indicated the model was designed for 80 different object classes (e.g., COCO dataset).
    *   Shrimp fry detection involves a single class, so setting `nc` to `1` sizes the model's classification outputs for one object type. This trims the detection head slightly and removes the need to predict classes that never occur in this task.

2.  **`scales: s: [0.33, 0.25, 256]` (Model Scaling Constants)**:
    *   The original `l: [1.00, 1.00, 1024]` corresponded to a "large" model with high depth, width, and maximum channels.
    *   By changing to `s: [0.33, 0.25, 256]`, we are adopting a "small" scale. This means:
        *   `0.33` (depth multiplier): Reduces the number of layers or repeats in the network blocks.
        *   `0.25` (width multiplier): Reduces the number of channels (the depth of each feature map, not its spatial size) in the network.
        *   `256` (max_channels): Sets an upper limit on the number of channels, further constraining the model's size.
    *   These reductions collectively make the model significantly smaller, faster to train, and quicker at inference, which is crucial for real-time applications or deployment on edge devices.

3.  **`backbone` and `head` layers (Reduced Repeats and Channels)**:
    *   **Reduced `repeats`**: In modules like `HGBlock` and `RepC3`, the `repeats` parameter (e.g., from `6` to `2` or `1`) was decreased. This directly reduces the number of times a particular block structure is repeated, leading to a shallower network with fewer computations.
    *   **Reduced `channels`**: The number of output channels in various `Conv`, `HGStem`, `HGBlock`, and `RepC3` layers was substantially lowered (e.g., `HGStem` from `[32, 48]` to `[8, 12]`, `Conv` in the head from `[256, 1, 1]` to `[64, 1, 1]`). Fewer channels mean fewer feature maps are processed at each stage, which drastically cuts down on the model's parameter count and computational load.
    *   **Overall Impact**: These changes across both the feature extraction (backbone) and detection (head) parts of the network result in a much more compact model. A larger model may capture more general features, but for a single-class task like shrimp fry detection a smaller model can often reach comparable accuracy at a fraction of the computational cost, which makes it more practical to deploy.
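The effect of the scaling constants in item 2 can be illustrated with a little arithmetic. The sketch below assumes the convention commonly used by Ultralytics-style model YAMLs: channel counts are capped at `max_channels`, multiplied by the width factor, and rounded up to a multiple of 8, while repeat counts greater than 1 are multiplied by the depth factor and rounded. The helper names (`make_divisible`, `scale_channels`, `scale_repeats`) are illustrative, and the exact rule, as well as which module types it applies to, may differ in the installed version, so hand-edited values in the YAML take precedence over anything computed here.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_channels(channels, width, max_channels):
    """Cap the channel count, apply the width multiplier, round up to a multiple of 8."""
    return make_divisible(min(channels, max_channels) * width, 8)


def scale_repeats(repeats, depth):
    """Apply the depth multiplier to a repeat count; single repeats are left unchanged."""
    return max(round(repeats * depth), 1) if repeats > 1 else repeats


# Assumed scale tuples: [depth, width, max_channels]
large = (1.00, 1.00, 1024)
small = (0.33, 0.25, 256)

for name, (depth, width, max_ch) in (("l", large), ("s", small)):
    print(
        name,
        "channels(256) ->", scale_channels(256, width, max_ch),
        "| repeats(6) ->", scale_repeats(6, depth),
    )
```

Under these assumptions, a nominal 256-channel layer with 6 repeats stays at 256 channels and 6 repeats at the `l` scale, and becomes 64 channels and 2 repeats at the `s` scale.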

These modifications collectively aim to create a highly efficient and performant model for the specific task of detecting shrimp fry, balancing detection accuracy with computational cost.
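One way to check the size reduction claimed above is to count parameters directly. The helper below is a minimal sketch: `count_parameters` is a hypothetical name, and it relies only on the standard PyTorch convention that a module exposes `parameters()` and each parameter has `numel()` and `requires_grad`.

```python
def count_parameters(module):
    """Return (total, trainable) parameter counts for a PyTorch-style module."""
    total = 0
    trainable = 0
    for p in module.parameters():
        n = p.numel()
        total += n
        if p.requires_grad:
            trainable += n
    return total, trainable
```

In this notebook it could be called as `count_parameters(model.model)` after the `RTDETR` object has been created, assuming `model.model` is the underlying `torch.nn.Module`. Comparing the result for the modified YAML against the stock `rtdetr-l.yaml` would show how much the edits actually shrink the network.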


In [6]:
model = RTDETR('rtdetr-shrimp-s.yaml')

In [7]:
result = model.train(
    data='Shrimp-larvae-detection-1/data.yaml',
    epochs=30,
    batch=16,
    device=0,
)

Ultralytics 8.3.203  Python-3.13.7 torch-2.7.1+cu118 CUDA:0 (NVIDIA GeForce RTX 2060 SUPER, 8192MiB)
[34m[1mengine\trainer: [0magnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=Shrimp-larvae-detection-1/data.yaml, degrees=0.0, deterministic=True, device=0, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=30, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=rtdetr-shrimp-s.yaml, momentum=0.937, mosaic=1.0, multi_scale=False, name=train8, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, pat

  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass


       1/30      9.28G      2.146     0.1747     0.6841        575        640: 15% ━╸────────── 40/259 0.6it/s 1:46<6:38


RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR_HOST_ALLOCATION_FAILED