[![Labellerr](https://storage.googleapis.com/labellerr-cdn/%200%20Labellerr%20template/notebook.webp)](https://www.labellerr.com)

# **Fine-Tune RT-DETR for Livestock Monitoring**

---

[![labellerr](https://img.shields.io/badge/Labellerr-BLOG-black.svg)](https://www.labellerr.com/blog)
[![Youtube](https://img.shields.io/badge/Labellerr-YouTube-b31b1b.svg)](https://www.youtube.com/@Labellerr)
[![Github](https://img.shields.io/badge/Labellerr-GitHub-green.svg)](https://github.com/Labellerr/Hands-On-Learning-in-Computer-Vision)

## Overview
End-to-end pipeline for **livestock detection and tracking** using RT-DETR (Real-Time Detection Transformer).

## Workflow
1. **Data Preparation**: Extract frames → Convert COCO to YOLO → Split dataset (70/10/10)
2. **Model Training**: RT-DETR-L model, 100 epochs, 640x640 images
3. **Inference**: Track livestock in videos with trained model
4. **Custom Tracker**: LivestockMonitor class with unique colors per animal + live count

## Technologies
PyTorch • Ultralytics RT-DETR • OpenCV • NumPy

## Setup: Import Required Libraries
Import all necessary libraries for computer vision, deep learning, and data manipulation:
- **OpenCV (cv2)**: Image and video processing
- **NumPy**: Numerical operations
- **Ultralytics**: YOLO/RT-DETR framework
- **PyTorch**: Deep learning backend
- **Pathlib**: File path handling

In [12]:
import cv2
import numpy as np
from ultralytics import RTDETR
import torch
from typing import Dict, Tuple, Optional, List
from pathlib import Path
import random 
%matplotlib inline

## Setup: Clone YOLO Fine-tuning Utilities
Clone the `yolo_finetune_utils` repository, which contains helper functions for:
- Frame extraction from videos
- COCO to YOLO format conversion
- Dataset preparation utilities
> **Note**: Uncomment this cell only if you haven't cloned the repository yet.

In [None]:
# !git clone https://github.com/Labellerr/yolo_finetune_utils.git

## Data Preparation: Extract Random Frames from Videos
Extract random frames from video files to create a dataset for annotation.
**Parameters**:
- `paths`: List of video directories
- `total_images`: Number of frames to extract
- `out_dir`: Output directory for extracted frames
- `jpg_quality`: JPEG compression quality (100 = highest)
- `seed`: Random seed for reproducibility

In [None]:
from yolo_finetune_utils.frame_extractor import extract_random_frames

extract_random_frames(
        paths=[r"videos\manufacturing_video_data"],
        total_images=150,
        out_dir="dataset_frames",
        jpg_quality=100,
        seed=42
    )
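
The role of `total_images` and `seed` can be illustrated with a small standalone sketch. It is not the implementation of `extract_random_frames`; the helper name `pick_frame_indices` is hypothetical and only shows how a seeded sampler yields the same frame indices on every run.

```python
import random
from typing import List


def pick_frame_indices(total_frames: int, n_samples: int, seed: int = 42) -> List[int]:
    """Return sorted, unique frame indices chosen with a seeded RNG (hypothetical helper)."""
    if total_frames <= 0:
        return []
    rng = random.Random(seed)          # local RNG, so global random state is untouched
    k = min(n_samples, total_frames)   # cannot sample more frames than the video has
    return sorted(rng.sample(range(total_frames), k))


# Same seed and inputs give the same indices on every run.
indices = pick_frame_indices(total_frames=342, n_samples=5, seed=42)
print(indices)
```

Sampling without replacement avoids duplicate frames, and a fixed seed makes the extracted set reproducible across machines.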

## Data Preparation: Convert Annotations to YOLO Format
Convert COCO-format annotations to YOLO format and split the dataset into train/val/test sets.
**Configuration**:
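- **Train/Val/Test Split**: 70% / 10% / 10% as configured in the cell for this step (these ratios sum to 0.9, so adjust them if every image should be assigned to a split)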
- **Input**: COCO JSON annotations + image directory
- **Output**: YOLO-format dataset in `model_dataset/`
The converter automatically:
- Creates train/val/test splits
- Generates YOLO-format label files (.txt)
- Organizes images and labels into proper directory structure
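
For bounding boxes, the core of a COCO-to-YOLO conversion is a coordinate change: COCO stores `[x_min, y_min, width, height]` in pixels, while a YOLO label line stores `class cx cy w h` normalized to the image size. The sketch below shows that arithmetic only; the function names are hypothetical, and the repository's converter may handle more cases (for example segmentation polygons).

```python
from typing import Sequence, Tuple


def coco_bbox_to_yolo(
    bbox: Sequence[float], img_w: int, img_h: int
) -> Tuple[float, float, float, float]:
    """Convert a COCO pixel box [x_min, y_min, w, h] to normalized (cx, cy, w, h)."""
    x_min, y_min, w, h = bbox
    cx = (x_min + w / 2) / img_w   # box center, as a fraction of image width
    cy = (y_min + h / 2) / img_h   # box center, as a fraction of image height
    return (cx, cy, w / img_w, h / img_h)


def yolo_label_line(class_id: int, bbox: Sequence[float], img_w: int, img_h: int) -> str:
    """Format one line of a YOLO .txt label file."""
    cx, cy, w, h = coco_bbox_to_yolo(bbox, img_w, img_h)
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A box covering the central quarter of a 640x480 image.
print(yolo_label_line(0, [160, 120, 320, 240], 640, 480))
# 0 0.500000 0.500000 0.500000 0.500000
```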

In [None]:
from yolo_finetune_utils.coco_yolo_converter.seg_converter import coco_to_yolo_converter

ANNOTATION_JSON = "annotations.json"
IMAGE_DIR = "dataset_frames"


coco_to_yolo_converter(
        json_path=ANNOTATION_JSON,
        images_dir=IMAGE_DIR,
        output_dir="model_dataset",
        use_split=True,
        train_ratio=0.7,
        val_ratio=0.1,
        test_ratio=0.1,
        shuffle=True,
        verbose=False
    )
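
To see what the ratios above imply, here is a minimal sketch of a shuffled split. It is an assumption about how such a split can be done, not the converter's actual logic; `split_items` is a hypothetical helper, and the "unused" bucket simply makes visible what is left over when the ratios sum to less than 1.0.

```python
import random
from typing import Dict, List, Sequence


def split_items(
    items: Sequence[str],
    train_ratio: float = 0.7,
    val_ratio: float = 0.1,
    test_ratio: float = 0.1,
    seed: int = 42,
) -> Dict[str, List[str]]:
    """Shuffle items with a fixed seed and cut them into train/val/test (hypothetical helper)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * train_ratio)   # round() avoids float truncation surprises
    n_val = round(n * val_ratio)
    n_test = round(n * test_ratio)
    end_val = n_train + n_val
    end_test = end_val + n_test
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:end_val],
        "test": shuffled[end_val:end_test],
        "unused": shuffled[end_test:],  # leftover when ratios sum to less than 1.0
    }


names = [f"frame_{i:04d}.jpg" for i in range(150)]
splits = split_items(names)
print({k: len(v) for k, v in splits.items()})
# {'train': 105, 'val': 15, 'test': 15, 'unused': 15}
```

With 150 extracted frames, a 0.7/0.1/0.1 configuration in this sketch leaves 15 images unassigned; check the generated `model_dataset/` folders to confirm how the real converter treats them.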

## System Check: GPU Memory Status
Clear GPU cache and display current memory usage to ensure sufficient resources for training.
**Memory Metrics**:
- **Allocated**: Currently used GPU memory
- **Cached**: Memory reserved by PyTorch's caching allocator (includes the allocated amount plus blocks held for reuse)
- **Free**: Available GPU memory
> Run this cell before training to free up GPU memory.

In [13]:
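import torch

if torch.cuda.is_available():
    # Release cached blocks that PyTorch is holding but not using
    torch.cuda.empty_cache()

    # Check GPU memory status
    print(f"Allocated: {torch.cuda.memory_allocated(0)/1024**3:.2f} GB")
    print(f"Cached: {torch.cuda.memory_reserved(0)/1024**3:.2f} GB")
    print(f"Free: {torch.cuda.mem_get_info(0)[0]/1024**3:.2f} GB")
else:
    print("CUDA is not available; skipping GPU memory check.")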

Allocated: 0.40 GB
Cached: 0.43 GB
Free: 6.41 GB


## Model Training: RT-DETR for Livestock Detection
Train the RT-DETR-L (Large) model on the livestock dataset.
**Training Configuration**:
| Parameter | Value | Description |
|-----------|-------|-------------|
| `data` | `model_dataset/data.yaml` | Dataset configuration file |
| `epochs` | 100 | Number of training epochs |
| `imgsz` | 640 | Input image size (640x640) |
| `batch` | 4 | Batch size |
| `device` | 0 | GPU device ID (0 = first GPU) |
| `workers` | 1 | Number of dataloader workers |
**Model**: RT-DETR-L (Large variant)
- Pre-trained weights: `rtdetr-l.pt`
- Architecture: Real-Time Detection Transformer
- Optimized for real-time object detection
> **Training Time**: Approximately 30-60 minutes depending on GPU

In [None]:
from ultralytics import RTDETR
# Load a model
model = RTDETR("rtdetr-l.pt")

# Train the model
results = model.train(
    data=r"model_dataset\data.yaml",    # Path to your dataset YAML file
    epochs=100,                        # Number of training epochs
    imgsz=640,                         # Image size
    batch=4,                          # Batch size
    device=0,                          # GPU device (0 for first GPU, 'cpu' for CPU)
    workers=1                          # Number of dataloader workers
)
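
The `data` argument points at a dataset YAML file. As a rough sketch of what such a file typically contains in the Ultralytics format, the snippet below builds one as text. The folder layout (`images/train` and so on) is an assumption about the converter's output, and the class name `Cows` is taken from the inference log in this notebook; compare against the `data.yaml` actually generated in `model_dataset/`.

```python
from pathlib import Path
from typing import List


def build_data_yaml(root: str, class_names: List[str]) -> str:
    """Return the text of a minimal Ultralytics-style dataset YAML (assumed layout)."""
    lines = [
        f"path: {Path(root).as_posix()}",  # dataset root directory
        "train: images/train",             # assumed sub-folder names
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    # One "index: name" entry per class
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


print(build_data_yaml("model_dataset", ["Cows"]))
```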

## Inference: Track Livestock in Video
Run the trained RT-DETR model on a sample video to detect and track livestock.
**Inference Configuration**:
- **Model**: Best weights from training (`runs/detect/train/weights/best.pt`)
- **Confidence Threshold**: 0.5 (50%)
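- **IOU Threshold**: 0.5, passed as `iou` (RT-DETR is designed to work without Non-Maximum Suppression, so `iou` and `nms` may have little or no effect on its detections)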
- **Tracking**: Enabled with `track()` method
- **Labels**: Hidden (`show_labels=False`)
**Output**: 
- Tracked video saved to `runs/detect/track*/`
- Each frame shows detected cows with bounding boxes
- Real-time inference speed displayed per frame
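
The IoU threshold listed above is a cutoff on intersection over union, the overlap measure between two boxes. A minimal sketch of that measure for `(x1, y1, x2, y2)` boxes follows; `box_iou` is a hypothetical helper written for illustration, not a function from Ultralytics.

```python
from typing import Sequence


def box_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap region, clamped at zero when boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes shifted by half a width overlap by 50 / 150.
print(round(box_iou((0, 0, 10, 10), (5, 0, 15, 10)), 3))  # 0.333
```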

In [15]:
from ultralytics import RTDETR

model = RTDETR(r"runs\detect\train\weights\best.pt")


video_sample = r"livestocks_sample_video\sample2.mp4"
results = model.track(video_sample, conf=0.5, save=True, show_labels=False, stream=True, nms=True, iou=0.5)

for result in results:
    pass


video 1/1 (frame 1/342) d:\Professional\Projects\Livestock_monitoring\livestocks_sample_video\sample2.mp4: 640x640 42 Cows, 63.4ms
video 1/1 (frame 2/342) d:\Professional\Projects\Livestock_monitoring\livestocks_sample_video\sample2.mp4: 640x640 41 Cows, 139.2ms
video 1/1 (frame 3/342) d:\Professional\Projects\Livestock_monitoring\livestocks_sample_video\sample2.mp4: 640x640 41 Cows, 33.4ms
video 1/1 (frame 4/342) d:\Professional\Projects\Livestock_monitoring\livestocks_sample_video\sample2.mp4: 640x640 41 Cows, 34.4ms
video 1/1 (frame 5/342) d:\Professional\Projects\Livestock_monitoring\livestocks_sample_video\sample2.mp4: 640x640 41 Cows, 30.6ms
video 1/1 (frame 6/342) d:\Professional\Projects\Livestock_monitoring\livestocks_sample_video\sample2.mp4: 640x640 41 Cows, 33.1ms
video 1/1 (frame 7/342) d:\Professional\Projects\Livestock_monitoring\livestocks_sample_video\sample2.mp4: 640x640 42 Cows, 37.1ms
video 1/1 (frame 8/342) d:\Professional\Projects\Livestock_monitoring\livestocks_
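
The per-frame log lines can be summarized with a few lines of standard-library Python. The sketch below parses the first seven lines shown above (file paths shortened for readability) and reports detection counts and latency; the regular expression is an assumption based on the log format in this output and may need adjusting for other Ultralytics versions.

```python
import re
from statistics import mean, median
from typing import Dict, List

# Matches e.g. "(frame 1/342) sample2.mp4: 640x640 42 Cows, 63.4ms"
LOG_LINE = re.compile(r"\(frame (\d+)/(\d+)\) .*?: (\d+)x(\d+) (\d+) (\w+), ([\d.]+)ms")

LOG_SAMPLE = """\
video 1/1 (frame 1/342) sample2.mp4: 640x640 42 Cows, 63.4ms
video 1/1 (frame 2/342) sample2.mp4: 640x640 41 Cows, 139.2ms
video 1/1 (frame 3/342) sample2.mp4: 640x640 41 Cows, 33.4ms
video 1/1 (frame 4/342) sample2.mp4: 640x640 41 Cows, 34.4ms
video 1/1 (frame 5/342) sample2.mp4: 640x640 41 Cows, 30.6ms
video 1/1 (frame 6/342) sample2.mp4: 640x640 41 Cows, 33.1ms
video 1/1 (frame 7/342) sample2.mp4: 640x640 42 Cows, 37.1ms
"""


def parse_log(text: str) -> List[Dict]:
    """Extract frame number, detection count, label, and latency from each log line."""
    records = []
    for line in text.splitlines():
        m = LOG_LINE.search(line)
        if m:
            frame, _total, _w, _h, count, label, ms = m.groups()
            records.append(
                {"frame": int(frame), "count": int(count), "label": label, "ms": float(ms)}
            )
    return records


records = parse_log(LOG_SAMPLE)
latencies = [r["ms"] for r in records]
print(f"frames parsed: {len(records)}")
print(f"mean latency: {mean(latencies):.1f} ms, median: {median(latencies):.1f} ms")
```

The first frames are often slower than the rest, so the median is usually a better indicator of steady-state speed than the mean.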

## Advanced Tracking: LivestockMonitor Class
Define a custom `LivestockMonitor` class for enhanced livestock tracking with the following features:
### Key Features:
1. **Unique Color Assignment**: Each tracked animal gets a persistent random color
2. **Livestock Counter**: Real-time count displayed in upper-right corner
3. **Configurable Parameters**: Adjustable confidence, IOU, and visualization settings
4. **Class-based Architecture**: Reusable and maintainable code structure
### Class Methods:
- `__init__()`: Initialize tracker with model and parameters
- `generate_random_color()`: Create vivid random colors for tracks
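- `get_track_color()`: Retrieve or assign color for track ID
- `reset_tracks()`: Clear stored track colors before a new video
- `draw_counter()`: Display livestock count overlay
- `draw_detections()`: Draw bounding boxes without labels
- `process_frame()`: Process single frame with tracking
- `track_video()`: Track livestock in entire video file
- `track_single_frame()`: Track livestock in one frame for custom pipelines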
- `set_counter_style()`: Customize counter appearance
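
The persistent-color idea can be shown in isolation. This sketch uses the standard-library `colorsys` module instead of OpenCV, so it runs without extra dependencies; the class name `TrackPalette` is hypothetical, and the saturation and value ranges only approximate the 180-255 range used by the class in this notebook.

```python
import colorsys
import random
from typing import Dict, Optional, Tuple

Color = Tuple[int, int, int]  # (B, G, R), the channel order OpenCV expects


class TrackPalette:
    """Assign each track ID a vivid color once and reuse it afterwards (hypothetical helper)."""

    def __init__(self, seed: Optional[int] = 42) -> None:
        self._rng = random.Random(seed)
        self._colors: Dict[int, Color] = {}

    def color_for(self, track_id: int) -> Color:
        if track_id not in self._colors:
            h = self._rng.random()             # any hue
            s = self._rng.uniform(0.7, 1.0)    # high saturation keeps colors vivid
            v = self._rng.uniform(0.7, 1.0)    # high value keeps colors bright
            r, g, b = colorsys.hsv_to_rgb(h, s, v)
            self._colors[track_id] = (int(b * 255), int(g * 255), int(r * 255))
        return self._colors[track_id]


palette = TrackPalette(seed=42)
first = palette.color_for(7)
again = palette.color_for(7)
print(first, first == again)
```

Caching the color in a dictionary keyed by track ID is what keeps a box the same color from frame to frame, as long as the tracker keeps the same ID for that animal.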

In [16]:
class LivestockMonitor:
    """
    A class for tracking livestock in videos using RT-DETR model.
    
    Each detected livestock is assigned a unique random color that persists
    across frames. The total count of detected livestock is displayed in the
    upper right corner of each frame.
    
    Attributes:
        model: The loaded RT-DETR model instance.
        track_colors: Dictionary mapping track IDs to BGR colors.
        conf_threshold: Confidence threshold for detections.
        iou_threshold: IOU threshold for NMS.
        line_thickness: Thickness of bounding box lines.
    """
    
    def __init__(
        self,
        model_path: str = r"runs\detect\train\weights\best.pt",
        conf_threshold: float = 0.5,
        iou_threshold: float = 0.5,
        line_thickness: int = 3,
        random_seed: Optional[int] = 42
    ):
        """
        Initialize the LivestockMonitor.
        
        Args:
            model_path: Path to the trained RT-DETR model weights.
            conf_threshold: Confidence threshold for detections (0.0-1.0).
            iou_threshold: IOU threshold for NMS (0.0-1.0).
            line_thickness: Thickness of bounding box lines in pixels.
            random_seed: Seed for random color generation (None for random).
        """
        self.model_path = Path(model_path)
        if not self.model_path.exists():
            raise FileNotFoundError(f"Model file not found: {model_path}")
        
        self.conf_threshold = conf_threshold
        self.iou_threshold = iou_threshold
        self.line_thickness = line_thickness
        
        # Set random seed for reproducible colors
        if random_seed is not None:
            random.seed(random_seed)
        
        # Load model
        print(f"Loading RT-DETR model from: {self.model_path}")
        self.model = RTDETR(str(self.model_path))
        
        # Dictionary to store colors for each track ID
        self.track_colors: Dict[int, Tuple[int, int, int]] = {}
        
        # Counter display settings (larger for better visibility)
        self.counter_font = cv2.FONT_HERSHEY_SIMPLEX
        self.counter_font_scale = 3.0  # Increased from 1.2
        self.counter_font_thickness = 4  # Increased from 3
        self.counter_padding = 25  # Increased from 15
    
    @staticmethod
    def generate_random_color() -> Tuple[int, int, int]:
        """
        Generate a random vivid BGR color using HSV color space.
        
        Returns:
            Tuple of (B, G, R) values.
        """
        h = random.randint(0, 179)
        s = random.randint(180, 255)
        v = random.randint(180, 255)
        
        hsv_color = np.array([[[h, s, v]]], dtype=np.uint8)
        bgr_color = cv2.cvtColor(hsv_color, cv2.COLOR_HSV2BGR)[0][0]
        return (int(bgr_color[0]), int(bgr_color[1]), int(bgr_color[2]))
    
    def get_track_color(self, track_id: int) -> Tuple[int, int, int]:
        """
        Get the color for a specific track ID, generating a new one if needed.
        
        Args:
            track_id: The unique track identifier.
        
        Returns:
            BGR color tuple for the track.
        """
        if track_id not in self.track_colors:
            self.track_colors[track_id] = self.generate_random_color()
        return self.track_colors[track_id]
    
    def reset_tracks(self):
        """Clear all stored track colors."""
        self.track_colors.clear()
    
    def draw_counter(self, frame: np.ndarray, count: int) -> np.ndarray:
        """
        Draw the livestock count in the upper right corner of the frame.
        
        Args:
            frame: Input BGR frame.
            count: Number of detected livestock.
        
        Returns:
            Frame with counter overlay.
        """
        height, width = frame.shape[:2]
        count_text = f"Livestock: {count}"
        
        # Get text size for positioning
        (text_width, text_height), baseline = cv2.getTextSize(
            count_text, 
            self.counter_font, 
            self.counter_font_scale, 
            self.counter_font_thickness
        )
        
        # Position in upper right corner with padding
        text_x = width - text_width - self.counter_padding
        text_y = text_height + self.counter_padding
        
        # Background rectangle coordinates
        bg_x1 = text_x - 15
        bg_y1 = text_y - text_height - 15
        bg_x2 = width - self.counter_padding + 15
        bg_y2 = text_y + baseline + 10
        
        # Draw semi-transparent background
        overlay = frame.copy()
        cv2.rectangle(overlay, (bg_x1, bg_y1), (bg_x2, bg_y2), (0, 0, 0), -1)
        cv2.addWeighted(overlay, 0.7, frame, 0.3, 0, frame)
        
        # Draw border around counter
        cv2.rectangle(frame, (bg_x1, bg_y1), (bg_x2, bg_y2), (255, 255, 255), 2)
        
        # Draw count text
        cv2.putText(
            frame, count_text, (text_x, text_y),
            self.counter_font, self.counter_font_scale,
            (0, 255, 255), self.counter_font_thickness, cv2.LINE_AA  # Yellow in BGR for visibility
        )
        
        return frame
    
    def draw_detections(
        self, 
        frame: np.ndarray, 
        boxes: np.ndarray, 
        track_ids: List[int]
    ) -> np.ndarray:
        """
        Draw bounding boxes for detected livestock.
        
        Args:
            frame: Input BGR frame.
            boxes: Array of bounding box coordinates (x1, y1, x2, y2).
            track_ids: List of track IDs corresponding to each box.
        
        Returns:
            Frame with drawn bounding boxes.
        """
        for box, track_id in zip(boxes, track_ids):
            color = self.get_track_color(track_id)
            x1, y1, x2, y2 = map(int, box[:4])
            
            # Draw bounding box (no label)
            cv2.rectangle(frame, (x1, y1), (x2, y2), color, self.line_thickness)
        
        return frame
    
    def process_frame(
        self, 
        frame: np.ndarray,
        result
    ) -> Tuple[np.ndarray, int]:
        """
        Process a single frame with tracking results.
        
        Args:
            frame: Input BGR frame.
            result: Ultralytics tracking result for the frame.
        
        Returns:
            Tuple of (annotated frame, livestock count).
        """
        boxes = result.boxes
        livestock_count = len(boxes) if boxes is not None else 0
        
        annotated_frame = frame.copy()
        
        if boxes is not None and len(boxes) > 0:
            xyxy = boxes.xyxy.cpu().numpy()
            
            if boxes.id is not None:
                track_ids = boxes.id.cpu().numpy().astype(int).tolist()
            else:
                track_ids = list(range(len(boxes)))
            
            # Draw detections
            annotated_frame = self.draw_detections(annotated_frame, xyxy, track_ids)
        
        # Draw counter
        annotated_frame = self.draw_counter(annotated_frame, livestock_count)
        
        return annotated_frame, livestock_count
    
    def track_video(
        self,
        video_path: str,
        output_dir: str = "results",
        show_preview: bool = False
    ) -> str:
        """
        Track livestock in a video file.
        
        Args:
            video_path: Path to the input video file.
            output_dir: Directory to save the output video.
            show_preview: Whether to show a preview window during processing.
        
        Returns:
            Path to the output video file.
        """
        # Validate video path
        video_path = Path(video_path)
        if not video_path.exists():
            raise FileNotFoundError(f"Video file not found: {video_path}")
        
        # Create output directory
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        
        # Generate output filename
        output_filename = f"{video_path.stem}_tracked{video_path.suffix}"
        output_path = output_dir / output_filename
        
        # Reset track colors for new video
        self.reset_tracks()
        
        # Get video properties
        cap = cv2.VideoCapture(str(video_path))
        fps = int(cap.get(cv2.CAP_PROP_FPS))  # int() truncates fractional rates (e.g. 23.976 -> 23)
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        cap.release()
        
        print(f"Video properties: {width}x{height} @ {fps}fps, {total_frames} frames")
        
        # Initialize video writer
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        out = cv2.VideoWriter(str(output_path), fourcc, fps, (width, height))
        
        # Run tracking
        print("Starting livestock tracking...")
        results = self.model.track(
            source=str(video_path),
            conf=self.conf_threshold,
            iou=self.iou_threshold,
            stream=True,
            persist=True,
            verbose=False
        )
        
        # Reopen video for frame reading
        cap = cv2.VideoCapture(str(video_path))
        
        frame_count = 0
        for result in results:
            ret, frame = cap.read()
            if not ret:
                break
            
            frame_count += 1
            
            # Process frame
            annotated_frame, _ = self.process_frame(frame, result)
            
            # Write frame to output
            out.write(annotated_frame)
            
            # Show preview if requested
            if show_preview:
                # Resize for display if too large
                display_frame = annotated_frame
                if width > 1920:
                    scale = 1920 / width
                    display_frame = cv2.resize(
                        annotated_frame, 
                        (int(width * scale), int(height * scale))
                    )
                cv2.imshow("Livestock Tracking", display_frame)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break
            
            # Progress update
            if frame_count % 30 == 0:
                print(f"Processed {frame_count}/{total_frames} frames...")
        
        # Cleanup
        cap.release()
        out.release()
        if show_preview:
            cv2.destroyAllWindows()
        
        print(f"\nTracking complete!")
        print(f"Output saved to: {output_path}")
        print(f"Total unique tracks: {len(self.track_colors)}")
        
        return str(output_path)
    
    def track_single_frame(
        self,
        frame: np.ndarray
    ) -> Tuple[np.ndarray, int]:
        """
        Track livestock in a single frame.
        
        Note: For continuous tracking across frames, use track_video() instead.
        This method is useful for custom video processing pipelines.
        
        Args:
            frame: Input BGR frame.
        
        Returns:
            Tuple of (annotated frame, livestock count).
        """
        # Tracker settings such as track_high_thresh, track_low_thresh,
        # track_buffer, and match_thresh are normally set in the tracker
        # YAML file (passed via `tracker=`), not as track() keyword arguments.
        results = self.model.track(
            source=frame,
            conf=self.conf_threshold,
            iou=self.iou_threshold,
            persist=True,
            verbose=False
        )
        
        return self.process_frame(frame, results[0])
    
    def set_counter_style(
        self,
        font_scale: float = 2.0,
        font_thickness: int = 4,
        padding: int = 25
    ):
        """
        Customize the counter display style.
        
        Args:
            font_scale: Scale factor for the font size.
            font_thickness: Thickness of the font.
            padding: Padding from the edge of the frame.
        """
        self.counter_font_scale = font_scale
        self.counter_font_thickness = font_thickness
        self.counter_padding = padding
    
    def __repr__(self) -> str:
        return (
            f"LivestockTracker("
            f"model='{self.model_path.name}', "
            f"conf={self.conf_threshold}, "
            f"iou={self.iou_threshold}, "
            f"tracks={len(self.track_colors)})"
        )
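
The placement arithmetic in `draw_counter()` can be checked without OpenCV. The sketch below reproduces the same formulas as a pure function; the text size passed in is a made-up example (in the class it comes from `cv2.getTextSize`), and `counter_layout` is a hypothetical name.

```python
from typing import NamedTuple, Tuple


class CounterLayout(NamedTuple):
    text_origin: Tuple[int, int]        # bottom-left corner of the text
    box_top_left: Tuple[int, int]       # background rectangle, top-left
    box_bottom_right: Tuple[int, int]   # background rectangle, bottom-right


def counter_layout(
    frame_width: int, text_width: int, text_height: int, baseline: int, padding: int = 25
) -> CounterLayout:
    """Mirror the upper-right positioning used by draw_counter()."""
    text_x = frame_width - text_width - padding
    text_y = text_height + padding
    return CounterLayout(
        text_origin=(text_x, text_y),
        box_top_left=(text_x - 15, text_y - text_height - 15),
        box_bottom_right=(frame_width - padding + 15, text_y + baseline + 10),
    )


# Example: a 3840-pixel-wide frame and an assumed 600x66 text box with baseline 20.
print(counter_layout(3840, 600, 66, 20))
# CounterLayout(text_origin=(3215, 91), box_top_left=(3200, 10), box_bottom_right=(3830, 121))
```

On narrow frames the text origin can go negative (for example, a 400-pixel-wide frame with the same text size), in which case a smaller `font_scale` via `set_counter_style()` is one way to keep the counter on screen.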


In [17]:
monitor = LivestockMonitor(
        model_path=r"runs\detect\train\weights\best.pt",
        conf_threshold=0.5,
        iou_threshold=0.8,
        line_thickness=3
    )

Loading RT-DETR model from: runs\detect\train\weights\best.pt


In [18]:
output = monitor.track_video(
        video_path=r"livestocks_sample_video\sample3.mp4",
        output_dir="results",
        show_preview=False
    )
    
print(f"\nOutput video: {output}")
print(f"Tracker info: {monitor}")

Video properties: 3840x2160 @ 23fps, 592 frames
Starting livestock tracking...
Processed 30/592 frames...
Processed 60/592 frames...
Processed 90/592 frames...
Processed 120/592 frames...
Processed 150/592 frames...
Processed 180/592 frames...
Processed 210/592 frames...
Processed 240/592 frames...
Processed 270/592 frames...
Processed 300/592 frames...
Processed 330/592 frames...
Processed 360/592 frames...
Processed 390/592 frames...
Processed 420/592 frames...
Processed 450/592 frames...
Processed 480/592 frames...
Processed 510/592 frames...
Processed 540/592 frames...
Processed 570/592 frames...

Tracking complete!
Output saved to: results\sample3_tracked.mp4
Total unique tracks: 217

Output video: results\sample3_tracked.mp4
Tracker info: LivestockTracker(model='best.pt', conf=0.5, iou=0.8, tracks=217)
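
Note that "Total unique tracks" counts every track ID that appeared at any point in the video, which is a different quantity from the number of animals visible in a single frame. The sketch below, using made-up track IDs and a hypothetical `summarize_tracks` helper, shows how the two numbers diverge when an animal loses its ID and is assigned a new one.

```python
from typing import Dict, Sequence


def summarize_tracks(frames: Sequence[Sequence[int]]) -> Dict[str, int]:
    """Compare the per-frame peak count with the number of distinct track IDs."""
    seen = set()
    peak = 0
    for ids in frames:
        seen.update(ids)            # every ID ever observed
        peak = max(peak, len(ids))  # most animals visible at once
    return {"frames": len(frames), "peak_per_frame": peak, "unique_ids": len(seen)}


# Three animals are visible throughout, but two of them are re-identified
# with new IDs (3 -> 4, then 2 -> 5), which inflates the unique-ID total.
frames = [[1, 2, 3], [1, 2, 3], [1, 2, 4], [1, 5, 4]]
print(summarize_tracks(frames))
# {'frames': 4, 'peak_per_frame': 3, 'unique_ids': 5}
```

If the unique-track total is far above the per-frame counter, ID switches are a likely cause, and the per-frame count is the more reliable herd-size estimate.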


---

## 👨‍💻 About Labellerr's Hands-On Learning in Computer Vision

Thank you for exploring this **Labellerr Hands-On Computer Vision Cookbook**! We hope this notebook helped you learn, prototype, and accelerate your vision projects.  
Labellerr provides ready-to-run Jupyter/Colab notebooks for the latest models and real-world use cases in computer vision, AI agents, and data annotation.

---
## 🧑‍🔬 Check Our Popular YouTube Videos

Whether you're a beginner or a practitioner, our hands-on training videos are perfect for learning custom model building, computer vision techniques, and applied AI:

- [How to Fine-Tune YOLO on Custom Dataset](https://www.youtube.com/watch?v=pBLWOe01QXU)  
  Step-by-step guide to fine-tuning YOLO for real-world use: environment setup, annotation, training, validation, and inference.
- [Build a Real-Time Intrusion Detection System with YOLO](https://www.youtube.com/watch?v=kwQeokYDVcE)  
  Create an AI-powered system to detect intruders in real time using YOLO and computer vision.
- [Finding Athlete Speed Using YOLO](https://www.youtube.com/watch?v=txW0CQe_pw0)  
  Estimate real-time speed of athletes for sports analytics.
- [Object Counting Using AI](https://www.youtube.com/watch?v=smsjBBQcIUQ)  
  Learn dataset curation, annotation, and training for robust object counting AI applications.
---

## 🎦 Popular Labellerr YouTube Videos

Level up your skills and see video walkthroughs of these tools and notebooks on the  
[Labellerr YouTube Channel](https://www.youtube.com/@Labellerr/videos):

- [How I Fixed My Biggest Annotation Nightmare with Labellerr](https://www.youtube.com/watch?v=hlcFdiuz_HI) – Solving complex annotation for ML engineers.
- [Explore Your Dataset with Labellerr's AI](https://www.youtube.com/watch?v=LdbRXYWVyN0) – Auto-tagging, object counting, image descriptions, and dataset exploration.
- [Boost AI Image Annotation 10X with Labellerr's CLIP Mode](https://www.youtube.com/watch?v=pY_o4EvYMz8) – Refine annotations with precision using CLIP mode.
- [Boost Data Annotation Accuracy and Efficiency with Active Learning](https://www.youtube.com/watch?v=lAYu-ewIhTE) – Speed up your annotation workflow using Active Learning.

> 👉 **Subscribe** for Labellerr's deep learning, annotation, and AI tutorials, or watch videos directly alongside notebooks!

---

## 🤝 Stay Connected

- **Website:** [https://www.labellerr.com/](https://www.labellerr.com/)
- **Blog:** [https://www.labellerr.com/blog/](https://www.labellerr.com/blog/)
- **GitHub:** [Labellerr/Hands-On-Learning-in-Computer-Vision](https://github.com/Labellerr/Hands-On-Learning-in-Computer-Vision)
- **LinkedIn:** [Labellerr](https://in.linkedin.com/company/labellerr)
- **Twitter/X:** [@Labellerr1](https://x.com/Labellerr1)

*Happy learning and building with Labellerr!*
