# People Detection on Bus using RF-DETR Nano

This notebook demonstrates a complete computer vision pipeline for detecting people on buses using RF-DETR Nano, a compact variant of Roboflow's RF-DETR real-time detection transformer. The pipeline includes:

1. **Environment Setup**: Installing required dependencies with compatible versions
2. **Dataset Integration**: Downloading a labeled bus dataset from Roboflow
3. **Model Training**: Training an RF-DETR Nano model for people detection
4. **Model Export**: Packaging the trained model for deployment
5. **Video Inference**: Running frame-by-frame detection on recorded bus video footage

**Use Case**: This system can be used for:
- Passenger counting in public transportation
- Safety monitoring and capacity management
- Analytics for bus route optimization
- Automated surveillance systems

**Model Architecture**: RF-DETR Nano is a lightweight, real-time object detection model that trades some accuracy for speed, which makes it a good fit for edge deployment.

## 1. Install compatible packages

In [None]:
!pip uninstall numpy scipy supervision -y

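**Version Compatibility Notice**: The cell above removes any preinstalled numpy, scipy, and supervision so that the next cell can install pinned versions. RF-DETR training depends on these libraries, and pinning them avoids dependency conflicts with the packages already present in the notebook environment. Because `rfdetr` is installed afterwards, pip may still change these versions to satisfy its own requirements, so the `pip show` cell checks what actually ended up installed.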

In [None]:
!pip install numpy==1.23.5 scipy==1.10.1 supervision==0.17.1

In [None]:
!pip install "rfdetr[metrics,onnxexport]" roboflow

In [None]:
!pip show numpy scipy supervision | grep -E 'Name:|Version:'
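The same check can be done from Python, which makes it easy to fail fast inside the notebook. The sketch below uses the standard-library `importlib.metadata` module; the `PINNED` dictionary mirrors the versions pinned above, and `check_pins` is a hypothetical helper introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the install cell above
PINNED = {"numpy": "1.23.5", "scipy": "1.10.1", "supervision": "0.17.1"}


def check_pins(pinned, lookup=version):
    """Return {package: (expected, installed)} for every package that does not match.

    `installed` is None when the package is missing entirely.
    `lookup` defaults to importlib.metadata.version and can be swapped out for testing.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    if problems:
        for name, (expected, installed) in problems.items():
            print(f"{name}: expected {expected}, found {installed}")
    else:
        print("All pinned versions are installed.")
```

A mismatch reported right after installing `rfdetr` can mean pip adjusted a pinned package to satisfy `rfdetr`'s own dependency constraints.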

## 2. Download the labeled dataset from Roboflow

**Dataset Overview**: We're using a custom-labeled dataset from Roboflow that contains bus images with people annotations. The dataset is in COCO format, which is compatible with RF-DETR training requirements.

**Dataset Details**:
- **Version**: 4 (latest annotated version)
- **Format**: COCO (Common Objects in Context)

In [None]:
from roboflow import Roboflow
rf = Roboflow(api_key="YOUR-API-KEY")
project = rf.workspace("test-yiwln").project("bus-labeling-zedej")
version = project.version(4)
dataset = version.download("coco")

In [None]:
DATASET_DIR_NAME = dataset.location.split('/')[-1]
!ls -l {DATASET_DIR_NAME}
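RF-DETR expects a COCO-format dataset split into `train`, `valid`, and `test` subfolders, each containing its own `_annotations.coco.json`. A quick sanity check of that layout, plus image and annotation counts per split, catches export problems before training starts. The helper `summarize_coco_splits` is hypothetical; pass it `dataset.location` from the download cell.

```python
import json
from pathlib import Path

SPLITS = ("train", "valid", "test")
ANNOTATION_NAME = "_annotations.coco.json"


def summarize_coco_splits(dataset_dir):
    """Return {split: (num_images, num_annotations)}; raise if a split's annotation file is missing."""
    root = Path(dataset_dir)
    summary = {}
    for split in SPLITS:
        ann_path = root / split / ANNOTATION_NAME
        if not ann_path.is_file():
            raise FileNotFoundError(f"Missing annotation file: {ann_path}")
        with ann_path.open() as f:
            coco = json.load(f)
        # Count entries in the standard COCO top-level lists
        summary[split] = (len(coco.get("images", [])), len(coco.get("annotations", [])))
    return summary
```

For example, `summarize_coco_splits(dataset.location)` should report non-zero image counts for all three splits.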

## 3. Training the RF-DETR Nano Model

**Training Configuration**:
- **Model**: RF-DETR Nano (optimized for speed and efficiency)
- **Epochs**: up to 20 (early stopping may end training sooner)
- **Batch Size**: 4 (adjusted for GPU memory constraints)
- **Learning Rate**: 1e-4 (a conservative value chosen for stable convergence)
- **Gradient Accumulation**: 4 steps (effective batch size = 4 × 4 = 16)
- **Early Stopping**: Enabled with a patience of 3 epochs, so training stops once the validation metric fails to improve for 3 consecutive epochs, which limits overfitting
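Patience-based early stopping can be sketched as below. This is a generic illustration, not RF-DETR's internal implementation: it assumes one validation score per epoch where higher is better (for example mAP), and the `min_delta` margin and the `stopping_epoch` helper are assumptions introduced here.

```python
def stopping_epoch(val_scores, patience=3, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None if it never stops.

    Training stops once `patience` consecutive epochs fail to beat the best
    score so far by more than `min_delta`.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores):
        if score > best + min_delta:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Made-up scores: the best value appears at epoch 2, then stalls for 3 epochs
scores = [0.30, 0.42, 0.51, 0.50, 0.51, 0.49, 0.55]
print(stopping_epoch(scores, patience=3))  # prints 5 (epoch 6 is never reached)
```

The example also shows the cost of a small patience: a later improvement (0.55 at epoch 6) is never seen, which is why patience is a trade-off between training time and final accuracy.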

**Features**:
- TensorBoard logging for training monitoring
- CUDA acceleration for faster training
- Automatic best checkpoint saving
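After training, the output directory can be inspected to confirm that checkpoints and TensorBoard logs were written. The sketch below relies only on the `.pth` checkpoint extension used in this notebook (`checkpoint_best_total.pth`) and TensorBoard's standard `events.out.tfevents.*` file prefix; `list_training_outputs` is a hypothetical helper, and no other RF-DETR output files are assumed.

```python
from pathlib import Path


def list_training_outputs(output_dir):
    """Return (checkpoint files, TensorBoard event files) under output_dir as sorted relative paths."""
    root = Path(output_dir)
    checkpoints = sorted(str(p.relative_to(root)) for p in root.rglob("*.pth"))
    event_files = sorted(str(p.relative_to(root)) for p in root.rglob("events.out.tfevents*"))
    return checkpoints, event_files
```

For example, after the training cell, `checkpoints, events = list_training_outputs(OUTPUT_PATH)` should list `checkpoint_best_total.pth`, and the same folder can be passed to `tensorboard --logdir` to view the training curves.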

In [None]:
from rfdetr import RFDETRNano
import os

# Paths inside the Kaggle working directory
DATASET_NAME = DATASET_DIR_NAME
DATASET_PATH = f"/kaggle/working/{DATASET_NAME}"
OUTPUT_PATH = "/kaggle/working/rfdetr_results"

os.makedirs(OUTPUT_PATH, exist_ok=True)

# Use the Nano variant of RF-DETR
model = RFDETRNano()

# Training phase
try:
    model.train(
        dataset_dir=DATASET_PATH,
        output_dir=OUTPUT_PATH,
        epochs=20,
        batch_size=4,
        grad_accum_steps=4,       # effective batch size = 4 * 4 = 16
        lr=1e-4,
        early_stopping=True,
        early_stopping_patience=3,
        tensorboard=True,         # write TensorBoard event files to OUTPUT_PATH
        device="cuda"             # train on the GPU
    )
    print("Done Training!")

except Exception as e:
    print(f"Training failed: {e}")
    raise  # re-raise so later cells don't run against a missing checkpoint
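The reason `batch_size=4` with `grad_accum_steps=4` behaves like a batch of 16: gradients from four micro-batches are summed before a single optimizer step, and scaling each micro-batch's mean gradient by 1/4 makes that sum equal to the mean over all 16 samples. The toy example below checks this identity with a one-parameter squared-error loss. It is a conceptual sketch, not RF-DETR's training loop, and the data values are made up.

```python
def grad_one(w, x, y):
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    return 2 * x * (w * x - y)


def full_batch_grad(w, xs, ys):
    """Mean gradient over the whole batch in one pass."""
    return sum(grad_one(w, x, y) for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch_size, accum_steps):
    """Sum of micro-batch mean gradients, each scaled by 1/accum_steps."""
    total = 0.0
    for step in range(accum_steps):
        start = step * micro_batch_size
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        micro_mean = sum(grad_one(w, x, y) for x, y in zip(mx, my)) / len(mx)
        total += micro_mean / accum_steps
    return total


# 16 synthetic samples = 4 micro-batches of 4 (batch_size=4, grad_accum_steps=4)
xs = [float(i) for i in range(1, 17)]
ys = [2.0 * x + 1.0 for x in xs]
w = 0.5
print(full_batch_grad(w, xs, ys), accumulated_grad(w, xs, ys, 4, 4))  # prints -297.5 -297.5
```

The identity holds exactly here because all micro-batches have the same size; with a smaller final micro-batch, the plain 1/accum_steps scaling would weight its samples slightly differently.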

## 4. Export Training Results

**Model Packaging**: Compress the training results, including:
- Best model checkpoint (`checkpoint_best_total.pth`)
- Training logs and metrics
- TensorBoard event files
- Model configuration files

This ZIP file can be downloaded for deployment or further analysis.

In [None]:
!cd /kaggle/working && zip -r rfdetr_results.zip rfdetr_results
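The same archive can be built from Python with the standard-library `shutil.make_archive`, which avoids depending on the `zip` binary and stores entries relative to the results folder's parent rather than as absolute `/kaggle/working/...` paths. A minimal sketch; `zip_results` is a hypothetical helper name:

```python
import shutil
from pathlib import Path


def zip_results(results_dir, archive_base=None):
    """Zip results_dir into <archive_base>.zip and return the archive path.

    Entries are stored as '<folder name>/...', not as absolute paths.
    """
    results = Path(results_dir).resolve()
    if archive_base is None:
        # e.g. /kaggle/working/rfdetr_results -> /kaggle/working/rfdetr_results.zip
        archive_base = str(results)
    return shutil.make_archive(
        base_name=archive_base,
        format="zip",
        root_dir=results.parent,  # paths in the archive are relative to this folder
        base_dir=results.name,    # only this subfolder is archived
    )
```

For example, `zip_results("/kaggle/working/rfdetr_results")` writes `rfdetr_results.zip` next to the results folder, so the archive never tries to include itself.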

## 5. Running Video Inference

**Detection Pipeline**: Apply the trained model to bus video footage, frame by frame, to detect people.

**Pipeline Components**:
1. **Model Loading**: Load the best trained checkpoint
2. **Class Mapping**: Extract class names from COCO annotations
3. **Video Processing**: Frame-by-frame detection with confidence threshold (0.5)
4. **Visualization**: Draw bounding boxes and confidence scores
5. **Output**: Generate annotated video showing detected people

**Input Requirements**:
- Source video: Bus footage for analysis
- Trained model checkpoint
- COCO annotation file for class names

**Output**: Processed video with bounding boxes around detected people, including class labels and confidence scores.
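The class-mapping step reads the `categories` list from the COCO annotation file. Roboflow COCO exports can include an extra parent category, so category ids are safest treated as dictionary keys rather than list positions. The sketch below shows that mapping and the label format drawn on each frame; the `build_class_map` and `format_label` helpers and the sample categories are hypothetical.

```python
import json


def build_class_map(coco_json_text):
    """Map COCO category id -> category name from an annotation file's contents."""
    coco = json.loads(coco_json_text)
    return {cat["id"]: cat["name"] for cat in coco["categories"]}


def format_label(class_id, confidence, class_map):
    """Build an on-screen label such as 'person 0.87'; unknown ids fall back to the raw id."""
    name = class_map.get(int(class_id), str(class_id))
    return f"{name} {confidence:.2f}"


# Hypothetical categories: id 0 is a parent category, id 1 is the real class
sample = json.dumps({"categories": [
    {"id": 0, "name": "people", "supercategory": "none"},
    {"id": 1, "name": "person", "supercategory": "people"},
]})
class_map = build_class_map(sample)
print(format_label(1, 0.873, class_map))  # prints "person 0.87"
```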

In [None]:
import os
import json
import numpy as np
import supervision as sv
from rfdetr import RFDETRNano

# Setting paths
CHECKPOINT_PATH = "/kaggle/working/rfdetr_results/checkpoint_best_total.pth"
ANNOTATION_FILE_PATH = "/kaggle/working/Bus-Labeling-4/train/_annotations.coco.json"
SOURCE_VIDEO_PATH = "/kaggle/input/test-video-1/2015_05_09_21_38_22BackColor.mp4"
TARGET_VIDEO_PATH = "/kaggle/working/result_video.mp4"

# Checking paths
if not os.path.exists(SOURCE_VIDEO_PATH):
    print(f"Error: Video doesn't exist at '{SOURCE_VIDEO_PATH}'")
else:
    # Map COCO category id -> name (ids may not start at 0 or be contiguous)
    class_names = {}
    if not os.path.exists(ANNOTATION_FILE_PATH):
        print(f"Cannot find: {ANNOTATION_FILE_PATH}")
    else:
        with open(ANNOTATION_FILE_PATH, 'r') as f:
            coco_data = json.load(f)
        class_names = {cat['id']: cat['name'] for cat in coco_data['categories']}

    # Load the fine-tuned checkpoint
    model = RFDETRNano(pretrain_weights=CHECKPOINT_PATH)

    # Annotators for bounding boxes and text labels
    box_annotator = sv.BoxAnnotator()
    label_annotator = sv.LabelAnnotator()

    def process_frame(frame: np.ndarray, _: int) -> np.ndarray:
        # sv.process_video yields BGR frames (OpenCV order); convert to RGB for the model
        rgb_frame = frame[:, :, ::-1].copy()
        detections = model.predict(rgb_frame, threshold=0.5)

        annotated_frame = box_annotator.annotate(
            scene=frame.copy(),
            detections=detections,
        )
        if class_names:
            labels = [
                f"{class_names.get(class_id, class_id)} {confidence:.2f}"
                for class_id, confidence in zip(detections.class_id, detections.confidence)
            ]
            annotated_frame = label_annotator.annotate(
                scene=annotated_frame,
                detections=detections,
                labels=labels,
            )
        return annotated_frame

    # Run inference on every frame and write the annotated video
    sv.process_video(
        source_path=SOURCE_VIDEO_PATH,
        target_path=TARGET_VIDEO_PATH,
        callback=process_frame
    )
    print(f"Complete! Annotated video saved to {TARGET_VIDEO_PATH}")