# 🌊 YOLO-UDD v2.0 - Underwater Debris Detection

**Complete Training Pipeline on Google Colab with GPU** ⚡ **OPTIMIZED**

## 🚀 Quick Start:
1. **Upload Dataset**: Upload your TrashCAN dataset to Google Drive (COCO format supported)
2. **Enable GPU**: Runtime → Change runtime type → GPU (T4 or better)
3. **Check Paths**: Dataset paths are pre-configured for `/content/drive/My Drive/trashcan_dataset/`; edit them only if your dataset lives elsewhere
4. **Run All**: Runtime → Run all (or run cells sequentially)
5. **Monitor**: Training time depends on the GPU, the number of epochs, and the dataset size; the configuration cell in Step 5 prints an estimate

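Training time can be approximated from the dataset size, batch size, and epoch count. The sketch below is a minimal estimate that assumes a constant time per iteration; the helper name `estimate_training_minutes` and the default of 0.5 s per iteration are illustrative assumptions, and real throughput varies with the GPU and data loading.

```python
def estimate_training_minutes(num_images, batch_size, epochs, sec_per_iter=0.5):
    """Rough training-time estimate in minutes (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Integer division: a trailing partial batch is ignored in this estimate.
    iterations_per_epoch = num_images // batch_size
    return iterations_per_epoch * epochs * sec_per_iter / 60


if __name__ == "__main__":
    # Example: 1,000 images, batch size 8, 10 epochs (125 iterations per epoch)
    print(f"{estimate_training_minutes(1000, 8, 10):.1f} minutes")
```

Measuring the real time per iteration during the first epoch and passing it as `sec_per_iter` gives a far better figure than any default.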
## ⚡ Performance Optimizations:
- **No dataset copying**: Uses symlinks (saves 5-10 minutes setup time)
- **Direct Drive training**: Results saved directly to Drive in real-time
- **Auto-save checkpoints**: Every epoch saved automatically
- **Session-safe**: Results preserved even if Colab disconnects

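The symlink approach mentioned above could look like the following minimal sketch. It assumes a POSIX filesystem such as the Colab VM; the helper name `link_dataset` and the paths in the demo are hypothetical and are not taken from the repository.

```python
import os
import tempfile


def link_dataset(source_dir, link_path):
    """Expose source_dir at link_path through a symlink (no files are copied)."""
    if os.path.islink(link_path):
        os.unlink(link_path)  # replace a stale link from an earlier run
    elif os.path.exists(link_path):
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.makedirs(os.path.dirname(link_path) or ".", exist_ok=True)
    os.symlink(source_dir, link_path, target_is_directory=True)
    return link_path


if __name__ == "__main__":
    # Demonstrate with temporary directories instead of real Drive paths.
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "trashcan_dataset")
        os.makedirs(source)
        link = link_dataset(source, os.path.join(tmp, "repo", "data", "trashcan"))
        print(os.path.islink(link))
```

A symlink avoids the setup-time copy, but every image read still goes through the mounted Drive, so per-file access can be slower than reading from local disk.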
## 📋 Prerequisites:
- TrashCAN dataset uploaded to Google Drive at: `/content/drive/My Drive/trashcan_dataset/`
- Dataset structure: COCO format with `instances_train_trashcan.json` and `instances_val_trashcan.json`
- Images in `original_data/images/` folder
- Sufficient Drive storage (~2-3 GB for dataset + checkpoints)

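Before uploading, it can help to confirm that an annotation file really follows the COCO layout. The sketch below is a minimal check, not the repository's own validation: `check_coco` is a hypothetical helper that looks for the three top-level COCO keys and for annotations that point at unknown images or categories. To apply it to a real file, load the file with `json.load` and pass the resulting dict.

```python
REQUIRED_KEYS = ("images", "annotations", "categories")


def check_coco(data):
    """Return a list of problems found in a COCO-style dict (empty if none)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in data]
    if problems:
        return problems
    image_ids = {img["id"] for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    for ann in data["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id")
    if not data["annotations"]:
        problems.append("no annotations")
    return problems


if __name__ == "__main__":
    # Tiny made-up example; the file and category names are placeholders.
    sample = {
        "images": [{"id": 1, "file_name": "frame_0001.jpg"}],
        "annotations": [{"id": 10, "image_id": 1, "category_id": 3,
                         "bbox": [5, 5, 20, 20]}],
        "categories": [{"id": 3, "name": "example_category"}],
    }
    print(check_coco(sample))
```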
---

## Step 1: Setup Environment

In [None]:
# Clone repository
import os
import sys

# Ensure we're in /content directory first
try:
    os.chdir('/content')
except OSError:
    # /content only exists on Colab; stay in the current directory elsewhere
    pass

# Remove existing directory if present
if os.path.exists('/content/YOLO-UDD-v2.0'):
    import shutil
    shutil.rmtree('/content/YOLO-UDD-v2.0')
    print("✓ Cleaned existing directory")

# Clone fresh
print("Cloning repository...")
!git clone https://github.com/kshitijkhede/YOLO-UDD-v2.0.git /content/YOLO-UDD-v2.0

# Verify clone succeeded
if not os.path.exists('/content/YOLO-UDD-v2.0'):
    raise FileNotFoundError("Failed to clone repository. Please check your internet connection.")

# Change to repo directory
os.chdir('/content/YOLO-UDD-v2.0')
sys.path.insert(0, '/content/YOLO-UDD-v2.0')

print("\n" + "="*60)
print("✓ Repository cloned successfully!")
print(f"✓ Working directory: {os.getcwd()}")
print("="*60)

Cloning into 'YOLO-UDD-v2.0'...
remote: Enumerating objects: 113, done.
remote: Counting objects: 100% (113/113), done.
remote: Compressing objects: 100% (79/79), done.
remote: Total 113 (delta 43), reused 99 (delta 31), pack-reused 0 (from 0)
Receiving objects: 100% (113/113), 83.24 KiB | 3.62 MiB/s, done.
Resolving deltas: 100% (43/43), done.
/content/YOLO-UDD-v2.0

✓ Repository cloned successfully!


In [None]:
# Verify repository structure
import os

print("="*60)
print("📂 Repository Structure")
print("="*60)

required_dirs = ['models', 'scripts', 'data', 'utils', 'configs']
required_files = ['requirements.txt', 'models/__init__.py', 'scripts/train.py']

for dir_name in required_dirs:
    status = "✓" if os.path.exists(dir_name) else "✗"
    print(f"{status} {dir_name}/")

print()
for file_name in required_files:
    status = "✓" if os.path.exists(file_name) else "✗"
    print(f"{status} {file_name}")

print("="*60)

# List models package
if os.path.exists('models'):
    print("\n📦 Models package contents:")
    for item in os.listdir('models'):
        print(f"  - {item}")
print("="*60)

In [2]:
# Check GPU availability
import torch

print("="*60)
print("GPU Status Check")
print("="*60)

if torch.cuda.is_available():
    print(f"✓ GPU Available: {torch.cuda.get_device_name(0)}")
    print(f"✓ CUDA Version: {torch.version.cuda}")
    print(f"✓ GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("⚠️  No GPU detected!")
    print("   Go to: Runtime → Change runtime type → GPU")

!nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv

GPU Status Check
✓ GPU Available: Tesla T4
✓ CUDA Version: 12.6
✓ GPU Memory: 15.83 GB
name, driver_version, memory.total [MiB]
Tesla T4, 550.54.15, 15360 MiB


## Step 1.5: Mount Google Drive & Download Dataset

In [None]:
# Mount Google Drive and download dataset
from google.colab import drive
import os
import json

# Mount Drive
print("="*60)
print("Mounting Google Drive...")
print("="*60)
drive.mount('/content/drive')

# Download dataset from Google Drive using gdown
GDRIVE_FILE_ID = '10PCbGqgVi0-XQn0EfGTTfSjwNS0JXR99'

print("\n" + "="*60)
print("📦 Downloading TrashCAN Dataset from Google Drive...")
print("="*60)

!pip install -q gdown
# Pass the file ID positionally: the --id flag was deprecated in gdown 4.3.1 and removed in 5.0
!gdown {GDRIVE_FILE_ID} -O /content/trashcan.zip

# Verify download
if os.path.exists('/content/trashcan.zip'):
    file_size = os.path.getsize('/content/trashcan.zip') / 1024 / 1024
    print(f"\n✓ Downloaded: {file_size:.1f} MB")
    
    # Extract dataset
    print("\n" + "="*60)
    print("📂 Extracting dataset...")
    print("="*60)
    !unzip -q /content/trashcan.zip -d /content/
    
    # Verify extraction
    if os.path.exists('/content/trashcan'):
        print("✅ Dataset extracted successfully!")
        
        # Show structure
        print("\n" + "="*60)
        print("📋 Dataset Structure:")
        print("="*60)
        for item in sorted(os.listdir('/content/trashcan')):
            path = f'/content/trashcan/{item}'
            if os.path.isdir(path):
                count = len(os.listdir(path))
                print(f"  📁 {item}/ ({count} items)")
            else:
                size = os.path.getsize(path) / 1024
                print(f"  📄 {item} ({size:.1f} KB)")
        
        # Verify annotations
        print("\n" + "="*60)
        print("🔍 Verifying Annotations:")
        print("="*60)
        
        for split in ['train', 'val']:
            json_path = f'/content/trashcan/annotations/{split}.json'
            if os.path.exists(json_path):
                with open(json_path) as f:
                    data = json.load(f)
                
                imgs = len(data.get('images', []))
                anns = len(data.get('annotations', []))
                cats = len(data.get('categories', []))
                
                print(f"\n  {split.upper()}:")
                print(f"    Images:      {imgs:,}")
                print(f"    Annotations: {anns:,}")
                print(f"    Categories:  {cats}")
                
                if anns > 0:
                    print(f"    ✅ Ready for training!")
                else:
                    print(f"    ❌ ERROR: No annotations found!")
            else:
                print(f"\n  ❌ {split.upper()}: File not found!")
        
        print("\n" + "="*60)
        print("✅ Dataset setup complete!")
        print("="*60)
    else:
        print("❌ Extraction failed!")
        print("   Please check the ZIP file structure")
else:
    print("❌ Download failed!")
    print("   Please check the File ID and internet connection")

In [None]:
# Move dataset to correct location for training scripts
import os
import shutil

print("="*60)
print("📁 Moving dataset to correct location...")
print("="*60)

# Create the expected directory structure
os.makedirs('/content/YOLO-UDD-v2.0/data', exist_ok=True)

# Check if dataset is in /content/trashcan
if os.path.exists('/content/trashcan'):
    print("✓ Found dataset in /content/trashcan")
    
    # Check if target exists and remove it
    target = '/content/YOLO-UDD-v2.0/data/trashcan'
    if os.path.exists(target):
        print("✓ Removing old dataset at target location...")
        shutil.rmtree(target)
    
    # Move dataset to correct location
    print("✓ Moving dataset to correct location...")
    shutil.move('/content/trashcan', target)
    
    # Verify the move
    if os.path.exists(target):
        print(f"\n✅ Dataset moved successfully to: {target}")
        
        # Verify structure
        if os.path.exists(f'{target}/images'):
            imgs = [f for f in os.listdir(f'{target}/images') if f.endswith('.jpg')]
            print(f"   📸 Images: {len(imgs):,} files")
        
        if os.path.exists(f'{target}/annotations'):
            anns = os.listdir(f'{target}/annotations')
            print(f"   📋 Annotations: {len(anns)} files")
            
            # Verify JSON contents
            import json
            for split in ['train', 'val']:
                json_path = f'{target}/annotations/{split}.json'
                if os.path.exists(json_path):
                    with open(json_path) as f:
                        data = json.load(f)
                    print(f"   ✓ {split}.json: {len(data.get('annotations', [])):,} annotations")
        
        print("\n✅ Dataset is ready for training!")
    else:
        print("❌ Move failed!")
        
elif os.path.exists('/content/YOLO-UDD-v2.0/data/trashcan'):
    print("✅ Dataset already in correct location!")
    target = '/content/YOLO-UDD-v2.0/data/trashcan'
    
    # Verify it's good
    if os.path.exists(f'{target}/images'):
        imgs = [f for f in os.listdir(f'{target}/images') if f.endswith('.jpg')]
        print(f"   📸 Images: {len(imgs):,} files")
    
    if os.path.exists(f'{target}/annotations/train.json'):
        print(f"   ✓ train.json exists")
    if os.path.exists(f'{target}/annotations/val.json'):
        print(f"   ✓ val.json exists")
    
    print("\n✅ Dataset is ready for training!")
else:
    print("❌ Dataset not found!")
    print("   Please re-run the dataset download cell above")

print("="*60)

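The cells above confirm that the annotation files and the image folder exist, but not that every annotated image is actually on disk. The sketch below shows one way to check this. It assumes a flat `images/` folder in which each COCO `file_name` is a bare file name; `find_missing_images` is a hypothetical helper, not part of the repository.

```python
import os
import tempfile


def find_missing_images(coco, images_dir):
    """Return file names listed in a COCO dict that are absent from images_dir."""
    on_disk = set(os.listdir(images_dir))
    return [img["file_name"] for img in coco.get("images", [])
            if img["file_name"] not in on_disk]


if __name__ == "__main__":
    # Demonstrate with a temporary folder holding one of two referenced images.
    with tempfile.TemporaryDirectory() as images_dir:
        open(os.path.join(images_dir, "present.jpg"), "wb").close()
        coco = {"images": [{"id": 1, "file_name": "present.jpg"},
                           {"id": 2, "file_name": "absent.jpg"}]}
        print(find_missing_images(coco, images_dir))
```

An empty result means every image referenced by the annotations was found; any names returned point to files that the data loader would fail to open.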
## Step 2: Install Dependencies

In [3]:
# Install all required packages
print("Installing dependencies...")

!pip install -q torch torchvision torchaudio
!pip install -q albumentations opencv-python-headless
!pip install -q tensorboard pyyaml tqdm

# Verify installations
import torch
import albumentations as A
import cv2
import yaml
from tqdm import tqdm

print("\n" + "="*60)
print("✓ All dependencies installed successfully!")
print(f"✓ PyTorch: {torch.__version__}")
print(f"✓ Albumentations: {A.__version__}")
print(f"✓ OpenCV: {cv2.__version__}")
print("="*60)

Installing dependencies...

✓ All dependencies installed successfully!
✓ PyTorch: 2.8.0+cu126
✓ Albumentations: 2.0.8
✓ OpenCV: 4.12.0


## Step 3.5: Check GPU Memory & Clear Cache (Important for Training!)

In [None]:
# Check GPU memory status and clear cache
import torch
import gc
import subprocess

print("="*60)
print("🔍 GPU Memory Status Check")
print("="*60)

if torch.cuda.is_available():
    # Get GPU info
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory_total = torch.cuda.get_device_properties(0).total_memory / 1024**3
    gpu_memory_allocated = torch.cuda.memory_allocated(0) / 1024**3
    gpu_memory_reserved = torch.cuda.memory_reserved(0) / 1024**3
    # "Free" here only accounts for memory reserved by this PyTorch process
    gpu_memory_free = gpu_memory_total - gpu_memory_reserved

    print(f"GPU Device:       {gpu_name}")
    print(f"Total Memory:     {gpu_memory_total:.2f} GB")
    print(f"Allocated:        {gpu_memory_allocated:.2f} GB")
    print(f"Reserved:         {gpu_memory_reserved:.2f} GB")
    print(f"Free:             {gpu_memory_free:.2f} GB")

    # Check for other processes using GPU
    print("\n" + "-"*60)
    print("Checking for other GPU processes...")
    try:
        result = subprocess.run(['nvidia-smi', '--query-compute-apps=pid,used_memory', '--format=csv,noheader,nounits'],
                              capture_output=True, text=True, timeout=5)
        if result.stdout.strip():
            print("⚠️  Other processes using GPU:")
            print(result.stdout)
        else:
            print("✓ No other GPU processes detected")
    except (OSError, subprocess.SubprocessError):
        # nvidia-smi missing or timed out; the check is informational only
        print("✓ Could not check GPU processes (this is OK)")

    # Clear GPU cache
    print("\n" + "-"*60)
    print("Clearing GPU cache...")
    torch.cuda.empty_cache()
    gc.collect()

    # Show memory after clearing
    gpu_memory_allocated_after = torch.cuda.memory_allocated(0) / 1024**3
    gpu_memory_reserved_after = torch.cuda.memory_reserved(0) / 1024**3
    gpu_memory_free_after = gpu_memory_total - gpu_memory_reserved_after

    print(f"After clearing:")
    print(f"  Allocated:      {gpu_memory_allocated_after:.2f} GB (freed {gpu_memory_allocated - gpu_memory_allocated_after:.2f} GB)")
    print(f"  Reserved:       {gpu_memory_reserved_after:.2f} GB (freed {gpu_memory_reserved - gpu_memory_reserved_after:.2f} GB)")
    print(f"  Free:           {gpu_memory_free_after:.2f} GB")

    # Warning if low memory
    if gpu_memory_free_after < 8.0:
        print("\n⚠️  WARNING: Low GPU memory available!")
        print("   You may encounter out-of-memory errors during training.")
        print("   Solutions:")
        print("   1. Restart runtime (Runtime → Restart runtime)")
        print("   2. Use smaller batch size (batch_size=2 or 4)")
        print("   3. Close other GPU-heavy notebooks")
    else:
        print("\n✓ GPU memory looks good for training!")

else:
    print("⚠️  GPU not available - will use CPU (slower)")

print("="*60)

## Step 4: Test Model Architecture

In [None]:
# Import and test the model
import sys
import os

# Ensure we're in the repo directory and add to path
repo_root = '/content/YOLO-UDD-v2.0'
os.chdir(repo_root)
sys.path.insert(0, repo_root)

from models import build_yolo_udd
import torch

print("Building YOLO-UDD v2.0 model...\n")

# Build model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = build_yolo_udd(num_classes=22, pretrained=None)
model = model.to(device)

# Get model info
model_info = model.get_model_info()

print("="*60)
print("YOLO-UDD v2.0 Model Information")
print("="*60)
for key, value in model_info.items():
    print(f"{key}: {value}")
print("="*60)

# Test forward pass
print("\nTesting forward pass...")
dummy_input = torch.randn(2, 3, 640, 640).to(device)

with torch.no_grad():
    predictions, turb_score = model(dummy_input)

print(f"\n✓ Forward pass successful!")
print(f"✓ Number of detection scales: {len(predictions)}")
print(f"✓ Turbidity score shape: {turb_score.shape}")
print(f"✓ Device: {device}")
print("\n" + "="*60)

Building YOLO-UDD v2.0 model...

YOLO-UDD v2.0 Model Information
Architecture: YOLO-UDD v2.0
Backbone: YOLOv9c
Neck: PSEM-enhanced PANet + TAFM
Head: SDWH
Total Parameters: 60,627,051
Trainable Parameters: 60,627,051
Input Size: 640x640
Output Classes: 3

Testing forward pass...

✓ Forward pass successful!
✓ Number of detection scales: 3
✓ Turbidity score shape: torch.Size([2, 1, 1, 1])
✓ Device: cuda



## Step 5: Configure Training Parameters

In [None]:
# Training configuration - Save to Drive
import os
from datetime import datetime
import torch
import gc

# Clear GPU cache before training
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    gc.collect()
    print("✓ Cleared GPU cache")

# OPTIMIZED SETTINGS for faster training
BATCH_SIZE = 8       # Increased from 4 (faster training, still safe for most GPUs)
EPOCHS = 100         # Reduced for faster training (still good results)
LEARNING_RATE = 0.01
NUM_WORKERS = 4      # Increased from 2 for faster data loading

# Save directly to Google Drive (preserves results even if session disconnects!)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
SAVE_DIR = f'/content/drive/MyDrive/YOLO-UDD-Results/run_{timestamp}'
os.makedirs(SAVE_DIR, exist_ok=True)

print("="*60)
print("🚀 Training Configuration (SPEED OPTIMIZED)")
print("="*60)
print(f"Batch Size:     {BATCH_SIZE} (balanced for speed & memory)")
print(f"Epochs:         {EPOCHS}")
print(f"Learning Rate:  {LEARNING_RATE}")
print(f"Num Workers:    {NUM_WORKERS} (increased for faster data loading)")
print(f"Save Directory: {SAVE_DIR}")
print(f"Device:         {'GPU (CUDA)' if torch.cuda.is_available() else 'CPU'}")
if torch.cuda.is_available():
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
    gpu_allocated = torch.cuda.memory_allocated(0) / 1024**3
    gpu_reserved = torch.cuda.memory_reserved(0) / 1024**3
    gpu_free = gpu_memory - gpu_reserved
    print(f"GPU Memory:     {gpu_memory:.2f} GB total, {gpu_free:.2f} GB free")
    
    # Auto-adjust batch size based on available memory
    if gpu_free < 6.0:
        print("\n⚠️  Low GPU memory detected! Reducing batch size to 4...")
        BATCH_SIZE = 4
    elif gpu_free > 10.0:
        print("\n✓ High GPU memory available! Batch size 8 is optimal")
    
print("\n💾 Intermediate results will be saved to Drive automatically!")
print("   ✓ Checkpoints saved after each epoch")
print("   ✓ TensorBoard logs updated in real-time")
print("   ✓ Results preserved even if session disconnects")
print("\n⚡ Speed Optimizations Applied:")
print(f"   ✓ Batch size: {BATCH_SIZE} (reduced to 4 automatically when GPU memory is low)")
print(f"   ✓ Workers: {NUM_WORKERS} (parallel data loading)")
print("   ✓ GPU cache cleared before training")
print("\n⏱️  Estimated Training Time:")
# 5769 is a hard-coded dataset size (presumably the training-image count); adjust it for other datasets
iterations_per_epoch = 5769 // BATCH_SIZE
total_time_minutes = (iterations_per_epoch * EPOCHS * 0.5) / 60  # assumes ~0.5 s per iteration
print(f"   • {iterations_per_epoch} iterations/epoch")
print(f"   • ~{total_time_minutes:.1f} minutes for {EPOCHS} epochs on GPU")
print("="*60)

🚀 Training Configuration
Batch Size:     16
Epochs:         10
Learning Rate:  0.01
Num Workers:    2
Save Directory: runs/train
Device:         GPU (CUDA)

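The batch-size adjustment in the cell above can be isolated into a small function, which makes the threshold easy to test and change. This is a minimal sketch that mirrors the 6 GB cut-off and the 8/4 batch sizes used in the cell; the name `pick_batch_size` and its parameters are hypothetical.

```python
def pick_batch_size(free_gb, default=8, low_memory_threshold_gb=6.0, fallback=4):
    """Return `fallback` when free GPU memory is below the threshold, else `default`."""
    return fallback if free_gb < low_memory_threshold_gb else default


if __name__ == "__main__":
    # Made-up free-memory values in GB, for illustration only
    for free_gb in (4.5, 6.0, 14.7):
        print(f"{free_gb:5.1f} GB free -> batch size {pick_batch_size(free_gb)}")
```

The cell computes free memory as total minus the memory reserved by PyTorch, which ignores other processes on the GPU. `torch.cuda.mem_get_info()` returns the free and total bytes reported by the driver and may be a closer input for this function.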

## Step 5.5: Verify Dataset (Optional - Run if training fails)

In [None]:
# Diagnose Dataset Issues
import os
import json

print("="*60)
print("🔍 Dataset Diagnostic")
print("="*60)

# Check dataset structure
target_path = '/content/YOLO-UDD-v2.0/data/trashcan'
print(f"\n📂 Dataset location: {target_path}")
print(f"   Exists: {os.path.exists(target_path)}")

if os.path.exists(target_path):
    print(f"\n📋 Contents:")
    for item in os.listdir(target_path):
        path = os.path.join(target_path, item)
        if os.path.isdir(path):
            count = len(os.listdir(path))
            print(f"   📁 {item}/ ({count} items)")
        else:
            size = os.path.getsize(path) / 1024
            print(f"   📄 {item} ({size:.1f} KB)")

# Check JSON files
print(f"\n🔎 Analyzing COCO JSON files:")
for split in ['train', 'val']:
    # Try the correct path first: annotations/train.json or annotations/val.json
    json_path = os.path.join(target_path, 'annotations', f'{split}.json')

    # If not found, try the old naming convention
    if not os.path.exists(json_path):
        json_path = os.path.join(target_path, f'instances_{split}_trashcan.json')

    if os.path.exists(json_path):
        with open(json_path, 'r') as f:
            data = json.load(f)

        images = data.get('images', [])
        annotations = data.get('annotations', [])
        categories = data.get('categories', [])

        print(f"\n  {split.upper()}:")
        print(f"    JSON file: {os.path.basename(json_path)} ✅")
        print(f"    Images:      {len(images):,}")
        print(f"    Annotations: {len(annotations):,}")
        print(f"    Categories:  {len(categories)}")

        if len(annotations) == 0:
            print(f"    ⚠️  WARNING: No annotations found!")
            if len(images) > 0:
                print(f"       Dataset has {len(images)} images but 0 annotations")
                print(f"       This will cause training to fail")
        else:
            print(f"    ✅ Ready for training!")

        if len(categories) > 0:
            cat_names = [c.get('name', 'unknown') for c in categories[:5]]
            print(f"    Sample categories: {cat_names}")

        # Check image paths
        if len(images) > 0:
            sample_img = images[0]
            img_filename = sample_img.get('file_name', '')
            img_path = os.path.join(target_path, 'images', img_filename)
            print(f"    Sample image: {img_filename}")
            print(f"    Image exists: {os.path.exists(img_path)}")
    else:
        print(f"\n  {split.upper()}: ❌ JSON file not found")
        print(f"    Expected: annotations/{split}.json")
        print(f"    Or: instances_{split}_trashcan.json")

# Check images directory
images_dir = os.path.join(target_path, 'images')
if os.path.exists(images_dir):
    all_files = os.listdir(images_dir)
    image_files = [f for f in all_files if f.endswith(('.jpg', '.png', '.jpeg'))]
    print(f"\n📸 Images directory:")
    print(f"    Total files: {len(all_files)}")
    print(f"    Image files: {len(image_files)}")
    if len(image_files) > 0:
        print(f"    Samples: {image_files[:3]}")

print("\n" + "="*60)
print("💡 Diagnostic complete!")
print("   ✓ If you see 'Ready for training!' above, you're good to go!")
print("   ⚠️  If annotations = 0, check your annotation files")
print("="*60)

## Step 6: Start Training 🎯

**Approximate duration of a full 300-epoch run** (Step 5 sets `EPOCHS = 100` by default):
- GPU (T4): ~90-150 minutes
- GPU (V100): ~45-75 minutes
- GPU (A100): ~20-40 minutes

In [None]:
# Run training with progress monitoring
import subprocess
import sys
import time

# Build command - CORRECTED VERSION
cmd = [
    sys.executable, 'scripts/train.py',
    '--config', 'configs/train_config.yaml',
    '--data-dir', 'data/trashcan',
    '--batch-size', str(BATCH_SIZE),
    '--epochs', str(EPOCHS),
    '--lr', str(LEARNING_RATE),  # ← Use --lr not --learning-rate
    '--save-dir', SAVE_DIR
]
# NOTE: NO --num-workers argument - it's set in the config file!

print("\n" + "="*60)
print("🚀 Starting Training...")
print("="*60)
print(f"Command: {' '.join(cmd)}")
print("="*60 + "\n")

# Track start time
start_time = time.time()

# Run training with full output (to see any errors)
try:
    result = subprocess.run(cmd, check=True)
    
    # Calculate elapsed time
    elapsed_time = time.time() - start_time
    elapsed_minutes = elapsed_time / 60
    
    print("\n" + "="*60)
    print("✓ Training completed successfully!")
    print(f"✓ Total time: {elapsed_minutes:.1f} minutes ({elapsed_time:.0f} seconds)")
    print("="*60)
    
except subprocess.CalledProcessError as e:
    elapsed_time = time.time() - start_time
    elapsed_minutes = elapsed_time / 60
    
    print("\n" + "="*60)
    print("⚠️  Training ended with errors")
    print(f"⚠️  Exit code: {e.returncode}")
    print(f"⚠️  Time before error: {elapsed_minutes:.1f} minutes")
    print("="*60)
    
    # The training output is streamed, not captured, so the exception carries only
    # the exit code; list the common fixes rather than guessing the cause.
    print("\n💡 Troubleshooting Tips:")
    print("   1. Scroll up to see the full error message")
    print("   2. Out of memory: reduce batch size to 4 or 2, or restart the runtime")
    print("   3. File not found: run the diagnostic cell and check dataset paths")
    print("   4. Check GPU status and memory")
    
    print("\n💡 Tip: If taking too long, check:")
    print("   • GPU is enabled (Runtime → Change runtime type → GPU)")
    print("   • No other processes using GPU")
    print("   • Dataset images are loading correctly")
    
    raise


🚀 Starting Training...
Command: /usr/bin/python3 scripts/train.py --config configs/train_config.yaml --data-dir data/trashcan --batch-size 16 --epochs 10 --lr 0.01 --save-dir runs/train


⚠️  Training ended with errors


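The training cell streams the script's output without capturing it, so the raised `CalledProcessError` holds only the exit code and cannot tell an out-of-memory crash from a missing file. One option is to capture stderr and classify it, as in the minimal sketch below. The helper names `classify_failure` and `run_and_classify` and the matched substrings are assumptions, and the demo replaces the real training command with a child process that fails on purpose.

```python
import subprocess
import sys


def classify_failure(stderr_text):
    """Map captured stderr text to a coarse failure category."""
    text = stderr_text.lower()
    if "out of memory" in text:
        return "out_of_memory"
    if "filenotfounderror" in text or "no such file or directory" in text:
        return "missing_file"
    return "other"


def run_and_classify(cmd):
    """Run cmd, echo its stderr, and return (returncode, category or None)."""
    result = subprocess.run(cmd, stderr=subprocess.PIPE, text=True)
    sys.stderr.write(result.stderr)  # still show the captured messages
    if result.returncode == 0:
        return 0, None
    return result.returncode, classify_failure(result.stderr)


if __name__ == "__main__":
    # Stand-in for the training command: a child process that fails like an OOM.
    demo = [sys.executable, "-c",
            "raise RuntimeError('CUDA out of memory (simulated)')"]
    print(run_and_classify(demo))
```

The trade-off is that captured stderr appears only after the process exits, so anything the training script writes there, such as progress bars, is no longer visible live.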
## Step 7: View Training Results 📊

In [8]:
import os
import glob

# Check for checkpoints
checkpoint_dir = f'{SAVE_DIR}/checkpoints'

print("="*60)
print("Training Results")
print("="*60)

if os.path.exists(checkpoint_dir):
    checkpoints = glob.glob(f"{checkpoint_dir}/*.pt")
    if checkpoints:
        print("\n📦 Available Checkpoints:")
        for ckpt in sorted(checkpoints):
            size_mb = os.path.getsize(ckpt) / (1024 * 1024)
            print(f"   {os.path.basename(ckpt)}: {size_mb:.2f} MB")
    else:
        print("⚠️  No checkpoints found")
else:
    print("⚠️  Checkpoint directory not found")

# Show directory structure
print("\n📁 Output Directory Structure:")
!ls -lh {SAVE_DIR}/ 2>/dev/null || echo "Directory not found"

print("\n" + "="*60)

Training Results
⚠️  Checkpoint directory not found

📁 Output Directory Structure:
Directory not found



## Step 8: Launch TensorBoard

In [None]:
# Load TensorBoard
%load_ext tensorboard
%tensorboard --logdir {SAVE_DIR}/logs

## Step 9: Download Trained Model

In [None]:
from google.colab import files
import os

best_model = f'{SAVE_DIR}/checkpoints/best.pt'
latest_model = f'{SAVE_DIR}/checkpoints/latest.pt'

print("="*60)
print("Download Models")
print("="*60)

if os.path.exists(best_model):
    size_mb = os.path.getsize(best_model) / (1024 * 1024)
    print(f"\n📥 Downloading best.pt ({size_mb:.2f} MB)...")
    files.download(best_model)
    print("✓ Download complete!")
else:
    print("\n⚠️  best.pt not found")

if os.path.exists(latest_model):
    size_mb = os.path.getsize(latest_model) / (1024 * 1024)
    print(f"\n📥 Downloading latest.pt ({size_mb:.2f} MB)...")
    files.download(latest_model)
    print("✓ Download complete!")
else:
    print("\n⚠️  latest.pt not found")

print("\n" + "="*60)

## Step 10: View Results in Google Drive ✅

In [None]:
# Results are already on Google Drive!
import os
import glob

print("="*60)
print("📁 Your Results on Google Drive")
print("="*60)

# SAVE_DIR is already pointing to Drive from Step 5
print(f"\n✓ All results saved to: {SAVE_DIR}")
print("\n📦 Contents:")

# List checkpoints
checkpoint_dir = f'{SAVE_DIR}/checkpoints'
if os.path.exists(checkpoint_dir):
    checkpoints = glob.glob(f"{checkpoint_dir}/*.pt")
    if checkpoints:
        print("\n  Checkpoints:")
        for ckpt in sorted(checkpoints):
            size_mb = os.path.getsize(ckpt) / (1024 * 1024)
            print(f"    • {os.path.basename(ckpt)}: {size_mb:.2f} MB")

# List logs
log_dir = f'{SAVE_DIR}/logs'
if os.path.exists(log_dir):
    print(f"\n  TensorBoard Logs: {log_dir}")

print("\n💡 Access anytime from Google Drive:")
print("   My Drive → YOLO-UDD-Results → run_[timestamp]")
print("\n" + "="*60)

---

## 🎉 Training Complete!

### What You've Done:
- ✅ Trained YOLO-UDD v2.0 on GPU
- ✅ Generated model checkpoints
- ✅ Created TensorBoard logs
- ✅ Downloaded trained models

### Next Steps:
1. **Use the model** for inference on new images
2. **Evaluate performance** on test dataset
3. **Fine-tune** by adjusting hyperparameters
4. **Deploy** for real-world applications

### Repository:
https://github.com/kshitijkhede/YOLO-UDD-v2.0

---

**Need help?** Open an issue on GitHub or check the documentation.
