# 🌊 YOLO-UDD v2.0 Training on Kaggle

**Turbidity-Adaptive Underwater Debris Detection**

This notebook trains YOLO-UDD v2.0 on Kaggle with GPU acceleration.

---

## 📋 Setup Instructions:

1. **Enable GPU**: Settings → Accelerator → GPU P100 or T4
2. **Enable Internet**: Settings → Internet → ON
3. **Run all cells** in order
4. **Training saves checkpoints** automatically - you can resume later!
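
Before running the remaining cells, it can help to confirm that the accelerator from step 1 is actually attached. The following is a minimal sketch, assuming the NVIDIA driver puts the `nvidia-smi` tool on the PATH when a GPU is enabled. `gpu_visible` is a hypothetical helper and is not part of the repository.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi exists and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` prints one "GPU <n>: ..." line per device
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU detected" if gpu_visible() else "No GPU detected: check Settings → Accelerator")
```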

---

## 1️⃣ Clone Repository & Install Dependencies

In [None]:
# Clone your repository
!git clone https://github.com/kshitijkhede/YOLO-UDD-v2.0.git
%cd YOLO-UDD-v2.0

In [None]:
# Install required packages
!pip install -q -r requirements.txt
!pip install -q tensorboard

## 2️⃣ Upload Dataset Annotations

**Option A: Upload from your computer (works on Colab only)**

In [None]:
# Upload your annotation files
# You need to upload:
# - train.json (22 MB)
# - val.json (5.6 MB)
#
# Note: google.colab can only be imported on Colab. On Kaggle, attach the
# files as a Kaggle Dataset instead (Option B below).

import os
import shutil
from google.colab import files

# Create annotations directory
os.makedirs('data/trashcan/annotations', exist_ok=True)

for target in ('train.json', 'val.json'):
    print(f"📤 Please upload {target}...")
    uploaded = files.upload()
    if not uploaded:
        raise RuntimeError(f"No file was uploaded for {target}")
    # Use the first uploaded file; shutil.move also copes with spaces in names
    filename = next(iter(uploaded))
    shutil.move(filename, f'data/trashcan/annotations/{target}')

print("✅ Annotations uploaded!")

**Option B: Load from Kaggle Dataset (After first upload)**

Upload `train.json` and `val.json` once as a Kaggle Dataset, then attach that dataset to the notebook in future runs. Attached datasets are mounted read-only under `/kaggle/input/`.
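
Copying the annotations out of an attached dataset could look like the sketch below. The folder name `YOUR_ANNOTATIONS_DATASET` is a placeholder and `copy_annotations` is a hypothetical helper; neither comes from the repository, so adjust the path to whatever Kaggle shows for your dataset.

```python
import shutil
from pathlib import Path

# Assumption: placeholder path, replace with the folder of your attached dataset
ANNOTATION_SRC = Path("/kaggle/input/YOUR_ANNOTATIONS_DATASET")
ANNOTATION_DST = Path("data/trashcan/annotations")


def copy_annotations(src: Path, dst: Path, names=("train.json", "val.json")) -> list:
    """Copy the named files from src to dst and return the copied paths."""
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        source = src / name
        if not source.is_file():
            raise FileNotFoundError(f"Expected {source}; check the dataset path")
        copied.append(Path(shutil.copy(source, dst / name)))
    return copied


if ANNOTATION_SRC.is_dir():
    print("Copied:", [str(p) for p in copy_annotations(ANNOTATION_SRC, ANNOTATION_DST)])
else:
    print(f"{ANNOTATION_SRC} not found; attach the dataset first")
```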

## 3️⃣ Download TrashCAN Images

Fetch the TrashCAN images and place them under `data/trashcan/images/`, which is where the later steps look for them.

In [None]:
# Option A: If you have the dataset on Google Drive (Colab only; google.colab is not available on Kaggle)
# Uncomment and use this:

# from google.colab import drive
# drive.mount('/content/drive')
# !mkdir -p data/trashcan/images
# !cp -r /content/drive/MyDrive/trashcan_images/* data/trashcan/images/

# Option B: Download from a Kaggle dataset (recommended)
# First, upload your images as a Kaggle dataset, then replace YOUR_USERNAME below:

!kaggle datasets download -d YOUR_USERNAME/trashcan-images
!unzip -q trashcan-images.zip -d data/trashcan/

print("✅ Images downloaded!")

## 4️⃣ Verify Dataset

In [None]:
# Verify dataset structure
!python scripts/verify_dataset.py --dataset-dir data/trashcan
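
As a quick cross-check that does not depend on `scripts/verify_dataset.py`, the annotation files can be summarized directly. This sketch assumes the files are COCO-style JSON with `images`, `annotations`, and `categories` keys; that layout is an assumption about `train.json` and `val.json`, and `summarize_coco` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_coco(path):
    """Return image, annotation, and per-category counts for a COCO-style file."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    id_to_name = {c["id"]: c["name"] for c in data.get("categories", [])}
    # Fall back to the raw category id when no name is defined for it
    per_category = Counter(
        id_to_name.get(a["category_id"], a["category_id"])
        for a in data.get("annotations", [])
    )
    return {
        "images": len(data.get("images", [])),
        "annotations": len(data.get("annotations", [])),
        "per_category": dict(per_category),
    }


for split in ("train", "val"):
    ann_file = Path("data/trashcan/annotations") / f"{split}.json"
    if ann_file.is_file():
        print(split, summarize_coco(ann_file))
    else:
        print(f"{ann_file} is missing")
```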

## 5️⃣ Configure Training

These settings are chosen to fit Kaggle's GPU quota (30 hours/week).

In [None]:
import os
import yaml

# Create optimized config for Kaggle
config = {
    'data_dir': './data/trashcan',
    'num_classes': 3,
    'img_size': 640,
    'batch_size': 16,  # Adjust based on GPU memory
    'epochs': 100,  # Start with 100, resume later for more
    'num_workers': 2,
    'lr': 0.001,
    'weight_decay': 0.0001,
    'lr_scheduler': 'cosine',
    'min_lr': 1e-6,
    'device': 'cuda',
    'save_interval': 10,  # Save checkpoint every 10 epochs
    'early_stopping_patience': 30,
}

# Save config (create the folder first in case it does not exist yet)
os.makedirs('configs', exist_ok=True)
with open('configs/kaggle_config.yaml', 'w') as f:
    yaml.dump(config, f)

print("✅ Config created!")
print(yaml.dump(config, default_flow_style=False))
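
To get a feel for what `lr_scheduler: cosine` with `lr: 0.001` and `min_lr: 1e-6` implies, the sketch below evaluates the standard cosine annealing formula over 100 epochs. It assumes a plain cosine decay with no warmup or restarts; the scheduler in `scripts/train.py` may differ, and `cosine_lr` is a hypothetical helper.

```python
import math


def cosine_lr(epoch, total_epochs=100, base_lr=0.001, min_lr=1e-6):
    """Cosine-annealed learning rate at a given epoch (0 = start of training)."""
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 25, 50, 75, 100):
    print(f"epoch {epoch:3d}: lr = {cosine_lr(epoch):.6f}")
```

Under these assumptions the rate starts at `lr`, passes roughly the midpoint between `lr` and `min_lr` at epoch 50, and reaches `min_lr` at the final epoch.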

## 6️⃣ Start Training 🚀

**Important:** Training will save checkpoints every 10 epochs. You can stop and resume anytime!

In [None]:
# Start training from scratch
!python scripts/train.py --config configs/kaggle_config.yaml

### 📊 Resume Training (if interrupted)

If your session times out, run this to resume:

In [None]:
# Find the latest checkpoint
import os
import glob

checkpoints = glob.glob('runs/*/checkpoints/last.pth')
if checkpoints:
    latest = max(checkpoints, key=os.path.getmtime)  # most recently written checkpoint
    print(f"📂 Resuming from: {latest}")
    !python scripts/train.py --config configs/kaggle_config.yaml --resume {latest}
else:
    print("❌ No checkpoint found. Starting fresh training...")
    !python scripts/train.py --config configs/kaggle_config.yaml

## 7️⃣ Monitor Training with TensorBoard

In [None]:
%load_ext tensorboard
%tensorboard --logdir runs

## 8️⃣ Evaluate Model

In [None]:
# Evaluate the best model
!python scripts/evaluate.py \
    --checkpoint runs/*/checkpoints/best.pth \
    --data-dir data/trashcan \
    --split val
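
The wildcard `runs/*/checkpoints/best.pth` is expanded by the shell, so once more than one run folder exists the script receives several paths. One way to avoid that is to resolve the pattern to a single file first. This is a sketch; `newest_match` is a hypothetical helper, not part of the repository.

```python
import glob
import os


def newest_match(pattern):
    """Return the most recently modified path matching pattern, or None."""
    matches = glob.glob(pattern)
    if not matches:
        return None
    return max(matches, key=os.path.getmtime)


best = newest_match("runs/*/checkpoints/best.pth")
print(best if best else "No best.pth found yet")
```

In a notebook cell the result can then be passed as `--checkpoint {best}`.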

## 9️⃣ Test Detection on Sample Images

In [None]:
# Run detection on validation images
!python scripts/detect.py \
    --checkpoint runs/*/checkpoints/best.pth \
    --source data/trashcan/images/val/ \
    --output results/ \
    --max-images 10

In [None]:
# Display results
import matplotlib.pyplot as plt
from PIL import Image
import glob

result_images = sorted(glob.glob('results/*.jpg'))[:6]

fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

for idx, img_path in enumerate(result_images):
    img = Image.open(img_path)
    axes[idx].imshow(img)
    axes[idx].axis('off')
    axes[idx].set_title(f'Detection {idx+1}')

# Hide the unused panels when fewer than 6 result images exist
for ax in axes[len(result_images):]:
    ax.axis('off')

plt.tight_layout()
plt.show()

## 🔟 Download Trained Model

In [None]:
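# Copy the best model to /kaggle/working so it can be downloaded
# (google.colab.files is Colab-only and is not available on Kaggle)
import glob
import os
import shutil

matches = glob.glob('runs/*/checkpoints/best.pth')
if matches:
    best_model = max(matches, key=os.path.getmtime)
    shutil.copy(best_model, '/kaggle/working/best.pth')
    print(f"📥 Copied {best_model} to /kaggle/working/best.pth")
    print("Download it from the notebook's Output panel.")
else:
    print("❌ No best.pth found. Train the model first.")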

## 💾 Save Checkpoint to Kaggle Output

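This copies your checkpoints into `/kaggle/working`, the notebook's output folder. Save a version of the notebook afterwards so the output is kept and can be used to resume in the next session.

In [None]:
# Copy checkpoints to Kaggle output (kept when you save a notebook version)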
!mkdir -p /kaggle/working/checkpoints
!cp -r runs/*/checkpoints/* /kaggle/working/checkpoints/
print("✅ Checkpoints saved to Kaggle output!")
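
In a later session, the saved files have to be copied back into a `runs/*/checkpoints/` folder before the resume cell can find `last.pth`. The sketch below assumes the previous output has been attached as an input under the placeholder path `YOUR_PREVIOUS_OUTPUT`, and that an arbitrary run folder name such as `restored` is acceptable to `scripts/train.py`. Both are assumptions, and `restore_checkpoints` is a hypothetical helper.

```python
import shutil
from pathlib import Path

# Assumptions: placeholder input path; "restored" is an arbitrary run name
SAVED = Path("/kaggle/input/YOUR_PREVIOUS_OUTPUT/checkpoints")
TARGET = Path("runs/restored/checkpoints")


def restore_checkpoints(src: Path, dst: Path) -> int:
    """Copy every .pth file from src into dst and return how many were copied."""
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for ckpt in sorted(src.glob("*.pth")):
        shutil.copy2(ckpt, dst / ckpt.name)
        count += 1
    return count


if SAVED.is_dir():
    print(f"Restored {restore_checkpoints(SAVED, TARGET)} checkpoint file(s) to {TARGET}")
else:
    print(f"{SAVED} not found; attach the previous notebook output first")
```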

---

## 📊 Training Summary

After training, you'll see:
- ✅ Training/Validation loss curves
- ✅ mAP (mean Average Precision) metrics
- ✅ Sample detection results
- ✅ Saved model checkpoint

**Expected Results (after 100 epochs):**
- mAP@50: 50-60%
- mAP@50:95: 30-35%

**For better results:**
- Resume training for 200-300 total epochs
- Expected final mAP@50: 70-75%
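
For reference, mAP@50 counts a detection as correct when its box overlaps a ground-truth box with an IoU of at least 0.5, while mAP@50:95 averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows the IoU calculation on two made-up boxes. It illustrates the metric only; how `scripts/evaluate.py` computes it is not shown here, and `box_iou` is a hypothetical helper.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Made-up boxes for illustration only
prediction = (10, 10, 60, 60)
ground_truth = (15, 15, 65, 65)

iou = box_iou(prediction, ground_truth)
thresholds = [0.50 + 0.05 * i for i in range(10)]  # 0.50, 0.55, ..., 0.95
print(f"IoU = {iou:.3f}")
print("counts as a match at:", [f"{t:.2f}" for t in thresholds if iou >= t])
```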

---

## 🔄 Multi-Session Training Strategy

Kaggle's GPU quota is 30 hours/week, so split training across sessions:

1. **Session 1** (6 hours): Train 0-100 epochs → Save checkpoint
2. **Session 2** (6 hours): Resume 100-200 epochs → Save checkpoint  
3. **Session 3** (6 hours): Resume 200-300 epochs → Final model!

Each session saves progress automatically!
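
The arithmetic behind this plan can be checked with a few lines. The figures are taken from the plan above (6 hours and 100 epochs per session, 3 sessions, 30 GPU hours per week); actual epoch times depend on the GPU and batch size, so treat the result as an estimate. `plan_budget` is a hypothetical helper.

```python
def plan_budget(weekly_hours=30, session_hours=6, epochs_per_session=100, sessions=3):
    """Derive per-epoch time and weekly headroom from the session plan."""
    minutes_per_epoch = session_hours * 60 / epochs_per_session
    total_hours = sessions * session_hours
    # Integer arithmetic avoids rounding surprises in the epoch count
    max_epochs_per_week = weekly_hours * epochs_per_session // session_hours
    return {
        "minutes_per_epoch": minutes_per_epoch,
        "total_hours": total_hours,
        "hours_left": weekly_hours - total_hours,
        "max_epochs_per_week": max_epochs_per_week,
    }


budget = plan_budget()
for key, value in budget.items():
    print(f"{key}: {value}")
```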

---