# 🌊 YOLO-UDD v2.0 - Kaggle Training

**Simple 6-Step Training - No Crashes, No Loops!** ⚡

## 📋 Before You Start:
1. **Enable GPU**: Settings → Accelerator → **GPU T4 x2** → Save
2. **Dataset**: Google Drive link already configured (automatic download)
3. **Run**: Execute cells 1-6 in order OR click "Run All"

## ⏱️ Training Info:
- **Time**: ~10 hours (100 epochs)
- **Expected mAP@50:95**: 70-72%
- **No restarts needed!** ✅

---

## Cell 1: Environment Setup

In [None]:
# Complete environment setup
import os
import sys

print("="*70)
print("🔧 CELL 1: Environment Setup")
print("="*70)

# Check and fix NumPy version FIRST (before any other imports).
# The version is read from package metadata instead of importing numpy,
# so a 2.x module is not loaded into memory before the downgrade.
print("\n[Step 1/3] Checking NumPy version...")
from importlib.metadata import PackageNotFoundError, version

try:
    numpy_ver = version("numpy")
except PackageNotFoundError:
    numpy_ver = None

if numpy_ver is None or numpy_ver.startswith('2.'):
    found = f"NumPy {numpy_ver} detected (will cause TensorFlow crashes)" if numpy_ver else "NumPy not installed"
    print(f"  ⚠️  {found}")
    print("  🔧 Installing NumPy 1.26.4...")

    # Use pip directly with quiet mode
    !pip uninstall -y numpy > /dev/null 2>&1
    !pip install -q numpy==1.26.4

    print("  ✅ NumPy 1.26.4 installed")
    if 'numpy' in sys.modules:
        # An already-imported module stays in memory until the session restarts
        print("  ℹ️  NumPy was already imported; restart the session to load 1.26.4")
else:
    print(f"  ✅ NumPy {numpy_ver} OK")

# Setup directories
print("\n[Step 2/3] Setting up directories...")
WORK_DIR = '/kaggle/working'
REPO_DIR = f'{WORK_DIR}/YOLO-UDD-v2.0'

os.chdir(WORK_DIR)
print(f"  ✅ Working directory: {WORK_DIR}")

# Clone repository
print("\n[Step 3/3] Cloning repository...")
if os.path.exists(REPO_DIR):
    import shutil
    shutil.rmtree(REPO_DIR)

!git clone -q https://github.com/kshitijkhede/YOLO-UDD-v2.0.git

if os.path.exists(REPO_DIR):
    os.chdir(REPO_DIR)
    if REPO_DIR not in sys.path:
        sys.path.insert(0, REPO_DIR)
    print(f"  ✅ Repository ready: {REPO_DIR}")
else:
    raise Exception("Clone failed!")

print("\n" + "="*70)
print("✅ Cell 1 Complete - Environment Ready!")
print("="*70)

## Cell 2: Verify & Install Dependencies

In [None]:
# Verify setup and install dependencies
import os
import torch

print("="*70)
print("📦 CELL 2: Verification & Dependencies")
print("="*70)

# Verify repository structure
print("\n[Step 1/3] Verifying repository...")
required = ['models/', 'scripts/', 'utils/', 'configs/', 'scripts/train.py']
all_ok = True
for item in required:
    if os.path.exists(item):
        print(f"  ✅ {item}")
    else:
        print(f"  ❌ {item} MISSING")
        all_ok = False

if not all_ok:
    raise Exception("Repository incomplete! Re-run Cell 1")

# Check GPU
print("\n[Step 2/3] Checking GPU...")
if torch.cuda.is_available():
    print(f"  ✅ GPU: {torch.cuda.get_device_name(0)}")
    print(f"  ✅ Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
else:
    print("  ❌ NO GPU! Enable: Settings → GPU T4 x2")
    raise RuntimeError("GPU required!")

# Install dependencies
print("\n[Step 3/3] Installing dependencies (this takes ~2 min)...")
# Quote version specifiers so the shell does not treat '>' as output redirection
!pip install -q "torch>=2.0.0" "torchvision>=0.15.0" "albumentations>=1.3.0" \
    "opencv-python-headless>=4.7.0" "pycocotools>=2.0.6" "tensorboard>=2.12.0" \
    tqdm pyyaml scikit-learn matplotlib seaborn

print("  ✅ Dependencies installed")

print("\n" + "="*70)
print("✅ Cell 2 Complete - System Ready!")
print("="*70)
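
Version specifiers such as `torch>=2.0.0` contain `>`, which a shell treats as output redirection unless the argument is quoted. One way to avoid shell parsing altogether is to pass the requirements to pip as an argument list. The sketch below is only an illustration: `build_pip_command` and `install` are hypothetical helper names, not part of the repository, and the requirement strings are copied from the install step above.

```python
import subprocess
import sys

# Requirement strings copied from the install step in Cell 2.
REQUIREMENTS = [
    "torch>=2.0.0",
    "torchvision>=0.15.0",
    "albumentations>=1.3.0",
    "opencv-python-headless>=4.7.0",
    "pycocotools>=2.0.6",
    "tensorboard>=2.12.0",
    "tqdm", "pyyaml", "scikit-learn", "matplotlib", "seaborn",
]


def build_pip_command(requirements, quiet=True):
    """Return an argv list for pip. No shell is involved, so '>=' is never a redirect."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if quiet:
        cmd.append("-q")
    return cmd + list(requirements)


def install(requirements):
    """Run pip and raise CalledProcessError if the installation fails."""
    subprocess.run(build_pip_command(requirements), check=True)
```

Calling `install(REQUIREMENTS)` would then replace the `!pip install` line; using `sys.executable -m pip` also ensures the packages go into the interpreter the kernel is running.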

## Cell 3: Setup Dataset

**Dataset will download automatically from Google Drive (~170 MB, takes 2-3 min)**

Alternative: Upload your own dataset to Kaggle and set `USE_KAGGLE_DATASET = True`

In [None]:
# Dataset setup
import os

print("="*70)
print("📦 CELL 3: Dataset Setup")
print("="*70)

# ============================================
# CONFIGURATION
# ============================================
USE_KAGGLE_DATASET = False  # Set True if you added dataset to Kaggle
KAGGLE_DATASET_PATH = '/kaggle/input/trashcan-dataset'

USE_GDRIVE = True  # ✅ Default: Download from Google Drive
GDRIVE_FILE_ID = '17oRYriPgBnW9zowwmhImxdUpmHwOjgIp'  # Your uploaded dataset
# ============================================

DATASET_PATH = None

if USE_KAGGLE_DATASET:
    print("\n📂 Using Kaggle Dataset...")
    if os.path.exists(KAGGLE_DATASET_PATH):
        if os.path.isfile(KAGGLE_DATASET_PATH):
            print("  📦 Extracting...")
            !unzip -o -q {KAGGLE_DATASET_PATH} -d /kaggle/working/
            DATASET_PATH = '/kaggle/working/trashcan'
        else:
            trashcan = os.path.join(KAGGLE_DATASET_PATH, 'trashcan')
            DATASET_PATH = trashcan if os.path.exists(trashcan) else KAGGLE_DATASET_PATH
        print(f"  ✅ Dataset: {DATASET_PATH}")
    else:
        print(f"  ❌ NOT FOUND: {KAGGLE_DATASET_PATH}")

elif USE_GDRIVE:
    print("\n☁️  Downloading from Google Drive...")
    print("  📦 Installing gdown...")
    !pip install -q gdown
    
    print("  ⬇️  Downloading dataset (~170 MB, 2-3 min)...")
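    # Pass the file as a URL; the --id flag is deprecated and rejected by newer gdown releases
    !gdown "https://drive.google.com/uc?id={GDRIVE_FILE_ID}" -O /kaggle/working/trashcan.zip --quiet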
    
    if os.path.exists('/kaggle/working/trashcan.zip'):
        size = os.path.getsize('/kaggle/working/trashcan.zip') / 1024 / 1024
        print(f"  ✅ Downloaded: {size:.1f} MB")
        
        print("  📦 Extracting (auto-overwrite enabled)...")
        !unzip -o -q /kaggle/working/trashcan.zip -d /kaggle/working/
        
        if os.path.exists('/kaggle/working/trashcan'):
            DATASET_PATH = '/kaggle/working/trashcan'
            print(f"  ✅ Dataset: {DATASET_PATH}")
        else:
            print("  ❌ Extraction failed")
    else:
        print("  ❌ Download failed")
else:
    print("\n❌ No method selected! Set USE_KAGGLE_DATASET or USE_GDRIVE = True")

# Verify dataset
print("\n" + "="*70)
if DATASET_PATH and os.path.exists(DATASET_PATH):
    print(f"✅ DATASET READY: {DATASET_PATH}")
    
    # Count images
    train_imgs = len([f for f in os.listdir(f"{DATASET_PATH}/images/train") if f.endswith(('.jpg', '.png'))])
    val_imgs = len([f for f in os.listdir(f"{DATASET_PATH}/images/val") if f.endswith(('.jpg', '.png'))])
    
    print(f"  📊 Training images: {train_imgs:,}")
    print(f"  📊 Validation images: {val_imgs:,}")
    
    # Check annotations
    train_ann = f"{DATASET_PATH}/instances_train_trashcan.json"
    val_ann = f"{DATASET_PATH}/instances_val_trashcan.json"
    
    if os.path.exists(train_ann) and os.path.exists(val_ann):
        print(f"  ✅ Annotations found")
    else:
        print(f"  ❌ Annotations missing!")
    
    print("\n" + "="*70)
    print("✅ Cell 3 Complete - Dataset Ready!")
    print("="*70)
else:
    print("❌ DATASET NOT READY!")
    raise Exception("Dataset setup failed!")
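
This notebook checks for annotations in two different places: Cell 3 looks for `instances_train_trashcan.json` and `instances_val_trashcan.json` in the dataset root, while the training cell expects `annotations/train.json` and `annotations/val.json`. Which layout the downloaded archive actually uses is not stated here, so the sketch below simply tries both. `find_annotations` and `ANNOTATION_LAYOUTS` are hypothetical names introduced for illustration.

```python
import os

# Candidate layouts taken from the two checks in this notebook:
# (train file, val file), relative to the dataset root.
ANNOTATION_LAYOUTS = [
    ("instances_train_trashcan.json", "instances_val_trashcan.json"),
    (os.path.join("annotations", "train.json"),
     os.path.join("annotations", "val.json")),
]


def find_annotations(dataset_dir):
    """Return (train_path, val_path) for the first layout where both files exist, else None."""
    for train_rel, val_rel in ANNOTATION_LAYOUTS:
        train_path = os.path.join(dataset_dir, train_rel)
        val_path = os.path.join(dataset_dir, val_rel)
        if os.path.isfile(train_path) and os.path.isfile(val_path):
            return train_path, val_path
    return None
```

Running `find_annotations(DATASET_PATH)` right after extraction shows early whether the dataset matches what the training cell expects, rather than finding out when training starts.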


## Cell 4: Build & Test Model

In [None]:
# Build and test YOLO-UDD model
import os
import sys
import torch

print("="*70)
print("🏗️  CELL 4: Build Model")
print("="*70)

# Ensure correct paths
REPO_DIR = '/kaggle/working/YOLO-UDD-v2.0'
os.chdir(REPO_DIR)
if REPO_DIR not in sys.path:
    sys.path.insert(0, REPO_DIR)

print("\n[Step 1/2] Building model...")
from models.yolo_udd import build_yolo_udd

model = build_yolo_udd(num_classes=22)  # TrashCAN has 22 classes
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

total_params = sum(p.numel() for p in model.parameters())
print(f"  ✅ Model: YOLO-UDD v2.0")
print(f"  ✅ Classes: 22")
print(f"  ✅ Device: {device}")
print(f"  ✅ Parameters: {total_params:,}")

# Test forward pass
print("\n[Step 2/2] Testing model...")
x = torch.randn(1, 3, 640, 640).to(device)
with torch.no_grad():
    predictions, turb_score = model(x)

print(f"  ✅ Forward pass successful")
print(f"  ✅ Turbidity score: {turb_score.item():.4f}")

print("\n" + "="*70)
print("✅ Cell 4 Complete - Model Ready!")
print("="*70)

## Cell 5: Start Training

In [None]:
# Start training
import subprocess
import sys
import os

print("="*70)
print("CELL 5: Starting Training")
print("="*70)

# Training parameters
EPOCHS = 100
BATCH_SIZE = 8
LEARNING_RATE = 0.01
SAVE_DIR = '/kaggle/working/runs/train'
DATASET_PATH = '/kaggle/working/trashcan'  # Restructured dataset

# Verify dataset exists
if not os.path.exists(DATASET_PATH):
    print(f"ERROR: Dataset not found at {DATASET_PATH}")
    print("Please run Cell 3 to restructure the dataset!")
    raise FileNotFoundError(f"Dataset not found: {DATASET_PATH}")

print(f"\nTraining Configuration:")
print(f"  Epochs:       {EPOCHS}")
print(f"  Batch Size:   {BATCH_SIZE}")
print(f"  Learning Rate: {LEARNING_RATE}")
print(f"  Dataset:      {DATASET_PATH}")
print(f"  Save Dir:     {SAVE_DIR}")

# Verify annotations exist
train_ann = os.path.join(DATASET_PATH, 'annotations', 'train.json')
val_ann = os.path.join(DATASET_PATH, 'annotations', 'val.json')
print(f"\nChecking annotations:")
print(f"  train.json: {'Found' if os.path.exists(train_ann) else 'NOT FOUND'}")
print(f"  val.json:   {'Found' if os.path.exists(val_ann) else 'NOT FOUND'}")

if not os.path.exists(train_ann) or not os.path.exists(val_ann):
    print("\nERROR: Annotation files missing!")
    print("Please run Cell 3 to restructure the dataset!")
    raise FileNotFoundError("Annotations not found")

# Build training command with absolute path
abs_dataset_path = os.path.abspath(DATASET_PATH)
cmd = [
    sys.executable, 'scripts/train.py',
    '--config', 'configs/train_config.yaml',
    '--data-dir', abs_dataset_path,
    '--batch-size', str(BATCH_SIZE),
    '--epochs', str(EPOCHS),
    '--lr', str(LEARNING_RATE),
    '--save-dir', SAVE_DIR
]

print("\nStarting training...")
print(f"Using absolute dataset path: {abs_dataset_path}")
print(f"Command: {' '.join(cmd)}")
print("This will take ~10 hours for 100 epochs")
print("="*70 + "\n")

# Run training
result = subprocess.run(cmd)

if result.returncode == 0:
    print("\n" + "="*70)
    print("Training completed successfully!")
    print(f"Results saved to: {SAVE_DIR}")
    print("="*70)
else:
    print("\n" + "="*70)
    print("Training failed - see error above")
    print("="*70)
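
The training cell prints a failure message when `scripts/train.py` exits with a non-zero code, but it does not raise, so "Run All" carries on to the results cell. A minimal sketch of a wrapper that relays the script's output line by line and raises on failure is shown below. `run_and_stream` is a hypothetical helper, not something the repository provides.

```python
import subprocess
import sys


def run_and_stream(cmd):
    """Run cmd, echo its output line by line, and raise if it exits non-zero."""
    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so tracebacks appear in order
        text=True,
        bufsize=1,                 # line-buffered in text mode
    )
    lines = []
    for line in process.stdout:
        print(line, end="")
        lines.append(line.rstrip("\n"))
    returncode = process.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)
    return lines
```

With this in place, `run_and_stream(cmd)` could stand in for `subprocess.run(cmd)`, and a failed run would stop notebook execution at the training cell.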


## Cell 6: Check Results

In [None]:
# Check training results
import os

print("="*70)
print("📊 CELL 6: Results")
print("="*70)

if os.path.exists(SAVE_DIR):
    print(f"\n📁 Results: {SAVE_DIR}\n")
    
    # List files
    for root, dirs, files in os.walk(SAVE_DIR):
        level = root.replace(SAVE_DIR, '').count(os.sep)
        indent = '  ' * level
        print(f"{indent}{os.path.basename(root)}/")
        sub_indent = '  ' * (level + 1)
        for file in files:
            size = os.path.getsize(os.path.join(root, file)) / (1024*1024)
            print(f"{sub_indent}{file} ({size:.1f} MB)")
    
    # Check for best checkpoint
    best_pt = os.path.join(SAVE_DIR, 'best.pt')
    if os.path.exists(best_pt):
        size = os.path.getsize(best_pt) / (1024*1024)
        print("\n" + "="*70)
        print("✅ TRAINING COMPLETE!")
        print("="*70)
        print(f"\n🏆 Best Model: {best_pt}")
        print(f"📦 Size: {size:.1f} MB")
        print(f"\n📥 DOWNLOAD: Check 'Output' section in right sidebar →")
        print(f"🎯 Expected Performance: 70-72% mAP@50:95")
        print(f"\n🎉 Success! Model ready for deployment!")
        print("="*70)
    else:
        print("\n⚠️  best.pt not found - check if training completed")
else:
    print(f"\n❌ Results not found: {SAVE_DIR}")
    print("Training may have failed or not started.")
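
The results cell only looks for `best.pt` directly inside `SAVE_DIR`. If the training script writes checkpoints into a subfolder, that check reports them as missing. The sketch below searches the whole tree instead. It assumes checkpoints use the `.pt` suffix (inferred from `best.pt`), and `list_checkpoints` is a hypothetical helper name.

```python
from pathlib import Path


def list_checkpoints(save_dir, suffix=".pt"):
    """Return (relative_path, size_mb) for every checkpoint file under save_dir, sorted by path."""
    root = Path(save_dir)
    found = [
        (str(p.relative_to(root)), p.stat().st_size / (1024 * 1024))
        for p in root.rglob(f"*{suffix}")
        if p.is_file()
    ]
    return sorted(found)
```

For example, `for name, mb in list_checkpoints(SAVE_DIR): print(f"{name} ({mb:.1f} MB)")` lists every checkpoint with its size.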

In [None]:
# Verify repository structure
import os

print("="*60)
print("📂 Repository Structure")
print("="*60)

required_dirs = ['models', 'scripts', 'data', 'utils', 'configs']
required_files = ['requirements.txt', 'models/__init__.py', 'scripts/train.py']

for dir_name in required_dirs:
    status = "✓" if os.path.exists(dir_name) else "✗"
    print(f"{status} {dir_name}/")

print()
for file_name in required_files:
    status = "✓" if os.path.exists(file_name) else "✗"
    print(f"{status} {file_name}")

print("="*60)

In [None]:
# FIX: Force add models to Python path
import os
import sys

print("="*60)
print("🔧 Fixing Module Import Path")
print("="*60)

# Get current directory
current_dir = os.getcwd()
print(f"\nCurrent directory: {current_dir}")

# Check if we're in the repo directory
if 'YOLO-UDD-v2.0' not in current_dir:
    print("\n⚠️  Not in YOLO-UDD-v2.0 directory!")
    
    # Try to find and change to it
    possible_paths = [
        '/kaggle/working/YOLO-UDD-v2.0',
        '/kaggle/YOLO-UDD-v2.0',
        os.path.join(os.getcwd(), 'YOLO-UDD-v2.0')
    ]
    
    for path in possible_paths:
        if os.path.exists(path):
            os.chdir(path)
            current_dir = os.getcwd()
            print(f"✓ Changed to: {current_dir}")
            break
    else:
        print("✗ Could not find YOLO-UDD-v2.0 directory!")
        print("  Please re-run the clone cell (Cell 1)")

# Ensure repo is in Python path
if current_dir not in sys.path:
    sys.path.insert(0, current_dir)
    print(f"✓ Added to sys.path: {current_dir}")

# Verify models can be imported
print("\n🔍 Verifying module availability...")
models_path = os.path.join(current_dir, 'models')
if os.path.exists(models_path):
    print(f"  ✓ models/ exists at: {models_path}")
    
    # Check for required files
    required_files = ['__init__.py', 'yolo_udd.py', 'psem.py', 'sdwh.py', 'tafm.py']
    all_present = True
    for file in required_files:
        file_path = os.path.join(models_path, file)
        if os.path.exists(file_path):
            print(f"  ✓ {file}")
        else:
            print(f"  ✗ {file} MISSING!")
            all_present = False
    
    if all_present:
        print("\n✅ All model files present - import should work!")
    else:
        print("\n❌ Some files missing - clone may be incomplete")
        print("   → Re-run Cell 1 (Clone Repository)")
else:
    print(f"  ✗ models/ NOT FOUND at: {models_path}")
    print("\n  Available directories:")
    for item in os.listdir(current_dir):
        if os.path.isdir(os.path.join(current_dir, item)):
            print(f"    📁 {item}/")
    print("\n❌ Repository clone failed!")
    print("   → Re-run Cell 1 (Clone Repository)")

print("="*60)

In [None]:
# ============================================================
# CRITICAL FIX: Force NumPy 1.x Installation
# ============================================================

print("="*60)
print("🔧 FIXING NumPy Compatibility Issue")
print("="*60)

# Check current NumPy version
import numpy as np
current_version = np.__version__
print(f"\n📌 Current NumPy version: {current_version}")

if current_version.startswith('2.'):
    print("\n⚠️  NumPy 2.x detected - this WILL crash TensorFlow/scikit-learn!")
    print("Forcing downgrade to NumPy 1.x...\n")
    
    # Force uninstall NumPy 2.x
    import sys
    !{sys.executable} -m pip uninstall -y numpy
    
    # Install NumPy 1.x with force reinstall
    !{sys.executable} -m pip install 'numpy==1.26.4' --force-reinstall --no-cache-dir
    
    # Verify the fix worked
    print("\n" + "="*60)
    print("✅ NumPy 1.26.4 Installation Complete!")
    print("="*60)
    print("\n🔴 CRITICAL: YOU MUST RESTART THE KERNEL NOW! 🔴")
    print("\n   Steps:")
    print("   1. Click: Session → Restart Session (or Ctrl+O O)")
    print("   2. Run ALL cells again from Cell 1")
    print("   3. Training will work without crashes!")
    print("\n💡 Why? NumPy is already loaded in memory.")
    print("   Restarting clears memory and loads NumPy 1.26.4")
    print("="*60)
    
    # Stop execution here - user must restart
    raise SystemExit("\n⛔ STOP: Restart kernel now before continuing!")
else:
    print(f"\n✓ NumPy 1.x already installed ({current_version})")
    print("✓ Training should work correctly!")
    print("="*60)
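
The restart is needed because a module that has already been imported stays in memory even after pip replaces the files on disk. One way to detect that situation is to compare the version of the loaded module with the version recorded in the installed package metadata. This is a sketch under that assumption, and `loaded_vs_installed` and `needs_restart` are hypothetical helper names.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def loaded_vs_installed(package):
    """Return (version loaded in this process, version installed on disk); either may be None."""
    module = sys.modules.get(package)
    loaded = getattr(module, "__version__", None) if module is not None else None
    try:
        installed = version(package)
    except PackageNotFoundError:
        installed = None
    return loaded, installed


def needs_restart(package):
    """True when the package is imported and its in-memory version differs from the one on disk."""
    loaded, installed = loaded_vs_installed(package)
    return loaded is not None and installed is not None and loaded != installed
```

After a downgrade, `needs_restart("numpy")` returning `True` means the session still holds the old module and must be restarted before training.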

## Step 2: Install Dependencies

## Step 3: Setup Dataset

**Choose ONE of the following methods:**

### **METHOD 1: Kaggle Dataset (Recommended)**
1. Go to: https://www.kaggle.com/datasets
2. Click "New Dataset"
3. Upload TrashCAN dataset ZIP
4. Add to notebook: "Add Data" → Search for your dataset
5. Set `KAGGLE_DATASET_PATH` and `USE_KAGGLE_DATASET = True` in Cell 3

### **METHOD 2: Google Drive (Alternative - Easiest)**
1. Upload your TrashCAN dataset folder to Google Drive
2. Share the folder/file publicly
3. Get the file ID from the share link
4. Update `GDRIVE_FILE_ID` in Cell 3
5. Set `USE_GDRIVE = True`
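
The file ID in step 3 is the long token embedded in the share link. As a rough illustration, the sketch below pulls it out of the common Google Drive link shapes (`/file/d/<ID>/view`, `open?id=<ID>`, `uc?id=<ID>`). `extract_drive_file_id` is a hypothetical helper, and folder links are deliberately not matched because a folder ID is not a file ID.

```python
import re

# Patterns for the common Google Drive file-link shapes:
#   https://drive.google.com/file/d/<ID>/view?usp=sharing
#   https://drive.google.com/open?id=<ID>
#   https://drive.google.com/uc?id=<ID>
_ID_PATTERNS = [
    re.compile(r"/d/([A-Za-z0-9_-]+)"),
    re.compile(r"[?&]id=([A-Za-z0-9_-]+)"),
]


def extract_drive_file_id(link):
    """Return the file ID found in a Google Drive link, or None if no pattern matches."""
    for pattern in _ID_PATTERNS:
        match = pattern.search(link)
        if match:
            return match.group(1)
    return None
```

The returned string is the value to assign to `GDRIVE_FILE_ID`.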

### **METHOD 3: Direct Upload in Notebook**
1. ZIP your dataset locally
2. Upload ZIP to Kaggle notebook directly (< 500MB recommended)
3. Set `USE_LOCAL_UPLOAD = True`

## Step 4: Build Model

## Step 5: Training Configuration

## Step 6: Start Training

**⏱️ Estimated Time**: ~10 hours for 100 epochs on T4 GPU

**💡 Tips**:
- Training will save checkpoints automatically
- You can monitor progress in real-time
- Results saved to `/kaggle/working/runs/train/`
- Download best checkpoint from Output folder after training

## Step 7: Download Results

After training completes, download the trained model checkpoint.

## 🎉 Training Complete!

### Next Steps:
1. **Download Checkpoint**: Download `best.pt` from Output folder
2. **Evaluate Model**: Run evaluation script locally with downloaded checkpoint
3. **Test Detections**: Test on new images

### Expected Results:
- mAP@50:95: **70-72%** (22 classes)
- Training Time: **~10 hours** (100 epochs)
- Checkpoint Size: **~200-300 MB**

---

**📧 Issues?** Check the GitHub repository: https://github.com/kshitijkhede/YOLO-UDD-v2.0