# 🌊 YOLO-UDD v2.0 - Underwater Debris Detection (KAGGLE)

**Complete Training Pipeline on Kaggle with GPU** ⚡

## 🚀 Quick Start:
1. **Upload Dataset**: Add the TrashCAN dataset as a Kaggle Dataset
2. **Enable GPU**: Settings → Accelerator → GPU T4 x2 → Save
3. **Run All**: Run all cells sequentially
4. **Download Results**: Download the trained model from the Output folder

## ⚙️ Configuration:
- **Epochs**: 100 (reduced from 300 for faster training, ~10 hours)
- **Batch Size**: 8
- **Classes**: 22 (matches TrashCAN dataset)
- **Expected mAP@50:95**: 70-72%

---

## Step 1: Setup Environment

In [None]:
# Clone repository
import os
import sys

# Kaggle uses /kaggle/working directory
WORK_DIR = '/kaggle/working'
REPO_DIR = f'{WORK_DIR}/YOLO-UDD-v2.0'

# Ensure we're in working directory
os.chdir(WORK_DIR)
print(f"Working directory: {os.getcwd()}")

# Remove existing directory if present
if os.path.exists(REPO_DIR):
    import shutil
    shutil.rmtree(REPO_DIR)
    print("✓ Cleaned existing directory")

# Clone repository
print("\nCloning repository...")
!git clone https://github.com/kshitijkhede/YOLO-UDD-v2.0.git

# Verify clone succeeded
if not os.path.exists(REPO_DIR):
    raise FileNotFoundError("Failed to clone repository. Please check the repository URL.")

# Change to repo directory
os.chdir(REPO_DIR)
sys.path.insert(0, REPO_DIR)

print("\n" + "="*60)
print("✓ Repository cloned successfully!")
print(f"✓ Working directory: {os.getcwd()}")
print("="*60)

In [None]:
# Verify repository structure
import os

print("="*60)
print("📂 Repository Structure")
print("="*60)

required_dirs = ['models', 'scripts', 'data', 'utils', 'configs']
required_files = ['requirements.txt', 'models/__init__.py', 'scripts/train.py']

for dir_name in required_dirs:
    status = "✓" if os.path.exists(dir_name) else "✗"
    print(f"{status} {dir_name}/")

print()
for file_name in required_files:
    status = "✓" if os.path.exists(file_name) else "✗"
    print(f"{status} {file_name}")

print("="*60)

In [None]:
# Check GPU availability
import torch

print("="*60)
print("🔥 GPU Status Check")
print("="*60)

if torch.cuda.is_available():
    print(f"✓ GPU Available: {torch.cuda.get_device_name(0)}")
    print(f"✓ GPU Count: {torch.cuda.device_count()}")
    print(f"✓ CUDA Version: {torch.version.cuda}")
    print(f"✓ PyTorch Version: {torch.__version__}")

    # Get GPU memory info
    gpu_mem = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"✓ GPU Memory: {gpu_mem:.1f} GB")
else:
    print("✗ GPU NOT AVAILABLE!")
    print("⚠️  Please enable GPU: Settings → Accelerator → GPU T4 x2 → Save")
    raise RuntimeError("GPU not available. Training will be extremely slow on CPU.")

print("="*60)

## Step 2: Install Dependencies

In [None]:
# Install required packages
print("Installing dependencies...\n")

# Install the packages with minimum versions.
# Each specifier is quoted so the shell does not treat ">" as an output redirect.
!pip install -q "torch>=2.0.0" "torchvision>=0.15.0"
!pip install -q "albumentations>=1.3.0"
!pip install -q "opencv-python-headless>=4.7.0"
!pip install -q "pycocotools>=2.0.6"
!pip install -q "tensorboard>=2.12.0"
!pip install -q tqdm pyyaml
!pip install -q scikit-learn matplotlib seaborn

print("\n✓ All dependencies installed successfully!")
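
Because `pip install -q` prints very little, it can help to confirm afterwards that the installed versions actually meet the minimums. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper names (`parse_version`, `meets_minimum`, `check_requirements`) are illustrative and not part of the repository, and the comparison is a simplification that looks only at the first three numeric components rather than implementing full version-specifier rules.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the install cell above.
MINIMUMS = {
    "torch": "2.0.0",
    "torchvision": "0.15.0",
    "albumentations": "1.3.0",
    "opencv-python-headless": "4.7.0",
    "pycocotools": "2.0.6",
    "tensorboard": "2.12.0",
}


def parse_version(text):
    """Return the first three numeric components as a tuple of ints."""
    public = text.split("+")[0]  # drop local build tags such as "+cu118"
    numbers = [int(n) for n in re.findall(r"\d+", public)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))  # pad "2.1" to (2, 1, 0)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_requirements(minimums):
    """Map each package to True/False, or None if it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(version(name), minimum)
        except PackageNotFoundError:
            report[name] = None
    return report


for name, ok in check_requirements(MINIMUMS).items():
    status = {True: "OK", False: "TOO OLD", None: "MISSING"}[ok]
    print(f"{name}: {status}")
```

Any package reported as `MISSING` or `TOO OLD` is worth reinstalling before moving on to the model-building step.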

## Step 3: Setup Dataset

**IMPORTANT**: You need to add the TrashCAN dataset as a Kaggle Dataset:

1. Go to: https://www.kaggle.com/datasets
2. Click "New Dataset"
3. Upload the TrashCAN images and annotations
4. Set the visibility to public or private
5. Add it to this notebook: "Add Data" → Search for your dataset

Then update `DATASET_PATH` in the cell below to match the path of your dataset.

In [None]:
# Configure dataset path
import os

# UPDATE THIS PATH to match your Kaggle dataset
# Example: '/kaggle/input/trashcan-dataset' or '/kaggle/input/your-dataset-name'
DATASET_PATH = '/kaggle/input/trashcan-dataset'

print("="*60)
print("📦 Dataset Configuration")
print("="*60)

# Check if dataset exists
if os.path.exists(DATASET_PATH):
    print(f"✓ Dataset found at: {DATASET_PATH}")

    # List dataset contents
    print("\n📂 Dataset contents:")
    for item in os.listdir(DATASET_PATH):
        item_path = os.path.join(DATASET_PATH, item)
        if os.path.isdir(item_path):
            print(f"  📁 {item}/")
        else:
            print(f"  📄 {item}")
else:
    print(f"✗ Dataset NOT FOUND at: {DATASET_PATH}")
    print("\n⚠️  Please:")
    print("   1. Add TrashCAN dataset to this notebook (Add Data button)")
    print("   2. Update DATASET_PATH variable above")
    print("\nAvailable input datasets:")
    if os.path.exists('/kaggle/input'):
        for item in os.listdir('/kaggle/input'):
            print(f"  - /kaggle/input/{item}")

print("="*60)
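
Since the model is built for 22 classes, it may be worth confirming that the uploaded annotations agree before training starts. The sketch below assumes the annotations are COCO-style JSON files with a top-level `categories` list; that layout, the helper names, and the example path are assumptions for illustration, so adjust them to match how your dataset was uploaded.

```python
import json
from pathlib import Path


def find_annotation_files(dataset_root):
    """Return every .json file under dataset_root, sorted by path."""
    return sorted(Path(dataset_root).rglob("*.json"))


def count_categories(annotation_file):
    """Count entries in a COCO-style 'categories' list (0 if absent)."""
    with open(annotation_file, "r", encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        return 0  # not a COCO-style annotation file
    return len(data.get("categories", []))


# Assumed path, matching the default DATASET_PATH in the cell above.
dataset_root = Path("/kaggle/input/trashcan-dataset")
if dataset_root.exists():
    for path in find_annotation_files(dataset_root):
        print(f"{path.name}: {count_categories(path)} categories")
```

If a file reports a number other than 22, the `num_classes` argument used when building the model would need to change accordingly.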

## Step 4: Build Model

In [None]:
# Build YOLO-UDD model
from models.yolo_udd import build_yolo_udd
import torch

print("="*60)
print("🏗️  Building YOLO-UDD v2.0 Model")
print("="*60)

# Build model with 22 classes (TrashCAN dataset)
model = build_yolo_udd(num_classes=22)

# Move to GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

print(f"✓ Model built successfully")
print(f"✓ Device: {device}")
print(f"✓ Number of classes: 22")

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"✓ Total parameters: {total_params:,}")
print(f"✓ Trainable parameters: {trainable_params:,}")

# Test forward pass
print("\n🧪 Testing forward pass...")
x = torch.randn(1, 3, 640, 640).to(device)
with torch.no_grad():
    predictions, turb_score = model(x)

print(f"✓ Forward pass successful!")
print(f"✓ Turbidity Score: {turb_score.item():.4f}")
print(f"✓ Detection scales: {len(predictions)}")

print("="*60)
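
The parameter count printed above can be turned into a rough size estimate, which is a useful sanity check against the checkpoint size quoted at the end of this notebook. This is a back-of-the-envelope sketch only: it assumes float32 weights (4 bytes each), and the parameter count used here is a made-up figure, not the actual size of YOLO-UDD.

```python
def estimate_weights_mb(num_params, bytes_per_param=4):
    """Approximate size in MiB of num_params weights (float32 = 4 bytes)."""
    return num_params * bytes_per_param / (1024 ** 2)


# Hypothetical parameter count, used only to illustrate the arithmetic.
example_params = 60_000_000
print(f"weights only: {estimate_weights_mb(example_params):.0f} MB")

# A checkpoint that also stores optimizer state is larger: SGD with momentum
# keeps one extra buffer per weight, Adam keeps two.
print(f"with one extra buffer: {estimate_weights_mb(example_params * 2):.0f} MB")
```

In the notebook, passing `total_params` from the cell above instead of `example_params` gives the estimate for the real model.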

## Step 5: Training Configuration

In [None]:
# Training hyperparameters - Reduced for faster training
EPOCHS = 100  # Reduced from 300 (10 hours instead of 30 hours)
BATCH_SIZE = 8
LEARNING_RATE = 0.01
NUM_WORKERS = 2
SAVE_DIR = '/kaggle/working/runs/train'

print("="*60)
print("⚙️  Training Configuration")
print("="*60)
print(f"Epochs: {EPOCHS}")
print(f"Batch Size: {BATCH_SIZE}")
print(f"Learning Rate: {LEARNING_RATE}")
print(f"Number of Workers: {NUM_WORKERS}")
print(f"Save Directory: {SAVE_DIR}")
print(f"Dataset Path: {DATASET_PATH}")
print("="*60)

# Create save directory
os.makedirs(SAVE_DIR, exist_ok=True)
print(f"\n✓ Save directory created: {SAVE_DIR}")

## Step 6: Start Training

**⏱️ Estimated Time**: ~10 hours for 100 epochs on T4 GPU

**💡 Tips**:
- Training saves checkpoints automatically
- You can monitor progress in real time
- Results are saved to `/kaggle/working/runs/train/`
- Download the best checkpoint from the Output folder after training
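
The time estimate above implies a per-epoch pace that can be used to budget other epoch counts. The sketch below simply works through that arithmetic using the figures quoted in this notebook (10 hours for 100 epochs); the function names are illustrative, and real timings will vary with the GPU and dataset size.

```python
def minutes_per_epoch(total_hours, epochs):
    """Average minutes per epoch for a run of the given length."""
    return total_hours * 60 / epochs


def projected_hours(epochs, minutes_each):
    """Total hours for a run at a fixed per-epoch pace."""
    return epochs * minutes_each / 60


pace = minutes_per_epoch(total_hours=10, epochs=100)
print(f"Pace: {pace:.1f} min/epoch")
print(f"300 epochs at that pace: {projected_hours(300, pace):.1f} hours")
```

At this pace the original 300-epoch schedule works out to about 30 hours, which is the reason the configuration uses 100 epochs.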

In [None]:
# Start training
print("="*60)
print("🚀 Starting Training...")
print("="*60)
print(f"Training for {EPOCHS} epochs (~10 hours)")
print(f"Expected mAP: 70-72%")
print("="*60)

# Run training script
!python scripts/train.py \
    --config configs/train_config.yaml \
    --data-dir {DATASET_PATH} \
    --epochs {EPOCHS} \
    --batch-size {BATCH_SIZE} \
    --learning-rate {LEARNING_RATE} \
    --num-workers {NUM_WORKERS} \
    --save-dir {SAVE_DIR}
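
A `!` shell command does not raise a Python exception when the script exits with an error, so "Run All" carries on to the next cell even if training failed. One alternative, sketched below, is to build the same argument list in Python and launch it with `subprocess.run(..., check=True)`. The flags are the ones used in the command above; the helper names `build_train_command` and `run_training` are illustrative and not part of the repository.

```python
import shlex
import subprocess
import sys


def build_train_command(data_dir, epochs, batch_size, learning_rate,
                        num_workers, save_dir,
                        config="configs/train_config.yaml"):
    """Return the training command as a list of string arguments."""
    return [
        sys.executable, "scripts/train.py",
        "--config", config,
        "--data-dir", str(data_dir),
        "--epochs", str(epochs),
        "--batch-size", str(batch_size),
        "--learning-rate", str(learning_rate),
        "--num-workers", str(num_workers),
        "--save-dir", str(save_dir),
    ]


def run_training(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


cmd = build_train_command(
    data_dir="/kaggle/input/trashcan-dataset",
    epochs=100,
    batch_size=8,
    learning_rate=0.01,
    num_workers=2,
    save_dir="/kaggle/working/runs/train",
)
print(shlex.join(cmd))  # preview only; call run_training(cmd) to launch
```

Passing the arguments as a list also avoids shell quoting problems if the dataset path ever contains spaces.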

## Step 7: Download Results

After training completes, download the trained model checkpoint.

In [None]:
# Check training results
import os

print("="*60)
print("📊 Training Results")
print("="*60)

if os.path.exists(SAVE_DIR):
    print(f"\n📁 Results directory: {SAVE_DIR}")
    print("\nContents:")
    for root, dirs, files in os.walk(SAVE_DIR):
        level = root.replace(SAVE_DIR, '').count(os.sep)
        indent = ' ' * 2 * level
        print(f"{indent}{os.path.basename(root)}/")
        subindent = ' ' * 2 * (level + 1)
        for file in files:
            size = os.path.getsize(os.path.join(root, file)) / (1024*1024)
            print(f"{subindent}{file} ({size:.1f} MB)")
    
    # Check for best checkpoint
    best_checkpoint = os.path.join(SAVE_DIR, 'best.pt')
    if os.path.exists(best_checkpoint):
        size = os.path.getsize(best_checkpoint) / (1024*1024)
        print(f"\n✓ Best checkpoint: {best_checkpoint} ({size:.1f} MB)")
        print("\n📥 Download this file from the Output section!")
    else:
        print("\n⚠️  Best checkpoint not found. Check if training completed successfully.")
else:
    print(f"✗ Results directory not found: {SAVE_DIR}")

print("="*60)
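
If `best.pt` is missing, for example because the session stopped before training finished, an earlier checkpoint may still be usable. The sketch below lists checkpoint files newest first. It assumes checkpoints use the `.pt` extension, as `best.pt` does; other file names written by the training script are not known here, and `list_checkpoints` is an illustrative helper rather than part of the repository.

```python
from pathlib import Path


def list_checkpoints(save_dir, pattern="*.pt"):
    """Return (path, size_mb) pairs for matching files, newest first."""
    root = Path(save_dir)
    if not root.is_dir():
        return []
    paths = sorted(root.rglob(pattern),
                   key=lambda p: p.stat().st_mtime,
                   reverse=True)
    return [(p, p.stat().st_size / (1024 ** 2)) for p in paths]


# Same directory as SAVE_DIR in the configuration cell.
for path, size_mb in list_checkpoints("/kaggle/working/runs/train"):
    print(f"{path} ({size_mb:.1f} MB)")
```

The first entry printed is the most recently written checkpoint, which is usually the best fallback when `best.pt` was never saved.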

## 🎉 Training Complete!

### Next Steps:
1. **Download Checkpoint**: Download `best.pt` from Output folder
2. **Evaluate Model**: Run evaluation script locally with downloaded checkpoint
3. **Test Detections**: Test on new images

### Expected Results:
- mAP@50:95: **70-72%** (22 classes)
- Training Time: **~10 hours** (100 epochs)
- Checkpoint Size: **~200-300 MB**

---

**📧 Issues?** Check the GitHub repository: https://github.com/kshitijkhede/YOLO-UDD-v2.0