# 🚀 YOLO-UDD v2.0 Training on Kaggle - Fixed Version

**Last Updated:** November 2, 2025

## 📋 Prerequisites
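1. Upload the **TrashCAN annotations** (COCO-format `train.json` and `val.json`) as a Kaggle dataset and add it to this notebook
2. Upload the **TrashCAN images** (with `train/` and `val/` subfolders) as a Kaggle dataset and add it to this notebook
3. Enable a **GPU** accelerator in notebook settings (T4 or P100)
4. Enable **Internet** in notebook settings (required for `git clone` and `pip install`)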

---

## 🔧 Step 1: Setup and Dependencies

In [None]:
%%bash
# Clone the repository (skipped if it is already present)
if [ ! -d "YOLO-UDD-v2.0" ]; then
    git clone https://github.com/kshitijkhede/YOLO-UDD-v2.0.git
fi
# Note: a cd inside %%bash does not persist, so the next cell uses %cd
echo "✅ Repository ready"

In [None]:
%cd YOLO-UDD-v2.0

In [None]:
# Install dependencies (PyTorch wheels built for CUDA 11.8)
!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install -q opencv-python-headless pillow pycocotools pyyaml tqdm tensorboard
!pip install -q albumentations timm scikit-learn

print("✅ Dependencies installed")

## 📊 Step 2: Setup Dataset Paths

In [None]:
import os
import shutil
import json

print("üîç Setting up dataset paths...\n")

# Create directory structure
os.makedirs('data/trashcan/annotations', exist_ok=True)
os.makedirs('data/trashcan/images', exist_ok=True)

# === MODIFY THESE PATHS TO MATCH YOUR KAGGLE DATASETS ===
ANNOTATIONS_PATH = '/kaggle/input/trashcan-annotations-coco-format/annotations'
IMAGES_PATH = '/kaggle/input/trashcan/images'

# Alternative paths (uncomment and modify if needed)
# ANNOTATIONS_PATH = '/kaggle/input/YOUR-ANNOTATIONS-DATASET-NAME/'
# IMAGES_PATH = '/kaggle/input/YOUR-IMAGES-DATASET-NAME/'

print(f"Annotations source: {ANNOTATIONS_PATH}")
print(f"Images source: {IMAGES_PATH}")
print("\n" + "="*70)
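
If you are unsure which folder names Kaggle assigned to your datasets, you can search `/kaggle/input` instead of editing the paths by hand. Below is a minimal sketch that does this. `find_dir_containing` and `find_dir_with_subdirs` are hypothetical helpers, not part of the repository. The first match wins, so check the printed paths before moving on.

```python
import os
from typing import Optional, Sequence


def find_dir_containing(root: str, filenames: Sequence[str]) -> Optional[str]:
    """Return the first directory under `root` that contains every file in `filenames`."""
    for dirpath, _dirnames, files in os.walk(root):
        if all(name in files for name in filenames):
            return dirpath
    return None


def find_dir_with_subdirs(root: str, subdirs: Sequence[str]) -> Optional[str]:
    """Return the first directory under `root` that has every name in `subdirs` as a child directory."""
    for dirpath, dirnames, _files in os.walk(root):
        if all(name in dirnames for name in subdirs):
            return dirpath
    return None


# Hypothetical usage: fall back to the hand-edited paths if nothing is found.
# ANNOTATIONS_PATH = find_dir_containing('/kaggle/input', ['train.json', 'val.json']) or ANNOTATIONS_PATH
# IMAGES_PATH = find_dir_with_subdirs('/kaggle/input', ['train', 'val']) or IMAGES_PATH
```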

In [None]:
# Copy annotation files into data/trashcan/annotations/
print("📋 Copying annotations...")

train_json = os.path.join(ANNOTATIONS_PATH, 'train.json')
val_json = os.path.join(ANNOTATIONS_PATH, 'val.json')

if os.path.exists(train_json) and os.path.exists(val_json):
    shutil.copy(train_json, 'data/trashcan/annotations/train.json')
    shutil.copy(val_json, 'data/trashcan/annotations/val.json')
    
    # Verify
    with open('data/trashcan/annotations/train.json', 'r') as f:
        train_data = json.load(f)
    with open('data/trashcan/annotations/val.json', 'r') as f:
        val_data = json.load(f)
    
    print(f"✅ Train: {len(train_data['images'])} images, {len(train_data['annotations'])} annotations")
    print(f"✅ Val: {len(val_data['images'])} images, {len(val_data['annotations'])} annotations")
    print(f"✅ Categories: {len(train_data['categories'])}")
else:
    print("❌ Annotations not found!")
    print(f"   Looking for: {train_json}")
    print("   Please update ANNOTATIONS_PATH in the cell above")

In [None]:
# Link images (symbolic links to save space)
print("üñºÔ∏è  Linking images...")

train_imgs_src = os.path.join(IMAGES_PATH, 'train')
val_imgs_src = os.path.join(IMAGES_PATH, 'val')

train_imgs_dst = 'data/trashcan/images/train'
val_imgs_dst = 'data/trashcan/images/val'

# Remove links or directories left over from a previous run
for path in [train_imgs_dst, val_imgs_dst]:
    if os.path.exists(path):
        if os.path.islink(path):
            os.unlink(path)
        else:
            shutil.rmtree(path)

# Create symbolic links
if os.path.exists(train_imgs_src) and os.path.exists(val_imgs_src):
    os.symlink(train_imgs_src, train_imgs_dst)
    os.symlink(val_imgs_src, val_imgs_dst)
    
    train_count = len([f for f in os.listdir(train_imgs_dst) if f.endswith('.jpg')])
    val_count = len([f for f in os.listdir(val_imgs_dst) if f.endswith('.jpg')])
    
    print(f"✅ Train images: {train_count}")
    print(f"✅ Val images: {val_count}")
    
    if train_count > 0 and val_count > 0:
        print("\n🎉 Dataset is ready for training!")
else:
    print("❌ Images not found!")
    print(f"   Looking for: {train_imgs_src}")
    print("   Please update IMAGES_PATH in the cell above")
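
Counting `.jpg` files shows that the images exist, but not that they match the annotations. Below is a minimal sketch that checks every `file_name` in a COCO `images` list against a linked folder. `find_missing_images` is a hypothetical helper. Comparing by base name is an assumption, made in case `file_name` carries a subfolder prefix.

```python
import os
from typing import List


def find_missing_images(coco: dict, image_dir: str) -> List[str]:
    """Return file_name entries from a COCO dict whose base name is not present in image_dir."""
    present = set(os.listdir(image_dir))
    return [
        img["file_name"]
        for img in coco.get("images", [])
        if os.path.basename(img["file_name"]) not in present
    ]
```

With `train_data` from the annotations cell, `find_missing_images(train_data, 'data/trashcan/images/train')` should return an empty list when the two datasets line up.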

## 🔍 Step 3: Verify GPU and PyTorch

In [None]:
import torch

print(f"PyTorch Version: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"✅ GPU: {torch.cuda.get_device_name(0)}")
    print(f"✅ GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    torch.cuda.empty_cache()
else:
    print("⚠️  WARNING: GPU not available!")
    print("   Go to Settings → Accelerator → Select GPU T4 or P100")

## ⚙️ Step 4: Create Optimized Training Config

In [None]:
import yaml

# Create optimized config for Kaggle
config = {
    'model': {
        'name': 'YOLO-UDD-v2.0',
        'num_classes': 22,
        'pretrained_path': None
    },
    'data': {
        'dataset_name': 'TrashCAN-1.0',
        'data_dir': 'data/trashcan',
        'img_size': 640,
        'class_names': [
            "rov", "plant", "animal_fish", "animal_starfish", "animal_shells",
            "animal_crab", "animal_eel", "animal_etc", "trash_clothing", "trash_pipe",
            "trash_bottle", "trash_bag", "trash_snack_wrapper", "trash_can", "trash_cup",
            "trash_container", "trash_unknown_instance", "trash_branch", "trash_wreckage",
            "trash_tarp", "trash_rope", "trash_net"
        ]
    },
    'training': {
        'epochs': 100,
        'batch_size': 8,           # Optimized for T4 GPU
        'num_workers': 2,
        'optimizer': 'AdamW',
        'learning_rate': 0.001,    # Lower initial LR for stability
        'weight_decay': 0.0005,
        'scheduler': 'CosineAnnealing',
        'lr_min': 0.00001,
        'early_stopping_patience': 30,
        'grad_clip_norm': 10.0,
        'use_amp': True            # Mixed precision
    },
    'loss': {
        'lambda_box': 5.0,
        'lambda_obj': 1.0,
        'lambda_cls': 1.0,
        'focal_loss_gamma': 2.0,
        'iou_type': 'CIoU'
    },
    'augmentation': {
        'use_augmentation': True,
        'horizontal_flip_prob': 0.5,
        'color_jitter': True,
        'gaussian_blur': False,     # Disabled to reduce training time
        'underwater_augmentation': True
    },
    'checkpoints': {
        'save_dir': '/kaggle/working/checkpoints',
        'save_interval': 10,
        'save_best_only': False
    },
    'logging': {
        'use_tensorboard': True,
        'log_dir': '/kaggle/working/runs',
        'log_interval': 50
    },
    'eval': {
        'conf_threshold': 0.001,
        'nms_threshold': 0.6,
        'eval_interval': 5
    }
}

# Save config
os.makedirs('configs', exist_ok=True)
with open('configs/kaggle_config.yaml', 'w') as f:
    yaml.dump(config, f, default_flow_style=False)

print("✅ Training config created!")
print("\nKey settings:")
print(f"  - Batch size: {config['training']['batch_size']}")
print(f"  - Epochs: {config['training']['epochs']}")
print(f"  - Learning rate: {config['training']['learning_rate']}")
print(f"  - Image size: {config['data']['img_size']}")
print(f"  - Mixed precision: {config['training']['use_amp']}")
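
The config sets `scheduler: CosineAnnealing` with `learning_rate: 0.001` and `lr_min: 0.00001`. This sketch assumes `scripts/train.py` steps a standard cosine schedule once per epoch over `epochs`, with no warmup. That is the same closed form as PyTorch's `CosineAnnealingLR` with `T_max=epochs` and `eta_min=lr_min`. Under that assumption, the learning rate at epoch `t` is `lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T))`. `cosine_lr` is a hypothetical helper for previewing the schedule.

```python
import math


def cosine_lr(epoch: int, lr_max: float, lr_min: float, total_epochs: int) -> float:
    """Closed-form cosine annealing: lr_max at epoch 0, decaying to lr_min at total_epochs."""
    progress = min(epoch, total_epochs) / total_epochs  # clamp so later epochs stay at lr_min
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Values mirror the config above: learning_rate=0.001, lr_min=0.00001, epochs=100.
for epoch in (0, 25, 50, 75, 100):
    print(f"epoch {epoch:3d}: lr = {cosine_lr(epoch, 0.001, 0.00001, 100):.6f}")
```

With `early_stopping_patience: 30`, training may stop before the schedule reaches `lr_min`.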

## 🚀 Step 5: Start Training

In [None]:
import glob
import os

# Check for existing checkpoints to resume
checkpoint_dir = '/kaggle/working/checkpoints'
os.makedirs(checkpoint_dir, exist_ok=True)

checkpoints = glob.glob(f'{checkpoint_dir}/*.pth')

if checkpoints:
    latest_ckpt = max(checkpoints, key=os.path.getmtime)  # most recently modified file
    print(f"🔄 Found checkpoint: {latest_ckpt}")
    print("   Will resume training from this checkpoint\n")
    resume_flag = f"--resume {latest_ckpt}"
else:
    print("🆕 No previous checkpoint found")
    print("   Starting fresh training\n")
    resume_flag = ""

print("="*70)
print("🚀 Starting YOLO-UDD v2.0 Training")
print("="*70)

In [None]:
# Run training
!python scripts/train.py --config configs/kaggle_config.yaml {resume_flag}
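
The resume logic in this step can also be written as two small functions. That keeps checkpoint selection testable and shell-quotes the path if it ever contains spaces. `latest_checkpoint` and `build_train_command` are hypothetical helper names. The `--config` and `--resume` flags are the ones the cells above already pass to `scripts/train.py`.

```python
import glob
import os
from typing import List, Optional


def latest_checkpoint(checkpoint_dir: str) -> Optional[str]:
    """Return the most recently modified *.pth file in checkpoint_dir, or None if there is none."""
    candidates = glob.glob(os.path.join(checkpoint_dir, "*.pth"))
    return max(candidates, key=os.path.getmtime) if candidates else None


def build_train_command(config_path: str, resume: Optional[str]) -> List[str]:
    """Build the train.py argument list; --resume is appended only when a checkpoint exists."""
    cmd = ["python", "scripts/train.py", "--config", config_path]
    if resume:
        cmd += ["--resume", resume]
    return cmd
```

In a notebook cell, run `import shlex` and then `cmd = shlex.join(build_train_command('configs/kaggle_config.yaml', latest_checkpoint('/kaggle/working/checkpoints')))`. Then `!{cmd}` runs the same training command with the checkpoint path quoted for the shell.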

## 💾 Step 6: Save Checkpoints

In [None]:
import shutil
import glob
import os

print("üíæ Saving checkpoints...\n")

# Create checkpoint directory
os.makedirs('/kaggle/working/checkpoints', exist_ok=True)

# Find all checkpoint files
run_checkpoints = glob.glob('runs/*/checkpoints/*.pth')

if run_checkpoints:
    for ckpt in run_checkpoints:
        dest = os.path.join('/kaggle/working/checkpoints', os.path.basename(ckpt))
        shutil.copy(ckpt, dest)
        size = os.path.getsize(dest) / (1024*1024)
        print(f"✅ Saved: {os.path.basename(ckpt)} ({size:.1f} MB)")
    
    print(f"\n📦 Total checkpoints: {len(run_checkpoints)}")
    print("✅ Checkpoints saved to /kaggle/working/checkpoints/")
    print("\n💡 Files in /kaggle/working/ are kept as notebook output when you save a version of the notebook")
else:
    print("⚠️  No checkpoints found to save")
    print("   Training may not have started or completed any epochs")
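
When checkpoints are later chosen by timestamp, as in Step 5, `shutil.copy` is a problem: it gives every copied file a fresh timestamp, so the copies no longer reflect training order. Below is a minimal sketch of an alternative copy loop that uses `shutil.copy2`, which preserves modification times. It also defensively skips a source that already is the destination file, which would otherwise raise `shutil.SameFileError`. `sync_checkpoints` is a hypothetical helper.

```python
import glob
import os
import shutil
from typing import List


def sync_checkpoints(pattern: str, dest_dir: str) -> List[str]:
    """Copy files matching `pattern` into dest_dir, preserving timestamps; return the new paths."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for src in sorted(glob.glob(pattern)):
        dest = os.path.join(dest_dir, os.path.basename(src))
        # Skip when source and destination are the same file on disk
        if os.path.exists(dest) and os.path.samefile(src, dest):
            continue
        shutil.copy2(src, dest)  # copy2 also copies mtime, unlike shutil.copy
        copied.append(dest)
    return copied
```

Usage mirroring the cell above: `sync_checkpoints('runs/*/checkpoints/*.pth', '/kaggle/working/checkpoints')`.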

## 📊 Step 7: View Training Logs (TensorBoard)

In [None]:
# Load TensorBoard
%load_ext tensorboard
%tensorboard --logdir /kaggle/working/runs

## 🎯 Step 8: Evaluate Model (Optional)

In [None]:
# Find best checkpoint
import glob

best_ckpt = glob.glob('/kaggle/working/checkpoints/best.pth')

if best_ckpt:
    print(f"üìä Evaluating model: {best_ckpt[0]}\n")
    !python scripts/evaluate.py \
        --checkpoint {best_ckpt[0]} \
        --data-dir data/trashcan \
        --split val
else:
    print("⚠️  No 'best.pth' checkpoint found")
    print("   Training may still be in progress")
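
The `eval` section of the config sets `conf_threshold: 0.001` and `nms_threshold: 0.6`. How `scripts/evaluate.py` applies them is not shown in this notebook. The sketch below is a generic per-class greedy NMS that illustrates what the two thresholds typically control. It assumes boxes are `(x1, y1, x2, y2)` pixel coordinates. `iou` and `nms` are hypothetical helpers, not the repository's implementation. Detections scoring below `conf_threshold` are dropped. A detection is suppressed when it overlaps an already-kept, higher-scoring detection of the same class with IoU above `nms_threshold`.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], classes: Sequence[int],
        conf_threshold: float = 0.001, iou_threshold: float = 0.6) -> List[int]:
    """Per-class greedy NMS; returns indices of kept detections, highest score first."""
    # Drop low-confidence detections, then visit the rest from highest to lowest score
    order = sorted((i for i, s in enumerate(scores) if s >= conf_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep i unless a kept box of the same class overlaps it too much
        if all(classes[i] != classes[k] or iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

A confidence threshold as low as 0.001 is common for mAP evaluation. Keeping low-score detections lets the precision-recall curve extend to high recall. For visually inspecting detections, a higher threshold is usually easier to read.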

## 🖼️ Step 9: Run Detection on Sample Images (Optional)

In [None]:
# Run detection on validation images
import glob

best_ckpt = glob.glob('/kaggle/working/checkpoints/best.pth')

if best_ckpt:
    print(f"üéØ Running detection with: {best_ckpt[0]}\n")
    !python scripts/detect.py \
        --checkpoint {best_ckpt[0]} \
        --source data/trashcan/images/val/ \
        --output /kaggle/working/results/ \
        --max-images 10
else:
    print("⚠️  No checkpoint found for detection")

In [None]:
# Display detection results
import matplotlib.pyplot as plt
from PIL import Image
import glob
import os

result_images = glob.glob('/kaggle/working/results/*.jpg')[:6]

if result_images:
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    axes = axes.flatten()
    
    for idx, img_path in enumerate(result_images):
        img = Image.open(img_path)
        axes[idx].imshow(img)
        axes[idx].axis('off')
        axes[idx].set_title(f'Detection {idx+1}')
    
    # Hide empty subplots
    for idx in range(len(result_images), 6):
        axes[idx].axis('off')
    
    plt.tight_layout()
    plt.show()
else:
    print("⚠️  No detection results found")
    print("   Run the detection cell above first")

## 📥 Step 10: Download Checkpoints (Optional)

In [None]:
# List all available checkpoints
import glob
import os

checkpoints = glob.glob('/kaggle/working/checkpoints/*.pth')

if checkpoints:
    print("üì¶ Available checkpoints:\n")
    for ckpt in sorted(checkpoints):
        size = os.path.getsize(ckpt) / (1024*1024)
        print(f"  - {os.path.basename(ckpt)} ({size:.1f} MB)")
    
    print("\nüí° To download, you can:")
    print("  1. Use Kaggle's file browser (right sidebar)")
    print("  2. Navigate to /kaggle/working/checkpoints/")
    print("  3. Right-click on files to download")
else:
    print("⚠️  No checkpoints found")
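
Downloading many `.pth` files one at a time through the sidebar is tedious. Below is a minimal sketch that bundles the checkpoint folder into a single zip with the standard-library `shutil.make_archive`. `archive_checkpoints` and the archive name in the usage line are hypothetical.

```python
import shutil


def archive_checkpoints(checkpoint_dir: str, archive_base: str) -> str:
    """Zip the contents of checkpoint_dir into <archive_base>.zip and return the archive path."""
    # root_dir makes the archive entries relative to checkpoint_dir (e.g. "best.pth")
    return shutil.make_archive(archive_base, "zip", root_dir=checkpoint_dir)
```

For example, `archive_checkpoints('/kaggle/working/checkpoints', '/kaggle/working/yolo_udd_checkpoints')` writes `/kaggle/working/yolo_udd_checkpoints.zip`. The archive is written outside the folder being zipped, so it never includes itself.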

---

## 🎉 Training Complete!

### Next Steps:
1. **Download checkpoints** from `/kaggle/working/checkpoints/`
2. **View TensorBoard** logs to analyze training
3. **Run evaluation** to see final metrics
4. **Test on new images** using `detect.py`

### Tips for Better Results:
- Train for more epochs (increase `epochs` in config)
- Adjust learning rate if loss plateaus
- Try different batch sizes based on GPU memory
- Enable more augmentations for better generalization

---