# YOLO11-x Training on VinDr-SpineXR Dataset
## Kaggle GPU Optimized (Tesla P100/T4 16GB)

**Model**: YOLO11-x (65M parameters)

**Expected Performance**: 35-39% mAP@0.5 (vs 32-36% with YOLO11-l)

**Training Time**: ~10-12 hours on Kaggle GPU
---

## Step 1: Setup Environment

In [1]:
# Check GPU availability
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
else:
    print("WARNING: No GPU detected! Training will be extremely slow.")

PyTorch version: 2.8.0+cu126
CUDA available: True
GPU: Tesla T4
GPU Memory: 14.7 GB


In [2]:
# Install/Upgrade Ultralytics (YOLO11)
!pip install -U ultralytics
!pip install -U opencv-python-headless

# Verify installation
from ultralytics import YOLO
print("‚úì Ultralytics installed successfully!")

Collecting ultralytics
  Downloading ultralytics-8.4.7-py3-none-any.whl.metadata (38 kB)
Collecting ultralytics-thop>=2.0.18 (from ultralytics)
  Downloading ultralytics_thop-2.0.18-py3-none-any.whl.metadata (14 kB)
Downloading ultralytics-8.4.7-py3-none-any.whl (1.2 MB)
[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m1.2/1.2 MB[0m [31m20.8 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25hDownloading ultralytics_thop-2.0.18-py3-none-any.whl (28 kB)
Installing collected packages: ultralytics-thop, ultralytics
Successfully installed ultralytics-8.4.7 ultralytics-thop-2.0.18
Collecting opencv-python-headless
  Downloading opencv_python_headless-4.13.0.90-cp37-abi3-manylinux_2_28_x86_64.whl.metadata (19 kB)
Downloading opencv_python_headless-4.13.0.90-cp37-abi3-manylinux_2_28_x86_64.whl (62.5 MB)
[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚î

## Step 2: Dataset Preparation

In [3]:
import os
import json
import shutil
from pathlib import Path

# Kaggle dataset paths
KAGGLE_INPUT = '/kaggle/input/complete-vindr-spinexr/vindr-spinexr-a-large-annotated-medical-image-dataset'

# Check if dataset exists
if os.path.exists(KAGGLE_INPUT):
    print("‚úì Dataset found!")
    print("\nDataset structure:")
    for root, dirs, files in os.walk(KAGGLE_INPUT):
        level = root.replace(KAGGLE_INPUT, '').count(os.sep)
        indent = ' ' * 2 * level
        print(f'{indent}{os.path.basename(root)}/')
        subindent = ' ' * 2 * (level + 1)
        for file in files[:3]:  # Show first 3 files only
            print(f'{subindent}{file}')
        if len(files) > 3:
            print(f'{subindent}... and {len(files)-3} more files')
        if level > 2:  # Limit depth
            break
else:
    print("‚ùå Dataset not found!")
    print("Please add the dataset: https://www.kaggle.com/datasets/prosenjitmondol/complete-vindr-spinexr")

‚úì Dataset found!

Dataset structure:
vindr-spinexr-a-large-annotated-medical-image-dataset/
  vindr-spinexr-a-large-annotated-medical-image-dataset/
    SHA256SUMS.txt
    train_meta.csv
    test_meta.csv
    ... and 2 more files
    train_png/
      603991895639eb7b33bf9475b9b7d719.png
      7f9faef52144cb53e35c0b60d6156d05.png
      ed2eb4eee981d1814ab6daff44773ad9.png
      ... and 8159 more files
    train_images/
      9e20054e418c7497cc590c345557f497.dicom
      0ad3d2bf7e1b745151dc4b0f9ddbf96a.dicom
      bd317694f1b3775fe67f0c641a8a1ba5.dicom
      ... and 8386 more files
    test_png/
      f330e28213fe142f0cf8a158fe171282.png
      c4ed963e96cfd445e7eb120066f63a16.png
      bb5c63def3833b9a2c716fbdb083b7a6.png
      ... and 2074 more files
    annotations/
      train.csv
      test.csv
    test_images/
      e879dd6f41cb5909448dc52968d8b690.dicom
      97001c74dbddbb2d279a18752337814b.dicom
      6df4ca68b4e0a57f5b9f43a9cb4e2136.dicom
      ... and 2074 more files


In [4]:
# Create YOLO format dataset structure
WORK_DIR = '/kaggle/working'
DATASET_DIR = f'{WORK_DIR}/vindr_yolo'

# Create directories
os.makedirs(f'{DATASET_DIR}/images/train', exist_ok=True)
os.makedirs(f'{DATASET_DIR}/images/val', exist_ok=True)
os.makedirs(f'{DATASET_DIR}/labels/train', exist_ok=True)
os.makedirs(f'{DATASET_DIR}/labels/val', exist_ok=True)

print("‚úì Directory structure created")

‚úì Directory structure created


## Step 3: Convert COCO to YOLO Format

In [5]:
def coco_to_yolo(coco_json_path, images_dir, output_labels_dir, output_images_dir):
    """
    Convert COCO format annotations to YOLO format
    """
    print(f"Converting {coco_json_path}...")
    
    with open(coco_json_path, 'r') as f:
        coco_data = json.load(f)
    
    # Build image info dict
    images_info = {img['id']: img for img in coco_data['images']}
    
    # Build category mapping (COCO uses 1-indexed, YOLO uses 0-indexed)
    category_mapping = {cat['id']: idx for idx, cat in enumerate(coco_data['categories'])}
    
    # Group annotations by image_id
    annotations_by_image = {}
    for ann in coco_data['annotations']:
        img_id = ann['image_id']
        if img_id not in annotations_by_image:
            annotations_by_image[img_id] = []
        annotations_by_image[img_id].append(ann)
    
    converted_count = 0
    skipped_count = 0
    
    for img_id, img_info in images_info.items():
        file_name = img_info['file_name']
        img_width = img_info['width']
        img_height = img_info['height']
        
        # Check for PNG version (train_png/test_png)
        img_name_base = os.path.splitext(file_name)[0]
        src_image_path = os.path.join(images_dir, f'{img_name_base}.png')
        
        # Fallback to original if PNG doesn't exist
        if not os.path.exists(src_image_path):
            src_image_path = os.path.join(images_dir.replace('_png', '_images'), file_name)
        
        if not os.path.exists(src_image_path):
            skipped_count += 1
            continue
        
        # Copy image
        dst_image_path = os.path.join(output_images_dir, f'{img_name_base}.png')
        if not os.path.exists(dst_image_path):
            shutil.copy2(src_image_path, dst_image_path)
        
        # Convert annotations to YOLO format
        yolo_annotations = []
        if img_id in annotations_by_image:
            for ann in annotations_by_image[img_id]:
                category_id = category_mapping[ann['category_id']]
                bbox = ann['bbox']  # [x, y, width, height] in COCO
                
                # Convert to YOLO format: [class, x_center, y_center, width, height] (normalized)
                x_center = (bbox[0] + bbox[2] / 2) / img_width
                y_center = (bbox[1] + bbox[3] / 2) / img_height
                width = bbox[2] / img_width
                height = bbox[3] / img_height
                
                yolo_annotations.append(f"{category_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}")
        
        # Write YOLO label file
        label_path = os.path.join(output_labels_dir, f'{img_name_base}.txt')
        with open(label_path, 'w') as f:
            f.write('\n'.join(yolo_annotations))
        
        converted_count += 1
        if converted_count % 1000 == 0:
            print(f"  Converted {converted_count} images...")
    
    print(f"‚úì Converted {converted_count} images")
    if skipped_count > 0:
        print(f"‚ö† Skipped {skipped_count} images (not found)")
    
    return converted_count

In [6]:
# Convert training data
# Based on actual Kaggle dataset structure from screenshot
import glob

# Correct paths from your Kaggle dataset structure
train_json = '/kaggle/input/complete-vindr-spinexr/coco format/train_coco.json'
train_images = '/kaggle/input/complete-vindr-spinexr/vindr-spinexr-a-large-annotated-medical-image-dataset/vindr-spinexr-a-large-annotated-medical-image-dataset/train_png'

print("Checking training dataset paths...")
print(f"JSON: {train_json}")
print(f"Images: {train_images}")

# Verify paths exist
if not os.path.exists(train_json):
    print(f"‚ùå ERROR: JSON not found at {train_json}")
    train_count = 0
elif not os.path.exists(train_images):
    print(f"‚ùå ERROR: Images folder not found at {train_images}")
    train_count = 0
else:
    # Count images
    sample_files = glob.glob(os.path.join(train_images, '*.png'))
    print(f"‚úì Found {len(sample_files)} PNG files\n")
    
    # Convert
    train_count = coco_to_yolo(
        train_json,
        train_images,
        f'{DATASET_DIR}/labels/train',
        f'{DATASET_DIR}/images/train'
    )
    print(f"\n‚úì Total training images converted: {train_count}")

Checking training dataset paths...
JSON: /kaggle/input/complete-vindr-spinexr/coco format/train_coco.json
Images: /kaggle/input/complete-vindr-spinexr/vindr-spinexr-a-large-annotated-medical-image-dataset/vindr-spinexr-a-large-annotated-medical-image-dataset/train_png
‚úì Found 8162 PNG files

Converting /kaggle/input/complete-vindr-spinexr/coco format/train_coco.json...
  Converted 1000 images...
  Converted 2000 images...
  Converted 3000 images...
  Converted 4000 images...
  Converted 5000 images...
  Converted 6000 images...
  Converted 7000 images...
  Converted 8000 images...
‚úì Converted 8162 images
‚ö† Skipped 227 images (not found)

‚úì Total training images converted: 8162


In [7]:
# Convert validation/test data
# Based on actual Kaggle dataset structure
val_json = '/kaggle/input/complete-vindr-spinexr/coco format/test_coco.json'
val_images = '/kaggle/input/complete-vindr-spinexr/vindr-spinexr-a-large-annotated-medical-image-dataset/vindr-spinexr-a-large-annotated-medical-image-dataset/test_png'

print("Checking validation dataset paths...")
print(f"JSON: {val_json}")
print(f"Images: {val_images}")

# Verify paths exist
if not os.path.exists(val_json):
    print(f"‚ùå ERROR: JSON not found at {val_json}")
    val_count = 0
elif not os.path.exists(val_images):
    print(f"‚ùå ERROR: Images folder not found at {val_images}")
    val_count = 0
else:
    # Count images
    sample_files = glob.glob(os.path.join(val_images, '*.png'))
    print(f"‚úì Found {len(sample_files)} PNG files\n")
    
    # Convert
    val_count = coco_to_yolo(
        val_json,
        val_images,
        f'{DATASET_DIR}/labels/val',
        f'{DATASET_DIR}/images/val'
    )
    print(f"\n‚úì Total validation images converted: {val_count}")

Checking validation dataset paths...
JSON: /kaggle/input/complete-vindr-spinexr/coco format/test_coco.json
Images: /kaggle/input/complete-vindr-spinexr/vindr-spinexr-a-large-annotated-medical-image-dataset/vindr-spinexr-a-large-annotated-medical-image-dataset/test_png
‚úì Found 2077 PNG files

Converting /kaggle/input/complete-vindr-spinexr/coco format/test_coco.json...
  Converted 1000 images...
  Converted 2000 images...
‚úì Converted 2077 images

‚úì Total validation images converted: 2077


## Step 4: Create YAML Configuration

In [8]:
# Create dataset YAML file
yaml_content = f"""# VinDr-SpineXR Dataset Configuration for YOLO11-x

path: {DATASET_DIR}
train: images/train
val: images/val

# Number of classes
nc: 7

# Class names
names:
  0: Osteophytes
  1: Surgical implant
  2: Spondylolysthesis
  3: Foraminal stenosis
  4: Disc space narrowing
  5: Vertebral collapse
  6: Other lesions
"""

yaml_path = f'{WORK_DIR}/vindr_spinexr.yaml'
with open(yaml_path, 'w') as f:
    f.write(yaml_content)

print("‚úì Dataset YAML created")
print(f"\nConfiguration saved to: {yaml_path}")
print("\nContents:")
print(yaml_content)

‚úì Dataset YAML created

Configuration saved to: /kaggle/working/vindr_spinexr.yaml

Contents:
# VinDr-SpineXR Dataset Configuration for YOLO11-x

path: /kaggle/working/vindr_yolo
train: images/train
val: images/val

# Number of classes
nc: 7

# Class names
names:
  0: Osteophytes
  1: Surgical implant
  2: Spondylolysthesis
  3: Foraminal stenosis
  4: Disc space narrowing
  5: Vertebral collapse
  6: Other lesions



## Step 5: Load YOLO11-x Model

In [9]:
print("="*80)
print("LOADING YOLO11-x MODEL")
print("="*80)

# Load YOLO11-x with COCO pretrained weights
model = YOLO('yolo11x.pt')  # Auto-downloads ~140MB

print("\n‚úì YOLO11-x loaded successfully!")
print(f"\nModel Details:")
print(f"  Parameters: ~65M")
print(f"  Architecture: YOLO11-x")
print(f"  Pretrained: COCO dataset")
print(f"  Input size: 640√ó640")
print(f"\nExpected Performance:")
print(f"  mAP@0.5: 35-39% (vs 32-36% with YOLO11-l)")
print(f"  Training time: ~10-12 hours on Kaggle GPU")

LOADING YOLO11-x MODEL
[KDownloading https://github.com/ultralytics/assets/releases/download/v8.4.0/yolo11x.pt to 'yolo11x.pt': 100% ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ 109.3MB 195.0MB/s 0.6s0.5s<0.1s

‚úì YOLO11-x loaded successfully!

Model Details:
  Parameters: ~65M
  Architecture: YOLO11-x
  Pretrained: COCO dataset
  Input size: 640√ó640

Expected Performance:
  mAP@0.5: 35-39% (vs 32-36% with YOLO11-l)
  Training time: ~10-12 hours on Kaggle GPU


## Step 6: Configure Training Parameters

In [10]:
# Training configuration optimized for Kaggle GPU (16GB)
EPOCHS = 30  # Reduced from 35 to fit within 12-hour limit safely
BATCH_SIZE = 8  # YOLO11-x with 16GB GPU (vs batch=12 for YOLO11-l)
IMG_SIZE = 640
DEVICE = 0

print("Training Configuration:")
print(f"  Model: YOLO11-x (65M parameters)")
print(f"  Epochs: {EPOCHS}")
print(f"  Batch Size: {BATCH_SIZE}")
print(f"  Image Size: {IMG_SIZE}√ó{IMG_SIZE}")
print(f"  Device: GPU {DEVICE if torch.cuda.is_available() else 'CPU'}")
print(f"\nDataset Characteristics:")
print(f"  Training images: {train_count}")
print(f"  Validation images: {val_count}")
print(f"  Classes: 7 lesion types")
print(f"  Class imbalance: ~46.9:1 (Osteophytes vs Vertebral collapse)")
print(f"\nOptimizations:")
print(f"  - Focal Loss (handles class imbalance)")
print(f"  - Copy-paste augmentation (20%)")
print(f"  - Multi-scale detection (P3-P5)")
print(f"  - Mixed precision training (AMP)")
print(f"  - Cosine learning rate schedule")

Training Configuration:
  Model: YOLO11-x (65M parameters)
  Epochs: 30
  Batch Size: 8
  Image Size: 640√ó640
  Device: GPU 0

Dataset Characteristics:
  Training images: 8162
  Validation images: 2077
  Classes: 7 lesion types
  Class imbalance: ~46.9:1 (Osteophytes vs Vertebral collapse)

Optimizations:
  - Focal Loss (handles class imbalance)
  - Copy-paste augmentation (20%)
  - Multi-scale detection (P3-P5)
  - Mixed precision training (AMP)
  - Cosine learning rate schedule


## Step 6.5: Check for Previous Training & Auto-Resume

In [11]:
import os
import glob

# Check for existing training runs
CHECKPOINT_DIR = 'runs/yolo11x/vindr_spinexr'
RESUME_CHECKPOINT = None

print("=" * 80)
print("CHECKPOINT DETECTION")
print("=" * 80)

if os.path.exists(CHECKPOINT_DIR):
    # Check for last.pt (most recent checkpoint)
    last_checkpoint = os.path.join(CHECKPOINT_DIR, 'weights', 'last.pt')
    
    if os.path.exists(last_checkpoint):
        print(f"\n‚úì Found previous training checkpoint!")
        print(f"  Location: {last_checkpoint}")
        
        # Try to read epoch info from results.csv
        results_csv = os.path.join(CHECKPOINT_DIR, 'results.csv')
        if os.path.exists(results_csv):
            import pandas as pd
            df = pd.read_csv(results_csv)
            last_epoch = len(df)
            print(f"  Last completed epoch: {last_epoch}/{EPOCHS}")
            print(f"  Remaining epochs: {EPOCHS - last_epoch}")
            
            if last_epoch >= EPOCHS:
                print(f"\n‚ö†Ô∏è  Training already completed!")
                print(f"  To start fresh, delete: {CHECKPOINT_DIR}")
            else:
                print(f"\n‚úÖ Will RESUME training from epoch {last_epoch + 1}")
                RESUME_CHECKPOINT = last_checkpoint
        else:
            print(f"\n‚úÖ Will RESUME training from last checkpoint")
            RESUME_CHECKPOINT = last_checkpoint
    else:
        print("\nNo checkpoint found. Starting fresh training.")
else:
    print("\nNo previous training found. Starting fresh training.")

print("\n" + "=" * 80)

# Summary
if RESUME_CHECKPOINT:
    print(f"\nüîÑ RESUME MODE: Training will continue from last checkpoint")
    print(f"   Checkpoint: {RESUME_CHECKPOINT}")
else:
    print(f"\nüÜï FRESH START: Training will begin from epoch 1")
    print(f"   Checkpoints will be saved every 5 epochs")

print("\n" + "=" * 80)

CHECKPOINT DETECTION

No previous training found. Starting fresh training.


üÜï FRESH START: Training will begin from epoch 1
   Checkpoints will be saved every 5 epochs



In [12]:
# IMPORTANT: Setup auto-backup to prevent data loss
import shutil
from pathlib import Path

BACKUP_DIR = '/kaggle/working/backup_checkpoints'
os.makedirs(BACKUP_DIR, exist_ok=True)

print("=" * 80)
print("BACKUP SYSTEM ACTIVATED")
print("=" * 80)
print("\n‚ö†Ô∏è  IMPORTANT: To save your training progress:")
print("   1. Training checkpoints auto-save to: runs/yolo11x/vindr_spinexr/weights/")
print("   2. After training completes or periodically:")
print("      - Go to 'Output' tab in Kaggle")
print("      - Click 'Save Version' to preserve files")
print("   3. Or download files manually during training")
print("\nüí° TIP: Enable 'Version Settings' > 'Always Save Output' in notebook settings")
print("=" * 80 + "\n")

BACKUP SYSTEM ACTIVATED

‚ö†Ô∏è  IMPORTANT: To save your training progress:
   1. Training checkpoints auto-save to: runs/yolo11x/vindr_spinexr/weights/
   2. After training completes or periodically:
      - Go to 'Output' tab in Kaggle
      - Click 'Save Version' to preserve files
   3. Or download files manually during training

üí° TIP: Enable 'Version Settings' > 'Always Save Output' in notebook settings



## Step 7: Train YOLO11-x

In [13]:
print("="*80)
print("STARTING TRAINING - YOLO11-x ON VinDr-SpineXR")
print("="*80)

# Determine if resuming or starting fresh
if RESUME_CHECKPOINT:
    print(f"\nüîÑ RESUMING from checkpoint: {RESUME_CHECKPOINT}")
    print("Previous training progress will continue...")
else:
    print("\nüÜï STARTING FRESH training")
    
print("\nEstimated time: 10-12 hours")
print("This cell will run continuously. Monitor progress below.")
print("\nüíæ Auto-save: Checkpoints saved every 5 epochs")
print("‚ö†Ô∏è  Kaggle limit: 12-hour session (training will complete in time)")
print("üîÑ If interrupted: Re-run notebook to auto-resume\n")

# Train the model (with resume support)
results = model.train(
    data=yaml_path,
    epochs=EPOCHS,
    batch=BATCH_SIZE,
    imgsz=IMG_SIZE,
    device=DEVICE,
    resume=bool(RESUME_CHECKPOINT),  # Auto-resume if checkpoint exists
    
    # Optimizer settings
    optimizer='AdamW',
    lr0=0.0001,           # Initial learning rate
    lrf=0.01,             # Final LR = lr0 * lrf
    momentum=0.937,
    weight_decay=0.0005,
    warmup_epochs=3,
    warmup_momentum=0.8,
    warmup_bias_lr=0.1,
    
    # Loss weights (optimized for small objects + class imbalance)
    box=7.5,              # Box loss weight
    cls=0.5,              # Classification loss (focal loss handles imbalance)
    dfl=1.5,              # Distribution focal loss
    
    # Data augmentation (optimized for medical imaging)
    hsv_h=0.015,          # Hue augmentation (conservative for medical)
    hsv_s=0.7,            # Saturation
    hsv_v=0.4,            # Brightness
    degrees=5.0,          # Rotation ¬±5¬∞
    translate=0.1,        # Translation
    scale=0.5,            # Scale variation (0.5-1.5x)
    shear=0.0,            # No shear (too slow)
    perspective=0.0,      # No perspective (too slow)
    flipud=0.5,           # Vertical flip (spine X-rays)
    fliplr=0.5,           # Horizontal flip
    
    # Copy-paste for minority classes (CRITICAL)
    copy_paste=0.2,       # 20% copy-paste augmentation
    
    # Mosaic augmentation
    mosaic=1.0,           # Enable mosaic (multi-scale learning)
    mixup=0.0,            # Disable mixup (too slow)
    
    # Multi-scale training
    multi_scale=False,    # Disable for speed (mosaic provides similar benefit)
    
    # Training schedule
    patience=20,          # Early stopping patience
    save=True,
    save_period=5,        # Save checkpoint every 5 epochs (was 10, now more frequent for safety)
    cache=False,          # Don't cache (large dataset)
    workers=8,            # Dataloader workers (Kaggle has good CPU)
    
    # Output settings
    project='runs/yolo11x',
    name='vindr_spinexr',
    exist_ok=True,
    pretrained=True,      # Use COCO pretrained weights
    verbose=True,
    seed=42,
    deterministic=False,
    single_cls=False,
    
    # Learning rate scheduler
    cos_lr=True,          # Cosine LR decay
    close_mosaic=5,       # Disable mosaic last 5 epochs
    
    # Mixed precision (faster + less memory)
    amp=True,             # Automatic Mixed Precision
    
    # Validation
    val=True,
    plots=True,
    
    # Image handling
    rect=False,           # Square images for multi-scale
    
    # Regularization
    dropout=0.1,
    label_smoothing=0.0,  # Disabled for medical (hard labels)
    
    # NMS settings
    iou=0.7,
    max_det=300,
)

print("\n" + "="*80)
print("TRAINING COMPLETED!")
print("="*80)

STARTING TRAINING - YOLO11-x ON VinDr-SpineXR

üÜï STARTING FRESH training

Estimated time: 10-12 hours
This cell will run continuously. Monitor progress below.

üíæ Auto-save: Checkpoints saved every 5 epochs
‚ö†Ô∏è  Kaggle limit: 12-hour session (training will complete in time)
üîÑ If interrupted: Re-run notebook to auto-resume

Ultralytics 8.4.7 üöÄ Python-3.12.12 torch-2.8.0+cu126 CUDA:0 (Tesla T4, 15095MiB)
[34m[1mengine/trainer: [0magnostic_nms=False, amp=True, angle=1.0, augment=False, auto_augment=randaugment, batch=8, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=5, cls=0.5, compile=False, conf=None, copy_paste=0.2, copy_paste_mode=flip, cos_lr=True, cutmix=0.0, data=/kaggle/working/vindr_spinexr.yaml, degrees=5.0, deterministic=False, device=0, dfl=1.5, dnn=False, dropout=0.1, dynamic=False, embed=None, epochs=30, erasing=0.4, exist_ok=True, fliplr=0.5, flipud=0.5, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, 

In [14]:
# Emergency Backup - Save checkpoint info
import json
from datetime import datetime

backup_info = {
    'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
    'status': 'completed',
    'checkpoint_location': 'runs/yolo11x/vindr_spinexr/weights/best.pt',
    'last_checkpoint': 'runs/yolo11x/vindr_spinexr/weights/last.pt',
    'total_epochs': EPOCHS,
    'message': 'Training completed successfully! Model saved.'
}

# Save backup info
with open('/kaggle/working/training_status.json', 'w') as f:
    json.dump(backup_info, f, indent=2)

print("\nüíæ Emergency backup info saved to: /kaggle/working/training_status.json")
print("‚úì All checkpoints preserved in: runs/yolo11x/vindr_spinexr/")
print("\nüìä If session expires before downloading:")
print("   1. Re-open this notebook")
print("   2. Checkpoints are automatically saved in /kaggle/working/")
print("   3. Run the 'Export Model' cells below to retrieve results")


üíæ Emergency backup info saved to: /kaggle/working/training_status.json
‚úì All checkpoints preserved in: runs/yolo11x/vindr_spinexr/

üìä If session expires before downloading:
   1. Re-open this notebook
   2. Checkpoints are automatically saved in /kaggle/working/
   3. Run the 'Export Model' cells below to retrieve results


## Step 8: Evaluate Results

In [15]:
# Display training results
print("\nFinal Training Metrics:")
print("="*80)

if hasattr(results, 'results_dict'):
    metrics = results.results_dict
    
    if 'metrics/mAP50(B)' in metrics:
        map50 = metrics['metrics/mAP50(B)']
        print(f"\nmAP@0.5: {map50:.4f} ({map50*100:.2f}%)")
        
        # Compare to baselines
        print(f"\nComparison:")
        print(f"  YOLO11-l expected: 32-36%")
        print(f"  YOLO11-x (this): {map50*100:.2f}%")
        
        if map50 >= 0.35:
            improvement = (map50*100) - 34  # vs YOLO11-l average
            print(f"  ‚úÖ Improvement: +{improvement:.1f}%")
        else:
            print(f"  ‚ö†Ô∏è  Below expected range")
    
    if 'metrics/mAP50-95(B)' in metrics:
        map5095 = metrics['metrics/mAP50-95(B)']
        print(f"\nmAP@0.5:0.95: {map5095:.4f} ({map5095*100:.2f}%)")
    
    # Per-class metrics (if available)
    print("\nPer-Class Performance:")
    class_names = ['Osteophytes', 'Surgical implant', 'Spondylolysthesis', 
                   'Foraminal stenosis', 'Disc space narrowing', 
                   'Vertebral collapse', 'Other lesions']
    
    for i, name in enumerate(class_names):
        key = f'metrics/mAP50({i})'
        if key in metrics:
            class_map = metrics[key] * 100
            print(f"  {name:<25}: {class_map:>6.2f}%")

print("\n" + "="*80)


Final Training Metrics:

mAP@0.5: 0.3903 (39.03%)

Comparison:
  YOLO11-l expected: 32-36%
  YOLO11-x (this): 39.03%
  ‚úÖ Improvement: +5.0%

mAP@0.5:0.95: 0.1812 (18.12%)

Per-Class Performance:



In [16]:
# Display training curves
from IPython.display import Image, display
import os

results_dir = 'runs/yolo11x/vindr_spinexr'

print("Training Results Visualization:\n")

# Results plot
results_img = f'{results_dir}/results.png'
if os.path.exists(results_img):
    print("Training Curves (Loss, mAP, Precision, Recall):")
    display(Image(filename=results_img, width=1000))
else:
    print("Results plot not found")

# Confusion matrix
confusion_img = f'{results_dir}/confusion_matrix.png'
if os.path.exists(confusion_img):
    print("\nConfusion Matrix:")
    display(Image(filename=confusion_img, width=800))

# Sample predictions
val_batch_img = f'{results_dir}/val_batch0_pred.jpg'
if os.path.exists(val_batch_img):
    print("\nSample Predictions on Validation Set:")
    display(Image(filename=val_batch_img, width=1000))

Training Results Visualization:

Results plot not found


## Step 9: Validate Best Model

In [17]:
# Load best model and validate
print("Validating best model...\n")

best_model = YOLO(f'{results_dir}/weights/best.pt')

# Validate without TTA
val_results = best_model.val(
    data=yaml_path,
    split='val',
    batch=16,  # Larger batch for validation (no gradients)
    imgsz=640,
    device=DEVICE,
    plots=True,
    save_json=True,
    verbose=True
)

print("\n‚úì Validation complete")

Validating best model...



FileNotFoundError: [Errno 2] No such file or directory: 'runs/yolo11x/vindr_spinexr/weights/best.pt'

## Step 10: Test-Time Augmentation (TTA) - Optional Boost

In [None]:
# TTA can provide +1-2% mAP boost
print("Running Test-Time Augmentation (TTA)...")
print("This will take 3-4x longer but may improve accuracy by 1-2%\n")

tta_results = best_model.val(
    data=yaml_path,
    split='val',
    batch=8,  # Smaller batch for TTA (more memory needed)
    imgsz=640,
    device=DEVICE,
    augment=True,  # Enable TTA
    verbose=True
)

print("\n‚úì TTA validation complete")
print(f"\nTTA mAP@0.5: {tta_results.box.map50:.4f} ({tta_results.box.map50*100:.2f}%)")

## Step 11: Export Model & Save Results

In [None]:
# Export model weights
print("Saving model and results...\n")

# Copy best weights to output
import shutil

output_dir = '/kaggle/working/yolo11x_output'
os.makedirs(output_dir, exist_ok=True)

# Copy weights
shutil.copy2(f'{results_dir}/weights/best.pt', f'{output_dir}/yolo11x_best.pt')
shutil.copy2(f'{results_dir}/weights/last.pt', f'{output_dir}/yolo11x_last.pt')

# Copy training results
if os.path.exists(f'{results_dir}/results.csv'):
    shutil.copy2(f'{results_dir}/results.csv', f'{output_dir}/training_results.csv')

if os.path.exists(f'{results_dir}/results.png'):
    shutil.copy2(f'{results_dir}/results.png', f'{output_dir}/training_curves.png')

if os.path.exists(f'{results_dir}/confusion_matrix.png'):
    shutil.copy2(f'{results_dir}/confusion_matrix.png', f'{output_dir}/confusion_matrix.png')

print(f"‚úì Model and results saved to: {output_dir}")
print(f"\nFiles:")
for file in os.listdir(output_dir):
    size = os.path.getsize(f'{output_dir}/{file}') / 1024**2
    print(f"  - {file} ({size:.1f} MB)")

## Step 12: Generate Summary Report

In [None]:
import json
from datetime import datetime

# Create summary report
summary = {
    'model': 'YOLO11-x',
    'parameters': '65M',
    'dataset': 'VinDr-SpineXR',
    'training_date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
    'training_images': train_count,
    'validation_images': val_count,
    'epochs': EPOCHS,
    'batch_size': BATCH_SIZE,
    'image_size': IMG_SIZE,
    'gpu': torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU',
}

# Add metrics
if hasattr(results, 'results_dict'):
    metrics = results.results_dict
    if 'metrics/mAP50(B)' in metrics:
        summary['map50'] = float(metrics['metrics/mAP50(B)'])
    if 'metrics/mAP50-95(B)' in metrics:
        summary['map50_95'] = float(metrics['metrics/mAP50-95(B)'])

# Save summary
with open(f'{output_dir}/training_summary.json', 'w') as f:
    json.dump(summary, f, indent=2)

print("="*80)
print("TRAINING SUMMARY")
print("="*80)
print(json.dumps(summary, indent=2))
print("\n" + "="*80)
print("‚úì Training completed successfully!")
print(f"‚úì Results saved to: {output_dir}")
print("\nTo download results:")
print("  1. Click 'Output' tab in right sidebar")
print("  2. Download 'yolo11x_output' folder")
print("="*80)

## Step 13: Make Predictions (Optional)

In [None]:
# Example: Make predictions on validation images
from IPython.display import Image as IPImage, display

# Get sample validation images
val_images_dir = f'{DATASET_DIR}/images/val'
sample_images = [os.path.join(val_images_dir, f) for f in os.listdir(val_images_dir)[:5]]

print("Sample Predictions:")
print("="*80)

for img_path in sample_images:
    # Predict
    results = best_model.predict(
        source=img_path,
        conf=0.25,  # Confidence threshold
        iou=0.7,    # NMS IoU threshold
        device=DEVICE,
        save=True,
        project=f'{output_dir}/predictions',
        name='samples',
        exist_ok=True
    )
    
    print(f"\nImage: {os.path.basename(img_path)}")
    print(f"Detections: {len(results[0].boxes)} lesions found")
    
    # Display detected classes
    if len(results[0].boxes) > 0:
        for box in results[0].boxes:
            cls_id = int(box.cls[0])
            conf = float(box.conf[0])
            cls_name = class_names[cls_id]
            print(f"  - {cls_name}: {conf:.2%}")
    else:
        print("  - No lesions detected")

print("\n" + "="*80)
print(f"Predictions saved to: {output_dir}/predictions/samples/")

## Next Steps

1. **Download Results**: Click 'Output' tab ‚Üí Download 'yolo11x_output' folder
2. **Compare with YOLO11-l**: Expected improvement: +3-5% mAP@0.5
3. **Ensemble**: Combine YOLO11-x with other models for best results
4. **Deploy**: Use best.pt for inference on new spine X-rays

### Model Files:
- `yolo11x_best.pt` - Best checkpoint (highest mAP@0.5)
- `yolo11x_last.pt` - Last epoch checkpoint
- `training_summary.json` - Complete training metrics
- `training_curves.png` - Loss and accuracy curves
- `confusion_matrix.png` - Per-class performance

### Expected Performance:
```
YOLO11-l:  32-36% mAP@0.5
YOLO11-x:  35-39% mAP@0.5  (+3-5% improvement)
```

### Questions?
Check the confusion matrix and per-class metrics to identify which lesion types need improvement.
Consider ensemble with classification models (DenseNet, EfficientNet) for further boost!