# Spondylolisthesis Detection - Kaggle Training

**Template notebook for training on Kaggle with P100 GPU**

## 📋 Before Running:

1. **Upload Dataset** (One-time setup):
   - Go to [kaggle.com/datasets](https://www.kaggle.com/datasets)
   - Click "New Dataset"
   - Upload `spondylolisthesis-dataset.zip`
   - Title: `Spondylolisthesis Vertebral Landmark Dataset`
   - Set to Private

2. **Attach Dataset to This Notebook**:
   - Click "Add Data" (right panel)
   - Search for your dataset: "Spondylolisthesis Vertebral Landmark Dataset"
   - Click "Add"

3. **Enable GPU**:
   - Settings → Accelerator → GPU P100
   - Click "Save"

## 🚀 Then Run All Cells Below

## 1. Check GPU Availability

In [None]:
import torch
import sys

print("="*60)
print("Environment Check")
print("="*60)
print(f"Python: {sys.version}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
print("="*60)

## 2. Get Latest Code from GitHub

In [None]:
import os

# Always start from /kaggle/working
%cd /kaggle/working

# Check if repo already exists
if os.path.exists('spondylolisthesis-maht-net'):
    print("Repository exists - pulling latest changes...")
    %cd spondylolisthesis-maht-net
    !git pull origin main
    print("✓ Code updated successfully")
else:
    print("Cloning repository for the first time...")
    !git clone https://github.com/mohamednourdine/spondylolisthesis-maht-net.git
    %cd spondylolisthesis-maht-net
    print("✓ Code cloned successfully")

print(f"\nCurrent directory: {os.getcwd()}")

## 3. Install Dependencies

In [None]:
# Install project in editable mode (required for imports from src/)
# Note: Most dependencies (PyTorch, albumentations, etc.) are pre-installed in Kaggle
!pip install -q -e .

print("✓ Project installed successfully")

## 4. Link Dataset

**Important**: Make sure you've attached your dataset in the Kaggle UI (Add Data → Your Dataset)

In [None]:
import os

# List available datasets
print("Available datasets in /kaggle/input/:")
print("="*60)
for item in os.listdir('/kaggle/input'):
    print(f"  - {item}")
print("="*60)

# Expected dataset path (adjust if your dataset has a different name)
# Kaggle converts spaces to hyphens and makes lowercase
dataset_name = 'spondylolisthesis-vertebral-landmark-dataset'
dataset_path = f'/kaggle/input/{dataset_name}'

# Check if dataset exists
if os.path.exists(dataset_path):
    print(f"\n✓ Dataset found at: {dataset_path}")
    print(f"\nDataset structure:")
    !ls -la {dataset_path}
else:
    print(f"\n❌ Dataset not found at: {dataset_path}")
    print("\n⚠️  Please attach your dataset:")
    print("   1. Click 'Add Data' in right panel")
    print("   2. Search for 'Spondylolisthesis Vertebral Landmark Dataset'")
    print("   3. Click 'Add'")
    print("   4. Re-run this cell")
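The folder name under `/kaggle/input/` is derived from the dataset title. The naming rule described in the comment above (lowercase, spaces to hyphens) can be sketched as follows. The helpers `kaggle_slug` and `resolve_dataset_dir` are hypothetical names used only for illustration, and the rule is an approximation; the authoritative slug is the one shown in your dataset's URL.

```python
import os
import re


def kaggle_slug(title):
    """Approximate the folder name Kaggle derives from a dataset title.

    Assumption: lowercase the title and collapse every run of
    non-alphanumeric characters into a single hyphen.
    """
    slug = re.sub(r'[^a-z0-9]+', '-', title.lower())
    return slug.strip('-')


def resolve_dataset_dir(title, input_root='/kaggle/input'):
    """Return the dataset folder under input_root, or None if it is not attached."""
    candidate = os.path.join(input_root, kaggle_slug(title))
    return candidate if os.path.isdir(candidate) else None


print(kaggle_slug('Spondylolisthesis Vertebral Landmark Dataset'))
```

If `resolve_dataset_dir` returns `None`, compare the printed slug against the folder names listed from `/kaggle/input/` and adjust `dataset_name` by hand.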

In [None]:
# Create proper data structure by copying files
# This is more reliable than symlinks for this dataset structure

dataset_name = 'spondylolisthesis-vertebral-landmark-dataset'
dataset_path = f'/kaggle/input/{dataset_name}'

# Remove old data directory if exists
!rm -rf data

# Create data directory with proper structure
!mkdir -p data/Train/images data/Train/labels data/Validation/images data/Validation/labels

# Copy files instead of symlinking (more reliable)
print("Copying dataset files...")
print("="*60)

# Copy train images and labels
!cp -r {dataset_path}/Train/Keypointrcnn_data/images/train/* data/Train/images/
!cp -r {dataset_path}/Train/Keypointrcnn_data/labels/train/* data/Train/labels/

# Copy validation images and labels
!cp -r {dataset_path}/Train/Keypointrcnn_data/images/val/* data/Validation/images/
!cp -r {dataset_path}/Train/Keypointrcnn_data/labels/val/* data/Validation/labels/

# Verify final structure
print("\nFinal data structure:")
print("="*60)
!ls data/
print("\nTrain images (first 5):")
!ls data/Train/images/ | head -5
print(f"\nTotal train images: {len(os.listdir('data/Train/images/'))}")
print(f"Total train labels: {len(os.listdir('data/Train/labels/'))}")
print(f"Total val images: {len(os.listdir('data/Validation/images/'))}")
print(f"Total val labels: {len(os.listdir('data/Validation/labels/'))}")
print("="*60)
print("✓ Dataset copied and ready for training")
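Matching file counts do not guarantee that every image has a label. A minimal sketch of a pairing check follows, assuming each label file shares its image's file stem (for example `0001.jpg` and `0001.json`); that naming convention is an assumption, and `unpaired_stems` is a hypothetical helper, not part of the repository.

```python
import os


def unpaired_stems(images_dir, labels_dir):
    """Return (images without labels, labels without images), compared by file stem."""
    image_stems = {os.path.splitext(f)[0] for f in os.listdir(images_dir)}
    label_stems = {os.path.splitext(f)[0] for f in os.listdir(labels_dir)}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


for split in ('Train', 'Validation'):
    images_dir = os.path.join('data', split, 'images')
    labels_dir = os.path.join('data', split, 'labels')
    # Skip splits that have not been copied yet.
    if not (os.path.isdir(images_dir) and os.path.isdir(labels_dir)):
        print(f"{split}: directories not found, skipping")
        continue
    no_label, no_image = unpaired_stems(images_dir, labels_dir)
    print(f"{split}: {len(no_label)} images without labels, "
          f"{len(no_image)} labels without images")
```

Both counts should be zero before starting a long training run.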

## 5. Quick Environment Test

Run 2 epochs on 10 samples as a smoke test (about 2-3 minutes).

In [None]:
# Run quick test to verify everything works (2-3 minutes)
!python tests/test_training_small.py

## 6. Start Full Training

**Training Configuration:**
- Model: UNet (31M parameters)
- Epochs: 50
- Batch Size: 16 (optimal for P100 GPU with 16GB memory)
- Expected Duration: ~5-7 hours
- Metrics: MRE, SDR@2mm, SDR@2.5mm, SDR@3mm, SDR@4mm
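For readers unfamiliar with the metrics listed above, here is a minimal sketch of how MRE (mean radial error) and SDR (successful detection rate) are commonly defined for landmark detection. It is an illustration of the usual definitions, not the repository's implementation; the function names and the `pixel_spacing_mm` parameter are assumptions introduced here.

```python
import math


def radial_errors(pred, target, pixel_spacing_mm=1.0):
    """Euclidean distance per landmark, scaled by an assumed pixel spacing."""
    return [
        math.hypot(px - tx, py - ty) * pixel_spacing_mm
        for (px, py), (tx, ty) in zip(pred, target)
    ]


def mean_radial_error(errors):
    """MRE: the average of the per-landmark distances."""
    return sum(errors) / len(errors)


def sdr(errors, threshold):
    """SDR@threshold: the fraction of landmarks whose error is within the threshold."""
    return sum(e <= threshold for e in errors) / len(errors)


# Three hypothetical landmarks with errors of 1, 3 and 5 units.
pred = [(10.0, 10.0), (20.0, 23.0), (30.0, 35.0)]
target = [(10.0, 11.0), (20.0, 20.0), (30.0, 30.0)]

errors = radial_errors(pred, target)
print(f"MRE: {mean_radial_error(errors):.2f}")
for threshold in (2.0, 2.5, 3.0, 4.0):
    print(f"SDR@{threshold}: {sdr(errors, threshold) * 100:.1f}%")
```

With a spacing of 1.0 the errors stay in pixels; thresholds in millimetres are only meaningful once a real pixel spacing is supplied.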

In [None]:
# Full training with production settings
!python train.py \
    --model unet \
    --epochs 50 \
    --batch-size 16 \
    --experiment-name kaggle_p100_production_v1

## 7. View Training Results

In [None]:
import glob
import json
import os  # used below for getmtime and path handling
import pandas as pd

# Find latest experiment
experiment_dirs = sorted(glob.glob('experiments/results/unet/*'), key=os.path.getmtime, reverse=True)

if experiment_dirs:
    latest_exp = experiment_dirs[0]
    print(f"Latest experiment: {latest_exp}")
    print("="*60)
    
    # Load training history
    history_file = os.path.join(latest_exp, 'training_history.json')
    if os.path.exists(history_file):
        with open(history_file, 'r') as f:
            history = json.load(f)
        
        print(f"\nBest Validation Loss: {history['best_val_loss']:.4f}")
        print(f"Best Validation Metric: {history['best_val_metric']:.4f}")
        
        # Show final metrics
        if history['val_metrics']:
            final_metrics = history['val_metrics'][-1]
            print("\nFinal Validation Metrics:")
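            # Format MRE only when present; applying ':.2f' to the string 'N/A' would raise ValueError
            print(f"  MRE: {final_metrics['MRE']:.2f} pixels" if 'MRE' in final_metrics else "  MRE: N/A")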
            print(f"  SDR@2mm: {final_metrics.get('SDR_2.0mm', 0)*100:.2f}%")
            print(f"  SDR@2.5mm: {final_metrics.get('SDR_2.5mm', 0)*100:.2f}%")
            print(f"  SDR@3mm: {final_metrics.get('SDR_3.0mm', 0)*100:.2f}%")
            print(f"  SDR@4mm: {final_metrics.get('SDR_4.0mm', 0)*100:.2f}%")
    
    print("\n" + "="*60)
    print("Saved files:")
    !ls -lh {latest_exp}
else:
    print("No experiments found")

## 8. Plot Training Curves

In [None]:
import glob
import json
import os  # used below for getmtime and path handling

import matplotlib.pyplot as plt

# Find latest experiment
experiment_dirs = sorted(glob.glob('experiments/results/unet/*'), key=os.path.getmtime, reverse=True)

if experiment_dirs:
    latest_exp = experiment_dirs[0]
    history_file = os.path.join(latest_exp, 'training_history.json')
    
    if os.path.exists(history_file):
        with open(history_file, 'r') as f:
            history = json.load(f)
        
        # Create figure with subplots
        fig, axes = plt.subplots(2, 2, figsize=(15, 10))
        
        # Plot 1: Loss
        axes[0, 0].plot(history['train_losses'], label='Train Loss', marker='o')
        axes[0, 0].plot(history['val_losses'], label='Val Loss', marker='s')
        axes[0, 0].set_xlabel('Epoch')
        axes[0, 0].set_ylabel('Loss')
        axes[0, 0].set_title('Training and Validation Loss')
        axes[0, 0].legend()
        axes[0, 0].grid(True, alpha=0.3)
        
        # Plot 2: MRE
        mre_values = [m.get('MRE', 0) for m in history['val_metrics']]
        axes[0, 1].plot(mre_values, label='Val MRE', marker='o', color='orange')
        axes[0, 1].set_xlabel('Epoch')
        axes[0, 1].set_ylabel('MRE (pixels)')
        axes[0, 1].set_title('Mean Radial Error')
        axes[0, 1].legend()
        axes[0, 1].grid(True, alpha=0.3)
        
        # Plot 3: SDR metrics
        sdr_2mm = [m.get('SDR_2.0mm', 0)*100 for m in history['val_metrics']]
        sdr_25mm = [m.get('SDR_2.5mm', 0)*100 for m in history['val_metrics']]
        sdr_3mm = [m.get('SDR_3.0mm', 0)*100 for m in history['val_metrics']]
        sdr_4mm = [m.get('SDR_4.0mm', 0)*100 for m in history['val_metrics']]
        
        axes[1, 0].plot(sdr_2mm, label='SDR@2mm', marker='o')
        axes[1, 0].plot(sdr_25mm, label='SDR@2.5mm', marker='s')
        axes[1, 0].plot(sdr_3mm, label='SDR@3mm', marker='^')
        axes[1, 0].plot(sdr_4mm, label='SDR@4mm', marker='d')
        axes[1, 0].set_xlabel('Epoch')
        axes[1, 0].set_ylabel('SDR (%)')
        axes[1, 0].set_title('Successful Detection Rates')
        axes[1, 0].legend()
        axes[1, 0].grid(True, alpha=0.3)
        
        # Plot 4: Summary text
        axes[1, 1].axis('off')
        summary_text = f"""
        Training Summary
        ───────────────────────────
        
        Experiment: {os.path.basename(latest_exp)}
        
        Total Epochs: {len(history['train_losses'])}
        
        Best Val Loss: {history['best_val_loss']:.4f}
        
        Final Metrics:
          • MRE: {mre_values[-1]:.2f} pixels
          • SDR@2mm: {sdr_2mm[-1]:.2f}%
          • SDR@2.5mm: {sdr_25mm[-1]:.2f}%
          • SDR@3mm: {sdr_3mm[-1]:.2f}%
          • SDR@4mm: {sdr_4mm[-1]:.2f}%
        
        Best Epoch: {history['val_losses'].index(min(history['val_losses'])) + 1}
        """
        axes[1, 1].text(0.1, 0.5, summary_text, fontsize=12, family='monospace',
                       verticalalignment='center')
        
        plt.tight_layout()
        plt.savefig(os.path.join(latest_exp, 'training_curves.png'), dpi=150, bbox_inches='tight')
        plt.show()
        
        print(f"\n✓ Plot saved to: {os.path.join(latest_exp, 'training_curves.png')}")
else:
    print("No experiments found")

## 9. Download Results

Download the trained model and results to your local machine

In [None]:
import glob
import os
import shutil

# Find latest experiment
experiment_dirs = sorted(glob.glob('experiments/results/unet/*'), key=os.path.getmtime, reverse=True)

if experiment_dirs:
    latest_exp = experiment_dirs[0]
    exp_name = os.path.basename(latest_exp)
    
    # Create archive
    archive_name = f'{exp_name}_results'
    shutil.make_archive(archive_name, 'zip', latest_exp)
    
    print(f"✓ Results archived: {archive_name}.zip")
    print(f"\nFile size: {os.path.getsize(archive_name + '.zip') / (1024*1024):.2f} MB")
    print("\nTo download:")
    print("  1. Check the 'Output' tab on the right →")
    print(f"  2. Download {archive_name}.zip")
    
    # List archive contents
    print("\nArchive contains:")
    !unzip -l {archive_name}.zip | head -20
else:
    print("No experiments found")

## 📝 Notes

### Batch Size Guidelines:
- **P100 (16GB)**: Use batch size 16 (optimal)
- **T4 (15GB)**: Use batch size 8
- **K80 (12GB)**: Use batch size 4
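The guidelines above can be turned into a small lookup. This is only a sketch: `suggest_batch_size` is a hypothetical helper, the fallback of 4 for unlisted GPUs is an assumption (the most conservative value in the list), and the match relies on the model code appearing in the device name.

```python
# Batch sizes taken from the guidelines above.
BATCH_SIZE_BY_GPU = {'P100': 16, 'T4': 8, 'K80': 4}


def suggest_batch_size(gpu_name, default=4):
    """Return the suggested batch size for a GPU name, or a conservative default."""
    name = gpu_name.upper()
    for model, batch_size in BATCH_SIZE_BY_GPU.items():
        if model in name:
            return batch_size
    return default


for name in ('Tesla P100-PCIE-16GB', 'Tesla T4', 'Tesla K80'):
    print(f"{name}: batch size {suggest_batch_size(name)}")
```

In the notebook, the name can come from `torch.cuda.get_device_name(0)`, as in the environment check cell.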

### To Change Batch Size:
```python
!python train.py --model unet --epochs 50 --batch-size 8 --experiment-name my_experiment
```

### To Resume Training:
```python
!python train.py --model unet --resume experiments/results/unet/your_experiment/checkpoints/last_model.pth
```
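When the experiment folder name is not at hand, the most recent checkpoint can be located by modification time. The sketch below assumes every run stores its checkpoint at `checkpoints/last_model.pth` under `experiments/results/unet/`, as in the resume command above; `latest_checkpoint` is a hypothetical helper, not part of the repository.

```python
import glob
import os


def latest_checkpoint(results_root='experiments/results/unet'):
    """Return the most recently modified last_model.pth, or None if none exists."""
    pattern = os.path.join(results_root, '*', 'checkpoints', 'last_model.pth')
    candidates = glob.glob(pattern)
    return max(candidates, key=os.path.getmtime) if candidates else None


checkpoint = latest_checkpoint()
if checkpoint is None:
    print("No checkpoint found; start a fresh run instead.")
else:
    print(f"Resume with: python train.py --model unet --resume {checkpoint}")
```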

### Expected Training Time:
- **P100**: ~5-7 hours (50 epochs, batch size 16)
- **T4**: ~8-10 hours (50 epochs, batch size 8)

### Target Metrics (After 50 epochs):
- MRE: < 30 pixels
- SDR@2mm: > 70%
- SDR@3mm: > 85%
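
The targets above can be checked against the final validation metrics programmatically. This sketch assumes the metric keys used in the results cell (`MRE`, `SDR_2.0mm`, `SDR_3.0mm`) and that SDR values are stored as fractions between 0 and 1; `check_targets` and the example values are hypothetical.

```python
# (direction, limit) per metric, taken from the target list above.
TARGETS = {
    'MRE': ('max', 30.0),        # pixels, lower is better
    'SDR_2.0mm': ('min', 0.70),  # fraction, higher is better
    'SDR_3.0mm': ('min', 0.85),
}


def check_targets(metrics, targets=TARGETS):
    """Map each target metric to True if met; missing metrics count as not met."""
    results = {}
    for key, (direction, limit) in targets.items():
        value = metrics.get(key)
        if value is None:
            results[key] = False
        elif direction == 'max':
            results[key] = value < limit
        else:
            results[key] = value > limit
    return results


# Made-up numbers for illustration only, not real training results.
example = {'MRE': 25.0, 'SDR_2.0mm': 0.65, 'SDR_3.0mm': 0.90}
for name, met in check_targets(example).items():
    print(f"{name}: {'met' if met else 'not met'}")
```

In the notebook, pass `history['val_metrics'][-1]` in place of `example`.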