 "cells": [
# 5-Fold Cross-Validation Training (Sliding Window) - Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/marioknicola/synthsup-speechMRI-recon/blob/main/train_cv_sliding_colab.ipynb)

This notebook trains a U-Net model using 5-fold cross-validation with a sliding window approach:
- **Train/Val/Test = 5/1/1 subjects per fold**
- **Fold 1**: Train=[0021,0022,0023,0024,0025], Val=[0026], Test=[0027]
- **Fold 2**: Train=[0022,0023,0024,0025,0026], Val=[0027], Test=[0021]
- **Fold 3**: Train=[0023,0024,0025,0026,0027], Val=[0021], Test=[0022]
- **Fold 4**: Train=[0024,0025,0026,0027,0021], Val=[0022], Test=[0023]
- **Fold 5**: Train=[0025,0026,0027,0021,0022], Val=[0023], Test=[0024]

**Features:**
- Early stopping (patience=20)
- Only saves best model per fold
- Batch size = 4
- Combined MSE + SSIM loss
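
The fold assignments listed above follow a simple rotation over the seven subjects. The sketch below reproduces that rotation so the splits can be checked by eye. The function name `sliding_window_folds` is hypothetical, and this is not necessarily how `train_cross_validation_sliding.py` builds its splits internally.

```python
SUBJECTS = ["0021", "0022", "0023", "0024", "0025", "0026", "0027"]


def sliding_window_folds(subjects, n_folds=5, n_train=5):
    """Rotate a window over the subject list: n_train train subjects, then 1 val, then 1 test."""
    n = len(subjects)
    folds = []
    for start in range(n_folds):
        # Rotate the list so that the window for this fold starts at index `start`.
        rotated = [subjects[(start + i) % n] for i in range(n)]
        folds.append({
            "fold": start + 1,
            "train": rotated[:n_train],
            "val": rotated[n_train:n_train + 1],
            "test": rotated[n_train + 1:n_train + 2],
        })
    return folds


for f in sliding_window_folds(SUBJECTS):
    print(f"Fold {f['fold']}: Train={f['train']}, Val={f['val']}, Test={f['test']}")
```

With five folds over seven subjects, the test subjects are 0027, 0021, 0022, 0023 and 0024, so subjects 0025 and 0026 never serve as a test subject in this scheme.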

## 1. Setup and Install Dependencies

In [None]:
# Check GPU
!nvidia-smi

In [None]:
# Install dependencies
!pip install nibabel tqdm

## 2. Clone Repository

In [None]:
# Clone your repository
!git clone https://github.com/marioknicola/synthsup-speechMRI-recon.git
%cd synthsup-speechMRI-recon

## 3. Mount Google Drive and Link Data

Upload your data folders to Google Drive:
- `Synth_LR_unpadded_nii/` (input LR images)
- `Dynamic_SENSE_padded/` (target HR images)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Create symbolic links to your data (adjust the paths to match your Google Drive structure).
# -sfn replaces an existing link, so this cell can be re-run without errors.
!ln -sfn "/content/drive/MyDrive/MSc_Project/Synth_LR_unpadded_nii" ../Synth_LR_unpadded_nii
!ln -sfn "/content/drive/MyDrive/MSc_Project/Dynamic_SENSE_padded" ../Dynamic_SENSE_padded

In [None]:
# Verify data is accessible
!ls -lh ../Synth_LR_unpadded_nii | head -20
!ls -lh ../Dynamic_SENSE_padded | head -20
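
A broken symlink or an unmounted Drive is easy to miss in `ls` output, so it can help to check the two folders from Python as well. The sketch below only counts files by extension; the helper name `summarize_dir` is hypothetical and it makes no assumption about how the training script names or pairs files.

```python
from collections import Counter
from pathlib import Path


def summarize_dir(path):
    """Return a Counter of file extensions in `path`, or None if it is not a reachable directory."""
    p = Path(path)
    if not p.is_dir():  # also False for a symlink whose target does not exist
        return None
    return Counter("".join(f.suffixes) or "(none)" for f in p.iterdir() if f.is_file())


for d in ("../Synth_LR_unpadded_nii", "../Dynamic_SENSE_padded"):
    counts = summarize_dir(d)
    if counts is None:
        print(f"{d}: MISSING (check the Drive mount and the symlink target)")
    else:
        print(f"{d}: {sum(counts.values())} files, by extension: {dict(counts)}")
```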

## 4. Run Training

### Option A: Train All 5 Folds Sequentially (Recommended)

In [None]:
# Train all 5 folds with early stopping
# Expect roughly 10-15 hours in total on a T4 GPU
!python train_cross_validation_sliding.py \
    --all-folds \
    --input-dir ../Synth_LR_unpadded_nii \
    --target-dir ../Dynamic_SENSE_padded \
    --epochs 200 \
    --batch-size 4 \
    --lr 1e-5 \
    --early-stopping-patience 20 \
    --output-dir cv_results_sliding
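
The `--early-stopping-patience 20` flag means training for a fold ends once the validation loss has gone 20 epochs without improving. A minimal sketch of that rule is shown below. The class `EarlyStopping` and its `min_delta` parameter are illustrative assumptions, not the code used by `train_cross_validation_sliding.py`.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=20, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # hypothetical: smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch and return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: a training loop would typically save the best checkpoint here.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `stopper.step(epoch, val_loss)` would be called once per epoch after validation, and the loop would break when it returns `True`. With `--epochs 200`, a fold therefore runs for at most 200 epochs and may finish much earlier.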

### Option B: Train Individual Folds

If you want to train folds separately (e.g., across multiple Colab sessions):

In [None]:
# Train Fold 1 only
!python train_cross_validation_sliding.py \
    --fold 1 \
    --input-dir ../Synth_LR_unpadded_nii \
    --target-dir ../Dynamic_SENSE_padded \
    --epochs 200 \
    --batch-size 4 \
    --lr 1e-5 \
    --early-stopping-patience 20 \
    --output-dir cv_results_sliding

In [None]:
# Train Fold 2 only (same hyperparameters as Fold 1)
!python train_cross_validation_sliding.py --fold 2 --input-dir ../Synth_LR_unpadded_nii --target-dir ../Dynamic_SENSE_padded --epochs 200 --batch-size 4 --lr 1e-5 --early-stopping-patience 20 --output-dir cv_results_sliding

In [None]:
# Train Fold 3 only (same hyperparameters as Fold 1)
!python train_cross_validation_sliding.py --fold 3 --input-dir ../Synth_LR_unpadded_nii --target-dir ../Dynamic_SENSE_padded --epochs 200 --batch-size 4 --lr 1e-5 --early-stopping-patience 20 --output-dir cv_results_sliding

In [None]:
# Train Fold 4 only (same hyperparameters as Fold 1)
!python train_cross_validation_sliding.py --fold 4 --input-dir ../Synth_LR_unpadded_nii --target-dir ../Dynamic_SENSE_padded --epochs 200 --batch-size 4 --lr 1e-5 --early-stopping-patience 20 --output-dir cv_results_sliding

In [None]:
# Train Fold 5 only (same hyperparameters as Fold 1)
!python train_cross_validation_sliding.py --fold 5 --input-dir ../Synth_LR_unpadded_nii --target-dir ../Dynamic_SENSE_padded --epochs 200 --batch-size 4 --lr 1e-5 --early-stopping-patience 20 --output-dir cv_results_sliding
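
When folds are spread over several Colab sessions, it is easy to lose track of which ones still need to run. The sketch below lists the folds that have no `training_history.json` yet and prints the matching command for each. Both helper names are hypothetical. Treating the presence of `training_history.json` as "done" is an assumption: if the script writes that file during training, an interrupted fold would wrongly be skipped and should be checked by hand.

```python
from pathlib import Path


def pending_folds(output_dir="cv_results_sliding", n_folds=5):
    """Fold numbers without a training_history.json (assumed here to mark a finished fold)."""
    root = Path(output_dir)
    return [k for k in range(1, n_folds + 1)
            if not (root / f"fold{k}" / "training_history.json").is_file()]


def fold_command(fold, output_dir="cv_results_sliding"):
    """Build the per-fold command with the same flags as the Fold 1 cell."""
    return [
        "python", "train_cross_validation_sliding.py",
        "--fold", str(fold),
        "--input-dir", "../Synth_LR_unpadded_nii",
        "--target-dir", "../Dynamic_SENSE_padded",
        "--epochs", "200",
        "--batch-size", "4",
        "--lr", "1e-5",
        "--early-stopping-patience", "20",
        "--output-dir", output_dir,
    ]


for k in pending_folds():
    print(" ".join(fold_command(k)))
    # To launch instead of print: import subprocess; subprocess.run(fold_command(k), check=True)
```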

## 5. Check Results

In [None]:
# List all fold results
!ls -lh cv_results_sliding/

In [None]:
# Check individual fold results
!ls -lh cv_results_sliding/fold1/
!ls -lh cv_results_sliding/fold2/
!ls -lh cv_results_sliding/fold3/
!ls -lh cv_results_sliding/fold4/
!ls -lh cv_results_sliding/fold5/

In [None]:
# View summary if all folds completed
import json

try:
    with open('cv_results_sliding/cv_summary.json', 'r') as f:
        summary = json.load(f)
    
    print("="*80)
    print("CROSS-VALIDATION SUMMARY")
    print("="*80)
    print(f"Total folds: {summary['total_folds']}")
    print(f"\nAverage validation loss: {summary['avg_val_loss']:.6f} ± {summary['std_val_loss']:.6f}")
    print(f"Average test loss: {summary['avg_test_loss']:.6f} ± {summary['std_test_loss']:.6f}")
    print("\nPer-fold results:")
    for result in summary['results']:
        print(f"  Fold {result['fold']}: Val={result['best_val_loss']:.6f}, Test={result['final_test_loss']:.6f}")
        if result.get('early_stopped'):
            print(f"            (early stopped at epoch {result.get('best_epoch', 'N/A')})")
except FileNotFoundError:
    print("Summary not found. Make sure all folds have completed training.")
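
If `cv_summary.json` is missing because only some folds have run, a partial summary can be assembled from the per-fold history files. The sketch below assumes each `training_history.json` holds per-epoch lists under `val_loss` and `test_loss`, as the plotting cell in section 8 does. The function `partial_summary` is hypothetical, and its numbers may differ from the script's own summary (for example in how the standard deviation is computed).

```python
import json
import statistics
from pathlib import Path


def partial_summary(output_dir="cv_results_sliding", n_folds=5):
    """For each fold with a history file, report the epoch with the lowest validation loss."""
    rows = []
    for k in range(1, n_folds + 1):
        path = Path(output_dir) / f"fold{k}" / "training_history.json"
        if not path.is_file():
            continue
        history = json.loads(path.read_text())
        val = history["val_loss"]
        if not val:
            continue
        best = min(range(len(val)), key=val.__getitem__)  # 0-based index of the lowest val loss
        rows.append({
            "fold": k,
            "best_epoch": best + 1,
            "best_val_loss": val[best],
            "test_loss": history["test_loss"][best],  # test loss at that same epoch
        })
    return rows


rows = partial_summary()
for r in rows:
    print(f"Fold {r['fold']}: epoch {r['best_epoch']}, "
          f"Val={r['best_val_loss']:.6f}, Test={r['test_loss']:.6f}")
if len(rows) > 1:
    vals = [r["best_val_loss"] for r in rows]
    print(f"Val over {len(rows)} folds: "
          f"{statistics.mean(vals):.6f} ± {statistics.stdev(vals):.6f} (sample std)")
```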

## 6. Copy Results to Google Drive

In [None]:
# Copy results to Google Drive.
# Copying into the parent folder avoids a nested cv_results_sliding/cv_results_sliding when this cell is re-run.
!cp -r cv_results_sliding "/content/drive/MyDrive/MSc_Project/"
print("✅ Results copied to Google Drive")
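
Because Colab sessions can disconnect during long runs, it can be useful to back up results more than once, for example after each fold. The sketch below does the copy from Python with `shutil.copytree`, which can merge into an existing destination when `dirs_exist_ok=True` (Python 3.8 or later). The function name `backup_results` is hypothetical, and the default destination simply mirrors the Drive path used in the cell above.

```python
import shutil
from pathlib import Path


def backup_results(src="cv_results_sliding", dst_root="/content/drive/MyDrive/MSc_Project"):
    """Copy `src` to `dst_root/<name of src>`, overwriting files in an existing copy."""
    src_path = Path(src)
    if not src_path.is_dir():
        raise FileNotFoundError(f"{src} does not exist; nothing to copy")
    dst = Path(dst_root) / src_path.name
    shutil.copytree(src_path, dst, dirs_exist_ok=True)  # requires Python 3.8+
    return dst
```

Calling `backup_results()` repeatedly is safe: files that already exist on Drive are overwritten with the current versions rather than nested in a new subfolder.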

## 7. Optional: Download Results as ZIP

In [None]:
# Create a zip file without the large .pth model files
!zip -r cv_results_sliding.zip cv_results_sliding/ -x "*.pth"

# Download via Colab
from google.colab import files
files.download('cv_results_sliding.zip')
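
The same archive can be built from Python with the standard `zipfile` module, which makes the exclusion rule explicit and easy to change. This is a sketch under the assumption that model checkpoints are the only files ending in `.pth`; the function name `zip_results` is hypothetical.

```python
import zipfile
from pathlib import Path


def zip_results(src="cv_results_sliding", zip_path="cv_results_sliding.zip",
                exclude_suffixes=(".pth",)):
    """Zip every file under `src` except those whose extension is in `exclude_suffixes`."""
    src_path = Path(src)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(src_path.rglob("*")):
            if f.is_file() and f.suffix not in exclude_suffixes:
                # Keep the top-level folder name inside the archive, as `zip -r` does.
                zf.write(f, arcname=f.relative_to(src_path.parent).as_posix())
    return zip_path
```

Passing `exclude_suffixes=()` would include the model files as well.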

In [None]:
# Or create zip WITH model files
!zip -r cv_results_sliding_full.zip cv_results_sliding/

# Copy to Drive (recommended for large files)
!cp cv_results_sliding_full.zip "/content/drive/MyDrive/MSc_Project/"
print("✅ Full results (with models) saved to Google Drive")

## 8. Monitor Training (Optional)

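Colab runs one cell at a time, so run the cell below after a training cell has finished or been interrupted (for example, between folds in Option B) to plot the history recorded so far: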

In [None]:
# Check training history for a specific fold
import json
import matplotlib.pyplot as plt

fold_num = 1  # Change to check different folds

try:
    with open(f'cv_results_sliding/fold{fold_num}/training_history.json', 'r') as f:
        history = json.load(f)
    
    epochs = range(1, len(history['train_loss']) + 1)
    
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    
    # Loss
    axes[0].plot(epochs, history['train_loss'], label='Train')
    axes[0].plot(epochs, history['val_loss'], label='Val')
    axes[0].plot(epochs, history['test_loss'], label='Test', alpha=0.7)
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss')
    axes[0].set_title(f'Fold {fold_num} - Loss')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # MSE
    axes[1].plot(epochs, history['train_mse'], label='Train')
    axes[1].plot(epochs, history['val_mse'], label='Val')
    axes[1].plot(epochs, history['test_mse'], label='Test', alpha=0.7)
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('MSE')
    axes[1].set_title(f'Fold {fold_num} - MSE')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    # SSIM
    axes[2].plot(epochs, history['train_ssim'], label='Train')
    axes[2].plot(epochs, history['val_ssim'], label='Val')
    axes[2].plot(epochs, history['test_ssim'], label='Test', alpha=0.7)
    axes[2].set_xlabel('Epoch')
    axes[2].set_ylabel('SSIM')
    axes[2].set_title(f'Fold {fold_num} - SSIM')
    axes[2].legend()
    axes[2].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    print(f"Best epoch: {history.get('best_epoch', 'N/A')}")
    if history.get('early_stopped'):
        print(f"Early stopped at epoch: {history.get('stopped_epoch', 'N/A')}")
    
except FileNotFoundError:
    print(f"Training history for fold {fold_num} not found yet.")