# 🚀 Google Colab Setup for Endometriosis Segmentation

This notebook sets up the complete environment on Google Colab.

**Requirements:**
- Google account
- ~15GB Google Drive space (for dataset + checkpoints)
- GPU runtime (free T4 or Colab Pro)

**Runtime Settings:**
1. Runtime → Change runtime type
2. Hardware accelerator: **GPU** (T4)
3. Save

## Step 1: Check GPU and System Info

In [None]:
# Check GPU availability
!nvidia-smi

import torch
print(f"\nPyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

## Step 2: Mount Google Drive

We'll store the dataset and checkpoints in Google Drive to persist across sessions.

In [None]:
from google.colab import drive
import os

# Mount Google Drive
drive.mount('/content/drive')

# Create project directory in Drive
PROJECT_ROOT = '/content/drive/MyDrive/endometriosis-uncertainty-seg'
os.makedirs(PROJECT_ROOT, exist_ok=True)

print(f"✓ Project root: {PROJECT_ROOT}")
# Query the filesystem once instead of calling statvfs twice
drive_stats = os.statvfs('/content/drive')
print(f"✓ Available space: {drive_stats.f_bavail * drive_stats.f_frsize / 1e9:.2f} GB")
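
The requirements list asks for roughly 15GB of Drive space, so it can help to check free space before downloading anything. The sketch below is one way to do that with the standard library; `free_space_gb` and `has_room_for` are hypothetical helper names, not part of the project. The figure reported for a mounted Drive may not match the quota shown in the Drive web UI, so treat it as an estimate.

```python
import shutil


def free_space_gb(path: str) -> float:
    """Return the free space at `path` in GB (10^9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_room_for(path: str, required_gb: float) -> bool:
    """True if `path` reports at least `required_gb` of free space."""
    return free_space_gb(path) >= required_gb


# On Colab you would pass '/content/drive'; '.' is used here so the
# sketch runs on any machine.
print(f"Free space: {free_space_gb('.'):.2f} GB")
print(f"Room for 15 GB: {has_room_for('.', 15)}")
```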

## Step 3: Clone Repository

Clone the project repository to the local Colab instance (local disk is faster than reading code through the Drive mount).

In [None]:
# Clone repository to local Colab storage (faster)
import os

LOCAL_ROOT = '/content/endo-uncertainty-seg'
DRIVE_CODE = f"{PROJECT_ROOT}/code"

if os.path.exists(LOCAL_ROOT):
    print("✓ Project directory already exists")
elif os.path.isdir(DRIVE_CODE):
    # Option 2: Copy from Drive if you uploaded it
    !cp -r {DRIVE_CODE} {LOCAL_ROOT}
    print("✓ Copied code from Drive")
else:
    # Option 1: Clone from your GitHub (replace with your repo URL)
    # !git clone https://github.com/yourusername/endo-uncertainty-seg.git /content/endo-uncertainty-seg

    # Option 3: Create fresh structure (harmless if Option 1 already created it)
    !mkdir -p {LOCAL_ROOT}
    print("✓ Created project directory")

# Change to project directory
%cd /content/endo-uncertainty-seg

## Step 4: Install Dependencies

In [None]:
# Install core dependencies
!pip install -q monai nibabel SimpleITK pydicom
!pip install -q pytorch-lightning wandb tensorboard
!pip install -q scikit-image albumentations
!pip install -q coloredlogs plotly

print("✓ Dependencies installed")
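
Because `pip install -q` hides most output, the success message prints even if an install failed. A minimal sketch of a follow-up check is shown below. It assumes the usual import names for these packages (for example `pytorch-lightning` imports as `pytorch_lightning` and `scikit-image` as `skimage`); `missing_packages` is a hypothetical helper.

```python
import importlib.util

# pip package name -> import name (the two differ for several packages)
REQUIRED = {
    "monai": "monai",
    "nibabel": "nibabel",
    "SimpleITK": "SimpleITK",
    "pydicom": "pydicom",
    "pytorch-lightning": "pytorch_lightning",
    "wandb": "wandb",
    "tensorboard": "tensorboard",
    "scikit-image": "skimage",
    "albumentations": "albumentations",
    "coloredlogs": "coloredlogs",
    "plotly": "plotly",
}


def missing_packages(required: dict) -> list:
    """Return the pip names whose import name cannot be located."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages(REQUIRED)
if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("All dependencies can be imported")
```

`find_spec` only locates the module without importing it, so the check stays fast even for heavy packages.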

## Step 5: Setup Project Structure

In [None]:
# Create directory structure
dirs = [
    'src/data',
    'src/models',
    'src/training',
    'src/utils',
    'configs',
    'scripts',
    'notebooks',
]

for d in dirs:
    os.makedirs(d, exist_ok=True)

# Link data directory to Google Drive
# (-n replaces an existing link on re-run instead of nesting a new link inside it)
!mkdir -p {PROJECT_ROOT}/data
!ln -sfn {PROJECT_ROOT}/data /content/endo-uncertainty-seg/data

# Create experiment directories in Drive (persistent)
!mkdir -p {PROJECT_ROOT}/experiments/checkpoints
!mkdir -p {PROJECT_ROOT}/experiments/logs
!mkdir -p {PROJECT_ROOT}/experiments/results
!ln -sfn {PROJECT_ROOT}/experiments /content/endo-uncertainty-seg/experiments

print("✓ Project structure created")
print("✓ Data and experiments linked to Google Drive")
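
If you prefer to create the links from Python, the sketch below shows one way to make the step safe to re-run: it replaces an existing symlink rather than creating a second link inside the linked directory. `force_symlink` is a hypothetical helper, and the demo uses a temporary directory in place of the Colab and Drive paths.

```python
import os
import tempfile


def force_symlink(target: str, link_path: str) -> None:
    """Point `link_path` at `target`, replacing an existing symlink.

    Refuses to touch `link_path` if it is a real file or directory,
    so local data is never deleted by accident.
    """
    if os.path.islink(link_path):
        os.unlink(link_path)
    elif os.path.exists(link_path):
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.symlink(target, link_path)


# Demo: the temporary folder stands in for Drive and the project root.
with tempfile.TemporaryDirectory() as tmp:
    drive_data = os.path.join(tmp, "drive_data")
    os.makedirs(drive_data)
    link = os.path.join(tmp, "data")

    force_symlink(drive_data, link)
    force_symlink(drive_data, link)  # second call replaces the link

    print("Link resolves to target:",
          os.path.realpath(link) == os.path.realpath(drive_data))
    print("Entries inside target:", os.listdir(drive_data))
```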

## Step 6: Upload/Create Source Files

**Option A:** Upload files from your local machine  
**Option B:** Copy from GitHub  
**Option C:** Create files directly in Colab

In [None]:
# Option A: Upload files
from google.colab import files

print("Upload your source files (.py files from src/)")
print("Or skip if using Option B/C")

# Uncomment to upload
# uploaded = files.upload()

# Option B: Clone from GitHub
# Done in Step 3 if you uncommented the git clone line there

# Option C: Will create minimal versions below
print("✓ Ready to create source files")

### Create Minimal Source Files (if not uploaded)

In [None]:
# This creates minimal versions - replace with your full files
# Or skip if you uploaded/cloned the full project

import sys
sys.path.append('/content/endo-uncertainty-seg')

# Create __init__ files so each src/ subfolder is importable as a package
!touch src/__init__.py
!touch src/data/__init__.py
!touch src/models/__init__.py
!touch src/training/__init__.py
!touch src/utils/__init__.py

print("✓ Minimal structure created")
print("⚠️  Upload your full source files for complete functionality")

## Step 7: Download Dataset to Google Drive

**Important:** The dataset is ~8GB and takes 15-30 minutes to download.  
**Strategy:** Download once to Google Drive, reuse across sessions.

In [None]:
# Check if dataset already exists
DATASET_PATH = f"{PROJECT_ROOT}/data/raw/UT-EndoMRI"

if os.path.exists(DATASET_PATH):
    print("✓ Dataset already exists in Google Drive!")
    
    # Verify dataset
    d1_path = os.path.join(DATASET_PATH, "D1_MHS")
    d2_path = os.path.join(DATASET_PATH, "D2_TCPW")
    
    if os.path.exists(d1_path):
        d1_count = len([d for d in os.listdir(d1_path) if os.path.isdir(os.path.join(d1_path, d))])
        print(f"  Dataset 1: {d1_count} subjects")
    
    if os.path.exists(d2_path):
        d2_count = len([d for d in os.listdir(d2_path) if os.path.isdir(os.path.join(d2_path, d))])
        print(f"  Dataset 2: {d2_count} subjects")
else:
    print("⏳ Downloading dataset... (this will take 15-30 minutes)")
    print("   You only need to do this ONCE - it will be saved to Drive")
    
    # Create data directory
    !mkdir -p {PROJECT_ROOT}/data/raw
    
    # Download dataset
    !wget -O {PROJECT_ROOT}/data/raw/UT-EndoMRI.zip https://zenodo.org/records/15750762/files/UT-EndoMRI.zip
    
    print("✓ Download complete!")
    print("⏳ Extracting...")
    
    # Extract
    !unzip -q {PROJECT_ROOT}/data/raw/UT-EndoMRI.zip -d {PROJECT_ROOT}/data/raw/
    
    print("✓ Extraction complete!")
    
    # Optional: Remove zip to save space
    # !rm {PROJECT_ROOT}/data/raw/UT-EndoMRI.zip
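
The cell above prints "Download complete!" whether or not `wget` succeeded, so an interrupted download only surfaces later as an extraction error. One way to catch that earlier is to check the archive before unzipping, as in the sketch below. `zip_is_intact` is a hypothetical helper. On an archive of this size the CRC pass reads every member and can take several minutes.

```python
import os
import tempfile
import zipfile


def zip_is_intact(zip_path: str) -> bool:
    """True if `zip_path` is a zip file and every member passes its CRC check."""
    if not zipfile.is_zipfile(zip_path):
        return False
    try:
        with zipfile.ZipFile(zip_path) as archive:
            # testzip() returns the first corrupt member name, or None
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False


# Demo with a tiny archive; on Colab you would pass the path of
# UT-EndoMRI.zip under PROJECT_ROOT instead.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.zip")
    with zipfile.ZipFile(good, "w") as archive:
        archive.writestr("subject/readme.txt", "example")

    bad = os.path.join(tmp, "bad.zip")
    with open(bad, "wb") as handle:
        handle.write(b"partial download")

    print("good.zip intact:", zip_is_intact(good))
    print("bad.zip intact:", zip_is_intact(bad))
```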

## Step 8: Create Data Splits

In [None]:
# Create splits using paper's split or custom
SPLITS_FILE = f"{PROJECT_ROOT}/data/splits/split_info.json"

if os.path.exists(SPLITS_FILE):
    print("✓ Splits already exist!")
    
    import json
    with open(SPLITS_FILE, 'r') as f:
        splits = json.load(f)
    print(f"  Train: {len(splits['train'])} subjects")
    print(f"  Val: {len(splits['val'])} subjects")
    print(f"  Test: {len(splits['test'])} subjects")
else:
    print("Creating data splits...")
    
    # Run create_splits.py here if you have it;
    # otherwise the inline fallback below creates the splits
    
    import json
    import numpy as np
    
    # Paper's split for D2_TCPW
    # (assumes subject IDs D2-000 ... D2-037; adjust if your folder names differ)
    train_val_ids = [f"D2-{i:03d}" for i in range(8)]
    test_ids = [f"D2-{i:03d}" for i in range(8, 38)]
    
    np.random.seed(42)
    indices = np.random.permutation(len(train_val_ids))
    n_train = int(len(train_val_ids) * 0.8)
    
    train_ids = [train_val_ids[i] for i in indices[:n_train]]
    val_ids = [train_val_ids[i] for i in indices[n_train:]]
    
    splits = {
        'train': train_ids,
        'val': val_ids,
        'test': test_ids,
        'dataset': 'D2_TCPW',
        'seed': 42
    }
    
    os.makedirs(os.path.dirname(SPLITS_FILE), exist_ok=True)
    with open(SPLITS_FILE, 'w') as f:
        json.dump(splits, f, indent=2)
    
    print("✓ Splits created!")
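
Since the subject IDs in the splits are generated from a naming pattern rather than read from disk, it is worth confirming that the three lists are disjoint and that every ID matches a real folder. The sketch below shows one way to do this; `check_splits` is a hypothetical helper and the demo data is made up for illustration.

```python
def check_splits(splits: dict, available_ids) -> tuple:
    """Return (overlapping_ids, missing_ids) for a train/val/test split.

    overlapping_ids: IDs that appear in more than one subset.
    missing_ids: IDs listed in the split but absent from `available_ids`.
    """
    train = set(splits["train"])
    val = set(splits["val"])
    test = set(splits["test"])

    overlapping = (train & val) | (train & test) | (val & test)
    missing = (train | val | test) - set(available_ids)
    return sorted(overlapping), sorted(missing)


# Demo with toy data. On Colab, `available_ids` would be the subject
# folder names, e.g. os.listdir(os.path.join(DATASET_PATH, "D2_TCPW")).
toy_splits = {
    "train": ["D2-000", "D2-001"],
    "val": ["D2-002"],
    "test": ["D2-002", "D2-099"],
}
overlapping, missing = check_splits(toy_splits, ["D2-000", "D2-001", "D2-002"])
print("Overlapping IDs:", overlapping)
print("IDs not on disk:", missing)
```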

## Step 9: Test Data Loading

In [None]:
# Test that we can load data
import nibabel as nib
import numpy as np
from pathlib import Path

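# Load one sample (sorted so the same subject is picked on every run)
dataset_path = Path(DATASET_PATH) / "D2_TCPW"
subject_dirs = sorted(dataset_path.glob("D2-*"))
if not subject_dirs:
    raise FileNotFoundError(f"No D2-* subject folders found in {dataset_path}")
sample_dir = subject_dirs[0]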

print(f"Testing with: {sample_dir.name}")

# Find T2FS image
t2fs_files = list(sample_dir.glob("*T2FS.nii.gz"))
if t2fs_files:
    img = nib.load(str(t2fs_files[0]))
    data = img.get_fdata()
    
    print(f"✓ Successfully loaded image")
    print(f"  Shape: {data.shape}")
    print(f"  Intensity range: [{data.min():.2f}, {data.max():.2f}]")
    # Read spacing from the header; the affine diagonal is only correct for axis-aligned scans
    print(f"  Spacing: {img.header.get_zooms()[:3]}")
else:
    print("⚠️  No T2FS image found")
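
Voxel spacing can also be derived from the affine itself. The diagonal entries equal the spacing only when the scan is axis-aligned; in general the spacing along each voxel axis is the length of the corresponding affine column. The sketch below illustrates this with a made-up affine rotated 45° about the z axis; `voxel_spacing` is a hypothetical helper and accepts nested lists or a NumPy array such as `img.affine`.

```python
import math


def voxel_spacing(affine) -> tuple:
    """Spacing along each voxel axis: the Euclidean length of the
    first three columns of the upper-left 3x3 block of the affine."""
    return tuple(
        math.sqrt(sum(float(affine[row][col]) ** 2 for row in range(3)))
        for col in range(3)
    )


# Made-up example: spacing (0.5, 0.5, 3.0) mm, rotated 45 degrees about z.
cos45 = sin45 = math.sqrt(0.5)
rotated_affine = [
    [0.5 * cos45, -0.5 * sin45, 0.0, 0.0],
    [0.5 * sin45,  0.5 * cos45, 0.0, 0.0],
    [0.0,          0.0,         3.0, 0.0],
    [0.0,          0.0,         0.0, 1.0],
]

diagonal = tuple(abs(rotated_affine[i][i]) for i in range(3))
print("Diagonal only:", tuple(round(v, 3) for v in diagonal))
print("Column norms: ", tuple(round(v, 3) for v in voxel_spacing(rotated_affine)))
```

For this rotated example the diagonal understates the in-plane spacing, while the column norms recover it.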

## Step 10: Quick Visualization

In [None]:
import matplotlib.pyplot as plt

# Visualize middle slice
if 'data' in locals():
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    
    # Axial
    axes[0].imshow(data[:, :, data.shape[2]//2].T, cmap='gray', origin='lower')
    axes[0].set_title('Axial View')
    axes[0].axis('off')
    
    # Sagittal
    axes[1].imshow(data[data.shape[0]//2, :, :].T, cmap='gray', origin='lower')
    axes[1].set_title('Sagittal View')
    axes[1].axis('off')
    
    # Coronal
    axes[2].imshow(data[:, data.shape[1]//2, :].T, cmap='gray', origin='lower')
    axes[2].set_title('Coronal View')
    axes[2].axis('off')
    
    plt.tight_layout()
    plt.show()
    
    print("✓ Visualization successful!")

## Step 11: Setup Complete! 🎉

### What's Ready:
- ✅ GPU environment configured
- ✅ Google Drive mounted
- ✅ Dependencies installed
- ✅ Dataset downloaded (8GB in Drive)
- ✅ Data splits created
- ✅ Project structure set up

### Storage Layout:
```
MyDrive/endometriosis-uncertainty-seg/
├── data/raw/UT-EndoMRI/     # 8GB dataset (persistent)
├── data/splits/             # Train/val/test splits
├── experiments/
│   ├── checkpoints/         # Model weights (persistent)
│   ├── logs/               # Training logs
│   └── results/            # Predictions & metrics
└── code/                    # Your source code (backup)
```

### Next Steps:
1. **Continue to Phase 1:** Run data exploration
2. **Start Phase 2:** Implement and train baseline model
3. **Phase 3-4:** Implement Transformer + Uncertainty

### 💡 Colab Tips:
- Save checkpoints frequently (to Drive)
- Use smaller batch sizes (2-4) for T4 GPU
- Consider Colab Pro for longer runtimes
- Disconnect when not training to save compute quota
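
For the first tip, one possible pattern is to write each checkpoint to local disk first, copy it to Drive under a temporary name, and rename it once the copy finishes, so an interrupted session never leaves a half-written file under the final name. The sketch below also prunes older checkpoints to save Drive space. `save_to_drive` is a hypothetical helper, and it assumes checkpoint filenames sort chronologically (for example `epoch_003.ckpt`).

```python
import os
import shutil
import tempfile


def save_to_drive(local_file: str, drive_dir: str, keep_last: int = 3) -> str:
    """Copy `local_file` into `drive_dir` and keep only the newest
    `keep_last` .ckpt files (newest = last in sorted filename order)."""
    os.makedirs(drive_dir, exist_ok=True)
    dest = os.path.join(drive_dir, os.path.basename(local_file))

    # Copy under a temporary name, then rename into place.
    tmp_dest = dest + ".tmp"
    shutil.copyfile(local_file, tmp_dest)
    os.replace(tmp_dest, dest)

    # Prune older checkpoints beyond the retention limit.
    checkpoints = sorted(f for f in os.listdir(drive_dir) if f.endswith(".ckpt"))
    n_to_delete = max(len(checkpoints) - keep_last, 0)
    for old in checkpoints[:n_to_delete]:
        os.remove(os.path.join(drive_dir, old))
    return dest


# Demo: temporary folders stand in for local disk and the Drive
# checkpoints directory.
with tempfile.TemporaryDirectory() as tmp:
    local_dir = os.path.join(tmp, "local")
    drive_dir = os.path.join(tmp, "drive_checkpoints")
    os.makedirs(local_dir)
    for epoch in range(5):
        path = os.path.join(local_dir, f"epoch_{epoch:03d}.ckpt")
        with open(path, "wb") as handle:
            handle.write(b"fake weights")
        save_to_drive(path, drive_dir, keep_last=3)
    print("Kept on Drive:", sorted(os.listdir(drive_dir)))
```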

## Quick Commands Reference

In [None]:
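# Save this cell for quick re-setup in future sessions
# (re-run Step 4 first: installed packages do not persist between sessions)

import os
import sys
import torch

# Mount Drive
from google.colab import drive
drive.mount('/content/drive')

# Set paths (local Colab storage is wiped between sessions, so recreate the
# directory; re-copy or re-clone your code as in Step 3)
PROJECT_ROOT = '/content/drive/MyDrive/endometriosis-uncertainty-seg'
os.makedirs('/content/endo-uncertainty-seg', exist_ok=True)
%cd /content/endo-uncertainty-seg

# Link data and experiments (-n avoids nested links when re-run)
!ln -sfn {PROJECT_ROOT}/data /content/endo-uncertainty-seg/data
!ln -sfn {PROJECT_ROOT}/experiments /content/endo-uncertainty-seg/experiments

sys.path.append('/content/endo-uncertainty-seg')

print("✓ Quick setup complete!")
print(f"✓ GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")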