# RTpipeline on Google Colab - Part 1: GPU Segmentation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kstawiski/rtpipeline/blob/main/rtpipeline_colab_part1_gpu.ipynb)

**💰 Cost Optimization:** This notebook is split into two parts to optimize GPU costs:
- **Part 1 (this notebook):** Runs TotalSegmentator with GPU (~10-30 min/patient)
- **Part 2:** Runs DVH, radiomics, and analysis on CPU only (saves GPU costs)

## What This Part Does

✅ **Automatic segmentation** of 100+ organs using TotalSegmentator (GPU-accelerated)  
✅ **Saves outputs** to Google Drive for Part 2

## Prerequisites

- Google Colab with **GPU runtime** (Runtime → Change runtime type → GPU)
- DICOM files in Google Drive
- Google Drive mounted for saving outputs

---

**⚡ Quick Start:**
1. Run cells 1-3 (setup)
2. Mount Google Drive (cell 4)
3. **UPDATE CONFIGURATION** (cell 5) - Point to your DICOM folder
4. Run remaining cells

## 1️⃣ Setup: Install Miniconda & System Dependencies

This takes ~2 minutes

In [None]:
%%bash
# Check GPU availability
echo "=== GPU Check ==="
nvidia-smi || echo "⚠️ No GPU detected. Please enable GPU: Runtime → Change runtime type → GPU"

# Install system dependencies
echo -e "\n=== Installing System Dependencies ==="
apt-get update -qq
apt-get install -y -qq dcm2niix pigz > /dev/null

# Install Miniconda if not already installed
if [ ! -d "/content/miniconda" ]; then
    echo -e "\n=== Installing Miniconda ==="
    wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh
    bash /tmp/miniconda.sh -b -p /content/miniconda
    rm /tmp/miniconda.sh
    echo "✅ Miniconda installed"
else
    echo "✅ Miniconda already installed"
fi

# Initialize conda
export PATH="/content/miniconda/bin:$PATH"
eval "$(/content/miniconda/bin/conda shell.bash hook)"
conda init bash

echo -e "\n✅ Setup complete!"
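Because `%%bash` cells run in a separate shell, their `export PATH=...` does not carry over to the Python kernel. A quick Python check that the tools installed above are reachable can save a confusing failure later. This is a minimal sketch: the tool list and the `/content/miniconda/bin` directory mirror the setup cell, and the helper name `missing_tools` is hypothetical.

```python
import os
import shutil

# Miniconda location used by the setup cell; %%bash cells run in a subprocess,
# so their PATH changes are not visible to the Python kernel.
MINICONDA_BIN = "/content/miniconda/bin"


def missing_tools(tools, extra_dirs=(MINICONDA_BIN,)):
    """Return the subset of `tools` that cannot be found on PATH plus `extra_dirs`."""
    search_path = os.pathsep.join([*extra_dirs, os.environ.get("PATH", "")])
    return [tool for tool in tools if shutil.which(tool, path=search_path) is None]


missing = missing_tools(["dcm2niix", "pigz", "conda"])
if missing:
    print(f"⚠️ Not found: {', '.join(missing)} (re-run the setup cell)")
else:
    print("✅ dcm2niix, pigz and conda are available")
```

An empty list means every tool resolved; anything listed points back at the setup cell.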

## 2️⃣ Clone RTpipeline Repository

In [None]:
%%bash
if [ ! -d "/content/rtpipeline" ]; then
    echo "Cloning rtpipeline repository..."
    git clone -q https://github.com/kstawiski/rtpipeline.git /content/rtpipeline
    echo "✅ Repository cloned"
else
    echo "✅ Repository already exists"
    cd /content/rtpipeline
    git pull origin main
    echo "Repository updated"
fi
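Since the cell above pulls whatever is currently on `main`, two runs on different days may use different pipeline code. Recording the exact commit next to your outputs makes a run easier to reproduce. A sketch, assuming `git` is on the PATH (it is on standard Colab images); the helper name `repo_commit` is hypothetical.

```python
import subprocess


def repo_commit(repo_dir="/content/rtpipeline"):
    """Return the current commit hash of `repo_dir`, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "-C", repo_dir, "rev-parse", "HEAD"],
            check=True, capture_output=True, text=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # git is missing, the directory does not exist, or it is not a repository
        return None
    return result.stdout.strip()


print(f"rtpipeline commit: {repo_commit() or 'unknown'}")
```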

## 3️⃣ Create Conda Environment

This creates the rtpipeline environment for TotalSegmentator (~5-10 minutes, only once per session)

In [None]:
%%bash
export PATH="/content/miniconda/bin:$PATH"
eval "$(/content/miniconda/bin/conda shell.bash hook)"

# Accept Anaconda Terms of Service
echo "=== Accepting Anaconda Terms of Service ==="
conda config --set channel_priority flexible
if conda tos accept --channel defaults 2>&1; then
    echo "✅ ToS accepted"
else
    echo "⚠️ ToS acceptance failed or was already accepted; continuing"
fi

cd /content/rtpipeline

# Create rtpipeline environment
if conda env list | grep -q "^rtpipeline "; then
    echo "✅ Environment 'rtpipeline' already exists"
else
    echo "Creating 'rtpipeline' environment (TotalSegmentator)..."
    conda env create -f envs/rtpipeline.yaml -q
    echo "✅ Environment created"
fi

echo ""
conda run -n rtpipeline python -c "import numpy; print(f'✅ numpy {numpy.__version__}')"
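The `grep "^rtpipeline "` test above depends on the plain-text layout of `conda env list`. The `--json` form of that command returns a list of environment paths, which is easier to check from Python. A minimal sketch; the helper name `env_exists` is hypothetical, and the conda path assumes the Miniconda location used in this notebook.

```python
import json
import os
import subprocess


def env_exists(env_list_json, name="rtpipeline"):
    """Return True if `conda env list --json` output contains an environment named `name`."""
    env_paths = json.loads(env_list_json).get("envs", [])
    # Named environments live under <conda root>/envs/<name>
    return any(os.path.basename(os.path.normpath(p)) == name for p in env_paths)


CONDA = "/content/miniconda/bin/conda"
if os.path.exists(CONDA):
    out = subprocess.run([CONDA, "env", "list", "--json"],
                         check=True, capture_output=True, text=True).stdout
    print("✅ rtpipeline env found" if env_exists(out) else "⚠️ rtpipeline env missing")
```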

## 4️⃣ Mount Google Drive

**IMPORTANT:** Your DICOM files must be in Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

print("\n✅ Google Drive mounted at /content/drive/MyDrive/")

---

# ⚙️ CONFIGURATION - UPDATE THIS!

## 5️⃣ Configure Input/Output Paths & Processing Options

**🔴 REQUIRED:** Update `DICOM_ROOT` to point to your DICOM files in Google Drive

---

In [None]:
import os
import subprocess

# ═══════════════════════════════════════════════════════════
# 🔴 REQUIRED - Point to your DICOM folder in Google Drive
# ═══════════════════════════════════════════════════════════

DICOM_ROOT = "/content/drive/MyDrive/my_dicom_folder"

# Examples:
# DICOM_ROOT = "/content/drive/MyDrive/RT_Data/DICOM"
# DICOM_ROOT = "/content/drive/MyDrive/Patient_Data"

# ═══════════════════════════════════════════════════════════
# Output location in Google Drive (for Part 2)
# ═══════════════════════════════════════════════════════════

DRIVE_OUTPUT_DIR = "/content/drive/MyDrive/rtpipeline_part1_output"

# ═══════════════════════════════════════════════════════════
# Parallelism & Performance Settings
# ═══════════════════════════════════════════════════════════

# Course-level parallelism (how many courses to process simultaneously)
# Google Colab: Keep at 1-2 to avoid memory issues with GPU
# For multiple small patients, can try 2-3
WORKERS = 1  # Recommended: 1-2 for Colab GPU

# GPU worker allocation (keep at 1 for single GPU Colab)
SEG_WORKERS = 1  # DO NOT CHANGE (only 1 GPU available in Colab)

# TotalSegmentator internal threading
# These control parallelism WITHIN each segmentation task
# Higher values = faster but more memory usage
TOTALSEG_NR_THR_RESAMP = 1      # Resampling threads (1-2 recommended)
TOTALSEG_NR_THR_SAVING = 6      # I/O threads for saving (4-8 recommended)
TOTALSEG_NUM_PROC_PRE = 6       # Preprocessing processes (4-8 recommended)
TOTALSEG_NUM_PROC_EXPORT = 6    # Export processes (4-8 recommended)

# ═══════════════════════════════════════════════════════════
# Segmentation Options
# ═══════════════════════════════════════════════════════════

# Fast mode: 3x faster but slightly lower quality
FAST_MODE = False  # Set to True for faster segmentation

# ROI subset: segment only specific organs (leave None for all)
# Examples: "liver kidney spleen", "lung_left lung_right"
# See TotalSegmentator docs for available ROI names
ROI_SUBSET = None  # None = segment all organs

# Extra TotalSegmentator models (body composition, cardiac, etc.)
# Available: "body", "lung_vessels", "cerebral_bleed", "hip_implant", "coronary_arteries"
EXTRA_MODELS = []  # Example: ["body", "lung_vessels"]

# Force re-segmentation even if outputs exist
FORCE_SEGMENTATION = False

# ═══════════════════════════════════════════════════════════
# Custom nnUNet Models (Advanced)
# ═══════════════════════════════════════════════════════════

ENABLE_CUSTOM_MODELS = False  # Enable custom nnUNet models
CUSTOM_MODELS_ROOT = "/content/drive/MyDrive/custom_models"  # Path to models
CUSTOM_MODELS_SELECTED = []  # Example: ["prostate_model", "brain_tumor"]

# ═══════════════════════════════════════════════════════════
# Custom Structures Configuration
# ═══════════════════════════════════════════════════════════

# Custom structures YAML file (for combining/renaming ROIs)
CUSTOM_STRUCTURES_FILE = "custom_structures_pelvic.yaml"  # Default for pelvic cases
# Options: "custom_structures_pelvic.yaml", "custom_structures_thorax.yaml", or custom path

# ═══════════════════════════════════════════════════════════
# Validation & Setup
# ═══════════════════════════════════════════════════════════

OUTPUT_DIR = "/content/output"
LOGS_DIR = "/content/logs"
os.makedirs(OUTPUT_DIR, exist_ok=True)
os.makedirs(LOGS_DIR, exist_ok=True)

# Check GPU (ask nvidia-smi for the GPU name directly instead of parsing its table output)
try:
    result = subprocess.run(['nvidia-smi', '--query-gpu=name', '--format=csv,noheader'],
                            check=True, capture_output=True, text=True)
    gpu_available = True
    gpu_names = result.stdout.strip().splitlines()
    gpu_info = gpu_names[0] if gpu_names else "GPU detected"
    print(f"✅ GPU: {gpu_info}")
except (FileNotFoundError, subprocess.CalledProcessError):
    # nvidia-smi is missing or failed: no usable GPU in this runtime
    gpu_available = False
    print("⚠️ No GPU detected!")
    print("   Please enable GPU: Runtime → Change runtime type → GPU")

# Check DICOM directory
if not os.path.exists(DICOM_ROOT):
    print("\n🔴 ERROR: DICOM directory not found!")
    print(f"   Path: {DICOM_ROOT}")
    print("\n   Please update DICOM_ROOT at the top of this cell.")
else:
    # Counts only files with a .dcm extension; DICOM files without an extension are not included
    dicom_count = sum(1 for root, dirs, files in os.walk(DICOM_ROOT)
                      for f in files if f.lower().endswith('.dcm'))
    print(f"\n✅ DICOM directory: {DICOM_ROOT}")
    print(f"   {dicom_count} files with a .dcm extension")

print(f"\n📋 Configuration Summary:")
print(f"   ─────────────────────────────────────")
print(f"   GPU: {'✅' if gpu_available else '❌'}")
print(f"   Course parallelism: {WORKERS} concurrent course(s)")
print(f"   TotalSegmentator threads:")
print(f"     • Resampling: {TOTALSEG_NR_THR_RESAMP}")
print(f"     • Saving I/O: {TOTALSEG_NR_THR_SAVING}")
print(f"     • Preprocessing: {TOTALSEG_NUM_PROC_PRE}")
print(f"     • Export: {TOTALSEG_NUM_PROC_EXPORT}")
print(f"   Fast mode: {'✅ Enabled' if FAST_MODE else '❌ Disabled'}")
if ROI_SUBSET:
    print(f"   ROI subset: {ROI_SUBSET}")
if EXTRA_MODELS:
    print(f"   Extra models: {', '.join(EXTRA_MODELS)}")
print(f"   ─────────────────────────────────────")
print(f"\n📦 Output: {DRIVE_OUTPUT_DIR}_YYYYMMDD_HHMMSS")

# Performance recommendations
if WORKERS > 2 and gpu_available:
    print(f"\n⚠️ PERFORMANCE NOTE: WORKERS={WORKERS} may cause memory issues in Colab")
    print(f"   Recommended: WORKERS=1-2 for optimal GPU memory usage")
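The configuration cell only counts files ending in `.dcm`, but many scanners and TPS exports write DICOM files with no extension at all. DICOM Part 10 files start with a 128-byte preamble followed by the bytes `DICM`, so checking for that signature gives a count that does not depend on file names. This is a sketch: files written without the Part 10 header are not detected, and opening every file on Google Drive is slower than a name check. The helper names are hypothetical.

```python
import os


def is_dicom_file(path):
    """Return True if `path` has the DICOM Part 10 signature: 128-byte preamble + b'DICM'."""
    try:
        with open(path, "rb") as fh:
            fh.seek(128)
            return fh.read(4) == b"DICM"
    except OSError:
        return False


def count_dicom_files(root):
    """Count DICOM Part 10 files under `root`, regardless of file extension."""
    return sum(
        1
        for dirpath, _dirs, files in os.walk(root)
        for name in files
        if is_dicom_file(os.path.join(dirpath, name))
    )
```

In the notebook you would call `count_dicom_files(DICOM_ROOT)` in place of the extension-based count.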

---

## 6Ô∏è‚É£ Generate Configuration File

In [None]:
# Build configuration
config_yaml = f"""# RTpipeline Configuration - Part 1 (GPU Segmentation)
dicom_root: "{DICOM_ROOT}"
output_dir: "{OUTPUT_DIR}"
logs_dir: "{LOGS_DIR}"
workers: {WORKERS}

segmentation:
  workers: {SEG_WORKERS}
  threads_per_worker: null
  force: {str(FORCE_SEGMENTATION).lower()}
  fast: {str(FAST_MODE).lower()}
  roi_subset: {f'"{ROI_SUBSET}"' if ROI_SUBSET else 'null'}
  extra_models: {EXTRA_MODELS if EXTRA_MODELS else '[]'}
  device: "{'gpu' if gpu_available else 'cpu'}"
  force_split: true
  nr_threads_resample: {TOTALSEG_NR_THR_RESAMP}
  nr_threads_save: {TOTALSEG_NR_THR_SAVING}
  num_proc_preprocessing: {TOTALSEG_NUM_PROC_PRE}
  num_proc_export: {TOTALSEG_NUM_PROC_EXPORT}

custom_models:
  enabled: {str(ENABLE_CUSTOM_MODELS).lower()}
  root: "{CUSTOM_MODELS_ROOT}"
  models: {CUSTOM_MODELS_SELECTED if CUSTOM_MODELS_SELECTED else '[]'}
  workers: 1
  force: false

custom_structures: "{CUSTOM_STRUCTURES_FILE}"
"""

config_path = "/content/config_part1.yaml"
with open(config_path, 'w') as f:
    f.write(config_yaml)

print(f"✅ Configuration written to: {config_path}")
print(f"\nYou can review the configuration:")
print(f"   !cat {config_path}")
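The f-string above renders `EXTRA_MODELS` and `CUSTOM_MODELS_SELECTED` with Python's `repr`, which happens to produce valid YAML for simple names such as `'body'`. A slightly more robust option is to render strings and lists with `json.dumps`, since YAML parsers accept JSON-style flow syntax. A sketch only; `yaml_value` is a hypothetical helper, not part of rtpipeline.

```python
import json


def yaml_value(value):
    """Render a Python value as an inline YAML scalar or flow sequence (hypothetical helper)."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    # Strings, numbers and lists: JSON syntax is accepted by YAML parsers
    return json.dumps(value)


# How a few of the settings from step 5 would render
print(f"extra_models: {yaml_value(['body', 'lung_vessels'])}")
print(f"roi_subset: {yaml_value(None)}")
print(f"fast: {yaml_value(False)}")
```

For example, `roi_subset: {yaml_value(ROI_SUBSET)}` would replace the hand-written quoting logic in the template.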

## 7️⃣ Run Segmentation Pipeline

This runs **ONLY** TotalSegmentator segmentation (GPU-accelerated)

⏱️ **Estimated Time:**
- With GPU (T4): 10-20 minutes per patient
- With GPU (V100/A100): 5-15 minutes per patient
- Fast mode: ~3x faster
- ROI subset: Proportionally faster
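The run cell prints a rough time estimate using fixed per-course figures (5-15 minutes in fast mode, 15-25 minutes otherwise) divided by `WORKERS`. The same arithmetic as a small function, for planning how many courses fit into one Colab session. These constants come from the run cell and are rough guidance, not measurements; because `SEG_WORKERS = 1` serializes GPU jobs, dividing by `WORKERS` is likely optimistic.

```python
def estimate_minutes(n_courses, fast_mode=False, workers=1):
    """Return a (low, high) runtime estimate in minutes, mirroring the run cell's formula."""
    low_per_course, high_per_course = (5, 15) if fast_mode else (15, 25)
    return (n_courses * low_per_course / workers, n_courses * high_per_course / workers)


low, high = estimate_minutes(4)
print(f"4 courses, standard mode: ~{low:.0f}-{high:.0f} minutes")  # ~60-100 minutes
```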

In [None]:
import os
import subprocess
import glob
import time

os.environ['PATH'] = f"/content/miniconda/bin:{os.environ.get('PATH', '')}"
os.chdir('/content/rtpipeline')

print("═══════════════════════════════════════════════════")
print("   RTpipeline Part 1: GPU Segmentation")
print("═══════════════════════════════════════════════════")
print("\n⚡ Processing Mode:")
print(f"   • Device: {'GPU' if gpu_available else 'CPU (no GPU detected)'}")
print(f"   • {WORKERS} concurrent course(s)")
print(f"   • Fast mode: {'ON' if FAST_MODE else 'OFF'}")
print("\nDVH and radiomics will run in Part 2 (CPU)\n")

start_time = time.time()

# Install Snakemake if needed
try:
    subprocess.run(["conda", "run", "-n", "base", "snakemake", "--version"],
                   check=True, capture_output=True)
except subprocess.CalledProcessError:
    print("Installing Snakemake...")
    subprocess.run(["conda", "install", "-n", "base", "-c", "conda-forge", 
                    "-c", "bioconda", "snakemake", "-y", "-q"], check=True)
    print("✅ Snakemake installed\n")

# Step 1: Organize courses (build the course manifest)
print("[1/2] Organizing DICOM data...")
cmd_organize = [
    "conda", "run", "-n", "base", "snakemake",
    "--configfile", "/content/config_part1.yaml",
    "--use-conda", "--cores", str(WORKERS),
    "--printshellcmds",
    f"{OUTPUT_DIR}/_COURSES/manifest.json"  # derived from OUTPUT_DIR instead of a hardcoded path
]

result = subprocess.run(cmd_organize, capture_output=False, text=True)

if result.returncode != 0:
    print("\n⚠️ Organization failed! Check the Snakemake output above.")
else:
    org_time = time.time()
    print(f"\n✅ Organization complete ({org_time - start_time:.1f}s)\n")
    
    # Step 2: Run segmentation
    print("[2/2] Running TotalSegmentator...")
    
    # Find all courses
    seg_targets = []
    custom_targets = []
    
    for patient_dir in glob.glob(f"{OUTPUT_DIR}/*/"):
        patient_name = os.path.basename(patient_dir.rstrip('/'))
        if patient_name.startswith('_') or patient_name.startswith('.'):
            continue
        for course_dir in glob.glob(f"{patient_dir}/*/"):
            course_name = os.path.basename(course_dir.rstrip('/'))
            if not course_name.startswith('_'):
                seg_targets.append(f"{OUTPUT_DIR}/{patient_name}/{course_name}/.segmentation_done")
                custom_targets.append(f"{OUTPUT_DIR}/{patient_name}/{course_name}/.custom_models_done")
    
    if seg_targets:
        print(f"Found {len(seg_targets)} course(s) to segment")
        print(f"Estimated time: {len(seg_targets) * (5 if FAST_MODE else 15) / WORKERS:.0f}-{len(seg_targets) * (15 if FAST_MODE else 25) / WORKERS:.0f} minutes\n")
        
        # Run segmentation with resource limits for Colab
        cmd_seg = [
            "conda", "run", "-n", "base", "snakemake",
            "--configfile", "/content/config_part1.yaml",
            "--use-conda",
            "--cores", str(WORKERS),
            "--resources", f"seg_workers={SEG_WORKERS}",
            "--printshellcmds",
            "--keep-going"
        ] + seg_targets + custom_targets
        
        result = subprocess.run(cmd_seg, capture_output=False, text=True)
        
        seg_time = time.time()
        if result.returncode == 0:
            print(f"\n✅ All segmentations complete! ({seg_time - org_time:.1f}s)")
        else:
            print("\n⚠️ Some segmentations failed. Check logs.")
    else:
        print("\n⚠️ No courses found")

total_time = time.time() - start_time
print("\n" + "="*50)
print("Part 1 Complete!")
print("="*50)
print(f"Total time: {total_time/60:.1f} minutes")
print(f"\nOutputs: {OUTPUT_DIR}")
print("\nNext: Run the cell below to save to Google Drive")
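The course-discovery loop in the cell above (skip directories starting with `_` or `.` at the patient level, and `_` at the course level, then build one marker path per course) can be factored into a function that is easy to test. A sketch that reproduces those rules; the name `find_course_targets` is hypothetical, and the marker file names are the ones the cell passes to Snakemake.

```python
import glob
import os


def find_course_targets(output_dir, marker=".segmentation_done"):
    """Return marker-file paths for every patient/course directory under `output_dir`."""
    targets = []
    for patient_dir in sorted(glob.glob(os.path.join(output_dir, "*", ""))):
        patient = os.path.basename(os.path.normpath(patient_dir))
        if patient.startswith(("_", ".")):  # e.g. _COURSES
            continue
        for course_dir in sorted(glob.glob(os.path.join(patient_dir, "*", ""))):
            course = os.path.basename(os.path.normpath(course_dir))
            if not course.startswith("_"):
                targets.append(os.path.join(output_dir, patient, course, marker))
    return targets
```

With it, the two target lists become `find_course_targets(OUTPUT_DIR)` and `find_course_targets(OUTPUT_DIR, ".custom_models_done")`. Sorting also makes the Snakemake target order stable between runs.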

## 8️⃣ Save Outputs to Google Drive

**IMPORTANT:** This saves your segmentation results to Google Drive for Part 2

In [None]:
import shutil
from datetime import datetime

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
drive_output = f"{DRIVE_OUTPUT_DIR}_{timestamp}"

print(f"Copying outputs to Google Drive...")
print(f"Destination: {drive_output}\n")

try:
    shutil.copytree(OUTPUT_DIR, drive_output)
    shutil.copy("/content/config_part1.yaml", f"{drive_output}/config_part1.yaml")
    
    # Create README for Part 2
    readme = f"""RTpipeline Part 1 Outputs
========================
Generated: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}

To continue with Part 2:
1. Open rtpipeline_colab_part2_cpu.ipynb
2. In cell 5 (Configuration), set:
   PART1_OUTPUT_DIR = "{drive_output}"
3. Run all cells (on CPU runtime - no GPU needed!)

Configuration used:
- DICOM source: {DICOM_ROOT}
- Workers: {WORKERS}
- Fast mode: {FAST_MODE}
- GPU: {gpu_available}
"""
    with open(f"{drive_output}/README_PART2.txt", 'w') as f:
        f.write(readme)
    
    print("\n" + "="*60)
    print("🎉 PART 1 COMPLETE - OUTPUTS SAVED!")
    print("="*60)
    print(f"\nSaved to: {drive_output}")
    print("\n📋 Next Steps:")
    print("   1. You can disconnect this GPU runtime now")
    print("   2. Open rtpipeline_colab_part2_cpu.ipynb")
    print(f"   3. Set PART1_OUTPUT_DIR = '{drive_output}'")
    print("   4. Run Part 2 on CPU runtime (saves GPU costs!)")
    
except Exception as e:
    print(f"\n⚠️ Error: {e}")
    print("\nPlease check:")
    print("  - Google Drive is mounted")
    print("  - You have enough space")
    print("  - Path is valid")
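Before disconnecting the GPU runtime, you may want a quick sanity check that the copy to Drive actually landed. A rough sketch that compares file counts between the local output and the Drive copy; it is not a checksum comparison, and it uses `>=` because the Drive folder also receives `config_part1.yaml` and `README_PART2.txt`. The helper names are hypothetical.

```python
import os


def count_files(root):
    """Count non-directory entries under `root`, recursively."""
    return sum(len(files) for _dirpath, _dirs, files in os.walk(root))


def copy_looks_complete(src, dst):
    """Rough check that `dst` exists and holds at least as many files as `src`."""
    return os.path.isdir(dst) and count_files(dst) >= count_files(src)
```

In the notebook this would be `copy_looks_complete(OUTPUT_DIR, drive_output)`, run after the save cell.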

## 📊 Optional: View Summary

In [None]:
import os
import glob

print("═══════════════════════════════════")
print("   Segmentation Summary")
print("═══════════════════════════════════\n")

total = 0
completed = 0

for patient_dir in sorted(glob.glob(f"{OUTPUT_DIR}/*/")):
    patient_name = os.path.basename(patient_dir.rstrip('/'))
    if patient_name.startswith('_') or patient_name.startswith('.'):
        continue
    
    for course_dir in sorted(glob.glob(f"{patient_dir}/*/")):
        course_name = os.path.basename(course_dir.rstrip('/'))
        if course_name.startswith('_'):
            continue
        
        total += 1
        seg_done = os.path.exists(f"{course_dir}/.segmentation_done")
        
        if seg_done:
            completed += 1
        
        status = "✅" if seg_done else "⚠️"
        print(f"{status} {patient_name}/{course_name}")

print(f"\nTotal: {completed}/{total} completed")

if completed == total and total > 0:
    print("\n🎉 All segmentations successful!")
elif completed > 0:
    print(f"\n⚠️ {total - completed} incomplete")
else:
    print("\n⚠️ No segmentations completed")
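The summary above prints to the cell output, which is lost when the runtime disconnects. A sketch that collects the same information (a course counts as done when its `.segmentation_done` marker exists) and saves it as JSON, for example next to the Drive copy, so incomplete courses can be revisited in a later session. The function names and the report file name are hypothetical.

```python
import json
import os


def segmentation_status(output_dir):
    """Return {'completed': [...], 'incomplete': [...]} lists of 'patient/course' names."""
    status = {"completed": [], "incomplete": []}
    if not os.path.isdir(output_dir):
        return status
    for patient in sorted(os.listdir(output_dir)):
        patient_dir = os.path.join(output_dir, patient)
        if patient.startswith(("_", ".")) or not os.path.isdir(patient_dir):
            continue
        for course in sorted(os.listdir(patient_dir)):
            course_dir = os.path.join(patient_dir, course)
            if course.startswith(("_", ".")) or not os.path.isdir(course_dir):
                continue
            done = os.path.exists(os.path.join(course_dir, ".segmentation_done"))
            status["completed" if done else "incomplete"].append(f"{patient}/{course}")
    return status


def write_status_report(output_dir, report_path):
    """Save the status dict as JSON at `report_path` and return it."""
    status = segmentation_status(output_dir)
    with open(report_path, "w") as fh:
        json.dump(status, fh, indent=2)
    return status
```

For example: `write_status_report(OUTPUT_DIR, f"{drive_output}/segmentation_status.json")`.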

---

## What's Next?

**Continue with Part 2 (CPU):** `rtpipeline_colab_part2_cpu.ipynb`

Part 2 will:
- Extract DVH metrics
- Compute radiomic features
- Run robustness testing (optional)
- Generate visualizations
- Create downloadable results

**💰 Cost Savings:** Part 2 runs on CPU only!

---

**Notebook Version:** 2.0 (Part 1 - GPU Segmentation)  
**Repository:** https://github.com/kstawiski/rtpipeline