# RTpipeline - Part 1: Setup & GPU Segmentation

**Radiotherapy DICOM Processing Pipeline - Colab Edition**

This notebook sets up the complete pipeline configuration and runs GPU-intensive tasks:
- **Complete pipeline configuration** (shared with Part 2)
- DICOM organization and conversion
- TotalSegmentator auto-segmentation (GPU)
- Custom nnUNet models (disabled in this Colab configuration; see `custom_models` in Section 4)

---

## Prerequisites

1. **GPU Runtime**: Go to `Runtime > Change runtime type > GPU (T4)`
2. **Google Drive**: Your DICOM data should be uploaded to Google Drive
3. **Time**: ~15-30 min setup, ~5-10 min per patient for segmentation

---

## 1. Check GPU and Mount Google Drive

In [None]:
# Check GPU availability
!nvidia-smi

import torch
print(f"\nPyTorch CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("WARNING: No GPU detected! Segmentation will be very slow.")
    print("Go to Runtime > Change runtime type > GPU")

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

print("\nGoogle Drive mounted at /content/drive/MyDrive/")

---
## 2. Pipeline Configuration

**Configure all pipeline settings here.** This configuration will be saved and used by Part 2.

### 2.1 Directory Paths

In [None]:
#@title ### Directory Configuration { display-mode: "form" }
#@markdown **Set your data directories in Google Drive:**

#@markdown ---
#@markdown #### Input Directory (where your DICOM files are)
DICOM_INPUT = "/content/drive/MyDrive/RTpipeline/Input"  #@param {type:"string"}

#@markdown #### Output Directory (pipeline results will be saved here)
OUTPUT_DIR = "/content/drive/MyDrive/RTpipeline/Output"  #@param {type:"string"}

#@markdown #### Logs Directory
LOGS_DIR = "/content/drive/MyDrive/RTpipeline/Logs"  #@param {type:"string"}

#@markdown ---

import os
from pathlib import Path

# Create output and logs directories
for d in [OUTPUT_DIR, LOGS_DIR]:
    os.makedirs(d, exist_ok=True)

# Verify input directory has data
input_path = Path(DICOM_INPUT)
if input_path.exists():
    contents = list(input_path.iterdir())
    print(f"Input directory: {DICOM_INPUT}")
    print(f"  Contains {len(contents)} items")
    if contents:
        print(f"  First items: {[c.name for c in contents[:5]]}")
    else:
        print("\n⚠️  WARNING: Input directory is EMPTY!")
        print("   Please upload your DICOM data to this location.")
else:
    # Create the missing input directory so data can be uploaded into it
    os.makedirs(DICOM_INPUT, exist_ok=True)
    print(f"\n❌ ERROR: Input directory did not exist: {DICOM_INPUT}")
    print("   An empty directory was created there; upload your DICOM data to it.")

print(f"\nOutput directory: {OUTPUT_DIR}")
print(f"Logs directory: {LOGS_DIR}")
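
Counting items only shows that the input folder is non-empty. A DICOM Part 10 file normally begins with a 128-byte preamble followed by the four bytes `DICM`, so scanning for that marker is a cheap sanity check. The sketch below is not part of rtpipeline: `looks_like_dicom` and `count_dicom_files` are hypothetical helpers, and some older or non-conformant files omit the preamble, so a low count is a prompt to inspect the data rather than proof of a problem.

```python
from pathlib import Path


def looks_like_dicom(path: Path) -> bool:
    """Return True if the file has a 128-byte preamble followed by b'DICM'."""
    try:
        with open(path, "rb") as f:
            header = f.read(132)
    except OSError:
        return False
    return len(header) == 132 and header[128:132] == b"DICM"


def count_dicom_files(root: str, limit: int = 10_000) -> int:
    """Count files under `root` carrying the DICM marker.

    Stops after visiting `limit` paths (files and directories) so the scan
    stays quick on large Google Drive folders.
    """
    count = 0
    for i, p in enumerate(Path(root).rglob("*")):
        if i >= limit:
            break
        if p.is_file() and looks_like_dicom(p):
            count += 1
    return count
```

For example, run `print(count_dicom_files(DICOM_INPUT))` after the configuration cell.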

### 2.2 Processing Options

In [None]:
#@title ### Processing Configuration { display-mode: "form" }

#@markdown ---
#@markdown #### Anatomical Region
#@markdown Select the body region being analyzed (affects CT cropping landmarks):
ANATOMICAL_REGION = "pelvis"  #@param ["pelvis", "thorax", "abdomen", "head_neck", "brain"]

#@markdown ---
#@markdown #### CT Cropping
#@markdown Crop CTs to consistent anatomical boundaries for meaningful cross-patient DVH comparison:
ENABLE_CT_CROPPING = True  #@param {type:"boolean"}

#@markdown Superior margin above landmark (cm):
SUPERIOR_MARGIN_CM = 2.0  #@param {type:"number"}

#@markdown Inferior margin below landmark (cm):
INFERIOR_MARGIN_CM = 10.0  #@param {type:"number"}

#@markdown ---
#@markdown #### Segmentation Options
#@markdown Fast mode (lower quality but faster):
FAST_SEGMENTATION = False  #@param {type:"boolean"}

#@markdown ---
#@markdown #### Radiomics Options
#@markdown Enable radiomics robustness analysis (perturbation-based stability):
ENABLE_ROBUSTNESS = False  #@param {type:"boolean"}

#@markdown ROIs to skip in radiomics (comma-separated):
SKIP_ROIS = "body,couchsurface,couchinterior,couchexterior,bones"  #@param {type:"string"}

print("Configuration Summary:")
print(f"  Region: {ANATOMICAL_REGION}")
print(f"  CT Cropping: {ENABLE_CT_CROPPING}")
if ENABLE_CT_CROPPING:
    print(f"    Superior margin: {SUPERIOR_MARGIN_CM} cm")
    print(f"    Inferior margin: {INFERIOR_MARGIN_CM} cm")
print(f"  Fast segmentation: {FAST_SEGMENTATION}")
print(f"  Radiomics robustness: {ENABLE_ROBUSTNESS}")
print(f"  Skip ROIs: {SKIP_ROIS}")
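
To make the margin settings concrete, here is a minimal sketch of how superior and inferior margins around anatomical landmarks could translate into a z-range for cropping. It assumes DICOM patient coordinates (where +z points toward the head) and landmark positions already known in mm. `crop_z_range` and `slices_in_range` are hypothetical helpers; the landmarks rtpipeline actually uses for each region are not described here.

```python
def crop_z_range(sup_landmark_z_mm, inf_landmark_z_mm,
                 superior_margin_cm, inferior_margin_cm):
    """Return (z_min, z_max) in mm for a crop box.

    Assumes DICOM patient coordinates, where +z is superior (toward the head).
    Margins are given in cm, as in the form above, and converted to mm.
    """
    z_max = sup_landmark_z_mm + superior_margin_cm * 10.0
    z_min = inf_landmark_z_mm - inferior_margin_cm * 10.0
    if z_min > z_max:
        raise ValueError("Inferior bound lies above superior bound")
    return z_min, z_max


def slices_in_range(slice_z_mm, z_range):
    """Return indices of slices whose z-position falls inside z_range (inclusive)."""
    z_min, z_max = z_range
    return [i for i, z in enumerate(slice_z_mm) if z_min <= z <= z_max]
```

With a single landmark at z = 0 mm and the default margins (2 cm superior, 10 cm inferior), `crop_z_range(0.0, 0.0, SUPERIOR_MARGIN_CM, INFERIOR_MARGIN_CM)` gives a 120 mm window from -100 mm to +20 mm.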

### 2.3 Advanced Options (Optional)

In [None]:
#@title ### Advanced Configuration { display-mode: "form" }

#@markdown ---
#@markdown #### Custom Structures
#@markdown Enable custom structure definitions (boolean combinations, margins):
ENABLE_CUSTOM_STRUCTURES = True  #@param {type:"boolean"}

#@markdown Custom structures preset:
CUSTOM_STRUCTURES_PRESET = "pelvic"  #@param ["pelvic", "thorax", "head_neck", "none"]

#@markdown ---
#@markdown #### Performance Tuning
#@markdown Max parallel workers (2 recommended for Colab):
MAX_WORKERS = 2  #@param {type:"integer"}

#@markdown ---
#@markdown #### Robustness Analysis Settings (if enabled)
#@markdown Structures to analyze (comma-separated, supports wildcards):
ROBUSTNESS_STRUCTURES = "GTV*,CTV*,PTV*,urinary_bladder,rectum,prostate"  #@param {type:"string"}

#@markdown Perturbation intensity:
ROBUSTNESS_INTENSITY = "standard"  #@param ["mild", "standard", "aggressive"]

print("Advanced Configuration:")
print(f"  Custom structures: {ENABLE_CUSTOM_STRUCTURES} ({CUSTOM_STRUCTURES_PRESET})")
print(f"  Max workers: {MAX_WORKERS}")
if ENABLE_ROBUSTNESS:
    print(f"  Robustness structures: {ROBUSTNESS_STRUCTURES}")
    print(f"  Robustness intensity: {ROBUSTNESS_INTENSITY}")
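
The structure list accepts wildcards such as `GTV*`. A minimal sketch of shell-style matching with Python's `fnmatch` shows which ROI names such patterns would select. Whether rtpipeline matches case-insensitively, or uses `fnmatch` at all, is an assumption here; `select_structures` is a hypothetical helper.

```python
import fnmatch


def select_structures(roi_names, patterns):
    """Return ROI names matching any shell-style pattern (case-insensitive)."""
    selected = []
    for name in roi_names:
        # fnmatchcase avoids OS-dependent case rules; we lowercase both sides instead
        if any(fnmatch.fnmatchcase(name.lower(), p.lower()) for p in patterns):
            selected.append(name)
    return selected
```

For example, `select_structures(["GTVp", "CTV_high", "PTV", "femur_left", "Rectum"], [p.strip() for p in ROBUSTNESS_STRUCTURES.split(",")])` keeps everything except `femur_left`; note that `PTV*` also matches a bare `PTV`.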

---
## 3. Install RTpipeline

In [None]:
%%bash
# Install Miniconda
if [ ! -d "/content/miniconda" ]; then
    echo "Installing Miniconda..."
    wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
    bash miniconda.sh -b -p /content/miniconda
    rm miniconda.sh
    echo "Miniconda installed."
else
    echo "Miniconda already installed."
fi

export PATH="/content/miniconda/bin:$PATH"

# Install mamba for faster environment creation
if ! command -v mamba &> /dev/null; then
    echo "Installing mamba..."
    conda install -y -c conda-forge mamba
fi

echo "Done. Conda version: $(conda --version)"

In [None]:
# Prepend Miniconda to PATH for this kernel; later cells (including %%bash ones) inherit it
import os
os.environ['PATH'] = '/content/miniconda/bin:' + os.environ['PATH']

In [None]:
%%bash
export PATH="/content/miniconda/bin:$PATH"

# Clone rtpipeline
if [ ! -d "/content/rtpipeline" ]; then
    echo "Cloning rtpipeline..."
    git clone https://github.com/kstawiski/rtpipeline.git /content/rtpipeline
else
    echo "Updating rtpipeline..."
    cd /content/rtpipeline && git pull
fi

echo "RTpipeline ready."

In [None]:
%%bash
export PATH="/content/miniconda/bin:$PATH"

# Create rtpipeline environment (for segmentation)
if ! conda env list | grep -q "^rtpipeline "; then
    echo "Creating rtpipeline environment (this takes ~10-15 minutes)..."
    mamba env create -f /content/rtpipeline/envs/rtpipeline.yaml
    echo "Environment created."
else
    echo "rtpipeline environment already exists."
fi

# Install rtpipeline package
source /content/miniconda/etc/profile.d/conda.sh
conda activate rtpipeline
pip install -q -e /content/rtpipeline

echo ""
echo "Installation complete!"
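
The `grep "^rtpipeline "` test above depends on the column layout of `conda env list`. To check from Python instead, `conda env list --json` prints a JSON object whose `envs` list holds environment paths. The helpers below are a sketch: `parse_env_names` and `conda_env_names` are hypothetical names, and the default conda path assumes the Miniconda location used in this notebook.

```python
import json
import os
import subprocess


def parse_env_names(json_text):
    """Turn `conda env list --json` output into a set of environment folder names."""
    envs = json.loads(json_text).get("envs", [])
    return {os.path.basename(os.path.normpath(p)) for p in envs}


def conda_env_names(conda_exe="/content/miniconda/bin/conda"):
    """Run conda and return the names of all environments it knows about."""
    result = subprocess.run(
        [conda_exe, "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return parse_env_names(result.stdout)
```

For example, `"rtpipeline" in conda_env_names()` should be `True` once the cell above has finished. The base environment appears under its folder name (`miniconda` here).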

In [None]:
%%bash
export PATH="/content/miniconda/bin:$PATH"

# Install dcm2niix
if ! command -v dcm2niix &> /dev/null; then
    echo "Installing dcm2niix..."
    wget -q https://github.com/rordenlab/dcm2niix/releases/download/v1.0.20230411/dcm2niix_lnx.zip
    unzip -o dcm2niix_lnx.zip -d /content/miniconda/bin/
    chmod +x /content/miniconda/bin/dcm2niix
    rm dcm2niix_lnx.zip
    echo "dcm2niix installed."
else
    echo "dcm2niix already installed."
fi

dcm2niix -v
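
Because an earlier cell prepended `/content/miniconda/bin` to `PATH` in the kernel's environment, `shutil.which` can confirm from Python that the command-line tools installed so far resolve. Snakemake is deliberately left out: it lives inside the `rtpipeline` conda environment, not on the base `PATH`. The tool list is an assumption; adjust it to your setup.

```python
import shutil

# Tools expected on the base PATH at this point (assumed list)
REQUIRED_TOOLS = ("conda", "dcm2niix", "git")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]
```

An empty list from `missing_tools()` means every listed tool resolved.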

---
## 4. Generate and Save Configuration

This creates the configuration file that will be used by both Part 1 and Part 2.

In [None]:
import yaml
import os

# Parse skip ROIs
skip_rois_list = [r.strip() for r in SKIP_ROIS.split(',') if r.strip()]

# Parse robustness structures
robustness_structures_list = [r.strip() for r in ROBUSTNESS_STRUCTURES.split(',') if r.strip()]

# Determine custom structures file
if ENABLE_CUSTOM_STRUCTURES and CUSTOM_STRUCTURES_PRESET != "none":
    custom_structures_file = f"/content/rtpipeline/custom_structures_{CUSTOM_STRUCTURES_PRESET}.yaml"
else:
    custom_structures_file = None

# Build configuration dictionary
config = {
    'container_mode': False,
    
    # Directories
    'dicom_root': DICOM_INPUT,
    'output_dir': OUTPUT_DIR,
    'logs_dir': LOGS_DIR,
    
    # Processing
    'max_workers': MAX_WORKERS,
    
    # Segmentation
    'segmentation': {
        'max_workers': 1,  # Sequential for GPU stability
        'force': False,
        'fast': FAST_SEGMENTATION,
        'device': 'gpu',
        'force_split': True,
        'nr_threads_resample': 4,
        'nr_threads_save': 4,
        'num_proc_preprocessing': 1,
        'num_proc_export': 1,
    },
    
    # Custom models (disabled for Colab)
    'custom_models': {
        'enabled': False,
        'root': '/content/rtpipeline/custom_models',
    },
    
    # Radiomics
    'radiomics': {
        'sequential': True,
        'params_file': '/content/rtpipeline/rtpipeline/radiomics_params.yaml',
        'mr_params_file': '/content/rtpipeline/rtpipeline/radiomics_params_mr.yaml',
        'skip_rois': skip_rois_list,
        'max_voxels': 500000000,
        'min_voxels': 10,
    },
    
    # Radiomics robustness
    'radiomics_robustness': {
        'enabled': ENABLE_ROBUSTNESS,
        'modes': ['segmentation_perturbation'],
        'segmentation_perturbation': {
            'apply_to_structures': robustness_structures_list,
            'small_volume_changes': [-0.15, 0.0, 0.15],
            'large_volume_changes': [-0.30, 0.0, 0.30],
            'max_translation_mm': 0.0,
            'n_random_contour_realizations': 0,
            'noise_levels': [0.0],
            'intensity': ROBUSTNESS_INTENSITY,
        },
        'metrics': {
            'icc': {'implementation': 'pingouin', 'icc_type': 'ICC3', 'ci': True},
            'cov': {'enabled': True},
            'qcd': {'enabled': True},
        },
        'thresholds': {
            'icc': {'robust': 0.90, 'acceptable': 0.75},
            'cov': {'robust_pct': 10.0, 'acceptable_pct': 20.0},
        },
    },
    
    # Environments
    'environments': {
        'main': 'rtpipeline',
        'radiomics': 'rtpipeline-radiomics',
    },
    
    # Custom structures
    'custom_structures': custom_structures_file,
    
    # CT Cropping
    'ct_cropping': {
        'enabled': ENABLE_CT_CROPPING,
        'region': ANATOMICAL_REGION,
        'superior_margin_cm': SUPERIOR_MARGIN_CM,
        'inferior_margin_cm': INFERIOR_MARGIN_CM,
        'use_cropped_for_dvh': True,
        'use_cropped_for_radiomics': True,
        'keep_original': True,
    },
}

# Save configuration to Google Drive (for Part 2 to use)
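# Parent folder of OUTPUT_DIR; normpath strips a trailing slash so dirname works as intended
config_dir = os.path.dirname(os.path.normpath(OUTPUT_DIR))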
config_path_gdrive = os.path.join(config_dir, 'rtpipeline_config.yaml')
config_path_local = '/content/rtpipeline/config.colab.yaml'

# Save to both locations
for path in [config_path_gdrive, config_path_local]:
    with open(path, 'w') as f:
        yaml.dump(config, f, default_flow_style=False, sort_keys=False)

print("Configuration saved to:")
print(f"  - {config_path_gdrive} (persistent, for Part 2)")
print(f"  - {config_path_local} (local, for this session)")
print("\n" + "="*60)
print("CONFIGURATION SUMMARY")
print("="*60)
print(f"Input:  {DICOM_INPUT}")
print(f"Output: {OUTPUT_DIR}")
print(f"Region: {ANATOMICAL_REGION}")
print(f"CT Cropping: {ENABLE_CT_CROPPING}")
print(f"Robustness: {ENABLE_ROBUSTNESS}")
print("="*60)
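
The `thresholds` block sets cut-offs for the coefficient of variation (CoV = standard deviation / mean × 100) of each radiomic feature across perturbed segmentations. The sketch below shows how those two numbers could classify a feature. It uses the sample standard deviation and a `"poor"` label for anything above the acceptable cut-off; both are assumptions, not documented rtpipeline behavior.

```python
import statistics


def cov_percent(values):
    """Coefficient of variation in percent (sample SD / |mean| * 100).

    Needs at least two values; statistics.stdev raises otherwise.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CoV is undefined for a zero mean")
    return statistics.stdev(values) / abs(mean) * 100.0


def classify_cov(cov_pct, robust_pct=10.0, acceptable_pct=20.0):
    """Map a CoV percentage to a robustness label using the config thresholds."""
    if cov_pct <= robust_pct:
        return "robust"
    if cov_pct <= acceptable_pct:
        return "acceptable"
    return "poor"
```

Because the parameter names mirror the config keys, `classify_cov(cov_percent(values), **config['radiomics_robustness']['thresholds']['cov'])` applies the thresholds saved above.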

---
## 5. Run GPU Tasks (Segmentation)

This will:
1. Organize DICOM files by patient/course
2. Convert CT to NIfTI format
3. Run TotalSegmentator on GPU
4. Generate RTSTRUCT files with auto-segmentations
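
Step 3 is where the `segmentation` settings matter. As an illustration only, assuming rtpipeline forwards them to the TotalSegmentator command-line tool, the mapping might look like the sketch below. The flags (`-i`, `-o`, `--device`, `--nr_thr_resamp`, `--nr_thr_saving`, `--fast`, `--force_split`) are TotalSegmentator CLI options, but run `TotalSegmentator --help` in the `rtpipeline` environment to confirm them for the installed version; `totalsegmentator_argv` is a hypothetical helper.

```python
def totalsegmentator_argv(ct_nifti, out_dir, seg):
    """Build a TotalSegmentator command line from a `segmentation` config dict.

    This mapping is illustrative; it is not taken from rtpipeline's source.
    Fallback values are used only when a key is missing.
    """
    argv = [
        "TotalSegmentator",
        "-i", str(ct_nifti),
        "-o", str(out_dir),
        "--device", seg.get("device", "gpu"),
        "--nr_thr_resamp", str(seg.get("nr_threads_resample", 1)),
        "--nr_thr_saving", str(seg.get("nr_threads_save", 1)),
    ]
    if seg.get("fast"):
        argv.append("--fast")
    if seg.get("force_split"):
        argv.append("--force_split")
    return argv
```

For example, `subprocess.run(totalsegmentator_argv("ct.nii.gz", "seg_out", config["segmentation"]), check=True)`. `--force_split` processes large scans in chunks to lower memory use, at some cost in speed.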

In [None]:
%%bash
export PATH="/content/miniconda/bin:$PATH"
source /content/miniconda/etc/profile.d/conda.sh
conda activate rtpipeline

cd /content/rtpipeline

echo "================================================"
echo "Starting GPU Pipeline (Organize + Segment)"
echo "================================================"
echo "This may take 5-10 minutes per patient."
echo ""

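# Read logs_dir with a YAML parser (robust to spaces or quoting in the path)
LOG_DIR=$(python -c "import yaml; print(yaml.safe_load(open('config.colab.yaml'))['logs_dir'])")
mkdir -p "$LOG_DIR"

# pipefail makes the cell report Snakemake's exit status instead of tee's
set -o pipefail
snakemake \
    --cores 2 \
    --configfile config.colab.yaml \
    --until all_segmented \
    --rerun-incomplete \
    2>&1 | tee -a "$LOG_DIR/part1_gpu.log"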
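
Before spending GPU time, a Snakemake dry run (`--dry-run`, short form `-n`) lists the jobs that would be scheduled without executing any of them. A minimal sketch that builds the command from Python and runs it inside the `rtpipeline` environment via `conda run`; the helper name and its defaults are assumptions matching the paths used in this notebook.

```python
import subprocess


def snakemake_dry_run_cmd(configfile="config.colab.yaml",
                          target_rule="all_segmented",
                          env="rtpipeline",
                          conda_exe="/content/miniconda/bin/conda"):
    """Return an argv list that previews the segmentation jobs without running them."""
    return [
        conda_exe, "run", "-n", env,
        "snakemake", "--dry-run", "--cores", "1",
        "--configfile", configfile,
        "--until", target_rule,
    ]
```

For example, `subprocess.run(snakemake_dry_run_cmd(), cwd="/content/rtpipeline", check=True)` prints the planned jobs.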

---
## 6. Verify Results

In [None]:
from pathlib import Path
import os

output_path = Path(OUTPUT_DIR)

print("="*60)
print("PART 1 RESULTS")
print("="*60)

if output_path.exists():
    # Count patients and courses
    patients = [d for d in output_path.iterdir() if d.is_dir() and not d.name.startswith('_')]
    
    seg_count = 0
    course_count = 0
    for patient in patients:
        for course in patient.iterdir():
            if course.is_dir():
                course_count += 1
                seg_dir = course / 'Segmentation_TotalSegmentator'
                if seg_dir.exists() and list(seg_dir.glob('*.nii.gz')):
                    seg_count += 1
    
    print(f"\nPatients found: {len(patients)}")
    print(f"Treatment courses: {course_count}")
    print(f"Courses with segmentation: {seg_count}")
    
    if seg_count > 0:
        print("\n✅ Segmentation completed successfully!")
        print("\nYou can now proceed to Part 2 for analysis.")
    else:
        print("\n⚠️  No segmentations found. Check the logs for errors.")
    
    # Show sample structure
    if patients:
        print(f"\n--- Sample patient structure ({patients[0].name}) ---")
        # Sort before slicing so the 15 entries shown are deterministic
        for item in sorted(patients[0].rglob('*'))[:15]:
            rel = item.relative_to(patients[0])
            prefix = '  ' * (len(rel.parts) - 1)
            print(f"{prefix}{item.name}")
else:
    print(f"\n❌ Output directory not found: {OUTPUT_DIR}")
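
When the segmentation count is lower than the course count, it helps to know which courses to revisit. This sketch reuses the layout the cell above assumes (patient folders not starting with `_`, each containing course folders with a `Segmentation_TotalSegmentator` subfolder of `.nii.gz` files); `courses_missing_segmentation` is a hypothetical helper.

```python
from pathlib import Path


def courses_missing_segmentation(output_dir, seg_subdir="Segmentation_TotalSegmentator"):
    """Return 'patient/course' labels for courses without any .nii.gz segmentation."""
    missing = []
    root = Path(output_dir)
    patients = sorted(p for p in root.iterdir() if p.is_dir() and not p.name.startswith("_"))
    for patient in patients:
        for course in sorted(c for c in patient.iterdir() if c.is_dir()):
            seg_dir = course / seg_subdir
            if not (seg_dir.is_dir() and any(seg_dir.glob("*.nii.gz"))):
                missing.append(f"{patient.name}/{course.name}")
    return missing
```

For example, `print(courses_missing_segmentation(OUTPUT_DIR))`, then compare the listed courses against `part1_gpu.log`.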

---
## 7. Next Steps

GPU tasks (segmentation) are complete!

### Continue with Part 2:

1. **Open** `rtpipeline_colab_part2_cpu.ipynb`
2. **Run Part 2** to perform:
   - DVH calculation
   - Radiomics extraction
   - Quality control
   - Results aggregation

Part 2 will automatically load your configuration from Google Drive.
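
Part 2 reads `rtpipeline_config.yaml` back from Drive. A minimal sketch of that step with a basic key check is useful if you edit the YAML by hand between sessions; `load_pipeline_config` and `REQUIRED_KEYS` are hypothetical and may not match what Part 2 actually validates.

```python
import yaml

# Keys this sketch treats as mandatory (an assumption, not Part 2's real schema)
REQUIRED_KEYS = ("dicom_root", "output_dir", "logs_dir", "segmentation", "radiomics")


def load_pipeline_config(path):
    """Load the saved pipeline YAML and fail early if core keys are missing."""
    with open(path) as f:
        cfg = yaml.safe_load(f) or {}
    if not isinstance(cfg, dict):
        raise ValueError(f"{path} does not contain a YAML mapping")
    missing = [k for k in REQUIRED_KEYS if k not in cfg]
    if missing:
        raise KeyError(f"{path} is missing keys: {missing}")
    return cfg
```

For example, `cfg = load_pipeline_config("/content/drive/MyDrive/RTpipeline/rtpipeline_config.yaml")` with the default paths from Section 2.1.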

---

**Tip**: You can disconnect this runtime now to free up GPU resources. Part 2 only needs CPU.

In [None]:
# Final summary
print("="*60)
print("PART 1 COMPLETE")
print("="*60)
print("\nConfiguration saved at:")
print(f"  {config_path_gdrive}")
print("\nPart 2 will automatically load this configuration.")
print(f"\nOutput location: {OUTPUT_DIR}")
print(f"Logs location: {LOGS_DIR}")