# RTpipeline - Part 1: GPU Segmentation

**Radiotherapy DICOM Processing Pipeline - Colab Edition**

This notebook runs the segmentation stage of the pipeline, which needs a GPU:
- DICOM organization and conversion
- TotalSegmentator auto-segmentation
- Custom nnU-Net models (if configured)

---

## Prerequisites

1. **GPU Runtime**: Go to `Runtime > Change runtime type > GPU (T4)`
2. **Google Drive**: Your DICOM data should be in Google Drive
3. **Time**: ~15-30 min setup, ~5-10 min per patient for segmentation
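For a batch, the total wall-clock time is roughly setup plus (number of patients × per-patient time). Here is a minimal sketch that uses only the ranges quoted above. The helper name `estimate_runtime_minutes` is hypothetical, and real timings depend on the GPU and on scan size:

```python
def estimate_runtime_minutes(n_patients: int,
                             setup=(15, 30),
                             per_patient=(5, 10)) -> tuple:
    """Return a (low, high) wall-clock estimate in minutes.

    The default ranges are the rough figures from the prerequisites list.
    """
    if n_patients < 0:
        raise ValueError("n_patients must be non-negative")
    low = setup[0] + n_patients * per_patient[0]
    high = setup[1] + n_patients * per_patient[1]
    return low, high


low, high = estimate_runtime_minutes(10)
print(f"10 patients: roughly {low}-{high} min ({low / 60:.1f}-{high / 60:.1f} h)")
```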

---

## 1. Check GPU and Mount Google Drive

In [None]:
# Check GPU availability
!nvidia-smi

import torch
print(f"\nPyTorch CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
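You may want the GPU details in a variable rather than as printed text. `nvidia-smi` has a CSV query mode for this. The sketch below reads the GPU name and total memory in MiB, and returns an empty list on a CPU-only runtime. The helpers `parse_gpu_query` and `query_gpus` are hypothetical and not part of RTpipeline:

```python
import shutil
import subprocess


def parse_gpu_query(csv_text: str) -> list:
    """Parse `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits`.

    Each line is typically of the form "Tesla T4, 15360" (memory in MiB).
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        # rsplit so a GPU name containing a comma still parses
        name, mem_mib = (part.strip() for part in line.rsplit(",", 1))
        gpus.append((name, int(mem_mib)))
    return gpus


def query_gpus() -> list:
    """Return [] when nvidia-smi is missing or fails (e.g. a CPU-only runtime)."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_gpu_query(out)


gpus = query_gpus()
if not gpus:
    print("No GPU found: switch to a GPU runtime before running segmentation.")
for name, mem_mib in gpus:
    print(f"{name}: {mem_mib / 1024:.1f} GiB")
```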

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

print("\nGoogle Drive mounted at /content/drive")
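If the mount fails or is skipped, `/content/drive/MyDrive/...` is still a writable path. However, it then lives on the runtime's local disk and disappears when the runtime is recycled. Below is a minimal check, assuming the default mount point used above. The helper `drive_is_mounted` is hypothetical:

```python
import os


def drive_is_mounted(mount_point: str = "/content/drive") -> bool:
    """True only if mount_point is an actual mount, not a plain local folder."""
    return os.path.ismount(mount_point)


if not drive_is_mounted():
    print("WARNING: Google Drive is not mounted; outputs would be written to "
          "ephemeral local disk and lost when the runtime ends.")
```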

## 2. Configure Paths

Adjust these paths to match your Google Drive structure.

In [None]:
#@title Path Configuration { display-mode: "form" }
#@markdown ### Input/Output Directories (in Google Drive)

DICOM_INPUT = "/content/drive/MyDrive/RTpipeline/Input"  #@param {type:"string"}
OUTPUT_DIR = "/content/drive/MyDrive/RTpipeline/Output"  #@param {type:"string"}
LOGS_DIR = "/content/drive/MyDrive/RTpipeline/Logs"  #@param {type:"string"}

#@markdown ### Anatomical Region for CT Cropping
ANATOMICAL_REGION = "pelvis"  #@param ["pelvis", "thorax", "abdomen", "head_neck", "brain"]

#@markdown ### Processing Options
ENABLE_CT_CROPPING = True  #@param {type:"boolean"}
FAST_SEGMENTATION = False  #@param {type:"boolean"}

# Create directories
import os
for d in [DICOM_INPUT, OUTPUT_DIR, LOGS_DIR]:
    os.makedirs(d, exist_ok=True)
    print(f"Directory ready: {d}")

# Check for input data (the directory always exists after makedirs above,
# so check whether it actually contains anything)
contents = sorted(os.listdir(DICOM_INPUT))
print(f"\nInput directory contains {len(contents)} items")
if contents:
    print("First 10:", contents[:10])
else:
    print(f"\nWARNING: Input directory is empty: {DICOM_INPUT}")
    print("Please upload your DICOM data to this location in Google Drive.")
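The item count above also includes any non-DICOM files. DICOM Part 10 files carry a 128-byte preamble followed by the magic bytes `DICM`, so a quick scan can count real DICOM files. The helpers in this sketch are hypothetical. Files written without the Part 10 header will be missed, and a full scan of a large Drive folder can be slow:

```python
from pathlib import Path


def is_dicom_part10(path) -> bool:
    """DICOM Part 10 files start with a 128-byte preamble followed by b"DICM"."""
    try:
        with open(path, "rb") as f:
            f.seek(128)
            return f.read(4) == b"DICM"
    except OSError:
        # Unreadable or missing files are not counted
        return False


def count_dicom_files(root) -> int:
    """Recursively count files under root that carry the DICM marker."""
    return sum(1 for p in Path(root).rglob("*") if p.is_file() and is_dicom_part10(p))


# Example: print(f"DICOM files found: {count_dicom_files(DICOM_INPUT)}")
```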

## 3. Install Conda and RTpipeline

In [None]:
%%bash
# Install Miniconda
if [ ! -d "/content/miniconda" ]; then
    echo "Installing Miniconda..."
    wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
    bash miniconda.sh -b -p /content/miniconda
    rm miniconda.sh
    echo "Miniconda installed."
else
    echo "Miniconda already installed."
fi

# Add to PATH
export PATH="/content/miniconda/bin:$PATH"

# Install mamba for faster environment creation
if ! command -v mamba &> /dev/null; then
    echo "Installing mamba..."
    conda install -y -c conda-forge mamba
fi

echo "Done. Conda version: $(conda --version)"

In [None]:
# Prepend Miniconda to the kernel's PATH so later shell commands (! and %%bash) find conda
import os
os.environ['PATH'] = '/content/miniconda/bin:' + os.environ['PATH']
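Each re-run of the cell above prepends `/content/miniconda/bin` again, so `PATH` accumulates duplicates. Here is an idempotent variant, sketched with a hypothetical `prepend_to_path` helper:

```python
import os


def prepend_to_path(directory: str, path_value: str) -> str:
    """Put `directory` first in a PATH string without duplicating it on re-runs."""
    parts = [p for p in path_value.split(os.pathsep) if p and p != directory]
    return os.pathsep.join([directory] + parts)


os.environ["PATH"] = prepend_to_path("/content/miniconda/bin", os.environ.get("PATH", ""))
```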

In [None]:
%%bash
export PATH="/content/miniconda/bin:$PATH"

# Clone rtpipeline if not already present
if [ ! -d "/content/rtpipeline" ]; then
    echo "Cloning rtpipeline..."
    git clone https://github.com/kstawiski/rtpipeline.git /content/rtpipeline
else
    echo "Updating rtpipeline..."
    cd /content/rtpipeline && git pull
fi

echo "RTpipeline ready at /content/rtpipeline"

In [None]:
%%bash
export PATH="/content/miniconda/bin:$PATH"

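# Create rtpipeline environment (for segmentation).
# Match the exact name so an existing "rtpipeline-radiomics" env is not mistaken for it.
if ! conda env list | grep -qE '^rtpipeline[[:space:]]'; then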
    echo "Creating rtpipeline environment (this takes ~10-15 minutes)..."
    mamba env create -f /content/rtpipeline/envs/rtpipeline.yaml
    echo "Environment created."
else
    echo "rtpipeline environment already exists."
fi

# Install rtpipeline package
echo "Installing rtpipeline package..."
source /content/miniconda/etc/profile.d/conda.sh
conda activate rtpipeline
pip install -e /content/rtpipeline

printf '\nInstallation complete!\n'
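A substring `grep` on `conda env list` can mistake `rtpipeline-radiomics` for `rtpipeline`. A sturdier check parses the JSON form of the listing (`conda env list --json`, which returns an `envs` list of environment prefixes). This sketch assumes named environments live under `<prefix>/envs/<name>`. `env_names` and `conda_env_exists` are hypothetical helpers:

```python
import json
import os
import shutil
import subprocess


def env_names(env_list_json: str) -> set:
    """Extract environment names from `conda env list --json` output.

    Named envs live under .../envs/<name>; the base env shows up under the
    name of the install directory (e.g. "miniconda").
    """
    data = json.loads(env_list_json)
    return {os.path.basename(p.rstrip("/")) for p in data.get("envs", [])}


def conda_env_exists(name: str, conda: str = "/content/miniconda/bin/conda") -> bool:
    """True if an environment with exactly this name exists."""
    if shutil.which(conda) is None:
        return False
    out = subprocess.run([conda, "env", "list", "--json"],
                         capture_output=True, text=True, check=True).stdout
    return name in env_names(out)


print("rtpipeline env present:", conda_env_exists("rtpipeline"))
```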

In [None]:
%%bash
export PATH="/content/miniconda/bin:$PATH"

# Install dcm2niix
if ! command -v dcm2niix &> /dev/null; then
    echo "Installing dcm2niix..."
    wget -q https://github.com/rordenlab/dcm2niix/releases/download/v1.0.20230411/dcm2niix_lnx.zip
    unzip -o dcm2niix_lnx.zip -d /content/miniconda/bin/
    chmod +x /content/miniconda/bin/dcm2niix
    rm dcm2niix_lnx.zip
    echo "dcm2niix installed."
else
    echo "dcm2niix already installed."
fi

dcm2niix -v
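The pipeline calls the converter itself, so you do not need to run anything here by hand. To test the binary on a single series, the sketch below uses standard dcm2niix flags:

- `-z y` produces gzipped output.
- `-f` sets the file-name template, where `%s` is the series number and `%d` the series description.
- `-o` sets the output folder.

The helper names are hypothetical, and the flags RTpipeline actually passes may differ:

```python
import subprocess
from pathlib import Path


def dcm2niix_command(dicom_dir, out_dir, filename_format="%s_%d"):
    """Build a dcm2niix call: -z y gzips output, -f sets the name template, -o the output dir."""
    return ["dcm2niix", "-z", "y", "-f", filename_format, "-o", str(out_dir), str(dicom_dir)]


def convert_series(dicom_dir, out_dir):
    """Convert one DICOM folder to NIfTI, raising if dcm2niix reports an error."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(dcm2niix_command(dicom_dir, out_dir), check=True)


# Example (hypothetical paths):
# convert_series("/content/drive/MyDrive/RTpipeline/Input/patient01/CT",
#                "/content/nifti_test")
```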

## 4. Create Configuration File

In [None]:
# Generate config.yaml
config_content = f'''# RTpipeline Colab Configuration
# Generated automatically - edit as needed

container_mode: false

# Directories
dicom_root: "{DICOM_INPUT}"
output_dir: "{OUTPUT_DIR}"
logs_dir: "{LOGS_DIR}"

# Processing
max_workers: 2  # Colab has limited resources

# Segmentation (GPU)
segmentation:
  max_workers: 1  # Sequential for GPU stability
  force: false
  fast: {str(FAST_SEGMENTATION).lower()}
  device: "gpu"
  force_split: true
  nr_threads_resample: 4
  nr_threads_save: 4
  num_proc_preprocessing: 1
  num_proc_export: 1

# Custom models (disabled by default in Colab)
custom_models:
  enabled: false
  root: "/content/rtpipeline/custom_models"

# Radiomics (will be run in Part 2)
radiomics:
  sequential: true
  params_file: "/content/rtpipeline/rtpipeline/radiomics_params.yaml"
  mr_params_file: "/content/rtpipeline/rtpipeline/radiomics_params_mr.yaml"
  skip_rois:
    - body
    - couchsurface
    - couchinterior
  max_voxels: 500000000
  min_voxels: 10

# Robustness analysis (disabled for Colab)
radiomics_robustness:
  enabled: false

# Environment names
environments:
  main: "rtpipeline"
  radiomics: "rtpipeline-radiomics"

# Custom structures (pelvic template; adjust if ANATOMICAL_REGION is not "pelvis")
custom_structures: "/content/rtpipeline/custom_structures_pelvic.yaml"

# CT Cropping
ct_cropping:
  enabled: {str(ENABLE_CT_CROPPING).lower()}
  region: "{ANATOMICAL_REGION}"
  superior_margin_cm: 2.0
  inferior_margin_cm: 10.0
  use_cropped_for_dvh: true
  use_cropped_for_radiomics: true
  keep_original: true
'''

config_path = '/content/rtpipeline/config.colab.yaml'
with open(config_path, 'w') as f:
    f.write(config_content)

print(f"Configuration saved to: {config_path}")
print("\n" + "="*50)
print(config_content)
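The paths are pasted into the YAML through f-string interpolation. A path containing a double quote or a backslash would therefore break the file or be silently altered. One way to guard against that uses a hypothetical `yaml_quote` helper. JSON string escaping is also valid inside a YAML double-quoted scalar, so for ordinary paths `json.dumps` produces a safe quoted value:

```python
import json


def yaml_quote(value: str) -> str:
    """Return a double-quoted scalar that can be pasted into YAML safely.

    JSON escapes for quotes, backslashes and control characters are also valid
    YAML double-quoted escapes; ensure_ascii=False keeps non-ASCII text as-is.
    """
    return json.dumps(value, ensure_ascii=False)


path = '/content/drive/MyDrive/My "RT" data'
print(f"dicom_root: {yaml_quote(path)}")
```

In the template you would then write `dicom_root: {yaml_quote(DICOM_INPUT)}` without the surrounding quotes.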

## 5. Run GPU Tasks (Segmentation)

This will:
1. Organize DICOM files
2. Convert to NIfTI
3. Run TotalSegmentator on GPU
4. Create RTSTRUCT files

In [None]:
%%bash -s "$LOGS_DIR"
export PATH="/content/miniconda/bin:$PATH"
source /content/miniconda/etc/profile.d/conda.sh
conda activate rtpipeline
# Propagate snakemake's exit status through the `tee` pipe
set -o pipefail

# Log directory, passed in from the Python LOGS_DIR variable
LOG_DIR="$1"
mkdir -p "$LOG_DIR"

cd /content/rtpipeline

# Run Snakemake up to segmentation (GPU tasks)
echo "Starting GPU pipeline (organize + segment)..."
echo "This may take 5-10 minutes per patient."
echo ""

snakemake \
    --cores 2 \
    --configfile config.colab.yaml \
    --until all_segmented \
    --rerun-incomplete \
    2>&1 | tee "$LOG_DIR/part1_gpu.log"
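Without `pipefail`, a pipeline through `tee` reports the exit status of `tee`, not of Snakemake, so a failed run can look successful. You may prefer to drive the run from Python instead. This minimal sketch streams the output, appends it to a log file, and returns Snakemake's own exit code. The `run_logged` helper is hypothetical:

```python
import subprocess
import sys
from pathlib import Path


def run_logged(cmd, log_path, cwd=None) -> int:
    """Run cmd, echo its combined stdout/stderr live, append it to log_path,
    and return the command's own exit code."""
    Path(log_path).parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "a") as log, subprocess.Popen(
        cmd, cwd=cwd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # live echo in the notebook
            log.write(line)         # persistent copy in the log file
    # Leaving the Popen context waits for the process, so returncode is set here
    return proc.returncode


# Example (assumes the rtpipeline env's snakemake is on PATH for the kernel):
# rc = run_logged(
#     ["snakemake", "--cores", "2", "--configfile", "config.colab.yaml",
#      "--until", "all_segmented", "--rerun-incomplete"],
#     f"{LOGS_DIR}/part1_gpu.log",
#     cwd="/content/rtpipeline",
# )
# if rc != 0:
#     print(f"Snakemake failed with exit code {rc}; see the log.")
```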

## 6. Verify Results

In [None]:
import os
from pathlib import Path

output_path = Path(OUTPUT_DIR)

if output_path.exists():
    # Count processed patients
    patients = [d for d in output_path.iterdir() if d.is_dir() and not d.name.startswith('_')]
    print(f"Processed patients: {len(patients)}")
    
    # Check for segmentation outputs
    seg_count = 0
    for patient in patients:
        for course in patient.iterdir():
            if course.is_dir():
                seg_dir = course / 'Segmentation_TotalSegmentator'
                if seg_dir.exists():
                    seg_count += 1
    
    print(f"Courses with segmentation: {seg_count}")
    
    # Sample structure
    if patients:
        print(f"\nSample patient structure ({patients[0].name}):")
        for item in sorted(patients[0].rglob('*'))[:20]:
            rel = item.relative_to(patients[0])
            prefix = '  ' * (len(rel.parts) - 1)
            print(f"{prefix}{item.name}")
else:
    print(f"Output directory not found: {OUTPUT_DIR}")
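You can also list which courses still need segmentation, for example to rerun only those. This sketch follows the same folder convention as the cell above and treats every subfolder of a patient as a course. `courses_missing_segmentation` is a hypothetical helper:

```python
from pathlib import Path


def courses_missing_segmentation(output_dir, seg_name="Segmentation_TotalSegmentator"):
    """Return 'patient/course' labels whose course folder lacks seg_name."""
    missing = []
    for patient in sorted(Path(output_dir).iterdir()):
        # Skip files and internal folders such as _logs (same rule as above)
        if not patient.is_dir() or patient.name.startswith("_"):
            continue
        for course in sorted(patient.iterdir()):
            if course.is_dir() and not (course / seg_name).exists():
                missing.append(f"{patient.name}/{course.name}")
    return missing


# Example: print(courses_missing_segmentation(OUTPUT_DIR))
```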

## 7. Next Steps

Once the segmentation run has finished without errors:

1. **Save your work**: Your results are already in Google Drive
2. **Run Part 2**: Open `rtpipeline_colab_part2_cpu.ipynb` to run:
   - DVH calculation
   - Radiomics extraction
   - Quality control
   - Results aggregation

Part 2 can run on CPU, so you don't need a GPU runtime.

---

**Tip**: You can disconnect this runtime now to free up GPU resources.

In [None]:
# Summary
print("="*60)
print("PART 1 COMPLETE - GPU Segmentation")
print("="*60)
print(f"\nInput:  {DICOM_INPUT}")
print(f"Output: {OUTPUT_DIR}")
print(f"Logs:   {LOGS_DIR}")
print(f"\nRegion: {ANATOMICAL_REGION}")
print(f"CT Cropping: {ENABLE_CT_CROPPING}")
print("\nNext: Run Part 2 notebook for DVH, radiomics, and aggregation.")