# RTpipeline on Google Colab - Part 1: GPU Segmentation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kstawiski/rtpipeline/blob/main/rtpipeline_colab_part1_gpu.ipynb)

**üí∞ Cost Optimization:** This notebook is split into two parts to optimize GPU costs:
- **Part 1 (this notebook):** Runs TotalSegmentator with GPU (~10-30 min/patient)
- **Part 2:** Runs DVH, radiomics, and analysis on CPU only (saves GPU costs)

## What This Part Does

‚úÖ **Automatic segmentation** of 100+ organs using TotalSegmentator (GPU-accelerated)
‚úÖ **Saves outputs** to Google Drive for Part 2

## Prerequisites

- Google Colab with **GPU runtime** (Runtime ‚Üí Change runtime type ‚Üí GPU)
- DICOM files in Google Drive
- Google Drive mounted for saving outputs

---

**‚ö° Quick Start:** 
1. Run cells 1-3 (setup)
2. Mount Google Drive (cell 4)
3. **UPDATE CONFIGURATION** (cell 5) - Point to your DICOM folder
4. Run remaining cells

## 1Ô∏è‚É£ Setup: Install Miniconda & System Dependencies

This takes ~2 minutes

In [1]:
%%bash
# Check GPU availability
echo "=== GPU Check ==="
nvidia-smi || echo "‚ö†Ô∏è No GPU detected. Please enable GPU: Runtime ‚Üí Change runtime type ‚Üí GPU"

# Install system dependencies
echo -e "\n=== Installing System Dependencies ==="
apt-get update -qq
apt-get install -y -qq dcm2niix pigz > /dev/null

echo -e "\n=== Installing Python dependencies (pydicom, SimpleITK, etc.) ==="
python3 -m pip install -q "pydicom>=3.0.0" "SimpleITK>=2.3.0" "dicompyler-core>=0.5.6" "rt-utils>=1.4.0" "nibabel>=5.1.0" "xlsxwriter" "openpyxl"
echo "‚úÖ Core Python deps installed"

# Install Miniconda if not already installed
if [ ! -d "/content/miniconda" ]; then
    echo -e "\n=== Installing Miniconda ==="
    wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh
    bash /tmp/miniconda.sh -b -p /content/miniconda
    rm /tmp/miniconda.sh
    echo "‚úÖ Miniconda installed"
else
    echo "‚úÖ Miniconda already installed"
fi

# Initialize conda
export PATH="/content/miniconda/bin:$PATH"
eval "$(/content/miniconda/bin/conda shell.bash hook)"
conda init bash


echo -e "\n=== Installing Snakemake (base env) ==="
conda install -n base -c conda-forge -c bioconda -y -q snakemake
echo -e "\n‚úÖ Setup complete!"

=== GPU Check ===
‚ö†Ô∏è No GPU detected. Please enable GPU: Runtime ‚Üí Change runtime type ‚Üí GPU

=== Installing System Dependencies ===

=== Installing Python dependencies (pydicom, SimpleITK, etc.) ===
‚úÖ Core Python deps installed

=== Installing Miniconda ===
PREFIX=/content/miniconda
Unpacking bootstrapper...
Unpacking payload...

Installing base environment...

Preparing transaction: ...working... done
Executing transaction: ...working... done
installation finished.
    You currently have a PYTHONPATH environment variable set. This may cause
    unexpected behavior when running the Python interpreter in Miniconda3.
    For best results, please verify that your PYTHONPATH only points to
    directories of packages that are compatible with the Python interpreter
    in Miniconda3: /content/miniconda
‚úÖ Miniconda installed
no change     /content/miniconda/condabin/conda
no change     /content/miniconda/bin/conda
no change     /content/miniconda/bin/conda-env
no change     /con

bash: line 3: nvidia-smi: command not found
W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)
ERROR: Could not find a version that satisfies the requirement rt-utils>=1.4.0 (from versions: 0.0.1, 0.0.2, 0.0.3, 0.0.4, 0.0.5, 0.0.6, 0.0.7, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.1.6, 1.1.7, 1.1.8, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.2.5, 1.2.6, 1.2.7)
ERROR: No matching distribution found for rt-utils>=1.4.0

CondaToSNonInteractiveError: Terms of Service have not been accepted for the following channels. Please accept or remove them before proceeding:
    - https://repo.anaconda.com/pkgs/main
    - https://repo.anaconda.com/pkgs/r

To accept these channels' Terms of Service, run the following commands:
    conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
    conda tos accept --o

## 2Ô∏è‚É£ Clone RTpipeline Repository

In [2]:
%%bash
if [ ! -d "/content/rtpipeline" ]; then
    echo "Cloning rtpipeline repository..."
    git clone -q https://github.com/kstawiski/rtpipeline.git /content/rtpipeline
    echo "‚úÖ Repository cloned"
else
    echo "‚úÖ Repository already exists"
    cd /content/rtpipeline
    git pull origin main
    echo "Repository updated"
fi

Cloning rtpipeline repository...
‚úÖ Repository cloned


## 3Ô∏è‚É£ Create Conda Environment

This creates the rtpipeline environment for TotalSegmentator (~5-10 minutes, only once per session)

In [None]:
%%bash
export PATH="/content/miniconda/bin:$PATH"
eval "$(/content/miniconda/bin/conda shell.bash hook)"

# Accept Anaconda Terms of Service
echo "=== Accepting Anaconda Terms of Service ==="
conda config --set channel_priority flexible
if ! conda tos accept --channel defaults 2>&1; then
    echo "‚ö†Ô∏è ToS acceptance failed or already accepted"
fi
echo "‚úÖ ToS accepted"

cd /content/rtpipeline

# Create rtpipeline environment
if conda env list | grep -q "^rtpipeline "; then
    echo "‚úÖ Environment 'rtpipeline' already exists"
else
    echo "Creating 'rtpipeline' environment (TotalSegmentator)..."
    conda env create -f envs/rtpipeline.yaml -q
    echo "‚úÖ Environment created"
fi

echo ""
conda run -n rtpipeline python -c "import numpy; print(f'‚úÖ numpy {numpy.__version__}')"

## 4Ô∏è‚É£ Mount Google Drive

**IMPORTANT:** Your DICOM files must be in Google Drive

In [4]:
from google.colab import drive
drive.mount('/content/drive')

print("\n‚úÖ Google Drive mounted at /content/drive/MyDrive/")

ValueError: mount failed

---

# ‚öôÔ∏è CONFIGURATION - UPDATE THIS!

## 5Ô∏è‚É£ Configure Input/Output Paths & Processing Options

**üî¥ REQUIRED:** Update `DICOM_ROOT` to point to your DICOM files in Google Drive

---

In [None]:
import os
from datetime import datetime

DICOM_ROOT = "/content/drive/MyDrive/my_dicom_folder"

RUN_TIMESTAMP = datetime.now().strftime("%Y%m%d_%H%M%S")
OUTPUT_DIR = f"/content/drive/MyDrive/rtpipeline_part1_output_{RUN_TIMESTAMP}"
LOGS_DIR = "/content/logs"
LOCAL_TEMP_DIR = "/content/tmp_part1"
SEG_TEMP_DIR = os.path.join(LOCAL_TEMP_DIR, "totalseg")
os.makedirs(OUTPUT_DIR, exist_ok=True)
os.makedirs(LOGS_DIR, exist_ok=True)
os.makedirs(SEG_TEMP_DIR, exist_ok=True)

CPU_COUNT = os.cpu_count() or 2
WORKERS = max(1, CPU_COUNT - 1)
SNAKEMAKE_JOB_THREADS = WORKERS
SEG_WORKERS = 1

TOTALSEG_NR_THR_RESAMP = 1
TOTALSEG_NR_THR_SAVING = 6
TOTALSEG_NUM_PROC_PRE = 6
TOTALSEG_NUM_PROC_EXPORT = 6

FAST_MODE = False
ROI_SUBSET = None
EXTRA_MODELS = []
FORCE_SEGMENTATION = False

ENABLE_CUSTOM_MODELS = False
CUSTOM_MODELS_ROOT = "/content/drive/MyDrive/custom_models"
CUSTOM_MODELS_SELECTED = []

CUSTOM_STRUCTURES_FILE = "custom_structures_pelvic.yaml"


---

## 6Ô∏è‚É£ Generate Configuration File

In [None]:
try:
    import yaml
except ImportError:
    import subprocess as _subprocess
    import sys as _sys
    _subprocess.check_call([_sys.executable, '-m', 'pip', 'install', 'pyyaml'])
    import yaml

config_data = {
    'dicom_root': DICOM_ROOT,
    'output_dir': OUTPUT_DIR,
    'logs_dir': LOGS_DIR,
    'snakemake_job_threads': WORKERS,
    'workers': WORKERS,
    'segmentation': {
        'workers': SEG_WORKERS,
        'threads_per_worker': None,
        'force': bool(FORCE_SEGMENTATION),
        'fast': bool(FAST_MODE),
        'roi_subset': ROI_SUBSET if ROI_SUBSET else None,
        'extra_models': EXTRA_MODELS or [],
        'device': 'gpu' if gpu_available else 'cpu',
        'force_split': True,
        'nr_threads_resample': TOTALSEG_NR_THR_RESAMP,
        'nr_threads_save': TOTALSEG_NR_THR_SAVING,
        'num_proc_preprocessing': TOTALSEG_NUM_PROC_PRE,
        'num_proc_export': TOTALSEG_NUM_PROC_EXPORT
    },
    'custom_models': {
        'enabled': bool(ENABLE_CUSTOM_MODELS),
        'root': CUSTOM_MODELS_ROOT,
        'models': CUSTOM_MODELS_SELECTED or [],
        'workers': 1,
        'force': False
    },
    'custom_structures': CUSTOM_STRUCTURES_FILE
}

config_path = '/content/config_part1.yaml'
with open(config_path, 'w') as f:
    f.write('# RTpipeline Configuration - Part 1 (GPU Segmentation)\n')
    yaml.safe_dump(config_data, f, sort_keys=False)

print(f"‚úÖ Configuration written to: {config_path}")
print(f"\nYou can review the configuration:")
print(f"   !cat {config_path}")


## 7Ô∏è‚É£ Run Segmentation Pipeline

This runs **ONLY** TotalSegmentator segmentation (GPU-accelerated)

‚è±Ô∏è **Estimated Time:**
- With GPU (T4): 10-20 minutes per patient
- With GPU (V100/A100): 5-15 minutes per patient
- Fast mode: ~3x faster
- ROI subset: Proportionally faster

In [None]:
import os
import subprocess
import glob
import time

os.environ['PATH'] = f"/content/miniconda/bin:{os.environ.get('PATH', '')}"
os.chdir('/content/rtpipeline')

print("‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê")
print("   RTpipeline Part 1: GPU Segmentation")
print("‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê")
print("\n‚ö° Processing Mode:")
print(f"   ‚Ä¢ GPU-accelerated segmentation")
print(f"   ‚Ä¢ {WORKERS} concurrent course(s)")
print(f"   ‚Ä¢ Fast mode: {'ON' if FAST_MODE else 'OFF'}")
print(f"\nDVH and radiomics will run in Part 2 (CPU)\n")

start_time = time.time()

# Install Snakemake if needed
try:
    subprocess.run(["conda", "run", "-n", "base", "snakemake", "--version"],
                   check=True, capture_output=True)
except subprocess.CalledProcessError:
    print("Installing Snakemake...")
    subprocess.run(["conda", "install", "-n", "base", "-c", "conda-forge", 
                    "-c", "bioconda", "snakemake", "-y", "-q"], check=True)
    print("‚úÖ Snakemake installed\n")

# Step 1: Organize courses
print("[1/2] Organizing DICOM data...")
cmd_organize = [
    "conda", "run", "-n", "base", "snakemake",
    "--configfile", "/content/config_part1.yaml",
    "--use-conda", "--cores", str(WORKERS),
    "--printshellcmds",
    "/content/output/_COURSES/manifest.json"
]

result = subprocess.run(cmd_organize, capture_output=False, text=True)

if result.returncode != 0:
    print("\n‚ö†Ô∏è Organization failed!")
else:
    org_time = time.time()
    print(f"\n‚úÖ Organization complete ({org_time - start_time:.1f}s)\n")
    
    # Step 2: Run segmentation
    print("[2/2] Running TotalSegmentator...")
    
    # Find all courses
    seg_targets = []
    custom_targets = []
    
    for patient_dir in glob.glob(f"{OUTPUT_DIR}/*/"):
        patient_name = os.path.basename(patient_dir.rstrip('/'))
        if patient_name.startswith('_') or patient_name.startswith('.'):
            continue
        for course_dir in glob.glob(f"{patient_dir}/*/"):
            course_name = os.path.basename(course_dir.rstrip('/'))
            if not course_name.startswith('_'):
                seg_targets.append(f"{OUTPUT_DIR}/{patient_name}/{course_name}/.segmentation_done")
                custom_targets.append(f"{OUTPUT_DIR}/{patient_name}/{course_name}/.custom_models_done")
    
    if seg_targets:
        print(f"Found {len(seg_targets)} course(s) to segment")
        print(f"Estimated time: {len(seg_targets) * (5 if FAST_MODE else 15) / WORKERS:.0f}-{len(seg_targets) * (15 if FAST_MODE else 25) / WORKERS:.0f} minutes\n")
        
        # Run segmentation with resource limits for Colab
        cmd_seg = [
            "conda", "run", "-n", "base", "snakemake",
            "--configfile", "/content/config_part1.yaml",
            "--use-conda",
            "--cores", str(WORKERS),
            "--resources", f"seg_workers={SEG_WORKERS}",
            "--printshellcmds",
            "--keep-going"
        ] + seg_targets + custom_targets
        
        result = subprocess.run(cmd_seg, capture_output=False, text=True)
        
        seg_time = time.time()
        if result.returncode == 0:
            print(f"\n‚úÖ All segmentations complete! ({seg_time - org_time:.1f}s)")
        else:
            print(f"\n‚ö†Ô∏è Some segmentations failed. Check logs.")
    else:
        print("\n‚ö†Ô∏è No courses found")

total_time = time.time() - start_time
print("\n" + "="*50)
print("Part 1 Complete!")
print("="*50)
print(f"Total time: {total_time/60:.1f} minutes")
print(f"\nOutputs: {OUTPUT_DIR}")
print("\nNext: Run the cell below to save to Google Drive")

## 8Ô∏è‚É£ Save Outputs to Google Drive

**IMPORTANT:** This saves your segmentation results to Google Drive for Part 2

In [None]:
import os
from datetime import datetime

def _gpu_present():
    try:
        import subprocess
        subprocess.run(['nvidia-smi'], check=True, capture_output=True)
        return True
    except Exception:
        return False

readme_path = os.path.join(OUTPUT_DIR, "README_PART2.txt")
os.makedirs(os.path.dirname(readme_path), exist_ok=True)
readme_template = """RTpipeline Part 1 Outputs
========================
Generated: {timestamp}

To continue with Part 2:
1. Open rtpipeline_colab_part2_cpu.ipynb
2. Set PART1_OUTPUT_DIR = "{output}"
3. Run all cells (CPU runtime is sufficient)

Configuration summary:
- DICOM source: {dicom}
- Workers: {workers}
- Fast mode: {fast}
- GPU detected: {gpu}
"""
readme_text = readme_template.format(
    timestamp=datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
    output=OUTPUT_DIR,
    dicom=DICOM_ROOT,
    workers=WORKERS,
    fast=FAST_MODE,
    gpu='Yes' if _gpu_present() else 'No'
)
with open(readme_path, 'w') as f:
    f.write(readme_text)

print("
" + "="*60)
print("üéâ PART 1 COMPLETE - OUTPUTS SAVED DIRECTLY TO GOOGLE DRIVE!")
print("="*60)
print(f"
Outputs stored at: {OUTPUT_DIR}")
print("
üìã Next Steps:")
print("   1. You can disconnect this GPU runtime now")
print("   2. Open rtpipeline_colab_part2_cpu.ipynb")
print(f"   3. Set PART1_OUTPUT_DIR = '{OUTPUT_DIR}'")
print("   4. Run Part 2 on CPU runtime (no GPU needed)")


## üìä Optional: View Summary

In [None]:
import os
import glob

print("‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê")
print("   Segmentation Summary")
print("‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê\n")

total = 0
completed = 0

for patient_dir in sorted(glob.glob(f"{OUTPUT_DIR}/*/")):
    patient_name = os.path.basename(patient_dir.rstrip('/'))
    if patient_name.startswith('_') or patient_name.startswith('.'):
        continue
    
    for course_dir in sorted(glob.glob(f"{patient_dir}/*/")):
        course_name = os.path.basename(course_dir.rstrip('/'))
        if course_name.startswith('_'):
            continue
        
        total += 1
        seg_done = os.path.exists(f"{course_dir}/.segmentation_done")
        
        if seg_done:
            completed += 1
        
        status = "‚úÖ" if seg_done else "‚ö†Ô∏è"
        print(f"{status} {patient_name}/{course_name}")

print(f"\nTotal: {completed}/{total} completed")

if completed == total and total > 0:
    print("\nüéâ All segmentations successful!")
elif completed > 0:
    print(f"\n‚ö†Ô∏è {total - completed} incomplete")
else:
    print("\n‚ö†Ô∏è No segmentations completed")

---

## What's Next?

**Continue with Part 2 (CPU):** `rtpipeline_colab_part2_cpu.ipynb`

Part 2 will:
- Extract DVH metrics
- Compute radiomic features
- Run robustness testing (optional)
- Generate visualizations
- Create downloadable results

**üí∞ Cost Savings:** Part 2 runs on CPU only!

---

**Notebook Version:** 2.0 (Part 1 - GPU Segmentation)  
**Repository:** https://github.com/kstawiski/rtpipeline