# RTpipeline - Part 2: CPU Analysis

**Radiotherapy DICOM Processing Pipeline - Colab Edition**

This notebook runs CPU-intensive analysis tasks:
- DVH (Dose-Volume Histogram) calculation
- Radiomics feature extraction
- Quality control reports
- Results aggregation

---

## Prerequisites

1. **Part 1 Complete**: Run `rtpipeline_colab_part1_gpu.ipynb` first for segmentation
2. **Runtime**: CPU runtime is sufficient (no GPU needed)
3. **Time**: ~2-5 min per patient for full analysis
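
The per-patient estimate can be turned into a rough session budget. This is a minimal sketch that assumes patients are analysed one after another and that the 2-5 minute range holds for your data; `estimate_minutes` is a hypothetical helper, not part of rtpipeline.

```python
def estimate_minutes(n_patients, low=2.0, high=5.0):
    """Return (best case, worst case) total minutes for n_patients analysed one at a time."""
    if n_patients < 0:
        raise ValueError("n_patients must be non-negative")
    return n_patients * low, n_patients * high

best, worst = estimate_minutes(12)
print(f"12 patients: roughly {best:.0f}-{worst:.0f} minutes")
```

Compare the worst-case figure with your Colab session limit before starting a large cohort.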

---

## 1. Mount Google Drive

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

print("Google Drive mounted at /content/drive")

## 2. Configure Paths

**Important**: Use the same paths as Part 1!

In [None]:
#@title Path Configuration { display-mode: "form" }
#@markdown ### Directories (must match Part 1)

DICOM_INPUT = "/content/drive/MyDrive/RTpipeline/Input"  #@param {type:"string"}
OUTPUT_DIR = "/content/drive/MyDrive/RTpipeline/Output"  #@param {type:"string"}
LOGS_DIR = "/content/drive/MyDrive/RTpipeline/Logs"  #@param {type:"string"}

#@markdown ### Anatomical Region (must match Part 1)
ANATOMICAL_REGION = "pelvis"  #@param ["pelvis", "thorax", "abdomen", "head_neck", "brain"]

#@markdown ### Processing Options
ENABLE_CT_CROPPING = True  #@param {type:"boolean"}
ENABLE_ROBUSTNESS = False  #@param {type:"boolean"}

# Verify Part 1 outputs exist
import os
from pathlib import Path

output_path = Path(OUTPUT_DIR)
if output_path.exists():
    patients = [d for d in output_path.iterdir() if d.is_dir() and not d.name.startswith('_')]
    seg_count = sum(1 for p in patients for c in p.iterdir() 
                    if c.is_dir() and (c / 'Segmentation_TotalSegmentator').exists())
    print(f"Found {len(patients)} patients with {seg_count} segmented courses")
    
    if seg_count == 0:
        print("\nWARNING: No segmentations found!")
        print("Please run Part 1 (GPU notebook) first.")
else:
    print(f"ERROR: Output directory not found: {OUTPUT_DIR}")
    print("Please run Part 1 first or check your path configuration.")
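
If the count above is lower than expected, it may help to see which courses lack a segmentation. The sketch below assumes the same layout as the check above (every subfolder of a patient folder is a course, and a segmented course contains `Segmentation_TotalSegmentator`). `find_unsegmented_courses` is a hypothetical helper, and the demo uses made-up patient and course names in a temporary directory.

```python
from pathlib import Path
import tempfile

SEG_DIRNAME = "Segmentation_TotalSegmentator"  # folder name checked by the cell above

def find_unsegmented_courses(output_dir):
    """Return 'patient/course' strings for course folders without a segmentation folder."""
    root = Path(output_dir)
    missing = []
    if not root.is_dir():
        return missing
    for patient in sorted(root.iterdir()):
        # Skip plain files and pipeline folders such as _RESULTS
        if not patient.is_dir() or patient.name.startswith("_"):
            continue
        for course in sorted(patient.iterdir()):
            if course.is_dir() and not (course / SEG_DIRNAME).exists():
                missing.append(f"{patient.name}/{course.name}")
    return missing

# Self-contained demo on a throwaway directory tree (hypothetical names)
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "P001" / "course_1" / SEG_DIRNAME).mkdir(parents=True)
    (Path(tmp) / "P001" / "course_2").mkdir(parents=True)
    (Path(tmp) / "_RESULTS").mkdir()
    demo_missing = find_unsegmented_courses(tmp)
    print(demo_missing)
```

In the notebook, call `find_unsegmented_courses(OUTPUT_DIR)` and rerun Part 1 for any course it lists.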

## 3. Install Dependencies

In [None]:
%%bash
# Install Miniconda
if [ ! -d "/content/miniconda" ]; then
    echo "Installing Miniconda..."
    wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
    bash miniconda.sh -b -p /content/miniconda
    rm miniconda.sh
    echo "Miniconda installed."
else
    echo "Miniconda already installed."
fi

export PATH="/content/miniconda/bin:$PATH"

# Install mamba
if ! command -v mamba &> /dev/null; then
    echo "Installing mamba..."
    conda install -y -c conda-forge mamba
fi

echo "Done."

In [None]:
import os
os.environ['PATH'] = '/content/miniconda/bin:' + os.environ['PATH']

In [None]:
%%bash
export PATH="/content/miniconda/bin:$PATH"

# Clone/update rtpipeline
if [ ! -d "/content/rtpipeline" ]; then
    echo "Cloning rtpipeline..."
    git clone https://github.com/kstawiski/rtpipeline.git /content/rtpipeline
else
    echo "Updating rtpipeline..."
    cd /content/rtpipeline && git pull
fi

In [None]:
%%bash
export PATH="/content/miniconda/bin:$PATH"

# Create rtpipeline-radiomics environment (PyRadiomics requires NumPy 1.x)
if ! conda env list | grep -q "rtpipeline-radiomics"; then
    echo "Creating rtpipeline-radiomics environment (this takes ~10 minutes)..."
    mamba env create -f /content/rtpipeline/envs/rtpipeline-radiomics.yaml
    echo "Environment created."
else
    echo "rtpipeline-radiomics environment already exists."
fi

# Also ensure main environment exists for DVH
if ! conda env list | grep -q "^rtpipeline "; then
    echo "Creating rtpipeline environment..."
    mamba env create -f /content/rtpipeline/envs/rtpipeline.yaml
fi

# Install rtpipeline package in both environments
source /content/miniconda/etc/profile.d/conda.sh

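conda activate rtpipeline
# -q keeps the output short; a failure is reported instead of being silently ignored
pip install -q -e /content/rtpipeline || echo "WARNING: pip install failed in rtpipeline"

conda activate rtpipeline-radiomics
pip install -q -e /content/rtpipeline || echo "WARNING: pip install failed in rtpipeline-radiomics"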

# printf interprets \n; bash's builtin echo would print it literally
printf "\nEnvironments ready!\n"
conda env list

## 4. Create Configuration

In [None]:
# Generate config.yaml (same as Part 1 but may have robustness enabled)
config_content = f'''# RTpipeline Colab Configuration - Part 2
# Generated automatically

container_mode: false

# Directories
dicom_root: "{DICOM_INPUT}"
output_dir: "{OUTPUT_DIR}"
logs_dir: "{LOGS_DIR}"

# Processing
max_workers: 2

# Segmentation (already done in Part 1)
segmentation:
  max_workers: 1
  force: false
  fast: false
  device: "cpu"  # Not needed for Part 2

# Custom models
custom_models:
  enabled: false
  root: "/content/rtpipeline/custom_models"

# Radiomics
radiomics:
  sequential: true  # More stable in Colab
  params_file: "/content/rtpipeline/rtpipeline/radiomics_params.yaml"
  mr_params_file: "/content/rtpipeline/rtpipeline/radiomics_params_mr.yaml"
  skip_rois:
    - body
    - couchsurface
    - couchinterior
    - couchexterior
    - bones
  max_voxels: 500000000
  min_voxels: 10

# Robustness analysis
radiomics_robustness:
  enabled: {str(ENABLE_ROBUSTNESS).lower()}
  modes:
    - segmentation_perturbation
  segmentation_perturbation:
    apply_to_structures:
      - "GTV*"
      - "CTV*"
      - "PTV*"
    small_volume_changes: [-0.15, 0.0, 0.15]
    large_volume_changes: [-0.30, 0.0, 0.30]
    intensity: "standard"

# Environment names
environments:
  main: "rtpipeline"
  radiomics: "rtpipeline-radiomics"

# Custom structures
custom_structures: "/content/rtpipeline/custom_structures_pelvic.yaml"

# CT Cropping
ct_cropping:
  enabled: {str(ENABLE_CT_CROPPING).lower()}
  region: "{ANATOMICAL_REGION}"
  superior_margin_cm: 2.0
  inferior_margin_cm: 10.0
  use_cropped_for_dvh: true
  use_cropped_for_radiomics: true
  keep_original: true
'''

config_path = '/content/rtpipeline/config.colab.yaml'
with open(config_path, 'w') as f:
    f.write(config_content)

print(f"Configuration saved to: {config_path}")
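
The configuration refers to several files inside the cloned repository. A quick existence check can catch a missing file before Snakemake starts. This is a minimal sketch: `missing_paths` is a hypothetical helper, and the paths are copied from the configuration above.

```python
from pathlib import Path

def missing_paths(named_paths):
    """Return {name: path} for every entry whose path does not exist on disk."""
    return {name: p for name, p in named_paths.items() if not Path(p).exists()}

# Paths copied from the configuration above
referenced = {
    "radiomics.params_file": "/content/rtpipeline/rtpipeline/radiomics_params.yaml",
    "radiomics.mr_params_file": "/content/rtpipeline/rtpipeline/radiomics_params_mr.yaml",
    "custom_structures": "/content/rtpipeline/custom_structures_pelvic.yaml",
}
for name, path in missing_paths(referenced).items():
    print(f"Missing {name}: {path}")
```

No output means every referenced file was found.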

## 5. Run DVH Calculation

In [None]:
%%bash
export PATH="/content/miniconda/bin:$PATH"
source /content/miniconda/etc/profile.d/conda.sh
conda activate rtpipeline

cd /content/rtpipeline

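# Keep in sync with LOGS_DIR from section 2 (Python variables are not visible in %%bash cells)
LOG_DIR="/content/drive/MyDrive/RTpipeline/Logs"
mkdir -p "$LOG_DIR"
set -o pipefail  # report snakemake's exit status rather than tee's

echo "Running DVH calculation..."
if snakemake \
    --cores 2 \
    --configfile config.colab.yaml \
    --until all_dvh \
    --rerun-incomplete \
    2>&1 | tee "$LOG_DIR/part2_dvh.log"; then
    printf "\nDVH calculation complete!\n"
else
    printf "\nDVH calculation failed; check the log for details.\n"
fi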
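
Each pipeline step writes its output to a log file in Google Drive. To review the end of a log without scrolling through the cell output, something like the following could be used. It is a sketch: `tail_log` is a hypothetical helper, and the path is the DVH log written by the cell above.

```python
from collections import deque
from pathlib import Path

def tail_log(path, n=20):
    """Return the last n lines of a text file, or an empty list if the file is missing."""
    log = Path(path)
    if not log.is_file():
        return []
    with log.open(errors="replace") as fh:
        # deque with maxlen keeps only the final n lines while streaming the file
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]

for line in tail_log("/content/drive/MyDrive/RTpipeline/Logs/part2_dvh.log"):
    print(line)
```

The same helper works for the radiomics, QC, and aggregation logs written by the later steps.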

## 6. Run Radiomics Extraction

In [None]:
%%bash
export PATH="/content/miniconda/bin:$PATH"
source /content/miniconda/etc/profile.d/conda.sh
conda activate rtpipeline-radiomics

cd /content/rtpipeline

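# Keep in sync with LOGS_DIR from section 2 (Python variables are not visible in %%bash cells)
LOG_DIR="/content/drive/MyDrive/RTpipeline/Logs"
mkdir -p "$LOG_DIR"
set -o pipefail  # report snakemake's exit status rather than tee's

echo "Running radiomics extraction..."
echo "(This may take several minutes per patient)"

if snakemake \
    --cores 2 \
    --configfile config.colab.yaml \
    --until all_radiomics \
    --rerun-incomplete \
    2>&1 | tee "$LOG_DIR/part2_radiomics.log"; then
    printf "\nRadiomics extraction complete!\n"
else
    printf "\nRadiomics extraction failed; check the log for details.\n"
fi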

## 7. Run Quality Control

In [None]:
%%bash
export PATH="/content/miniconda/bin:$PATH"
source /content/miniconda/etc/profile.d/conda.sh
conda activate rtpipeline

cd /content/rtpipeline

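# Keep in sync with LOGS_DIR from section 2 (Python variables are not visible in %%bash cells)
LOG_DIR="/content/drive/MyDrive/RTpipeline/Logs"
mkdir -p "$LOG_DIR"
set -o pipefail  # report snakemake's exit status rather than tee's

echo "Running quality control..."
if snakemake \
    --cores 2 \
    --configfile config.colab.yaml \
    --until all_qc \
    --rerun-incomplete \
    2>&1 | tee "$LOG_DIR/part2_qc.log"; then
    printf "\nQuality control complete!\n"
else
    printf "\nQuality control failed; check the log for details.\n"
fi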

## 8. Aggregate Results

In [None]:
%%bash
export PATH="/content/miniconda/bin:$PATH"
source /content/miniconda/etc/profile.d/conda.sh
conda activate rtpipeline

cd /content/rtpipeline

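# Keep in sync with LOGS_DIR from section 2 (Python variables are not visible in %%bash cells)
LOG_DIR="/content/drive/MyDrive/RTpipeline/Logs"
mkdir -p "$LOG_DIR"
set -o pipefail  # report snakemake's exit status rather than tee's

echo "Aggregating all results..."
if snakemake \
    --cores 2 \
    --configfile config.colab.yaml \
    all \
    --rerun-incomplete \
    2>&1 | tee "$LOG_DIR/part2_aggregate.log"; then
    printf "\nAggregation complete!\n"
else
    printf "\nAggregation failed; check the log for details.\n"
fi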

## 9. View Results

In [None]:
import pandas as pd
from pathlib import Path

results_dir = Path(OUTPUT_DIR) / '_RESULTS'

print("=" * 60)
print("PIPELINE RESULTS")
print("=" * 60)

if results_dir.exists():
    print(f"\nResults directory: {results_dir}")
    print("\nAvailable files:")
    for f in sorted(results_dir.glob('*.xlsx')):
        size_mb = f.stat().st_size / 1e6
        print(f"  - {f.name} ({size_mb:.2f} MB)")
else:
    print(f"Results directory not found: {results_dir}")
    print("The pipeline may not have completed successfully.")

In [None]:
# Load and preview DVH metrics
dvh_file = Path(OUTPUT_DIR) / '_RESULTS' / 'dvh_metrics.xlsx'

if dvh_file.exists():
    dvh = pd.read_excel(dvh_file)
    print(f"DVH Metrics: {len(dvh)} rows")
    print(f"Columns: {list(dvh.columns)[:10]}...")
    print(f"\nStructures: {dvh['Structure'].nunique()}")
    print(f"Patients: {dvh['PatientID'].nunique()}")
    display(dvh.head(10))
else:
    print("DVH metrics file not found.")

In [None]:
# Load and preview radiomics features
radiomics_file = Path(OUTPUT_DIR) / '_RESULTS' / 'radiomics_ct.xlsx'

if radiomics_file.exists():
    rad = pd.read_excel(radiomics_file)
    # The column count includes identifier columns such as Structure, not only features
    print(f"Radiomics: {len(rad)} rows, {len(rad.columns)} columns")
    print(f"\nStructures: {rad['Structure'].nunique() if 'Structure' in rad.columns else 'N/A'}")
    
    # Count feature types
    original = len([c for c in rad.columns if c.startswith('original_')])
    wavelet = len([c for c in rad.columns if c.startswith('wavelet')])
    log = len([c for c in rad.columns if c.startswith('log-sigma')])
    print(f"\nFeature breakdown:")
    print(f"  Original: {original}")
    print(f"  Wavelet: {wavelet}")
    print(f"  LoG: {log}")
else:
    print("Radiomics file not found.")
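
The prefix counts above can be broken down further by feature class. The sketch below assumes the column names follow the PyRadiomics convention `<imageType>_<featureClass>_<featureName>` (for example `original_firstorder_Mean`). Whether rtpipeline's export keeps those names unchanged is an assumption, and `count_feature_classes` is a hypothetical helper.

```python
from collections import Counter

def count_feature_classes(columns):
    """Count PyRadiomics-style feature columns per (image type, feature class)."""
    counts = Counter()
    for col in columns:
        parts = col.split("_")
        # Feature columns look like <imageType>_<featureClass>_<featureName>
        if len(parts) < 3:
            continue
        image_type = parts[0]
        if image_type.startswith("wavelet"):
            image_type = "wavelet"      # e.g. wavelet-LLH, wavelet-HHL
        elif image_type.startswith("log-sigma"):
            image_type = "log"          # e.g. log-sigma-1-0-mm-3D
        elif image_type != "original":
            continue                    # identifiers, diagnostics and other columns
        counts[(image_type, parts[1])] += 1
    return counts

# Demo on a hand-written column list; in the notebook pass rad.columns instead
demo_counts = count_feature_classes([
    "PatientID",
    "Structure",
    "original_firstorder_Mean",
    "original_shape_MeshVolume",
    "wavelet-LLH_glcm_Contrast",
    "log-sigma-1-0-mm-3D_firstorder_Mean",
])
for (image_type, feature_class), n in sorted(demo_counts.items()):
    print(f"{image_type:10s} {feature_class:12s} {n}")
```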

In [None]:
# Check QC summary
qc_file = Path(OUTPUT_DIR) / '_RESULTS' / 'qc_reports.xlsx'

if qc_file.exists():
    qc = pd.read_excel(qc_file)
    print("Quality Control Summary")
    print("=" * 40)
    
    if 'Overall_Status' in qc.columns:
        status_counts = qc['Overall_Status'].value_counts()
        print(f"\nStatus breakdown:")
        for status, count in status_counts.items():
            print(f"  {status}: {count}")
    
    display(qc.head())
else:
    print("QC file not found.")

## 10. Download Results

With the default `OUTPUT_DIR`, results are already saved in your Google Drive at:
```
MyDrive/RTpipeline/Output/_RESULTS/
```

You can also download specific files directly:

In [None]:
# Optional: Download results to your local machine
from google.colab import files

results_dir = Path(OUTPUT_DIR) / '_RESULTS'

# Uncomment to download specific files:
# files.download(str(results_dir / 'dvh_metrics.xlsx'))
# files.download(str(results_dir / 'radiomics_ct.xlsx'))
# files.download(str(results_dir / 'qc_reports.xlsx'))

print("To download files, uncomment the lines above and run this cell.")
print(f"\nOr access them directly in Google Drive at:")
print(f"  {results_dir}")
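
To download everything in one step, the results folder could be packed into a single ZIP archive first. This is a sketch using only the standard library: `zip_results` is a hypothetical helper, and the archive location in the usage comment is an arbitrary choice.

```python
import shutil
from pathlib import Path

def zip_results(results_dir, archive_base):
    """Pack every file under results_dir into <archive_base>.zip and return the archive path."""
    results_dir = Path(results_dir)
    if not results_dir.is_dir():
        raise FileNotFoundError(f"Results directory not found: {results_dir}")
    # archive_base should live outside results_dir so the archive does not include itself
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(results_dir))

# Usage in Colab (uncomment to run):
# archive = zip_results(Path(OUTPUT_DIR) / '_RESULTS', '/content/rtpipeline_results')
# files.download(archive)
```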

## Summary

In [None]:
from pathlib import Path

print("=" * 60)
print("RTPIPELINE PROCESSING COMPLETE")
print("=" * 60)

output_path = Path(OUTPUT_DIR)
results_path = output_path / '_RESULTS'

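# Count outputs (an empty list avoids a crash when the output directory is missing)
patients = ([d for d in output_path.iterdir() if d.is_dir() and not d.name.startswith('_')]
            if output_path.exists() else [])
courses = sum(1 for p in patients for c in p.iterdir() if c.is_dir())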

print(f"\nProcessed:")
print(f"  Patients: {len(patients)}")
print(f"  Treatment courses: {courses}")

print(f"\nResults location:")
print(f"  {results_path}")

print(f"\nKey output files:")
if results_path.exists():
    for f in sorted(results_path.glob('*.xlsx')):
        print(f"  - {f.name}")

print(f"\nLogs location:")
print(f"  {LOGS_DIR}")

print("\n" + "=" * 60)
print("Thank you for using RTpipeline!")
print("=" * 60)