IHC Quantification Pipeline

Automated Ki-67, ER, PR scoring from IHC whole slide images.

Zero-annotation ROI detection using UNI v2 morphological features + DAB color deconvolution + unsupervised clustering, followed by StarDist nuclear segmentation and clinically standardized scoring.

Scores Computed

Marker	Scores	Clinical Use
Ki-67	Proliferation Index (PI), hotspot PI	Tumor grading, treatment decisions
ER	% Positive, Allred (0–8), H-Score (0–300)	Hormone receptor status
PR	% Positive, Allred (0–8), H-Score (0–300)	Hormone receptor status

Scoring Definitions

Proliferation Index = (positive nuclei / total nuclei) × 100

Allred Score = Proportion Score (PS) + Intensity Score (IS)

PS: 0 = 0%, 1 = <1%, 2 = 1–10%, 3 = 11–33%, 4 = 34–66%, 5 = >66%
IS: 0=negative, 1=weak dominant, 2=moderate dominant, 3=strong dominant
Total: 0 or 2–8. Positive if ≥ 3.
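The Allred definition above can be sketched in a few lines of Python. This is an illustrative sketch, not the pipeline's own code: the function names are hypothetical, boundary values (exactly 10%, 33%, 66%) are assigned to the lower bin, and "dominant" intensity is assumed to mean the most frequent class among positive nuclei, with ties going to the stronger class.

```python
from collections import Counter


def proportion_score(pct_positive: float) -> int:
    """Map percent positive nuclei (0-100) to the Allred proportion score."""
    if pct_positive <= 0:
        return 0
    if pct_positive < 1:
        return 1
    if pct_positive <= 10:
        return 2
    if pct_positive <= 33:
        return 3
    if pct_positive <= 66:
        return 4
    return 5


def allred_score(intensity_classes):
    """Compute Allred from per-nucleus classes (0=negative, 1/2/3=weak/moderate/strong)."""
    classes = list(intensity_classes)
    positives = [c for c in classes if c > 0]
    if not positives:
        return {"ps": 0, "intensity": 0, "total": 0, "positive": False}
    ps = proportion_score(100.0 * len(positives) / len(classes))
    # Assumption: dominant intensity = most frequent positive class,
    # ties resolved toward the stronger class.
    counts = Counter(positives)
    intensity = max(counts, key=lambda c: (counts[c], c))
    total = ps + intensity
    return {"ps": ps, "intensity": intensity, "total": total, "positive": total >= 3}
```

Because any positive nucleus gives PS ≥ 1 and IS ≥ 1, the total can never equal 1, which is why the scale reads "0 or 2–8".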

H-Score = 1×(%weak) + 2×(%moderate) + 3×(%strong). Range 0–300.
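As a worked illustration of the H-Score and percent-positive formulas, the sketch below computes both from a list of per-nucleus intensity classes. The function names are hypothetical and the class encoding (0 = negative, 1 = weak, 2 = moderate, 3 = strong) is assumed to match the classification stage.

```python
def percent_positive(intensity_classes) -> float:
    """Percentage of nuclei with any DAB staining (class > 0)."""
    classes = list(intensity_classes)
    if not classes:
        return 0.0
    return 100.0 * sum(1 for c in classes if c > 0) / len(classes)


def h_score(intensity_classes) -> float:
    """H-Score = 1*(%weak) + 2*(%moderate) + 3*(%strong), range 0-300."""
    classes = list(intensity_classes)
    if not classes:
        return 0.0
    n = len(classes)
    # Each nucleus contributes its class value, so the weighted sum of
    # percentages equals 100 * (sum of classes) / n.
    return 100.0 * sum(classes) / n


if __name__ == "__main__":
    cells = [0, 1, 2, 3]  # one nucleus per class
    print(percent_positive(cells), h_score(cells))
```

For one nucleus in each class, the percentages are 25% weak, 25% moderate, and 25% strong, giving 25 + 50 + 75 = 150.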

Architecture

IHC WSI
│
├─ Stage 1: Tissue Detection
│   └─ HSV thresholding at ~1.25x → tissue mask → patch grid
│
├─ Stage 2: Feature Extraction at 20x
│   ├─ UNI v2 (ViT-L, trained on IHC+H&E) → 1024-d morphology features
│   └─ Color deconvolution → DAB optical density features per patch
│
├─ Stage 3: ROI Detection
│   ├─ Joint [UNI + DAB] features → PCA → K-means clustering
│   ├─ Cluster characterization (DAB profile, spatial pattern)
│   └─ Tumor cluster identification (auto or manual)
│
├─ Stage 4: Cell Segmentation at 40x (within ROI only)
│   ├─ Color deconvolution → Hematoxylin channel (all nuclei visible)
│   └─ StarDist (2D_versatile_he) → individual nuclear masks
│
├─ Stage 5: DAB Intensity Classification
│   ├─ Measure mean DAB OD within each segmented nucleus
│   └─ Classify: negative (0) / weak (1+) / moderate (2+) / strong (3+)
│
├─ Stage 6: Quantification
│   ├─ Ki-67: PI (global + hotspot analysis)
│   └─ ER/PR: Percentage, Allred Score, H-Score
│
└─ Stage 7: Export
    ├─ cell_data.csv          Per-cell measurements
    ├─ quantification.json    All scores + metadata
    ├─ roi_overlay.jpg        Tumor ROI on thumbnail
    ├─ roi.geojson            QuPath-compatible annotations
    └─ cell_samples/          Sample cell overlay images
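Stage 1 can be illustrated with a minimal NumPy sketch: compute HSV saturation on the low-magnification thumbnail, threshold it to get a tissue mask, and keep grid tiles that contain enough tissue. The function names, the saturation cutoff, and the minimum tissue fraction are assumptions for illustration, not values taken from the pipeline.

```python
import numpy as np


def tissue_mask(thumbnail: np.ndarray, sat_thresh: float = 0.07) -> np.ndarray:
    """Boolean mask of stained tissue from an (H, W, 3) uint8 RGB thumbnail.

    White glass background has near-zero HSV saturation, stained tissue does not.
    """
    rgb = thumbnail.astype(np.float32) / 255.0
    cmax = rgb.max(axis=-1)
    cmin = rgb.min(axis=-1)
    sat = (cmax - cmin) / np.maximum(cmax, 1e-6)
    return sat > sat_thresh


def patch_grid(mask: np.ndarray, patch_px: int, min_tissue: float = 0.5):
    """Top-left (row, col) of non-overlapping tiles whose tissue fraction is high enough."""
    h, w = mask.shape
    coords = []
    for r in range(0, h - patch_px + 1, patch_px):
        for c in range(0, w - patch_px + 1, patch_px):
            if mask[r:r + patch_px, c:c + patch_px].mean() >= min_tissue:
                coords.append((r, c))
    return coords
```

Coordinates are in thumbnail pixels; mapping them to 20x patch locations means multiplying by the downsample factor between the thumbnail and the extraction level.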

Why This Architecture

Why UNI v2 instead of CONCH? CONCH was trained exclusively on H&E image-caption pairs. Its vision encoder has never seen DAB chromogen, and its text-image alignment doesn't cover IHC-specific concepts. UNI v2 was trained on 100K+ slides across H&E and IHC, so its features capture IHC tissue morphology.

Why clustering instead of text-guided similarity? Without a text encoder trained on IHC captions, text-guided zero-shot prediction produces unreliable results on IHC. Clustering on joint morphology+DAB features discovers natural tissue compartments without needing text alignment. The clusters can be auto-characterized via their DAB profile or manually labeled in ~5 minutes.
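The clustering step described above can be sketched as follows: standardize the morphology and DAB feature blocks, concatenate them with a DAB weight, reduce with PCA, and run k-means. To keep the sketch dependency-free it uses a small NumPy PCA and Lloyd's k-means; in practice scikit-learn's PCA and KMeans would be the usual choice. All function names and defaults here are hypothetical.

```python
import numpy as np


def joint_features(uni: np.ndarray, dab: np.ndarray, dab_weight: float = 1.0) -> np.ndarray:
    """Z-score each feature block, scale the DAB block, and concatenate per patch."""
    def zscore(x):
        return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)
    return np.hstack([zscore(uni), dab_weight * zscore(dab)])


def pca(x: np.ndarray, n_components: int) -> np.ndarray:
    """Project rows of x onto the top principal components (via SVD)."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(x: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Each resulting cluster can then be profiled by the mean DAB feature of its patches, which is the kind of information used to decide which clusters to treat as tumor.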

Why StarDist on hematoxylin channel? The hematoxylin channel from color deconvolution shows all nuclei regardless of DAB staining status, giving an unbiased segmentation. StarDist's star-convex polygon model is well-suited for nuclei (which are approximately convex), and the pretrained 2D_versatile_he model transfers well to the hematoxylin appearance.

Why separate segmentation from classification? Segmenting nuclei on the hematoxylin channel and measuring DAB separately ensures that DAB-negative nuclei are detected as reliably as DAB-positive ones. If you segment on the RGB image directly, the model may miss weakly stained or negative nuclei.
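To make the separation concrete, here is a minimal NumPy sketch of color deconvolution into hematoxylin and DAB optical density, followed by the per-nucleus mean DAB measurement. It uses the commonly cited Ruifrok and Johnston H-DAB stain vectors with a cross-product residual channel. The function names are hypothetical, and the label mask is taken as an input here (in the pipeline it would come from StarDist).

```python
import numpy as np

# Ruifrok & Johnston optical-density vectors (R, G, B) for the H-DAB stain pair.
H_VEC = np.array([0.650, 0.704, 0.286])
DAB_VEC = np.array([0.268, 0.570, 0.776])


def separate_h_dab(rgb: np.ndarray):
    """Return (hematoxylin_od, dab_od) maps for an (H, W, 3) RGB image in 0..255."""
    h = H_VEC / np.linalg.norm(H_VEC)
    d = DAB_VEC / np.linalg.norm(DAB_VEC)
    residual = np.cross(h, d)
    residual /= np.linalg.norm(residual)
    stain_matrix = np.stack([h, d, residual])  # rows are stains
    # Beer-Lambert: optical density is the negative log of transmitted intensity.
    od = -np.log10(np.clip(rgb.astype(np.float64), 1.0, 255.0) / 255.0)
    # od = conc @ stain_matrix, so conc = od @ inv(stain_matrix).
    conc = od.reshape(-1, 3) @ np.linalg.inv(stain_matrix)
    conc = conc.reshape(rgb.shape)
    return conc[..., 0], conc[..., 1]


def mean_od_per_nucleus(dab_od: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Mean DAB OD for labels 1..N (0 is background) of an integer label mask."""
    flat = labels.ravel()
    sums = np.bincount(flat, weights=dab_od.ravel())
    counts = np.bincount(flat)
    return sums[1:] / np.maximum(counts[1:], 1)
```

Segmentation sees only the hematoxylin map and measurement sees only the DAB map, so a nucleus with no DAB signal is still segmented and simply receives a low mean OD.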

Installation

git clone <repo-url>
cd ihc-roi-detector
pip install -r requirements.txt

# UNI v2 requires HuggingFace access
# 1. Request access at https://huggingface.co/MahmoodLab/UNI
# 2. Login: huggingface-cli login
export HF_TOKEN=your_token_here

# OpenSlide system dependency
sudo apt-get install openslide-tools  # Ubuntu

Quick Start

# Ki-67 proliferation index
python scripts/run.py /path/to/slide.svs --marker ki67

# ER Allred + H-Score
python scripts/run.py /path/to/slide.svs --marker er

# PR on multiple slides
python scripts/run.py /path/to/slides/ --marker pr --output results/

# CPU mode (slower)
python scripts/run.py /path/to/slide.svs --marker ki67 --device cpu

Python API

from src.pipeline import IHCQuantificationPipeline

pipeline = IHCQuantificationPipeline("config/default.yaml")

# Ki-67: access scores
ki67_results = pipeline.run("/path/to/slide.svs", marker="ki67")
quant = ki67_results["quantification"]
print(f"Ki-67 PI: {quant.proliferation_index:.1f}%")
print(f"Total nuclei: {quant.n_total}")
print(f"Positive: {quant.n_positive}")

# Ki-67 hotspot analysis (read from the Ki-67 run, before running other markers)
hotspots = ki67_results.get("hotspots", [])
for i, hs in enumerate(hotspots, start=1):
    print(f"Hotspot {i}: PI={hs.proliferation_index:.1f}%")

# ER/PR scores
er_results = pipeline.run("/path/to/er_slide.svs", marker="er")
quant = er_results["quantification"]
print(f"Allred: {quant.allred_total}/8 ({'Positive' if quant.allred_positive else 'Negative'})")
print(f"H-Score: {quant.h_score:.0f}/300")
print(f"Percentage: {quant.percentage_index:.1f}%")

Calibration

DAB intensity thresholds significantly affect scoring and should be calibrated per scanner/lab.

# View the DAB OD distribution and current thresholds
python scripts/calibrate.py histogram output/slide_name/

# Re-quantify with adjusted thresholds (instant, no re-segmentation)
python scripts/calibrate.py rethreshold output/slide_name/ \
    --neg 0.08 --weak 0.20 --mod 0.40

# Multi-Otsu thresholding is suggested automatically in the histogram output

The default thresholds (neg < 0.10, weak < 0.25, moderate < 0.45) are reasonable starting points but should be validated against pathologist ground truth for your specific setup.
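The threshold logic can be sketched with `np.digitize`, using the default cutoffs quoted above. The function name is hypothetical, and the boundary handling (a value equal to a cutoff falls into the higher class) follows from reading the defaults as strict "less than" bounds.

```python
import numpy as np

# Default cutoffs from this README: neg < 0.10, weak < 0.25, moderate < 0.45.
DEFAULT_CUTOFFS = (0.10, 0.25, 0.45)


def classify_intensity(mean_dab_od, cutoffs=DEFAULT_CUTOFFS) -> np.ndarray:
    """Map per-nucleus mean DAB OD to 0 (negative), 1 (weak), 2 (moderate), 3 (strong)."""
    values = np.asarray(mean_dab_od, dtype=float)
    # digitize returns i such that cutoffs[i-1] <= value < cutoffs[i].
    return np.digitize(values, bins=cutoffs)


if __name__ == "__main__":
    od = [0.05, 0.10, 0.30, 0.45, 0.90]
    print(classify_intensity(od))                      # default cutoffs
    print(classify_intensity(od, (0.08, 0.20, 0.40)))  # recalibrated cutoffs
```

Re-thresholding only re-bins stored per-cell OD values, which is why it needs no re-segmentation.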

ROI Cluster Tuning

The auto-detection of tumor clusters works well for typical cases but can be overridden:

# Inspect cluster profiles
python scripts/calibrate.py clusters output/slide_name/

# Re-run with manual cluster selection
python scripts/run.py /path/to/slide.svs --marker ki67 --roi-clusters 2 5 7

Output Structure

output/slide_name/
├── thumbnail.jpg                 WSI thumbnail
├── tissue_mask.png               Binary tissue mask
├── roi_overlay.jpg               Tumor ROI contours on thumbnail
├── roi.geojson                   QuPath-compatible ROI annotations
├── cell_data.csv                 Per-cell: position, area, DAB OD, intensity class
├── quantification.json           All scores, cluster profiles, metadata
└── cell_samples/                 Sample patches with colored nuclear overlays
    ├── cell_overlay_0000.jpg
    ├── cell_overlay_0001.jpg
    └── ...
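For readers who want to post-process or regenerate `roi.geojson`, the sketch below shows one way to write ROI polygons as a GeoJSON FeatureCollection. The `objectType` and `classification` property keys follow the convention QuPath uses in its own GeoJSON exports as far as I know; the exact schema written by this pipeline is an assumption, and the function names are hypothetical.

```python
import json


def roi_to_feature(polygon_xy, name: str = "Tumor") -> dict:
    """Wrap one polygon (list of (x, y) in full-resolution pixels) as a GeoJSON Feature."""
    ring = [[float(x), float(y)] for x, y in polygon_xy]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON linear rings must be closed
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        # Assumed QuPath-style properties; adjust to match the actual export.
        "properties": {"objectType": "annotation", "classification": {"name": name}},
    }


def rois_to_geojson(polygons, name: str = "Tumor") -> str:
    """Serialize several polygons as a FeatureCollection string."""
    features = [roi_to_feature(p, name) for p in polygons]
    return json.dumps({"type": "FeatureCollection", "features": features}, indent=2)


if __name__ == "__main__":
    square = [(0, 0), (100, 0), (100, 100), (0, 100)]
    print(rois_to_geojson([square]))
```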

Configuration

Edit config/default.yaml to adjust:

Feature extraction: UNI v2 / Virchow-2 / Phikon-v2, batch size, device
ROI clustering: number of clusters, PCA components, DAB feature weight
Cell segmentation: StarDist thresholds, nucleus area limits
Intensity thresholds: negative/weak/moderate/strong OD cutoffs
Quantification: positive threshold per marker

Limitations

Threshold sensitivity: DAB intensity cutoffs need per-lab calibration. The defaults are reasonable but not universal.
ROI auto-detection: Works well for moderate-to-high positivity. Very low-positivity slides may need manual cluster selection.
Single-marker design: Assumes one DAB chromogen. Dual-stain IHC (e.g., Ki-67/CK) would need chromogen separation.
No tissue-type awareness: The ROI detector identifies DAB-positive tumor-like regions but doesn't distinguish invasive carcinoma from DCIS, for example.
StarDist limitations: May under-segment highly overlapping or elongated nuclei. Consider Cellpose or HoVer-Net for difficult morphologies.

References

UNI: Chen et al., "Towards a General-Purpose Foundation Model for Computational Pathology", Nature Medicine 2024
StarDist: Schmidt et al., "Cell Detection with Star-Convex Polygons", MICCAI 2018
Color Deconvolution: Ruifrok & Johnston, "Quantification of histochemical staining by color deconvolution", Analytical and Quantitative Cytology and Histology 2001
Allred Score: Allred et al., "Prognostic and predictive factors in breast cancer by immunohistochemical analysis", Modern Pathology 1998
