modelpath-dev/Cell-Seg

IHC Quantification Pipeline

Automated Ki-67, ER, PR scoring from IHC whole slide images.

Zero-annotation ROI detection using UNI v2 morphological features + DAB color deconvolution + unsupervised clustering, followed by StarDist nuclear segmentation and clinically standardized scoring.

Scores Computed

| Marker | Scores | Clinical Use |
|---|---|---|
| Ki-67 | Proliferation Index (PI), hotspot PI | Tumor grading, treatment decisions |
| ER | % Positive, Allred (0–8), H-Score (0–300) | Hormone receptor status |
| PR | % Positive, Allred (0–8), H-Score (0–300) | Hormone receptor status |

Scoring Definitions

Proliferation Index = (positive nuclei / total nuclei) × 100

Allred Score = Proportion Score (PS) + Intensity Score (IS)

  • PS: 0=0%, 1=<1%, 2=1–10%, 3=10–33%, 4=33–66%, 5=>66%
  • IS (dominant staining intensity among positive nuclei): 0=negative, 1=weak, 2=moderate, 3=strong
  • Total: 0 or 2–8 (PS = 0 implies IS = 0, so a total of 1 cannot occur). Positive if ≥ 3.
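A minimal Python sketch of the Allred arithmetic above. It assumes each nucleus already carries an integer intensity class 0–3 (0 = negative) and reads "dominant" as the most frequent class among positive nuclei. The function names, treating each PS upper bin edge as inclusive, and breaking IS ties toward the stronger class are illustrative assumptions, not necessarily what the pipeline does.

```python
from collections import Counter
from typing import Sequence, Tuple


def proportion_score(pct_positive: float) -> int:
    """Allred PS from % positive nuclei (upper bin edges treated as inclusive)."""
    if pct_positive <= 0:
        return 0
    if pct_positive < 1:
        return 1
    if pct_positive <= 10:
        return 2
    if pct_positive <= 100 / 3:
        return 3
    if pct_positive <= 200 / 3:
        return 4
    return 5


def intensity_score(classes: Sequence[int]) -> int:
    """Allred IS: most frequent class among positive nuclei (ties -> stronger class)."""
    counts = Counter(c for c in classes if c > 0)
    if not counts:
        return 0
    return max(counts, key=lambda c: (counts[c], c))


def allred(classes: Sequence[int]) -> Tuple[int, bool]:
    """Return (total score, positive?) for per-nucleus intensity classes 0-3."""
    n = len(classes)
    pct = 100.0 * sum(1 for c in classes if c > 0) / n if n else 0.0
    total = proportion_score(pct) + intensity_score(classes)
    return total, total >= 3


# 50% positive (PS 4), moderate is the dominant positive class (IS 2) -> (6, True)
print(allred([0] * 50 + [1] * 10 + [2] * 30 + [3] * 10))
```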

H-Score = 1×(%weak) + 2×(%moderate) + 3×(%strong). Range 0–300.
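The Proliferation Index and H-Score reduce to counts over the same per-nucleus classes. This sketch assumes a nucleus counts as "positive" when its class is ≥ 1 (weak or stronger); the pipeline makes the positive threshold configurable per marker, so treat that cutoff as an assumption.

```python
from typing import Sequence


def proliferation_index(classes: Sequence[int]) -> float:
    """Percent of nuclei scored positive (here: intensity class >= 1)."""
    n = len(classes)
    return 100.0 * sum(1 for c in classes if c >= 1) / n if n else 0.0


def h_score(classes: Sequence[int]) -> float:
    """1*(%weak) + 2*(%moderate) + 3*(%strong), range 0-300."""
    n = len(classes)
    if n == 0:
        return 0.0
    pct = {k: 100.0 * sum(1 for c in classes if c == k) / n for k in (1, 2, 3)}
    return 1 * pct[1] + 2 * pct[2] + 3 * pct[3]


cells = [0] * 50 + [1] * 10 + [2] * 30 + [3] * 10
print(proliferation_index(cells))  # 50.0
print(h_score(cells))              # 10 + 60 + 30 = 100.0
```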

Architecture

IHC WSI
│
├─ Stage 1: Tissue Detection
│   └─ HSV thresholding at ~1.25x → tissue mask → patch grid
│
├─ Stage 2: Feature Extraction at 20x
│   ├─ UNI v2 (UNI2-h, ViT-H/14, trained on IHC+H&E) → 1536-d morphology features
│   └─ Color deconvolution → DAB optical density features per patch
│
├─ Stage 3: ROI Detection
│   ├─ Joint [UNI + DAB] features → PCA → K-means clustering
│   ├─ Cluster characterization (DAB profile, spatial pattern)
│   └─ Tumor cluster identification (auto or manual)
│
├─ Stage 4: Cell Segmentation at 40x (within ROI only)
│   ├─ Color deconvolution → Hematoxylin channel (all nuclei visible)
│   └─ StarDist (2D_versatile_he) → individual nuclear masks
│
├─ Stage 5: DAB Intensity Classification
│   ├─ Measure mean DAB OD within each segmented nucleus
│   └─ Classify: negative (0) / weak (1+) / moderate (2+) / strong (3+)
│
├─ Stage 6: Quantification
│   ├─ Ki-67: PI (global + hotspot analysis)
│   └─ ER/PR: Percentage, Allred Score, H-Score
│
└─ Stage 7: Export
    ├─ cell_data.csv          Per-cell measurements
    ├─ quantification.json    All scores + metadata
    ├─ roi_overlay.jpg        Tumor ROI on thumbnail
    ├─ roi.geojson            QuPath-compatible annotations
    └─ cell_samples/          Sample cell overlay images
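Stage 1 can be approximated with a saturation threshold on a low-resolution thumbnail: glass background is near-white (low saturation) while stained tissue is not. This is a sketch only; the 0.07 threshold, using the HSV saturation channel alone, the 50% tissue-fraction rule, and the `scale` factor that maps thumbnail pixels to 20x coordinates (e.g. 16 from a 1.25x thumbnail) are all assumptions.

```python
import numpy as np


def tissue_mask(thumb_rgb, sat_thresh=0.07):
    """Boolean tissue mask from an RGB thumbnail via HSV saturation."""
    rgb = np.asarray(thumb_rgb, dtype=float) / 255.0
    cmax = rgb.max(axis=-1)
    cmin = rgb.min(axis=-1)
    # HSV saturation = (max - min) / max; guard against division by zero on black pixels
    sat = (cmax - cmin) / np.maximum(cmax, 1e-8)
    return sat > sat_thresh


def patch_grid(mask, patch, min_frac=0.5, scale=1.0):
    """Top-left (x, y) of non-overlapping patches with enough tissue, scaled to target coords."""
    h, w = mask.shape
    coords = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            if mask[y:y + patch, x:x + patch].mean() >= min_frac:
                coords.append((int(x * scale), int(y * scale)))
    return coords


thumb = np.full((8, 8, 3), 255, dtype=np.uint8)
thumb[4:, 4:] = (150, 80, 120)  # a stained-looking block in the lower-right corner
print(patch_grid(tissue_mask(thumb), patch=4, scale=16))  # [(64, 64)]
```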

Why This Architecture

Why UNI v2 instead of CONCH? CONCH was trained exclusively on H&E image-caption pairs. Its vision encoder has never seen DAB chromogen, and its text-image alignment doesn't cover IHC-specific concepts. UNI v2 was pretrained on over 350K whole slide images spanning H&E and IHC, so its features capture IHC tissue morphology.

Why clustering instead of text-guided similarity? Without a text encoder trained on IHC captions, text-guided zero-shot prediction produces unreliable results on IHC. Clustering on joint morphology+DAB features discovers natural tissue compartments without needing text alignment. The clusters can be auto-characterized via their DAB profile or manually labeled in ~5 minutes.
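A sketch of the Stage 3 clustering using scikit-learn: standardize each feature block, up-weight the DAB block (with 1000+ embedding dimensions next to a handful of DAB statistics, the DAB signal would otherwise be drowned out), concatenate, reduce with PCA, then run K-means. The per-cluster mean DAB OD returned alongside the labels is the kind of "DAB profile" used to characterize clusters. Parameter names and defaults, and the assumption that column 0 of `dab_feats` holds mean DAB OD, are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cluster_patches(uni_feats, dab_feats, n_clusters=8, n_components=32,
                    dab_weight=1.0, seed=0):
    """Cluster patches on joint [morphology | DAB] features.

    uni_feats: (n_patches, D) foundation-model embeddings.
    dab_feats: (n_patches, k) per-patch DAB statistics; column 0 assumed = mean DAB OD.
    Returns (cluster label per patch, mean of dab_feats[:, 0] per cluster).
    """
    u = StandardScaler().fit_transform(uni_feats)
    d = StandardScaler().fit_transform(dab_feats) * dab_weight
    joint = np.hstack([u, d])
    n_comp = min(n_components, *joint.shape)  # PCA needs n_comp <= min(n_samples, n_features)
    reduced = PCA(n_components=n_comp, random_state=seed).fit_transform(joint)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(reduced)
    dab = np.asarray(dab_feats, dtype=float)[:, 0]
    profile = np.array([dab[labels == k].mean() for k in range(n_clusters)])
    return labels, profile
```

Clusters with a high DAB profile are natural candidates for review, but deciding which clusters are tumor still takes the auto-characterization rules or a quick manual check.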

Why StarDist on the hematoxylin channel? The hematoxylin channel from color deconvolution shows all nuclei regardless of DAB staining status, which gives an unbiased segmentation. StarDist's star-convex polygon model suits nuclei, which are approximately convex, and the pretrained 2D_versatile_he model transfers well to hematoxylin appearance. Because that model expects 3-channel RGB input, the hematoxylin channel has to be rendered back to an RGB image before inference.
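Ruifrok & Johnston color deconvolution fits in a few lines of NumPy: convert RGB to optical density, then multiply by the inverse of a matrix whose rows are the stain OD vectors. This sketch uses the commonly quoted hematoxylin and DAB vectors with an orthogonal residual channel, and base-10 OD; scanner-specific stain vectors may fit better, and absolute OD values (hence thresholds) depend on these conventions. `hematoxylin_rgb` re-renders the H channel alone as an RGB image, one way to feed an RGB-trained segmenter. scikit-image's `skimage.color.rgb2hed` is a library alternative for H&E-DAB.

```python
import numpy as np

# Commonly used Ruifrok & Johnston OD vectors (R, G, B) for hematoxylin and DAB
H_OD = np.array([0.650, 0.704, 0.286])
DAB_OD = np.array([0.268, 0.570, 0.776])


def stain_matrix() -> np.ndarray:
    """Rows: unit OD vectors for H, DAB and an orthogonal residual channel."""
    h = H_OD / np.linalg.norm(H_OD)
    d = DAB_OD / np.linalg.norm(DAB_OD)
    r = np.cross(h, d)
    return np.stack([h, d, r / np.linalg.norm(r)])


def deconvolve(rgb) -> np.ndarray:
    """RGB (..., 3) in 0-255 -> per-pixel stain amounts (..., 3): [H, DAB, residual]."""
    # Clip to >= 1 so fully dark pixels do not produce log10(0)
    od = -np.log10(np.clip(np.asarray(rgb, dtype=float), 1.0, 255.0) / 255.0)
    return od @ np.linalg.inv(stain_matrix())


def hematoxylin_rgb(stains) -> np.ndarray:
    """Re-render only the H channel as a uint8 RGB image."""
    od = stains[..., :1] * stain_matrix()[0]
    return np.clip(np.rint(255.0 * 10.0 ** (-od)), 0, 255).astype(np.uint8)
```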

Why separate segmentation from classification? Segmenting nuclei on the hematoxylin channel and measuring DAB separately means DAB-negative nuclei are detected as reliably as DAB-positive ones. Segmenting the RGB image directly risks missing weakly stained or negative nuclei.
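Stage 5 amounts to averaging the DAB channel inside each labeled nucleus and binning the means. This sketch assumes a StarDist-style label image (0 = background, 1..N = nuclei) and uses the default cutoffs given under Calibration; `np.digitize` maps OD < 0.10 to 0, 0.10–0.25 to 1+, 0.25–0.45 to 2+, and ≥ 0.45 to 3+. Function names are hypothetical.

```python
import numpy as np


def mean_dab_per_nucleus(labels, dab_od):
    """Mean DAB OD for nuclei 1..N in a label image (NaN for unused label ids)."""
    n = int(labels.max())
    flat = labels.ravel()
    sums = np.bincount(flat, weights=dab_od.ravel(), minlength=n + 1)
    counts = np.bincount(flat, minlength=n + 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return means[1:]  # drop background (label 0)


def classify_intensity(mean_od, neg=0.10, weak=0.25, mod=0.45):
    """0 = negative, 1 = weak (1+), 2 = moderate (2+), 3 = strong (3+)."""
    return np.digitize(np.asarray(mean_od, dtype=float), [neg, weak, mod])
```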

Installation

git clone <repo-url>
cd ihc-roi-detector
pip install -r requirements.txt

# UNI v2 requires HuggingFace access
# 1. Request access at https://huggingface.co/MahmoodLab/UNI
# 2. Authenticate: run huggingface-cli login, or export a token instead:
export HF_TOKEN=your_token_here

# OpenSlide system dependency
sudo apt-get install openslide-tools  # Ubuntu

Quick Start

# Ki-67 proliferation index
python scripts/run.py /path/to/slide.svs --marker ki67

# ER Allred + H-Score
python scripts/run.py /path/to/slide.svs --marker er

# PR on multiple slides
python scripts/run.py /path/to/slides/ --marker pr --output results/

# CPU mode (slower)
python scripts/run.py /path/to/slide.svs --marker ki67 --device cpu

Python API

from src.pipeline import IHCQuantificationPipeline

pipeline = IHCQuantificationPipeline("config/default.yaml")

# Ki-67 proliferation index
ki67_results = pipeline.run("/path/to/slide.svs", marker="ki67")
quant = ki67_results["quantification"]
print(f"Ki-67 PI: {quant.proliferation_index:.1f}%")
print(f"Total nuclei: {quant.n_total}")
print(f"Positive: {quant.n_positive}")

# Ki-67 hotspot analysis (read from the Ki-67 run, not from an ER/PR run)
hotspots = ki67_results.get("hotspots", [])
for i, hs in enumerate(hotspots, start=1):
    print(f"Hotspot {i}: PI={hs.proliferation_index:.1f}%")

# ER/PR scores
er_results = pipeline.run("/path/to/er_slide.svs", marker="er")
quant = er_results["quantification"]
status = "Positive" if quant.allred_positive else "Negative"
print(f"Allred: {quant.allred_total}/8 ({status})")
print(f"H-Score: {quant.h_score:.0f}/300")
print(f"Percentage: {quant.percentage_index:.1f}%")
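This README does not specify how hotspots are located. One simple approach, shown here as a hypothetical sketch rather than the pipeline's method, bins cells into square tiles, ranks tiles by Ki-67 PI, and ignores tiles with too few cells to be meaningful. Cell coordinates in micrometres, 500 µm tiles, and the 100-cell minimum are assumptions.

```python
import numpy as np


def hotspot_pis(x_um, y_um, positive, tile_um=500.0, min_cells=100, top_k=3):
    """Rank square tiles by PI. Returns [(pi, (col, row), n_cells), ...], highest PI first."""
    x = np.asarray(x_um, dtype=float)
    y = np.asarray(y_um, dtype=float)
    pos = np.asarray(positive, dtype=bool)
    cols = np.floor(x / tile_um).astype(int)
    rows = np.floor(y / tile_um).astype(int)
    tallies = {}  # (col, row) -> (n_cells, n_positive)
    for c, r, p in zip(cols, rows, pos):
        key = (int(c), int(r))
        n, k = tallies.get(key, (0, 0))
        tallies[key] = (n + 1, k + int(p))
    ranked = [(100.0 * k / n, key, n) for key, (n, k) in tallies.items() if n >= min_cells]
    ranked.sort(key=lambda t: t[0], reverse=True)
    return ranked[:top_k]
```

Sliding or overlapping windows would find hotspots that straddle tile borders, at the cost of more computation.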

Calibration

DAB intensity thresholds significantly affect scoring and should be calibrated per scanner/lab.

# View the DAB OD distribution and current thresholds
python scripts/calibrate.py histogram output/slide_name/

# Re-quantify with adjusted thresholds (instant, no re-segmentation)
python scripts/calibrate.py rethreshold output/slide_name/ \
    --neg 0.08 --weak 0.20 --mod 0.40

# Multi-Otsu thresholding is suggested automatically in the histogram output

The default thresholds (negative < 0.10, weak 0.10–0.25, moderate 0.25–0.45, strong ≥ 0.45 mean DAB OD) are reasonable starting points, but validate them against pathologist ground truth for your scanner, staining protocol, and lab.
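Multi-Otsu thresholding and re-thresholding can be sketched with scikit-image's `threshold_multiotsu`: four classes yield three cutoffs (negative/weak, weak/moderate, moderate/strong), and re-classification reuses stored per-nucleus measurements, so no re-segmentation is needed. The DAB OD column name in cell_data.csv is not documented here, so the sketch takes a plain array; the function names are hypothetical.

```python
import numpy as np
from skimage.filters import threshold_multiotsu


def suggest_cutoffs(dab_od, classes=4):
    """Multi-Otsu cutoffs on per-nucleus mean DAB OD (classes=4 -> 3 cutoffs)."""
    return threshold_multiotsu(np.asarray(dab_od, dtype=float), classes=classes)


def rethreshold(dab_od, cutoffs):
    """Re-assign intensity classes 0-3 from stored measurements."""
    return np.digitize(np.asarray(dab_od, dtype=float), np.sort(cutoffs))
```

Multi-Otsu assumes the four intensity groups form separable modes; on slides dominated by negative nuclei the suggested cutoffs can be unstable, which is one more reason to check them against pathologist scores.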

ROI Cluster Tuning

The auto-detection of tumor clusters works well for typical cases but can be overridden:

# Inspect cluster profiles
python scripts/calibrate.py clusters output/slide_name/

# Re-run with manual cluster selection
python scripts/run.py /path/to/slide.svs --marker ki67 --roi-clusters 2 5 7
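Conceptually, manual selection just keeps patches whose cluster ID is in the chosen set and rasterizes them into an ROI mask. A hypothetical sketch, assuming patch top-left coordinates are already expressed in the mask's pixel grid:

```python
import numpy as np


def roi_mask(patch_xy, labels, roi_clusters, patch_size, shape):
    """Boolean ROI mask covering every patch whose cluster is in roi_clusters."""
    mask = np.zeros(shape, dtype=bool)
    keep = np.isin(np.asarray(labels), list(roi_clusters))
    for (x, y), k in zip(patch_xy, keep):
        if k:
            mask[y:y + patch_size, x:x + patch_size] = True
    return mask


# Equivalent in spirit to --roi-clusters 2 5 7
m = roi_mask([(0, 0), (4, 0), (0, 4)], [2, 3, 5], {2, 5, 7}, patch_size=4, shape=(8, 8))
```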

Output Structure

output/slide_name/
├── thumbnail.jpg                 WSI thumbnail
├── tissue_mask.png               Binary tissue mask
├── roi_overlay.jpg               Tumor ROI contours on thumbnail
├── roi.geojson                   QuPath-compatible ROI annotations
├── cell_data.csv                 Per-cell: position, area, DAB OD, intensity class
├── quantification.json           All scores, cluster profiles, metadata
└── cell_samples/                 Sample patches with colored nuclear overlays
    ├── cell_overlay_0000.jpg
    ├── cell_overlay_0001.jpg
    └── ...
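roi.geojson follows the GeoJSON FeatureCollection layout. A sketch of writing ROI polygons in a form QuPath can import: coordinates are level-0 slide pixels, each polygon ring is closed, and properties carry an annotation type and class name. The `objectType` and `classification` property names reflect our understanding of QuPath's GeoJSON convention and should be checked against your QuPath version; the function name is hypothetical.

```python
import json


def roi_feature_collection(polygons, label="Tumor"):
    """polygons: list of [(x, y), ...] rings in level-0 pixel coordinates."""
    features = []
    for poly in polygons:
        ring = [[float(x), float(y)] for x, y in poly]
        if ring[0] != ring[-1]:
            ring.append(ring[0])  # GeoJSON linear rings must be closed
        features.append({
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [ring]},
            "properties": {"objectType": "annotation",
                           "classification": {"name": label}},
        })
    return {"type": "FeatureCollection", "features": features}


fc = roi_feature_collection([[(0, 0), (100, 0), (100, 100)]])
text = json.dumps(fc, indent=2)  # write this string to roi.geojson
```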

Configuration

Edit config/default.yaml to adjust:

  • Feature extraction: UNI v2 / Virchow-2 / Phikon-v2, batch size, device
  • ROI clustering: number of clusters, PCA components, DAB feature weight
  • Cell segmentation: StarDist thresholds, nucleus area limits
  • Intensity thresholds: negative/weak/moderate/strong OD cutoffs
  • Quantification: positive threshold per marker
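When editing intensity thresholds, the three OD cutoffs must stay strictly increasing or the classes overlap. A small sanity-check sketch; the key names `negative`, `weak`, and `moderate` are hypothetical stand-ins for whatever config/default.yaml actually uses (load the file with, e.g., PyYAML's `yaml.safe_load`, then pass the relevant mapping in).

```python
from typing import Mapping, Tuple


def check_intensity_cutoffs(cutoffs: Mapping[str, float]) -> Tuple[float, float, float]:
    """Validate negative < weak < moderate OD cutoffs (hypothetical key names)."""
    neg, weak, mod = (float(cutoffs[k]) for k in ("negative", "weak", "moderate"))
    if not 0.0 < neg < weak < mod:
        raise ValueError(
            f"cutoffs must satisfy 0 < negative < weak < moderate, got {neg}, {weak}, {mod}")
    return neg, weak, mod
```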

Limitations

  • Threshold sensitivity: DAB intensity cutoffs need per-lab calibration. The defaults are reasonable but not universal.
  • ROI auto-detection: Works well for moderate-to-high positivity. Very low-positivity slides may need manual cluster selection.
  • Single-marker design: Assumes one DAB chromogen. Dual-stain IHC (e.g., Ki-67/CK) would need chromogen separation.
  • No tissue-type awareness: The ROI detector identifies DAB-positive tumor-like regions but doesn't distinguish invasive carcinoma from DCIS, for example.
  • StarDist limitations: May under-segment highly overlapping or elongated nuclei. Consider Cellpose or HoVer-Net for difficult morphologies.

References

  • UNI: Chen et al., "Towards a General-Purpose Foundation Model for Computational Pathology", Nature Medicine 2024
  • StarDist: Schmidt et al., "Cell Detection with Star-Convex Polygons", MICCAI 2018
  • Color Deconvolution: Ruifrok & Johnston, "Quantification of histochemical staining by color deconvolution", Analytical and Quantitative Cytology and Histology 2001
  • Allred Score: Allred et al., "Prognostic and predictive factors in breast cancer by immunohistochemical analysis", Modern Pathology 1998
