# Classifying Tissue Patches from IDC Slides with TIAToolbox

<a href="https://colab.research.google.com/github/fedorov/idc-tiatoolbox/blob/main/notebooks/04_patch_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Overview

TIAToolbox's `PatchPredictor` can classify tissue patches using pretrained deep learning models. In this notebook, we use the **ResNet18-Kather100K** model to classify tissue in a colorectal cancer slide from IDC into 9 tissue types:

1. Adipose
2. Background
3. Debris
4. Lymphocytes
5. Mucus
6. Smooth muscle
7. Normal colon mucosa
8. Cancer-associated stroma
9. Colorectal adenocarcinoma epithelium
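
Later cells refer to these classes by short codes (for example `TUM` for tumor epithelium). The lookup below is a small convenience sketch that pairs each code with its full name; `KATHER_CLASS_NAMES` and `describe` are hypothetical helper names, not part of TIAToolbox, and the alphabetical key order is not meant to reflect the model's output index order.

```python
# Short class codes used later in this notebook, mapped to the full names above.
# KATHER_CLASS_NAMES is a hypothetical helper, not a TIAToolbox object.
KATHER_CLASS_NAMES = {
    "ADI": "Adipose",
    "BACK": "Background",
    "DEB": "Debris",
    "LYM": "Lymphocytes",
    "MUC": "Mucus",
    "MUS": "Smooth muscle",
    "NORM": "Normal colon mucosa",
    "STR": "Cancer-associated stroma",
    "TUM": "Colorectal adenocarcinoma epithelium",
}


def describe(code: str) -> str:
    """Return the full tissue name for a short code, or the code itself if unknown."""
    return KATHER_CLASS_NAMES.get(code, code)


print(describe("TUM"))
```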

**GPU recommended** for faster inference; the notebook also runs on CPU.

## Installation

Run the cell below to install the dependencies. The cell ends by restarting the kernel (**on Colab, the runtime restarts automatically**) so that the updated NumPy version is picked up. After the restart, continue from the imports cell.

In [None]:
%pip install tiatoolbox idc-index openslide-bin "numcodecs<0.16"

# Restart runtime to pick up updated numpy (required on Colab)
import IPython
IPython.Application.instance().kernel.do_shutdown(True)

In [None]:
import os
import json
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
import torch

from idc_index import IDCClient
from tiatoolbox.wsicore.wsireader import WSIReader
from tiatoolbox.models.engine.patch_predictor import PatchPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
if device == "cpu":
    print("Note: GPU is recommended for faster inference.")
    print("In Colab: Runtime > Change runtime type > T4 GPU")

## 1. Select and Download a Colorectal Cancer Slide

The Kather100K model was trained on colorectal cancer tissue, so we'll use a slide from the `tcga_coad` (TCGA Colon Adenocarcinoma) collection for best results.

In [None]:
idc_client = IDCClient()
idc_client.fetch_index("sm_index")

candidates = idc_client.sql_query("""
    SELECT
        i.SeriesInstanceUID,
        i.PatientID,
        i.collection_id,
        ROUND(i.series_size_MB, 1) as size_mb,
        s.ObjectiveLensPower,
        s.max_TotalPixelMatrixColumns as width,
        s.max_TotalPixelMatrixRows as height
    FROM sm_index s
    JOIN index i ON s.SeriesInstanceUID = i.SeriesInstanceUID
    WHERE i.collection_id = 'tcga_coad'
        AND s.ObjectiveLensPower >= 20
    ORDER BY i.series_size_MB ASC
    LIMIT 5
""")

selected = candidates.iloc[0]
series_uid = selected['SeriesInstanceUID']
print(f"Selected: {selected['PatientID']}, {selected['size_mb']} MB")
print(f"  Dimensions: {selected['width']}x{selected['height']} @ {selected['ObjectiveLensPower']}x")

In [None]:
download_dir = './slides'
os.makedirs(download_dir, exist_ok=True)

idc_client.download_from_selection(
    downloadDir=download_dir,
    seriesInstanceUID=[series_uid],
    dirTemplate='%SeriesInstanceUID'
)

slide_path = os.path.join(download_dir, series_uid)
reader = WSIReader.open(slide_path)
print(f"Opened: {type(reader).__name__}, dimensions: {reader.info.slide_dimensions}")

In [None]:
# Show slide thumbnail
thumbnail = reader.slide_thumbnail(resolution=1.25, units="power")
plt.figure(figsize=(10, 8))
plt.imshow(thumbnail)
plt.title(f"Slide Thumbnail ({selected['PatientID']})", fontsize=14)
plt.axis('off')
plt.show()

## 2. Run Patch Classification with PatchPredictor

TIAToolbox's `PatchPredictor` supports three modes:
- **`patch`**: Classify individual images/patches
- **`tile`**: Classify a larger image tile
- **`wsi`**: Classify an entire whole slide image with automatic patch extraction

We'll use **WSI mode**, which automatically extracts patches from tissue regions and classifies them.
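
To make the idea of automatic patch extraction concrete, here is a minimal sketch of how a regular patch grid can be laid over an image. It is a simplification under stated assumptions, not TIAToolbox's implementation: `patch_grid` is a hypothetical helper, the image size is made up, and no tissue mask is applied, so every grid position is kept.

```python
def patch_grid(image_size, patch_size, stride):
    """List (x_start, y_start, x_end, y_end) boxes on a regular grid.

    Boxes in the last row/column may extend past the image edge; a real
    pipeline would pad or drop them and would also skip non-tissue boxes.
    """
    width, height = image_size
    patch_w, patch_h = patch_size
    stride_w, stride_h = stride
    boxes = []
    for y in range(0, height, stride_h):
        for x in range(0, width, stride_w):
            boxes.append((x, y, x + patch_w, y + patch_h))
    return boxes


# Hypothetical 1000 x 500 image, non-overlapping 224 x 224 patches
boxes = patch_grid((1000, 500), (224, 224), (224, 224))
print(len(boxes), boxes[0], boxes[-1])
```

Setting the stride equal to the patch size, as this notebook does, gives non-overlapping patches; a smaller stride would produce overlapping patches and a denser prediction map at a higher compute cost.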

### About the Kather100K Model

The `resnet18-kather100k` model is a ResNet-18 architecture trained on the [NCT-CRC-HE-100K dataset](https://zenodo.org/record/1214456) for 9-class tissue type classification in colorectal cancer histology.
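
The patch settings used later in this notebook (224 x 224 pixels at 0.5 mpp) match the NCT-CRC-HE-100K training images, which are 224 x 224 pixels at 0.5 microns per pixel. The arithmetic below sketches what that means physically; the helper names and the 0.25 mpp baseline value are illustrative assumptions, not properties of any particular slide.

```python
def patch_side_um(patch_px: int, mpp: float) -> float:
    """Physical side length of a square patch in micrometres."""
    return patch_px * mpp


def patch_side_at_baseline(patch_px: int, target_mpp: float, baseline_mpp: float) -> int:
    """Side length in baseline pixels of a patch defined at target_mpp."""
    return round(patch_px * target_mpp / baseline_mpp)


# A 224 px patch at 0.5 mpp covers 112 micrometres of tissue per side.
print(patch_side_um(224, 0.5))

# On a hypothetical slide scanned at 0.25 mpp, the same patch spans 448 baseline pixels.
print(patch_side_at_baseline(224, 0.5, 0.25))
```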

In [None]:
# Initialize PatchPredictor with pretrained model
predictor = PatchPredictor(
    pretrained_model="resnet18-kather100k",
    batch_size=64,
)

print("Model: resnet18-kather100k")
print(f"Classes: {predictor.labels}")

In [None]:
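# Run prediction in WSI mode
# This will automatically extract patches from tissue regions and classify them
# Note (assumption about TIAToolbox 1.x behavior): predict() may expect save_dir
# not to exist yet; if a re-run fails with FileExistsError, delete it first.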
output = predictor.predict(
    imgs=[slide_path],
    mode="wsi",
    save_dir="./patch_pred_results/",
    patch_input_shape=(224, 224),
    stride_shape=(224, 224),
    resolution=0.5,
    units="mpp",
    on_gpu=(device == "cuda"),
)

print("Prediction complete!")
print(f"Number of patches classified: {len(output[0]['predictions'])}")

## 3. Visualize Classification Results

Let's create a color-coded tissue classification map showing the predicted tissue type at each patch location.
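
Drawing patches on the thumbnail comes down to rescaling each patch box from slide coordinates to thumbnail coordinates. The sketch below shows that step in isolation; `scale_box` is a hypothetical helper and the sizes are made-up numbers. The one requirement is that `src_size` is expressed in the same coordinate space as the box itself.

```python
def scale_box(box, src_size, dst_size):
    """Rescale an (x1, y1, x2, y2) box from a source image size to a target size.

    src_size and dst_size are (width, height). src_size must describe the
    same coordinate space that the box coordinates are expressed in.
    """
    x1, y1, x2, y2 = box
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    sx = dst_w / src_w
    sy = dst_h / src_h
    return (int(x1 * sx), int(y1 * sy), int(x2 * sx), int(y2 * sy))


# Hypothetical 40000 x 30000 slide and 2500 x 1875 thumbnail (scale 1/16)
print(scale_box((2240, 4480, 2464, 4704), (40000, 30000), (2500, 1875)))
```

At this scale a 224-pixel patch shrinks to 14 thumbnail pixels per side, which is why the resulting map looks blocky.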

In [None]:
# Extract predictions and coordinates
predictions = output[0]['predictions']
coordinates = output[0]['coordinates']

# Kather100K class names and colors
class_names = predictor.labels
class_colors = {
    'ADI': [0.9, 0.8, 0.2],   # Adipose - yellow
    'BACK': [0.9, 0.9, 0.9],  # Background - light gray
    'DEB': [0.5, 0.3, 0.1],   # Debris - brown
    'LYM': [0.2, 0.6, 0.9],   # Lymphocytes - blue
    'MUC': [0.8, 0.4, 0.8],   # Mucus - purple
    'MUS': [0.9, 0.5, 0.3],   # Smooth muscle - orange
    'NORM': [0.3, 0.8, 0.3],  # Normal mucosa - green
    'STR': [0.6, 0.6, 0.6],   # Stroma - gray
    'TUM': [0.9, 0.2, 0.2],   # Tumor - red
}

print(f"Class names: {class_names}")
print("\nPrediction distribution:")
unique, counts = np.unique(predictions, return_counts=True)
for cls_idx, count in zip(unique, counts):
    name = class_names[cls_idx]
    pct = count / len(predictions) * 100
    print(f"  {name}: {count} patches ({pct:.1f}%)")

In [None]:
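# Create a prediction overlay on the thumbnail.
# Assumption: patch coordinates are expressed at the prediction resolution
# (0.5 mpp) rather than at baseline, so scale from the slide dimensions
# at that resolution.
slide_w, slide_h = reader.slide_dimensions(resolution=0.5, units="mpp")
thumb_h, thumb_w = thumbnail.shape[:2]

# Create an overlay image
overlay = np.ones((thumb_h, thumb_w, 3), dtype=np.float32) * 0.95

for pred, coord in zip(predictions, coordinates):
    # coord is (x_start, y_start, x_end, y_end) at the prediction resolution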
    x1 = int(coord[0] * thumb_w / slide_w)
    y1 = int(coord[1] * thumb_h / slide_h)
    x2 = int(coord[2] * thumb_w / slide_w)
    y2 = int(coord[3] * thumb_h / slide_h)

    class_name = class_names[pred]
    color = class_colors.get(class_name, [0.5, 0.5, 0.5])
    overlay[y1:y2, x1:x2] = color

# Blend with thumbnail
alpha = 0.5
blended = (alpha * overlay + (1 - alpha) * thumbnail / 255.0)
blended = np.clip(blended, 0, 1)

fig, axes = plt.subplots(1, 2, figsize=(20, 8))

axes[0].imshow(thumbnail)
axes[0].set_title("Original Slide", fontsize=14)
axes[0].axis('off')

axes[1].imshow(blended)
axes[1].set_title("Tissue Classification Map", fontsize=14)
axes[1].axis('off')

# Add legend
legend_patches = [
    mpatches.Patch(color=color, label=name)
    for name, color in class_colors.items()
    if name != 'BACK'
]
axes[1].legend(handles=legend_patches, loc='lower right', fontsize=9,
               framealpha=0.9, ncol=2)

plt.tight_layout()
plt.show()

## 4. Examine Example Patches

Let's look at a random sample of patches for four of the predicted tissue types (tumor, stroma, lymphocytes, and normal mucosa) to check that the classifications look reasonable.
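
The next cell samples patches at random rather than by confidence. If per-patch class probabilities are available (an assumption here; for instance, they might be requested through the predictor's `return_probabilities` option), the most confident examples for a class could be picked as sketched below. `top_confident` is a hypothetical helper and the probabilities are toy values.

```python
def top_confident(probabilities, class_idx, k):
    """Indices of up to k patches predicted as class_idx, most confident first.

    probabilities is a sequence of per-patch class probability sequences.
    """
    scored = []
    for i, probs in enumerate(probabilities):
        # The predicted class is the one with the highest probability
        predicted = max(range(len(probs)), key=lambda j: probs[j])
        if predicted == class_idx:
            scored.append((probs[class_idx], i))
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


# Toy two-class example: patches 0, 2 and 3 are predicted as class 1
toy_probabilities = [[0.1, 0.9], [0.6, 0.4], [0.2, 0.8], [0.05, 0.95]]
print(top_confident(toy_probabilities, class_idx=1, k=2))
```

The returned indices could then be used to look up the matching entries in `coordinates` in place of the random sample.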

In [None]:
# Show example patches for select tissue types
types_to_show = ['TUM', 'STR', 'LYM', 'NORM']
n_examples = 4

fig, axes = plt.subplots(len(types_to_show), n_examples, figsize=(4 * n_examples, 4 * len(types_to_show)))

for row, tissue_type in enumerate(types_to_show):
    type_idx = class_names.index(tissue_type)
    matching = [(i, c) for i, (p, c) in enumerate(zip(predictions, coordinates)) if p == type_idx]

    # Sample up to n_examples
    np.random.seed(42)
    if len(matching) > n_examples:
        sample_indices = np.random.choice(len(matching), n_examples, replace=False)
        matching = [matching[i] for i in sample_indices]

    for col in range(n_examples):
        ax = axes[row][col]
        if col < len(matching):
            _, coord = matching[col]
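            # Read the patch at the prediction resolution.
            # Assumption: coord is expressed at 0.5 mpp (not baseline),
            # so tell read_bounds to interpret it in that space.
            patch = reader.read_bounds(
                bounds=coord,
                resolution=0.5,
                units="mpp",
                coord_space="resolution",
            )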
            ax.imshow(patch)
        ax.axis('off')
        if col == 0:
            # axis('off') hides axis labels, so label the row with a title instead
            ax.set_title(tissue_type, fontsize=14, loc='left')

plt.suptitle("Example Patches by Predicted Tissue Type", fontsize=14, y=1.01)
plt.tight_layout()
plt.show()

## 5. View in SLIM Viewer

Compare the computational predictions with the actual slide using IDC's interactive viewer.

In [None]:
viewer_url = idc_client.get_viewer_URL(seriesInstanceUID=series_uid)
print("View this slide in SLIM viewer:")
print(viewer_url)

## Summary

In this notebook, we learned how to:

- Use `PatchPredictor` with the pretrained `resnet18-kather100k` model for 9-class tissue classification
- Run inference in **WSI mode**, which automatically handles patch extraction from tissue regions
- Visualize predictions as a color-coded tissue classification map overlaid on the slide
- Inspect individual patches to verify classification quality

**Available pretrained models** for patch classification include ResNet, DenseNet, and MobileNet variants trained on:
- **Kather100K**: 9-class colorectal tissue classification
- **PCam**: Binary detection of metastatic tissue in lymph node patches

**Next:** [Notebook 05](05_semantic_segmentation.ipynb) demonstrates pixel-level tissue region segmentation.

## Acknowledgments

- **IDC:** Fedorov, A., et al. "National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence." *RadioGraphics* 43.12 (2023). https://doi.org/10.1148/rg.230180
- **TIAToolbox:** Pocock, J., et al. "TIAToolbox as an end-to-end library for advanced tissue image analytics." *Communications Medicine* 2, 120 (2022). https://doi.org/10.1038/s43856-022-00186-5
- **Kather100K:** Kather, J.N., et al. "Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study." *PLOS Medicine* 16.1 (2019). https://doi.org/10.1371/journal.pmed.1002730