In [None]:
!pip install patho-pix

# Introduction to Patho-Pix: Whole Slide Image Preprocessing for Pathology

Welcome to the UKSH Healthcare Hackathon! We are excited to present **Patho-Pix**, a cutting-edge framework designed to streamline the preprocessing of whole slide images (WSIs) in pathology. This Jupyter notebook will guide you through the key features and functionalities of Patho-Pix, demonstrating its potential to enhance the efficiency and accuracy of digital pathology workflows.

## Why Whole Slide Imaging?

Whole slide imaging (WSI) represents a significant advancement in pathology, allowing for the digitization of entire histological slides at high resolution. This technology facilitates remote diagnostics, educational initiatives, and computational pathology research. However, the sheer size and complexity of WSIs pose substantial challenges for data management, processing, and analysis.

## The Need for Preprocessing

Effective preprocessing of WSIs is crucial for several reasons:
- **Quality Enhancement**: Enhancing image quality by correcting artifacts, normalizing stains, and balancing colors ensures more reliable analyses.
- **Data Reduction**: Reducing the data size through techniques like tiling and compression allows for more manageable storage and faster processing.
- **Feature Extraction**: Identifying and isolating regions of interest (ROIs) facilitates targeted analyses and reduces computational load.

## Introducing Patho-Pix

Patho-Pix is designed to address these challenges by providing a comprehensive suite of tools for WSI preprocessing. Our framework includes functionalities such as:

- **Tissue Masks & ROI Detection**: Automatically identify and extract regions of interest based on tissue masks for further examination.
- **Image Tiling**: Divide large WSIs into smaller, more manageable tiles for focused analysis.
- **Stain Normalization**: Standardize staining across slides to minimize variability and enhance visual consistency.
- **FUTURE OUTLOOK: Artifact Removal**: Automatically detect and correct common artifacts in pathology slides.

## Demonstration Overview

In this notebook, we will walk you through the core capabilities of Patho-Pix, showcasing its application to sample WSIs. You will see how our framework can transform raw slide images into preprocessed data ready for analysis. The demonstration will cover the following steps:

1. **Loading and Visualizing WSIs**: Importing whole slide images and displaying them for initial inspection.
2. **Tiling and ROI Extraction**: Segmenting the WSIs into tiles and extracting regions of interest based on tissue masks for detailed analysis.
3. **Metadata Extraction**: Collecting metadata beyond the image data itself, such as the tissue percentage per tile, for use in later artificial intelligence models.
4. **Stain Normalization**: Standardizing the color profiles of the slides to ensure uniformity.

By the end of this demonstration, you will have a clear understanding of how Patho-Pix can be integrated into digital pathology workflows to enhance the preprocessing of whole slide images, ultimately leading to more accurate and efficient diagnostic and research outcomes.

Let's get started!

## 0. Setup

In [None]:
from sys import platform

if platform == "linux" or platform == "linux2":
    !apt update && apt install -y openslide-tools
elif platform == "darwin":
    !brew install openslide

In [None]:
import requests
import os
import tempfile
from patho_pix.utils import convert_jpeg_to_tiff

# Download images
url_img = "http://glioblastoma.alleninstitute.org/cgi-bin/imageservice?path=" + \
           "/external/gbm/prod0/0534338971/0534338971.aff&mime=1&fileout=100125374_2." + \
           "jpg&zoom=9&top=20608&left=55168&width=15040&height=18048"
url_mask = "http://glioblastoma.alleninstitute.org/cgi-bin/imageservice?path=" + \
          "/external/gbm/prod0/0534338827/0534338827_annotation.aff&mime=1" + \
          "&fileout=100122048_1.jpg&zoom=9&top=20224&left=57888&width=15040&height=18048"

# Create temporary directory for dummy data
tmp_data = tempfile.TemporaryDirectory(prefix="tmp.patho-pix.")
# Download dummy image; raise immediately on an HTTP error instead of
# leaving path_img undefined for the cells below
print("Downloading dummy image")
response = requests.get(url_img)
response.raise_for_status()
path_img = os.path.join(tmp_data.name, "image.jpg")
with open(path_img, "wb") as fd:
    fd.write(response.content)
# Download dummy mask
print("Downloading dummy mask")
response = requests.get(url_mask)
response.raise_for_status()
path_mask = os.path.join(tmp_data.name, "mask.jpg")
with open(path_mask, "wb") as fd:
    fd.write(response.content)
# Convert both JPEGs to TIFF and point the paths at the converted files
convert_jpeg_to_tiff(path_img, path_img.replace(".jpg", ".tiff"))
convert_jpeg_to_tiff(path_mask, path_mask.replace(".jpg", ".tiff"))
path_img = path_img.replace(".jpg", ".tiff")
path_mask = path_mask.replace(".jpg", ".tiff")

## 1. Loading and Visualizing WSIs

In this chapter, we will cover the process of importing whole slide images (WSIs) into our Patho-Pix framework and displaying them for initial inspection. This step is crucial as it allows us to visually assess the quality and characteristics of the slides before applying any preprocessing techniques. We will demonstrate how to load WSIs from a common pathology file format (BigTIFF) and how to inspect the high-resolution images as simple thumbnails.
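
To make the idea of a thumbnail concrete, here is a minimal sketch that uses only Pillow and NumPy rather than Patho-Pix. The synthetic "slide", the file name `demo.tiff`, and the sizes are assumptions chosen for illustration; the Patho-Pix loaders used below may work differently internally.

```python
import os
import tempfile

import numpy as np
from PIL import Image

# Synthetic stand-in for a slide: white background with a darker "tissue" block.
pixels = np.full((1024, 2048, 3), 255, dtype=np.uint8)
pixels[256:768, 512:1536] = (180, 90, 160)

with tempfile.TemporaryDirectory() as tmp_dir:
    path_demo = os.path.join(tmp_dir, "demo.tiff")
    Image.fromarray(pixels).save(path_demo)

    with Image.open(path_demo) as slide:
        full_size = slide.size  # (width, height)
        thumb = slide.copy()    # copy() loads the pixel data into memory
        # thumbnail() shrinks in place and preserves the aspect ratio.
        thumb.thumbnail((256, 256))

print(full_size, "->", thumb.size)
```

The thumbnail keeps the 2:1 aspect ratio of the full image, so only the longer side reaches the requested 256 pixels.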

In [None]:
# Load scans
from patho_pix.io import load_mask, load_wsi

tile_dir_img = tempfile.TemporaryDirectory(prefix="tmp.patho-pix.img")
wsi = load_wsi(path_img, tile_dir_img.name)

tile_dir_mask = tempfile.TemporaryDirectory(prefix="tmp.patho-pix.mask")
mask = load_mask(path_mask, tile_dir_mask.name)

In [None]:
# Create thumbnail for image
wsi.thumbnail

In [None]:
# Create thumbnail for mask
mask.thumbnail

## 2. Tiling and ROI Extraction via Tissue Mask

In this chapter, we will delve into the process of segmenting whole slide images (WSIs) into smaller, manageable tiles and extracting regions of interest (ROIs). Tiling is essential for handling the large size of WSIs, making it easier to focus on specific areas for detailed analysis. We will demonstrate how Patho-Pix automates the tiling process and efficiently identifies and extracts ROIs, ensuring that critical pathological features are highlighted for further examination.
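
As a rough illustration of mask-based tiling, the sketch below cuts an image into a non-overlapping grid and keeps only tiles whose mask coverage reaches a threshold. The helper names `iter_tile_origins` and `extract_roi_tiles` and the toy arrays are hypothetical; this is not the Patho-Pix implementation, only the general idea under simple assumptions (full tiles only, binary mask).

```python
import numpy as np


def iter_tile_origins(height, width, tile_size):
    """Yield the (top, left) corner of every full tile in a non-overlapping grid."""
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            yield top, left


def extract_roi_tiles(image, mask, tile_size, min_tissue_percent):
    """Return {(top, left): image_tile} for tiles with enough mask coverage."""
    roi_tiles = {}
    for top, left in iter_tile_origins(*mask.shape[:2], tile_size):
        mask_tile = mask[top:top + tile_size, left:left + tile_size]
        tissue_percent = 100.0 * np.count_nonzero(mask_tile) / mask_tile.size
        if tissue_percent >= min_tissue_percent:
            roi_tiles[(top, left)] = image[top:top + tile_size, left:left + tile_size]
    return roi_tiles


# Toy 8x8 example with 4x4 tiles.
toy_image = np.zeros((8, 8, 3), dtype=np.uint8)
toy_mask = np.zeros((8, 8), dtype=bool)
toy_mask[0:4, 0:4] = True  # top-left tile is 100 % tissue
toy_mask[4, 4] = True      # bottom-right tile is 1/16 = 6.25 % tissue

toy_roi_tiles = extract_roi_tiles(toy_image, toy_mask, tile_size=4, min_tissue_percent=10.0)
print(list(toy_roi_tiles))
```

With a 10 % threshold, only the top-left tile is kept; the bottom-right tile falls below the threshold and the other two contain no tissue at all.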


In [None]:
# preview tiling
from patho_pix.utils import AwesomeTiler
wsi_tiler = AwesomeTiler(
    tile_size=(1024, 1024),
    check_tissue=True,
    tissue_percent=10.0,
    prefix="patho-pix.",
    suffix=".png",
)
wsi_tiler.locate_tiles(wsi)

In [None]:
# run tiling
from patho_pix.tiling import tile_wsi_mask
metadata = tile_wsi_mask(wsi, mask)

In [None]:
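# demonstrate tile example
from PIL import Image
# os.listdir() returns files in arbitrary order, so sort for a reproducible pick
tile_files = sorted(os.listdir(tile_dir_img.name))
# fall back to the last tile if fewer than 41 tiles were written
image = Image.open(os.path.join(tile_dir_img.name, tile_files[min(40, len(tile_files) - 1)]))
image.resize((340, 340))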

In [None]:
# show tile files
os.listdir(tile_dir_img.name)

## 3. Metadata

Metadata plays a crucial role in the analysis and interpretation of whole slide images (WSIs). It provides essential information about the image, such as dimensions, resolution, and tissue composition. One important aspect of metadata extraction is determining the tissue percentage per tile, which involves calculating the proportion of tissue versus non-tissue areas within each segmented tile. This metric helps pathologists and researchers focus on the most relevant regions, facilitating more accurate diagnoses and analyses. Patho-Pix includes tools to automatically extract and analyze such metadata, enhancing the efficiency and effectiveness of digital pathology workflows.
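
When no annotation mask is available, a tissue percentage can also be estimated from the tile itself. The sketch below assumes a bright, near-white slide background and counts every darker pixel as tissue. The function name `tissue_percentage` and the threshold of 220 are illustrative assumptions, not values taken from Patho-Pix.

```python
import numpy as np


def tissue_percentage(tile_rgb, background_threshold=220):
    """Percentage of pixels darker than the (assumed) bright slide background."""
    gray = tile_rgb.mean(axis=2)  # simple per-pixel average of R, G and B
    tissue_pixels = np.count_nonzero(gray < background_threshold)
    return 100.0 * tissue_pixels / gray.size


# Toy tile: upper half stained "tissue", lower half white background.
toy_tile = np.full((4, 4, 3), 255, dtype=np.uint8)
toy_tile[:2, :] = (150, 60, 140)

toy_metadata = {"tile_0_0": tissue_percentage(toy_tile)}
print(toy_metadata)  # {'tile_0_0': 50.0}
```

A fixed threshold is fragile across scanners and stains; it is used here only to keep the example short.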

In [None]:
# extract metadata
import pandas as pd
df = pd.DataFrame.from_dict(metadata, orient="index", columns=["percentage_tissue"])
print(df)


## 4. Stain Normalization

This step is planned as future work and is not demonstrated in this notebook.
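
To give a rough idea of what stain normalization does, the following sketch matches the per-channel mean and standard deviation of one tile to those of a reference tile, directly in RGB space. This is a deliberate simplification and not a Patho-Pix feature: published methods such as Reinhard or Macenko normalization work in other color representations and are more robust. The function name `match_channel_statistics` and the random tiles are hypothetical.

```python
import numpy as np


def match_channel_statistics(source, target):
    """Shift and scale each RGB channel of `source` to the mean/std of `target`."""
    src = source.astype(np.float64)
    tgt = target.astype(np.float64)
    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1))
    tgt_mean, tgt_std = tgt.mean(axis=(0, 1)), tgt.std(axis=(0, 1))
    # Guard against flat channels (std == 0) to avoid division by zero.
    src_std = np.where(src_std == 0, 1.0, src_std)
    normalized = (src - src_mean) / src_std * tgt_std + tgt_mean
    return np.clip(np.rint(normalized), 0, 255).astype(np.uint8)


# Random stand-ins for a tile to normalize and a reference tile.
rng = np.random.default_rng(seed=0)
source_tile = rng.integers(100, 200, size=(64, 64, 3), dtype=np.uint8)
target_tile = rng.integers(50, 150, size=(64, 64, 3), dtype=np.uint8)

normalized_tile = match_channel_statistics(source_tile, target_tile)
print(normalized_tile.mean(axis=(0, 1)), target_tile.mean(axis=(0, 1)))
```

After normalization, the channel means of the source tile should lie close to those of the reference tile, up to rounding.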