# Data Preparation for 2D Medical Imaging

## Kidney Segmentation with PyTorch Lightning and OpenVINO™ - Part 1

This tutorial is part of a series on how to train, optimize, quantize and show live inference on a medical segmentation model. The goal is to accelerate inference on a kidney segmentation model. The [UNet](https://arxiv.org/abs/1505.04597) model is trained from scratch; the data is from [Kits19](https://github.com/neheller/kits19).

The Kits19 Nifty images are 3D files. Kidney segmentation is a relatively simple problem for neural networks, so a 2D neural network is expected to work quite well. 2D networks are smaller and easier to work with than 3D networks, and image data is easier to work with than Nifty files.

This first tutorial in the series shows how to:
 
- Load Nifty images and get the data as an array
- Apply windowing to a CT scan to increase contrast
- Convert Nifty data to 8-bit images

> Note: This will not result in the best kidney segmentation model. Optimizing the kidney segmentation model is outside the scope of this tutorial. The goal is to have a small model that works reasonably well, as a starting point.


All notebooks in this series:

- Data Preparation for 2D Segmentation of 3D Medical Data (this notebook)
- Train a 2D-UNet Medical Imaging Model with PyTorch Lightning (will be published soon)
- [Convert and Quantize a UNet Model and Show Live Inference](../110-ct-segmentation-quantize/110-ct-segmentation-quantize.ipynb)
- [Live Inference and Benchmark CT-scan data](../210-ct-scan-live-inference/210-ct-scan-live-inference.ipynb) 


## Instructions

To install the requirements for running this notebook, please follow the instructions in the README. 

Before running this notebook, you must download the Kits19 dataset, with code from https://github.com/neheller/kits19.

**This code will take a long time to run. The downloaded data takes up around 21GB of space, and the converted images around 3.5GB**. Downloading the full dataset is only required if you want to train the model yourself. To show quantization on a downloadable subset of the dataset, see the [Convert and Quantize a UNet Model and Show Live Inference](../110-ct-segmentation-quantize/110-ct-segmentation-quantize.ipynb) tutorial.


To download the dataset, first clone the repository and install the requirements. It is recommended to install the requirements in the `openvino_env` virtual environment. In short:

```
git clone https://github.com/neheller/kits19
cd kits19
pip install -r requirements.txt
python -m starter_code.get_imaging
```

If you installed the Kits19 requirements in the `openvino_env` environment, you will have installed [nibabel](https://nipy.org/nibabel/). If you get an `ImportError`, you can install nibabel in the current environment by uncommenting and running the first cell.

## Imports

In [None]:
# Uncomment this cell to install nibabel if it is not yet installed
# %pip install nibabel

In [None]:
import os
import time
from pathlib import Path
from typing import Optional, Tuple

import cv2
import matplotlib.pyplot as plt
import nibabel as nib
import numpy as np

## Settings

Set `NIFTI_PATH` to the root directory of the Nifty files. This is the directory that contains the subdirectories `case_00000` to `case_00299`, each containing _.nii.gz_ data. `FRAMES_DIR` should point to the directory where the frames will be saved.

In [None]:
# Adjust NIFTI_PATH to directory that contains case_00000 to case_00299 files with .nii.gz data
NIFTI_PATH = Path("~/kits19/data").expanduser()
FRAMES_DIR = "kits19_frames"

# This assert checks that the directory exists, but not that the data in it is correct
assert NIFTI_PATH.exists(), f"NIFTI_PATH {NIFTI_PATH} does not exist"
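
The assert above only confirms that the directory exists. As a minimal sketch of a stricter check, assuming each usable case directory should contain both `imaging.nii.gz` and `segmentation.nii.gz`, the hypothetical helper `find_incomplete_cases` (not part of the Kits19 code) lists the case directories where one of the two files is missing:

```python
from pathlib import Path
from typing import List


def find_incomplete_cases(nifti_path: Path) -> List[str]:
    """Return names of case_* directories that miss the image or the mask file."""
    # Assumption: these are the two file names expected in every case directory
    required = ("imaging.nii.gz", "segmentation.nii.gz")
    incomplete = []
    for case_dir in sorted(Path(nifti_path).glob("case_*")):
        if not case_dir.is_dir():
            continue
        if not all((case_dir / name).exists() for name in required):
            incomplete.append(case_dir.name)
    return incomplete
```

Calling `find_incomplete_cases(NIFTI_PATH)` returns an empty list if every case has both files. Cases that ship without a segmentation file will be listed; this is not necessarily an error, because the conversion later in this notebook only processes cases that have a `segmentation.nii.gz` file.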

## Show One CT-scan

Let's load one CT-scan and visualize the scan and the label.

In [None]:
mask_path = NIFTI_PATH / "case_00002/segmentation.nii.gz"
image_path = mask_path.with_name("imaging.nii.gz")
nii_mask = nib.load(mask_path)
nii_image = nib.load(image_path)

mask_data = nii_mask.get_fdata()
image_data = nii_image.get_fdata()
print(image_data.shape)

A CT-scan is a 3D image. To visualize this in 2D, we can create slices, or frames. This can be done in three [anatomical planes](https://en.wikipedia.org/wiki/Anatomical_plane): from the front (coronal), from the side (sagittal), or from the top (axial).

Since a kidney is relatively small, most pixels do not contain kidney data. For an indication, let's check the fraction of pixels that contain kidney data, by dividing the number of non-zero pixels by the total number of pixels in the scan. 

In [None]:
np.count_nonzero(mask_data) / np.size(mask_data)

This number shows that in this particular scan, less than one percent of all pixels in the scan belong to a kidney.
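
The non-zero fraction above combines all annotated labels. A small sketch of how the fraction could be split per label, using a synthetic mask instead of `mask_data`; it assumes the label convention 0 = background, 1 = kidney, 2 = tumor, and `label_fractions` is a hypothetical helper:

```python
import numpy as np


def label_fractions(mask: np.ndarray) -> dict:
    """Map each label value in the mask to its fraction of all pixels."""
    labels, counts = np.unique(mask, return_counts=True)
    return {int(label): float(count) / mask.size for label, count in zip(labels, counts)}


# Synthetic 4x4x4 mask (64 pixels): 4 kidney pixels (1) and 2 tumor pixels (2)
demo_mask = np.zeros((4, 4, 4))
demo_mask[1, 1:3, 1:4] = 1
demo_mask[1, 1, 1:3] = 2

fractions = label_fractions(demo_mask)
print(fractions)
```

The fractions of all non-zero labels add up to the value of `np.count_nonzero(mask) / np.size(mask)`.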

We find the frames that contain the most pixels annotated as kidney, and show the kidney from all three sides.

In [None]:
z = np.argmax([np.count_nonzero(item) for item in mask_data])
x = np.argmax([np.count_nonzero(item) for item in np.transpose(mask_data, (1, 2, 0))])
y = np.argmax([np.count_nonzero(item) for item in np.transpose(mask_data, (2, 1, 0))])
print(z, x, y)
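
The list comprehensions above count the non-zero pixels in every slice along one axis. As a sketch of an equivalent vectorized form, `np.count_nonzero` accepts an `axis` argument, which avoids the transposes. The example uses a synthetic array (`demo_mask`) as a stand-in for `mask_data`:

```python
import numpy as np

# Synthetic stand-in for mask_data
demo_mask = np.zeros((5, 6, 7))
demo_mask[2, 1:4, 2:5] = 1  # a 3x3 block in slice 2 along axis 0
demo_mask[3, 2, 3] = 1  # one extra pixel in slice 3 along axis 0

# Count non-zero pixels per slice along each axis, then take the fullest slice
z = int(np.argmax(np.count_nonzero(demo_mask, axis=(1, 2))))
x = int(np.argmax(np.count_nonzero(demo_mask, axis=(0, 2))))
y = int(np.argmax(np.count_nonzero(demo_mask, axis=(0, 1))))
print(z, x, y)
```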

In [None]:
def show_slices(z: int, x: int, y: int):
    fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(12, 6))
    ax[0, 0].imshow(image_data[z], cmap="gray")
    ax[1, 0].imshow(mask_data[z], cmap="gray", vmin=0, vmax=2)
    ax[0, 1].imshow(image_data[:, x, :], cmap="gray")
    ax[1, 1].imshow(mask_data[:, x, :], cmap="gray", vmin=0, vmax=2)
    ax[0, 2].imshow(image_data[:, :, y], cmap="gray")
    ax[1, 2].imshow(mask_data[:, :, y], cmap="gray", vmin=0, vmax=2);

In [None]:
show_slices(z, x, y)

The image above shows three slices, from three different perspectives, at different places in the body. The middle slice shows two colors, indicating that both a kidney and a tumor were annotated in this slice.

## Apply Window-Level to Increase Contrast

CT-scan data can contain a large range of pixel values. This means that the contrast in the slices shown above is low. We show histograms to visualize the distribution of the pixel values. We then apply a soft tissue window level to increase the contrast for soft tissue in the visualization. See [Radiopaedia](https://radiopaedia.org/articles/windowing-ct) for information on windowing CT-scan data.

In [None]:
fig, axs = plt.subplots(nrows=1, ncols=3, figsize=(15, 4))
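# Flatten each slice so that every subplot shows one histogram of all pixel values
axs[0].hist(image_data[z].ravel())
axs[1].hist(image_data[:, x, :].ravel())
axs[2].hist(image_data[:, :, y].ravel());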

In [None]:
# (-125,225) is a suitable level for visualizing soft tissue
window_start = -125
window_end = 225
image_data[image_data < window_start] = window_start
image_data[image_data > window_end] = window_end
show_slices(z, x, y)
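
The cell above modifies `image_data` in place. A minimal sketch of the same windowing with `np.clip`, which returns a new array and leaves the input unchanged. Windows are also often specified as a center and a width; the hypothetical helper `center_width_to_range` converts that form to the start and end values used in this notebook, where (-125, 225) corresponds to a center of 50 and a width of 350:

```python
import numpy as np


def apply_window(data: np.ndarray, window_start: float, window_end: float) -> np.ndarray:
    """Return a clipped copy of data; the input array is left unchanged."""
    return np.clip(data, window_start, window_end)


def center_width_to_range(center: float, width: float) -> tuple:
    """Convert a window given as (center, width) to (start, end)."""
    return center - width / 2, center + width / 2


demo = np.array([-1000.0, -125.0, 40.0, 225.0, 3000.0])
window_range = center_width_to_range(center=50, width=350)
windowed = apply_window(demo, *window_range)
print(window_range, windowed)
```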

## Extract Slices from Nifty Data

The `save_kits19_frames` function takes the path to one _.nii.gz_ segmentation mask (`mask_path`) as an argument, and converts the mask and the corresponding image to a series of frames that are saved as JPG (for images) and PNG (for masks).

In [None]:
def save_kits19_frames(
    mask_path: Path,
    root_dir: os.PathLike,
    window_level: Optional[Tuple] = None,
    make_binary: bool = True,
):
    """
    Save Kits19 CT-scans to image files, optionally applying a window level.
    Images and masks are saved in a subdirectory of root_dir: case_XXXXX.
    Images are saved in imaging_frames, masks in segmentation_frames, which are
    both subdirectories of the case directory.
    Frames are taken in the axial direction.

    :param mask_path: Path to segmentation.nii.gz file. The corresponding imaging.nii.gz
                      file should be in the same directory.
    :param root_dir: Root directory to save the generated image files. Will be generated
                     if it does not exist
    :param window_level: Window level (start, end) to apply to the data before saving
    :param make_binary: If true, create a binary mask where all non-zero pixels are
                        considered to be "foreground" pixels and get pixel value 1.
    """
    start_time = time.time()

    Path(root_dir).mkdir(exist_ok=True)
    image_path = mask_path.with_name("imaging.nii.gz")

    assert mask_path.exists(), f"mask_path {mask_path} does not exist!"
    assert image_path.exists(), f"image_path {image_path} does not exist!"

    nii_mask = nib.load(mask_path)
    nii_image = nib.load(image_path)

    mask_data = nii_mask.get_fdata()
    image_data = nii_image.get_fdata()

    assert mask_data.shape == image_data.shape, f"Mask and image shape of {mask_path} are not equal"
    if make_binary:
        mask_data[mask_data > 0] = 1

    if window_level is not None:
        window_start, window_end = window_level
        image_data[image_data < window_start] = window_start
        image_data[image_data > window_end] = window_end

    image_directory = Path(root_dir) / mask_path.parent.name / "imaging_frames"
    mask_directory = Path(root_dir) / mask_path.parent.name / "segmentation_frames"
    image_directory.parent.mkdir(exist_ok=True)
    image_directory.mkdir(exist_ok=True)
    mask_directory.mkdir(exist_ok=True)

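    for i, (mask_frame, image_frame) in enumerate(zip(mask_data, image_data)):
        # Scale each frame to [0, 255]; the max() guards against division by zero
        # for frames in which all pixels have the same value
        frame_min, frame_max = image_frame.min(), image_frame.max()
        image_frame = (image_frame - frame_min) / max(frame_max - frame_min, 1e-8)
        image_frame = image_frame * 255
        image_frame = image_frame.astype(np.uint8)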

        new_image_path = str(image_directory / f"{mask_path.parent.name}_{i:04d}.jpg")
        new_mask_path = str(mask_directory / f"{mask_path.parent.name}_{i:04d}.png")
        cv2.imwrite(new_image_path, image_frame)
        # get_fdata() returns floating point data; save the mask as 8-bit integers
        cv2.imwrite(new_mask_path, mask_frame.astype(np.uint8))

    end_time = time.time()
    print(
        f"Saved {mask_path.parent.name} with {mask_data.shape[0]} frames "
        f"in {end_time-start_time:.2f} seconds"
    )
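
The function above scales each frame by its own minimum and maximum, so the same intensity can map to different gray values in different frames. As an alternative sketch, not the method used in this notebook, the hypothetical helper `window_to_uint8` maps the window bounds themselves to 0 and 255, assuming a window level is always provided:

```python
import numpy as np


def window_to_uint8(frame: np.ndarray, window_start: float, window_end: float) -> np.ndarray:
    """Clip frame to the window and map [window_start, window_end] to [0, 255]."""
    if window_end <= window_start:
        raise ValueError("window_end must be larger than window_start")
    clipped = np.clip(frame, window_start, window_end)
    scaled = (clipped - window_start) / (window_end - window_start) * 255
    return np.round(scaled).astype(np.uint8)


# A frame with a single value (for example only air) and a frame with mixed values
uniform_frame = np.full((2, 2), -1000.0)
mixed_frame = np.array([[-125.0, 100.0], [225.0, 400.0]])

uniform_result = window_to_uint8(uniform_frame, -125, 225)
mixed_result = window_to_uint8(mixed_frame, -125, 225)
```

With this scaling a frame that contains only values below the window becomes uniformly black, and gray values are comparable across all frames of a scan. The trade-off is that frames that use only part of the window get less contrast than with per-frame scaling.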

Running the next cell will convert all Nifty files in `NIFTI_PATH` to images that are saved in `FRAMES_DIR`. A soft tissue window level of (-125, 225) is applied and the segmentation labels are converted to binary kidney segmentations.

Running this cell will take quite a long time.

In [None]:
mask_paths = sorted(NIFTI_PATH.glob("case_*/segmentation.nii.gz"))

for mask_path in mask_paths:
    save_kits19_frames(
        mask_path=mask_path, root_dir=FRAMES_DIR, window_level=(-125, 225), make_binary=True
    )
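
After the conversion, every image frame should have a mask frame with the same name. A minimal sketch of a consistency check, assuming the directory layout created by `save_kits19_frames` (`case_XXXXX/imaging_frames/*.jpg` and `case_XXXXX/segmentation_frames/*.png`); `find_unpaired_frames` is a hypothetical helper:

```python
from pathlib import Path
from typing import List


def find_unpaired_frames(frames_dir: Path) -> List[str]:
    """Return names (without extension) of frames that lack an image or a mask."""
    unpaired = []
    for case_dir in sorted(Path(frames_dir).glob("case_*")):
        images = {p.stem for p in (case_dir / "imaging_frames").glob("*.jpg")}
        masks = {p.stem for p in (case_dir / "segmentation_frames").glob("*.png")}
        # The symmetric difference holds names present in only one of the two sets
        unpaired.extend(sorted(images ^ masks))
    return unpaired
```

Calling `find_unpaired_frames(FRAMES_DIR)` should return an empty list when the conversion has completed for all cases. A non-empty list can indicate that the conversion was interrupted.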

## References

- [Kits19 Challenge Homepage](https://kits19.grand-challenge.org/)
- [Kits19 Github Repository](https://github.com/neheller/kits19)
- [The KiTS19 Challenge Data: 300 Kidney Tumor Cases with Clinical Context, CT Semantic Segmentations, and Surgical Outcomes](https://arxiv.org/abs/1904.00445)
- [The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 challenge](https://www.sciencedirect.com/science/article/pii/S1361841520301857)
