## Introduction

This tutorial shows how to use KvikIO to accelerate the loading of NIfTI images. It also uses the `nibabel` library to parse this medical image format.

### Common Medical Image Formats

Medical image files carry extensive metadata alongside the voxel data, such as patient information and imaging parameters, which makes them more complex to read than plain image formats.

NIfTI (Neuroimaging Informatics Technology Initiative) is one of the most common formats:

- **Description**: A popular format for storing brain imaging data, particularly in research settings. It is designed to store volumetric data and is often used in neuroimaging.
- **Usage**: Widely used in neuroscience research and supported by many neuroimaging software packages.
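
To make the format more concrete, the sketch below builds a synthetic NIfTI-1 header and parses a few of its fixed-offset fields using only the standard library. The 348-byte header size and the field offsets follow the NIfTI-1 layout; the helper names `make_example_header` and `parse_header` are hypothetical, and the code assumes a little-endian, single-file `.nii` header. In practice `nibabel` handles this parsing, including byte-order detection.

```python
import struct

NIFTI1_HEADER_SIZE = 348  # fixed size of a NIfTI-1 header in bytes


def make_example_header(shape, datatype_code, bitpix, vox_offset=352.0):
    """Build a minimal synthetic little-endian NIfTI-1 header (illustration only)."""
    buf = bytearray(NIFTI1_HEADER_SIZE)
    struct.pack_into("<i", buf, 0, NIFTI1_HEADER_SIZE)        # sizeof_hdr
    dim = [len(shape), *shape] + [1] * (7 - len(shape))
    struct.pack_into("<8h", buf, 40, *dim)                    # dim[0..7]
    struct.pack_into("<hh", buf, 70, datatype_code, bitpix)   # datatype, bitpix
    struct.pack_into("<f", buf, 108, vox_offset)              # vox_offset
    struct.pack_into("4s", buf, 344, b"n+1\x00")              # magic of a single-file .nii
    return bytes(buf)


def parse_header(raw):
    """Extract a few fields from the first 348 bytes of a little-endian NIfTI-1 file."""
    if len(raw) < NIFTI1_HEADER_SIZE:
        raise ValueError("a NIfTI-1 header needs at least 348 bytes")
    (sizeof_hdr,) = struct.unpack_from("<i", raw, 0)
    dim = struct.unpack_from("<8h", raw, 40)
    datatype, bitpix = struct.unpack_from("<hh", raw, 70)
    (vox_offset,) = struct.unpack_from("<f", raw, 108)
    (magic,) = struct.unpack_from("4s", raw, 344)
    return {
        "sizeof_hdr": sizeof_hdr,
        "shape": tuple(dim[1:1 + dim[0]]),  # dim[0] holds the number of dimensions
        "datatype": datatype,               # 4 is the NIfTI code for int16
        "bitpix": bitpix,
        "vox_offset": vox_offset,           # byte offset where voxel data starts
        "magic": magic,
    }


header = make_example_header(shape=(512, 512, 156), datatype_code=4, bitpix=16)
info = parse_header(header)
print(info["shape"], info["datatype"], info["vox_offset"])
```

The `vox_offset` field is what lets a loader skip the header and treat the rest of the file as one contiguous block of voxels.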

### Extra Library Used

#### NiBabel
- **Description**: A Python library for reading and writing medical image formats, particularly NIfTI and Analyze.
- **Usage**: Commonly used in neuroimaging research for handling NIfTI files.

### GPU Acceleration with KvikIO

KvikIO is a Python and C++ library for high-performance file I/O. It provides bindings to cuFile, which enables GPUDirect Storage (GDS), so file contents can be read directly into GPU memory. In this tutorial, we use KvikIO to load a NIfTI image into GPU memory and compare the loading time with a CPU-based load through `nibabel`.

By the end of this tutorial, you will understand:
- How to load NIfTI images using `nibabel`.
- How to accelerate the loading of these images using KvikIO.
- The performance benefits of GPU-accelerated loading for medical images.

### Setup Environment

In [1]:
# Check if nibabel is installed, if not, install it
!python -c "import nibabel" || pip install -q nibabel

In [None]:
import kvikio
import kvikio.defaults
import cupy as cp
import numpy as np
import tempfile
import nibabel as nib
import os
import requests
import tarfile
import gzip
import shutil
import io
from timeit import default_timer as timer

### Warm Up KvikIO

In [3]:
def warmup_kvikio():
    """
    Warm up KvikIO so that one-time initialization (internal buffers, cuFile,
    GDS, etc.) does not skew the timings measured later.
    """
    # Warm up cuFile with a small write/read round trip
    a = cp.arange(100)
    with tempfile.NamedTemporaryFile() as tmp_file:
        tmp_file_name = tmp_file.name
        with kvikio.CuFile(tmp_file_name, "w") as f:
            f.write(a)

        b = cp.empty_like(a)
        # Use a context manager so the read handle is closed as well
        with kvikio.CuFile(tmp_file_name, "r") as f:
            f.read(b)

    # Warm up CuPy (random number generation and a reduction kernel)
    c = cp.random.rand(100, 100, 3)
    cp.mean(c)

warmup_kvikio()

### Set KvikIO Threads

KvikIO can use multiple threads for I/O operations. The thread count can be set through the environment variable `KVIKIO_NTHREADS` or, as in the next cell, at runtime through `kvikio.defaults`; raising it may improve performance. This tutorial uses 4 threads. For more details, refer to the [official documentation](https://docs.rapids.ai/api/kvikio/nightly/runtime_settings/#thread-pool-kvikio-nthreads).
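
As a minimal sketch of the environment-variable route, the snippet below sets `KVIKIO_NTHREADS` from Python. It assumes that KvikIO reads the variable when it initializes, so the assignment is placed before `import kvikio`; that ordering is an assumption here, not something this tutorial verifies. `setdefault` keeps any value already exported in the shell.

```python
import os

# Assumption: KvikIO picks up KVIKIO_NTHREADS at initialization,
# so this must run before `import kvikio`.
os.environ.setdefault("KVIKIO_NTHREADS", "4")

nthreads = int(os.environ["KVIKIO_NTHREADS"])
print(f"KVIKIO_NTHREADS={nthreads}")
```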

In [4]:
kvikio.defaults.num_threads_reset(nthreads=4)

### NIfTI Data Preparation

For NIfTI images, we will use the [MSD Spleen dataset](https://msd-for-monai.s3-us-west-2.amazonaws.com/Task09_Spleen.tar) from the [Medical Segmentation Decathlon](http://medicaldecathlon.com/dataaws/). This dataset is commonly used for training and evaluating medical image segmentation algorithms and provides a good example of volumetric medical imaging data.

Larger files typically benefit more from GPU-accelerated loading. To compare performance on larger volumes, try images from the [MSD Liver dataset](https://msd-for-monai.s3-us-west-2.amazonaws.com/Task03_Liver.tar) in the following experiments; its volumes are larger and can better showcase the advantages of GPU acceleration.

In [5]:
temp_working_dir = tempfile.mkdtemp()

nifti_output_path = os.path.join(temp_working_dir, "Task09_Spleen.tar")
url = "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task09_Spleen.tar"
with requests.get(url, stream=True) as response:
    response.raise_for_status()  # fail early on HTTP errors instead of saving an error page
    with open(nifti_output_path, "wb") as file:
        for chunk in response.iter_content(chunk_size=8192):
            file.write(chunk)

# Extract the contents
with tarfile.open(nifti_output_path, "r") as tar:
    tar.extractall(path=temp_working_dir)

print(f"Extraction completed! Files are saved in: {temp_working_dir}")

Extraction completed! Files are saved in: /tmp/tmpcjecciqo


In [6]:
# Decompress the .nii.gz file so that the raw voxel bytes can be read directly
example_nifti_path = os.path.join(temp_working_dir, "Task09_Spleen", "imagesTr", "spleen_53.nii")
with gzip.open(example_nifti_path+".gz", "rb") as f_in:
    with open(example_nifti_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
print("a decompressed nifti file is saved at: ", example_nifti_path)

a decompressed nifti file is saved at:  /tmp/tmpcjecciqo/Task09_Spleen/imagesTr/spleen_53.nii
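
The GPU loader used later reinterprets the file's raw bytes as voxels, so it needs an uncompressed `.nii` file. As a small sketch, a guard like the one below could check for the two-byte gzip signature before loading. The helper name `is_gzipped` is hypothetical, and the files are synthetic stand-ins created in a temporary directory.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip signature."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


with tempfile.TemporaryDirectory() as workdir:
    raw_path = os.path.join(workdir, "volume.nii")
    gz_path = raw_path + ".gz"
    with open(raw_path, "wb") as fh:
        fh.write(bytes(348))  # zero-filled stand-in for an uncompressed file
    with gzip.open(gz_path, "wb") as fh:
        fh.write(bytes(348))  # the same bytes, gzip-compressed
    raw_is_gz = is_gzipped(raw_path)
    gz_is_gz = is_gzipped(gz_path)

print(raw_is_gz, gz_is_gz)  # False True
```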


### Test NIfTI Data Loading

In [7]:
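def nifti_gpu_load(filename):
    """Read an uncompressed NIfTI-1 file into GPU memory and return (array, metadata)."""
    file_size = os.path.getsize(filename)
    image = cp.empty(file_size, dtype=cp.uint8)

    # Read the whole file (header and voxel data) into a GPU byte buffer
    with kvikio.CuFile(filename, "r") as f:
        f.read(image)

    # The NIfTI-1 header is the first 348 bytes; parse it on the CPU with nibabel
    header_bytes = cp.asnumpy(image[:348])
    header = nib.Nifti1Header.from_fileobj(io.BytesIO(header_bytes))
    data_offset = header.get_data_offset()
    data_shape = header.get_data_shape()
    data_dtype = header.get_data_dtype()
    affine = header.get_best_affine()
    meta = dict(header)
    meta["affine"] = affine
    # Reinterpret the bytes after the header as voxels. NIfTI stores them in
    # Fortran (column-major) order. Unlike get_fdata(), no intensity scaling
    # (scl_slope/scl_inter) is applied here.
    return image[data_offset:].view(data_dtype).reshape(data_shape, order="F"), meta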
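
The last line of `nifti_gpu_load` relies on the voxels being laid out contiguously in column-major order after the header. The sketch below illustrates that mapping on a tiny synthetic buffer with only the standard library. The helper names `fortran_offset` and `read_voxel` are hypothetical, and the buffer assumes little-endian int16 voxels starting at byte 352.

```python
import struct


def fortran_offset(index, shape):
    """Linear element offset of `index` in a column-major (Fortran-order) array."""
    offset, stride = 0, 1
    for i, n in zip(index, shape):
        offset += i * stride
        stride *= n  # the first axis varies fastest
    return offset


def read_voxel(buf, data_offset, shape, index, fmt="<h"):
    """Read one voxel from a raw byte buffer (fmt '<h' is little-endian int16)."""
    itemsize = struct.calcsize(fmt)
    start = data_offset + fortran_offset(index, shape) * itemsize
    return struct.unpack_from(fmt, buf, start)[0]


# Synthetic file: 352 zero bytes standing in for the header and padding,
# followed by a 2x3x4 volume whose voxels are numbered 0..23 in file order.
shape = (2, 3, 4)
data_offset = 352
values = list(range(2 * 3 * 4))
buf = bytes(data_offset) + struct.pack(f"<{len(values)}h", *values)

print(read_voxel(buf, data_offset, shape, (1, 0, 0)))  # 1
print(read_voxel(buf, data_offset, shape, (0, 1, 0)))  # 2
print(read_voxel(buf, data_offset, shape, (0, 0, 1)))  # 6
```

Because the layout is a plain strided view, the GPU loader can reshape the buffer without copying or reordering any data.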

In [8]:
# Measure KvikIO GPU loading time
# The saved outputs were produced on a Tesla V100-PCIE-16GB GPU
start_gpu = timer()
img_gpu, meta_gpu = nifti_gpu_load(example_nifti_path)
print(img_gpu.shape, img_gpu.mean())
end_gpu = timer()
gpu_time = end_gpu - start_gpu
print(f"Kvikio GPU loading time: {gpu_time:.4f} seconds")

(512, 512, 156) -474.2267
Kvikio GPU loading time: 0.0505 seconds


In [9]:
# Measure CPU loading time
start_cpu = timer()
img_cpu = nib.load(example_nifti_path)
img_cpu_array = img_cpu.get_fdata()
print(img_cpu_array.shape, img_cpu_array.mean())
end_cpu = timer()
cpu_time = end_cpu - start_cpu
print(f"Normal CPU loading time: {cpu_time:.4f} seconds")

(512, 512, 156) -474.22673315879626
Normal CPU loading time: 0.1699 seconds
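
The timings above are single measurements that also include the `mean` computation and the `print` call, so they can vary between runs. As a hedged sketch, a small helper such as the hypothetical `time_call` below repeats a call and reports the fastest run. It is demonstrated on a plain Python function rather than on the loaders themselves.

```python
from timeit import default_timer as timer


def time_call(fn, *args, repeats=5, warmup=1):
    """Return (best_seconds, last_result) over `repeats` timed calls of fn(*args)."""
    for _ in range(warmup):
        fn(*args)  # untimed warm-up call
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = timer()
        result = fn(*args)
        best = min(best, timer() - start)
    return best, result


best, total = time_call(sum, range(100_000))
print(f"best of 5: {best:.6f} seconds, result={total}")
```

In this notebook, `fn` could be `nifti_gpu_load` or a small wrapper around `nib.load(...).get_fdata()`. For GPU code, the timed function should finish with an operation that waits for the GPU, because CuPy kernels run asynchronously.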


### Validate That CPU and GPU Data Match

In [10]:
# validate affine
print(np.all(img_cpu.affine == meta_gpu["affine"]))

True


In [11]:
# validate array (copy the GPU array to the host explicitly before comparing)
print(np.allclose(img_cpu_array, cp.asnumpy(img_gpu)))

True
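
For reference, `np.allclose(a, b)` returns `True` when every pair of elements satisfies the tolerance test below, shown with NumPy's default tolerances. The test is relative to `b`, so it is not symmetric in its arguments.

$$
|a_i - b_i| \le \mathrm{atol} + \mathrm{rtol} \cdot |b_i|, \qquad \mathrm{rtol} = 10^{-5}, \quad \mathrm{atol} = 10^{-8}
$$

The CPU array returned by `get_fdata()` is float64 while the GPU array keeps the dtype stored in the file, so a tolerance-based comparison is used here instead of exact equality.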


### Clean Up the Temporary Directory

In [12]:
shutil.rmtree(temp_working_dir)