# Segmentation of Medical Scans using Variational Autoencoders (VAEs) - Part 1/3
This series of notebooks makes our final models and test results reproducible.

This first notebook goes through the process of preprocessing and understanding our data.

We import the necessary libraries, check whether a GPU is available, and retrieve some system stats. We need a lot of RAM because our selected datasets are very large. We also set up some global constants.

In [1]:
# For ML
import torch
from torch import Tensor
import torchvision.transforms as Transform

# For reading raw data.
import json
import nibabel as nib

# For displaying and evaluating results.
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap

# For monitoring resource-usage and progress.
from tqdm import tqdm # Install ipywidgets to remove warning.
import psutil
from os.path import join, exists


root_dir = '../' # Location of project, relative to the working directory.
raw_data_dir = join(root_dir, 'raw_data')
prep_data_dir = join(root_dir, 'prep_data')
losses_dir = join(root_dir, 'losses')
models_dir = join(root_dir, 'saved_models')
checkpoint_dir = join(root_dir, 'checkpoints')


# Use the GPU if available, else fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using', device)

if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print('CUDA version:', torch.version.cuda)

# Total installed RAM in GB (the first field of virtual_memory() is `total`).
available_ram = round(psutil.virtual_memory().total / 1e9, 2)
print('RAM: ' + str(available_ram) + 'GB')

Using cuda
NVIDIA GeForce GTX 1070
CUDA version: 11.7
RAM: 16.74GB
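
To get a feeling for why RAM matters here, we can estimate the size of a stack of slices before loading anything. This is a rough sketch: `tensor_size_gb` is a hypothetical helper, the 40 volumes are an assumed example count, and 4 bytes per value assumes 32-bit floats.

```python
def tensor_size_gb(num_slices: int, resolution: int, bytes_per_value: int = 4) -> float:
    """Approximate size in GB of a (num_slices, resolution, resolution) tensor."""
    return num_slices * resolution * resolution * bytes_per_value / 1e9


# Assumed example: 40 volumes of 240 slices each, resized to 256 x 256.
print(round(tensor_size_gb(40 * 240, 256), 2))  # prints 2.52
```

Concatenating a new volume onto an existing tensor briefly keeps both the old and the new copy in memory, so the peak usage can be roughly twice this estimate.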


We define a function that loads the raw data and stores it in the proper format. It preprocesses and formats the dataset for each specified organ and saves the result once that organ is completed. The process can be interrupted and resumed at any time, and accounts for progress that has already been made. As the datasets are huge and each set of 240 slices has to be concatenated to the previous ones, we monitor progress and RAM usage.

In [2]:
def prep_data(organ, type, resolution):
    with open(join(raw_data_dir, organ, 'dataset.json')) as f:
        manifest = json.load(f)['training']
    
    bar = tqdm(total=len(manifest))
    bar.set_description('Prepping ' + organ + ' ' + type + 's')
    
    resize = Transform.Resize((resolution, resolution))

    try: 
        images = torch.zeros((0, resolution, resolution))

        for entry in manifest:
            # Show the RAM currently in use (in GB) next to the progress bar.
            bar.set_postfix(RAM=round(psutil.virtual_memory().used / 1e9, 2))
            bar.update()

            nii_img = nib.load(join(raw_data_dir, organ, entry[type][2:]))

            # Convert to numpy array, then pytorch tensor.
            nii_data = Tensor(nii_img.get_fdata())
            
            # Scale between 0 and 1. Try -1 and 1?
            nii_data -= nii_data.min()
            nii_data /= nii_data.max().clamp(min=1e-8) # Avoids 0/0 for constant volumes.

            nii_data = nii_data.permute(2, 0, 1) # (slice, rows, columns)
            nii_data = resize(nii_data)
            images = torch.cat((images, nii_data), 0)
        
        torch.save(images, join(prep_data_dir, organ + '_' + type + '_slices_' + str(resolution) + '.pt'))

    except KeyboardInterrupt:
        print('Manually stopped.')
    
    bar.close()

We call the preprocessing function for the organs we wish to train on.

In [3]:
lod = 2**8          # Level of detail.
resolution = lod    # 2**8 = 256
do_prep = False     # Toggle to prep data.

organs = ['spleen'] #,'colon','pancreas','lung','liver']

if do_prep:
    for organ in organs:
        prep_data(organ,'image',resolution)
        prep_data(organ,'label',resolution)
else:
    print('Data already prepped.')

Data already prepped.
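
The toggle above is set by hand. As a minimal sketch of how finished files could be detected automatically instead, the helpers below list the organ and type combinations that have no saved file yet. `prepped_path` and `pending_jobs` are hypothetical names and not part of the notebook; the file name pattern is assumed to match the one used when saving in `prep_data`.

```python
from os.path import join, exists


def prepped_path(prep_data_dir, organ, kind, resolution):
    """Build the file name that prep_data saves to (assumed pattern)."""
    return join(prep_data_dir, f'{organ}_{kind}_slices_{resolution}.pt')


def pending_jobs(prep_data_dir, organs, resolution, kinds=('image', 'label')):
    """Return the (organ, kind) pairs whose output file does not exist yet."""
    return [
        (organ, kind)
        for organ in organs
        for kind in kinds
        if not exists(prepped_path(prep_data_dir, organ, kind, resolution))
    ]
```

Looping over `pending_jobs(prep_data_dir, organs, resolution)` and calling `prep_data(organ, kind, resolution)` for each pair would then only process what is still missing.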
