# Data preprocessing
Here we traverse the training data directory and create a manifest for the aeon loader. Aeon expects each manifest entry to contain the filename of the image and the filename of its mask. I think the mask has to be a single file, so we need to traverse each image's mask directory and merge its mask files into one.

Nevertheless, I think we can modify U-Net to predict separable masks. Instead of combining the masks with np.maximum, we use np.sum. This way the combined mask will be 1 where there is only 1 segmentation mask, 2 where 2 masks overlap, 3 where 3 masks overlap, and so forth. We can treat each overlap count as its own "class" for U-Net to predict (similar to what SegNet does for Adas). Then in post-processing, we separate the masks.
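To make the difference concrete, here is a minimal sketch on two toy masks. It assumes binary 0/1 masks; the arrays `mask_a` and `mask_b` are made up for illustration and do not come from the dataset.

```python
import numpy as np

# Two hypothetical 0/1 masks for objects that overlap in the middle column
mask_a = np.array([[1, 1, 0],
                   [1, 1, 0]])
mask_b = np.array([[0, 1, 1],
                   [0, 1, 1]])

# np.maximum merges the two objects into one blob and loses the overlap
merged = np.maximum(mask_a, mask_b)

# np.sum over the stacked masks counts how many masks cover each pixel
counts = np.sum([mask_a, mask_b], axis=0)

# Each count can then be treated as its own class, e.g. the overlap region
overlap = counts == 2
```

Here `merged` is 1 everywhere, while `counts` keeps the middle column apart from the rest, which is the information the post-processing step would need.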

In [1]:
import os
import numpy as np
import matplotlib.pyplot as plt

from skimage.io import imread, imsave

%matplotlib inline

In [2]:
TRAIN_PATH = "data/train"

## Navigate the directory where the train data is stored

In [3]:
# Get train IDs
train_ids = next(os.walk(TRAIN_PATH))[1] 

In [4]:
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

## Read the original image

In [5]:
def get_image(ROOT_PATH, x):
    # ROOT_PATH is the folder of sample x; the image file is named after the sample ID
    return imread(os.path.join(ROOT_PATH, "images", train_ids[x] + ".png"))

## Read the masks from the image and combine them

For now we just combine the individual masks into one single mask (np.maximum). Eventually, we can sum the masks together (np.sum) so that the overlaps are marked as different classes. For example, if 4 masks overlap on a pixel, then it is class 4. In post-processing we can segment the in 

In [6]:
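def get_combined_mask(ROOT_PATH):
    # List the individual mask files once instead of walking the directory twice
    mask_dir = os.path.join(ROOT_PATH, "masks")
    mask_files = next(os.walk(mask_dir))[2]

    # Start from the first mask and take the pixel-wise maximum with the rest
    mask = imread(os.path.join(mask_dir, mask_files[0]))
    for mask_file in mask_files[1:]:
        mask = np.maximum(mask, imread(os.path.join(mask_dir, mask_file)))

    return mask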
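
A count-based variant could look like the sketch below. The function name `combine_masks_by_count` is hypothetical, and the sketch assumes the masks are 8-bit arrays in which any non-zero value means foreground. Under that assumption the masks need to be binarized before summing, since adding raw uint8 values such as 255 would wrap around.

```python
import numpy as np

def combine_masks_by_count(masks):
    """Count how many masks cover each pixel (hypothetical helper).

    masks: iterable of 2-D arrays with the same shape; non-zero means foreground.
    Returns None if no masks are given.
    """
    counts = None
    for mask in masks:
        # Binarize, then use a wider dtype so the sum cannot wrap around in uint8
        binary = (np.asarray(mask) > 0).astype(np.uint16)
        counts = binary if counts is None else counts + binary
    return counts

# Toy example with two overlapping 8-bit masks (made-up data)
a = np.array([[255, 255, 0]], dtype=np.uint8)
b = np.array([[0, 255, 255]], dtype=np.uint8)
count_mask = combine_masks_by_count([a, b])
```

The rest of this notebook still uses the np.maximum version.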

## Display the image and the combined masks

In [7]:
def plot_image(x):
    
    print("Use the slider or arrow keys to move through the images.")
    ROOT_PATH = TRAIN_PATH + "/{}".format(train_ids[x])
    img = get_image(ROOT_PATH, x)
    print("Image shape = {}".format(np.shape(img)))
    plt.figure(figsize=(10,10))
    plt.subplot(1,2,1)
    plt.imshow(img)
    
    plt.subplot(1,2,2)
    msk = get_combined_mask(ROOT_PATH)
    plt.imshow(msk, cmap='gray')
    

In [8]:
# The last valid index is len(train_ids) - 1; using len(train_ids) as max raises an IndexError
interact(plot_image, x=widgets.IntSlider(min=0,max=len(train_ids)-1,step=1,value=10));


In [11]:
from tqdm import tqdm # Fancy progress bar
import warnings # Used to ignore warnings from skimage

## Script to go through all images in the directory and create the manifest file for aeon loader

This loops through the images and combines the masks for each image into one file. It saves the processed masks into a new directory and creates a manifest file for the aeon dataloader, which will allow us to use neon as the framework.

In [12]:
PROCESS_DIR_TRAIN = "data_processed/train/"

os.makedirs(PROCESS_DIR_TRAIN + "images/", exist_ok=True)  # Directory for the copied images
os.makedirs(PROCESS_DIR_TRAIN + "masks/", exist_ok=True)  # Directory for the combined masks

manifest = {}

for idx in tqdm(range(len(train_ids))):
    
    ROOT_PATH = TRAIN_PATH + "/{}".format(train_ids[idx])
    img = get_image(ROOT_PATH, idx)
    msk = get_combined_mask(ROOT_PATH)
    
    fileName = PROCESS_DIR_TRAIN + "images/" + train_ids[idx] + ".png"
    maskName = PROCESS_DIR_TRAIN + "masks/" + train_ids[idx] + "_mask.png"
    manifest[idx] = [fileName, maskName]
    
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # Ignore warnings about contrast
        imsave(fileName, img)
        imsave(maskName, msk)
    
# Save the manifest file
with open('manifest.tsv', 'w') as file:
    file.write("@FILE\tFILE\n")
    
    for key, value in manifest.items():
        file.write("{}\t{}\n".format(value[0], value[1]))
    
    

100%|██████████| 670/670 [02:05<00:00,  5.33it/s]
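
Before handing the manifest to aeon, it may be worth reading it back and checking that every listed file exists. The sketch below uses only the standard library. `read_manifest` and `find_missing_files` are hypothetical helper names, and the sketch assumes the layout written by the cell above: a header line starting with `@`, then one tab-separated image/mask pair per line.

```python
import os

def read_manifest(path):
    """Return the (image, mask) filename pairs listed in a manifest file."""
    pairs = []
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line or line.startswith("@"):  # skip blank lines and the header
                continue
            image_name, mask_name = line.split("\t")
            pairs.append((image_name, mask_name))
    return pairs

def find_missing_files(pairs):
    """Return every filename in the pairs that does not exist on disk."""
    return [name for pair in pairs for name in pair if not os.path.isfile(name)]

# Example usage (assumes manifest.tsv was written by the cell above):
# pairs = read_manifest("manifest.tsv")
# print(len(pairs), find_missing_files(pairs))
```

An empty list from `find_missing_files` only confirms that the files exist; it does not check that each mask matches its image.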
