## Transforming the raw t-cell/dcell dataset into compressed files

This interactive notebook converts the raw images into compressed files. On big datasets it can strain a low-RAM computer, because all patches are held in memory at once.

### Input format:
- A folder containing images 
- Images come in pairs: for each "filename" (letter - digit), there is a red image and a green image
- red image = tcell
- green image = dendritic cell
- we need both the separated images (B&W) and the combined images (RGB)
- Each image is a 2048x2048 TIFF of about 8 MB

### Steps
1. Pass a 192x192 sliding window over the images (the window size is `imw`, set in config.py)
2. Store the filenames
3. Combine each pair of reduced images into one RGB image (red channel = tcell, green channel = dcell)
4. Calculate the intersection-over-union (IoU) overlap for each combined image
5. Store the combined images, overlap values, combined labels, and filenames in a file for development
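
Steps 3 and 4 can be illustrated on toy data. The sketch below is not the repository's `combine_images`, `get_mask`, or `iou`; `combine_patches` and `simple_iou` are hypothetical stand-ins, and the mask is assumed to be "any non-zero pixel", which may differ from what `get_mask` actually does.

```python
import numpy as np

def combine_patches(tcell, dcell):
    """Stack two grayscale patches into one RGB patch (blue channel left empty)."""
    blue = np.zeros_like(tcell)
    return np.stack([tcell, dcell, blue], axis=-1)

def simple_iou(mask_a, mask_b):
    """Intersection over union of two boolean masks; 0.0 if both are empty."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(mask_a, mask_b).sum() / union)

# Two toy 4x4 patches standing in for a tcell patch and a dcell patch
tcell = np.zeros((4, 4), dtype=np.uint16)
dcell = np.zeros((4, 4), dtype=np.uint16)
tcell[0:2, 0:2] = 500   # 4 bright pixels
dcell[1:3, 1:3] = 800   # 4 bright pixels, 1 of them shared with tcell

rgb = combine_patches(tcell, dcell)

# Assumption: any non-zero pixel counts as "cell"; the real get_mask may differ
overlap = simple_iou(rgb[..., 0] > 0, rgb[..., 1] > 0)
print(rgb.shape, overlap)
```

Here the two masks share 1 pixel out of 7 covered in total, so the overlap is 1/7. Returning 0.0 for two empty masks is a design choice of this sketch that avoids a division by zero.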

### Output:
**DATA_combined.npz**
- x_combined: combined patches of images 
- y_combined: labels for combined images
- y_overlaps: overlap values for combined images 
- filenames: raw filenames for un-combined images (for debugging and labelling purposes)
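
For reference, an archive with these four keys could be read back roughly as follows. This is a minimal sketch that writes a tiny stand-in archive to a temporary folder first; the array contents, the file name `demo_combined.npz`, and the example filenames are made up for illustration.

```python
import os
import tempfile
import numpy as np

# Tiny stand-in arrays; the real archive holds 192x192 patches
demo = {
    "x_combined": np.zeros((2, 8, 8, 3), dtype=np.uint16),
    "y_combined": np.array([0, 3]),
    "y_overlaps": np.array([0.25, 0.0], dtype=np.float32),
    "filenames": np.array(["A-1.tif", "A-2.tif"]),  # made-up names
}

path = os.path.join(tempfile.mkdtemp(), "demo_combined.npz")
np.savez_compressed(path, **demo)

# np.load returns a lazy NpzFile; each array is decompressed when it is indexed
with np.load(path) as data:
    keys = sorted(data.files)
    x_combined = data["x_combined"]
    y_overlaps = data["y_overlaps"]

print(keys)
print(x_combined.shape, y_overlaps.dtype)
```

Because the arrays are only decompressed on access, a downstream script can read `y_overlaps` or `filenames` without pulling the much larger `x_combined` into memory.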

## Step 1: Pass a sliding window over the images

In [1]:
from dataset_helpers import read_folder_filenames
from config import repo_path, imw

from skimage.io import imread
import numpy as np

In [2]:
def sliding_window(img, dest_size, rgb=False):
    """
    Split a square image into non-overlapping dest_size x dest_size sub-images
    --> more detail
    --> more training data

    Returns an array of shape (qty**2, dest_size, dest_size), with a trailing
    channel axis of length 3 when rgb=True.
    """

    size = img.shape[0]
    if dest_size > size or dest_size % 2 != 0:
        raise ValueError(
            "destination size is bigger than picture size or destination size is not even")

    qty = size // dest_size
    crop = size - dest_size * qty
    if crop:
        # size is not a multiple of dest_size: drop the first `crop` entries
        # along axis 0 and the last `crop` along axis 1 (less significant in dataset)
        img = img[crop:, :-crop]

    shape = (qty**2, dest_size, dest_size) + ((3,) if rgb else ())
    windows = np.empty(shape, dtype=np.uint16)

    i = 0
    for col in range(qty):
        y = col * dest_size  # offset along axis 1
        for row in range(qty):
            x = row * dest_size  # offset along axis 0
            windows[i] = img[x:x + dest_size, y:y + dest_size]
            i += 1

    return windows

In [6]:
def images_to_patches(filenames, size):
    """
    returns:
    @patches: array of shape (n_patches, size, size)
    @filenames: array holding the source filename of each patch

    @parameters:
    filenames = all filenames of files to compress
    size = size of output images

    @assumptions:
    * validity of filenames has been checked
    """

    patches = []
    fn = []

    for file in filenames:
        windows = sliding_window(imread(file), size)
        patches.extend(windows)
        # repeat the filename once per patch so both arrays stay aligned
        fn.extend([file] * len(windows))

    patches = np.array(patches)
    fn = np.array(fn)

    print("All files turned into patches of size {}".format(size))
    return patches, fn

In [7]:
filenames = sorted(read_folder_filenames(repo_path + 'data/sample_data/raw/images'))

In [8]:
x, filenames = images_to_patches(filenames, imw)

All files turned into patches of size 192
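
As a quick sanity check on Step 1, the arithmetic used by `sliding_window` can be reproduced for the image and window sizes stated in this notebook. `patch_layout` is a hypothetical helper written for this sketch, and the 2 bytes per pixel assume the `uint16` patches that `sliding_window` allocates.

```python
def patch_layout(size, dest_size):
    """Return (patches per side, total patches, pixels cropped per axis)."""
    qty = size // dest_size
    crop = size - qty * dest_size
    return qty, qty ** 2, crop

qty, total, crop = patch_layout(2048, 192)
print(qty, total, crop)  # 10 100 128

# Memory held by the patches of one image, assuming 2 bytes per uint16 pixel
bytes_per_image = total * 192 * 192 * 2
print(bytes_per_image)  # 7372800
```

So each 2048x2048 image yields 100 patches, and 128 pixels are discarded along each axis because 2048 is not a multiple of 192.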


## Step 2: Combine images and capture metrics from the images

In [9]:
from dataset_helpers import combine_images
from segmentation import get_mask, iou

In [10]:
# combined images, associated label
x_combined, y_combined = combine_images(x, filenames, mask=False)

Images preprocessed. Size of dataset: 1800


In [11]:
def get_overlaps(x, y):
    """
    Compute the IoU between the dcell mask (green channel) and the tcell mask
    (red channel) of each combined image. Images labelled 3 are faulty and
    keep an overlap of 0.
    """
    overlaps = np.zeros(len(x), dtype=np.float32) # overlap values - combined

    print("Looping through images...")
    for i in range(len(x)):
        if y[i] != 3:
            overlaps[i] = iou(get_mask(x[i, ..., 1]), get_mask(x[i, ..., 0]))

    return overlaps

y_overlaps = get_overlaps(x_combined, y_combined)
print("Overlaps have been counted")

Looping through images...
Overlaps have been counted


In [12]:
np.savez_compressed(repo_path + 'data/sample_data/processed/sample_combined.npz', 
                    x_combined=x_combined, y_combined=y_combined, y_overlaps=y_overlaps, filenames=filenames)