# Submission

This notebook contains the inference code for the HuBMAP competition. For our submission, we used a U-Net model with an EfficientNet backbone and the Tversky loss function. The model was pretrained on the ImageNet dataset. To avoid out-of-memory (OOM) errors, images are split into frames on the fly and processed in small batches. In the end, the resulting mask is reassembled from the predicted frames and RLE-encoded. Our submission implements the following steps:
1. Extraction of frames from the original image
2. Downscaling of frames according to the input size of the network
3. Pixel normalization
4. Inference
5. Upscaling to the original size of a frame
6. Cropping the center of a frame in order to avoid the edge effect (optional)
7. Conversion of the inferred masks to binary masks using threshold
8. Reconstruction of the original image
9. Conversion of binary masks to RLE-encoded strings
10. Plotting the resulting masks (optional)
11. Storing RLE-encoded masks on the file system
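
The batching idea behind these steps can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the notebook's actual loop; the helper name `iter_batches` and the toy frame indices are hypothetical.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# hypothetical example: 5 frame indices processed in batches of 2
frame_indices = list(range(5))
batches = list(iter_batches(frame_indices, 2))
print(batches)
```

Only one batch of frames needs to be held in memory at a time, which is what keeps memory usage bounded for very large images.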

Set up the libraries that are necessary for the submission.

In [None]:
!pip install ../input/keras-applications/Keras_Applications-1.0.8/ -f ./ --no-index
!pip install ../input/image-classifiers/image_classifiers-1.0.0/ -f ./ --no-index
!pip install ../input/efficientnet-1-0-0/efficientnet-1.0.0/ -f ./ --no-index
!pip install ../input/segmentation-models/segmentation_models-1.0.1/ -f ./ --no-index
%env SM_FRAMEWORK=tf.keras

In [None]:
import cv2
import gc
import glob
import math
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import rasterio
import segmentation_models as sm
import tensorflow as tf
import tifffile as tiff
from rasterio.windows import Window
from tensorflow import keras
from tensorflow.keras import backend as K

Our solution uses the Tversky loss function, which must be defined manually because it is not part of the TensorFlow library. The implementation of the loss function was taken from https://www.kaggle.com/code/bigironsphere/loss-function-library-keras-pytorch.

In [None]:
ALPHA = 0.5
BETA = 0.5

def TverskyLoss(targets, inputs, alpha=ALPHA, beta=BETA, smooth=1e-6):
    '''
    source: https://www.kaggle.com/code/bigironsphere/loss-function-library-keras-pytorch
    
    tversky loss function for keras
    '''
    
    #flatten label and prediction tensors
    inputs = K.flatten(inputs)
    targets = K.flatten(targets)

    #True Positives, False Positives & False Negatives
    TP = K.sum((inputs * targets))
    FP = K.sum(((1-targets) * inputs))
    FN = K.sum((targets * (1-inputs)))

    Tversky = (TP + smooth) / (TP + alpha*FP + beta*FN + smooth)  

    return 1 - Tversky
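
As a quick sanity check of the formula, the sketch below re-implements the Tversky index on plain Python lists. It is not the Keras version above; `tversky_index`, `dice_coefficient`, and the toy vectors are illustrative assumptions. With `alpha = beta = 0.5` the index coincides with the Dice coefficient up to the smoothing term, while a larger `beta` penalizes false negatives more strongly.

```python
def tversky_index(targets, inputs, alpha=0.5, beta=0.5, smooth=1e-6):
    """Tversky index for two equally long sequences of values in [0, 1]."""
    tp = sum(t * p for t, p in zip(targets, inputs))        # true positives
    fp = sum((1 - t) * p for t, p in zip(targets, inputs))  # false positives
    fn = sum(t * (1 - p) for t, p in zip(targets, inputs))  # false negatives
    return (tp + smooth) / (tp + alpha * fp + beta * fn + smooth)


def dice_coefficient(targets, inputs, smooth=1e-6):
    """Dice coefficient, used here only for comparison."""
    tp = sum(t * p for t, p in zip(targets, inputs))
    return (2 * tp + smooth) / (sum(targets) + sum(inputs) + smooth)


# toy example: three foreground pixels, only one of them is predicted
targets = [1, 1, 1, 0]
inputs = [1, 0, 0, 0]

balanced = tversky_index(targets, inputs)  # alpha = beta = 0.5
recall_focused = tversky_index(targets, inputs, alpha=0.3, beta=0.7)
dice = dice_coefficient(targets, inputs)

print(balanced, dice, recall_focused)
```

The loss used for training is one minus this index, so the missed foreground pixels in the toy example cost more under the `beta = 0.7` setting than under the balanced one.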

Set notebook variables.

In [None]:
# general notebook variables
DATASET_PATH = '/kaggle/input/hubmap-kidney-segmentation'
TEST_DATASET_PATH = os.path.join(DATASET_PATH, 'test')
TEST_IMG_IDS = [os.path.basename(img_id).split(".")[0] for img_id in glob.glob(f'{TEST_DATASET_PATH}/*.tiff')]
MODEL_PATH = '../input/notest-trainweights-unet-efficentnetb0/unet_efficientnetb0_f01_1.h5'
SUB_CSV = pd.read_csv(os.path.join(DATASET_PATH, 'sample_submission.csv'), index_col='id')
REMOVE_EDGES = False # remove edges to avoid the edge effect
PLOT_RESULT = False # plot resulting masks

# frame variables
INPUT_SIZE = (512, 512) # network input size
FRAME_SIZE = 1024 # size of a frame
FRAME_OVERLAP = 384 # frame overlap size
FRAME_DISCARD = FRAME_OVERLAP // 2 # size of the frame edges that will be removed if REMOVE_EDGES = True
MASK_THRESH = 0.4
BATCH_SIZE = 16

# keras settings
LR = 5e-4
OPTIMIZER = keras.optimizers.RMSprop(LR)
LOSS_FUNC = TverskyLoss
METRICS_ARR = [sm.metrics.f1_score, sm.metrics.iou_score]

Load the trained model and compile it.

In [None]:
model = keras.models.load_model(MODEL_PATH, compile=False)
model.compile(OPTIMIZER, loss=LOSS_FUNC, metrics=METRICS_ARR) # same params as in train

Define auxiliary functions. The function `make_grid` (borrowed from https://www.kaggle.com/leighplt/pytorch-fcn-resnet50) creates a grid for the sliding window operation. The function `rle_encode_less_memory` (from https://www.kaggle.com/code/bguberfain/memory-aware-rle-encoding) converts a binary mask to an RLE-encoded string. `remove_edges` crops the center of a frame, and `plot_result` overlays an inferred mask on a downscaled copy of the image.

In [None]:
def make_grid(shape, window=256, min_overlap=32):
    """
    source: https://www.kaggle.com/leighplt/pytorch-fcn-resnet50
    
    function to generate a grid layout for sliding window
    :param shape: size of the image along x and y, i.e. (width, height)
    :param window: size of the window
    :param min_overlap: minimal window overlap
    :return: array of window coordinates (x1,x2,y1,y2)
    """
    x, y = shape
    nx = x // (window - min_overlap) + 1
    x1 = np.linspace(0, x, num=nx, endpoint=False, dtype=np.int64)
    x1[-1] = x - window
    x2 = (x1 + window).clip(0, x)
    ny = y // (window - min_overlap) + 1
    y1 = np.linspace(0, y, num=ny, endpoint=False, dtype=np.int64)
    y1[-1] = y - window
    y2 = (y1 + window).clip(0, y)
    slices = np.zeros((nx,ny, 4), dtype=np.int64)
    
    for i in range(nx):
        for j in range(ny):
            slices[i,j] = x1[i], x2[i], y1[j], y2[j]    
    return slices.reshape(nx*ny,4)

def rle_encode_less_memory(img):
    """
    source: https://www.kaggle.com/code/bguberfain/memory-aware-rle-encoding
    
    function to convert a binary mask to an RLE-encoded string
    (this method requires the first and last pixel to be zero, so both are set to zero)
    :param img: numpy array, 1 - mask, 0 - background
    :return: run lengths as a formatted string
    """
    pixels = img.T.flatten()
    pixels[0] = 0
    pixels[-1] = 0
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 2
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

def remove_edges(frame, coordinates, discard):
    """
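    function to crop the center of a frame
    note: relies on the global variable `shape` (height and width of the
    full image), which is set in the inference loop
    :param frame: frame to be processed
    :param coordinates: coordinates (x1,x2,y1,y2) of the frame in the original image
    :param discard: size of margin to be removed
    :return: cropped frame and its new coordinates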
    """
    x1,x2,y1,y2 = coordinates
    
    if x1 != 0:
        x1 += discard
        frame = frame[:,discard:]
    if x2 != shape[1]:
        x2 -= discard
        frame = frame[:,:-discard or None]
    if y1 !=0:
        y1 += discard
        frame = frame[discard:,:]
    if y2 != shape[0]:
        y2 -= discard
        frame = frame[:-discard or None,:]
    return (frame, (x1,x2,y1,y2))

def plot_result(dataset_path, img_id, mask):
    """
    function to plot inferred masks
    :param dataset_path: path to the test dataset
    :param img_id: name of the image file (without extension)
    :param mask: inferred mask
    :return: None
    """
    full_image = tiff.imread(os.path.join(dataset_path, f'{img_id}.tiff'))
        
    # reshape image if necessary
    if len(full_image.shape) == 5:
        full_image = full_image.squeeze()
    if full_image.shape[0] == 3:
        full_image = full_image.transpose(1, 2, 0)

    # create a copy of the image
    full_image = full_image.copy()

    scale_factor = 10
    full_image = cv2.resize(full_image,(full_image.shape[1]//scale_factor,full_image.shape[0]//scale_factor), interpolation = cv2.INTER_CUBIC)
    mask = cv2.resize(mask,(mask.shape[1]//scale_factor, mask.shape[0]//scale_factor), interpolation = cv2.INTER_CUBIC)

    plt.figure(figsize=(15,15))
    plt.imshow(full_image)
    plt.imshow(mask, alpha=0.5, cmap='plasma')
    plt.show()

    del full_image
    gc.collect()
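
To make the sliding-window layout concrete, here is a small one-axis sketch of the logic in `make_grid`, written in plain Python. The helper `window_starts` and the axis length of 2500 pixels are assumptions chosen for illustration, and the sketch assumes the axis is at least as long as the window.

```python
def window_starts(length, window, min_overlap):
    """Start offsets of sliding windows along one axis (mirrors make_grid)."""
    n = length // (window - min_overlap) + 1
    # evenly spaced starts, truncated to integers
    starts = [i * length // n for i in range(n)]
    # shift the last window so that it ends exactly at the image border
    starts[-1] = length - window
    return starts


# hypothetical axis of 2500 pixels, 1024-pixel windows, at least 384 pixels overlap
starts = window_starts(2500, 1024, 384)
windows = [(s, s + 1024) for s in starts]
print(windows)
```

Because the number of windows is rounded up, the spacing between consecutive starts never exceeds `window - min_overlap`, so neighbouring windows always overlap by at least `min_overlap` pixels.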

Optionally, our inference code crops the center of every frame in order to avoid the edge effect as described in https://www.kaggle.com/competitions/hubmap-kidney-segmentation/discussion/238013.
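
The effect of this cropping on the frame layout can be illustrated along one axis. The sketch below is a simplified stand-in for `remove_edges`; the helper `crop_interval` and the example coordinates are assumptions. Every frame loses `discard` pixels on each side that does not touch the image border, and as long as neighbouring frames overlap by at least twice that margin, the cropped frames still cover the whole axis.

```python
def crop_interval(start, stop, length, discard):
    """Shrink [start, stop) by `discard` on every side that is not an image border."""
    if start != 0:
        start += discard
    if stop != length:
        stop -= discard
    return start, stop


# hypothetical frame layout: 1024-pixel frames on a 2500-pixel axis
length, frame_size, discard = 2500, 1024, 192
starts = [0, 625, 1250, 1476]

cropped = [crop_interval(s, s + frame_size, length, discard) for s in starts]
print(cropped)

# consecutive cropped frames must touch or overlap, otherwise pixels stay unpredicted
has_gaps = any(prev[1] < nxt[0] for prev, nxt in zip(cropped, cropped[1:]))
print(has_gaps)
```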

In [None]:
for img_id in TEST_IMG_IDS:
    print(img_id)
    
    # source: https://www.kaggle.com/code/iafoss/256x256-images
    # some images have issues with their format
    # and must be saved correctly before reading with rasterio
    img = rasterio.open(os.path.join(TEST_DATASET_PATH, f'{img_id}.tiff'), num_threads='all_cpus')
    if img.count != 3: # the number of raster bands in the dataset
        subdatasets = img.subdatasets # sequence of subdatasets
        layers = []
        if len(subdatasets) > 0:
            for subdataset in subdatasets:
                layers.append(rasterio.open(subdataset))
                
    # calculate frame coordinates
    shape = img.shape
    frames = make_grid((shape[1], shape[0]), window=FRAME_SIZE, min_overlap=FRAME_OVERLAP)
        
    # placeholder matrix for the resulting mask (initialized with ones
    # because the predicted frames are multiplied into it)
    wsi_mask = np.ones(shape, np.uint8)
        
    # process frames as small batches
    for batch_idx in range(0, math.ceil(len(frames) / BATCH_SIZE)):
                
        batch = [] # list of image frames
        batch_coordinates = [] # list of frame coordinates
        
        # create batch
        for frame_idx in range(batch_idx * BATCH_SIZE, min((batch_idx+1) * BATCH_SIZE, len(frames))):            
            img_frame = np.zeros((FRAME_SIZE, FRAME_SIZE, 3), np.uint8)
            x1,x2,y1,y2 = frames[frame_idx]
            
            # source: https://www.kaggle.com/code/iafoss/256x256-images
            if img.count == 3:
                img_frame = np.moveaxis(img.read([1,2,3], window=Window.from_slices((y1,y2),(x1,x2))), 0, -1)
            else:
                for i,layer in enumerate(layers):
                    img_frame[:,:,i] = layer.read(1, window=Window.from_slices((y1,y2),(x1,x2)))
                                
            # resize according to the input size of the network
            img_frame = cv2.resize(img_frame, INPUT_SIZE, interpolation=cv2.INTER_CUBIC)
            
            # normalize pixel values to [0, 1] and convert to float32
            img_frame = (img_frame / 255.0).astype(np.float32)
                        
            batch.append(img_frame)
            batch_coordinates.append(frames[frame_idx])
                
        # predict batch
        result = model.predict(np.array(batch))
        
        # process batch
        for res_idx in range(len(result)):
            # get predicted frame mask
            res_frame = result[res_idx]
            res_coordinates = batch_coordinates[res_idx]
            
            # resize to original size
            res_frame = cv2.resize(res_frame, (FRAME_SIZE, FRAME_SIZE), interpolation=cv2.INTER_CUBIC)
            
            if REMOVE_EDGES:
                res_frame,res_coordinates = remove_edges(res_frame, res_coordinates, FRAME_DISCARD)
            
            # apply threshold to the predicted mask
            res_frame = (res_frame > MASK_THRESH).astype(np.uint8)
            
            # insert predicted frame into the full-size mask; overlapping
            # predictions are combined by multiplication (logical AND)
            x1,x2,y1,y2 = res_coordinates # get coordinates of the frame
            wsi_mask[y1:y2,x1:x2] = wsi_mask[y1:y2,x1:x2] * res_frame
            
        del res_frame, res_coordinates, result, batch, batch_coordinates, img_frame
        gc.collect()
        
    # store the RLE-encoded mask in the submission table
    SUB_CSV.loc[img_id,'predicted'] = rle_encode_less_memory(wsi_mask)
    
    if PLOT_RESULT:
        plot_result(TEST_DATASET_PATH, img_id, wsi_mask)
        
    del img, frames, wsi_mask
    gc.collect()
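
The way overlapping frames are merged in the loop above can be reproduced on a toy one-dimensional example. This is only an illustrative sketch with made-up predictions, and `merge_frame` is a hypothetical helper: the full mask starts as all ones and every frame is multiplied in, so a pixel stays foreground only if all frames covering it predict foreground.

```python
def merge_frame(full_mask, frame, start):
    """Multiply a binary frame into the full mask, in place, beginning at `start`."""
    for offset, value in enumerate(frame):
        full_mask[start + offset] *= value


full_mask = [1] * 6                      # placeholder initialized with ones
merge_frame(full_mask, [1, 1, 1, 0], 0)  # frame covering pixels 0..3
merge_frame(full_mask, [1, 1, 0, 0], 2)  # frame covering pixels 2..5
print(full_mask)
```

Pixel 3 is rejected by the first frame and accepted by the second, and it ends up as background in the merged mask.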

Write resulting masks to a CSV file.

In [None]:
SUB_CSV.to_csv('submission.csv')
SUB_CSV
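
To sanity-check an entry of the submission file, an RLE string can be decoded back into a mask. The following pure-Python sketch assumes the same convention as `rle_encode_less_memory` (column-major pixel order, 1-based start positions followed by run lengths). `rle_encode` and `rle_decode` are illustrative helpers, not part of the notebook, and unlike the notebook's encoder, `rle_encode` does not force the first and last pixel to zero.

```python
def rle_encode(mask):
    """Encode a binary mask (list of rows) as 'start length' pairs, column-major."""
    height, width = len(mask), len(mask[0])
    pixels = [mask[r][c] for c in range(width) for r in range(height)]
    runs = []
    prev = 0
    for pos, value in enumerate(pixels, start=1):  # positions are 1-based
        if value and not prev:
            runs.extend([pos, 0])  # a new run starts here
        if value:
            runs[-1] += 1          # extend the current run
        prev = value
    return ' '.join(str(x) for x in runs)


def rle_decode(rle, height, width):
    """Decode 'start length' pairs back into a binary mask (list of rows)."""
    flat = [0] * (height * width)
    numbers = [int(n) for n in rle.split()]
    for start, length in zip(numbers[0::2], numbers[1::2]):
        for pos in range(start - 1, start - 1 + length):
            flat[pos] = 1
    # undo the column-major flattening
    return [[flat[c * height + r] for c in range(width)] for r in range(height)]


# toy 3x4 mask
mask = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
encoded = rle_encode(mask)
decoded = rle_decode(encoded, height=3, width=4)
print(encoded)
```

A round trip that returns the original mask is a cheap check that the height and width passed to the decoder match the image the string was produced from.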