# ***Disclaimer:*** 
Hello Kagglers! I am a Solution Architect with the Google Cloud Platform. I am a coach for this competition, and the focus of my contributions is on helping users leverage GCP components (GCS, TPUs, BigQuery, etc.) to solve large problems. My ideas and contributions represent my own opinion and are not an official recommendation by Google. Also, I try to develop notebooks quickly in order to help users early in competitions. There may be better ways to solve particular problems, and I welcome comments and suggestions. Use my contributions at your own risk. I don't guarantee that they will help you win any competition, but I am hoping to learn by collaborating with everyone.

# Objective:


The objective of this notebook is to demonstrate how to make a submission using a trained model previously stored in a private dataset. This version is optimized to be memory efficient. I previously tried to process the images in memory and kept running out of memory. The solution was to write the image data into a TFRecord file and then use tf.data.Dataset to feed the TFRecord file to the model for predictions. The tf.data.Dataset reads the data sequentially from the file, so no extra copy of the image is needed in memory. This was the ONLY way I found for submissions to execute successfully on both the public and private datasets.
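
A minimal sketch of the write-then-stream pattern described above, using toy 8x8 tiles instead of real image data. The helper names `write_tiles` and `stream_tiles`, the single `img_bytes` feature, and the temporary file path are assumptions made for this illustration; the notebook's own functions further down store more metadata per tile.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

def write_tiles(tiles, path):
    """Serialize uint8 tiles one at a time into a GZIP-compressed TFRecord file."""
    options = tf.io.TFRecordOptions(compression_type="GZIP")
    with tf.io.TFRecordWriter(path, options) as writer:
        for tile in tiles:
            feature = {
                "img_bytes": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[tile.tobytes()])
                )
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())

def stream_tiles(path, tile_shape, batch_size):
    """Return a tf.data pipeline that decodes tiles lazily, batch by batch."""
    def parse(record):
        parsed = tf.io.parse_single_example(
            record, {"img_bytes": tf.io.FixedLenFeature([], tf.string)}
        )
        flat = tf.io.decode_raw(parsed["img_bytes"], tf.uint8)
        return tf.cast(tf.reshape(flat, tile_shape), tf.float32) / 255.0

    dataset = tf.data.TFRecordDataset(path, compression_type="GZIP")
    return dataset.map(parse).batch(batch_size)

# toy data: five random 8x8 RGB tiles (hypothetical stand-in for image tiles)
rng = np.random.default_rng(0)
toy_tiles = rng.integers(0, 256, size=(5, 8, 8, 3), dtype=np.uint8)
toy_path = os.path.join(tempfile.mkdtemp(), "toy_tiles.tfrecords")
write_tiles(toy_tiles, toy_path)

# only one batch of decoded tiles is materialized at a time
batch_shapes = [tuple(batch.shape) for batch in stream_tiles(toy_path, (8, 8, 3), 2)]
```

Because the dataset is consumed batch by batch, peak memory depends on the batch size rather than on the number of tiles in the file.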

The Keras model used here was built and trained on TPUs as described in the notebook below. It was trained for 200 epochs on all tiles containing glomeruli ("gloms"):
[https://www.kaggle.com/marcosnovaes/hubmap-unet-keras-model-fit-with-tpu/](https://www.kaggle.com/marcosnovaes/hubmap-unet-keras-model-fit-with-tpu/)

I also have a GPU version that is not as powerful:
[https://www.kaggle.com/marcosnovaes/hubmap-unet-keras-model-fit-with-gpu](https://www.kaggle.com/marcosnovaes/hubmap-unet-keras-model-fit-with-gpu)

The Unet Keras model used is the one proposed in a [popular paper on biomedical image segmentation](https://arxiv.org/abs/1505.04597) by Olaf Ronneberger, Philipp Fischer, and Thomas Brox.

The particular implementation used is the one by [Dr. Bradley Erickson](https://github.com/slowvak), available in [The Magician's Corner repository](https://github.com/RSNA/MagiciansCorner/blob/master/UNetWithTensorflow.ipynb). 

The basic modification that I have made to Dr. Erickson's implementation is to enable the TensorFlow distributed training strategy (`tf.distribute.Strategy`). You will notice that model.fit() is used within a strategy.scope(), so that it leverages either GPU or TPU acceleration. 
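
As a rough illustration of the strategy scope idea, the sketch below builds a toy one-layer model inside a scope and runs a prediction. The helper `pick_strategy` and the toy model are assumptions for this example only; TPU initialization is left out, and the default (single-device) strategy is used when fewer than two GPUs are visible.

```python
import numpy as np
import tensorflow as tf

def pick_strategy():
    """Use MirroredStrategy when several GPUs are visible, else the default strategy."""
    gpus = tf.config.list_physical_devices("GPU")
    if len(gpus) > 1:
        return tf.distribute.MirroredStrategy()
    return tf.distribute.get_strategy()

toy_strategy = pick_strategy()
with toy_strategy.scope():
    # variables created inside the scope are placed according to the strategy
    inputs = tf.keras.Input((16, 16, 3))
    outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(inputs)
    toy_model = tf.keras.Model(inputs, outputs)

# all-zero input and zero-initialized bias: the sigmoid output is 0.5 everywhere
toy_pred = toy_model.predict(np.zeros((2, 16, 16, 3), dtype="float32"), verbose=0)
```

The important part is that the model variables are created inside the scope; the same code then runs unchanged on one device or several.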

In previous notebooks, I demonstrated how to read the competition data and produce a TFRecord dataset by tiling the images into 512x512 tiles. This notebook uses that dataset as input to the Keras Unet model:
--> [Link to the TFRecord Dataset used for training.](https://www.kaggle.com/marcosnovaes/hubmap-tfrecord-512)
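
The tiling idea can be sketched in a few lines of NumPy. The generator name `iter_tiles` and the toy image size are assumptions for illustration; like the `write_dataset` function later in this notebook, the sketch uses integer division, so partial tiles at the right and bottom edges are dropped.

```python
import numpy as np

def iter_tiles(image, tile_size):
    """Yield (row, col, tile) for every full tile_size x tile_size tile of an HxWxC image."""
    num_rows = image.shape[0] // tile_size
    num_cols = image.shape[1] // tile_size
    for row in range(num_rows):
        for col in range(num_cols):
            top, left = row * tile_size, col * tile_size
            yield row, col, image[top:top + tile_size, left:left + tile_size, :]

toy_image = np.zeros((1100, 1600, 3), dtype=np.uint8)
tile_index = [(row, col) for row, col, _ in iter_tiles(toy_image, 512)]
# 1100 // 512 = 2 rows and 1600 // 512 = 3 columns, so 6 tiles in total
```

The (row, col) pair is what later allows each predicted tile mask to be written back to the right place in the full-size mask.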

Previous Notebooks in this competition: 

[https://www.kaggle.com/marcosnovaes/hubmap-3-unet-models-with-keras-cpu-gpu/](https://www.kaggle.com/marcosnovaes/hubmap-3-unet-models-with-keras-cpu-gpu/): Investigates three implementations of the Unet model

[https://www.kaggle.com/marcosnovaes/hubmap-read-data-and-build-tfrecords/](https://www.kaggle.com/marcosnovaes/hubmap-read-data-and-build-tfrecords/): Demonstrates how the TFRecord Dataset was built

[https://www.kaggle.com/marcosnovaes/hubmap-looking-at-tfrecords/](https://www.kaggle.com/marcosnovaes/hubmap-looking-at-tfrecords/): Explains how to read the data using the TFRecord Dataset

# Setup

1) You need to add your own saved model to a dataset and attach it to this notebook. I am not making my model public so as not to make it too easy for everyone, but you can use the notebook referenced above to train your own model.
2) Edit the code to use the name of your own trained model.
3) Make sure internet is turned off.
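
A small, optional check for steps 1 and 2: the sketch below lists weight files found one directory below the input folder, so the path to pass to `load_weights` can be copied from its output. The helper name `find_weight_files` and the `*.h5` pattern are assumptions; adjust them to your own dataset layout.

```python
import glob
import os

def find_weight_files(input_root="/kaggle/input", pattern="*.h5"):
    """List weight files found one directory level below input_root."""
    return sorted(glob.glob(os.path.join(input_root, "*", pattern)))

weight_files = find_weight_files()
if not weight_files:
    print("No .h5 file found: attach your model dataset to the notebook first.")
for path in weight_files:
    print(path)
```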


In [None]:
!ls -l /kaggle/input

In [None]:
import os
import sys
import glob
#import random
import warnings

import tifffile as tiff
from tqdm.notebook import tqdm

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

#from tqdm import tqdm
#from itertools import chain
from skimage.io import imread, imshow, imread_collection, concatenate_images
from skimage.transform import resize
import skimage.transform
import skimage.measure
from skimage.morphology import label
import cv2

from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import Input

from tensorflow.keras.layers import Conv2D, Conv2DTranspose
from tensorflow.keras.layers import MaxPooling2D, UpSampling2D
from tensorflow.keras.layers import concatenate

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras import backend as K

from tensorflow.keras import layers

from tensorflow.keras.layers import Layer  # keep every Keras import on tf.keras

from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import get_custom_objects


from kaggle_datasets import KaggleDatasets
from kaggle_secrets import UserSecretsClient

import tensorflow as tf

import gc

In [None]:
# Utilities to serialize data into a TFRecord
def _bytes_feature(value):
  """Returns a bytes_list from a string / byte."""
  if isinstance(value, type(tf.constant(0))):
    value = value.numpy() # BytesList won't unpack a string from an EagerTensor.
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _float_feature(value):
  """Returns a float_list from a float / double."""
  return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

def _int64_feature(value):
  """Returns an int64_list from a bool / enum / int / uint."""
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

In [None]:
def image_example(image, tile_col_number, tile_row_number):
    image_shape = image.shape
    
    # tobytes() replaces the deprecated ndarray.tostring()
    img_bytes = image.tobytes()
    
    feature = {
        'height': _int64_feature(image_shape[0]),
        'width': _int64_feature(image_shape[1]),
        'num_channels': _int64_feature(image_shape[2]),
        'img_bytes': _bytes_feature(img_bytes),
        'tile_col_number': _int64_feature(tile_col_number),
        'tile_row_number': _int64_feature(tile_row_number),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

In [None]:
# Create a dictionary describing the features.
image_feature_description = {
    'height': tf.io.FixedLenFeature([], tf.int64),
    'width': tf.io.FixedLenFeature([], tf.int64),
    'num_channels': tf.io.FixedLenFeature([], tf.int64),
    'img_bytes': tf.io.FixedLenFeature([], tf.string),
    'tile_col_number': tf.io.FixedLenFeature([], tf.int64),
    'tile_row_number': tf.io.FixedLenFeature([], tf.int64),
}

def _parse_image(example_proto):
    single_example = tf.io.parse_single_example(example_proto, image_feature_description)
    img_height = single_example['height']
    img_width = single_example['width']
    num_channels = single_example['num_channels']
    
    img_bytes =  tf.io.decode_raw(single_example['img_bytes'],out_type='uint8')
    #dynamic shape
    #img_array = tf.reshape( img_bytes, (img_height, img_width, num_channels))
    #fixed shape
    img_array = tf.reshape( img_bytes, (512, 512, 3))
    img_array = tf.cast(img_array, tf.float32) / 255.0
    
    mtd = dict()
    mtd['width'] = single_example['width']
    mtd['height'] = single_example['height']
    mtd['tile_col_number'] = single_example['tile_col_number']
    mtd['tile_row_number'] = single_example['tile_row_number']
   
    return img_array, mtd

def read_dataset(storage_file_path):
    encoded_image_dataset = tf.data.TFRecordDataset(storage_file_path, compression_type="GZIP")
    parsed_image_dataset = encoded_image_dataset.map(_parse_image)
    return parsed_image_dataset

In [None]:
def dice_coeff(y_true, y_pred):
    # add epsilon to avoid a divide by 0 error in case a slice has no pixels set
    # we only care about relative value, not absolute so this alteration doesn't matter
    _epsilon = 10 ** -7
    intersections = tf.reduce_sum(y_true * y_pred)
    unions = tf.reduce_sum(y_true + y_pred)
    dice_scores = (2.0 * intersections + _epsilon) / (unions + _epsilon)
    return dice_scores

def dice_loss(y_true, y_pred):
    loss = 1 - dice_coeff(y_true, y_pred)
    return loss
  
get_custom_objects().update({"dice": dice_loss})

class LayerNormalization (Layer) :
    
    def call(self, x, mask=None, training=None) :
        axis = list (range (1, len (x.shape)))
        x /= K.std (x, axis = axis, keepdims = True) + K.epsilon()
        x -= K.mean (x, axis = axis, keepdims = True)
        return x
        
    # must be a method of the layer class, so it is indented inside it
    def compute_output_shape(self, input_shape):
        return input_shape
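
A worked example of the Dice formula used above, written in plain NumPy so the arithmetic is easy to follow. The function name `dice_coeff_np` is an assumption for this sketch; it mirrors the TensorFlow version, including the fact that the denominator is the sum of both masks rather than a set union.

```python
import numpy as np

def dice_coeff_np(y_true, y_pred, epsilon=1e-7):
    """NumPy mirror of dice_coeff: (2 * overlap + eps) / (sum of both masks + eps)."""
    intersection = np.sum(y_true * y_pred)
    total = np.sum(y_true + y_pred)
    return (2.0 * intersection + epsilon) / (total + epsilon)

y_true = np.array([[1.0, 1.0], [0.0, 0.0]])
y_pred = np.array([[1.0, 0.0], [0.0, 0.0]])

partial = dice_coeff_np(y_true, y_pred)   # overlap 1, total 3: about 2/3
perfect = dice_coeff_np(y_true, y_true)   # overlap 2, total 4: 1
empty = dice_coeff_np(np.zeros((2, 2)), np.zeros((2, 2)))  # epsilon / epsilon: 1
```

The last case shows what the epsilon is for: two empty masks score 1 instead of producing a division by zero.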

In [None]:
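def magic_unet(act_fn = 'relu', init_fn = 'he_normal', width=512, height = 512, channels = 3): 
    # build the input from the arguments; the defaults match the 512x512x3 tiles
    # (act_fn and init_fn are no longer overwritten inside the function)
    inputs = Input((height, width, channels))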

    # note we use linear function before layer normalization
    conv1 = Conv2D(8, 5, activation = 'linear', padding = 'same', kernel_initializer = init_fn)(inputs)
    conv1 = LayerNormalization()(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(16, 3, activation = act_fn, padding = 'same', kernel_initializer = init_fn)(pool1)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    conv3 = Conv2D(32, 3, activation = 'linear', padding = 'same', kernel_initializer = init_fn)(pool2)
    conv3 = LayerNormalization()(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)
    conv4 = Conv2D(64, 3, activation = act_fn, padding = 'same', kernel_initializer = init_fn)(pool3)
    pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)

    conv5 = Conv2D(72, 3, activation = act_fn, padding = 'same', kernel_initializer = init_fn)(pool4)

    up6 = Conv2D(64, 2, activation = 'linear', padding = 'same', kernel_initializer = init_fn)(UpSampling2D(size = (2,2))(conv5))
    up6 = LayerNormalization()(up6)
    merge6 = concatenate([conv4,up6], axis = 3)
    conv6 = Conv2D(64, 3, activation = act_fn, padding = 'same', kernel_initializer = init_fn)(merge6)

    up7 = Conv2D(32, 2, activation = act_fn, padding = 'same', kernel_initializer = init_fn)(UpSampling2D(size = (2,2))(conv6))
    merge7 = concatenate([conv3,up7], axis = 3)
    conv7 = Conv2D(32, 3, activation = act_fn, padding = 'same', kernel_initializer = init_fn)(merge7)

    up8 = Conv2D(16, 2, activation = 'linear', padding = 'same', kernel_initializer = init_fn)(UpSampling2D(size = (2,2))(conv7))
    up8 = LayerNormalization()(up8)
    merge8 = concatenate([conv2,up8], axis = 3)
    conv8 = Conv2D(16, 3, activation = act_fn, padding = 'same', kernel_initializer = init_fn)(merge8)

    up9 = Conv2D(8, 2, activation = act_fn, padding = 'same', kernel_initializer = init_fn)(UpSampling2D(size = (2,2))(conv8))
    merge9 = concatenate([conv1,up9], axis = 3)
    conv9 = Conv2D(8, 3, activation = act_fn, padding = 'same', kernel_initializer = init_fn)(merge9)
    conv10 = Conv2D(1, 1, activation = 'sigmoid')(conv9)
    model = Model(inputs = inputs, outputs = conv10)

    return model


In [None]:
!ls -l /kaggle/input

In [None]:
!ls -l /kaggle/input/humap-sample-models

In [None]:
!head /kaggle/input/hubmap-kidney-segmentation/sample_submission.csv

In [None]:
#tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
#tf.config.experimental_connect_to_cluster(tpu)
#tf.tpu.experimental.initialize_tpu_system(tpu)

#strategy = tf.distribute.experimental.TPUStrategy(tpu)

In [None]:
strategy = tf.distribute.MirroredStrategy()

In [None]:
tf.config.list_physical_devices()

In [None]:
tf.config.list_logical_devices()

In [None]:
from tensorflow.keras.models import load_model

with strategy.scope():
    pred_model = magic_unet()
    #pred_model.load_weights("/kaggle/input/humap-sample-models/hubmap-keras-tpu-200.h5")
    pred_model.load_weights("/kaggle/input/humap-sample-models/hubmap-tpu-cortex-200.h5")


In [None]:
pred_model.inputs

In [None]:
#pred_model.save('/kaggle/working/hubmap-keras-tpu-200.h5')

In [None]:
def predict_mask( image_shape, dataset_path, pred_model, bool_cutoff):
    
    pred_mask = np.zeros((image_shape[0], image_shape[1]), dtype = 'bool')
    #pred_mask = np.zeros((image_shape[0], image_shape[1]), dtype = 'float32')
    #pred_mask = pred_mask.astype(bool)
    
    print('Reading Dataset')
    input_dataset = read_dataset( dataset_path )
    gc.collect()

    print('Calling prediction')
    #pred_float_masks = pred_model.predict(input_dataset.batch(1),batch_size = 8, verbose=1)
      
    #print("Filling tile masks into large mask")
    input_batch_size = 512
    input_dataset = input_dataset.batch(input_batch_size)
    
    for input_batch, input_mtd in input_dataset.as_numpy_iterator():
        # fill in the predicted masks into the large mask
        pred_float_masks = pred_model.predict(input_batch,batch_size = 8, verbose=1)  
        # overwrite the big mask with these values at the right place
        #
        for index in range(pred_float_masks.shape[0]):
            tile_col = input_mtd['tile_col_number'][index]
            tile_row = input_mtd['tile_row_number'][index]
            
            # note: tile_size is read from the notebook-level global set in the main cell
            tile_height_begin = tile_row * tile_size
            tile_height_end = tile_height_begin + tile_size  
            tile_width_begin = tile_col * tile_size
            tile_width_end = tile_width_begin + tile_size
            #pred_bool_mask = pred_float_masks[index,:,:,0] > bool_cutoff
            # downsample by 4x4 blocks with min to eliminate small areas
            reduced_pred = skimage.measure.block_reduce(pred_float_masks[index,:,:,0], (4,4), np.min)
            # upsample back to 512x512
            resized_pred = skimage.transform.resize(reduced_pred, (512,512))
            # apply boolean cutoff
            #pred_bool_mask = pred_float_masks[index,:,:,0] > bool_cutoff
            pred_bool_mask = resized_pred > bool_cutoff
            pred_mask[tile_height_begin:tile_height_end, tile_width_begin:tile_width_end] = pred_bool_mask
            #pred_mask[tile_height_begin:tile_height_end, tile_width_begin:tile_width_end] = pred_float_masks[index,:,:,0] 

    return pred_mask


def write_dataset( image_path, img_id, tile_size, output_path, reduce_factor=0):
    baseimage = tiff.imread(image_path)
    print ('original image shape',baseimage.shape)
    baseimage = np.squeeze(baseimage)
    if( baseimage.shape[0] == 3):
        baseimage = baseimage.swapaxes(0,1)
        baseimage = baseimage.swapaxes(1,2)
        print ('swapped shape',baseimage.shape)
    # cv2.resize is causing oom,...
    # resize the image if required
    if reduce_factor > 0:
        baseimage = cv2.resize(baseimage,(baseimage.shape[1]//reduce_factor,baseimage.shape[0]//reduce_factor),
                         interpolation = cv2.INTER_AREA)
    print ('resized image shape',baseimage.shape)    
        
    img_shape = baseimage.shape
    img_height = baseimage.shape[0]
    img_width = baseimage.shape[1]
    num_tile_rows = img_height // tile_size
    num_tile_cols = img_width // tile_size
    
    # honor the output_path argument instead of a hard-coded directory
    dataset_path = os.path.join(output_path, '{}_tiles.tfrecords'.format(img_id))
    opts = tf.io.TFRecordOptions(compression_type="GZIP")
    writer = tf.io.TFRecordWriter(dataset_path, opts)
         
    for tile_col_number in tqdm(range(num_tile_cols),total=num_tile_cols, desc = 'filtering data by tile columns' ):
        
        for tile_row_number in range(num_tile_rows):

            tile_height_begin = tile_row_number * tile_size
            tile_height_end = tile_height_begin + tile_size
            
            tile_width_begin = tile_col_number * tile_size
            tile_width_end = tile_width_begin + tile_size

            image_tile = baseimage[tile_height_begin:tile_height_end, tile_width_begin:tile_width_end, :]
          
            img_hist = np.histogram(image_tile)
            lowband_density = np.sum(img_hist[0][0:4])
            if lowband_density > 1000:     
                # add tile to dataset
               
                tf_example = image_example(image_tile, tile_col_number, tile_row_number)
                writer.write(tf_example.SerializeToString())
    writer.close()
    baseimage = np.zeros(10)# clear previous variables from mem
    gc.collect()
    return img_shape, dataset_path
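
The post-processing step in `predict_mask` (block-wise minimum, upsample, threshold) can be illustrated with a NumPy-only sketch. The helper names are assumptions, and the upsampling here uses plain pixel repetition rather than `skimage.transform.resize`, which interpolates, so the edges of the real masks will differ slightly from this toy version.

```python
import numpy as np

def min_pool(pred, block):
    """Block-wise minimum of a 2D array whose sides are multiples of `block`."""
    h, w = pred.shape
    return pred.reshape(h // block, block, w // block, block).min(axis=(1, 3))

def suppress_small_regions(pred, block, cutoff):
    """Min-pool, upsample back by repetition, then apply the boolean cutoff."""
    reduced = min_pool(pred, block)
    restored = np.repeat(np.repeat(reduced, block, axis=0), block, axis=1)
    return restored > cutoff

toy_pred = np.zeros((8, 8), dtype=np.float32)
toy_pred[0, 0] = 0.9        # isolated one-pixel speck
toy_pred[4:8, 4:8] = 0.9    # solid 4x4 region aligned with a block
toy_mask = suppress_small_regions(toy_pred, block=4, cutoff=0.5)
```

Taking the minimum means a block survives only if every pixel in it is confidently predicted, which removes the speck but keeps the solid region.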

I leverage the RLE encoder provided by @xhlulu here:
[https://www.kaggle.com/xhlulu/efficient-mask2rle](https://www.kaggle.com/xhlulu/efficient-mask2rle)

In [None]:
def mask2rle(img):
    '''
    Efficient implementation of mask2rle, from @paulorzp
    --
    img: numpy array, 1 - mask, 0 - background
    Returns run length as a formatted string
    Source: https://www.kaggle.com/xhlulu/efficient-mask2rle
    '''
    pixels = img.T.flatten()
    pixels = np.pad(pixels, ((1, 1), ))
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

#https://www.kaggle.com/bguberfain/memory-aware-rle-encoding
#with bug fix
def rle_encode_less_memory(img):
    #watch out for the bug
    pixels = img.T.flatten()
    
    # This simplified method requires first and last pixel to be zero
    pixels[0] = 0
    pixels[-1] = 0
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 2
    runs[1::2] -= runs[::2]
    
    return ' '.join(str(x) for x in runs)
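
To make the encoding concrete, here is a tiny round trip on a 3x3 mask. `rle_encode` repeats the padded approach of `mask2rle`; `rle_decode` is a hypothetical inverse written only for this check and is not part of the notebook or of the referenced kernels.

```python
import numpy as np

def rle_encode(mask):
    """Column-major run-length encoding with 1-based start positions."""
    pixels = np.pad(mask.T.flatten().astype(np.uint8), (1, 1))
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return " ".join(str(x) for x in runs)

def rle_decode(rle, shape):
    """Hypothetical inverse of rle_encode, used here only to check the round trip."""
    values = np.array(rle.split(), dtype=np.int64)
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(values[0::2], values[1::2]):
        flat[start - 1:start - 1 + length] = 1
    return flat.reshape(shape[1], shape[0]).T

toy_mask = np.array([[0, 1, 0],
                     [1, 1, 0],
                     [0, 0, 1]], dtype=np.uint8)

# read column by column the pixels are 0 1 0 | 1 1 0 | 0 0 1,
# so the runs start at positions 2, 4 and 9 with lengths 1, 2 and 1
toy_rle = rle_encode(toy_mask)   # "2 1 4 2 9 1"
restored = rle_decode(toy_rle, toy_mask.shape)
```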

In [None]:
REDUCTION_FACTOR = 4

#process all images in the test directory
img_file_list = glob.glob('/kaggle/input/hubmap-kidney-segmentation/test/*.tiff')
#img_file_list = img_file_list[::-1]
#img_file_list = ['/kaggle/input/hubmap-kidney-segmentation/test/b2dc8411c.tiff']
#img_file_list = ['/kaggle/input/hubmap-kidney-segmentation/test/26dc41664.tiff']
tile_size = 512
num_images = len(img_file_list)
#num_images = 1
pred_mask = []
submission_df = pd.DataFrame(columns = ['id','predicted'])
#print("Processing all test image files")
img_shapes = []
img_paths = []
img_ids = []
print("Resizing and tiling all images")
for image_index in tqdm(range(num_images),total=num_images, desc='num images processed'):
    image_path = img_file_list[image_index]
    # the image id is the file name without its directory or extension
    image_id = os.path.splitext(os.path.basename(image_path))[0]
    sys.stdout.flush()
    print("Processing Image {}".format(image_id))
    # image will be reduced by REDUCTION_FACTOR
    img_shape, dataset_path = write_dataset(image_path, image_id, 512,'/kaggle/working', REDUCTION_FACTOR)  
    img_shapes.append(img_shape)
    img_paths.append(dataset_path)
    img_ids.append(image_id)
    
print("Generating Prediction Masks")    
for image_index in tqdm(range(num_images),total=num_images, desc='num masks processed'):
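    # use the id of the current image; image_id still holds the last value from the loop above
    print("Predicting Masks {}".format(img_ids[image_index]))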
    pred_mask = predict_mask( img_shapes[image_index], img_paths[image_index], pred_model, 0.5)
    print("bool mask shape",pred_mask.shape)
    pred_mask = np.repeat(pred_mask, REDUCTION_FACTOR, axis=0)
    pred_mask = np.repeat(pred_mask, REDUCTION_FACTOR, axis=1)
    print("bool mask expanded shape",pred_mask.shape )
    print("RLE Encoding {}".format(img_ids[image_index]))
    #rle_string = mask2rle(pred_mask)
    rle_string = rle_encode_less_memory(pred_mask)
    #pred_mask = np.zeros(10)# clear previous variables from mem
    #gc.collect()
    # DataFrame.append was removed in pandas 2.0; add the row by label instead
    submission_df.loc[len(submission_df)] = [img_ids[image_index], rle_string]
    print("Removing Dataset from Disk {}".format(img_paths[image_index]))
    os.remove(img_paths[image_index])

#write csv
print("Writing Submission File")
submission_df.to_csv('/kaggle/working/submission.csv',index=False)

#show last mask
#plt.imshow(pred_mask)
print("Done")
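
One thing worth checking in the loop above: the image is shrunk with integer division, so repeating the mask by `REDUCTION_FACTOR` can give a mask a few pixels smaller than the original image whenever a side is not a multiple of the factor. Assuming the RLE is expected on the original pixel grid, a possible fix is to zero-pad the upscaled mask back to the original size. The sketch below is only an illustration: `upscale_to_original` is a hypothetical helper, and the original shape would have to be recorded before resizing, which the notebook does not currently do.

```python
import numpy as np

def upscale_to_original(small_mask, factor, original_shape):
    """Repeat each pixel `factor` times, then zero-pad to the original (height, width)."""
    big = np.repeat(np.repeat(small_mask, factor, axis=0), factor, axis=1)
    pad_rows = original_shape[0] - big.shape[0]
    pad_cols = original_shape[1] - big.shape[1]
    return np.pad(big, ((0, pad_rows), (0, pad_cols)))

original_shape = (1030, 2051)                        # hypothetical full-resolution size
small = np.ones((1030 // 4, 2051 // 4), dtype=bool)  # mask predicted at 1/4 scale
full = upscale_to_original(small, 4, original_shape)
# 257 * 4 = 1028 rows and 512 * 4 = 2048 columns, padded up to 1030 x 2051
```

Since the RLE is computed column by column, even a small height mismatch would shift every run, so this is worth verifying on one test image.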

In [None]:
pred_mask.shape

In [None]:
pred_mask.dtype

In [None]:
!ls -l /kaggle/working/

In [None]:
verify_sub = pd.read_csv('/kaggle/working/submission.csv')

In [None]:
verify_sub.head()

In [None]:
#plt.imshow(pred_mask)