# Lavin version
Copied from https://www.kaggle.com/wrrosa/hubmap-tf-with-tpu-efficientunet-512x512-subm by Wojtek Rosa.

The main difference in this version is the use of tiles that overlap on all four sides, as produced by https://www.kaggle.com/markalavin/hubmap-tile-images-w-overlap-and-build-tfrecords.  For example, I'm
currently using tiles that are 512 x 512, plus 64 pixels of overlap on every side, laid out on a grid with
row and column strides of 512.  The advantage of this approach is that prediction runs on 640 x 640 tiles,
so every pixel in the inner 512 x 512 region "sees" its correct neighborhood out to a "radius" of at least 64;
after the prediction, the 640 x 640 result is trimmed to 512 x 512 and inserted into the result tableau
prior to run-length coding.  The downside of this approach is that it takes ~1.56 times the processing time
(640² / 512² ≈ 1.56) for both prediction and training (see https://www.kaggle.com/markalavin/hubmap-tf-with-tpu-efficientunet-512x512-t-012036).  NOTE:  At the moment, the only part of the training that "knows"
about the overlapping layout is the ```dice_coeff``` function, which trims the borders of the predicted and
ground-truth mask images before calculating the Dice score.
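
The trim-and-insert step described above can be sketched with NumPy alone. This is a toy illustration rather than the notebook's inference code: the image is random, the tiles are cut from a zero-padded copy (how the tiling notebook treats image edges is an assumption here), and `trim_and_stitch` is a hypothetical helper name.

```python
import numpy as np

DIM = 512   # inner tile size (plays the role of P['DIM'])
OVL = 64    # overlap on every side (plays the role of P['PIXEL_OVERLAP'])

def trim_and_stitch(tiles, tile_rows, tile_cols, dim=DIM, ovl=OVL):
    """Trim `ovl` pixels from each side of every tile and place the inner
    dim x dim block on a grid. `tiles` is assumed to be in column-major
    tile order (all rows of tile column 0, then tile column 1, ...)."""
    out = np.zeros((tile_rows * dim, tile_cols * dim), dtype=tiles.dtype)
    idx = 0
    for c in range(tile_cols):
        for r in range(tile_rows):
            out[r * dim:(r + 1) * dim, c * dim:(c + 1) * dim] = \
                tiles[idx, ovl:-ovl, ovl:-ovl]
            idx += 1
    return out

# Toy check: cut a zero-padded image into overlapping tiles, then stitch back.
rng = np.random.default_rng(0)
image = rng.integers(0, 2, size=(2 * DIM, 3 * DIM), dtype=np.uint8)
padded = np.pad(image, OVL)  # assumption: zero padding at the image border
tiles = np.stack([
    padded[r * DIM: r * DIM + DIM + 2 * OVL, c * DIM: c * DIM + DIM + 2 * OVL]
    for c in range(3) for r in range(2)   # column-major tile order
])
stitched = trim_and_stitch(tiles, tile_rows=2, tile_cols=3)

# Each tile holds (DIM + 2*OVL)^2 pixels but contributes only DIM^2 of them.
cost_ratio = (DIM + 2 * OVL) ** 2 / DIM ** 2
print(tiles.shape, np.array_equal(stitched, image), cost_ratio)
```

The cost ratio printed at the end is where the ~1.56 figure comes from: each 640 x 640 tile is processed in full, but only its inner 512 x 512 block reaches the result.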

Notes:
1.  Modify the training loss function to also trim the predicted and ground-truth images.
2.  I *think* it is possible to do both the inferencing **and** the run-length calculation tile-by-tile; the
total run-length code is the concatenation of the per-tile run-length codes.  A problem with this is that the current code
does prediction for all the tiles in the whole image at once, which causes a memory spike and washes out the benefit
of tile-at-a-time inference and run-length coding.
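
For note 1, the idea of trimming before scoring can be illustrated in NumPy. This is only a sketch under stated assumptions: `trimmed_dice` and `trimmed_dice_loss` are hypothetical names, the `smooth` term is a common convention rather than something taken from the training notebook, and a real training loss would need to be written with TensorFlow ops (the same slicing syntax applies to tensors).

```python
import numpy as np

def trimmed_dice(y_true, y_pred, ovl=64, smooth=1.0):
    """Dice coefficient computed on the inner region only.
    y_true, y_pred: arrays of shape (batch, H, W) with values in [0, 1].
    `ovl` must be > 0, because a slice of the form 0:-0 would be empty."""
    t = y_true[:, ovl:-ovl, ovl:-ovl].astype(np.float64)
    p = y_pred[:, ovl:-ovl, ovl:-ovl].astype(np.float64)
    intersection = (t * p).sum()
    return (2.0 * intersection + smooth) / (t.sum() + p.sum() + smooth)

def trimmed_dice_loss(y_true, y_pred, ovl=64):
    """Loss counterpart: 1 - Dice on the trimmed region."""
    return 1.0 - trimmed_dice(y_true, y_pred, ovl)

# Toy check: the prediction matches the truth in the inner region but has
# spurious positives in the overlap border, which the trimming ignores.
y_true = np.zeros((1, 640, 640), dtype=np.float32)
y_true[0, 200:300, 200:300] = 1.0
y_pred = y_true.copy()
y_pred[0, :64, :] = 1.0   # garbage confined to the top border
score = trimmed_dice(y_true, y_pred)
loss = trimmed_dice_loss(y_true, y_pred)
print(score, loss)
```

With the border excluded, the spurious positives do not affect the score, which is the behavior the note asks the loss to share with ```dice_coeff```.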

Lavin modifications are enabled by setting variable LAVIN to True.

In [None]:
LAVIN = True

## Tensorflow HuBMAP - Hacking the Kidney competition starter kit:
* https://www.kaggle.com/wrrosa/hubmap-tf-with-tpu-efficientunet-512x512-tfrecs (how to create training and inference tfrecords)
* https://www.kaggle.com/wrrosa/hubmap-tf-with-tpu-efficientunet-512x512-train (training pipeline)
* this notebook (inference with submission)

# Versions
* V1 (V7 train notebook) 4-CV efficientunetb0 512x512 (**LB .834**)
* V2 (V8 train notebook) loss bce (LB .835)
* V3 (V9 train notebook) efficientunetb1 (CV .871, LB .830)
* V4 (V10 train notebook) efficientunetb4 (CV .874, **LB .839**)
* V5 (V12 train notebook) efficientunetb7 (CV .858, LB .835)
* V6 (V13 train notebook) efficientunetb4 (CV .877, LB .836)
* V7 (V14 train notebook) efficientunetb4 with overlapped train data, summing preds in inference (CV .879, **LB .843**)
* V8 (V14 train notebook) efficientunetb4  THRESHOLD=0.4, interpolation = cv2.INTER_AREA, rle_encode_less_memory (**LB .846**)
* V9 (V14 train notebook) efficientunetb4, MIN_OVERLAP = 300 (**LB 0.848**)
* V10 (V14 train notebook) efficientunetb4, checksum mask before modifications (1h 11m, no need to score)
* V11 (V14 train notebook) efficientunetb4, SUBMISSION_MODE added (generate submission from public tfrec files, almost 20m = 3.5 times faster!)
* V12 (V14 train notebook) efficientunetb4, CHECKSUM = False (...)

# References:
* https://www.kaggle.com/joshi98kishan/hubmap-keras-pipeline-training-inference
* https://www.kaggle.com/bguberfain/memory-aware-rle-encoding/
* https://www.kaggle.com/leighplt/pytorch-fcn-resnet50

# Parameters
Read parameters from the notebook output; only **DIM** is actually used (plus **PIXEL_OVERLAP** when LAVIN is True):

In [None]:
if LAVIN:
    mod_path = '/kaggle/input/hubmap-models/'
else:
    mod_path = '/kaggle/input/hubmap-tf-with-tpu-efficientunet-512x512-train/'

import yaml
import pprint
with open(mod_path+'params.yaml') as file:
    P = yaml.load(file, Loader=yaml.FullLoader)
    pprint.pprint(P)
    
THRESHOLD = 0.4 # preds > THRESHOLD
WINDOW = 1024
MIN_OVERLAP = 300
NEW_SIZE = P['DIM']

SUBMISSION_MODE = 'PUBLIC_TFREC' 
# 'PUBLIC_TFREC' = use created tfrecords for public test set with MIN_OVERLAP = 300 tiling 1024-512, ignore other (private test) data
# 'FULL' do not use tfrecords, just full submission 

if LAVIN:
    CHECKSUM = True # compute mask sum for each image
else:
    CHECKSUM = False # skip the per-image mask sum


# Metrics

In [None]:
import json

with open(mod_path + 'metrics.json') as json_file:
    M = json.load(json_file)
print('Model run datetime: '+M['datetime'])
print('OOF val_dice_coe: ' + str(M['oof_dice_coe']))

# Packages

In [None]:
! pip install ../input/kerasapplications/keras-team-keras-applications-3b180cb -f ./ --no-index -q
! pip install ../input/efficientnet/efficientnet-1.1.0/ -f ./ --no-index -q
import numpy as np
import pandas as pd
import os
import glob
import gc
import sys
import re

import matplotlib.pyplot as plt
%matplotlib inline

import tifffile
import rasterio
from rasterio.windows import Window

import pathlib
from tqdm.notebook import tqdm
import cv2

import tensorflow as tf
import efficientnet as efn
import efficientnet.tfkeras

In [None]:
def running_on_TPU():
    try:
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver() # TPU detection
        tf.config.experimental_connect_to_cluster(tpu)
        tf.tpu.experimental.initialize_tpu_system(tpu)
        strategy = tf.distribute.experimental.TPUStrategy(tpu)
        return True
    except Exception:  # no TPU available, or TPU initialization failed
        return False
    
print( "running_on_TPU", running_on_TPU(), file = sys.stderr )

AUTO = tf.data.experimental.AUTOTUNE
image_feature = {
    'image': tf.io.FixedLenFeature([], tf.string),
}
def _parse_image(example_proto):
    example = tf.io.parse_single_example(example_proto, image_feature, name = "parse_example")
    dim_with_overlap = P[ 'DIM' ] + 2 * P[ 'PIXEL_OVERLAP' ]
    image = tf.reshape( tf.io.decode_raw(example['image'],out_type=np.dtype('uint8')), 
                        (dim_with_overlap, dim_with_overlap, 3))
    return image


dataset = tf.data.TFRecordDataset( "../input/blortzk-hubmap-test-overlapping-tiled-images/test/26dc41664-6068.tfrec",
                                   compression_type = "GZIP" )
dataset = dataset.batch( 77 )
dataset = dataset.prefetch(AUTO)
# dataset = dataset.map(_parse_image)

iterator = iter( dataset )
batch = iterator.get_next()
print( "type( batch )", type( batch ), "batch.shape", batch.shape, "dtype", batch.dtype, file = sys.stderr )
list( map( _parse_image, batch ) )

# Functions

In [None]:
def rle_encode_less_memory(img):
    pixels = img.T.flatten()
    pixels[0] = 0
    pixels[-1] = 0
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 2
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

def make_grid(shape, window=256, min_overlap=32):
    """
        Return Array of size (N,4), where N - number of tiles,
        2nd axis represente slices: x1,x2,y1,y2 
    """
    x, y = shape
    nx = x // (window - min_overlap) + 1
    x1 = np.linspace(0, x, num=nx, endpoint=False, dtype=np.int64)
    x1[-1] = x - window
    x2 = (x1 + window).clip(0, x)
    ny = y // (window - min_overlap) + 1
    y1 = np.linspace(0, y, num=ny, endpoint=False, dtype=np.int64)
    y1[-1] = y - window
    y2 = (y1 + window).clip(0, y)
    slices = np.zeros((nx,ny, 4), dtype=np.int64)
    
    for i in range(nx):
        for j in range(ny):
            slices[i,j] = x1[i], x2[i], y1[j], y2[j]    
    return slices.reshape(nx*ny,4)
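
A small worked example may help when reading the two helpers above. The sketch below repeats their logic so that it runs on its own (the final list comprehension in `make_grid` is a compact rewrite of the original double loop), then applies them to a 3 x 3 mask and to an arbitrary, made-up image shape of 3000 x 2500.

```python
import numpy as np

def rle_encode_less_memory(img):
    # Same logic as the cell above: column-major order, 1-based starts.
    pixels = img.T.flatten()
    pixels[0] = 0
    pixels[-1] = 0
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 2
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

def make_grid(shape, window=256, min_overlap=32):
    # Same logic as the cell above; returns (N, 4) rows of x1, x2, y1, y2.
    x, y = shape
    nx = x // (window - min_overlap) + 1
    x1 = np.linspace(0, x, num=nx, endpoint=False, dtype=np.int64)
    x1[-1] = x - window
    x2 = (x1 + window).clip(0, x)
    ny = y // (window - min_overlap) + 1
    y1 = np.linspace(0, y, num=ny, endpoint=False, dtype=np.int64)
    y1[-1] = y - window
    y2 = (y1 + window).clip(0, y)
    return np.array([(a, b, c, d) for a, b in zip(x1, x2)
                                  for c, d in zip(y1, y2)])

# Column-major pixel order of this mask is 0 1 1 1 1 0 0 0 0.
mask = np.array([[0, 1, 0],
                 [1, 1, 0],
                 [1, 0, 0]], dtype=np.uint8)
rle = rle_encode_less_memory(mask)

# Tile an illustrative 3000 x 2500 image with the notebook's settings.
slices = make_grid((3000, 2500), window=1024, min_overlap=300)
widths = {(x2 - x1, y2 - y1) for x1, x2, y1, y2 in slices}
covered = np.zeros((3000, 2500), dtype=bool)
for x1, x2, y1, y2 in slices:
    covered[x1:x2, y1:y2] = True
print(rle, slices.shape, widths, covered.all())
```

The RLE comes out as `2 4`: one run starting at the second pixel (1-based) and lasting four pixels. The width check is worth keeping when trying other shapes: with shape (2000, 1500) and the same settings, one tile spans only 1000 pixels along the second axis.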

# Models

In [None]:
identity = rasterio.Affine(1, 0, 0, 0, 1, 0)
fold_models = []
for fold_model_path in glob.glob(mod_path+'*.h5'):
    fold_models.append(tf.keras.models.load_model(fold_model_path,compile = False))
print(len(fold_models))

# Tfrecords functions

In [None]:
AUTO = tf.data.experimental.AUTOTUNE
if LAVIN:
    image_feature = {
        'image': tf.io.FixedLenFeature([], tf.string),
    }
else:
    image_feature = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'x1': tf.io.FixedLenFeature([], tf.int64),
        'y1': tf.io.FixedLenFeature([], tf.int64)
}
def _parse_image(example_proto):
    example = tf.io.parse_single_example(example_proto, image_feature )
    if LAVIN:
        dim_with_overlap = P[ 'DIM' ] + 2 * P[ 'PIXEL_OVERLAP' ]
        image = tf.reshape( tf.io.decode_raw(example['image'],out_type=np.dtype('uint8')), 
                           (dim_with_overlap, dim_with_overlap, 3))
        return image, 24, 37   # Dummy x1 and y1 values
    else:
        image = tf.reshape( tf.io.decode_raw(example['image'],out_type=np.dtype('uint8')), (P['DIM'],P['DIM'], 3))
        return image, example['x1'], example['y1']

def load_dataset(filenames, ordered=True):
    ignore_order = tf.data.Options()
    if not ordered:
        ignore_order.experimental_deterministic = False
    if LAVIN:
        dataset = tf.data.TFRecordDataset(filenames, compression_type = "GZIP" )
    else:
        dataset = tf.data.TFRecordDataset(filenames)
    dataset = dataset.with_options(ignore_order)
    dataset = dataset.map(_parse_image)
    return dataset

def get_dataset(FILENAME):
    dataset = load_dataset(FILENAME)
    dataset  = dataset.batch(64)
    dataset = dataset.prefetch(AUTO)
    return dataset.take( 10 )   # ### REMOVE "take"

# Results

In [None]:

p = pathlib.Path('../input/hubmap-kidney-segmentation')
subm = {}

for i, filename in tqdm(enumerate(p.glob('test/*.tiff')), 
                        total = len(list(p.glob('test/*.tiff')))):
    
    print(f'{i+1} Predicting {filename.stem}')
    
    dataset = rasterio.open(filename.as_posix(), transform = identity)
    if LAVIN:
        image_pixel_rows, image_pixel_cols = dataset.shape
    preds = np.zeros(dataset.shape, dtype=np.uint8)    
    
    if SUBMISSION_MODE == 'PUBLIC_TFREC' and MIN_OVERLAP == 300 and WINDOW == 1024 and NEW_SIZE == 512:
        print('SUBMISSION_MODE: PUBLIC_TFREC')
        if LAVIN:
            fnames = glob.glob('/kaggle/input/blortzk-hubmap-test-overlapping-tiled-images/test/'+filename.stem+'*.tfrec')
        else:
            fnames = glob.glob('/kaggle/input/hubmap-tf-with-tpu-efficientunet-512x512-tfrecs/test/'+filename.stem+'*.tfrec')
        
        if len(fnames)>0: # PUBLIC TEST SET
            for FILENAME in fnames:
                pred = None
                for fold_model in fold_models:
                    tmp = fold_model.predict(get_dataset(FILENAME))/len(fold_models)
                    if pred is None:
                        pred = tmp
                    else:
                        pred += tmp
                    del tmp
                    gc.collect()

                if LAVIN:
                    # Threshold the prediction values and make them bools:
                    pred = tf.cast( pred > THRESHOLD, tf.bool ).numpy().squeeze()
                    print( "after predict, pred.shape", pred.shape, file = sys.stderr )
                    pred_pixels = pred.shape[ 0 ] * pred.shape[ 1 ] * pred.shape[ 2 ]
                    pred_ones = pred.sum();
                    print( "Total pixels", pred_pixels, "1-valued pixels", pred_ones, file = sys.stderr )

                else:
                    pred = tf.cast((tf.image.resize(pred, (WINDOW,WINDOW)) > THRESHOLD),tf.bool).numpy().squeeze()

                if LAVIN:
                     
                    DIM = P[ 'DIM' ]
                    OVL = P[ 'PIXEL_OVERLAP' ]
                   
                    # Remember that we truncate the input to the largest number of whole DIMxDIM tiles
                    end_pixel_rows = DIM * ( image_pixel_rows // DIM )
                    end_pixel_cols = DIM * ( image_pixel_cols // DIM )
                    
                    print( "preds.shape", preds.shape, "end_pixel_rows", end_pixel_rows, "end_pixel_cols", end_pixel_cols, file = sys.stderr )

                    idx = 0   # Index of tile in column-major order:

                    for image_pixel_col in range( 0, end_pixel_cols, DIM ):
                        for image_pixel_row in range( 0, end_pixel_rows, DIM ):
                            if True:
                                try:
                                    trimmed_tile = pred[ idx, OVL : -OVL, OVL : -OVL ].squeeze()
                                    '''
                                    if ( trimmed_tile.sum() > 0 ):
                                        plt.title( "image_pixel_row " + str( image_pixel_row ) + " image_pixel_col " + str( image_pixel_col ) )
                                        plt.imshow( trimmed_tile )
                                        plt.show()
                                    '''
                                except:
                                    print( "Exception: ", "image_pixel_row " + str( image_pixel_row ) + " image_pixel_col " + str( image_pixel_col ), file = sys.stderr )
                                    if ( idx >= pred.shape[ 0 ] ):
                                        break
                                assert trimmed_tile.shape == ( DIM, DIM )
                                preds[ image_pixel_row : image_pixel_row + DIM,
                                       image_pixel_col : image_pixel_col + DIM ] =  trimmed_tile
                            if False:
                                print( "image_pixel_row", image_pixel_row, "image_pixel_col", image_pixel_col)
                                print( "image_pixel_row+P[DIM]", image_pixel_row + P[ 'DIM' ], file = sys.stderr )
                                print( "image_pixel_col+P[DIM]", image_pixel_col + P[ 'DIM' ], file = sys.stderr )
                                
                            idx += 1
                        if ( idx >= pred.shape[ 0 ] ):
                            break
                        
                                  
                else:
                    idx = 0
                    for img, X1, Y1 in get_dataset(FILENAME):
                        for fi in range(X1.shape[0]):
                            x1 = X1[fi].numpy()
                            y1 = Y1[fi].numpy()
                            preds[x1:(x1+WINDOW),y1:(y1+WINDOW)] += pred[idx]
                            idx += 1
                pred = None
                '''
                fig, (ax1, ax2) = plt.subplots(1, 2, figsize = ( 3, 4 ) )
                fig.suptitle( filename.stem )
                ax1.imshow( preds )
                rows = dataset.height
                cols = dataset.width
                image = dataset.read([1], window= ((0, rows), ( 0, 20000 ) ) ).squeeze()  # ### ((x1,x2),(y1,y2)))
                plt.imshow( image )
                plt.show()
                image = None
                gc.collect()
                '''
                        
        else: # IGNORE PRIVATE TEST SET (CREATE TFRECORDS IN FUTURE)
            pass
    else:
        print('SUBMISSION_MODE: FULL')
        slices = make_grid(dataset.shape, window=WINDOW, min_overlap=MIN_OVERLAP)


        for (x1,x2,y1,y2) in slices:
            image = dataset.read([1,2,3],
                        window=Window.from_slices((x1,x2),(y1,y2)))
            image = np.moveaxis(image, 0, -1)
            image = cv2.resize(image, (NEW_SIZE, NEW_SIZE),interpolation = cv2.INTER_AREA)
            image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
            image = np.expand_dims(image, 0)

            pred = None

            for fold_model in fold_models:
                if pred is None:
                    pred = np.squeeze(fold_model.predict(image))
                else:
                    pred += np.squeeze(fold_model.predict(image))

            pred = pred/len(fold_models)

            pred = cv2.resize(pred, (WINDOW, WINDOW))
            preds[x1:x2,y1:y2] += (pred > THRESHOLD).astype(np.uint8)

    preds = (preds > 0.5).astype(np.uint8)
    
    subm[i] = {'id':filename.stem, 'predicted': rle_encode_less_memory(preds)}
    
    if CHECKSUM:
        print('Checksum: '+ str(np.sum(preds)))
    
    del preds
    gc.collect();

# Low-memory glom Prediction and Encoding
This version of the code above reduces peak memory use by doing model prediction and run-length encoding (RLE)
one tile column at a time rather than for the entire image at once.
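
The column-by-column idea rests on one property: in column-major order, a strip of whole columns occupies a contiguous range of pixel indices, so its runs only need a constant offset. The sketch below checks this on a small random mask. It is an illustration under assumptions, not the notebook's code: `encode_rle` and `decode_rle` are hypothetical stand-ins that use 0-based starts and return lists of integers instead of strings.

```python
import numpy as np

def encode_rle(mask, offset=0):
    """Column-major RLE of a boolean mask as a flat list
    [start, length, start, length, ...]; starts are shifted by `offset`."""
    flat = np.pad(mask.T.reshape(-1), 1)          # pad with False at both ends
    starts = np.nonzero(~flat[:-1] & flat[1:])[0] + offset
    ends = np.nonzero(flat[:-1] & ~flat[1:])[0] + offset
    return np.stack([starts, ends - starts], axis=1).reshape(-1).tolist()

def decode_rle(rle, shape):
    """Inverse of encode_rle for a mask of the given (rows, cols) shape."""
    flat = np.zeros(shape[0] * shape[1], dtype=bool)
    for start, length in zip(rle[::2], rle[1::2]):
        flat[start:start + length] = True
    return flat.reshape(shape[::-1]).T

rng = np.random.default_rng(1)
mask = rng.random((8, 12)) > 0.5
rows, cols = mask.shape
strip_width = 4                                   # columns per strip

rle_whole = encode_rle(mask)
rle_by_strip = []
for c in range(0, cols, strip_width):
    strip = mask[:, c:c + strip_width]
    # A strip starting at column c begins at flat index rows * c.
    rle_by_strip += encode_rle(strip, offset=rows * c)

same_mask = np.array_equal(decode_rle(rle_by_strip, mask.shape), mask)
print(same_mask, len(rle_whole) // 2, len(rle_by_strip) // 2)
```

Two caveats follow from this sketch. A run that crosses a strip boundary is emitted as two adjacent runs, so the concatenated code can differ from the whole-image code even though both decode to the same mask. And the offsets are only meaningful relative to a mask whose height equals the strip height.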

In [None]:
def predict_and_encode_results( tiff_image_dirname, image_tile_dirname, trained_models, P, CSV_filename ):
    '''
    Does model-prediction using the models in "trained_models" for all input images
    represented by files in "image_tile_dirname", then run-length-encodes (RLE) the
    predicted glom mask.
    '''
    print( "tiff_image_dirname", tiff_image_dirname, "image_tile_dirname", image_tile_dirname, file = sys.stderr )

    # Retrieve the models, one per Training fold:
    # trained_models = retrieve_trained_models( trained_model_dirname )

    # Iterate over all the test images:
    with open( CSV_filename, "w") as CSV:
        CSV.write( "id,predicted\n" )  # csv header
        tiff_image_filenames = glob.glob( tiff_image_dirname + "*.tiff" )
        for tiff_image_filename in tqdm( tiff_image_filenames ):  # ### FIX THIS!!!! ###
            if "26dc41664" in tiff_image_filename:
                print( "tiff_image_filename", tiff_image_filename, file = sys.stderr )
                predict_and_encode_image( trained_models, tiff_image_filename, image_tile_dirname, CSV, P )
            

def predict_and_encode_image( trained_models, tiff_image_filename, image_tile_dirname, CSV, P ):
    print( "\ntiff_image_filename", tiff_image_filename, "image_tile_dirname", image_tile_dirname, file = sys.stderr )
    match = re.match( r".*/([0-9a-fA-F]+)\.tiff", tiff_image_filename )
    if match is None:
        print( f"ERROR: Could not find image_id in filename {tiff_image_filename}", file = sys.stderr )
        return   # Without an image_id the tile files cannot be located
    image_id = match.group( 1 )
    with rasterio.open( tiff_image_filename, transform = None ) as dataset:  # Just to query shape
        image_pixel_rows, image_pixel_cols = dataset.shape
    image_tile_rows = image_pixel_rows // P[ 'DIM' ]   # e.g., 512 pixels / tile
    image_tile_cols = image_pixel_cols // P[ 'DIM' ]
        
    # Iterate over all the tile columns in image:
    tile_batch_iterator = open_tile_batch_iterator( image_id, image_tile_dirname, image_tile_rows )
    RLE = ''
    for image_tile_col in tqdm( range( image_tile_cols ) ):
        RLE += " " + predict_and_encode_column( trained_models, image_id, image_tile_col, image_tile_rows, tile_batch_iterator, CSV, P )
        print( "Processed column", image_tile_col, "/", image_tile_cols, "for image_id", image_id, file = sys.stderr )  # ### REMOVE!!! ###
    

    # Write out the result as one line in the CSV file
    CSV.write( image_id + "," )
    CSV.write( RLE )
    CSV.write( "\n" )
    CSV.flush()
    print( "wrote one CSV record for image", image_id, file = sys.stderr )
                  
def predict_and_encode_column( trained_models, image_id, image_tile_col, image_tile_rows, tile_batch_iterator, CSV, P ):
    '''
    Returns string RLE for the tiles in column "image_tile_col"
    '''
    # Calculate the prediction for all tiles in "image_tile_col", averaging over one
    # prediction for each model in "trained_models" and thresholding
    tile_batch = tile_batch_iterator.get_next()
    col_tiles = tf.map_fn( parse_tfrecs_to_column_tiles, tile_batch, dtype = "uint8" )
    predicted_tiles = None
    for trained_model in trained_models:
        predicted_tile = trained_model.predict( col_tiles )
        predicted_tiles = predicted_tile if predicted_tiles is None else predicted_tiles + predicted_tile
    predicted_tiles /= len( trained_models )
    predicted_tiles = tf.cast( predicted_tiles > THRESHOLD, tf.bool ).numpy().squeeze()
    
    # Trim the overlapping borders from each predicted output tile, then concatenate
    # the column's tiles into a single 1-column-wide image
    OVL = P[ 'PIXEL_OVERLAP' ]
    trimmed_predicted_tiles = predicted_tiles[ :, OVL : - OVL, OVL : - OVL ]
    column_tensor = tf.reshape( trimmed_predicted_tiles, ( ( P[ "DIM" ] * image_tile_rows , P[ "DIM" ] ) ) )

    # Calculate the offset, in pixels, for the start of the current column
    column_pixel_offset = P['DIM'] * P['DIM'] * image_tile_rows * image_tile_col
    
    # Calculate the string RLE, which will be concatenated with RLEs from other columns, to
    # construct RLE for the entire image
    RLE = encode_RLE( column_tensor.numpy(), column_pixel_offset )
    return RLE

def open_tile_batch_iterator( image_id, image_tile_dirname, image_tile_rows ):
    '''
    Returns an iterator that iterates over batches in "image_tile_dirname"/"image_id"-nnn.tfrec
    '''
    image_tile_filename = glob.glob( image_tile_dirname + image_id + "*.tfrec" )
    print( "image_tile_filename", image_tile_filename, file = sys.stderr)

    dataset = tf.data.TFRecordDataset( image_tile_filename, compression_type = "GZIP" )
    dataset = dataset.batch( image_tile_rows )
    dataset = dataset.prefetch(AUTO)
    # ### dataset = dataset.map(_parse_image)

    iterator = iter( dataset )
    return iterator

def parse_tfrecs_to_column_tiles(example_proto):
    '''
    Parse one serialized TFRecord example in "example_proto" into a single
    image tile (including its overlap border). Applied via tf.map_fn to every
    element of a batch, this yields the tiles of one tile column in the input.
    '''
    image_feature = {
        'image': tf.io.FixedLenFeature([], tf.string),
    }
    example = tf.io.parse_single_example(example_proto, image_feature )
    dim_with_overlap = P[ 'DIM' ] + 2 * P[ 'PIXEL_OVERLAP' ]
    image = tf.reshape( tf.io.decode_raw(example['image'],out_type=np.dtype('uint8')), 
                        (dim_with_overlap, dim_with_overlap, 3))
    return image

# Based on https://www.kaggle.com/friedchips/fully-correct-hubmap-rle-encoding-and-decoding:
# Given a predicted binary image tile column "mask" and a starting offset in
# column-major ordered pixels, calculate and return the string RLE, which will be
# concatenated with RLEs from other columns, to construct RLE for the entire image:
def encode_RLE( mask, column_pixel_offset = 0 ):
    mask = mask.T.reshape(-1) # make 1D, column-first
    mask = np.pad(mask, 1) # make sure that the 1d mask starts and ends with a 0
    starts = np.nonzero((~mask[:-1]) & mask[1:])[0] + column_pixel_offset # start points
    ends = np.nonzero(mask[:-1] & (~mask[1:]))[0] + column_pixel_offset # end points
    rle = np.empty(2 * starts.size, dtype=int) # interlacing...
    rle[0::2] = starts # ...starts...
    rle[1::2] = ends - starts # ...and lengths
    rle = ' '.join([ str(elem) for elem in rle ]) # turn into space-separated string
    return rle

def mask2rle(mask, column_offset = 0):
    ''' takes a 2d boolean numpy array and turns it into a space-delimited RLE string '''
    
    mask = mask.T.reshape(-1) # make 1D, column-first
    mask = np.pad(mask, 1) # make sure that the 1d mask starts and ends with a 0
    starts = np.nonzero((~mask[:-1]) & mask[1:])[0] + column_offset # start points
    ends = np.nonzero(mask[:-1] & (~mask[1:]))[0] + column_offset # end points
    rle = np.empty(2 * starts.size, dtype=int) # interlacing...
    rle[0::2] = starts # ...starts...
    rle[1::2] = ends - starts # ...and lengths
    rle = ' '.join([ str(elem) for elem in rle ]) # turn into space-separated string
    return rle

def rle2mask(rle, mask_shape):
    ''' takes a space-delimited RLE string in column-first order
    and turns it into a 2d boolean numpy array of shape mask_shape '''
    
    mask = np.zeros(np.prod(mask_shape), dtype=bool) # 1d mask array
    rle = np.array(rle.split()).astype(int) # rle values to ints
    starts = rle[::2]
    lengths = rle[1::2]
    for s, l in zip(starts, lengths):
        mask[s:s+l] = True
    return mask.reshape(np.flip(mask_shape)).T # flip because of column-first order



In [None]:
column_array = [ [ 0, 1, 1 ], [1, 1, 1 ], [ 1, 0, 0 ] ] 
column_tensor = tf.convert_to_tensor( column_array )
RLE = encode_RLE( column_tensor.numpy(), column_pixel_offset = 6 )
print( "column_tensor\n", column_tensor, "\nRLE", RLE, file = sys.stderr )
rle2mask( RLE, ( 3, 5 ) )

In [None]:
!ls -al /kaggle/input/hubmap-kidney-segmentation/test/*.tiff

In [None]:
'''
! touch /kaggle/working/submission.csv
tiff_image_dirname = "/kaggle/input/hubmap-kidney-segmentation/test/"
image_tile_dirname = "/kaggle/input/blortzk-hubmap-test-overlapping-tiled-images/test/"
trained_models = fold_models
CSV_filename = "/kaggle/working/submission.csv"
predict_and_encode_results( tiff_image_dirname, image_tile_dirname, trained_models, P, CSV_filename )
'''

In [None]:
'''
! ls -al /kaggle/working/submission.csv
! echo "" >/kaggle/working/submission.csv
'''

# Making submission

In [None]:
'''
submission = pd.DataFrame.from_dict(subm, orient='index')
submission.to_csv('submission.csv', index=False)
submission.head()
'''

# Visualizing results

In [None]:
# From https://www.kaggle.com/friedchips/fully-correct-hubmap-rle-encoding-and-decoding

def visualize_mask_and_image( tiff_image_dirname, CSV_filename ): 
    image_RLEs = pd.read_csv( CSV_filename )
    print( "image_RLEs", image_RLEs.head(), file = sys.stderr )

    for tiff_image_filename in glob.glob( tiff_image_dirname + "*.tiff" ):
        image_id = pathlib.Path( tiff_image_filename ).stem
        RLE = image_RLEs.predicted[ image_RLEs.id == image_id ]
        if ( len( RLE ) == 0 ):
            print( "For image_id", image_id, "no prediction", file = sys.stderr )
        else:
            RLE = RLE.values[ 0 ]
            print( "image_id", image_id, file = sys.stderr )
            with rasterio.open( tiff_image_filename, transform = identity) as dataset:
                # Round down the number of rows and columns to a multiple of the result tile size
                image_pixel_rows, image_pixel_cols = dataset.shape
                image_pixel_rows = image_pixel_rows // P[ 'DIM'] * P[ 'DIM' ]
                image_pixel_cols = image_pixel_cols // P[ 'DIM'] * P[ 'DIM' ]
            mask = rle2mask( RLE, ( image_pixel_rows, image_pixel_cols ) )
            plt.imshow( mask )
            plt.show()
            plt.imshow( cv2.imread( tiff_image_filename ) )
            plt.show()

'''            
tiff_image_dirname = "/kaggle/input/hubmap-kidney-segmentation/test/"
CSV_filename = "/kaggle/input/submission-four-out-of-five/submission.csv" # ### "/kaggle/working/submission.csv"   
visualize_mask_and_image( tiff_image_dirname, CSV_filename )
'''

In [None]:
! cat /kaggle/input/submission-four-out-of-five/submission.csv >/kaggle/working/submission.csv
! tail -n 1 /kaggle/input/submission-last-out-of-five-images/submission.csv >>/kaggle/working/submission.csv
! wc /kaggle/working/submission.csv
! grep -e "predicted" 