<h2 style='background:purple; border:0; color:white;font-size:2em'><center> HuBMAP Hacking the Kidney </center></h2>

<img src="https://storage.googleapis.com/kaggle-competitions/kaggle/22990/logos/header.png" alt="HuBMAP">

1. [Competition purpose](#1)
2. [Data Overview](#2)
3. [Tiles](#3)

The tile-making part of this notebook is based on this notebook: https://www.kaggle.com/iafoss/256x256-images




<a id='1'></a>
<h2 style='background:green; border:0; color:white;font-size:1.5em'><center> Competition purpose </center></h2>

> Your challenge is to detect functional tissue units (FTUs) across different tissue preparation pipelines. An FTU is defined as a “three-dimensional block of cells centered around a capillary, such that each cell in this block is within diffusion distance from any other cell in the same block” (de Bono, 2013). The goal of this competition is the implementation of a successful and robust glomeruli FTU detector.

The dataset comprises 8 very large TIFF files (from more than 500 MB up to 5 GB each). Images of this size are far too large to feed directly to a deep neural network, which is why we work with tiles. The training set includes annotations in both RLE-encoded and unencoded (JSON) forms. The annotations denote segmentations of glomeruli.


## What are we predicting?

The task is to develop segmentation algorithms that identify glomeruli in PAS-stained microscopy data, that is, to detect functional tissue units (FTUs) across different tissue preparation pipelines.


## Evaluation Metric: Dice Coefficient

The Dice coefficient is a common metric for **segmentation** tasks. It measures the pixel-wise agreement between a predicted segmentation and its corresponding ground truth. The Dice similarity coefficient for two sets X and Y is defined as:

$$\text{DC}(X, Y) = \frac{2 \times |X \cap Y|}{|X| + |Y|}.$$

where X is the predicted set of pixels and Y is the ground truth.
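
To make the formula concrete, here is a minimal NumPy sketch that computes the Dice coefficient for two binary masks. The function name `dice_coefficient` and the choice to return 1.0 when both masks are empty are assumptions made for illustration; the competition's exact handling of that edge case is not stated here.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    # |X ∩ Y|: pixels that are positive in both masks
    intersection = np.logical_and(pred, truth).sum()
    # |X| + |Y|: positive pixels in each mask, summed
    total = pred.sum() + truth.sum()
    # Assumed convention: two empty masks agree perfectly
    if total == 0:
        return 1.0
    return 2.0 * intersection / total

# Toy example: 3 predicted pixels, 3 true pixels, 2 in common
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
truth = np.array([[1, 0, 0],
                  [0, 1, 1]])
score = dice_coefficient(pred, truth)
print(f'Dice: {score:.4f}')
```

In the toy example the score is 2 × 2 / (3 + 3) = 2/3.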

<a id='2'></a>
<h2 style='background:green; border:0; color:white;font-size:1.5em'><center> Data Overview </center></h2>


In [None]:
# basic
import os, gc
import warnings
import numpy as np
import pandas as pd

# visualize
import seaborn as sns
import matplotlib.pyplot as plt

# reading tiff images
import tifffile as tiff 

%matplotlib inline
warnings.filterwarnings('ignore')

# directory
ROOT = '../input/hubmap-kidney-segmentation/'

The data consists of **train and test folders with .tiff images and JSON annotations**. In addition, **train.csv** contains the image ids and their masks, and **HuBMAP-20-dataset_information.csv** contains the metadata.

The train dataset contains 8 images, and `train.csv` has 2 columns: `id` and `encoding` (the RLE-encoded representation of the mask).

In [None]:
train = pd.read_csv(f'{ROOT}train.csv')
train
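
Since the `encoding` column stores masks in run-length encoding (RLE), a small decoding sketch may help. It assumes the common Kaggle convention: a string of `start length` pairs, 1-indexed starts, and pixels numbered top to bottom, then left to right (column-major order). The helper name `rle_decode` is hypothetical, and the convention should be checked against the competition's data description before use.

```python
import numpy as np

def rle_decode(encoding, shape):
    """Decode a 'start length start length ...' string into a binary mask.

    Assumptions: starts are 1-indexed, and pixels are numbered
    column by column (top to bottom, then left to right).
    shape is (height, width).
    """
    values = np.array([int(v) for v in encoding.split()], dtype=np.int64)
    starts = values[0::2] - 1   # convert to 0-indexed positions
    lengths = values[1::2]
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    # order='F' fills the 2D array column by column
    return flat.reshape(shape, order='F')

# Toy example on a 3x3 grid: a run of 2 pixels from pixel 2, and 1 pixel at pixel 8
mask = rle_decode('2 2 8 1', shape=(3, 3))
print(mask)
```

For a real image, `shape` would be the height and width of the corresponding TIFF, and `encoding` one value from the `encoding` column.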

The `HuBMAP-20-dataset_information.csv` file holds additional metadata such as sex, age, weight, height, laterality, and the percentage of cortex and medulla within the kidney. Note that this file also provides metadata for the test dataset.

In [None]:
metadata = pd.read_csv(f'{ROOT}HuBMAP-20-dataset_information.csv')
metadata.head()

Finally, the images:

In [None]:
example_image = tiff.imread(os.path.join(ROOT, 'train/2f6ecfcdf.tiff'))
plt.figure(figsize=(16, 16))
plt.imshow(example_image)
plt.axis("off")
print(f'Image Shape: {example_image.shape}')

<a id='3'></a>
<h2 style='background:green; border:0; color:white;font-size:1.5em'><center> Tiles </center></h2>

We use the tiles dataset produced by this brilliant notebook: https://www.kaggle.com/iafoss/256x256-images
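
To illustrate the general idea of tiling, here is a minimal sketch that pads an image so its sides are multiples of the tile size and then cuts it into non-overlapping square tiles. This is not the referenced notebook's implementation, which may also downscale the images or discard empty tiles; the function name `make_tiles` and the zero padding are assumptions made for illustration.

```python
import numpy as np

def make_tiles(image, size=256):
    """Split an (H, W, C) array into non-overlapping size x size tiles."""
    h, w, c = image.shape
    # Pad the bottom and right edges so both sides are multiples of size
    pad_h = (size - h % size) % size
    pad_w = (size - w % size) % size
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode='constant')
    rows, cols = padded.shape[0] // size, padded.shape[1] // size
    # (rows, size, cols, size, C) -> (rows, cols, size, size, C) -> (N, size, size, C)
    tiles = padded.reshape(rows, size, cols, size, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4).reshape(-1, size, size, c)
    return tiles

# Toy example: a 5x7 RGB image cut into 4x4 tiles (padded to 8x8 first)
image = np.arange(5 * 7 * 3, dtype=np.uint8).reshape(5, 7, 3)
tiles = make_tiles(image, size=4)
print(tiles.shape)
```

Tiles are returned row by row, so `tiles[0]` is the top-left corner of the image and `tiles[1]` is the tile to its right.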

We first load the image tiles:

In [None]:
TRAIN = '../input/256256-hubmap/train/'
MASKS = '../input/256256-hubmap/masks/'
# sort so the tile order is reproducible (os.listdir order is arbitrary)
train_images = sorted(os.listdir(TRAIN))

from PIL import Image

# show the first 25 tiles in a 5x5 grid
plt.figure(figsize=(15,15))
for i in range(25):
    plt.subplot(5,5,i+1)
    # np.array converts the PIL image directly to an (H, W, 3) array
    img = np.array(Image.open(TRAIN + train_images[i]))
    plt.imshow(img)
    plt.title(train_images[i])
    plt.grid(False)
    plt.axis("off")
plt.show()

Then the corresponding masks:

In [None]:
# sort so the mask order is reproducible (os.listdir order is arbitrary)
train_masks = sorted(os.listdir(MASKS))
plt.figure(figsize=(15,15))
for i in range(25):
    plt.subplot(5,5,i+1)
    # np.array converts the PIL image directly to an (H, W) array
    img = np.array(Image.open(MASKS + train_masks[i]))
    plt.imshow(img)
    plt.title(train_masks[i])  # label each panel with the mask's own filename
    plt.grid(False)
    plt.axis("off")
plt.show()

Finally, we overlay each mask on its tile:

In [None]:
# pair tiles and masks by filename
# (assumption: a mask has the same filename as its tile)
paired = sorted(set(os.listdir(TRAIN)) & set(os.listdir(MASKS)))
plt.figure(figsize=(15,15))
for i, name in enumerate(paired[:25]):
    plt.subplot(5,5,i+1)
    img = np.array(Image.open(TRAIN + name))
    mask = np.array(Image.open(MASKS + name))
    plt.imshow(img)
    plt.imshow(mask, alpha=0.5)  # semi-transparent mask on top of the tile
    plt.title(name)
    plt.grid(False)
    plt.axis("off")
plt.show()


# TO BE CONTINUED...