* The data used in this competition includes 11 fresh frozen and 9 Formalin Fixed Paraffin Embedded (FFPE) PAS kidney images. 
* Glomeruli FTU annotations exist for all 20 tissue samples; some of these will be shared for training, and others will be used to judge submissions.

The dataset consists of very large TIFF files.

* The **training set** has **8** files.
* The **public test set** has **5** files.
* The **private test set** is larger than the public test set. Since there are 20 samples in total, it presumably contains the remaining **7** files.

The training set includes annotations in both RLE-encoded and unencoded (JSON) forms. The annotations denote segmentations of glomeruli.

Both the training and public test sets also include anatomical structure segmentations. These could presumably be used for pretraining.

The JSON files are structured as follows:
* A `type` (`Feature`) and object type id (`PathAnnotationObject`). Note that these fields are the same in all files and do not offer signal.
* A `geometry` containing a `Polygon` with `coordinates` for the feature's enclosing volume.
* Additional `properties`, including the name and color of the feature in the image.
* The `IsLocked` field is constant within each file type (locked for glomerulus files, unlocked for anatomical structure files) and is not signal-bearing.
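
As a rough illustration of this layout, the sketch below parses one hand-written record and pulls out the polygon's outer boundary. The record is invented for illustration: the coordinate values, the `outer_rings` helper, and the nesting of name and color under a `classification` key inside `properties` are all assumptions, not copied from the competition files.

```python
import json
import numpy as np

# Hypothetical record mimicking the fields described above (values invented).
sample = '''
[{"type": "Feature",
  "id": "PathAnnotationObject",
  "geometry": {"type": "Polygon",
               "coordinates": [[[10, 10], [20, 10], [20, 20], [10, 20], [10, 10]]]},
  "properties": {"classification": {"name": "glomerulus", "colorRGB": -3140401}}}]
'''

def outer_rings(records):
    """Return (name, outer ring) for each record; the ring is an (N, 2) array of x, y."""
    result = []
    for rec in records:
        # The first ring of a Polygon is its outer boundary.
        ring = np.array(rec["geometry"]["coordinates"][0])
        # Assumed location of the feature name inside `properties`.
        name = rec["properties"]["classification"]["name"]
        result.append((name, ring))
    return result

for name, ring in outer_rings(json.loads(sample)):
    print(name, ring.shape)
```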

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image
import tifffile as tiff
import cv2
import os
from tqdm.notebook import tqdm
import zipfile

In [None]:
from pathlib import Path
Path.ls = lambda x: list(x.iterdir())

In [None]:
path = Path('/kaggle/input/hubmap-kidney-segmentation/')
path.ls()

In [None]:
train_df = pd.read_csv(path/'train.csv')

# Understanding RLE

The masks provided in `train.csv` are in run-length encoding (RLE) format. The encoding consists of pairs of values:
1. The starting pixel.
2. The number of pixels in the run, counted from the starting pixel.

For example, a run of 10 pixels starting at pixel number 200 is written as:
>200 10
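
A minimal sketch of how such a string could be split into (start, length) pairs and expanded into the pixel numbers it covers. The helper names `rle_pairs` and `covered_pixels` are made up for this illustration.

```python
import numpy as np

def rle_pairs(rle):
    """Split an RLE string into a list of (start, length) pairs."""
    values = [int(v) for v in rle.split()]
    return list(zip(values[0::2], values[1::2]))

def covered_pixels(rle):
    """Expand every (start, length) pair into the pixel numbers it covers."""
    runs = [np.arange(start, start + length) for start, length in rle_pairs(rle)]
    return np.concatenate(runs)

print(rle_pairs("200 10"))       # one pair: start 200, length 10
print(covered_pixels("200 10"))  # pixels 200 through 209
```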

The pixels are numbered from top to bottom, then from left to right (that is, down each column first). For a 5×5 image, the numbering looks as follows:

In [None]:
np.arange(0,25).reshape(5,5).T
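
Putting the two ideas together, the following is a sketch of a decoder that turns an RLE string into a 2D mask using this column-first numbering. The function name `rle_to_mask` is hypothetical, and the sketch assumes pixel numbers start at 0, as in the grid above. If the numbers in `train.csv` turn out to start at 1, subtract one from each start before filling.

```python
import numpy as np

def rle_to_mask(rle, height, width):
    """Decode an RLE string into a (height, width) binary mask.

    Assumes 0-based pixel numbers that run down each column first.
    """
    values = [int(v) for v in rle.split()]
    flat = np.zeros(height * width, dtype=np.uint8)
    for start, length in zip(values[0::2], values[1::2]):
        flat[start:start + length] = 1
    # Fortran order fills columns first, matching the top-to-bottom numbering.
    return flat.reshape((height, width), order="F")

# A run of 4 pixels starting at pixel 3 wraps from the bottom of the
# first column into the top of the second column.
mask = rle_to_mask("3 4", height=5, width=5)
print(mask)
```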