# The HuBMAP Glomeruli FTU Segmentation Dataset
#### by Leah Scherschel, Ellen M Quardokus, Yingnan Yu, and Katy Börner - Indiana University

## Background Information

### Human Biomolecular Atlas Program (HuBMAP)
HuBMAP aims to create an open, global atlas of the human body at the cellular level; see the [HuBMAP Consortium](https://hubmapconsortium.org/about/) and the [Visible Human Massive Open Online Course (VHMOOC)](https://expand.iu.edu/browse/sice/cns/courses/hubmap-visible-human-mooc). One component of this overarching goal is to identify medically relevant functional tissue units (FTUs) within whole slide microscopy images of human tissues. Once these FTUs are detected, information on their size, shape, and variability in number and location within the tissue samples can be used to help build a spatially accurate and semantically explicit model of the human body.
### FTUs and Glomeruli in Kidney
Bernard de Bono and his team defined the term “functional tissue unit” in 2013 as “...a three-dimensional block of cells centered around a capillary, such that each cell in this block is within diffusion distance from any other cell in the same block” [(de Bono, 2013)](https://www.ncbi.nlm.nih.gov/pubmed/24103658). One example of an FTU is the glomerulus, found in the outer layer of kidney tissue known as the cortex, which in humans has an area of about 800 mm<sup>2</sup> and an average depth of about 9 mm [(Mounier-Vehier, 2002)](https://doi.org/10.1046/j.1523-1755.2002.00167.x).
Glomeruli consist of capillaries that facilitate filtration of waste products out of blood. Normal glomeruli typically range from 100-350 μm in diameter and are roughly spherical [(Kannan, 2019)](https://www.kireports.org/article/S2468-0249(19)30155-X/abstract). Refer to Figure 1 for a zoom sequence from the human body to the single-cell level for kidney. Figure 2 highlights the cortex region of a tissue sample in green. Glomeruli contain four cell types: parietal epithelial cells, which form Bowman’s capsule; podocytes, which cover the outer layer of the filtration barrier; fenestrated endothelial cells, which are in direct contact with blood and are coated with a glycolipid and glycoprotein matrix called the glycocalyx; and mesangial cells, which occupy the space between the capillary blood vessel loops and are stained by the colorimetric histological stain Periodic acid-Schiff (PAS) [(Vaughan, 2008)](https://doi.org/10.1681/ASN.2007040471). PAS stains polysaccharides (complex sugars like glycogen) such as those found in and around the glomeruli, making it a favored stain for delineating them in tissue sections [(Agarwal, 2013)](https://doi.org/10.4103/0971-4065.114462). The periodic acid oxidizes the sugars, breaking the monosaccharide rings and exposing free aldehyde groups that react with the Schiff reagent to give a magenta color. Figure 3 is a light microscopy image that contains a glomerulus from a subsection of human kidney tissue stained with PAS. Each cell has one roughly spherical nucleus, which contains the cell's genetic material (its chromosomes, composed of nucleic acids) and is stained dark bluish purple in PAS-stained images. The magenta regions in PAS-stained tissue are the stained polysaccharides.



In [None]:
import os
from IPython.display import Image
print("Figure 1: Zoom in from Human Body to Single-Cell Level")
Image(filename="../input/datasetdetailsimages/HuBMAP zoom in.png")


In [None]:
print("Figure 2: Cortex Anatomical Structure in Green")
Image(filename="../input/datasetdetailsimages/AS mask example.png")

In [None]:
print("Figure 3: Glomerulus within PAS Stain Kidney Image")
Image(filename="../input/datasetdetailsimages/Glom mask example.png")

## Dataset

### Data Origin: Tissue Mapping Center (TMC) - Vanderbilt University (VU)
The kidney images and their segmentation masks, both of anatomical structures and glomeruli, were generated by the team at HuBMAP’s Tissue Mapping Center at Vanderbilt University (TMC-VU), based at the [BIOmolecular Multimodal Imaging Center (BIOMIC)](https://medschool.vanderbilt.edu/biomic/). The Indiana University team then refined the glomeruli masks into a "gold standard" through a combination of manual and machine learning efforts. Any annotations of glomeruli that were considered incorrect, ambiguous, or originating from damaged portions of the images were removed.
### Dataset Specifications

- 20 kidney tissue sections
    - 11 fresh frozen (FF) carboxymethylcellulose (CMC) embedded
    - 9 formalin fixed paraffin embedded (FFPE)
- Each sample has the following data:
    - PAS stain microscopy image (RGB-channel TIFF)
        - The histology-stained image is saved as a 24-bit RGB .tif file.
        - The spatial resolution is 0.5 μm per pixel.
    - Anatomical Structure segmentation mask (JSON)
    - Glomeruli segmentation mask (JSON)
- Known clinical metadata includes:
    - Patient Number	
    - Race	
    - Ethnicity	
    - Sex	
    - Age	
    - Weight (kilograms)	
    - Height (centimeters)	
    - BMI (body mass index) (kg/m<sup>2</sup>)	
    - Laterality		
    - Percent Cortex	
    - Percent Medulla	
    - Global GS (%), counted over 100 consecutive glomeruli across the cortex
        - Global glomerulosclerosis (GS) definition: The global GS extent (%) was defined as the number of globally sclerotic glomeruli divided by the total number of available glomeruli, multiplied by 100.
- Total dataset size: approximately 80 GB
    - PAS stain microscopy images are 2.6 GB each on average.
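
As a rough sketch of what the stated resolution implies, the snippet below converts the normal glomerular diameter range quoted in the background section (100-350 μm) into pixels at 0.5 μm per pixel. The helper names `microns_to_pixels` and `pixels_to_microns` are hypothetical and are not part of any HuBMAP tooling.

```python
# Spatial resolution stated in the dataset specifications.
MICRONS_PER_PIXEL = 0.5


def microns_to_pixels(length_um, microns_per_pixel=MICRONS_PER_PIXEL):
    """Convert a physical length in microns to a length in pixels."""
    return length_um / microns_per_pixel


def pixels_to_microns(length_px, microns_per_pixel=MICRONS_PER_PIXEL):
    """Convert a length in pixels back to microns."""
    return length_px * microns_per_pixel


# Diameter range of normal glomeruli quoted in the background section.
min_px = microns_to_pixels(100)
max_px = microns_to_pixels(350)
print(f"Expected glomerulus diameter: {min_px:.0f}-{max_px:.0f} pixels")
```

A conversion like this can help when choosing a tile size or downsampling factor, since a tile needs to be at least several hundred pixels wide at full resolution to contain a whole glomerulus.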
    
### Fresh Frozen vs. Formalin Fixed, Paraffin Embedded (FFPE) Tissue Preservation
Different tissue preservation techniques are used to maintain the integrity of tissue for downstream biomolecular assays. “Fresh frozen” tissue is frozen in liquid nitrogen (-190°C) within 30-60 minutes after surgical excision. This type of preservation has been the method of choice for transcriptomics and immunohistochemistry. Tissue samples are often embedded in Optimal Cutting Temperature (OCT) media for thin sectioning [(Robbe, 2018)](https://doi.org/10.1038/gim.2017.241) or in carboxymethylcellulose (CMC) for imaging mass spectrometry [(Stoeckli, 2007)](https://doi.org/10.1016/j.ijms.2006.10.007). Formalin fixed, paraffin embedded (FFPE) tissue is the preferred method for clinical pathology samples for histology assessment, since the aldehyde in formalin cross-links proteins to maintain the structural integrity of the sample [(Bass, 2014)](https://doi.org/10.5858/arpa.2013-0691-RA).
### Periodic acid-Schiff (PAS) Stain Microscopy
PAS is a histology stain that detects complex sugars in tissue sections. Periodic acid is used to break specific bonds within these sugars. The resulting aldehydes react with the Schiff reagent to produce the purple-magenta color exhibited by these images. See Figure 2 for an example of a PAS-stained kidney slide image with a green overlay marking the cortex area. Glomeruli can be observed as the circular areas of dark stain.
### Glomeruli Segmentation Masks
The glomeruli segmentation masks are a mix of manually and deep learning (DL) generated annotations in a slightly modified [GeoJSON](https://geojson.org/) format. They were made in QuPath [(Bankhead, 2017)](https://doi.org/10.1038/s41598-017-17204-5) and reviewed by subject matter experts (SMEs) for quality control. A JavaScript Object Notation (JSON) file lists all glomeruli identified in each of the 20 tissue sections (11 FF and 9 FFPE). The position and shape of each glomerulus is represented by a set of coordinates; see the sample below in Figure 4. The “detection_score”, present only in the DL generated annotations, is a measure produced by the DL model during detection.



In [None]:
print("Figure 4: Sample JSON Annotation")
Image(filename="../input/datasetdetailsimages/json example.png")



Each item in the JSON list is an annotation with the following pertinent fields:
 
- “geometry”: 
    - “geometry/type”: All are “Polygon”
    - “geometry/coordinates”: A list of the polygon's vertices in x,y order
- “properties”:
    - “properties/classification”: 
        - “properties/classification/name”: Annotation class (in this case all are “Glomerulus”)
    - “properties/measurements”: A list of key-value pairs, each giving a quantitative property of the annotation. For annotations generated by the DL model, this includes “detection_score”.
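
The fields above can be read with Python's standard `json` module. The following is a minimal sketch, not the official loading code: the inline `SAMPLE_JSON` annotation is made up for illustration, the nesting of “coordinates” as a list of rings follows standard GeoJSON and is an assumption about these files, and the `name`/`value` layout of each measurement entry is likewise assumed. The helpers `polygon_area` and `summarize` are hypothetical names. The sketch computes each polygon's area with the shoelace formula and converts it to an equivalent circular diameter in μm using the 0.5 μm per pixel resolution.

```python
import json
import math

MICRONS_PER_PIXEL = 0.5  # spatial resolution from the dataset specifications

# Made-up single annotation for illustration; real files list many glomeruli.
# Assumed layout: standard GeoJSON nesting, measurements as name/value pairs.
SAMPLE_JSON = """
[
  {
    "type": "Feature",
    "geometry": {
      "type": "Polygon",
      "coordinates": [[[100, 100], [300, 100], [300, 300], [100, 300], [100, 100]]]
    },
    "properties": {
      "classification": {"name": "Glomerulus"},
      "measurements": [{"name": "detection_score", "value": 0.97}]
    }
  }
]
"""


def polygon_area(vertices):
    """Area of a simple polygon in square pixels, via the shoelace formula."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def summarize(annotation):
    """Extract the class name, size estimates, and optional detection score."""
    ring = annotation["geometry"]["coordinates"][0]  # assumed outer boundary
    area_um2 = polygon_area(ring) * MICRONS_PER_PIXEL ** 2
    props = annotation["properties"]
    # Manual annotations may have no measurements, so default to an empty list.
    measurements = {m["name"]: m["value"] for m in props.get("measurements", [])}
    return {
        "name": props["classification"]["name"],
        "n_vertices": len(ring),
        "area_um2": area_um2,
        # Diameter of a circle with the same area as the polygon.
        "equivalent_diameter_um": 2.0 * math.sqrt(area_um2 / math.pi),
        "detection_score": measurements.get("detection_score"),
    }


annotations = json.loads(SAMPLE_JSON)
summaries = [summarize(a) for a in annotations]
for summary in summaries:
    print(summary)
```

To use this on the real data, `json.loads(SAMPLE_JSON)` would be replaced by `json.load` on an opened mask file. The field names should be checked against an actual annotation first, since the format is described as slightly modified GeoJSON.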