# Extracting Cell Properties 

After you have obtained a proper segmentation of the cells, it is time to extract their properties, such as their size, location, and fluorescence intensity. We can do this using `skimage.measure.regionprops` and `skimage.measure.regionprops_table`. These functions extract measurements such as area or volume, bounding boxes, and intensity statistics.

---

## Import packages

Before starting the code we need to import all the required packages.

We use a number of important Python packages:
- [NumPy](https://numpy.org): Go-to package for vector/matrix based calculations (heavily inspired by Matlab)
- [Pandas](https://pandas.pydata.org): Go-to package for handling data tables (heavily inspired by R)
- [Scikit-image (skimage)](https://scikit-image.org): Go-to package for image analysis
- [Matplotlib](https://matplotlib.org): Go-to package for plotting data
- [Dask-Image](https://image.dask.org/en/latest/): Out-of-memory computation made easy
- [pathlib](https://docs.python.org/3/library/pathlib.html): Path handling made easy
- [h5py](https://www.h5py.org): Read HDF5 file format

In [None]:
#next two lines make sure that Matplotlib plots are shown properly in Jupyter Notebook
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

#main data analysis packages
import numpy as np
import pandas as pd

#image processing packages
from skimage.measure import regionprops, regionprops_table

#data plotting packages
import matplotlib
import matplotlib.pyplot as plt
#set default figure size
matplotlib.rc("figure", figsize=(10,5))

#out of memory computation
from dask_image.imread import imread
import dask.array as da

#path handling
import pathlib

#file handling
import h5py

In [None]:
#we initiate a cache for Dask to speed up repeated computation (important for working with Napari)
from dask.cache import Cache
cache = Cache(2e9)  # Leverage two gigabytes of memory
cache.register()    # Turn cache on globally

---
## Import Data
We start by specifying the paths to our data

In [None]:
#Set the path to the folder that contains project data
root = pathlib.Path(pathlib.Path.home(), 
                    'I2ICourse/Project2A/ProcessedData/')

image_name = 'pos0_preproc-rg.tif' #set name of image
n_channel = 2 #set number of color channels in image

im_path = root / image_name 
seg_path = root /  image_name.replace('.tif','_label_im.hdf5')

We now load the image and the segmentation data we created in the previous notebook.

In [None]:
#load image with dask-image for out of memory processing 
im_stack = imread(im_path) 
# dask_image imread creates a 3D stack in which the color channels are interleaved
# to separate them we need to reshape to a 4D stack: (frame, channel, y, x)
if n_channel>1: 
    newshape = (int(im_stack.shape[0]/n_channel), n_channel, *im_stack.shape[1:])
    im_stack = im_stack.reshape(newshape)

#load segmentation data with dask for out of memory processing
label_data_file = h5py.File(seg_path, 'r') #open file
label_stack = da.from_array(label_data_file['labels_final'], chunks=(1,-1,-1)) #create dask array
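
The reshape step above can be checked on a small synthetic stack. This is a minimal sketch using plain NumPy instead of Dask; the array sizes and the `10 * frame + channel` pixel values are made up purely so that each plane is easy to recognize, and the sketch assumes the planes are stored as frame0-ch0, frame0-ch1, frame1-ch0, and so on.

```python
import numpy as np

n_frames, n_channel, height, width = 3, 2, 4, 5

# Build a fake interleaved stack: frame0-ch0, frame0-ch1, frame1-ch0, ...
# Every pixel of a plane stores 10*frame + channel (hypothetical values)
planes = [np.full((height, width), 10 * t + c)
          for t in range(n_frames)
          for c in range(n_channel)]
interleaved = np.stack(planes)              # shape (6, 4, 5)

# Same reshape as in the cell above, using integer division
newshape = (interleaved.shape[0] // n_channel, n_channel, *interleaved.shape[1:])
separated = interleaved.reshape(newshape)   # shape (3, 2, 4, 5)

print(separated.shape)
print(separated[2, 1, 0, 0])                # a pixel from frame 2, channel 1
```

If the channels in a file are ordered differently (for example, all frames of channel 0 first), this reshape would mix frames and channels, so it is worth checking one frame by eye.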

---
## Using Regionprops to extract cell properties for single time-point
`skimage.measure.regionprops` automatically measures many features of each labeled region in an image. Optionally, an `intensity_image` can be supplied, in which case intensity features are also extracted per object. Note that the color (channel) axis needs to be the last axis of the intensity image!

We demonstrate its usage here by analyzing the first frame.

In [None]:
#extract first frame
label_im = label_stack[0,:,:]
image = im_stack[0,:,:]

#region props need color channel to be at end
image = np.moveaxis(image, 0, -1)
#we add the .compute() to instruct Dask to do all the calculations at this stage
reg_props = regionprops(label_im.compute(), intensity_image=image.compute())
    
print(len(reg_props))
reg_props[0] 

`reg_props` is a list containing the region properties of all cells present at this time. A list of all properties can be found in the [scikit-image documentation](https://scikit-image.org/docs/dev/api/skimage.measure.html?highlight=regionprops#skimage.measure.regionprops).

We can thus extract a property using `reg_props[c].property_name`, where `c` is the position of the cell in the list (the cell's label number is stored in `reg_props[c].label`).

We can for example extract the mean intensity of the cell:

In [None]:
print('cell mean intensity = ', reg_props[0].intensity_mean)

But we can also extract more complex info, such as the cell mask:

In [None]:
fig, axs = plt.subplots(figsize=(10,5))
axs.imshow(reg_props[0].image)

*(Technical aside: scikit-image 0.18 adds support for [passing custom functions](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops) for region properties as `extra_properties`.)*
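
As a rough illustration of `extra_properties`, the sketch below defines a custom measurement. The function name `intensity_range` and the toy images are assumptions made for this example. According to the scikit-image documentation, a custom property function takes the region mask as its first argument and, if it needs intensities, the intensity image as its second; the property is then available under the function's name.

```python
import numpy as np
from skimage.measure import regionprops

def intensity_range(regionmask, intensity_image):
    """Hypothetical custom property: max minus min intensity inside the region."""
    values = intensity_image[regionmask]
    return values.max() - values.min()

# Toy label image with two rectangular objects
label_im = np.zeros((6, 6), dtype=int)
label_im[1:3, 1:3] = 1
label_im[3:5, 3:6] = 2

# Toy single-channel intensity image with values 0..35
intensity = np.arange(36, dtype=float).reshape(6, 6)

props = regionprops(label_im, intensity_image=intensity,
                    extra_properties=(intensity_range,))

ranges = [region.intensity_range for region in props]
print(ranges)
```

Pick a function name that does not clash with a built-in property name, since the built-in one would otherwise take precedence.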

---

## Using regionprops_table to extract cell measurements for single time-point

`regionprops` returns a huge amount of info, but navigating it is a bit cumbersome.  
Instead you can use [`regionprops_table`](https://scikit-image.org/docs/dev/api/skimage.measure.html?highlight=regionprops#skimage.measure.regionprops). It works in the same way, but you have to specify which properties you want to extract.
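
The following sketch shows what the returned table looks like for a synthetic two-channel image (the label layout and intensity values are invented for illustration). The result is a dictionary of NumPy arrays with one entry per column. Properties that have several components, such as `centroid` or the per-channel mean intensity, are split into separate columns with a numeric suffix.

```python
import numpy as np
from skimage.measure import regionprops_table

# Toy label image with two objects
label_im = np.zeros((6, 6), dtype=int)
label_im[0:2, 0:2] = 1   # 4 pixels
label_im[3:6, 3:6] = 2   # 9 pixels

# Toy two-channel intensity image, channel axis last
intensity = np.stack([np.full((6, 6), 2.0), np.full((6, 6), 8.0)], axis=-1)

table = regionprops_table(label_im, intensity_image=intensity,
                          properties=['label', 'area', 'centroid', 'mean_intensity'])

# Print every column name with its values (one value per object)
for name, values in table.items():
    print(name, values)
```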

Here we will first look at a single frame:

In [None]:

#extract first frame
label_im = label_stack[0,:,:]
image = im_stack[0,:,:]
#region props need color channel to be at end
image = np.moveaxis(image, 0, -1)

#specify properties to extract 
prop_list = ['label', 
            'area', 'centroid', 
            'axis_major_length', 'axis_minor_length',
            'mean_intensity'] 

#get region prop table 
#we add the .compute() to instruct Dask to do all the calculations at this stage
rp_table = regionprops_table(label_im.compute(), intensity_image=image.compute(), properties=prop_list) 

### Create Pandas Dataframe for single time point
The output of the `regionprops_table` function can easily be converted into a [Pandas dataframe](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html), using the [`DataFrame`](https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe) constructor.

Pandas dataframes and R dataframes share a lot of features so if you are familiar with R you should hopefully feel right at home. For those coming from Matlab/Numpy it might take some time to get used to the Pandas syntax. Luckily there are very detailed guides available online, starting with the [Pandas Homepage](https://pandas.pydata.org).

We now convert the region properties table to a Pandas dataframe:

In [None]:
info_table = pd.DataFrame(rp_table).set_index('label')

With `head()` we can have a look at the Pandas data frame:

In [None]:
info_table.head()

---

## Extract cell properties for all time-points

So far we have only analyzed a single frame. Now let's combine them all. To do this, it is helpful to define a function that processes a single time point:

In [None]:
#function to process single frame  
def extract_prop_slice(t, label_im, image, prop_list):
    #region props need color channel to be at end
    image = np.moveaxis(image, 0, -1)
    
    #get region prop table
    rp_table = regionprops_table(label_im.compute(), intensity_image=image.compute(), properties=prop_list) 
    df = pd.DataFrame(rp_table)
    
    #add the time index
    df["frame"] = t
    
    return df

We now loop over all time points calling the function we defined above and use `pandas.concat` to combine all the frames into a single data frame.

In [None]:
#specify properties to extract 
prop_list = ['label', 
            'area', 'centroid', 
            'axis_major_length', 'axis_minor_length',
            'mean_intensity'] 

#loop over all frames
df_list = [extract_prop_slice(t, label, image, prop_list) for t, (label, image) in enumerate(zip(label_stack, im_stack))]

#combine into single dataframe
info_table_all = pd.concat(df_list)

In [None]:
#show start of dataframe
info_table_all.head()

In [None]:
#show end of dataframe
info_table_all.tail()
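
One detail worth knowing about `pandas.concat`: by default each per-frame table keeps its own row index, so index values repeat in the combined table. The sketch below uses two made-up per-frame tables (the labels and areas are invented) to show this, along with two possible ways to obtain a unique index. Which option suits best depends on the later analysis; neither is required by the code above.

```python
import pandas as pd

# Two toy per-frame tables, shaped like the output of extract_prop_slice
df_frame0 = pd.DataFrame({'label': [1, 2], 'area': [40, 55], 'frame': 0})
df_frame1 = pd.DataFrame({'label': [1, 2, 3], 'area': [42, 57, 20], 'frame': 1})

# Default behavior: each table keeps its own row index, so values repeat
combined = pd.concat([df_frame0, df_frame1])
print(combined.index.tolist())

# Option 1: renumber the rows from 0 to n-1
combined_flat = pd.concat([df_frame0, df_frame1], ignore_index=True)
print(combined_flat.index.tolist())

# Option 2: index each row by its (frame, label) pair
combined_multi = combined_flat.set_index(['frame', 'label'])
print(combined_multi.loc[(1, 3), 'area'])
```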

*Technical aside: Python has some nice tools which allow you to write a full for loop in a single line:*

*`df_list = [extract_prop_slice(t, label, image, prop_list) for t, (label, image) in enumerate(zip(label_stack, im_stack))]`*

*In this code, `for t, (label, image) in enumerate(zip(label_stack, im_stack))` iterates simultaneously through the frames contained in `label_stack` and `im_stack` (`zip` takes care of this) and also provides the index of the iteration (`enumerate` takes care of this). These values are then passed on to the `extract_prop_slice()` function. Because everything is placed between square brackets, the results are returned as a list.*

*[Here](https://www.w3schools.com/python/python_lists_comprehension.asp) you can find more details on this so-called list comprehension.*
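
For comparison, the sketch below writes the same kind of iteration once as an ordinary loop and once as a list comprehension. The string "frames" and the `describe` function are stand-ins invented for this example.

```python
# Stand-in data: three "frames" of labels and images (plain strings here)
labels = ['label_t0', 'label_t1', 'label_t2']
images = ['image_t0', 'image_t1', 'image_t2']

def describe(t, label, image):
    """Hypothetical per-frame function, standing in for extract_prop_slice."""
    return f'{t}: {label} + {image}'

# Ordinary for loop
loop_result = []
for t, (label, image) in enumerate(zip(labels, images)):
    loop_result.append(describe(t, label, image))

# Equivalent list comprehension
comp_result = [describe(t, label, image)
               for t, (label, image) in enumerate(zip(labels, images))]

print(comp_result)
print(comp_result == loop_result)
```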

---

## Saving and loading Pandas dataframes

### Storing dataframe
We can store the dataframe on disk, for example in `.pkl` or `.csv` format. [See here](https://pandas.pydata.org/docs/user_guide/io.html) for all supported formats.

In [None]:
data_name_pkl = root /  image_name.replace('.tif','_cellprop.pkl')
info_table_all.to_pickle(data_name_pkl)

data_name_csv = root /  image_name.replace('.tif','_cellprop.csv')
info_table_all.to_csv(data_name_csv)

### Loading dataframe
And we can load it again. [See here](https://pandas.pydata.org/docs/user_guide/io.html) for all supported formats.

In [None]:
info_table_loaded = pd.read_pickle(data_name_pkl)
info_table_loaded.head()
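
The two formats behave slightly differently when read back. The sketch below writes a small made-up table to a temporary folder (the file names and values are placeholders) and reloads it. A pickle restores the dataframe as it was. With `to_csv`, the row index is written as an extra unnamed column by default, so passing `index_col=0` to `pd.read_csv` is one way to restore it.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy table standing in for info_table_all
df = pd.DataFrame({'label': [1, 2, 1],
                   'area': [40.0, 55.0, 42.0],
                   'frame': [0, 0, 1]})

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = Path(tmp_dir) / 'toy_cellprop.pkl'
    csv_path = Path(tmp_dir) / 'toy_cellprop.csv'

    df.to_pickle(pkl_path)
    df.to_csv(csv_path)

    from_pkl = pd.read_pickle(pkl_path)
    # Without index_col, the saved index shows up as a column named 'Unnamed: 0'
    from_csv_default = pd.read_csv(csv_path)
    # With index_col=0, the first column is used as the row index again
    from_csv_indexed = pd.read_csv(csv_path, index_col=0)

print(from_pkl.equals(df))
print(from_csv_default.columns.tolist())
print(from_csv_indexed.columns.tolist())
```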

---
## Next step: Data Analysis Using Pandas

We will continue in the next notebook, `2_explore_data_with_pandas`. However, **before starting, the tutors will give you a brief intro**, so please let them know when you are ready for the next step!