# Postprocess Delta Analysis

We will now look at the output of Delta.
You can find more details in the Delta documentation [here](https://delta.readthedocs.io/en/latest/usage/analysis.html).

---

## Import packages

Before we start, we need to import all required packages.

We use a number of important Python packages:
- [Numpy](https://numpy.org): go-to package for vector- and matrix-based calculations (heavily inspired by Matlab)
- [Pandas](https://pandas.pydata.org): go-to package for handling data tables (heavily inspired by R)
- [Scipy](https://scipy.org): Numpy extensions for statistics, image analysis, and more
- [Scikit-image (skimage)](https://scikit-image.org): go-to package for image analysis
- [Matplotlib](https://matplotlib.org): go-to package for plotting data
- [Seaborn](https://seaborn.pydata.org): statistical plotting built on top of Matplotlib
- [Napari](https://napari.org): GUI-based interactive image viewer
- [pathlib](https://docs.python.org/3/library/pathlib.html): path handling made easy
- [tifffile](https://pypi.org/project/tifffile/): read and write TIFF images
- [pickle](https://docs.python.org/3/library/pickle.html): read and write Python's pickle (`.pkl`) format
- [delta](https://delta.readthedocs.io/en/latest/usage/analysis.html): the Delta pipeline

In [2]:

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import numpy as np
import pandas as pd

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
matplotlib.rc("figure", figsize=(10,5))

import seaborn as sns

import pathlib
import tifffile
import pickle

from skimage.measure import regionprops

from delta.pipeline import Position as delta_pos

In [3]:
def to_str(posixpath):
    # convert a pathlib.Path to an absolute path string
    return str(posixpath.resolve())

---

## Setup Folders
As always, we start by specifying the data paths:

In [4]:
proj_dir = pathlib.Path(pathlib.Path.home(), 'I2ICourse', 'Project2B')
processed_dir = proj_dir / 'ProcessedData'
image_dir = proj_dir / 'RawData'

---

## Conversion functions

In [369]:
def add_segment_info(lin, label_stack):
    # lin: Delta lineage object (modified in place)
    # label_stack: list of label images, one per frame = pos.rois[0].label_stack

    # (re)initialize new per-frame property keys, so re-running does not append duplicates
    for cell in lin.cells:
        cell['x_pos'] = []
        cell['y_pos'] = []

    # loop over frames in order, so positions line up with each cell's 'frames' list
    for label_im in label_stack:
        # get region properties of every labelled cell in this frame
        for rp in regionprops(label_im):
            # lineage index of the cell (note: labels are 1-based, lineage indices are 0-based!)
            cell_idx = rp.label - 1
            # store the centroid; regionprops returns the centroid as (y, x)
            lin.cells[cell_idx]['x_pos'].append(rp.centroid[1])
            lin.cells[cell_idx]['y_pos'].append(rp.centroid[0])

    return None

def split_lineages(lin):
    # Split each Delta lineage (which keeps its id across divisions) into
    # separate cells that each span a single cell cycle (birth to division).
    new_lin = []
    # look-up table, one row per new cell:
    # lineage id / first frame / last frame / new cell id / colony id
    lut = np.empty((0, 5))
    id_cell = 0

    # lineages present in the first frame each found a colony
    firstcells = lin.cellnumbers[0]

    for lin_id, cell in enumerate(lin.cells):
        # find division events (frame indices in which this lineage produced a daughter)
        div_time = [i for i, val in enumerate(cell['daughters']) if val is not None]
        ndiv = len(div_time)

        for i in range(ndiv + 1):
            if i == 0:
                if cell['mother'] is not None:
                    # parent = the mother's cell cycle that ended just before this cell appeared
                    corr_cell = lut[:, 0] == cell['mother']
                    corr_frame = lut[:, 2] == cell['frames'][0] - 1
                    match = lut[corr_cell & corr_frame]
                    id_par = int(match[0, 3])
                    id_colony = int(match[0, 4])
                else:
                    id_par = -1
                    id_colony = lin_id if lin_id in firstcells else -1
            else:
                # after a division, the parent is this lineage's previous cell cycle
                id_par = id_cell - 1

            if ndiv == 0:
                new_cell = cell.copy()
                cur_lut = [lin_id, cell['frames'][0], cell['frames'][-1], id_cell, id_colony]
            else:
                start = div_time[i - 1] if i > 0 else 0
                end = div_time[i] if i < ndiv else len(cell['frames'])
                cur_lut = [lin_id, cell['frames'][start], cell['frames'][end - 1], id_cell, id_colony]

                # slice all per-frame (list) properties to this cell cycle
                new_cell = {key: (item[start:end] if isinstance(item, list) else item)
                            for key, item in cell.items()}

            new_cell.pop('mother')
            new_cell.pop('daughters')
            new_cell['id_seg'] = new_cell.pop('id')
            new_cell['id_cell'] = id_cell
            new_cell['id_par'] = id_par
            new_cell['id_colony'] = id_colony

            lut = np.vstack((lut, cur_lut))
            new_lin.append(new_cell)
            id_cell += 1
    return new_lin

def lin_to_df(cell_list):
    # find per-frame (list) properties; only these are exploded into rows
    vector_data = [key for key, val in cell_list[0].items() if isinstance(val, list)]
    # create data frame: one row per cell, with lists in the per-frame columns
    df = pd.DataFrame(cell_list)
    # explode the per-frame lists into one row per cell per frame
    # (exploding several columns at once requires pandas >= 1.3)
    df = df.explode(vector_data)
    # and reindex
    df = df.reset_index(drop=True)

    return df

def add_extra_lin_info(df):
    # create look-up table linking each parent to its two offspring
    df_full = df.loc[df['id_par'] >= 0, ['id_par', 'id_cell']].reset_index(drop=True)
    dflin = df_full.groupby('id_par').agg(['min', 'max']).rename(columns={"min": "d1", "max": "d2"})
    dflin.columns = dflin.columns.droplevel()
    dflin = dflin.reset_index()

    df["id_d1"] = -1
    df["id_d2"] = -1
    df["id_sib"] = -1

    # add offspring to parent
    for mom in np.unique(dflin["id_par"]):
        df.loc[df["id_cell"] == mom, "id_d1"] = int(dflin.loc[dflin["id_par"] == mom, "d1"].iloc[0])
        df.loc[df["id_cell"] == mom, "id_d2"] = int(dflin.loc[dflin["id_par"] == mom, "d2"].iloc[0])

    # add siblings to d1
    for cell in np.unique(dflin["d1"]):
        df.loc[df["id_cell"] == cell, "id_sib"] = int(dflin.loc[dflin["d1"] == cell, "d2"].iloc[0])

    # add siblings to d2
    for cell in np.unique(dflin["d2"]):
        df.loc[df["id_cell"] == cell, "id_sib"] = int(dflin.loc[dflin["d2"] == cell, "d1"].iloc[0])

    # rearrange columns: all id_ columns first
    id_cols = [c for c in df.columns if c.startswith("id_")]
    other_cols = [c for c in df.columns if not c.startswith("id_")]
    df = df[id_cols + other_cols]

    return df


def delta_to_df(pos, reader=None):
    # pos: loaded Delta Position; reader is currently unused
    # get lineage of the first ROI
    lin = pos.rois[0].lineage

    # add centroid positions from the label images
    add_segment_info(lin, pos.rois[0].label_stack)

    # split lineages into individual cell cycles
    cell_list = split_lineages(lin)

    # convert to pandas dataframe
    df = lin_to_df(cell_list)

    # add offspring and sibling information
    df = add_extra_lin_info(df)

    return df
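
`add_segment_info` relies on two conventions of `skimage.measure.regionprops`: the label value of a cell in Delta's label images is its lineage index plus 1, and `centroid` is returned in (row, column) order, i.e. (y, x). The numpy-only sketch below reproduces the centroid calculation on a toy label image so the ordering is easy to check. The helper `label_centroids` is hypothetical and not part of Delta or scikit-image.

```python
import numpy as np


def label_centroids(label_im):
    """Return {label: (y, x)} for every non-zero label, mimicking regionprops' centroid."""
    centroids = {}
    for lab in np.unique(label_im):
        if lab == 0:  # 0 is background, regionprops skips it as well
            continue
        rows, cols = np.nonzero(label_im == lab)
        centroids[int(lab)] = (rows.mean(), cols.mean())  # (y, x) order
    return centroids


# toy 4x6 label image: label 1 on the left, label 2 on the right
label_im = np.zeros((4, 6), dtype=int)
label_im[1:3, 0:2] = 1
label_im[0:4, 4:6] = 2

for lab, (y, x) in label_centroids(label_im).items():
    cell_idx = lab - 1  # Delta lineage index is 0-based
    print(f"label {lab} -> lineage {cell_idx}: x={x:.1f}, y={y:.1f}")
# label 1 -> lineage 0: x=0.5, y=1.5
# label 2 -> lineage 1: x=4.5, y=1.5
```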

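The core idea behind `split_lineages` is that a Delta lineage keeps its ID across divisions: its `daughters` list is `None` in every frame except those in which a new daughter appears, and those indices mark where one cell cycle ends and the next begins. The sketch below isolates that step for a single lineage. `cell_cycle_slices` is a hypothetical helper, and the toy dictionary only mimics the `frames` and `daughters` keys used above.

```python
def cell_cycle_slices(daughters):
    """Return (start, end) index pairs, one per cell cycle, for a single lineage.

    daughters: per-frame list that is None except where a daughter was produced.
    """
    div_idx = [i for i, d in enumerate(daughters) if d is not None]
    bounds = [0] + div_idx + [len(daughters)]
    return list(zip(bounds[:-1], bounds[1:]))


# toy lineage observed in frames 0-9 that produced daughters 5 and 12
lineage = {
    "frames": list(range(10)),
    "daughters": [None, None, None, None, 5, None, None, None, 12, None],
}

for start, end in cell_cycle_slices(lineage["daughters"]):
    frames = lineage["frames"][start:end]
    print(f"cell cycle spans frames {frames[0]}-{frames[-1]}")
# cell cycle spans frames 0-3
# cell cycle spans frames 4-7
# cell cycle spans frames 8-9
```

Each (start, end) pair is what `split_lineages` uses to slice every per-frame list, so one Delta lineage with two divisions becomes three cells.
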
---

## Load Data
We can now load the processed data. See [here](https://delta.readthedocs.io/en/latest/usage/analysis.html) for detailed instructions.
To load the data we can use the following commands:

In [373]:
#find all position .pkl outputs (skipping the *_df.pkl data frames saved further below)
pkl_files = sorted(f for f in processed_dir.glob('*.pkl') if not f.stem.endswith('_df'))
print(pkl_files)
#load the processed data of the first position
pos_name = to_str(pkl_files[0])
pos = delta_pos(None, None, None)
pos.load(pos_name)


[PosixPath('/Users/simonvanvliet/I2ICourse/Project2B/ProcessedData/Position000000.pkl')]


## Convert to Pandas dataframe

The resulting data frame contains the following lineage information:

- `id_seg`: the lineage ID assigned by Delta. Delta keeps this ID across divisions, so one `id_seg` can span several cell cycles. The same ID, offset by 1, is the cell's label value in the label images. Only use it when you need to interface with the label images.
- `id_cell`: a unique ID for each cell, from birth to division. Always use this to identify and access cells.
- `id_par`: the `id_cell` of the cell's parent
- `id_colony`: each cell in the first frame is assigned a unique `id_colony`, which is shared with all its offspring (e.g. use this to separate different colonies). Cells that cannot be traced back to a cell in the first frame get -1.
- `id_d1`: the `id_cell` of the cell's first offspring (old pole)
- `id_d2`: the `id_cell` of the cell's second offspring (new pole)
- `id_sib`: the `id_cell` of the cell's sibling

Relatives that do not exist in the data (e.g. the parent of a cell in the first frame, or the offspring of a cell that has not divided) are set to -1.

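Because every row stores the `id_cell` of its parent in `id_par`, a cell's ancestry can be traced by following `id_par` until it reaches -1. A minimal sketch, assuming a data frame with the `id_cell` and `id_par` columns described above; `ancestors` is a hypothetical helper and the toy table stands in for the real `df`:

```python
import pandas as pd


def ancestors(df, id_cell):
    """Return the ancestor id_cell values of a cell, from its parent up to the root."""
    # one row per cell is enough: id_par is constant over all frames of a cell
    parent_of = df.drop_duplicates("id_cell").set_index("id_cell")["id_par"]
    lineage = []
    parent = parent_of.loc[id_cell]
    while parent >= 0:
        lineage.append(int(parent))
        parent = parent_of.loc[parent]
    return lineage


# toy table: cell 0 divides into 1 and 2, cell 1 divides into 3 and 4
toy = pd.DataFrame({
    "id_cell": [0, 0, 1, 2, 3, 4],
    "id_par": [-1, -1, 0, 0, 1, 1],
})
print(ancestors(toy, 3))  # [1, 0]
```
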
In [374]:
df = delta_to_df(pos)
df.head(30)

| | id_seg | id_cell | id_par | id_colony | id_d1 | id_d2 | id_sib | frames | new_pole | old_pole | edges | length | width | area | perimeter | fluo1 | x_pos | y_pos |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | -1 | 0 | 1 | 16 | -1 | 0 | [250, 310] | [236, 298] | | 30.766232 | 11.088337 | 267.0 | 30 | 216.422819 | 305.362416 | 243.97651 |
| 1 | 0 | 0 | -1 | 0 | 1 | 16 | -1 | 1 | [253, 311] | [237, 297] | | 31.876179 | 10.988653 | 273.5 | 27 | 234.334426 | 304.872131 | 245.321311 |
| 2 | 0 | 0 | -1 | 0 | 1 | 16 | -1 | 2 | [261, 310] | [246, 295] | | 31.819805 | 10.606601 | 276.0 | 30 | 235.65798 | 303.0 | 253.693811 |
| 3 | 0 | 0 | -1 | 0 | 1 | 16 | -1 | 3 | [263, 306] | [248, 291] | | 33.234016 | 11.313707 | 304.5 | 27 | 251.801187 | 299.240356 | 255.436202 |
| 4 | 0 | 0 | -1 | 0 | 1 | 16 | -1 | 4 | [263, 304] | [247, 287] | | 33.941124 | 11.313707 | 314.5 | 31 | 244.034483 | 296.017241 | 254.451149 |
| 5 | 0 | 0 | -1 | 0 | 1 | 16 | -1 | 5 | [261, 304] | [246, 284] | | 36.573315 | 11.401754 | 327.5 | 34 | 255.393939 | 295.068871 | 253.482094 |
| 6 | 0 | 0 | -1 | 0 | 1 | 16 | -1 | 6 | [267, 305] | [252, 285] | | 37.44136 | 11.257707 | 341.5 | 35 | 246.783069 | 294.664021 | 258.902116 |
| 7 | 0 | 0 | -1 | 0 | 1 | 16 | -1 | 7 | [272, 307] | [256, 285] | | 38.269344 | 11.229179 | 349.5 | 42 | 274.351421 | 296.069767 | 263.374677 |
| 8 | 0 | 0 | -1 | 0 | 1 | 16 | -1 | 8 | [268, 309] | [250, 284] | | 40.78558 | 11.266735 | 366.5 | 41 | 269.231527 | 297.133005 | 258.623153 |
| 9 | 0 | 0 | -1 | 0 | 1 | 16 | -1 | 9 | [275, 307] | [257, 283] | | 41.866165 | 11.421289 | 378.0 | 49 | 283.377088 | 295.455847 | 265.315036 |

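The data frame has one row per cell per frame. A common next step is to collapse it to one row per cell, for example to get each cell's birth length and its length in its last frame. A minimal sketch, assuming the `id_cell`, `frames` and `length` columns shown above and that each cell's rows are in frame order. Note that `explode` leaves the exploded columns with `object` dtype, so we let pandas infer numeric dtypes first. The toy data frame stands in for `df`. For cells that are still present at the end of the movie, the last frame is not a division.

```python
import pandas as pd

# toy stand-in for df: object dtype columns, as left behind by explode
toy = pd.DataFrame({
    "id_cell": [0, 0, 0, 1, 1],
    "frames": pd.Series([0, 1, 2, 3, 4], dtype=object),
    "length": pd.Series([30.0, 33.0, 36.0, 20.0, 22.0], dtype=object),
})
toy = toy.infer_objects()  # object -> int64 / float64

# one row per cell, using named aggregation
per_cell = toy.groupby("id_cell").agg(
    birth_frame=("frames", "first"),
    last_frame=("frames", "last"),
    birth_length=("length", "first"),
    last_length=("length", "last"),
)
per_cell["n_frames"] = per_cell["last_frame"] - per_cell["birth_frame"] + 1
print(per_cell)
```
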

## File Saving
This would be a good time to save your data.  
You can save the position file using `pos.save(filename=filename, save_format='pickle')`. We won't do this now, as we do not want to accidentally corrupt our original data.

Instead we just save the dataframe:

In [22]:
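#save next to the position file, e.g. Position000000.pkl -> Position000000_df.pkl
pos_path = pathlib.Path(pos_name)
save_name = pos_path.with_name(pos_path.stem + '_df.pkl')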
df.to_pickle(save_name)
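
To continue the analysis in another notebook, the saved data frame can be loaded again with `pd.read_pickle`. The round trip below uses a temporary directory and a toy data frame so that it runs anywhere; in practice you would pass the `save_name` from the cell above.

```python
import pathlib
import tempfile

import pandas as pd

toy = pd.DataFrame({"id_cell": [0, 0, 1], "length": [30.0, 31.5, 20.0]})

with tempfile.TemporaryDirectory() as tmp:
    tmp_file = pathlib.Path(tmp) / "Position000000_df.pkl"
    toy.to_pickle(tmp_file)  # same call as used for df above
    restored = pd.read_pickle(tmp_file)

print(restored.equals(toy))  # True
```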