# Postprocess Delta Analysis

We will now look at the output of Delta.
More detail on the output format is available in the [Delta documentation](https://delta.readthedocs.io/en/latest/usage/analysis.html).

---

## Import packages

Before starting the code we need to import all the required packages.

We use a number of important Python packages:
- [NumPy](https://numpy.org): Go-to package for vector- and matrix-based calculations (heavily inspired by MATLAB)
- [Pandas](https://pandas.pydata.org): Go-to package for handling data tables (heavily inspired by R)
- [SciPy](https://scipy.org): NumPy extensions for statistics, image analysis, and more
- [Scikit-image (skimage)](https://scikit-image.org): Go-to package for image analysis
- [Matplotlib](https://matplotlib.org): Go-to package for plotting data
- [Napari](https://napari.org): GUI-based interactive image viewer
- [pathlib](https://docs.python.org/3/library/pathlib.html): Path handling made easy
- [pickle](https://docs.python.org/3/library/pickle.html): Read pkl file format
- [delta](https://delta.readthedocs.io/en/latest/usage/analysis.html): Delta pipeline

In [1]:

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import numpy as np
import pandas as pd

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
matplotlib.rc("figure", figsize=(10,5))

import seaborn as sns

import pathlib
import tifffile
import pickle

from skimage.measure import regionprops

from delta.pipeline import Position as delta_pos

In [5]:
def to_str(posixpath):
    return str(posixpath.resolve())    

---

## Setup Folders
As always we start with specifying the data paths:

In [3]:
proj_dir = pathlib.Path(pathlib.Path.home(), 'I2ICourse', 'Project2B')
processed_dir = proj_dir / 'ProcessedData'
image_dir = proj_dir / 'RawData'

---

## Load Data
We can now load the data again. See [here](https://delta.readthedocs.io/en/latest/usage/analysis.html) for detailed instructions.
To load all data we can use the following command:

In [7]:
#find all position .pkl outputs:
file_dirs = sorted(processed_dir.glob('*.pkl'))
print(file_dirs)
# this loads the processed data
pos_name = to_str(file_dirs[0])
pos = delta_pos(None,None,None)
pos.load(pos_name)

[PosixPath('/Users/simonvanvliet/I2ICourse/Project2B/ProcessedData/Position000000.pkl')]


---

## Convert Tracking Data to Standard Format

As the saw-tooth pattern of cell length shows, Delta defines a cell lineage in an unusual way: a lineage continues across cell divisions. More commonly, a cell lineage is defined to start at cell birth and stop at cell division.

As a first step we thus have to split these lineages into segments.
Below we give a function that does this (no need to try to understand this code for now).

In [8]:
def split_lineages(lin):
    #first we give each sublineage a unique id
    unique_count = 0 
    #loop delta lineages
    for cell in lin.cells:

        ## create sublineage ID
        #find division events
        div_event = np.array([d is not None for d in cell['daughters']])
        #cumulative number of divisions gives a unique nr to each segment
        sublin_id = np.cumsum(div_event) 
        #now assign unique number across all cell lineages
        unique_id = sublin_id + unique_count #unique cells id
        
        #update unique_count 
        unique_count += (np.sum(div_event) + 1)
        
        ##now we have to connect lineages together
        #find mother cell and birth frame
        if cell['mother'] is not None:
            mom = lin.cells[cell['mother']] #get properties of mom
            try:
                ## find unique cell id of the proper sub-segment in the mom lineage
                birth_frm = cell['frames'][0]
                div_frm = mom['frames'].index(birth_frm-1)
                mom_lin_id = mom['lin_id'][div_frm] #this is proper unique lin id of mom
            except (KeyError, ValueError, IndexError):
                #lookup failed, e.g. mom has no 'lin_id' yet or is absent in the frame before birth
                mom_lin_id = -1
        else:
            #no mom found (first cell)
            mom_lin_id = -1

        
        ## add mother info
        #the id of mother is the id of prev segment
        mother_lin_id = unique_id.copy() - 1
        #except for first segment, there we use the lin_index of mother we found above
        mother_lin_id[sublin_id==0] = mom_lin_id
        
        #add d1 offspring number to lineage
        #this is simply next segment
        d1_lin_id = unique_id.copy() + 1
        #except for last segment, this does not have d1 offspring 
        d1_lin_id[d1_lin_id==d1_lin_id[-1]] = -1

        #now we assign properties to cell
        cell['sublin_idx'] = sublin_id.tolist()
        cell['lin_id'] = unique_id.tolist()
        cell['mother_lin_id'] = mother_lin_id.tolist()
        cell['d1_lin_id'] = d1_lin_id.tolist()
        
    return None #lin is updated in place
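
The central step in `split_lineages` is the cumulative sum over division events. The sketch below replays that step on a made-up cell; `toy_cell` and the starting value of `unique_count` are hypothetical and only serve to illustrate how segment ids are derived.

```python
import numpy as np

# Hypothetical Delta-style cell tracked over 6 frames.
# 'daughters' is None on ordinary frames and holds a daughter id on division frames.
toy_cell = {
    "frames": [0, 1, 2, 3, 4, 5],
    "daughters": [None, None, 3, None, None, 7],
}

# True on every frame where a division was recorded
div_event = np.array([d is not None for d in toy_cell["daughters"]])

# The running count of divisions labels each birth-to-division segment
sublin_id = np.cumsum(div_event)

# Offset by the number of segment ids already used by earlier cells
unique_count = 10  # assumed value for illustration
unique_id = sublin_id + unique_count

print(sublin_id.tolist())  # [0, 0, 1, 1, 1, 2]
print(unique_id.tolist())  # [10, 10, 11, 11, 11, 12]
```

Each division frame starts a new segment, so a cell that divides twice is split into three segments.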

---

## Add Positional Information

If you have another look at the lineage object you can see it does not contain information on the location of cells. We add a function below to do this.

In [9]:
def add_segment_info(lin, label_stack):
    #lin: delta lineage object
    #label_stack: list of label images = pos.rois[0].label_stack
   
    #initialize new property keys:
    for cell in lin.cells:
        cell.setdefault('x_pos',[])
        cell.setdefault('y_pos',[])
   
    #loop frames
    for label_im in label_stack:
        #get region properties
        rp_list = regionprops(label_im)
        
        #assign cell positions
        for rp in rp_list:
            #get lineage number of cell (note labels are 1 based, cell lineages are 0 based!)
            cell_idx = rp.label-1
            #store centroid coordinates
            lin.cells[cell_idx]['x_pos'].append(rp.centroid[1]) #order in centroid is (y,x)
            lin.cells[cell_idx]['y_pos'].append(rp.centroid[0]) #order in centroid is (y,x)
                    
    return None
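
Two details in `add_segment_info` are easy to get wrong: labels are 1-based while lineage indices are 0-based, and centroids come in (row, column) = (y, x) order. The sketch below illustrates both on a made-up label image. To keep it dependency-free it computes centroids with plain NumPy rather than `regionprops`, so it is a stand-in for the idea, not the function used above.

```python
import numpy as np

# Hypothetical 5x6 label image with two cells (labels 1 and 2); 0 is background
label_im = np.zeros((5, 6), dtype=int)
label_im[0:3, 0:2] = 1  # rows 0-2, columns 0-1
label_im[3:5, 1:6] = 2  # rows 3-4, columns 1-5

centroids = {}
for label in np.unique(label_im):
    if label == 0:
        continue  # skip background
    rows, cols = np.nonzero(label_im == label)
    cell_idx = int(label) - 1  # labels are 1-based, lineage indices are 0-based
    # store as (x, y): x is the mean column, y is the mean row
    centroids[cell_idx] = (float(cols.mean()), float(rows.mean()))

print(centroids)  # {0: (0.5, 1.0), 1: (3.0, 3.5)}
```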

It is always good to check things visually. Below we define a function that can be used to plot any cell property of choice as a spatial map:

In [10]:
def plot_spatial_map(pos, lin, property, frame=-1, axis=None):
   #pos: delta position object
   #lin: delta lineage object
   #property: key of cell property contained in lineage object
   #frame: frame to show, if not specified last one is chosen
   #axis: axis to add plot to, if not specified new one is made
   
   #create color map where NaN is shown as black
   #(plt.get_cmap is used because cm.get_cmap was removed in recent Matplotlib versions)
   colMap = plt.get_cmap("viridis").copy()
   colMap.set_bad(color='black')
   
   #get frame
   frame = len(pos.rois[0].label_stack)-1 if frame==-1 else frame

   # get label image:
   labels = pos.rois[0].label_stack[frame]

   spatial_map = np.full(labels.shape, np.nan)

   # Go over cells in selected frame:
   for cnb in lin.cellnumbers[frame]:
   
      #convert to numpy to allow for advanced indexing
      cell_frames = np.array(lin.cells[cnb]['frames'])
      cell_prop = np.array(lin.cells[cnb][property])
   
      #frame index      
      fr_idx = cell_frames==frame
      cell_prop = cell_prop[fr_idx]
      
      #assign cells mask area the phenotype of choice
      spatial_map[labels==cnb+1] = cell_prop

   #create new axis if needed
   if axis is None:
      fig, axis = plt.subplots()
   
   #make plot
   axis.imshow(spatial_map, cmap=colMap)
   axis.set_xlabel('x-pos') 
   axis.set_ylabel('y-pos') 
   axis.set_title(property)
   
   return None
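
The core of `plot_spatial_map` is painting each cell's mask with its property value while leaving everything else as NaN. A minimal sketch of that step, using a made-up label image and a hypothetical `cell_values` dictionary in place of the lineage object:

```python
import numpy as np

# Hypothetical label image: 0 is background, labels 1-3 are cells
label_im = np.array([[0, 1, 1],
                     [0, 0, 2],
                     [3, 0, 2]])

# Hypothetical per-cell values keyed by 0-based cell number;
# cell 2 (label 3) has no value in this frame
cell_values = {0: 30.5, 1: 41.0}

# Start with NaN everywhere so background and missing cells stay NaN
spatial_map = np.full(label_im.shape, np.nan)
for cnb, value in cell_values.items():
    # label in the image is the cell number plus one
    spatial_map[label_im == cnb + 1] = value

print(spatial_map)
```

Pixels that remain NaN are the ones drawn in black by the colormap's `set_bad` color.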

---

## Convert to Pandas dataframe

Now that we have all the important cell properties we can convert the Delta output to a Pandas dataframe. We provide a function for this below:

In [11]:
def lin_to_df(lin):
    #find vector based data (only vector based data is compatible with dataframe)
    vector_data = [key for key, value in lin.cells[0].items() if isinstance(value, list)]
    #create data frame
    df = pd.DataFrame(lin.cells) 
    #this creates nested dataframe, we need to explode time into separate rows:
    df = df.explode(vector_data)
    #and reindex
    df = df.reset_index(drop=True)

    return df
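
To see what the explode step does, here is a small sketch on made-up data. The `cells` list is hypothetical and mimics the structure of `lin.cells`: scalar fields plus per-frame lists of equal length. Exploding several columns at once assumes pandas 1.3 or newer.

```python
import pandas as pd

# Hypothetical Delta-style cells: scalar fields plus equal-length per-frame lists
cells = [
    {"id": 0, "mother": None, "frames": [0, 1, 2], "length": [30.1, 31.5, 33.0]},
    {"id": 1, "mother": 0, "frames": [2, 3], "length": [15.2, 16.0]},
]

# Keys whose values are lists hold one entry per frame
list_keys = [key for key, value in cells[0].items() if isinstance(value, list)]

# One row per cell first, then one row per cell and frame
df_toy = pd.DataFrame(cells).explode(list_keys).reset_index(drop=True)

# Exploded columns have object dtype; cast them if numeric dtypes are needed
df_toy = df_toy.astype({"frames": int, "length": float})

print(df_toy)
```

Scalar fields such as `id` are repeated on every row of the same cell, which is why the dataframe above shows the same `id` for consecutive frames.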

---

## All-in-one processing

For future use, we provide a single wrapper function that takes a Delta position object as input and returns a Pandas dataframe by successively calling the functions defined above.

In [13]:
def delta_to_df(pos, reader=None):
    
    lin = pos.rois[0].lineage
    
    #split lineages:
    split_lineages(lin)
    
    #add segment info
    add_segment_info(lin, pos.rois[0].label_stack)
    
    #convert to pandas dataframe
    df = lin_to_df(lin)
   
    return df

In [14]:
#convert the loaded position to a dataframe and show the first rows
df = delta_to_df(pos)
df.head(n=30)

Unnamed: 0,id,mother,frames,daughters,new_pole,old_pole,edges,length,width,area,perimeter,fluo1,sublin_idx,lin_id,mother_lin_id,d1_lin_id,x_pos,y_pos
0,0,,0,,"[250, 310]","[236, 298]",,30.766232,11.088337,267.0,30,216.422819,0,0,-1,1,305.362416,243.97651
1,0,,1,,"[253, 311]","[237, 297]",,31.876179,10.988653,273.5,27,234.334426,0,0,-1,1,304.872131,245.321311
2,0,,2,,"[261, 310]","[246, 295]",,31.819805,10.606601,276.0,30,235.65798,0,0,-1,1,303.0,253.693811
3,0,,3,,"[263, 306]","[248, 291]",,33.234016,11.313707,304.5,27,251.801187,0,0,-1,1,299.240356,255.436202
4,0,,4,,"[263, 304]","[247, 287]",,33.941124,11.313707,314.5,31,244.034483,0,0,-1,1,296.017241,254.451149
5,0,,5,,"[261, 304]","[246, 284]",,36.573315,11.401754,327.5,34,255.393939,0,0,-1,1,295.068871,253.482094
6,0,,6,,"[267, 305]","[252, 285]",,37.44136,11.257707,341.5,35,246.783069,0,0,-1,1,294.664021,258.902116
7,0,,7,,"[272, 307]","[256, 285]",,38.269344,11.229179,349.5,42,274.351421,0,0,-1,1,296.069767,263.374677
8,0,,8,,"[268, 309]","[250, 284]",,40.78558,11.266735,366.5,41,269.231527,0,0,-1,1,297.133005,258.623153
9,0,,9,,"[275, 307]","[257, 283]",,41.866165,11.421289,378.0,49,283.377088,0,0,-1,1,295.455847,265.315036


---

In [45]:
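#after splitting, the two segments born at a division share the same mother_lin_id
#the first row per mother gives one daughter segment, the last row gives the other
#note: mother_lin_id == -1 groups all segments without a known mother
d1 = df.groupby('mother_lin_id').first()['lin_id']
d2 = df.groupby('mother_lin_id').last()['lin_id']

#combine into one table with a column per daughter
d3 = pd.concat([d1, d2], axis=1, keys=['d1_lin_id','d2_lin_id'])
#turn the group key (mother_lin_id) from index into a regular column
d3 = d3.reset_index()

d3.head()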


Unnamed: 0,mother_lin_id,d1_lin_id,d2_lin_id
0,-1,0,8
1,0,1,16
2,1,2,30
3,2,3,57
4,3,4,104
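
The daughter lookup can be reproduced on a small made-up table. The ids in `toy` are hypothetical; the point of the sketch is that the group key lives in the index until `reset_index()` moves it into a column.

```python
import pandas as pd

# Hypothetical segments: lin_id 1 and 2 are daughters of 0, lin_id 3 and 4 of 1
toy = pd.DataFrame({
    "lin_id":        [0, 1, 2, 3, 4],
    "mother_lin_id": [-1, 0, 0, 1, 1],
})

grouped = toy.groupby("mother_lin_id")["lin_id"]
daughters = pd.concat(
    [grouped.first(), grouped.last()],
    axis=1,
    keys=["d1_lin_id", "d2_lin_id"],
)

# At this point mother_lin_id is the index, not a column
is_column_before = "mother_lin_id" in daughters.columns

daughters = daughters.reset_index()
is_column_after = "mother_lin_id" in daughters.columns

print(is_column_before, is_column_after)  # False True
print(daughters[daughters["mother_lin_id"] == 0])
```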


In [46]:
df[df['mother_lin_id']==-1] 

Unnamed: 0,id,mother,frames,daughters,new_pole,old_pole,edges,length,width,area,perimeter,fluo1,sublin_idx,lin_id,mother_lin_id,d1_lin_id,x_pos,y_pos
0,0,,0,,"[250, 310]","[236, 298]",,30.766232,11.088337,267.0,30,216.422819,0,0,-1,1,305.362416,243.97651
1,0,,1,,"[253, 311]","[237, 297]",,31.876179,10.988653,273.5,27,234.334426,0,0,-1,1,304.872131,245.321311
2,0,,2,,"[261, 310]","[246, 295]",,31.819805,10.606601,276.0,30,235.65798,0,0,-1,1,303.0,253.693811
3,0,,3,,"[263, 306]","[248, 291]",,33.234016,11.313707,304.5,27,251.801187,0,0,-1,1,299.240356,255.436202
4,0,,4,,"[263, 304]","[247, 287]",,33.941124,11.313707,314.5,31,244.034483,0,0,-1,1,296.017241,254.451149
5,0,,5,,"[261, 304]","[246, 284]",,36.573315,11.401754,327.5,34,255.393939,0,0,-1,1,295.068871,253.482094
6,0,,6,,"[267, 305]","[252, 285]",,37.44136,11.257707,341.5,35,246.783069,0,0,-1,1,294.664021,258.902116
7,0,,7,,"[272, 307]","[256, 285]",,38.269344,11.229179,349.5,42,274.351421,0,0,-1,1,296.069767,263.374677
8,0,,8,,"[268, 309]","[250, 284]",,40.78558,11.266735,366.5,41,269.231527,0,0,-1,1,297.133005,258.623153
9,0,,9,,"[275, 307]","[257, 283]",,41.866165,11.421289,378.0,49,283.377088,0,0,-1,1,295.455847,265.315036




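Note that `mother_lin_id` is only available as a column after `reset_index()` has been called. While it is still the index of the grouped table, selecting it as a column raises a `KeyError`:

In [44]:
d3.iloc[[0]]['mother_lin_id']

KeyError: 'mother_lin_id'

In [37]:
d3[d3['mother_lin_id']==0]

KeyError: 'mother_lin_id'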

---

## File Saving
This would be a good time to save your data.  
You can save the position file using `pos.save(filename=filename, save_format='pickle')`. We won't do this now as we do not want to accidentally corrupt our data.

Instead we just save the dataframe:

In [22]:
#name the output after the position file, e.g. Position000000_df.pkl
save_name = processed_dir / (file_dirs[0].stem + '_df.pkl')
df.to_pickle(save_name)
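
As a quick check that a saved dataframe can be read back unchanged, here is a minimal round-trip sketch. It writes to a temporary directory so no real data is touched; `df_demo` and the file name are hypothetical stand-ins for the dataframe and position file used above.

```python
import pathlib
import tempfile

import pandas as pd

# Hypothetical stand-in for the lineage dataframe
df_demo = pd.DataFrame({"lin_id": [0, 0, 1], "length": [30.1, 31.5, 15.2]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical position file name; only the name is used, no file is read
    source_file = pathlib.Path(tmp_dir) / "Position000000.pkl"

    # Derive the output name from the file stem: Position000000_df.pkl
    save_name = source_file.with_name(source_file.stem + "_df.pkl")

    df_demo.to_pickle(save_name)
    df_loaded = pd.read_pickle(save_name)
    saved_file_name = save_name.name

print(saved_file_name)
print(df_loaded.equals(df_demo))
```

Building the name from the path's `stem` avoids string replacement on a full path and keeps the output next to the position file.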