# Creating Approximated Gene Expression Trajectories (AGETs)

This notebook explains how to create Approximated Gene Expression Trajectories (AGETs) using ICP registration with the Python package *open3d*. An AGET is an artificially constructed data type that describes the movement and gene expression of specific cells over a specific time period. With AGETs, we can combine spatial information about cell movement from video data with gene expression information from image data for an arbitrary number of genes. We thereby create a data type that represents the gene expression dynamics of single cells as they move through space. AGETs can be used to infer gene regulatory networks (GRNs) in dynamic tissues. An example of inferring GRN parameters using MCMC can be found in the notebook *MCMC with AGETs*.

To create AGETs, we need cell tracking data that describes the movement of cells over the time period of interest, and expression data that describes gene expression at specific time points within that period.
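
As a rough illustration of how the two inputs combine, the sketch below builds a toy AGET table with pandas. All values are made up, and the layout (one row per `TrackID` and `Time`, one `g*_map` column per gene) is an assumption for illustration only; the real tables are produced further down in this notebook.

```python
import pandas as pd

# Toy tracking data: one row per cell and frame (hypothetical values).
tracks = pd.DataFrame({
    "TrackID": [1, 1, 1, 2, 2, 2],
    "Time":    [1, 2, 3, 1, 2, 3],
    "X":       [0.10, 0.15, 0.22, 0.80, 0.78, 0.75],
})

# Toy expression values already mapped onto the tracked cells
# (one column per gene, hypothetical values).
mapped = pd.DataFrame({
    "TrackID": [1, 1, 1, 2, 2, 2],
    "Time":    [1, 2, 3, 1, 2, 3],
    "g1_map":  [0.9, 0.7, 0.4, 0.1, 0.1, 0.0],
    "g2_map":  [0.1, 0.3, 0.6, 0.8, 0.9, 1.0],
})

# An AGET is the time-ordered record of position and expression for one track.
agets = tracks.merge(mapped, on=["TrackID", "Time"]).sort_values(["TrackID", "Time"])
aget_track_1 = agets[agets["TrackID"] == 1].reset_index(drop=True)
print(aget_track_1)
```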

Our biological tissue of interest is the presomitic mesoderm (PSM) of the zebrafish. The image below shows three AGETs of cells as they move through the PSM.

<br/><br/>
![title](img/3_IGETs.png)
<br/><br/>

The following sections explain step by step how to create the AGETs shown above. The first section defines a number of helper functions for point cloud creation and registration. Parts of these functions are taken from the [ICP-registration tutorial for open3d](http://www.open3d.org/docs/release/tutorial/pipelines/icp_registration.html?), which is well worth reading.

In our use case, a point in a point cloud represents a cell. Every cell has 3-dimensional coordinates x, y and z. Some cells also have associated gene expression levels (cells in HCR images) or a track ID assigning them to a specific cell track (cells in the tracking data).

Most functions also have a 'colored' version. Colored point clouds carry additional information on top of the positions; in our case this is gene expression. The colored functions can be used to align two point clouds when gene expression information is available in both. This is usually not the case here, because we align an HCR image, which has gene expression information, with a time point in the tracking data, which has none. I left the colored versions in because they can be used to create an average of multiple HCRs.

I first tried averaging the HCRs before aligning them with the tracking data, to see whether that would lead to a better result. It did not. Instead, I mapped each HCR onto each time point in the tracking data and averaged afterwards per time point, which is how the code below is structured. For other applications, averaging first might improve the result.
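
The "map first, average afterwards" strategy can be sketched as follows. This is a minimal illustration with made-up numbers; the `image` column and the table layout are assumptions, not the notebook's actual output format.

```python
import pandas as pd

# Hypothetical result of mapping two HCR images onto the same tracked cells:
# one row per (HCR image, time point, tracked cell).
mapped = pd.DataFrame({
    "image":   [3, 3, 4, 4],
    "Time":    [1, 1, 1, 1],
    "TrackID": [10, 11, 10, 11],
    "g1_map":  [0.2, 0.8, 0.4, 0.6],
})

# Average over images separately for every time point and tracked cell.
averaged = mapped.groupby(["Time", "TrackID"], as_index=False)["g1_map"].mean()
print(averaged)
```

Averaging after mapping means every HCR image is registered on its own, so a single poorly registered image can be dropped without redoing the others.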


In [None]:
import numpy as np
import pandas as pd
import open3d as o3d
import os
import copy
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns
from scipy.signal import savgol_filter
from scipy.spatial import KDTree
import pickle

## Registration and Mapping

### Define functions

In the following, a number of functions are defined that are used to align two PSMs and to map expression levels from one PSM to the other.
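
Two operations recur in these functions: rescaling the source so that its extent along X matches that of the target, and applying the 4x4 transformation matrix returned by the registration. The NumPy sketch below illustrates both on toy coordinates. The helper names `rescale_to_target` and `apply_transformation` are hypothetical and not part of open3d; in the notebook itself the transformation is applied with the point cloud's `transform` method.

```python
import numpy as np

def rescale_to_target(xyz_source, xyz_target):
    """Scale all three axes by the ratio of the X extents (isotropic scaling)."""
    source_extent = xyz_source[:, 0].max() - xyz_source[:, 0].min()
    target_extent = xyz_target[:, 0].max() - xyz_target[:, 0].min()
    return xyz_source / source_extent * target_extent

def apply_transformation(xyz, transformation):
    """Apply a 4x4 homogeneous transformation to an (N, 3) coordinate array."""
    homogeneous = np.hstack([xyz, np.ones((xyz.shape[0], 1))])
    return (homogeneous @ transformation.T)[:, :3]

source = np.array([[0.0, 0.0, 0.0], [2.0, 1.0, 0.0]])
target = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]])

scaled = rescale_to_target(source, target)  # X extent is now 4 instead of 2

# Toy transformation: rotate 90 degrees about Z, then shift by +1 along X.
transformation = np.array([
    [0.0, -1.0, 0.0, 1.0],
    [1.0,  0.0, 0.0, 0.0],
    [0.0,  0.0, 1.0, 0.0],
    [0.0,  0.0, 0.0, 1.0],
])
moved = apply_transformation(scaled, transformation)
print(moved)
```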


In [None]:
def create_pointclouds( path_source='Source_images/Tbox_SS23-25/Image_3/KS_23ss_2020-06-24 Tbx16 Tbx24 Tbxta Stage Range 40X 0_2020_06_24__09_15_02__p11.xls',
                        path_target='Tracking_data/Tracks_M4_Kay.csv', 
                        target_tracks=True, # Is the target a time point in the tracking data (or another HCR image '=False')
                        tracks_timepoint=1, # 1=start, 61=end, all between possible
                        source_avg_image=False, # Is the source already an average image? 
                        plot_pointclouds=False, # Should the created point clouds be plotted?
                        gene_type='Tbox'): # Either 'Tbox' or 'Wnt' 
    """
    - This function loads a source PSM and a target PSM and creates point cloud objects for both. The point cloud objects 
    are needed for the next steps.
    - The source file is an HCR image (.xls file) , it contains positional information and gene expression information. 
    - The target file is usually a specific time point in the tracking data (.csv file here), which doesn't have gene
    expression information.
    - This function can also be used to map HCR on HCR, then you would use target_tracks=False
    - source_avg_image=True would be used if the source was already an averaged HCR image, then the file type would be different.
    This was tried as an option, but didn't lead to better results. It can be ignored.
    - gene_type='Tbox' distinguishes between mapping the Tbox genes (3 genes together in one file)
    and mapping signals (FGF and Wnt, which are both measured separately). If you want to map signals, use 'Wnt' instead.
    """
    
    '''
    Load target data:
    - If target_tracks=True then the target is the tracking data and a timepoint has to be chosen
    '''
    if target_tracks == True:
        df_tracks = pd.DataFrame(
                    pd.read_csv(path_target, sep=";"),
                    columns=[ 
                        "Position X Reference Frame",
                        "Time",
                        "TrackID",
                        "Position Y Reference Frame",
                        "Position Z Reference Frame",
                        "color",
                        ],
                    )
        df_target = df_tracks.loc[df_tracks['Time']==tracks_timepoint, ['Position X Reference Frame', 'Position Y Reference Frame', 'Position Z Reference Frame', 'TrackID']]
        df_original_coordinates = df_tracks.loc[df_tracks['Time']==tracks_timepoint, ['Position X Reference Frame', 'Position Y Reference Frame', 'Position Z Reference Frame', 'TrackID']]
        df_target.columns = [ 'X', 'Y', 'Z', 'TrackID']
        df_target = df_target.merge(df_original_coordinates, left_on='TrackID', right_on='TrackID')
        target_AP_distance = df_target.X.max() - df_target.X.min()
        xyz_tp = df_target.loc[:, ['X', 'Y', 'Z']].values
        target = o3d.geometry.PointCloud()
        target.points = o3d.utility.Vector3dVector(xyz_tp)
        target.paint_uniform_color((0, 0.651, 0.929))
    else:
        if gene_type=='Tbox':
            df_target_full = pd.read_excel(path_target, sheet_name='Position Reference Frame', skiprows=1)
        else:
            df_target_full = pd.read_csv(path_target)
        df_target = df_target_full[['Position X Reference Frame', 'Position Y Reference Frame', 'Position Z Reference Frame']].copy()
        df_target.columns = [ 'X', 'Y', 'Z']
        # The X-extent of the target is the reference length used below to re-scale the source
        target_AP_distance = df_target.X.max() - df_target.X.min()
        xyz_tar = df_target.values
        target = o3d.geometry.PointCloud()
        target.points = o3d.utility.Vector3dVector(xyz_tar)
        target.paint_uniform_color((0, 0.651, 0.929))

    '''
    Load source data:
    - Source data is always an HCR image with positional and gene expression information
    '''
    if source_avg_image == True:
        df_source = pd.DataFrame(
                    pd.read_csv('Data_produced/HCR_on_HCR_mappings/HCRs_average_on_image_4.csv', sep=","),
                    columns=[
                        "X",
                        "Y",
                        "Z",
                        "mean_g1",
                        "mean_g2",
                        "mean_g3",
                        ],
                    )
        df_source.columns = ['X', 'Y', 'Z','g1_Median','g2_Median','g3_Median']
    else:
        if gene_type=='Tbox':
            which_measure = 'Median' 
            channel_tbxta, channel_tbx16, channel_tbx24 = [6, 7, 8]
            sheet_names = ['Position Reference Frame', 
                            f'Intensity {which_measure} Ch={channel_tbxta} Img=1',
                            f'Intensity {which_measure} Ch={channel_tbx16} Img=1',
                            f'Intensity {which_measure} Ch={channel_tbx24} Img=1']
            df_dict = pd.read_excel(path_source, sheet_name=sheet_names, skiprows=1)
            df_position = df_dict['Position Reference Frame'][['ID','Position X Reference Frame', 'Position Y Reference Frame', 'Position Z Reference Frame']]
            df_position.columns = ['ID', 'X', 'Y', 'Z']
            df_Tbxta = df_dict[f'Intensity {which_measure} Ch={channel_tbxta} Img=1'][['ID',f'Intensity {which_measure}']]
            df_Tbxta.columns = ['ID', f'g1_{which_measure}']
            df_Tbx16 = df_dict[f'Intensity {which_measure} Ch={channel_tbx16} Img=1'][['ID',f'Intensity {which_measure}']]
            df_Tbx16.columns = ['ID', f'g2_{which_measure}']
            df_Tbx24 = df_dict[f'Intensity {which_measure} Ch={channel_tbx24} Img=1'][['ID',f'Intensity {which_measure}']]
            df_Tbx24.columns = ['ID', f'g3_{which_measure}']
            df_source = df_position.merge(df_Tbxta, on='ID').merge(df_Tbx16, on='ID').merge(df_Tbx24, on='ID')
        
            # Normalize gene expression
            df_source.g1_Median = df_source.loc[:,'g1_Median']/np.max(df_source.loc[:,'g1_Median'])
            df_source.g2_Median = df_source.loc[:,'g2_Median']/np.max(df_source.loc[:,'g2_Median'])
            df_source.g3_Median = df_source.loc[:,'g3_Median']/np.max(df_source.loc[:,'g3_Median'])
        
        else:
            df_source = pd.read_csv(path_source)[['ID','Position X Reference Frame', 'Position Y Reference Frame', 'Position Z Reference Frame', 'Intensity Median']]
            df_source.columns = ['ID', 'X', 'Y', 'Z', 'Intensity']
            # Normalize gene expression
            df_source.loc[:, 'Intensity'] = (df_source.loc[:,'Intensity']-np.min(df_source.loc[:,'Intensity']))/(np.max(df_source.loc[:,'Intensity'])-np.min(df_source.loc[:,'Intensity']))

    source_AP_distance = df_source.X.max() - df_source.X.min() 
    df_source.loc[:,'X'] = df_source.X/source_AP_distance * target_AP_distance
    df_source.loc[:,'Y'] = df_source.Y/source_AP_distance * target_AP_distance
    df_source.loc[:,'Z'] = df_source.Z/source_AP_distance * target_AP_distance

    xyz_sou = df_source.loc[:, ['X', 'Y', 'Z']].values
    source = o3d.geometry.PointCloud()
    source.points = o3d.utility.Vector3dVector(xyz_sou)
    source.paint_uniform_color((1, 0.706, 0))

    if plot_pointclouds==True:
        print('Source (orange) \nTarget (blue)')
        o3d.visualization.draw_geometries([source])
        o3d.visualization.draw_geometries([target])
    
    return source, target, df_source, df_target

def draw_registration_result(source, target, transformation):
    '''
    Source of function: http://www.open3d.org/docs/release/tutorial/pipelines/icp_registration.html?#
    This helper function visualizes the transformed source point cloud together with the target point cloud:
    blue is target.
    '''
    source_temp = copy.deepcopy(source)
    target_temp = copy.deepcopy(target)
    source_temp.paint_uniform_color([1, 0.706, 0])
    target_temp.paint_uniform_color([0, 0.651, 0.929])
    source_temp.transform(transformation)
    o3d.visualization.draw_geometries([source_temp, target_temp])

def preprocess_point_cloud(pcd, voxel_size):
    '''
    Source of function: http://www.open3d.org/docs/release/tutorial/pipelines/icp_registration.html?#
    FPFH features are used to align the two point clouds
    We estimate normals, then compute a FPFH feature for each point.
    The FPFH feature is a 33-dimensional vector that describes the local geometric property of a point.
    A nearest neighbor query in the 33-dimensional space can return points with similar local geometric structures
    '''

    radius_normal = voxel_size * 5 # value can be adjusted based on problem, http://www.open3d.org/docs/release/tutorial/pipelines/icp_registration.html?#
    pcd.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=radius_normal, max_nn=30))

    radius_feature = voxel_size * 10 # value can be adjusted based on problem, http://www.open3d.org/docs/release/tutorial/pipelines/icp_registration.html?#
    pcd_fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        pcd,
        o3d.geometry.KDTreeSearchParamHybrid(radius=radius_feature, max_nn=100))
    return pcd_fpfh

def execute_global_registration(source, target, source_fpfh,
                                target_fpfh, voxel_size):
    '''
    Source of function: http://www.open3d.org/docs/release/tutorial/pipelines/icp_registration.html?#
    To initialize point-to-plane ICP, we need a rough alignment which is done here.
    '''
    distance_threshold = voxel_size * 5
    result = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
        source, target, source_fpfh, target_fpfh, True,
        distance_threshold,
        o3d.pipelines.registration.TransformationEstimationPointToPoint(False),
        3, [
            o3d.pipelines.registration.CorrespondenceCheckerBasedOnEdgeLength(
                0.9),
            o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(
                distance_threshold)
        ], o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))
    return result

def registration_source_target(source, target, plot_pointclouds=True, plot_convex_hulls=True):
    '''
    Aligning source and target, output transformed coordinates and transformation matrix
    '''
    voxel_size = 4  # value can be adjusted based on problem, http://www.open3d.org/docs/release/tutorial/pipelines/icp_registration.html?#
    threshold = 10 # value can be adjusted based on problem, http://www.open3d.org/docs/release/tutorial/pipelines/icp_registration.html?#
    source_fpfh = preprocess_point_cloud(source, voxel_size)
    target_fpfh = preprocess_point_cloud(target, voxel_size)
    result_ransac = execute_global_registration(source, target,
                                                source_fpfh, target_fpfh,
                                                voxel_size) # result_ransac.transformation is the transformation matrix
    reg_p2l = o3d.pipelines.registration.registration_icp(
        source, target, threshold, result_ransac.transformation,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    coord_transf = np.asarray(copy.deepcopy(source).transform(reg_p2l.transformation).points)

    if plot_pointclouds == True:
        draw_registration_result(source, target, reg_p2l.transformation)

    if plot_convex_hulls == True:
        hull_source, _ = copy.deepcopy(source).transform(reg_p2l.transformation).compute_convex_hull()
        hull_ls_source = o3d.geometry.LineSet.create_from_triangle_mesh(hull_source)
        hull_ls_source.paint_uniform_color((1, 0.706, 0))
        hull_target, _ = target.compute_convex_hull()
        hull_ls_target = o3d.geometry.LineSet.create_from_triangle_mesh(hull_target)
        hull_ls_target.paint_uniform_color((0, 0.651, 0.929))
        print('Source (orange) \nTarget (blue)')
        o3d.visualization.draw_geometries([hull_ls_source, hull_ls_target])
    return coord_transf, reg_p2l

def create_colored_pointclouds( path_source,
                                path_target,
                                gene_type,  
                                plot_pointclouds=False):
    
    '''
    Same as create_pointclouds, but with color. That means that additionally to the location information, 
    we use the gene expression information to align the point clouds. This was used to generate an averaged PSM from
    all HCRs, but will not be used for the creation of AGETs.
    '''
    
    '''
    Load target data:
    '''
    if gene_type=='Tbox':
        which_measure = 'Median' 
        channel_tbxta, channel_tbx16, channel_tbx24 = [6, 7, 8]
        sheet_names = ['Position Reference Frame', 
                        f'Intensity {which_measure} Ch={channel_tbxta} Img=1',
                        f'Intensity {which_measure} Ch={channel_tbx16} Img=1',
                        f'Intensity {which_measure} Ch={channel_tbx24} Img=1']
        df_dict = pd.read_excel(path_target, sheet_name=sheet_names, skiprows=1)
        df_position = df_dict['Position Reference Frame'][['ID','Position X Reference Frame', 'Position Y Reference Frame', 'Position Z Reference Frame']]
        df_position.columns = ['ID', 'X', 'Y', 'Z']
        df_Tbxta = df_dict[f'Intensity {which_measure} Ch={channel_tbxta} Img=1'][['ID',f'Intensity {which_measure}']]
        df_Tbxta.columns = ['ID', f'g1_{which_measure}']
        df_Tbx16 = df_dict[f'Intensity {which_measure} Ch={channel_tbx16} Img=1'][['ID',f'Intensity {which_measure}']]
        df_Tbx16.columns = ['ID', f'g2_{which_measure}']
        df_Tbx24 = df_dict[f'Intensity {which_measure} Ch={channel_tbx24} Img=1'][['ID',f'Intensity {which_measure}']]
        df_Tbx24.columns = ['ID', f'g3_{which_measure}']
        df_target = df_position.merge(df_Tbxta, on='ID').merge(df_Tbx16, on='ID').merge(df_Tbx24, on='ID')
        target_AP_distance = df_target.X.max() - df_target.X.min()

        # Normalize gene expression
        df_target.g1_Median = df_target.loc[:,'g1_Median']/np.max(df_target.loc[:,'g1_Median'])
        df_target.g2_Median = df_target.loc[:,'g2_Median']/np.max(df_target.loc[:,'g2_Median'])
        df_target.g3_Median = df_target.loc[:,'g3_Median']/np.max(df_target.loc[:,'g3_Median'])

        xyz_tar = df_target.loc[:, ['X', 'Y', 'Z']].values
        target = o3d.geometry.PointCloud()
        target.points = o3d.utility.Vector3dVector(xyz_tar)
        target.paint_uniform_color((1, 0.706, 0))
        np.asarray(target.colors)[:] = np.array(df_target.loc[:, [f'g1_{which_measure}', f'g2_{which_measure}', f'g3_{which_measure}']].values, dtype=np.float64)
    else:
        df_target = pd.read_csv(path_target)[['ID','Position X Reference Frame', 'Position Y Reference Frame', 'Position Z Reference Frame', 'Intensity Median']]
        df_target.columns = ['ID', 'X', 'Y', 'Z', 'Intensity']

        # Normalize gene expression
        df_target.loc[:, 'Intensity'] = (df_target.loc[:,'Intensity']-np.min(df_target.loc[:,'Intensity']))/(np.max(df_target.loc[:,'Intensity'])-np.min(df_target.loc[:,'Intensity']))
        
        xyz_tar = df_target.loc[:, ['X', 'Y', 'Z']].values
        target = o3d.geometry.PointCloud()
        target.points = o3d.utility.Vector3dVector(xyz_tar)
        target.paint_uniform_color((1, 0, 0))
        np.asarray(target.colors)[:, 0] = np.array(df_target.loc[:, [f'Intensity']], dtype=np.float64).flatten()

    '''
    Load source data:
    '''
    if gene_type=='Tbox':
        which_measure = 'Median' 
        channel_tbxta, channel_tbx16, channel_tbx24 = [6, 7, 8]
        sheet_names = ['Position Reference Frame', 
                        f'Intensity {which_measure} Ch={channel_tbxta} Img=1',
                        f'Intensity {which_measure} Ch={channel_tbx16} Img=1',
                        f'Intensity {which_measure} Ch={channel_tbx24} Img=1']
        df_dict = pd.read_excel(path_source, sheet_name=sheet_names, skiprows=1)
        df_position = df_dict['Position Reference Frame'][['ID','Position X Reference Frame', 'Position Y Reference Frame', 'Position Z Reference Frame']]
        df_position.columns = ['ID', 'X', 'Y', 'Z']
        df_Tbxta = df_dict[f'Intensity {which_measure} Ch={channel_tbxta} Img=1'][['ID',f'Intensity {which_measure}']]
        df_Tbxta.columns = ['ID', f'g1_{which_measure}']
        df_Tbx16 = df_dict[f'Intensity {which_measure} Ch={channel_tbx16} Img=1'][['ID',f'Intensity {which_measure}']]
        df_Tbx16.columns = ['ID', f'g2_{which_measure}']
        df_Tbx24 = df_dict[f'Intensity {which_measure} Ch={channel_tbx24} Img=1'][['ID',f'Intensity {which_measure}']]
        df_Tbx24.columns = ['ID', f'g3_{which_measure}']
        df_source = df_position.merge(df_Tbxta, on='ID').merge(df_Tbx16, on='ID').merge(df_Tbx24, on='ID')
        source_AP_distance = df_source.X.max() - df_source.X.min() 
        df_source.loc[:,'X'] = df_source.X/source_AP_distance * target_AP_distance
        df_source.loc[:,'Y'] = df_source.Y/source_AP_distance * target_AP_distance
        df_source.loc[:,'Z'] = df_source.Z/source_AP_distance * target_AP_distance

        # Normalize gene expression
        df_source.g1_Median = df_source.loc[:,'g1_Median']/np.max(df_source.loc[:,'g1_Median'])
        df_source.g2_Median = df_source.loc[:,'g2_Median']/np.max(df_source.loc[:,'g2_Median'])
        df_source.g3_Median = df_source.loc[:,'g3_Median']/np.max(df_source.loc[:,'g3_Median'])

        xyz_sou = df_source.loc[:, ['X', 'Y', 'Z']].values
        source = o3d.geometry.PointCloud()
        source.points = o3d.utility.Vector3dVector(xyz_sou)
        source.paint_uniform_color((1, 0.706, 0))
        np.asarray(source.colors)[:] = np.array(df_source.loc[:, [f'g1_{which_measure}', f'g2_{which_measure}', f'g3_{which_measure}']].values, dtype=np.float64)
    else:
        df_source = pd.read_csv(path_source)[['ID','Position X Reference Frame', 'Position Y Reference Frame', 'Position Z Reference Frame', 'Intensity Median']]
        df_source.columns = ['ID', 'X', 'Y', 'Z', 'Intensity']
        
        # Normalize gene expression
        df_source.loc[:, 'Intensity'] = (df_source.loc[:,'Intensity']-np.min(df_source.loc[:,'Intensity']))/(np.max(df_source.loc[:,'Intensity'])-np.min(df_source.loc[:,'Intensity']))
        
        xyz_sou = df_source.loc[:, ['X', 'Y', 'Z']].values
        source = o3d.geometry.PointCloud()
        source.points = o3d.utility.Vector3dVector(xyz_sou)
        source.paint_uniform_color((1, 0, 0))
        # Store the normalized intensity of the source in the red channel
        np.asarray(source.colors)[:, 0] = np.array(df_source.loc[:, ['Intensity']], dtype=np.float64).flatten()

    if plot_pointclouds==True:
        o3d.visualization.draw_geometries([source])
        o3d.visualization.draw_geometries([target])
    
    return source, target, df_source, df_target

def draw_registration_result_original_color(source, target, transformation):
    '''
    Source: http://www.open3d.org/docs/release/tutorial/pipelines/colored_pointcloud_registration.html#Helper-visualization-function
    In order to demonstrate the alignment between colored point clouds, draw_registration_result_original_color renders point clouds with their original color.
    '''
    source_temp = copy.deepcopy(source)
    source_temp.transform(transformation) 
    o3d.visualization.draw_geometries([source_temp, target],
                                      zoom=0.5,
                                      front=[-0.2458, -0.8088, 0.5342],
                                      lookat=[1.7745, 2.2305, 0.9787],
                                      up=[0.3109, -0.5878, -0.7468])

def colored_registration_source_target(source, target, plot_pointclouds=True, plot_convex_hulls=True):
    radius = 4 # value can be adjusted based on problem, http://www.open3d.org/docs/release/tutorial/pipelines/icp_registration.html?#
    voxel_size = 4 # value can be adjusted based on problem, http://www.open3d.org/docs/release/tutorial/pipelines/icp_registration.html?#
    max_iter = 100 # maximum number of ICP iterations
    source.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=radius * 10, max_nn=30))
    target.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=radius * 10, max_nn=30))
    result_icp = o3d.pipelines.registration.registration_colored_icp(
            source, target, 19, np.identity(4),
            o3d.pipelines.registration.TransformationEstimationForColoredICP(),
            o3d.pipelines.registration.ICPConvergenceCriteria(relative_fitness=1e-6,
                                                            relative_rmse=1e-6,
                                                            max_iteration=max_iter))

    coord_transf = np.asarray(copy.deepcopy(source).transform(result_icp.transformation).points)

    if plot_pointclouds == True:
        draw_registration_result_original_color(source, target, result_icp.transformation)

    if plot_convex_hulls == True:
        hull_source, _ = copy.deepcopy(source).transform(result_icp.transformation).compute_convex_hull()
        hull_ls_source = o3d.geometry.LineSet.create_from_triangle_mesh(hull_source)
        hull_ls_source.paint_uniform_color((1, 0.706, 0))
        hull_target, _ = target.compute_convex_hull()
        hull_ls_target = o3d.geometry.LineSet.create_from_triangle_mesh(hull_target)
        hull_ls_target.paint_uniform_color((0, 0.651, 0.929))
        print('Source (orange) \nTarget (blue)')
        o3d.visualization.draw_geometries([hull_ls_source, hull_ls_target])

    return coord_transf, result_icp

def map_gene_expression_from_source_to_target(path_source='Source_images/Tbox_SS23-25/Image_3/KS_23ss_2020-06-24 Tbx16 Tbx24 Tbxta Stage Range 40X 0_2020_06_24__09_15_02__p11.xls', # path to csv (track) or xls (HCR) file
                                                path_target='Tracking_data/Tracks_M4_Kay.csv', # path to csv (track) or xls (HCR) file
                                                target_track=True, # Whether target is tracking data, normally true 
                                                tracks_timepoint=1, # Which time point in the tracking data
                                                type_icp='icp', # 'icp' for HCR to track, 'colored_icp' for HCR to HCR
                                                n_nearest_neighbors=5, # number of nearest neighbors for the mapping of gene expression from source to target
                                                type_statistic='median', # median or mean of nearest neighbors for averaging
                                                source_avg_image=False, # is the source image already an average HCR?
                                                gene_type='Tbox'): # Tbox for tbxta, tbx16 & tbx24, 'Wnt' or 'FGF' for the signals. Should be generalized for random number of genes in future
    '''
    This function brings together all of the helper functions defined above into one neat package
    '''
    
    # Create Point Clouds
    # Registration and coordinate transformation
    if type_icp == 'icp':
        source, target, df_source, df_target =  create_pointclouds( path_source,
                                                                path_target, 
                                                                target_track, 
                                                                tracks_timepoint,
                                                                source_avg_image=source_avg_image,
                                                                plot_pointclouds=False,
                                                                gene_type=gene_type)
        transf_coordinates = registration_source_target(source, target, plot_pointclouds=False, plot_convex_hulls=False)[0]
    if type_icp == 'colored_icp':
        source, target, df_source, df_target = create_colored_pointclouds(  path_source,
                                                                            path_target,
                                                                            gene_type=gene_type)
        transf_coordinates = colored_registration_source_target(source, target, plot_pointclouds=False, plot_convex_hulls=False)[0]
        

    df_source.loc[:, 'X_transf'] = np.nan
    df_source.loc[:, 'Y_transf'] = np.nan
    df_source.loc[:, 'Z_transf'] = np.nan
    df_source.loc[:, ['X_transf', 'Y_transf', 'Z_transf']] = transf_coordinates
    
    # Find nearest neighbors in source for every target point
    data_source = df_source.loc[:, ['X_transf', 'Y_transf', 'Z_transf']].values
    data_target = df_target.loc[:, ['X', 'Y', 'Z']].values
    kdB = KDTree(data_source)
    nearest_neighb_ind = kdB.query(data_target, k=n_nearest_neighbors)[-1]
    
    # Map gene expression from source to target based on nearest neighbors
    if gene_type == 'Tbox':
        df_target.loc[:, 'g1_map'] = np.nan
        df_target.loc[:, 'g2_map'] = np.nan 
        df_target.loc[:, 'g3_map'] = np.nan
        col_ind_g1 = df_target.columns.get_loc('g1_map')
        col_ind_g2 = df_target.columns.get_loc('g2_map')
        col_ind_g3 = df_target.columns.get_loc('g3_map')

        for i_df in range(df_target.shape[0]):
            nearest_neighbor_data = df_source.loc[nearest_neighb_ind[i_df], ['g1_Median', 'g2_Median', 'g3_Median']]
            if n_nearest_neighbors == 1:
                df_target.iloc[i_df, col_ind_g1] = nearest_neighbor_data.loc['g1_Median'] 
                df_target.iloc[i_df, col_ind_g2] = nearest_neighbor_data.loc['g2_Median']
                df_target.iloc[i_df, col_ind_g3] = nearest_neighbor_data.loc['g3_Median']
            else:
                if type_statistic == 'mean':
                    df_target.iloc[i_df, col_ind_g1] = np.mean(nearest_neighbor_data.loc[:, 'g1_Median'])
                    df_target.iloc[i_df, col_ind_g2] = np.mean(nearest_neighbor_data.loc[:, 'g2_Median']) 
                    df_target.iloc[i_df, col_ind_g3] = np.mean(nearest_neighbor_data.loc[:, 'g3_Median']) 
                if type_statistic == 'median':
                    df_target.iloc[i_df, col_ind_g1] = np.median(nearest_neighbor_data.loc[:, 'g1_Median']) 
                    df_target.iloc[i_df, col_ind_g2] = np.median(nearest_neighbor_data.loc[:, 'g2_Median']) 
                    df_target.iloc[i_df, col_ind_g3] = np.median(nearest_neighbor_data.loc[:, 'g3_Median']) 

    else:
        df_target.loc[:, 'Intensity'] = np.nan
        col_ind_gene = df_target.columns.get_loc('Intensity')
        for i_df in range(df_target.shape[0]):
            nearest_neighbor_data = df_source.loc[nearest_neighb_ind[i_df], 'Intensity']
            if n_nearest_neighbors == 1:
                df_target.iloc[i_df, col_ind_gene] = nearest_neighbor_data
            else:
                if type_statistic == 'mean':
                    df_target.iloc[i_df, col_ind_gene] = np.mean(nearest_neighbor_data)
                if type_statistic == 'median':
                    df_target.iloc[i_df, col_ind_gene] = np.median(nearest_neighbor_data) 


    return df_target
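
The nearest-neighbor mapping step in `map_gene_expression_from_source_to_target` can also be written without an explicit loop over target cells. The sketch below shows the idea on toy data with `scipy.spatial.KDTree`; the arrays and values are made up, and it handles a single gene with the median statistic only, so it is an illustration of the technique rather than a drop-in replacement.

```python
import numpy as np
from scipy.spatial import KDTree

# Toy data: four source cells with one expression value each, two target cells.
source_xyz = np.array([[0.0, 0.0, 0.0],
                       [1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [10.0, 10.0, 10.0]])
source_expr = np.array([0.2, 0.4, 0.9, 1.0])
target_xyz = np.array([[0.1, 0.1, 0.0],
                       [9.0, 9.0, 9.0]])

# Indices of the k nearest source cells for every target cell, shape (n_target, k).
k = 3
_, neighbor_idx = KDTree(source_xyz).query(target_xyz, k=k)

# Indexing with the index array gives an (n_target, k) array of neighbor
# expression values; the median along axis 1 replaces the per-cell loop.
mapped_expr = np.median(source_expr[neighbor_idx], axis=1)
print(mapped_expr)
```

Note that the index array is 2-dimensional only for `k` greater than 1; for `k=1` the query returns a 1-dimensional array and no median is needed.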

### Map gene expression from HCR to tracking data

With the functions defined above, I loop through all time points in the tracking data (one time point = one frame in the tracking data) and all source HCR images I want to include. The output of the code below is one data frame with expression mapped from HCR to tracking data for every HCR image and every frame of the tracking data. I additionally plot the data so that you can visually inspect whether the mapping was successful. I recommend verifying that the mapped data looks approximately as you expect. This can be a lot of images, in my case 13 x 61 = 793, but remember: good research is only possible with good data, and unfortunately ICP registration is not perfect. If a certain image registers badly on the tracking data, it is better to exclude it from the following steps. Also note that this step takes quite a while, because every image is mapped onto every time point.
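
Visual inspection can be complemented by a numerical pre-filter. The registration result objects returned by open3d expose `fitness` (an overlap measure, higher is better) and `inlier_rmse` (lower is better). The sketch below flags image/time-point pairs whose values fall outside hypothetical cut-offs. The function name, the dictionary layout and both threshold values are assumptions chosen for illustration and would need tuning on real data; it is not a substitute for looking at the plots.

```python
def flag_poor_registrations(results, min_fitness=0.8, max_inlier_rmse=5.0):
    """Return the (image, time point) keys whose registration looks unreliable.

    `results` maps (image, time point) to a (fitness, inlier_rmse) tuple, e.g.
    read from the second return value of `registration_source_target`.
    Both thresholds are hypothetical defaults.
    """
    return [
        key
        for key, (fitness, inlier_rmse) in results.items()
        if fitness < min_fitness or inlier_rmse > max_inlier_rmse
    ]

# Made-up values for three image/time-point pairs.
toy_results = {
    (3, 1): (0.95, 2.1),
    (4, 1): (0.55, 2.4),   # low overlap
    (5, 1): (0.90, 7.3),   # large residual distance
}
flagged = flag_poor_registrations(toy_results)
print(flagged)
```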


In [None]:
# Create directories for plots and produced data for tbox genes
for path in ['Plots_produced',
             'Plots_produced/Tbox_profiles',
             'Plots_produced/Tbox_3D',
             'Data_produced',
             'Data_produced/Tbox']:
    try:
        os.mkdir(path)
    except OSError:
        print ("Creation of the directory %s failed" % path)
    else:
        print ("Successfully created the directory %s " % path)
    
# Loop to create plots and data
for i_time in range(1,62): # All time points in the tracking data
    for i_image in range(3,16): # The subset of good images with nice expression signal I chose
        print(f'Time:   {i_time} \nImage:  {i_image}')
        folder_path = f'Source_images/Tbox_SS23-25/Image_{i_image}'
        df_comb = map_gene_expression_from_source_to_target(path_source=f'{folder_path}/{os.listdir(folder_path)[0]}',
                                                            path_target='Tracking_data/Tracks_M4_Kay.csv', 
                                                            target_track=True,
                                                            tracks_timepoint=i_time,
                                                            type_icp='icp',
                                                            n_nearest_neighbors=5, 
                                                            type_statistic='median',
                                                            source_avg_image=False).reset_index()
        
        # Normalize PSM
        x_min = np.min(df_comb.loc[:,f'X'])
        x_max = np.max(df_comb.loc[:,f'X'])
        df_comb.loc[:,f'X'] = (df_comb.loc[:,f'X']-x_min)/(x_max-x_min)
        df_comb.loc[:,f'Y'] = df_comb.loc[:,f'Y']/(x_max-x_min)
        df_comb.loc[:,f'Z'] = df_comb.loc[:,f'Z']/(x_max-x_min)

        
        # Profile plot, gene expression vs A-P axis
        which_measure = 'map'
        f, axes = plt.subplots(2, 2, figsize=(16,16))
        
        filter_size = 401
        yhat_1 = savgol_filter(df_comb[f'g1_{which_measure}'][np.argsort(df_comb[f'X'])], filter_size, 3) 
        yhat_1_max = np.amax(yhat_1)
        yhat_1_norm = yhat_1/yhat_1_max
        df_comb.loc[:, f'g1_{which_measure}'] = df_comb.loc[:, f'g1_{which_measure}']/yhat_1_max
        sns.scatterplot(x=df_comb[f'X'], y=df_comb[f'g1_{which_measure}'], color='red', alpha=0.7, ax=axes[0,0])
        axes[0,0].set_ylabel('Tbxta')
        axes[0,0].set_ylim(0,1.5)

        yhat_2 = savgol_filter(df_comb[f'g2_{which_measure}'][np.argsort(df_comb[f'X'])], filter_size, 3) 
        yhat_2_max = np.amax(yhat_2)
        yhat_2_norm = yhat_2/yhat_2_max
        df_comb.loc[:, f'g2_{which_measure}'] = df_comb.loc[:, f'g2_{which_measure}']/yhat_2_max
        sns.scatterplot(x=df_comb[f'X'], y=df_comb[f'g2_{which_measure}'], color='orange', alpha=0.7, ax=axes[0,1])
        axes[0,1].set_ylabel('Tbx16')
        axes[0,1].set_ylim(0,1.5)

        yhat_3 = savgol_filter(df_comb[f'g3_{which_measure}'][np.argsort(df_comb[f'X'])], filter_size, 3) 
        yhat_3_max = np.amax(yhat_3)
        yhat_3_norm = yhat_3/yhat_3_max
        df_comb.loc[:, f'g3_{which_measure}'] = df_comb.loc[:, f'g3_{which_measure}']/yhat_3_max
        sns.scatterplot(x=df_comb[f'X'], y=df_comb[f'g3_{which_measure}'], color='blue', alpha=0.7, ax=axes[1,0])
        axes[1,0].set_ylabel('Tbx24')
        axes[1,0].set_ylim(0,1.5)

        sns.scatterplot(x=df_comb[f'X'], y=df_comb[f'g1_{which_measure}'], color='red', alpha=0.2, ax=axes[1,1])
        axes[1,1].plot(df_comb[f'X'][np.argsort(df_comb[f'X'])], yhat_1_norm, color='red')
        sns.scatterplot(x=df_comb[f'X'], y=df_comb[f'g2_{which_measure}'], color='orange', alpha=0.2, ax=axes[1,1])
        axes[1,1].plot(df_comb[f'X'][np.argsort(df_comb[f'X'])], yhat_2_norm, color='orange')
        sns.scatterplot(x=df_comb[f'X'], y=df_comb[f'g3_{which_measure}'], color='blue', alpha=0.2, ax=axes[1,1])
        axes[1,1].plot(df_comb[f'X'][np.argsort(df_comb[f'X'])], yhat_3_norm, color='blue')
        axes[1,1].set_ylim(0,1.5)
        f.suptitle(f'Image {i_image} mapped on timepoint {i_time}')
        
        # Save plot
        plt.savefig(f'Plots_produced/Tbox_profiles/Image_{i_image}_timepoint_{i_time}_mapped_expressions_Profiles.png')
        plt.close(f)  # Close the figure so that open figures do not accumulate over the loop
        
        
        # 3D plot

        fig = plt.figure(figsize=(16,16), dpi=160)

        # Plot tbxta
        ax = fig.add_subplot(3, 2, 1, projection='3d')
        sequence_containing_x_vals = df_comb.loc[:,'X']
        sequence_containing_y_vals = df_comb.loc[:,'Y']
        sequence_containing_z_vals = df_comb.loc[:,'Z']
        ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals, sequence_containing_z_vals, 
                c=df_comb.loc[:,f'g1_{which_measure}'], cmap='Reds', alpha=0.4, s=50)   
        ax.set_xlabel('X')
        ax.set_ylabel('Y')
        ax.set_zlabel('Z')
        plt.title(f'Tbxta')
        ax.view_init(90, 270)

        ax = fig.add_subplot(3, 2, 2, projection='3d')
        sequence_containing_x_vals = df_comb.loc[:,'X']
        sequence_containing_y_vals = df_comb.loc[:,'Y']
        sequence_containing_z_vals = df_comb.loc[:,'Z']
        ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals, sequence_containing_z_vals, 
                c=df_comb.loc[:,f'g1_{which_measure}'], cmap='Reds', alpha=0.4, s=50)
        ax.set_xlabel('X')
        ax.set_ylabel('Y')
        ax.set_zlabel('Z')
        plt.title(f'Tbxta')
        ax.view_init(0, 270)

        # Plot tbx16
        ax = fig.add_subplot(3, 2, 3, projection='3d')
        sequence_containing_x_vals = df_comb.loc[:,'X']
        sequence_containing_y_vals = df_comb.loc[:,'Y']
        sequence_containing_z_vals = df_comb.loc[:,'Z']
        ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals, sequence_containing_z_vals, 
                    c=df_comb.loc[:,f'g2_{which_measure}'], cmap='Oranges', alpha=0.4, s=50)
        ax.set_xlabel('X')
        ax.set_ylabel('Y')
        ax.set_zlabel('Z')
        plt.title(f'Tbx16')
        ax.view_init(90, 270)

        ax = fig.add_subplot(3, 2, 4, projection='3d')
        sequence_containing_x_vals = df_comb.loc[:,'X']
        sequence_containing_y_vals = df_comb.loc[:,'Y']
        sequence_containing_z_vals = df_comb.loc[:,'Z']
        ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals, sequence_containing_z_vals, 
                    c=df_comb.loc[:,f'g2_{which_measure}'], cmap='Oranges', alpha=0.4, s=50)
        ax.set_xlabel('X')
        ax.set_ylabel('Y')
        ax.set_zlabel('Z')
        plt.title(f'Tbx16')
        ax.view_init(0, 270)

        # Plot tbx24
        ax = fig.add_subplot(3, 2, 5, projection='3d')
        sequence_containing_x_vals = df_comb.loc[:,'X']
        sequence_containing_y_vals = df_comb.loc[:,'Y']
        sequence_containing_z_vals = df_comb.loc[:,'Z']
        ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals, sequence_containing_z_vals, 
                    c=df_comb.loc[:,f'g3_{which_measure}'], cmap='Blues', alpha=0.4, s=50)
        ax.set_xlabel('X')
        ax.set_ylabel('Y')
        ax.set_zlabel('Z')
        plt.title(f'Tbx24')
        ax.view_init(90, 270)

        ax = fig.add_subplot(3, 2, 6, projection='3d')
        sequence_containing_x_vals = df_comb.loc[:,'X']
        sequence_containing_y_vals = df_comb.loc[:,'Y']
        sequence_containing_z_vals = df_comb.loc[:,'Z']
        ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals, sequence_containing_z_vals, 
                    c=df_comb.loc[:,f'g3_{which_measure}'], cmap='Blues', alpha=0.4, s=50)
        ax.set_xlabel('X')
        ax.set_ylabel('Y')
        ax.set_zlabel('Z')
        plt.title(f'Tbx24')
        ax.view_init(0, 270)

        fig.suptitle(f'Image {i_image} mapped on timepoint {i_time}', fontsize=12)
        plt.savefig(f'Plots_produced/Tbox_3D/Image_{i_image}_timepoint_{i_time}_mapped_expressions_3D.png')
        plt.close(fig)  # Close the figure so that open figures do not accumulate over the loop
        df_comb.to_csv(f'Data_produced/Tbox/Image_{i_image}_timepoint_{i_time}_mapped_expressions_dataframe.csv', sep=';')
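
The two normalisation steps in the loop above (rescaling the PSM along X and scaling each gene by the maximum of its smoothed profile) can be written as small helpers. This is a sketch under assumptions: the function names `normalize_axes` and `scale_by_smoothed_max` are hypothetical, the toy data is random, and the window is shortened from 401 to 101 because the toy data frame only has 500 cells.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter

def normalize_axes(df):
    """Rescale X to [0, 1]; divide Y and Z by the same X range to keep proportions."""
    df = df.copy()
    x_min, x_max = df["X"].min(), df["X"].max()
    x_range = x_max - x_min
    df["X"] = (df["X"] - x_min) / x_range
    df["Y"] = df["Y"] / x_range
    df["Z"] = df["Z"] / x_range
    return df

def scale_by_smoothed_max(df, column, window=401, polyorder=3):
    """Divide `column` by the maximum of its Savitzky-Golay-smoothed profile along X."""
    df = df.copy()
    order = np.argsort(df["X"].to_numpy())
    smoothed = savgol_filter(df[column].to_numpy()[order], window, polyorder)
    smoothed_max = smoothed.max()
    df[column] = df[column] / smoothed_max
    # Also return the normalised smoothed profile (ordered by increasing X)
    return df, smoothed / smoothed_max

# Toy data: 500 cells with a noisy expression peak along X
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "X": rng.uniform(50, 250, 500),
    "Y": rng.uniform(0, 40, 500),
    "Z": rng.uniform(0, 20, 500),
})
toy["g1_map"] = np.exp(-((toy["X"] - 100) / 30) ** 2) + rng.normal(0, 0.05, 500)
toy = normalize_axes(toy)
toy, profile = scale_by_smoothed_max(toy, "g1_map", window=101)
print(toy["X"].min(), toy["X"].max(), profile.max())
```

Scaling by the maximum of the smoothed profile instead of the raw maximum makes the normalisation less sensitive to single very bright cells, which is why individual values can exceed 1 and the plots use a y-limit of 1.5.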


The same procedure is applied to the signals (Wnt and FGF).

In [None]:
# Create directories for plots and produced data for signals.
# os.makedirs also creates missing parent directories, and exist_ok=True
# avoids a misleading failure message when a directory already exists.
for path in ['Plots_produced/Signals_profiles',
             'Plots_produced/Signals_3D',
             'Data_produced/Signals']:
    os.makedirs(path, exist_ok=True)
    print("Directory %s is ready" % path)

# Loop to create plots and data

for i_time in range(1,62):
    for i_image in ['FGF1', 'FGF2', 'FGF3', 'Wnt1', 'Wnt2', 'Wnt3']:
        path_source = f'Source_images/Signals/FGF_Wnt_prepared/{i_image}.csv'
        df_comb = map_gene_expression_from_source_to_target(path_source=path_source,
                                                            path_target='Tracking_data/Tracks_M4_Kay.csv', 
                                                            target_track=True,
                                                            tracks_timepoint=i_time,
                                                            type_icp='icp',
                                                            n_nearest_neighbors=5, 
                                                            type_statistic='median',
                                                            source_avg_image=False,
                                                            gene_type='Wnt').reset_index() # gene type 'Wnt' can also be used for FGF
        # Normalize PSM
        x_min = np.min(df_comb.loc[:,f'X'])
        x_max = np.max(df_comb.loc[:,f'X'])
        df_comb.loc[:,f'X'] = (df_comb.loc[:,f'X']-x_min)/(x_max-x_min)
        df_comb.loc[:,f'Y'] = df_comb.loc[:,f'Y']/(x_max-x_min)
        df_comb.loc[:,f'Z'] = df_comb.loc[:,f'Z']/(x_max-x_min)

        # Profile plot
        fig = plt.figure()
        filter_size = 401
        yhat_1 = savgol_filter(df_comb[f'Intensity'][np.argsort(df_comb[f'X'])], filter_size, 3) 
        yhat_1_max = np.amax(yhat_1)
        yhat_1_norm = yhat_1/yhat_1_max
        df_comb.loc[:, f'Intensity'] = df_comb.loc[:, f'Intensity']/yhat_1_max
        # Subtract the last value of the smoothed profile and clip negative values to zero (vectorized)
        df_comb['Intensity'] = (df_comb['Intensity'] - yhat_1_norm[-1]).clip(lower=0)
        yhat_1_norm = yhat_1_norm-yhat_1_norm[-1]
        sns.scatterplot(x=df_comb[f'X'], y=df_comb[f'Intensity'], color='black', alpha=0.7)
        plt.plot(df_comb[f'X'][np.argsort(df_comb[f'X'])], yhat_1_norm)
        plt.ylabel(f'Intensity')
        #axes[0,0].set_ylim(0,1.5)

        plt.title(f'{i_image} mapped on timepoint {i_time}')
        

        plt.savefig(f'Plots_produced/Signals_profiles/Signal_{i_image}_timepoint_{i_time}_mapped_expressions_Profiles.png')
        plt.close()
        
        # 3d plot
        fig = plt.figure(figsize=(12,16), dpi=160)

        # Plot signal  
        ax = fig.add_subplot(2, 1, 1, projection='3d')
        sequence_containing_x_vals = df_comb.loc[:,'X']
        sequence_containing_y_vals = df_comb.loc[:,'Y']
        sequence_containing_z_vals = df_comb.loc[:,'Z']
        ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals, sequence_containing_z_vals, 
                c=df_comb.loc[:,f'Intensity'], cmap='Reds', alpha=0.4, s=50)   
        ax.set_xlabel('X')
        ax.set_ylabel('Y')
        ax.set_zlabel('Z')
        plt.title(f'{i_image}')
        ax.view_init(90, 270)

        ax = fig.add_subplot(2, 1, 2, projection='3d')
        sequence_containing_x_vals = df_comb.loc[:,'X']
        sequence_containing_y_vals = df_comb.loc[:,'Y']
        sequence_containing_z_vals = df_comb.loc[:,'Z']
        ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals, sequence_containing_z_vals, 
                c=df_comb.loc[:,f'Intensity'], cmap='Reds', alpha=0.4, s=50)
        ax.set_xlabel('X')
        ax.set_ylabel('Y')
        ax.set_zlabel('Z')
        plt.title(f'{i_image}')
        ax.view_init(0, 270)


        fig.suptitle(f'{i_image} mapped on timepoint {i_time}', fontsize=12)
        plt.savefig(f'Plots_produced/Signals_3D/Signal_{i_image}_timepoint_{i_time}_mapped_expressions_3D.png')
        plt.close()
        df_comb.to_csv(f'Data_produced/Signals/Signal_{i_image}_timepoint_{i_time}_mapped_expressions_dataframe.csv', sep=';')
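
The signal loop differs from the tbox loop in one step: the last value of the smoothed profile (the value at the largest X position) is subtracted from every cell, and negative results are set to zero. A minimal sketch of that step, with a hypothetical helper name `subtract_baseline` and made-up toy numbers:

```python
import numpy as np
import pandas as pd

def subtract_baseline(intensity, smoothed_profile):
    """Subtract the last value of the smoothed profile and clip negatives to zero."""
    baseline = smoothed_profile[-1]
    corrected = (intensity - baseline).clip(lower=0)
    return corrected, smoothed_profile - baseline

intensity = pd.Series([1.0, 0.5, 0.25, 0.125])
smoothed = np.array([1.0, 0.75, 0.5, 0.25])  # toy profile, ordered by increasing X
corrected, shifted = subtract_baseline(intensity, smoothed)
print(corrected.tolist())
```

After the shift, the smoothed profile ends at zero, so cells at or below the baseline level carry no signal.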



We have now mapped the tbox expression from 13 images and the Wnt & FGF signals from 3 images each onto every time point in the tracking data. For example, the gene expression from source image 5 mapped to time point 18 is saved as one data frame. In the next step, we summarize these data frames into one data frame per time point by taking the median across images. I am excluding source images 7, 9 and 15, because the registration did not work well for multiple time points. This was determined by visual inspection. Visual inspection works best with the point clouds in 3D space, where the PSMs can be rotated. The code can be adjusted to show the point clouds together in an open3d window. I have chosen to include profile and '3D' plots in this notebook for visual inspection, because they are static images.

In [None]:
# Create directory for summarized (median) data frames for tbox genes
path = 'Data_produced/Tbox_median_every_timepoint'
os.makedirs(path, exist_ok=True)  # No failure message if the directory already exists

# Summarizing the tbox expressions (from 10 source images) into one by taking the median
for i_time in range(1, 62):
    # Load positional information from first mapping (image 3)
    df_im3 = pd.read_csv(f'Data_produced/Tbox/Image_3_timepoint_{i_time}_mapped_expressions_dataframe.csv', sep=';')
    df_skeleton = df_im3.drop(axis=1, labels=['g1_map', 'g2_map', 'g3_map'])
    print(df_skeleton.head())
    
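    # Copy the columns so that adding further columns below does not raise a SettingWithCopyWarning
    g1_summary = df_im3[['g1_map']].copy()
    g2_summary = df_im3[['g2_map']].copy()
    g3_summary = df_im3[['g3_map']].copy()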

    # Loop through the dataframes for all 13 (or fewer) images (expression from image 3 already included)
    chosen_images = [4, 5, 6, 8, 10, 11, 12, 13, 14]
    for i_image in chosen_images:
        df_full = pd.read_csv(f'Data_produced/Tbox/Image_{i_image}_timepoint_{i_time}_mapped_expressions_dataframe.csv', sep=';')[['g1_map', 'g2_map', 'g3_map']]
        df_full.columns = [f'g1_map_{i_image}', f'g2_map_{i_image}', f'g3_map_{i_image}']
        g1_summary[f'g1_map_{i_image}'] = df_full[f'g1_map_{i_image}']
        g2_summary[f'g2_map_{i_image}'] = df_full[f'g2_map_{i_image}']
        g3_summary[f'g3_map_{i_image}'] = df_full[f'g3_map_{i_image}']

    df_skeleton['g1_map'] = g1_summary.median(axis=1)
    df_skeleton['g2_map'] = g2_summary.median(axis=1)
    df_skeleton['g3_map'] = g3_summary.median(axis=1)

    df_skeleton.to_csv(f'Data_produced/Tbox_median_every_timepoint/HCRs_chosen_images_summarized_timepoint_{i_time}.csv')
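
The aggregation above can also be expressed compactly with `pd.concat`. The sketch below uses a hypothetical helper `median_across_images` and toy data. It assumes, as the code above does, that all data frames of one time point list the same cells in the same row order.

```python
import pandas as pd

def median_across_images(frames, columns=("g1_map", "g2_map", "g3_map")):
    """Row-wise median of each column over a list of equally ordered data frames."""
    result = {}
    for col in columns:
        # One column per source image, one row per cell
        stacked = pd.concat([df[col] for df in frames], axis=1)
        result[col] = stacked.median(axis=1)
    return pd.DataFrame(result)

# Toy data: three "images" mapped onto the same two cells; the third is an outlier
frames = [
    pd.DataFrame({"g1_map": [0.2, 0.8], "g2_map": [0.5, 0.5], "g3_map": [0.1, 0.9]}),
    pd.DataFrame({"g1_map": [0.3, 0.7], "g2_map": [0.4, 0.6], "g3_map": [0.2, 0.8]}),
    pd.DataFrame({"g1_map": [5.0, 5.0], "g2_map": [5.0, 5.0], "g3_map": [5.0, 5.0]}),
]
summary = median_across_images(frames)
print(summary)
```

The toy outlier illustrates one reason to prefer the median over the mean here: a single badly registered image shifts the summary far less.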


Same for the signals.

In [None]:
# Create directory for summarized (median) data frames for signals
path = 'Data_produced/Signals_median_every_timepoint'
os.makedirs(path, exist_ok=True)  # No failure message if the directory already exists

# Summarizing the signals (from 3 source images) into one by taking the median
for i_time in range(1, 62):
    # Load positional information from the first FGF mapping (FGF1)
    df_FGF1 = pd.read_csv(f'Data_produced/Signals/Signal_FGF1_timepoint_{i_time}_mapped_expressions_dataframe.csv', sep=';')
    df_skeleton = df_FGF1.drop(axis=1, labels=['Intensity'])
    FGF_summary = df_FGF1[['Intensity']].copy()
    df_Wnt1 = pd.read_csv(f'Data_produced/Signals/Signal_Wnt1_timepoint_{i_time}_mapped_expressions_dataframe.csv', sep=';')
    Wnt_summary = df_Wnt1[['Intensity']].copy()
    

    # Loop through the remaining signal sources (FGF1 and Wnt1 are already included)
    signal_sources = ['FGF2', 'FGF3', 'Wnt2', 'Wnt3']
    for i_image in range(2):
        df_full_FGF = pd.read_csv(f'Data_produced/Signals/Signal_{signal_sources[i_image]}_timepoint_{i_time}_mapped_expressions_dataframe.csv', sep=';')[['Intensity']]
        df_full_FGF.columns = [signal_sources[i_image]]
        FGF_summary[f'{signal_sources[i_image]}'] = df_full_FGF[f'{signal_sources[i_image]}']
        df_full_Wnt = pd.read_csv(f'Data_produced/Signals/Signal_{signal_sources[i_image+2]}_timepoint_{i_time}_mapped_expressions_dataframe.csv', sep=';')[['Intensity']]
        df_full_Wnt.columns = [signal_sources[i_image+2]]
        Wnt_summary[f'{signal_sources[i_image+2]}'] = df_full_Wnt[f'{signal_sources[i_image+2]}']

    df_skeleton['FGF_summary'] = FGF_summary.median(axis=1)
    df_skeleton['Wnt_summary'] = Wnt_summary.median(axis=1)

    df_skeleton.to_csv(f'Data_produced/Signals_median_every_timepoint/Signals_summarized_timepoint_{i_time}.csv')

### Combine averaged tbox expression and signals and create AGETs

Now we have one summarized data frame for tbox expression and one for Wnt & FGF expression for every time point (1-61). In the next step, we combine this data and create the AGETs. For modelling purposes it is helpful to have cell tracks that are as long as possible. In the following, I first generate a list of all cell tracks that start in the first frame and last until the final frame of the tracking data. Then I create a list that contains all cell tracks.
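
Selecting the tracks that span the whole recording comes down to a set intersection of the track IDs in the first and last frame. A minimal sketch with toy data (the helper name `tracks_spanning` is hypothetical):

```python
import pandas as pd

def tracks_spanning(df_first, df_last, id_column="TrackID"):
    """Return the sorted track IDs present in both the first and the last frame."""
    return sorted(set(df_first[id_column]).intersection(df_last[id_column]))

df_first = pd.DataFrame({"TrackID": [1, 2, 3, 4]})
df_last = pd.DataFrame({"TrackID": [2, 4, 5]})
print(tracks_spanning(df_first, df_last))  # tracks 2 and 4 are in both frames
```

Note that this criterion only checks the first and the last frame. A track that is missing in an intermediate frame would still be selected, which is why the code below skips time points where no entry is found.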


In [None]:
# Get track IDs from first and last time point
df_tp1 = pd.read_csv('Data_produced/Tbox_median_every_timepoint/HCRs_chosen_images_summarized_timepoint_1.csv', sep=",")
tp1_unique_trackids = df_tp1.TrackID.unique()
df_tp61 = pd.read_csv('Data_produced/Tbox_median_every_timepoint/HCRs_chosen_images_summarized_timepoint_61.csv', sep=",")
tp61_unique_trackids = df_tp61.TrackID.unique()

# Only take tracks that go from start to end
trackids_start_to_end = list(set(tp1_unique_trackids).intersection(tp61_unique_trackids))


list_track_df = []
i_run = 1
for i_trackid in trackids_start_to_end:
    # Collect one dict per time point; DataFrame.append was removed in pandas 2.0
    rows = []
    for i_time in range(1, 62):
        df_timepoint = pd.read_csv(f'Data_produced/Tbox_median_every_timepoint/HCRs_chosen_images_summarized_timepoint_{i_time}.csv', sep=",")
        df_timepoint_signals = pd.read_csv(f'Data_produced/Signals_median_every_timepoint/Signals_summarized_timepoint_{i_time}.csv', sep=',')
        df_extract = df_timepoint.loc[df_timepoint.TrackID==i_trackid, :].reset_index()
        df_extract_signals = df_timepoint_signals.loc[df_timepoint_signals.TrackID==i_trackid, :].reset_index()

        if not df_extract.empty:
            rows.append({'Time': i_time,
                         'TrackID': i_trackid,
                         'X': df_extract.at[0, 'Position X Reference Frame'],
                         'Y': df_extract.at[0, 'Position Y Reference Frame'],
                         'Z': df_extract.at[0, 'Position Z Reference Frame'],
                         'g1': df_extract.at[0, 'g1_map'],
                         'g2': df_extract.at[0, 'g2_map'],
                         'g3': df_extract.at[0, 'g3_map'],
                         'Wnt': df_extract_signals.at[0, 'Wnt_summary'],
                         'FGF': df_extract_signals.at[0, 'FGF_summary']
                         })
    df_tracks_expression = pd.DataFrame(rows, columns=['Time', 'TrackID', 'X', 'Y', 'Z', 'g1', 'g2', 'g3', 'Wnt', 'FGF'])
    list_track_df.append(df_tracks_expression)
    print(f'Cell track {i_run}/{len(trackids_start_to_end)}')
    i_run += 1
    
print(len(list_track_df))
with open("Dependencies_simulator/List_of_all_cell_tracks_starttoend.txt", "wb") as fp:   #Pickling
    pickle.dump(list_track_df, fp)
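
The loop above reads every per-time-point CSV once per track, which is slow for many tracks. One possible alternative is to load each time point once and split the combined table by track ID afterwards. The following is a simplified sketch, not a drop-in replacement: the helper `build_tracks` is hypothetical, it works on toy data frames held in a dictionary, it keeps all rows per track and time point instead of only the first, and it leaves out the column renaming and the merging of the signals.

```python
import pandas as pd

def build_tracks(frames_by_time, track_ids):
    """Build one data frame per track from per-time-point data frames.

    frames_by_time: dict mapping time point -> data frame with a 'TrackID' column.
    """
    keep = set(track_ids)
    pieces = []
    for i_time, df in sorted(frames_by_time.items()):
        part = df[df["TrackID"].isin(keep)].copy()
        part.insert(0, "Time", i_time)  # Remember which frame each row came from
        pieces.append(part)
    combined = pd.concat(pieces, ignore_index=True)
    # One data frame per track, rows ordered by time
    return [
        track.reset_index(drop=True)
        for _, track in combined.groupby("TrackID", sort=True)
    ]

frames_by_time = {
    1: pd.DataFrame({"TrackID": [10, 11], "X": [0.1, 0.5], "g1": [0.9, 0.2]}),
    2: pd.DataFrame({"TrackID": [10, 11, 12], "X": [0.2, 0.6, 0.3], "g1": [0.8, 0.3, 0.5]}),
}
tracks = build_tracks(frames_by_time, track_ids=[10, 11])
print(len(tracks), tracks[0]["Time"].tolist())
```

With this structure, each file would be read 61 times in total instead of 61 times per track.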


The same is now done for all cell tracks (1903), no matter when they start or end. The track IDs are loaded from a pickled file (despite its .txt extension); they were extracted with a different program.

In [None]:
with open("Dependencies_simulator/List_of_all_unique_trackids.txt", "rb") as fp:   # Unpickling
    list_of_celltrack_ids = pickle.load(fp)
print(len(list_of_celltrack_ids)) 

list_track_df = []
i_run = 1
for i_trackid in list_of_celltrack_ids:
    # Collect one dict per time point; DataFrame.append was removed in pandas 2.0
    rows = []
    for i_time in range(1, 62):
        df_timepoint = pd.read_csv(f'Data_produced/Tbox_median_every_timepoint/HCRs_chosen_images_summarized_timepoint_{i_time}.csv', sep=",")
        df_timepoint_signals = pd.read_csv(f'Data_produced/Signals_median_every_timepoint/Signals_summarized_timepoint_{i_time}.csv', sep=',')
        df_extract = df_timepoint.loc[df_timepoint.TrackID==i_trackid, :].reset_index()
        df_extract_signals = df_timepoint_signals.loc[df_timepoint_signals.TrackID==i_trackid, :].reset_index()

        if not df_extract.empty:
            rows.append({'Time': i_time,
                         'TrackID': i_trackid,
                         'X': df_extract.at[0, 'Position X Reference Frame'],
                         'Y': df_extract.at[0, 'Position Y Reference Frame'],
                         'Z': df_extract.at[0, 'Position Z Reference Frame'],
                         'g1': df_extract.at[0, 'g1_map'],
                         'g2': df_extract.at[0, 'g2_map'],
                         'g3': df_extract.at[0, 'g3_map'],
                         'Wnt': df_extract_signals.at[0, 'Wnt_summary'],
                         'FGF': df_extract_signals.at[0, 'FGF_summary']
                         })
    df_tracks_expression = pd.DataFrame(rows, columns=['Time', 'TrackID', 'X', 'Y', 'Z', 'g1', 'g2', 'g3', 'Wnt', 'FGF'])
    list_track_df.append(df_tracks_expression)
    print(f'Cell track {i_run}/{len(list_of_celltrack_ids)}')
    i_run += 1
    
print(len(list_track_df))
with open("Dependencies_simulator/List_of_all_cell_tracks.txt", "wb") as fp:   #Pickling
    pickle.dump(list_track_df, fp)


We have now created two files; both are pickled lists containing AGETs. File 1 (Dependencies_simulator/List_of_all_cell_tracks_starttoend.txt) contains all 829 AGETs that go from the first frame of the tracking data until the last frame of the tracking data. File 2 (Dependencies_simulator/List_of_all_cell_tracks.txt) contains all 1903 AGETs, including ones that are only present in parts of the tracking data.

In [None]:
with open("Dependencies_simulator/List_of_all_cell_tracks.txt", "rb") as fp:   # Unpickling
    list_of_cell_tracks = pickle.load(fp)[0:1]
print(list_of_cell_tracks[0].head())
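
Since each AGET is a data frame with one row per time point, the list can be inspected with ordinary pandas operations. As a small sketch on toy data (the helper `track_lengths` is hypothetical, and the toy tracks stand in for the unpickled list), the number of time points per AGET can be collected like this:

```python
import pandas as pd

def track_lengths(list_of_tracks):
    """Number of recorded time points for every AGET in the list."""
    return pd.Series([len(df) for df in list_of_tracks], name="n_timepoints")

toy_tracks = [
    pd.DataFrame({"Time": [1, 2, 3], "g1": [0.1, 0.2, 0.3]}),
    pd.DataFrame({"Time": [5, 6], "g1": [0.4, 0.5]}),
]
lengths = track_lengths(toy_tracks)
full_length = lengths[lengths == 3]  # AGETs that cover all three toy time points
print(lengths.tolist(), len(full_length))
```

Applied to the full list, such a summary could be used to select AGETs above a minimum length before modelling.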