# MitoTNT: Mitochondrial Temporal Network Tracking

**MitoTNT** is a Python-based pipeline for the tracking, visualization, and dynamic analysis of 4D mitochondrial network data.

It builds on the mitochondrial segmentation provided by MitoGraph and the visualization engine provided by ChimeraX.  

MitoTNT was written by Zichen (Zachary) Wang (ziw056@ucsd.edu), with help from members of the [Johannes Schöneberg lab](https://www.schoeneberglab.org/) at UCSD.


# Installation

Software requirements:
- **[Jupyter Notebook](https://jupyter.org/)**, also available through **[Anaconda](https://www.anaconda.com/products/distribution)**

- **[MitoGraph](https://github.com/vianamp/MitoGraph/)** for mitochondria segmentation

- **[ChimeraX](https://www.cgl.ucsf.edu/chimerax/)** for tracking visualization

The following Python packages are needed:

- **numpy**

- **scipy**

- **pandas**

- **igraph**

- **fastdist**

To install all packages, open a command line, go to the root directory of MitoTNT, and run:  

``
pip install -r python_dependencies.txt
``


In [None]:
import os
import numpy as np
import pandas as pd
import igraph as ig
from tqdm.notebook import trange
from mitotnt import generate_tracking_inputs, network_tracking, tracking_visualization, detect_fusion_fission, post_analysis

# automatically reload imported modules whenever their source code changes
%load_ext autoreload
%autoreload 2

# Network Tracking
## 1. Generate inputs for tracking
**In this section we will process the raw data into a format that is used for the subsequent tracking.**

First specify the directories we will use:
- `work_dir`: the directory where data will be processed and stored. For test data, you can use the directory of `test_data` on your machine.
- `data_dir`: the directory where the MitoGraph segmentation outputs are stored. For test data, this is `test_data/mitograph`.
- `input_dir`: the directory where the processed inputs used for tracking will be stored. For test data, you can use `test_data/tracking_input`. This folder is created if it does not exist yet.

After specifying the folders, we need to set a few parameters:
- `start_frame`, `end_frame`: the range of frames to process.
- `frame_interval`: the time between consecutive frames of the movie, in seconds.
- `node_gap_size`: the number of nodes to skip when creating full-resolution graphs from MitoGraph `.gnet` files. Default to 0 (use all nodes).

All processed inputs will be saved as a single compressed `.npz` file in `input_dir`.

In [None]:
# specify your desired directories
work_dir = 'D:/Python Scripts/Mito/MitoTNT/test_data/'
data_dir = work_dir+'mitograph/'
input_dir = work_dir+'tracking_input/'
if not os.path.isdir(input_dir):
    os.mkdir(input_dir)

# select frames to process
start_frame = 0
end_frame = 10
frame_interval = 3.253

print('Tracking frame {} to {}'.format(start_frame, end_frame))

In [None]:
# run the function
generate_tracking_inputs.generate(data_dir, input_dir,
                                  start_frame, end_frame,
                                  node_gap_size=0 # choose the number of nodes to skip when creating full-resolution graphs, default to 0 (use all nodes)
                                  )

## 2. Frame-to-frame tracking
**In this section we will perform node assignments for each pair of consecutive frames.**

In addition to the directories declared above, we will create `output_dir` to store the tracking outputs. For test data, you can use `test_data/tracking_output`.

Additional parameters needed for frame-to-frame tracking:
- `tracking_interval`: the frame interval between the two frames to be tracked. Default to 1 (every consecutive frame).
- `distance_cutoff_mode`: the cutoff used to exclude candidate nodes that are too far away.  
If 'neighbor', use the distance to the N-th closest neighbor, where N is given by `cutoff_num_neighbor`, default to 10.  
If 'speed', use the frame interval (s) x the maximum allowed speed given by `cutoff_speed`, default to 1 μm/s.
- `graph_matching_depth`: the maximum level used for graph comparison. Default to 2 (usually sufficient).
- `dist_exponent`, `top_exponent`: the final cost term is given by D<sup>dist_exponent</sup> x T<sup>top_exponent</sup>, where D, T are the distance and topology costs respectively. Default both to 1 (equal weighting).
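
As a rough illustration of the parameters above, the sketch below computes both kinds of distance cutoff and the combined cost D<sup>dist_exponent</sup> x T<sup>top_exponent</sup>. The function names (`neighbor_cutoff`, `speed_cutoff`, `combined_cost`) and the toy coordinates are hypothetical and are not part of the MitoTNT API. How MitoTNT obtains the distance and topology costs internally is not covered here.

```python
import math

def neighbor_cutoff(node, other_nodes, cutoff_num_neighbor=10):
    """Distance from `node` to its N-th closest neighbor (assumed reading of 'neighbor' mode)."""
    dists = sorted(math.dist(node, other) for other in other_nodes)
    # use the farthest node when fewer than N neighbors are available
    index = min(cutoff_num_neighbor, len(dists)) - 1
    return dists[index]

def speed_cutoff(frame_interval, cutoff_speed=1.0):
    """Largest displacement allowed in one frame: interval (s) x speed (um/s)."""
    return frame_interval * cutoff_speed

def combined_cost(distance_cost, topology_cost, dist_exponent=1, top_exponent=1):
    """Final cost term D^dist_exponent x T^top_exponent."""
    return distance_cost ** dist_exponent * topology_cost ** top_exponent

# toy example with three candidate nodes
node = (0.0, 0.0, 0.0)
others = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
print(neighbor_cutoff(node, others, cutoff_num_neighbor=2))       # 2.0
print(speed_cutoff(3.253, cutoff_speed=1.0))                      # 3.253
print(combined_cost(2.0, 3.0, dist_exponent=2, top_exponent=1))   # 12.0
```

Raising `dist_exponent` relative to `top_exponent` makes the distance cost dominate the product, and vice versa.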


In [None]:
# specify additional directories
output_dir = work_dir+'tracking_output/'
if not os.path.isdir(output_dir):
    os.mkdir(output_dir)

In [None]:
# run the function
network_tracking.frametoframe_tracking(input_dir, output_dir, start_frame, end_frame, frame_interval,
                                       distance_cutoff_mode='neighbor', cutoff_num_neighbor=10,
                                       graph_matching_depth=2, dist_exponent=1, top_exponent=1)

## 3. Gap closing
**In this section we attempt to merge tracks that are mistakenly terminated during frame-to-frame tracking.**

Additional parameters need to be set:
- `min_track_size`: the minimum number of frames for the tracks to be kept. Default to 5.
- `max_gap_size`: the maximum number of frames for which gap closing is allowed. Default to 3. Value of 1 indicates no gap closing.
- `memory_efficient_gap_closing`: if true, use a sliding-block implementation of gap closing to prevent memory overflow. Default to false.

The final node trajectories are saved in the `final_node_tracks.csv` file, whose format is described in the next section.

In [None]:
network_tracking.gap_closing(input_dir, output_dir, start_frame, end_frame,
                             min_track_size=4, max_gap_size=3, memory_efficient_gap_closing=True)

## 4. Evaluate output

The final node trajectories are saved in the `final_node_tracks.csv` file.  
Each row is one node at one time point.  
Each column is an attribute of the given node, described below.  

### Columns

- `frame_id`: frame number of the node.
- `frame_node_id`: node id at the given frame. Each frame has its own indexing.
- `unique_node_id`: node id shared by all the nodes in the same track at different frames. Each track is uniquely indexed throughout the whole trajectory. This is essentially the tracking information.
- `frame_seg_id`: segment id for all the nodes in the same segment. Each frame has its own indexing. The branching points are not assigned.
- `frame_frag_id`: fragment id for all the nodes in the same connected component. Each frame has its own indexing.
- `connected_unique_node_id`: `unique_node_id` of the neighboring nodes in the graph. This holds all the topology information.
- `x`, `y`, `z`: coordinates for the node.
- `intensity`, `width`: pixel intensity and tubular width for the node given by MitoGraph.
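
To work with these columns, one option is to group the rows by `unique_node_id` so that each track becomes a time-ordered list of positions. The sketch below does this with the standard library on a few made-up rows that only mimic the column layout described above. `load_tracks` is a hypothetical helper, not a MitoTNT function.

```python
import csv
import io
from collections import defaultdict

# made-up rows that only mimic the column layout described above
sample_csv = """frame_id,frame_node_id,unique_node_id,x,y,z
0,0,0,1.0,1.0,1.0
0,1,1,2.0,1.0,1.0
1,0,1,2.1,1.0,1.0
1,1,0,1.2,1.0,1.0
2,0,0,1.4,1.0,1.0
"""

def load_tracks(csv_file):
    """Return {unique_node_id: [(frame_id, (x, y, z)), ...]} sorted by frame."""
    tracks = defaultdict(list)
    for row in csv.DictReader(csv_file):
        frame = int(row["frame_id"])
        position = (float(row["x"]), float(row["y"]), float(row["z"]))
        tracks[int(row["unique_node_id"])].append((frame, position))
    for points in tracks.values():
        points.sort()  # order each track by frame number
    return dict(tracks)

tracks = load_tracks(io.StringIO(sample_csv))
for node_id, points in sorted(tracks.items()):
    print(node_id, [frame for frame, _ in points])
```

For real data, pass an open file handle for your own `final_node_tracks.csv` instead of the in-memory sample.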

# Visualization
**In this section we will visualize the tracked mitochondrial networks in ChimeraX**

**Please first download [ChimeraX](https://www.cgl.ucsf.edu/chimerax/)**

We need to specify the directories in which to save the visualization files:
- `vis_dir`: stores the `.cxc` commands to load in ChimeraX. You can use `work_dir/chimerax_visualization/` for example.
- `vis_data_dir`: stores the `.cmap` and `.bild` files created for each frame and used for visualization. You can use `vis_dir/data/` for example.

In [None]:
# specify the directory for storing processed data
vis_dir = work_dir+'chimerax_visualization/'
vis_data_dir = vis_dir+'data/'
if not os.path.isdir(vis_dir):
    os.mkdir(vis_dir)
if not os.path.isdir(vis_data_dir):
    os.mkdir(vis_data_dir)

# visualizing tracks can take some time, recommend to start with a few frames
start_frame = 0
end_frame = 5

## 1. Transform .tif to match MitoGraph coordinates (optional)
Because MitoGraph applies a coordinate transformation, the original `.tif` files need to be transformed to match.\
This is only needed if you want to show the fluorescence cloud when visualizing tracking.
- `voxel_size`: the voxel size, identical to the one used as input for MitoGraph segmentation, in the format of 'x_size y_size z_size'.  
For example, `voxel_size='0.2 0.2 0.4'` specifies a lateral pixel size of 0.2 μm and an axial pixel size of 0.4 μm.

In [None]:
tracking_visualization.generate_transformed_tif(data_dir, vis_dir, vis_data_dir,
                                                start_frame, end_frame, voxel_size=['0.145', '0.145', '0.145'])

## 2. Create ChimeraX rendering of the skeleton (optional)
We can use the MitoGraph-generated `*skeleton.vtk` files to visualize the skeleton, but this is not ideal because they have a fixed width and color.\
Alternatively, we can render the skeleton using the BILD format in ChimeraX. This allows us to set the skeleton size, node size and color. However, it also takes much longer to load in ChimeraX.
- `skeleton_colors`: a list of colors to render. Typically two colors are needed in order to differentiate current and next frames.\
We use blue for current frame and red for next frame.
- `skeleton_size`: diameter of the cylinders that connect the nodes. Default to 0.2.
- `node_size`: diameter of the spheres that make up the nodes. \
If `node_size` = `skeleton_size`, the nodes do not stand out, but they are still needed to fill the gaps between skeleton cylinders. Default to 0.2.
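
For orientation, here is a minimal sketch of what a BILD description of a skeleton can look like, using the `.color`, `.sphere` and `.cylinder` commands of the ChimeraX BILD format. `skeleton_to_bild` is a hypothetical helper and not the MitoTNT implementation. It assumes that the sizes above are diameters, so they are halved to obtain the radii that BILD expects.

```python
def skeleton_to_bild(nodes, edges, color="blue", skeleton_size=0.2, node_size=0.2):
    """Build BILD text for a skeleton given node coordinates and (i, j) index pairs."""
    lines = [f".color {color}"]
    # one sphere per node; BILD takes a radius, so halve the diameter
    for x, y, z in nodes:
        lines.append(f".sphere {x} {y} {z} {node_size / 2}")
    # one cylinder per edge connecting the two node positions
    for i, j in edges:
        x1, y1, z1 = nodes[i]
        x2, y2, z2 = nodes[j]
        lines.append(f".cylinder {x1} {y1} {z1} {x2} {y2} {z2} {skeleton_size / 2}")
    return "\n".join(lines) + "\n"

# toy skeleton: three nodes joined by two edges
nodes = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
edges = [(0, 1), (1, 2)]
print(skeleton_to_bild(nodes, edges))
```

Writing the returned text to a file with the `.bild` extension yields something ChimeraX can open directly. Because every node and edge becomes its own shape, large networks produce long files, which is consistent with the slower loading noted above.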

In [None]:
tracking_visualization.generate_chimerax_skeleton(input_dir, vis_dir, vis_data_dir,
                                                  start_frame, end_frame,
                                                  skeleton_colors=['blue','red'], # colors for current and next frames
                                                  skeleton_size=0.2, node_size=0.2)

## 3. Create ChimeraX rendering of tracking vectors
We will use the frame-to-frame node assignments to draw the tracking vectors for two frames.
- `arrow_color`: color of the tracking arrows. Default to black.
- `arrow_size`: diameter of the arrow head. Default to 0.3.
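
The tracking vectors can be expressed with the BILD `.arrow` command, which takes a start point, an end point, and optional shaft and head radii. The sketch below is a hypothetical helper (`tracking_arrows_to_bild`), not MitoTNT code. The shaft radius of one quarter of the head radius is an assumption made for illustration.

```python
def tracking_arrows_to_bild(starts, ends, arrow_color="black", arrow_size=0.3):
    """Build BILD text with one arrow per assigned node pair (current -> next frame)."""
    head_radius = arrow_size / 2      # arrow_size is the diameter of the arrow head
    shaft_radius = head_radius / 4    # assumption: thin shaft relative to the head
    lines = [f".color {arrow_color}"]
    for (x1, y1, z1), (x2, y2, z2) in zip(starts, ends):
        if (x1, y1, z1) == (x2, y2, z2):
            continue  # a node that did not move has no direction to draw
        lines.append(f".arrow {x1} {y1} {z1} {x2} {y2} {z2} {shaft_radius} {head_radius}")
    return "\n".join(lines) + "\n"

# toy assignments: the first node moves along z, the second stays in place
current_positions = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
next_positions = [(0.0, 0.0, 0.5), (1.0, 1.0, 1.0)]
print(tracking_arrows_to_bild(current_positions, next_positions))
```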

In [None]:
tracking_visualization.generate_tracking_arrows(input_dir, output_dir, vis_data_dir, start_frame, end_frame,
                                                arrow_color='black', arrow_size=0.3)

## 4. Visualize network tracking in ChimeraX
Now we can combine the visualization files created above to visualize the tracking of timeseries data.
- `show_tif`: if true, include the fluorescence cloud in the background.
- `use_chimerax_skeleton`: if true, use the BILD format skeleton, which is more flexible but slower to load. If false, use the MitoGraph-generated `.vtk` files of fixed color and size.  

**Open chimerax_visualization/visualize_tracking.cxc in ChimeraX. This may take some time. Click Home -> Background -> White to see it better.**

In [None]:
tracking_visualization.visualize_tracking(data_dir, input_dir, vis_dir, vis_data_dir,
                                          start_frame, end_frame,
                                          skeleton_colors = ['blue','red'], 
                                          show_tif=False, # whether to include fluorescence cloud
                                          use_chimerax_skeleton=False # whether to use chimerax skeleton which is more flexible but slower to load, if false mitograph .vtk files are used
                                          )

# Detect Remodeling Events
**In this section we will detect nodes that undergo fusion or fission events based on the tracking results.**

This is done using a sliding-window approach to identify nodes that undergo persistent structural changes as opposed to transient segmentation differences.  
First, the fragment indices for each node are recorded for the `half_win_size` frames before and after the current frame, to form the fragment list.  
Second, for each network edge, the fragment lists for the connected nodes are compared.  
Finally, a fission event is declared if the fragment lists before the current frame are strictly identical and the fragment lists after the current frame are strictly non-overlapping.  
Since fusion events can be considered fission events reversed in time, the opposite criterion is used for fusion detection.

Note that, because of the sliding-window approach,
`start_frame` must be >= `half_win_size` and `end_frame` must be <= total number of frames - `half_win_size`.
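
The criterion above can be sketched for a single network edge as follows. This is one possible reading, in which the fragment ids of the two connected nodes are compared frame by frame and both nodes are tracked in every frame of the window. `classify_edge` and `valid_frame_range` are hypothetical helpers, not MitoTNT functions.

```python
def classify_edge(frags_a_before, frags_b_before, frags_a_after, frags_b_after):
    """Classify one edge from the per-frame fragment ids of its two nodes.

    Each argument lists the fragment id of a node at each frame of a half window.
    """
    if not frags_a_before or not frags_a_after:
        return None  # nothing to compare
    same_before = all(a == b for a, b in zip(frags_a_before, frags_b_before))
    same_after = all(a == b for a, b in zip(frags_a_after, frags_b_after))
    apart_before = all(a != b for a, b in zip(frags_a_before, frags_b_before))
    apart_after = all(a != b for a, b in zip(frags_a_after, frags_b_after))
    if same_before and apart_after:
        return "fission"  # one fragment before, separate fragments after
    if apart_before and same_after:
        return "fusion"   # fission reversed in time
    return None           # transient or ambiguous change

def valid_frame_range(start_frame, end_frame, total_frames, half_win_size):
    """Check that a full window fits on both sides of every evaluated frame."""
    return start_frame >= half_win_size and end_frame <= total_frames - half_win_size

print(classify_edge([3, 3, 2, 2], [3, 3, 2, 2], [5, 4, 4, 6], [7, 9, 1, 2]))  # fission
print(classify_edge([5, 4, 4, 6], [7, 9, 1, 2], [3, 3, 2, 2], [3, 3, 2, 2]))  # fusion
print(classify_edge([3, 3, 2, 2], [3, 3, 2, 2], [5, 4, 4, 6], [5, 9, 1, 2]))  # None
```

Requiring the condition to hold at every frame of the window is what makes a larger `half_win_size` stricter: a single frame of disagreement is enough to reject the event.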

Please specify:

- `stride_size`: step size for sliding the window in number of frames. Default to 1 (to detect events happening in every frame).

- `half_win_size`: size of the half sliding window in number of frames. The higher the value the stricter the requirement for calling fusion/fission. Default to 4.

- `min_tracked_frames`: minimum number of frames that must be tracked in each half window in order to declare an event. Default to 2.

In [None]:
detect_fusion_fission.detect(input_dir, output_dir, start_frame=4, end_frame=89,
                             stride_size=1, # step size in frames for the sliding window
                             half_win_size=4, # size in frames for the half sliding window
                             min_tracked_frames=2 # minimum number of tracked frames in each half window
                             )

The remodeling events are saved under `tracking_output/remodeling_events.csv`.  
Multiple remodeling nodes located in proximity (less than 5 edges away) are grouped into a single fission/fusion site.  
Columns in the output:

- `type`: fusion or fission

- `frame_id`: a single frame number for describing when the event happens

- `frame_id_before`: the frame numbers before the event for each detected node (may be different due to gap closing)

- `frame_id_after`: the frame numbers after the event for each detected node (may be different due to gap closing)

- `node_id_before`: the `frame_node_id` at corresponding frame before the event for each detected node

- `node_id_after`: the `frame_node_id` at corresponding frame after the event for each detected node

- `frag_id_before`: the `frame_frag_id` at corresponding frame before the event for each detected node

- `frag_id_after`: the `frame_frag_id` at corresponding frame after the event for each detected node

- `unique_node_id`: the `unique_node_id` for each detected node

`frame_node_id`, `frame_frag_id`, `unique_node_id` are as defined in the node tracking outputs
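
The grouping of nearby remodeling nodes into sites mentioned above can be illustrated with a breadth-first search over the network. The sketch assumes that "less than 5 edges away" means a path of at most 4 edges and that grouping is transitive. Both are assumptions, and `edge_distances` and `group_event_nodes` are hypothetical helpers rather than MitoTNT functions.

```python
from collections import deque

def edge_distances(adjacency, source, max_edges):
    """Breadth-first search: {node: edges from source} for nodes within max_edges."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distances[node] == max_edges:
            continue  # do not expand beyond the allowed number of edges
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

def group_event_nodes(adjacency, event_nodes, max_edges=4):
    """Merge event nodes lying within max_edges of each other into sites."""
    event_set = set(event_nodes)
    assigned = set()
    sites = []
    for seed in event_nodes:
        if seed in assigned:
            continue
        site, stack = set(), [seed]
        while stack:
            node = stack.pop()
            if node in assigned:
                continue
            assigned.add(node)
            site.add(node)
            reachable = edge_distances(adjacency, node, max_edges)
            stack.extend(n for n in reachable if n in event_set and n not in assigned)
        sites.append(sorted(site))
    return sites

# toy network: a linear chain of 12 nodes with event nodes 0, 3 and 10
adjacency = {i: [j for j in (i - 1, i + 1) if 0 <= j < 12] for i in range(12)}
print(group_event_nodes(adjacency, [0, 3, 10]))  # [[0, 3], [10]]
```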


# Post Analysis
## Motility Measurements
**In this section we will use the tracking results to compute diffusivity at three levels of description and visualize motility in space.**

The diffusion coefficients can be computed for 1) nodes, 2) segments, 3) fragments, in order of increasing coarse graining.  

We need to specify the directories for saving the post analysis results:

- `analysis_dir`: umbrella directory for post analysis. A good choice is `work_dir+'post_analysis/'`. 

- `analy_motility_dir`: directory for saving motility measurements. This is usually a subfolder in `analysis_dir`, for example `analysis_dir+'motility/'`.  

In [None]:
# specify directories
analysis_dir = work_dir+'post_analysis/'
analy_motility_dir = analysis_dir+'motility/'
if not os.path.isdir(analysis_dir):
    os.mkdir(analysis_dir)
if not os.path.isdir(analy_motility_dir):
    os.mkdir(analy_motility_dir)

Information needed for computing MSD vs. time delay curve:

- `max_tau`: maximum number of frames/datapoints used for linear regression. Default to 5.

Additional information for computing segment and fragment level diffusivity:

- `selected_frames`: because segments and fragments undergo constant remodeling, we need to select the frames at which they are evaluated. These frames are recommended to be separated by 2x the half window size (see below). Each frame will be visualized separately.

- `half_win_size`: the time window size (frames) before and after the selected center frames for collecting track coordinates. Default to 10.

- `tracked_ratio`: the minimum ratio of tracked nodes in each segment/fragment to be qualified for calculating diffusivity. Default to 0.5. 

### node diffusivity

In [None]:
post_analysis.compute_node_diffusivity(output_dir, analysis_dir, analy_motility_dir, frame_interval, max_tau=5)

### segment diffusivity

In [None]:
post_analysis.compute_segment_diffusivity(input_dir, output_dir, analysis_dir, analy_motility_dir,
                                          frame_interval, max_tau=5, tracked_ratio=0.3, half_win_size=10, selected_frames=[10])

### fragment diffusivity

In [None]:
post_analysis.compute_fragment_diffusivity(input_dir, output_dir, analysis_dir, analy_motility_dir,
                                           frame_interval, max_tau=5, tracked_ratio=0.3, half_win_size=10, selected_frames=[10])

The data is saved in `analy_motility_dir/*diffusivity.csv` files. Each row is a node/segment/fragment. The columns are explained below.
### Columns

 - `center_frame_id`: selected frame for determining the segment/fragment diffusivity. N/A for node diffusivity.
 
 - `unique_node_id`: same as in `final_node_tracks.csv`. `seg_id` and `frag_id` are specific to `center_frame_id`.

 - `diffusivity`: slope of MSD vs. time delay curve divided by 6 to account for 3D random walk.

 - `msd`: MSD per frame, equal to 6 x diffusivity x frame interval.

 - `r_squared`: coefficient of determination for the linear regression.

 - `num_points`: number of points in MSD vs. time delay curve used for linear regression.
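
To make the relation between these columns concrete, the sketch below estimates a diffusivity from a single toy trajectory: it computes the MSD for time delays of 1 to `max_tau` frames, fits a line by ordinary least squares, and divides the slope by 6. The helper names and the random-walk data are hypothetical, and fitting with a free intercept is an assumption. The exact procedure in `post_analysis` may differ.

```python
import random

def mean_squared_displacement(track, tau):
    """Average squared displacement over all frame pairs separated by tau frames."""
    squared = [
        sum((b - a) ** 2 for a, b in zip(track[i], track[i + tau]))
        for i in range(len(track) - tau)
    ]
    return sum(squared) / len(squared)

def linear_fit(xs, ys):
    """Ordinary least squares; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r_squared

def estimate_diffusivity(track, frame_interval, max_tau=5):
    """Return (diffusivity, r_squared, num_points) for one 3D track of 3+ frames."""
    taus = range(1, min(max_tau, len(track) - 1) + 1)
    delays = [tau * frame_interval for tau in taus]
    msds = [mean_squared_displacement(track, tau) for tau in taus]
    slope, _, r_squared = linear_fit(delays, msds)
    return slope / 6, r_squared, len(delays)  # divide by 6 for a 3D random walk

# toy 3D random walk standing in for one node track
random.seed(0)
frame_interval = 3.253
track = [(0.0, 0.0, 0.0)]
for _ in range(50):
    track.append(tuple(c + random.gauss(0, 0.1) for c in track[-1]))

diffusivity, r_squared, num_points = estimate_diffusivity(track, frame_interval)
msd_per_frame = 6 * diffusivity * frame_interval  # the `msd` column as defined above
print(diffusivity, msd_per_frame, r_squared, num_points)
```

A low `r_squared` indicates that the MSD curve is not well described by a straight line over the chosen range of time delays, so the corresponding diffusivity should be interpreted with caution.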