# STARDUST interactive

This Jupyter notebook provides an interactive version of the STARDUST signal extraction pipeline. The functions it uses are defined in util.py and imported below; call help() on any of them to see its documentation. For a more detailed description of the STARDUST pipeline, see the bioRxiv paper and the GitHub page.

## 1. Environment setup

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
from src.STARDUST.util import *

## 2. Data input

Run the next code block to specify the input files and the experiment information. Enter the requested values in the text boxes that appear.

In [None]:
time_series_path, ROA_mask_path, cell_mask_path, output_path = prompt_input()
drug_frame, frame_rate, spatial_resolution = get_metadata()

Next, read in the ROA mask and the cell mask.

In [None]:
ROA_map_array, ROA_map_labeled, ROA_map_count = read_tif(ROA_mask_path, "ROA")
cell_map_array, cell_map_labeled, cell_count = read_tif(cell_mask_path, "cell")

Optional: visualize ROA and cell masks. 

In [None]:
visualize_map(ROA_map_array = ROA_map_array, cell_map_array = cell_map_array)

## 3. Signal preprocessing

In [None]:
# find raw traces and create filtered traces
raw_traces, filtered_traces = raw_to_filtered(time_series_path, order = 4, cutoff = 0.4)

In [None]:
ROA_count, frame_count = check_traces(filtered_traces)

### Optional: Signal correction using linear regression

This optional step uses the correct_shift() function to detect and correct gradual linear drift in the traces. Ideally, the histogram of fitted slopes should be roughly centered on zero. A distribution that is not centered on zero may indicate photobleaching or significant z-drift during the recording.

In [None]:
# optional: correct traces for shift using linear regression
corrected_traces, reg = correct_shift(filtered_traces, correction_factor = 0.5)
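
As an illustration of the idea behind this step, the sketch below fits a straight line to each trace with NumPy and subtracts the fitted slope. It is not the implementation of correct_shift() in util.py: the function name `remove_linear_drift` and the synthetic traces are assumptions made for the example, and it removes the full fitted slope rather than applying a correction factor.

```python
import numpy as np

def remove_linear_drift(traces):
    """Hypothetical helper: fit a line to each trace (one trace per row) and remove its slope."""
    traces = np.asarray(traces, dtype=float)
    frames = np.arange(traces.shape[1])
    # np.polyfit fits all traces at once when y has shape (n_frames, n_traces)
    slopes, _intercepts = np.polyfit(frames, traces.T, deg=1)
    # Subtract only the slope term so each trace keeps its starting level
    corrected = traces - slopes[:, None] * frames[None, :]
    return corrected, slopes

# Synthetic data: one flat trace, one decaying (bleaching-like) trace, one rising trace
rng = np.random.default_rng(0)
n_frames = 200
frames = np.arange(n_frames)
true_slopes = np.array([0.0, -0.05, 0.03])
traces = 10 + true_slopes[:, None] * frames + rng.normal(0, 0.1, size=(3, n_frames))

corrected, slopes = remove_linear_drift(traces)
_, residual_slopes = remove_linear_drift(corrected)

print("fitted slopes:   ", np.round(slopes, 3))
print("after correction:", np.round(residual_slopes, 3))
```

On drifting data the fitted slopes are shifted away from zero. Refitting the corrected traces gives slopes of essentially zero, which is the pattern the slope histogram is meant to reveal.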

## 4. Baseline determination and signal detection

In [None]:
# baseline determination
dff_traces, baselines, thresholds, signal_frames, signal_boundaries, signal_threshold = iterative_baseline(
    corrected_traces,
    baseline_start = 0,
    baseline_end = -1,
    include_incomplete = False)
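
The sketch below shows one common way an iterative baseline can be estimated for a single trace: compute the mean and SD of the frames not yet flagged as signal, flag frames above mean + n_sd * SD, and repeat until the flagged set stops changing. This is an assumption about the general technique, not a description of iterative_baseline() in util.py. The function name `iterative_baseline_sketch`, the `n_sd` and `max_iter` parameters, and the synthetic trace are all hypothetical.

```python
import numpy as np

def iterative_baseline_sketch(trace, n_sd=3.0, max_iter=20):
    """Hypothetical illustration of iterative baseline estimation for one trace."""
    trace = np.asarray(trace, dtype=float)
    is_signal = np.zeros(trace.size, dtype=bool)
    for _ in range(max_iter):
        quiet = trace[~is_signal]              # frames currently treated as baseline
        baseline, sd = quiet.mean(), quiet.std()
        new_signal = trace > baseline + n_sd * sd
        if np.array_equal(new_signal, is_signal):
            break                              # flagged frames are stable, stop iterating
        is_signal = new_signal
    dff = (trace - baseline) / baseline        # standard dF/F relative to the baseline
    return dff, baseline, sd, is_signal

# Synthetic trace: noisy baseline around 100 with one transient in frames 50-59
rng = np.random.default_rng(1)
trace = 100 + rng.normal(0, 1, size=200)
trace[50:60] += 20

dff, baseline, sd, is_signal = iterative_baseline_sketch(trace)
print("baseline:", round(baseline, 2), "SD:", round(sd, 2))
print("frames flagged as signal:", np.flatnonzero(is_signal))
```

Excluding flagged frames before recomputing the statistics keeps large transients from inflating the baseline and SD, which is the usual motivation for iterating.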

## Checkpoint 1: dF/F traces heatmap

Visualize the dF/F traces as a heatmap. Each row represents one ROA and each column represents one frame.

Note that for heatmap coloring, *vmin* is set to 0 and *vmax* is set to (signal_threshold + 2) * the average threshold across all ROAs. For example, if signal_threshold is set to 3 SD, any transient with a dF/F value above baseline + 5 SD is colored red to facilitate visualization. You can adjust the vmax parameter if needed.

In [None]:
sns.heatmap(dff_traces, vmin = 0, vmax = (signal_threshold + 2) * thresholds.mean(),
            xticklabels = 100, yticklabels = False, cmap = 'jet', square = True);
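
To make the vmax arithmetic concrete, here is a small worked example. The threshold values and the dF/F array are made up for illustration; only the vmax formula comes from the cell above.

```python
import numpy as np

signal_threshold = 3                          # example value, as in the note above
thresholds = np.array([0.10, 0.12, 0.08])     # made-up per-ROA values

# Same formula as the heatmap call: (signal_threshold + 2) * mean threshold
vmax = (signal_threshold + 2) * thresholds.mean()
print("vmax:", round(vmax, 3))

# Made-up dF/F values: 2 ROAs x 3 frames
dff = np.array([[0.0, 0.2, 0.7],
                [-0.1, 0.6, 0.3]])

saturated = dff >= vmax      # drawn in the top color of the colormap
at_floor = dff <= 0          # drawn in the bottom color (vmin = 0)
print(saturated)
print(at_floor)
```

Values at or above vmax all share the top color, so lowering vmax makes small transients stand out while raising it preserves contrast among large ones.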

## 5. Signal feature extraction

In [None]:
signal_features = analyze_signal(dff_traces, signal_frames, signal_boundaries, frame_rate, drug_frame)

In [None]:
signal_features.head()

## 6. ROA-based analysis

In [None]:
# add corresponding cell ID to the signal stats
df_ROA_cell = align_ROA_cell(ROA_map_labeled, cell_map_labeled, ROA_map_count, spatial_resolution)

In [None]:
signal_features = pd.merge(df_ROA_cell, signal_features, on = 'ROA_ID', how = 'right')

In [None]:
signal_features.head(10)
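
The merge above uses how = 'right', which keeps every row of signal_features (one row per detected signal) and attaches the matching ROA information from df_ROA_cell. The toy example below shows the effect. The column names `cell_ID` and `amplitude` and all values are assumptions for illustration; only `ROA_ID` appears in this notebook.

```python
import pandas as pd

# Toy ROA-to-cell table: ROA 3 has no cell assignment (cell 0)
roa_cell_demo = pd.DataFrame({"ROA_ID": [1, 2, 3], "cell_ID": [1, 1, 0]})

# Toy signal table: ROA 1 has two signals, ROA 2 has none, ROA 3 has one
signals_demo = pd.DataFrame({"ROA_ID": [1, 1, 3], "amplitude": [0.8, 1.1, 0.5]})

# A right merge keeps all rows of the right-hand table (the signals)
merged_demo = pd.merge(roa_cell_demo, signals_demo, on="ROA_ID", how="right")
print(merged_demo)
```

ROAs without any detected signal (ROA 2 here) do not appear in the merged table, while ROAs with several signals appear once per signal.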

### ROA-based analysis

Note that ROAs without a cell assignment are listed with a cell ID of 0.

In [None]:
ROA_based, df_ROA_cell = ROA_analysis(signal_features, df_ROA_cell, frame_count, frame_rate, drug_frame)

In [None]:
ROA_based.head()

### Cell-based averaging of ROA analysis

Note that ROAs without a cell assignment are listed with a cell ID of 0. This cell should be omitted in later analysis.

In [None]:
cell_based = cell_analysis(signal_features, df_ROA_cell)

In [None]:
cell_based.head()
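
One way to omit the unassigned group in later analysis is a boolean filter on the cell ID column. The column names `cell_ID` and `event_count` and the values below are assumptions for illustration; check cell_based.columns for the actual names.

```python
import pandas as pd

# Toy cell-level table: cell 0 collects the ROAs without a cell assignment
cell_demo = pd.DataFrame({"cell_ID": [0, 1, 2], "event_count": [4, 12, 7]})

# Keep only rows that belong to a real cell
assigned_only = cell_demo[cell_demo["cell_ID"] != 0].reset_index(drop=True)
print(assigned_only)
```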

## Checkpoint 2: ROA type summary

In [None]:
ROA_summary = ROA_type_summary(df_ROA_cell)
ROA_summary

## Checkpoint 3: Individual traces
Use the inspect_trace() function to visualize traces. 

### Visualize inactive ROAs
The following example checks all ROAs that are "inactive" according to our pipeline but were initially identified as active ROAs by AQuA.

In [None]:
inactive_ROAs = df_ROA_cell[df_ROA_cell['ROA_type'] == 'inactive']['ROA_ID'].to_list()
inspect_trace(inactive_ROAs, dff_traces, baselines, thresholds, drug_frame)

### Visualize ROAs with large slope during optional correction

In [None]:
check_ROAs = pull_largeslope_traces(ROA_count, reg)
inspect_trace(check_ROAs, dff_traces, baselines, thresholds, drug_frame)

### Visualize a few randomly selected ROAs

In [None]:
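# draw up to 10 distinct ROAs; replace = False avoids plotting the same ROA twice
random_ROAs = np.random.choice(ROA_count, size = min(10, ROA_count), replace = False)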
random_ROAs.sort()
inspect_trace(random_ROAs, dff_traces, baselines, thresholds, drug_frame)

## 7. Data output

In [None]:
metadata = metadata_todf()
output_data(save_as = 'csv')