# STARDUST (cell-based version)

This Jupyter Notebook provides an interactive version of STARDUST for analyzing cell-based fluorescence data (average fluorescence per cell). The required inputs for this analysis are one mask/map and the time series generated from that map. The functions used in this notebook are defined in and imported from util.py. Use help() to see their documentation.

For a more detailed description of the STARDUST pipeline, please see the bioRxiv paper and the GitHub page.

## 1. Environment setup

In [None]:
import pandas as pd, numpy as np, seaborn as sns
from src.STARDUST.util import * 

## 2. Data input

Run the next code block to read in the input files and the experiment information. Enter the requested information in the text boxes that appear.

In [None]:
time_series_path, _, cell_mask_path, output_path = prompt_input(analysis_type = 'cell-based')
drug_frame, frame_rate, spatial_resolution = get_metadata()

In [None]:
cell_map_array, cell_map_labeled, cell_count = read_tif(cell_mask_path, "cell")

## 3. Signal preprocessing

In [None]:
# find raw traces and create filtered traces
raw_traces, filtered_traces = raw_to_filtered(time_series_path)

Check the number of cells and the number of frames in the input traces. Either raw_traces or filtered_traces works for this step, and both should give the same result.

In [None]:
ROA_count, frame_count = check_traces(filtered_traces)

### Optional: Signal correction using linear regression

This optional step uses the correct_shift() function to detect and correct gradual linear drift in the traces. Ideally, the slope distribution histogram should roughly center around zero. If the distribution is not centered around zero, it might indicate photobleaching or a significant z drift during the recording. 

In [None]:
# optional: correct traces for shift using linear regression
corrected_traces, reg = correct_shift(filtered_traces, correction_factor = 0.5)
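
The idea behind the slope check can be illustrated with a minimal sketch on synthetic data. This is not the implementation of correct_shift(), whose internals and correction_factor behavior are not shown here; the names demo_traces, true_slopes and detrended are hypothetical, and the layout (one row per cell, one column per frame) is an assumption.

```python
import numpy as np

# Synthetic stand-in for filtered traces (assumed layout: rows = cells, columns = frames).
rng = np.random.default_rng(0)
n_cells, n_frames = 20, 500
frames = np.arange(n_frames)

# Simulate a slow downward drift (e.g. photobleaching) plus noise.
true_slopes = rng.normal(loc=-0.02, scale=0.005, size=n_cells)
demo_traces = 100 + true_slopes[:, None] * frames + rng.normal(0, 1.0, (n_cells, n_frames))

# Fit a straight line to every trace at once.
# np.polyfit accepts a 2-D y with one dataset per column, hence the transpose.
slopes, intercepts = np.polyfit(frames, demo_traces.T, deg=1)

# A slope distribution centered well away from zero suggests systematic drift.
print(f"median slope: {np.median(slopes):.4f} per frame")

# Remove the fitted linear trend while keeping each trace's starting level.
detrended = demo_traces - slopes[:, None] * frames
```

Plotting `slopes` as a histogram (for example with `sns.histplot(slopes)`) gives a picture comparable to the slope distribution described above.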

## 4. Baseline determination and signal detection

In [None]:
# baseline determination
dff_traces, baselines, thresholds, signal_frames, signal_boundaries, signal_threshold = iterative_baseline(filtered_traces, include_incomplete = True, baseline_end = 200)

## Checkpoint 1: dF/F traces heatmap

Visualize dF/F traces using a heatmap. Each row represents one cell and each column represents one frame.

Note that for heatmap coloring, *vmin* is set to 0 and *vmax* is set to (signal_threshold + 2) * the average of the thresholds across all cells. For example, if signal_threshold is set to 3 SD, any transient with a dF/F value above baseline + 5 SD will be colored red to facilitate visualization. You can adjust the vmax parameter if needed.

In [None]:
sns.heatmap(dff_traces, vmin = 0, vmax = (signal_threshold + 2) * thresholds.mean(), 
            xticklabels=100, yticklabels= False, cmap = 'jet');
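
As a worked example of the vmax arithmetic, the sketch below uses made-up numbers. demo_thresholds is a hypothetical stand-in for thresholds, assumed here to hold one value per cell in dF/F units; the "+ 4" variant is only an illustration of how the color scale could be widened.

```python
import numpy as np

signal_threshold = 3  # detection threshold in SD units (example value from the text)

# Hypothetical per-cell values standing in for `thresholds`.
demo_thresholds = np.array([0.08, 0.10, 0.12, 0.10])

# Default color ceiling used in the heatmap cell: (signal_threshold + 2) * mean threshold.
default_vmax = (signal_threshold + 2) * demo_thresholds.mean()

# A higher ceiling keeps large transients from saturating the colormap.
custom_vmax = (signal_threshold + 4) * demo_thresholds.mean()

print(f"default vmax: {default_vmax:.2f}, custom vmax: {custom_vmax:.2f}")
```

Passing a value such as `custom_vmax` as `vmax` to `sns.heatmap` changes only the color scaling, not the underlying dF/F data.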

## 5. Signal feature extraction

In [None]:
signal_features = analyze_signal(dff_traces, signal_frames, signal_boundaries, frame_rate, drug_frame)

In [None]:
signal_features.head()

## Checkpoint 2: Individual traces
Use the inspect_trace() function to visualize all traces. 

In [None]:
all_cells = range(1, ROA_count + 1)
inspect_trace(all_cells, dff_traces, baselines, thresholds, drug_frame)

### Visualize a few randomly selected ROAs

In [None]:
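# cell IDs are 1-based (see all_cells above); sample without replacement to avoid duplicates
random_ROAs = np.random.choice(np.arange(1, ROA_count + 1), size = min(10, ROA_count), replace = False)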
random_ROAs.sort()
inspect_trace(random_ROAs, dff_traces, baselines, thresholds, drug_frame)

## 6. Data output

In [None]:
metadata = pd.DataFrame({'frame_rate': [frame_rate], 'spatial_resolution': [spatial_resolution],
                        'drug_frame': [drug_frame], 'drug_time': [drug_frame/frame_rate], 
                        'signal_threshold': [signal_threshold]})
output_data(output_path, metadata, dff_traces, signal_features, save_as = 'csv')