# LBM Step 3: Segmentation

## Segmentation: Extract neuronal locations and planar time-traces.

- Apply the constrained nonnegative matrix factorization (CNMF) source separation algorithm to extract initial estimates of neuronal spatial footprints and calcium traces.
- Apply quality control metrics to evaluate the initial estimates, and narrow down to the final set of estimates.

# CaImAn docs on component evaluation

https://caiman.readthedocs.io/en/latest/Getting_Started.html#component-evaluation

> The quality of detected components is evaluated with three parameters:
>
> Spatial footprint consistency (rval): The spatial footprint of the component is compared with the frames where this component is active. Other component’s signals are subtracted from these frames, and the resulting raw data is correlated against the spatial component. This ensures that the raw data at the spatial footprint aligns with the extracted trace.
>
> Trace signal-noise-ratio (SNR): Peak SNR is calculated from strong calcium transients and the noise estimate.
>
> CNN-based classifier (cnn): The shape of components is evaluated by a 4-layered convolutional neural network trained on a manually annotated dataset. The CNN assigns a value of 0-1 to each component depending on its resemblance to a neuronal soma.
> 
> Each parameter has a low threshold:
> - (rval_lowest (default -1), SNR_lowest (default 0.5), cnn_lowest (default 0.1))
>
> and high threshold
> 
> - (rval_thr (default 0.8), min_SNR (default 2.5), min_cnn_thr (default 0.9))
> 
> A component has to exceed ALL low thresholds as well as ONE high threshold to be accepted.
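
The acceptance rule quoted above can be sketched in a few lines of plain Python. This is an illustration of the rule as stated, not CaImAn's own implementation: the function name `passes_evaluation` is hypothetical, the defaults are copied from the quoted thresholds, and the handling of values that sit exactly on a threshold (strict `>` here) is an assumption.

```python
def passes_evaluation(rval, snr, cnn=None, *,
                      rval_lowest=-1.0, snr_lowest=0.5, cnn_lowest=0.1,
                      rval_thr=0.8, min_snr=2.5, min_cnn_thr=0.9):
    """Return True if a component exceeds ALL low thresholds and at least ONE high threshold.

    Hypothetical helper for illustration only. Pass cnn=None when the CNN
    classifier was not run; the CNN score is then ignored in both checks.
    """
    # Every metric must clear its low threshold.
    low_ok = rval > rval_lowest and snr > snr_lowest
    # At least one metric must clear its high threshold.
    high_ok = rval > rval_thr or snr > min_snr
    if cnn is not None:
        low_ok = low_ok and cnn > cnn_lowest
        high_ok = high_ok or cnn > min_cnn_thr
    return bool(low_ok and high_ok)


# Good spatial consistency alone is enough if the SNR is not below its floor.
accepted = passes_evaluation(rval=0.9, snr=1.0)
# A high SNR cannot rescue a component whose CNN score is below the low threshold.
rejected = passes_evaluation(rval=0.9, snr=3.0, cnn=0.05)
```

With `'use_cnn': False`, as in the CNMF parameters used later in this notebook, only `rval` and SNR would take part in the decision.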

In [2]:
import sys
from pathlib import Path
import os
import numpy as np
import zarr

import logging
import mesmerize_core as mc
import matplotlib.pyplot as plt

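try:
    import cv2
    # let CaImAn manage parallelism instead of OpenCV's own thread pool
    cv2.setNumThreads(0)
except ImportError:
    # OpenCV is optional here; continue without it
    pass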

logging.basicConfig()

from mesmerize_core.caiman_extensions.cnmf import cnmf_cache

os.environ["CONDA_PREFIX_1"] = ""
if os.name == "nt":
    # disable the cache on windows, this will be automatic in a future version
    cnmf_cache.set_maxsize(0)

raw_data_path = Path.home() / "caiman_data"
movie_path = raw_data_path / 'animal_01' / "session_01" / 'save_gui.zarr'

batch_path = raw_data_path / 'batch.pickle'
mc.set_parent_raw_data_path(str(raw_data_path))

PosixPath('/home/mbo/caiman_data')

In [3]:
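# load the existing batch, or create a new one if none exists yet
try:
    df = mc.load_batch(batch_path)
except (IsADirectoryError, FileNotFoundError):
    df = mc.create_batch(batch_path)

# make sure the in-memory DataFrame matches what is on disk
df = df.caiman.reload_from_disk()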
df

| Unnamed: 0 | algo | item_name | input_movie_path | params | outputs | added_time | ran_time | algo_duration | comments | uuid |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | mcorr | save_gui | animal_01/session_01/save_gui.zarr | {'main': {'var_name_hdf5': 'plane_2', 'max_shi... | {'mean-projection-path': 9781250d-22cb-4ebc-9d... | 2024-09-06T00:19:51 | 2024-09-06T00:33:05 | 764.98 sec | | 9781250d-22cb-4ebc-9de1-ad97e101ddee |
| 1 | mcorr | save_gui | animal_01/session_01/save_gui.zarr | {'main': {'var_name_hdf5': 'plane_2', 'max_shi... | {'mean-projection-path': 5ffcad60-5506-4243-93... | 2024-09-06T00:19:51 | 2024-09-06T00:35:31 | 142.87 sec | | 5ffcad60-5506-4243-936c-f8473ff4ab50 |
| 2 | mcorr | save_gui | animal_01/session_01/save_gui.zarr | {'main': {'var_name_hdf5': 'plane_2', 'max_shi... | {'mean-projection-path': 0a463c4f-985d-47a4-9e... | 2024-09-06T00:19:51 | 2024-09-06T00:48:21 | 766.17 sec | | 0a463c4f-985d-47a4-9e95-cf776d98a1a3 |
| 3 | mcorr | save_gui | animal_01/session_01/save_gui.zarr | {'main': {'var_name_hdf5': 'plane_2', 'max_shi... | {'mean-projection-path': 8cb260f3-ced8-4433-b9... | 2024-09-06T00:19:51 | 2024-09-06T00:50:47 | 142.94 sec | | 8cb260f3-ced8-4433-b9bf-fd6116b20964 |


# Optional: clean up the DataFrame

Keep the batch item that works best and remove all other items.

Remove batch items (i.e. rows) using `df.caiman.remove_item(<item_uuid>)`. This also cleans up the output data in the batch directory.

In [None]:
# make a list of rows we want to keep using the uuids
rows_keep = [df.iloc[0].uuid]
rows_keep
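
Because removal also deletes output data from the batch directory, it can help to preview which items would go before running the destructive cell. The sketch below uses only plain Python; `plan_removal` is a hypothetical helper, not part of mesmerize-core, and the short UUID strings are placeholders.

```python
def plan_removal(all_uuids, keep_uuids):
    """Return the UUIDs that would be removed, preserving batch order.

    Hypothetical helper for illustration. Raises ValueError if a UUID marked
    for keeping is not in the batch, which usually signals a typo or a stale
    DataFrame.
    """
    keep = set(keep_uuids)
    missing = keep - set(all_uuids)
    if missing:
        raise ValueError(f"UUIDs to keep are not in the batch: {sorted(missing)}")
    return [u for u in all_uuids if u not in keep]


# Placeholder UUIDs; with a real batch one would pass df.uuid.tolist() and rows_keep.
batch_uuids = ["uuid-a", "uuid-b", "uuid-c", "uuid-d"]
to_remove = plan_removal(batch_uuids, keep_uuids=["uuid-a"])
print(f"{len(to_remove)} item(s) would be removed: {to_remove}")
```

Checking the printed list against the DataFrame first makes it harder to delete a motion-correction result by mistake.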

In [None]:
# CAUTION: when set to True, this removes every row that is NOT in rows_keep,
# along with its output data in the batch directory
erasing_all_non_keep_data_flag = False

if erasing_all_non_keep_data_flag:
    for i, row in df.iterrows():
        if row.uuid not in rows_keep:
            df.caiman.remove_item(row.uuid, safe_removal=False)

df

In [None]:
# some params for CNMF
params_cnmf = {
    'main': # indicates that these are the "main" params for the CNMF algo
        {
            'fr': 10, # framerate, very important that this is correct!
            'p': 1,
            'nb': 2,
            'merge_thr': 0.85,
            'rf': 15,
            'stride': 6, # "stride" for cnmf, "strides" for mcorr
            'K': 4,
            'gSig': [4, 4],
            'ssub': 1,
            'tsub': 1,
            'method_init': 'greedy_roi',
            'min_SNR': 1.5,
            'rval_thr': 0.7,
            'use_cnn': False,
            'decay_time': 0.4,
        },
    'refit': True, # If `True`, run a second iteration of CNMF
}
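
Several of these values only make sense relative to each other, so a quick sanity check can catch a mismatched frame rate or patch size before a long run. The sketch below assumes the usual CaImAn conventions: `rf` is the half-size of a patch (so a patch is roughly `2 * rf` pixels wide), `stride` is the overlap between neighbouring patches, `gSig` is the expected half-size of a neuron, and `decay_time` is the transient length in seconds. The helper `describe_cnmf_geometry` is hypothetical and not part of CaImAn or mesmerize-core.

```python
def describe_cnmf_geometry(main_params):
    """Derive a few human-readable quantities from the 'main' CNMF parameters.

    Hypothetical helper for illustration; the interpretation of each
    parameter is an assumption stated in the text above.
    """
    rf = main_params["rf"]
    g_sig = main_params["gSig"]
    patch_size = 2 * rf                  # approximate patch width in pixels
    neuron_diameter = 2 * max(g_sig)     # approximate neuron diameter in pixels
    return {
        "patch_size_px": patch_size,
        "patch_overlap_px": main_params["stride"],
        "neuron_diameter_px": neuron_diameter,
        "patch_to_neuron_ratio": patch_size / neuron_diameter,
        "components_per_patch": main_params["K"],
        # number of frames a typical transient spans at this frame rate
        "transient_frames": main_params["decay_time"] * main_params["fr"],
    }


# Values copied from params_cnmf['main'] above.
example_main = {"fr": 10, "rf": 15, "stride": 6, "K": 4, "gSig": [4, 4], "decay_time": 0.4}
for name, value in describe_cnmf_geometry(example_main).items():
    print(f"{name}: {value}")
```

If the patch is only slightly larger than a single neuron, or a transient spans only a frame or two, that is a hint to revisit `rf`, `gSig`, or `fr` before running CNMF.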

In [None]:
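# row index of the completed mcorr item to use as input; it must be a valid
# row of df (the batch shown above only has rows 0-3, and row 0 is the one kept)
good_mcorr_index = 0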

# add a batch item
df.caiman.add_item(
    algo='cnmf', # algo is cnmf
    input_movie_path=df.iloc[good_mcorr_index],  # use mcorr output from a completed batch item
    params=params_cnmf,
    item_name=df.iloc[good_mcorr_index]["item_name"], # use the same item name
)