In [None]:
from mesmerize_core import *
import numpy as np
from copy import deepcopy
import pandas as pd

**You will need `fastplotlib` installed for the visualizations**

In [None]:
import fastplotlib as fpl
from ipywidgets import VBox, IntSlider, Layout

In [None]:
pd.options.display.max_colwidth = 120

# Paths
These are the only variables you will need to modify in this demo notebook. Set the paths according to the location of your own `caiman_data` directory.

Explanation:

`set_parent_raw_data_path()` - This function from `mesmerize_core` sets the **top level raw data directory**. It is generally the top level directory for your raw experimental data. This allows you to move your experiment directory structure between computers, as long as you keep everything under the parent path the same.

For example,

On Linux based systems if you have your experimental data in the following dir:

`/data/my_name/exp_top_level/....`

You could set `/data/my_name` as the "parent raw data path", and you can then move `exp_top_level/...` between computers.

On Windows:

`D:/my_name/exp_top_level/...`

You could set `D:/my_name` as the parent raw data path, and you can then move `exp_top_level/...` between computers.

**Even on Windows, just use `/`. If you use `pathlib` you do not have to worry about the annoying issue of `\\` and `\` on Windows :D**
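
As a small illustration of the point above, the sketch below uses `pathlib`'s pure path classes, which only manipulate path strings and never touch the filesystem, so it runs on any operating system. The directory names are the hypothetical ones from the examples above, and the file name `movie.tif` is made up.

```python
from pathlib import PurePosixPath, PureWindowsPath

# hypothetical parent raw data paths on two different computers
linux_parent = PurePosixPath("/data/my_name")
windows_parent = PureWindowsPath("D:/my_name")  # forward slashes are accepted

# the same experiment location, relative to the parent path
relative_movie = "exp_top_level/movie.tif"

linux_movie = linux_parent.joinpath(relative_movie)
windows_movie = windows_parent.joinpath(relative_movie)

# stripping the parent path gives the same relative location on both systems
linux_rel = linux_movie.relative_to(linux_parent).as_posix()
windows_rel = windows_movie.relative_to(windows_parent).as_posix()

print(linux_rel)
print(windows_rel)
```

Only the parent part differs between the two computers; everything below it is identical, which is what makes the experiment directory portable.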

In [None]:
# for this demo set this dir as the path to your `caiman_data` dir
set_parent_raw_data_path("/home/kushal/caiman_data/")

### Batch path, this is where caiman outputs will be organized

This can be anywhere, it does not need to be under the parent raw data path.

**We recommend using [pathlib](https://docs.python.org/3/library/pathlib.html) instead of manually managing paths as strings. `pathlib` is part of the Python standard library; it makes it much easier to deal with paths and saves a lot of time in the long run! It also makes your paths compatible across operating systems.**

In [None]:
batch_path = get_parent_raw_data_path().joinpath("mesmerize-batch/batch.pickle")

# Create a new batch

This creates a new pandas `DataFrame` with the columns that are necessary for mesmerize. In mesmerize we call this the **batch DataFrame**. You can add additional columns relevant to your experiment, but do not modify columns used by mesmerize.

Note that once you have created a batch DataFrame, you must use `load_batch()` to load it later. You cannot use `create_batch()` to overwrite an existing batch DataFrame.
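
One way to avoid calling `create_batch()` on an existing batch file is to branch on whether the file exists. The helper below is a hypothetical sketch, not part of `mesmerize_core`; it takes the create and load functions as arguments so that it can be demonstrated here with stand-in functions.

```python
from pathlib import Path
import tempfile


def open_or_create(path, create, load):
    """Hypothetical helper: load the batch if its file exists, otherwise create it."""
    path = Path(path)
    if path.exists():
        return load(path)
    return create(path)


# stand-ins for `create_batch` and `load_batch`, for demonstration only
def fake_create(path):
    path.write_text("batch")
    return "created"


def fake_load(path):
    return "loaded"


with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp).joinpath("batch.pickle")
    first = open_or_create(demo_path, fake_create, fake_load)   # file does not exist yet
    second = open_or_create(demo_path, fake_create, fake_load)  # file exists now

print(first, second)
```

In this notebook the equivalent call would be `df = open_or_create(batch_path, create_batch, load_batch)`.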

In [None]:
# create a new batch
df = create_batch(batch_path)
# to load existing batches use `load_batch()`
#df = load_batch(batch_path)

# View the dataframe

It is empty, with the appropriate columns for mesmerize.

In [None]:
df

# Path to the input movie

The input movie must be located somewhere under the parent raw data path or the batch path. We will use the Sue 2p example.

In [None]:
# We'll use the Sue movie from caiman
# download it if you don't have it
from caiman.utils.utils import download_demo
download_demo()

In [None]:
movie_path = get_parent_raw_data_path().joinpath("example_movies/Sue_2x_3000_40_-46.tif")

# Motion correction parameters

Parameters for all algos have the following structure:

```python
{"main": {... params directly passed to caiman}}
```

In [None]:
# We will start with one version of parameters
mcorr_params1 =\
{
  'main': # this key is necessary for specifying that these are the "main" params for the algorithm
    {
        'max_shifts': [24, 24],
        'strides': [48, 48],
        'overlaps': [24, 24],
        'max_deviation_rigid': 3,
        'border_nan': 'copy',
        'pw_rigid': True,
        'gSig_filt': None
    },
}

# Add a "batch item", this is the combination of:
* algorithm to run, `algo`
* input movie to run the algorithm on, `input_movie_path`
* parameters for the specified algorithm, `params`
* a name for you to keep track of things, usually the same as the movie filename, `item_name`

In [None]:
# add an item to the batch
df.caiman.add_item(
    algo='mcorr',
    input_movie_path=movie_path,
    params=mcorr_params1,
    item_name=movie_path.stem,  # filename of the movie, but can be anything
)

df

## We can now see that there is one item, a.k.a. row or pandas `Series`, in the batch dataframe. We can add another item with the same input movie but with different parameters.

### **When adding batch items with the same `input_movie_path` (i.e. same input movie but different parameters) it is useful to give them the same `item_name`.**

In [None]:
# We create another set of params, useful for gridsearches for example
mcorr_params2 =\
{
  'main':
    {
        'max_shifts': [24, 24],
        'strides': [24, 24],
        'overlaps': [12, 12],
        'max_deviation_rigid': 3,
        'border_nan': 'copy',
        'pw_rigid': True,
        'gSig_filt': None
    },
}

# add other param variant to the batch
df.caiman.add_item(
  algo='mcorr',
  item_name=movie_path.stem,
  input_movie_path=movie_path,
  params=mcorr_params2
)

df

## We can see that there are two batch items for the same input movie.

### We can also use a `for` loop to add multiple different parameter variants more efficiently.
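
The parameters live in a nested dict under the `main` key, which is why the way you copy them matters when making variants. As a minimal illustration with a trimmed-down params dict, a shallow copy shares the inner `main` dict with the original, whereas `deepcopy` does not:

```python
from copy import deepcopy

base_params = {"main": {"max_shifts": [24, 24], "strides": [24, 24]}}

# shallow copy: the outer dict is new, but "main" is the same object
shallow = dict(base_params)
shallow["main"]["max_shifts"] = (1, 1)
shared = base_params["main"]["max_shifts"]  # the original was changed too

# reset, then use a deep copy: the nested dict is copied as well
base_params["main"]["max_shifts"] = [24, 24]
deep = deepcopy(base_params)
deep["main"]["max_shifts"] = (1, 1)
untouched = base_params["main"]["max_shifts"]  # the original is unchanged

print(shared, untouched)
```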

In [None]:
# copy the mcorr_params2 dict to make some changes
new_params = deepcopy(mcorr_params2)

# some variants of max_shifts
for shifts in [1, 6, 12]: 
    # deep copy is the safest way to copy dicts
    new_params = deepcopy(new_params)
    
    # assign the "max_shifts"
    new_params["main"]["max_shifts"] = (shifts, shifts)
    
    df.caiman.add_item(
      algo='mcorr',
      item_name=movie_path.stem,
      input_movie_path=movie_path,
      params=new_params
    )

In [None]:
df

## Now we can see that there are many parameter variants, but it is not easy to see the differences in parameters between the rows that have the same `item_name`.

### We can use the `caiman.get_params_diffs()` extension to see the parameters that differ between rows with the same `item_name`
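
To make the idea of a parameter diff concrete, here is a plain-Python sketch that reports only the keys whose values are not identical across a set of parameter dicts. It illustrates the concept only; it is not how `get_params_diffs()` is implemented, and the function name `param_diffs` is made up.

```python
def param_diffs(param_sets):
    """Return, for each params dict, only the keys whose values differ across the sets."""
    all_keys = set()
    for params in param_sets:
        all_keys.update(params)

    varying = []
    for key in sorted(all_keys):
        values = [params.get(key) for params in param_sets]
        # a key varies if any value differs from the first one
        if any(value != values[0] for value in values[1:]):
            varying.append(key)

    return [{key: params.get(key) for key in varying} for params in param_sets]


variants = [
    {"max_shifts": (24, 24), "strides": (48, 48), "overlaps": (24, 24), "pw_rigid": True},
    {"max_shifts": (24, 24), "strides": (24, 24), "overlaps": (12, 12), "pw_rigid": True},
    {"max_shifts": (1, 1), "strides": (24, 24), "overlaps": (12, 12), "pw_rigid": True},
]

diffs_demo = param_diffs(variants)
for d in diffs_demo:
    print(d)
```

Here `pw_rigid` is dropped from the result because it has the same value in every variant.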

In [None]:
diffs = df.caiman.get_params_diffs(algo="mcorr", item_name=df.iloc[0]["item_name"])
diffs

# Indexing rows and running batch item(s)
#### You can run a single batch item by calling `caiman.run()` on a `Series` (row) of the DataFrame. One way to get the row is integer indexing using `df.iloc[index]`

In [None]:
# get the first batch item
row = df.iloc[0]

### You can see how the various `pandas.Series` extensions are accessible at the level of dataframe rows.

Move the cursor to the end of the following line and press `Tab` on your keyboard. You can select the `caiman.run()` function and press `Shift + Tab` to see the docstring. Alternatively, refer to the API docs: https://mesmerize-core.readthedocs.io/en/latest/api/common.html#mesmerize_core.CaimanSeriesExtensions

Note that tab completion doesn't work if you chain the call as `df.iloc[i].caiman.<method_name>`. You need to apply the indexer first and store the row in a variable, as above, to see the docstring.

In [None]:
row.caiman.

# Run a single batch item

Run the row that we selected above. On Linux & Mac it will run in a subprocess, but on Windows it will run in the local kernel. If using the subprocess backend you can use `run(wait=False)` to avoid blocking the current kernel while it runs.
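
For intuition about blocking versus non-blocking runs, the generic sketch below uses the standard library's `subprocess` module directly. It is not how `mesmerize_core` launches its jobs, and whether the object returned by `run()` offers the same methods is an assumption to check against the API docs.

```python
import subprocess
import sys

# start a short-lived child process without blocking the current interpreter
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.5)"])

# poll() returns None while the child is still running
still_running = proc.poll() is None

# wait() blocks until the child exits and returns its exit code
exit_code = proc.wait()
print(still_running, exit_code)
```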

In [None]:
# run the first "batch item"
# this will run in a subprocess by default on Linux & Mac
# on Windows it will run locally
process = row.caiman.run()

# reload dataframe from disk when done
df = df.caiman.reload_from_disk()

# Use a loop to run multiple batch items.

`df.iterrows()` iterates through rows and returns the numerical index and row for each iteration

In [None]:
for i, row in df.iterrows():
    if i == 0:  # skip the first item since we've run it already
        continue
    process = row.caiman.run()
    
    # on Windows you MUST reload the batch dataframe after every iteration because it uses the `local` backend.
    # this is unnecessary on Linux & Mac
    # "DummyProcess" is used for local backend so this is automatic
    if process.__class__.__name__ == "DummyProcess":
        df = df.caiman.reload_from_disk()

# Reload the DataFrame to see the outputs information for the mcorr batch item
### It is necessary to ALWAYS use `df = df.caiman.reload_from_disk()` after running a single batch item or a loop of batch items. If you have run items, you must not add new batch items until you have reloaded it!

In [None]:
df = df.caiman.reload_from_disk()

## We can see that the `outputs` column has been filled in

In [None]:
df

# Check if the algorithm ran successfully for an item

In [None]:
# True if the algo ran successfully
df.iloc[0]["outputs"]["success"]

# Visualization using `fastplotlib`
You will need `fastplotlib` installed for this, see https://github.com/kushalkolar/fastplotlib

# Get the input movie and mcorr so we can visualize them

Note that you DO NOT need to manually work with file paths. For tiff input files, the movie is returned as a memmapped array (if possible) with lazy loading. If the file cannot be memmapped, it will try to use a mesmerize `LazyArray`.

In [None]:
# you can change the index to look at the mcorr results of different batch items
index = 0

# get input movie as memmap
input_movie = df.iloc[index].caiman.get_input_movie()

# load mcorr output movie, also as a memmapped array
mcorr_movie = df.iloc[index].mcorr.get_output()

# Visualize raw & MCorr movie side-by-side

### fastplotlib `ImageWidget` to visualize raw & mcorr movie side by side

`ImageWidget` assumes `"txy"` dimension order by default for 2D movies. You can set other orders using the `dims_order` kwarg

In [None]:
mcorr_iw = fpl.ImageWidget(
    data=[input_movie, mcorr_movie], 
    vmin_vmax_sliders=True, 
    cmap="gnuplot2"
)
mcorr_iw.show()

# Frame averaging with a rolling window using `ImageWidget` "window functions".

## This makes it easier to visually inspect motion
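
Conceptually, a window function replaces the displayed frame with a statistic computed over neighbouring frames along the time axis. The sketch below is a NumPy-only approximation of that idea, not fastplotlib's implementation. In particular, centering the window on the current frame and clipping it at the edges of the movie are assumptions made here, and `windowed_frame` is a made-up name.

```python
import numpy as np


def windowed_frame(movie, t, window_size, func=np.mean):
    """Apply `func` over a window of frames centered on index `t`, clipped at the edges."""
    half = window_size // 2
    start = max(t - half, 0)
    stop = min(t + half + 1, movie.shape[0])
    return func(movie[start:stop], axis=0)


# tiny synthetic movie in "txy" order: 10 frames of 2 x 2 pixels
movie = np.arange(10 * 2 * 2, dtype=float).reshape(10, 2, 2)

# mean over frames 4, 5 and 6
frame = windowed_frame(movie, t=5, window_size=3)

# at the start of the movie only frames 0 and 1 fall inside the window
edge = windowed_frame(movie, t=0, window_size=3)

print(frame.shape, edge.shape)
```

Averaging suppresses frame-to-frame noise, so residual motion stands out more clearly than in the raw frames.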

In [None]:
# window function on the "t" (time) dimension, using mean of 17 frames
mcorr_iw.window_funcs = {"t": (np.mean, 17)}

## Close the plot to free up GPU processing time, not necessary if you have a powerful GPU

In [None]:
mcorr_iw.gridplot.close()

## With `ImageWidget` you can view all 5 mcorr results simultaneously!

### This depends on your hard drive's capabilities

In [None]:
# first item is just the raw movie
movies = [df.iloc[0].caiman.get_input_movie()]

# subplot titles
subplot_names = ["raw"]

# we will use the mean images later
means = [df.iloc[0].caiman.get_projection("mean")]

# add all the mcorr outputs to the list
for i, row in df.iterrows():
    # add to the list of movies to plot
    movies.append(row.mcorr.get_output())
    
    # subplot title to show dataframe index
    subplot_names.append(f"ix: {i}")
    
    # mean images which we'll use later
    means.append(row.caiman.get_projection("mean"))

# create the widget
mcorr_iw_multiple = fpl.ImageWidget(
    data=movies,  # list of movies
    window_funcs={"t": (np.mean, 17)}, # window_funcs is also a kwarg
    vmin_vmax_sliders=True,
    names=subplot_names,  # subplot names used for titles
    cmap="gnuplot2"
)

mcorr_iw_multiple.show()

In [None]:
df.caiman.get_params_diffs(algo="mcorr", item_name=df.iloc[0]["item_name"])

### Modify the `window_funcs` at any time

In [None]:
mcorr_iw_multiple.window_funcs["t"].window_size = 5

## There is some motion on the left side of `ix: 2` at timepoints `1452`, `2037` and a few others. This will be more obvious if we subtract a mean image from each frame. You can use `frame_apply` to apply a function before displaying frames in the `ImageWidget`.

This can be combined with `window_funcs` or used by itself. If used in combination with `window_funcs`, the window functions are computed first and then fed to `frame_apply`.

For this example the `frame_apply` functions subtract the mean image for each movie.

General form:

```python
{
    data_ix: function() # returns 2D frame
    ...
}
```
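
A dict of this form can also be built in a comprehension instead of writing each entry by hand. One Python detail matters here: a lambda looks up outer variables when it is called, not when it is defined, so the loop value has to be bound explicitly, for example as a default argument. The sketch below uses plain numbers as stand-ins for mean images; the same pattern applies to NumPy arrays, where `-` works elementwise.

```python
# stand-ins for the per-movie mean images
demo_means = [10, 20, 30]

# pitfall: every lambda looks up `m` when called, after the loop has finished
late_bound = {}
for i, m in enumerate(demo_means):
    late_bound[i] = lambda x: x - m

# fix: bind the current value of `m` as a default argument
bound = {i: (lambda x, m=m: x - m) for i, m in enumerate(demo_means)}

print(late_bound[0](100), bound[0](100))
```

With the late-bound version every entry subtracts the last mean, which is easy to miss when the result is only inspected visually.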

In [None]:
subtract_means = {
    0: lambda x: x - means[0],
    1: lambda x: x - means[1],
    2: lambda x: x - means[2],
    3: lambda x: x - means[3],
    4: lambda x: x - means[4],
    5: lambda x: x - means[5]
}

In [None]:
mcorr_iw_multiple.frame_apply = subtract_means

### Different colormaps can make the motion more obvious

In [None]:
for sp in mcorr_iw_multiple.gridplot:
    sp.graphics[0].cmap = "jet"

In [None]:
# disable frame apply
mcorr_iw_multiple.frame_apply = dict()

# ix `3` seems to work the best, so we will clean up the DataFrame and remove all other items.

### You can remove batch items (i.e. rows) using `df.caiman.remove_item(<item_uuid>)`

**Note that this also cleans up the output data in the batch directory!**

In [None]:
# make a list of rows we want to keep using the uuids
rows_keep = [df.iloc[3].uuid]
rows_keep

### On Windows, calling `remove_item()` will raise a `PermissionError` if you have the memmap file open.
### Unfortunately the current workaround is to kill the kernel if you want to delete batch items with open memmaps.

There is currently no way to close a `numpy.memmap`, even if you remove all references to it.

In [None]:
for i, row in df.iterrows():
    if row.uuid not in rows_keep:
        df.caiman.remove_item(row.uuid)

df

### As you can see above, the numerical index changed for what was previously item 3. Indices are always reset when you use `caiman.remove_item()`. However, UUIDs are always maintained.
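
Since positional indices shift after removals while UUIDs stay fixed, it is safer to refer to batch items by UUID. The toy example below shows the difference using a plain list of dicts and the standard library's `uuid` module. It is only an analogy for the batch DataFrame, not `mesmerize_core` code.

```python
from uuid import uuid4

# toy "batch": each item has a uuid, like the rows of the batch DataFrame
items = [{"uuid": str(uuid4()), "name": f"variant-{i}"} for i in range(5)]

# remember the item at position 3 by its uuid
keep_uuid = items[3]["uuid"]

# remove everything else; positions are renumbered from 0
items = [item for item in items if item["uuid"] == keep_uuid]

# the position changed, but the uuid still finds the same item
new_position = next(i for i, item in enumerate(items) if item["uuid"] == keep_uuid)
print(new_position, items[new_position]["name"])
```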

# CNMF

## Continue from mcorr above and perform CNMF using the good mcorr output.

First, define the params for CNMF. Put the CNMF params under the `main` key. Set `refit` to `True` if you want to run CNMF for a second iteration.

In [None]:
# some params for CNMF
params_cnmf =\
{
    'main': # indicates that these are the "main" params for the CNMF algo
        {
            'fr': 30, # framerate, very important!
            'p': 1,
            'nb': 2,
            'merge_thr': 0.85,
            'rf': 15,
            'stride': 6, # "stride" for cnmf, "strides" for mcorr
            'K': 4,
            'gSig': [4, 4],
            'ssub': 1,
            'tsub': 1,
            'method_init': 'greedy_roi',
            'min_SNR': 2.0,
            'rval_thr': 0.7,
            'use_cnn': True,
            'min_cnn_thr': 0.8,
            'cnn_lowest': 0.1,
            'decay_time': 0.4,
        },
    'refit': True, # If `True`, run a second iteration of CNMF
}

### Add a single cnmf item to the batch

In [None]:
# add a batch item
df.caiman.add_item(
    algo='cnmf', # algo is cnmf
    input_movie_path=df.iloc[0],  # use mcorr output from a completed batch item
    params=params_cnmf,
    item_name=df.iloc[0]["item_name"], # use the same item name
)

### Just like with motion correction, we can use loops to add multiple parameter variants. This is useful to perform a parameter search to find the params that work best for your dataset. Here I will use `itertools.product` which is better than deeply nested loops.
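
As a quick check of what `itertools.product` produces, the standalone sketch below compares it against the equivalent nested loops, using the same variant lists as this notebook. Note that `product` returns an iterator that can only be consumed once, so wrap it in `list()` if you need to loop over the grid more than once.

```python
from itertools import product

gSig_variants = [6, 8]
K_variants = [4, 8]
merge_thr_variants = [0.8, 0.95]

# one flat list of all combinations
grid = list(product(gSig_variants, K_variants, merge_thr_variants))

# the equivalent nested loops
nested = []
for gSig in gSig_variants:
    for K in K_variants:
        for merge_thr in merge_thr_variants:
            nested.append((gSig, K, merge_thr))

print(len(grid), grid == nested)

# an iterator from product() is exhausted after one pass
once = product(gSig_variants, K_variants)
first_pass = list(once)
second_pass = list(once)
print(len(first_pass), len(second_pass))
```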

In [None]:
from itertools import product

# variants of several parameters
gSig_variants = [6, 8]
K_variants = [4, 8]
merge_thr_variants = [0.8, 0.95]

# always use deepcopy like before
new_params_cnmf = deepcopy(params_cnmf)

# create a parameter grid
parameter_grid = product(gSig_variants, K_variants, merge_thr_variants)

# a single for loop to go through all the various parameter combinations
for gSig, K, merge_thr in parameter_grid:
    # deep copy params dict just like before
    new_params_cnmf = deepcopy(new_params_cnmf)
    
    new_params_cnmf["main"]["gSig"] = [gSig, gSig]
    new_params_cnmf["main"]["K"] = K
    new_params_cnmf["main"]["merge_thr"] = merge_thr
    
    # add param combination variant to batch
    df.caiman.add_item(
        algo="cnmf",
        item_name=df.iloc[0]["item_name"],
        input_movie_path=df.iloc[0],
        params=new_params_cnmf
    )

### See that there are a lot of new cnmf batch items

In [None]:
df

## Since it is difficult to see the different parameter variants above, we can just view the diffs

### The index numbers on the diffs correspond to the indices in the parent DataFrame above

In [None]:
df.caiman.get_params_diffs(algo="cnmf", item_name=df.iloc[1]["item_name"])

# Run the added `cnmf` batch items

### First, this is how you can filter a pandas DataFrame using multiple columns. This gives you the rows (batch items) using the "cnmf" `"algo"` and those that match a particular `"item_name"`

In [None]:
df[
    (df["algo"] == "cnmf") &  # algo
    (df["item_name"] == df.iloc[0]["item_name"])  # item name
]

## Run only these items

In [None]:
for i, row in df[
    (df["algo"] == "cnmf") &
    (df["item_name"] == df.iloc[0]["item_name"])
].iterrows():
    
    process = row.caiman.run()
    
    # on Windows you MUST reload the batch dataframe after every iteration because it uses the `local` backend.
    # this is unnecessary on Linux & Mac
    # "DummyProcess" is used for local backend so this is automatic
    if process.__class__.__name__ == "DummyProcess":
        df = df.caiman.reload_from_disk()

### We now have CNMF outputs

In [None]:
df = df.caiman.reload_from_disk()
df[df["algo"] == "cnmf"]

In [None]:
# see which batch items completed successfully
df[df["algo"] == "cnmf"]["outputs"].apply(lambda x: x["success"])