# CNMF demo pipeline
This demo presents a full pipeline for the analysis of a two-photon calcium imaging dataset using the CaImAn (**Ca**lcium **Im**aging **An**alysis) software package. It demonstrates how to use Caiman's built-in tools for motion correction, source separation (extraction of the location and calcium trace from each discovered component), and deconvolution (estimation of the spikes that generated the calcium signal). Graphically, we can represent the pipeline as follows:

Raw Movie -> Motion Correction -> CNMF|Deconvolution <-> Component Evaluation <-> Final Estimate -> $\Delta{F}/F$

Below, we walk through how to use Caiman to implement these steps using a dataset provided courtesy of Sue Ann Koay and David Tank (Princeton University). 

This demo uses the constrained nonnegative matrix factorization (CNMF) algorithm, which is best suited to datasets with low background noise, such as two-photon data (or some one-photon data, such as some light-sheet recordings). For a demo pipeline for one-photon microendoscopic data, see `demo_pipeline_cnmfE.ipynb`.

## Getting help
More detailed background information can be found in the [original CNMF paper](https://pubmed.ncbi.nlm.nih.gov/26774160/) and [the Caiman paper](https://pubmed.ncbi.nlm.nih.gov/30652683/). If you have questions about this demo, or the underlying algorithms, you can ask questions at our [gitter channel](https://app.gitter.im/#/room/#agiovann_Constrained_NMF:gitter.im). If you run into problems, or there are features you would like to see added, feel free to [open an issue](https://github.com/flatironinstitute/CaImAn/issues).

In [84]:
import numpy as np
import os
import psutil
from pathlib import Path
import glob
import logging
import matplotlib.pyplot as plt
import cv2
import bokeh.plotting as bpl
try:
    cv2.setNumThreads(0)
except Exception:
    pass

try:
    if __IPYTHON__:
        # reloads modules automatically when they are changed
        get_ipython().run_line_magic('load_ext', 'autoreload')
        get_ipython().run_line_magic('autoreload', '2')
except NameError:
    pass

import caiman as cm
from caiman.motion_correction import MotionCorrect
from caiman.source_extraction.cnmf import cnmf as cnmf
from caiman.source_extraction.cnmf import params as params
from caiman.utils.utils import download_demo
from caiman.utils.visualization import plot_contours, nb_view_patches, nb_plot_contour

bpl.output_notebook()

## Set up logger (optional)
Before getting started, we can optionally set up a logger. Skip this section if you don't want a logger.

Python has a powerful built-in [logging module](https://docs.python.org/3/library/logging.html) for generating log messages while a program is running: it lets you print customized statements and set a logging level to determine how verbose the output will be. You only receive messages at or above the severity threshold you set: `logging.DEBUG`, `logging.INFO`, `logging.WARNING`, `logging.ERROR`, or `logging.CRITICAL`. For instance, setting the threshold to `logging.DEBUG` will print out every logging statement, while setting it to `logging.ERROR` will print out only errors and critical problems. This system gives much more flexibility and control than interspersing `print()` statements throughout your code when debugging.

Our custom formatted log string is defined in the `log_format` parameter below, which draws from a predefined [set of attributes](https://docs.python.org/3/library/logging.html#logrecord-attributes) provided by the logging module. We have set each log to display the time, severity level, filename/function name/line number of the file creating the log, process ID, and the actual log message. 

While logging is especially helpful when running code on a server, it can also be helpful to get feedback in real time on your personal machine, either to audit progress or diagnose problems when debugging. If you set this feature up by running the following cell, the logs will by default go to console. If you want to direct your log to file (which you can indicate with `use_logfile = True`), then it will automatically be directed to your `caiman_data/temp` directory as defined in the `caiman.paths` module. You can set another path with the `filename` parameter.
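
To make the severity threshold concrete, here is a minimal sketch that uses only the standard library. The logger name `threshold_demo` and the in-memory stream are illustrative choices, not part of Caiman, and the sketch avoids `logging.basicConfig()` so that it does not interfere with the configuration cell that follows.

```python
import io
import logging

# Hypothetical logger used only for this illustration (not a Caiman logger).
demo_logger = logging.getLogger("threshold_demo")
demo_logger.propagate = False            # keep these messages out of the root logger
demo_stream = io.StringIO()              # capture the output in memory
demo_handler = logging.StreamHandler(demo_stream)
demo_handler.setFormatter(logging.Formatter("{levelname} - {message}", style="{"))
demo_logger.handlers = [demo_handler]

demo_logger.setLevel(logging.WARNING)    # only WARNING and above get through
demo_logger.debug("below the threshold")
demo_logger.info("also below the threshold")
demo_logger.warning("shown")
demo_logger.error("also shown")

demo_output = demo_stream.getvalue()
print(demo_output)
```

Lowering the level to `logging.DEBUG` in this sketch would let all four messages through.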

In [None]:
use_logfile = False # If set to True, will log to file
if use_logfile:
    log_file = Path(cm.paths.get_tempdir()) / 'cnmf_demo.log'
    print(f"Will save logging data to {log_file}")
else:
    log_file = None
log_format = "{asctime} - {levelname} - [{filename} {funcName}() {lineno}] - pid {process} - {message}"
logging.basicConfig(format=log_format,
                    filename=log_file, 
                    level=logging.INFO, style="{") #DEBUG, INFO, WARNING, ERROR, CRITICAL

## Select files for processing
Many acquisition systems break up data from a single session across multiple files. This demo shows how to work with lists of filepaths that represent multiple movies from the same recording session. While this demo works with `tif` files, Caiman can handle movies in multiple common formats such as:

    tiff, hdf5, nwb, avi, zarr, h5, npz, n5
    
In the following cell, Caiman's `download_demo()` function will first check to see if the demo movies exist in the specified directory. If they do not, the files will be downloaded. The function returns the full file path, which we append to the `filenames` list.

If you adapt this demo for your own data, make sure to include the complete path(s) when you place them in the `filenames` variable. There are also functions that take single filenames instead of lists, as discussed in [Caiman's documentation](https://caiman.readthedocs.io/en/master/Handling_Movies.html#).

In [None]:
filenames = []
save_folder = 'cnmf_demo_data'  # folder inside .caiman_data/example_movies where files will be saved
try:
    filenames.append(download_demo('Sue_Split1.tif', save_folder))
    filenames.append(download_demo('Sue_Split2.tif', save_folder))
except KeyError:
    # this is temporary until Sue_Split makes it into main repo list
    # build the paths with os.path.join so the separators are not lost
    save_path = os.path.join(cm.paths.caiman_datadir(), 'example_movies', save_folder)
    filenames = [os.path.join(save_path, 'Sue_Split1.tif'),
                 os.path.join(save_path, 'Sue_Split2.tif')]

Note that if you have recorded data across many days or weeks and need to register neurons across multiple recording sessions, that is a different use case. We have a separate demo for it: see `demo_multisession_registration.ipynb`.

## Load and play the movie (optional)   
Once you have set up the desired list of movie(s) for analysis, you can load the data, concatenated into a single movie object, for viewing.  This will require loading all of the data into memory -- in general this is not needed by Caiman's pipeline, which uses out-of-core processing to avoid overwhelming RAM. But for the relatively small data files used in this demo, it should be fine on most computers. Collectively, they take up about 350MB of RAM: if you aren't sure how much memory you have available, you can use the `psutil.virtual_memory()` function.
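
As a rough check before loading a movie, you can estimate its in-memory size from its dimensions. The sketch below is a hypothetical helper, not a Caiman function. It assumes the movie is held as 32-bit floats, and the dimensions used (3000 frames of 170 x 170 pixels) are assumed here for illustration, so substitute the values for your own data.

```python
import numpy as np

def movie_size_mb(n_frames, height, width, dtype=np.float32):
    """Approximate in-memory size of a movie in megabytes (1 MB = 1e6 bytes)."""
    bytes_per_pixel = np.dtype(dtype).itemsize
    return n_frames * height * width * bytes_per_pixel / 1e6

# Assumed dimensions, for illustration only.
size_mb = movie_size_mb(3000, 170, 170)
print(f"Estimated movie size: {size_mb:.1f} MB")
```

You can compare the estimate with the `available` field returned by `psutil.virtual_memory()`, which is reported in bytes.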

Once the movie object is generated (with `load_movie_chain()`), it can then be played using `movie.play()` which has multiple parameters you can play with, including: 

    gain (brightness) 
    fr (frame rate)    
    magnification (int)  
    q_max, q_min (percentile for vmax, vmin plotting values)    
    plot_text (Bool) to show the frame number    
    
As always, to see the full documentation in Jupyter, you can enter `movie.play?` in a cell.

Displaying the movie uses the OpenCV library, so if you set `display_movie` to True, the following cell will run a blocking function (a function that blocks execution of all other code until it is stopped), opening a separate PyQt window which doesn't run in Jupyter. You will need to press `q` on that window to close it. 

We resample the movie in time according to `downsampling_ratio` using `resize()` before playing it.
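
To give a feel for what temporal resampling does, here is a simplified stand-in written with NumPy. It averages consecutive blocks of frames, which is an assumption made for illustration: it is not how Caiman's `resize()` is implemented, and the function name `downsample_time` is hypothetical.

```python
import numpy as np

def downsample_time(movie, ratio):
    """Average consecutive frames so the output has roughly ratio * T frames."""
    block = int(round(1 / ratio))           # e.g. ratio 0.2 -> blocks of 5 frames
    n_blocks = movie.shape[0] // block      # leftover frames at the end are dropped
    trimmed = movie[:n_blocks * block]
    return trimmed.reshape(n_blocks, block, *movie.shape[1:]).mean(axis=1)

# Toy movie: 20 frames of 4 x 4 pixels, where frame t has the value t everywhere.
toy_frames = np.arange(20, dtype=float).reshape(20, 1, 1) * np.ones((1, 4, 4))
toy_small = downsample_time(toy_frames, 0.2)
print(toy_frames.shape, "->", toy_small.shape)
```

With a ratio of 0.2, every five frames collapse into one, so a movie recorded at 30 frames per second plays back as if it were sampled at 6 frames per second.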

In [82]:
display_movie = True
if display_movie:
    movie_orig = cm.load_movie_chain(filenames)
    downsampling_ratio = 0.2
    movie_orig.resize(fz=downsampling_ratio).play(gain=1.3,
                                                  q_max=99.5, 
                                                  fr=30, 
                                                  plot_text=True,
                                                  magnification=2)

## Set up some parameters
We first set parameters that describe the file, and then parameters for motion correction, CNMF processing, and component quality evaluation. Note that the dataset `Sue_2x_3000_40_-46.tif` has been spatially downsampled by a factor of 2 and has a lower than usual spatial resolution (2um/pixel). As a result, several parameters (`gSig, strides, max_shifts, rf, stride_cnmf`) have lower values (halved compared to a dataset with a spatial resolution of 1um/pixel).

In [61]:
# dataset dependent parameters
fr = 30                             # imaging rate in frames per second
decay_time = 0.4                    # length of a typical transient in seconds

# motion correction parameters
strides = (48, 48)          # start a new patch for pw-rigid motion correction every x pixels
overlaps = (24, 24)         # overlap between patches (size of patch is strides+overlaps)
max_shifts = (6,6)          # maximum allowed rigid shifts (in pixels)
max_deviation_rigid = 3     # maximum shifts deviation allowed for patch with respect to rigid shifts
pw_rigid = True             # flag for performing non-rigid motion correction

# parameters for source extraction and deconvolution
p = 1                       # order of the autoregressive system
gnb = 2                     # number of global background components
merge_thr = 0.85            # merging threshold, max correlation allowed
rf = 15                     # half-size of the patches in pixels. e.g., if rf=25, patches are 50x50
stride_cnmf = 6             # amount of overlap between the patches in pixels
K = 4                       # number of components per patch
gSig = [4, 4]               # expected half size of neurons in pixels
method_init = 'greedy_roi'  # initialization method (if analyzing dendritic data, use 'sparse_nmf')
ssub = 1                    # spatial subsampling during initialization
tsub = 1                    # temporal subsampling during initialization

# parameters for component evaluation
min_SNR = 2.0               # signal to noise ratio for accepting a component
rval_thr = 0.85             # space correlation threshold for accepting a component
cnn_thr = 0.99              # threshold for CNN based classifier
cnn_lowest = 0.1            # neurons with cnn probability lower than this value are rejected

## Create a parameters object
You can create a parameters object by passing all the parameters as a single dictionary. Parameters not defined in the dictionary will assume their default values. The resulting `params` object is a collection of subdictionaries pertaining to the dataset to be analyzed `(params.data)`, motion correction `(params.motion)`, data pre-processing `(params.preprocess)`, initialization `(params.init)`, patch processing `(params.patch)`, spatial and temporal components `(params.spatial), (params.temporal)`, quality evaluation `(params.quality)` and online processing `(params.online)`.

In [66]:
opts_dict = {'fnames': filenames,
            'fr': fr,
            'decay_time': decay_time,
            'strides': strides,
            'overlaps': overlaps,
            'max_shifts': max_shifts,
            'max_deviation_rigid': max_deviation_rigid,
            'pw_rigid': pw_rigid,
            'p': p,
            'nb': gnb,
            'rf': rf,
            'K': K, 
            'gSig': gSig,
            'stride': stride_cnmf,
            'method_init': method_init,
            'rolling_sum': True,
            'only_init': True,
            'ssub': ssub,
            'tsub': tsub,
            'merge_thr': merge_thr, 
            'min_SNR': min_SNR,
            'rval_thr': rval_thr,
            'use_cnn': True,
            'min_cnn_thr': cnn_thr,
            'cnn_lowest': cnn_lowest}

In [67]:
opts = params.CNMFParams(params_dict=opts_dict)

2023-05-06 19:38:21,564-INFO-[params.py set() 976]-pid 20124-Changing key fnames in group data from None to ['C:\\Users\\Eric\\caiman_data\\example_movies\\cnmf_demo_data\\Sue_Split1.tif', 'C:\\Users\\Eric\\caiman_data\\example_movies\\cnmf_demo_data\\Sue_Split2.tif']
2023-05-06 19:38:21,565-INFO-[params.py set() 976]-pid 20124-Changing key rf in group patch from None to 15
2023-05-06 19:38:21,566-INFO-[params.py set() 976]-pid 20124-Changing key stride in group patch from None to 6
2023-05-06 19:38:21,567-INFO-[params.py set() 976]-pid 20124-Changing key p in group preprocess from 2 to 1
2023-05-06 19:38:21,567-INFO-[params.py set() 976]-pid 20124-Changing key nb in group init from 1 to 2
2023-05-06 19:38:21,568-INFO-[params.py set() 976]-pid 20124-Changing key K in group init from 30 to 4
2023-05-06 19:38:21,568-INFO-[params.py set() 976]-pid 20124-Changing key gSig in group init from [5, 5] to [4, 4]
2023-05-06 19:38:21,569-INFO-[params.py set() 976]-pid 20124-Changing key ssub in g

## Set up a cluster
To enable parallel processing a (local) cluster needs to be set up. This is done with the `setup_cluster()` function in the cell below, which sets up a pool of processors using the multiprocessing package. 

The variable `backend` determines the type of cluster used. The default value `'local'` uses the multiprocessing package. The `ipyparallel` option is also available. More information on these choices can be found [here](https://github.com/flatironinstitute/CaImAn/blob/master/docs/CLUSTER.md). You can set the number of processes (cpu cores) to use with the `n_processes` variable: if you feed the default `None` it will automatically select the number available minus 1. 

The resulting variable `dview` represents the multicore processing engine object to be used in subsequent processing steps, and it will be fed into subsequent stages in the processing pipeline. The name stands for `DirectView` which is a name from the ipyparallel package (it lets the user have a direct view of the different processes in a cluster of processes). 

You will see a `dview` parameter many times in later pipeline steps. If you pass `dview=dview`, feeding in your newly created parallel processing engine, then parallel processing will be used. If instead you pass `dview=None`, no parallel processing will be employed. This latter option can sometimes be useful when debugging, as the error messages are often simpler when using a single processor.

One note on performance: if you hit memory issues, you may want to lower the number of processors you are using. You can see how many processors you have in total with `psutil.cpu_count()`. Each additional process uses more RAM, so on a workstation with many processors you can often get much better performance by reducing `n_processes`. The best way to find the optimal number is coarse trial and error.
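
The `dview` convention described above can be sketched with the standard library alone. Everything in this block is illustrative rather than Caiman code: `run_tasks`, `square`, and `default_n_processes` are hypothetical names, and a `ThreadPool` stands in for the process pool so the example stays small and portable.

```python
import os
from multiprocessing.pool import ThreadPool

def default_n_processes():
    """Mimic the 'number available minus 1' default, never going below 1."""
    return max(1, (os.cpu_count() or 1) - 1)

def square(x):
    return x * x

def run_tasks(func, items, dview=None):
    """Run in parallel when a pool is supplied, serially when dview is None."""
    if dview is None:
        return list(map(func, items))
    return dview.map(func, items)

serial = run_tasks(square, range(5))                    # like dview=None
with ThreadPool(min(2, default_n_processes())) as pool:
    parallel = run_tasks(square, range(5), dview=pool)  # like dview=dview
print(serial, parallel)
```

Both calls return the same result. The serial path produces simpler tracebacks, which is why it can be convenient when debugging.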

In [88]:
print(f"You have {psutil.cpu_count()} CPUs available in your current environment")
processors_to_use = None  # Set to smaller number if you have memory problems

You have 16 CPUs available in your current environment


In [89]:
#start a cluster for parallel processing 
# note if a cluster already exists it will be closed so a new session will be opened
if 'dview' in locals():  # locals contains list of current local variables
    print('Closing previous cluster')
    cm.stop_server(dview=dview)
print("setting up cluster")
c, dview, n_processes = cm.cluster.setup_cluster(backend='local', 
                                                 n_processes=processors_to_use, 
                                                 single_thread=False,
                                                 ignore_preexisting=False)

2023-05-07 00:32:27,651-INFO-[cluster.py stop_server() 343]-pid 20124-stop_cluster(): done


Closing previous cluster
setting up cluster


## Motion Correction
First we create a motion correction object with the parameters specified. Note that the file is not loaded into memory.

In [None]:
# first we create a motion correction object with the parameters specified
mc = MotionCorrect(filenames, dview=dview, **opts.get_group('motion'))
# note that the file is not loaded in memory

Now perform motion correction. From the movie above we see that the dataset exhibits non-uniform motion. We will perform piecewise rigid motion correction using the NoRMCorre algorithm. This has already been selected by setting `pw_rigid=True` when defining the parameters object.
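
Registration methods of this kind rest on estimating how far a frame has moved relative to a template. The sketch below shows only that core idea for a single whole-frame, integer-pixel shift, using FFT-based cross-correlation in NumPy. It is not NoRMCorre and not Caiman's implementation, which works on overlapping patches, and `estimate_rigid_shift` is a hypothetical name.

```python
import numpy as np

def estimate_rigid_shift(template, frame):
    """Return (dy, dx) such that np.roll(frame, (dy, dx), axis=(0, 1)) matches template."""
    # Circular cross-correlation computed in the Fourier domain.
    xcorr = np.fft.ifft2(np.fft.fft2(template) * np.conj(np.fft.fft2(frame))).real
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Peaks past the midpoint correspond to negative shifts.
    return tuple(int(p if p <= n // 2 else p - n) for p, n in zip(peak, xcorr.shape))

rng = np.random.default_rng(0)
template_img = rng.standard_normal((64, 64))
moved = np.roll(template_img, (3, -5), axis=(0, 1))   # simulate brain motion

shift = estimate_rigid_shift(template_img, moved)
realigned = np.roll(moved, shift, axis=(0, 1))
print("estimated correction:", shift)
```

Applying the same estimate separately to each patch, while limiting how far a patch may deviate from the global shift, is the intuition behind the `strides`, `overlaps`, and `max_deviation_rigid` parameters set earlier.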

In [None]:
# %%capture
#%% Run piecewise-rigid motion correction using NoRMCorre
mc.motion_correct(save_movie=True)
m_els = cm.load(mc.fname_tot_els)
border_to_0 = 0 if mc.border_nan == 'copy' else mc.border_to_0 
    # maximum shift to be used for trimming against NaNs

Inspect the results by comparing them with the original movie. A more detailed presentation of the motion correction method can be found in the [demo motion correction](./demo_motion_correction.ipynb) notebook.

In [None]:
#%% compare with original movie
display_movie = False
if display_movie:
    movie_orig = cm.load_movie_chain(filenames)
    ds_ratio = 0.2
    cm.concatenate([movie_orig.resize(1, 1, ds_ratio) - mc.min_mov*mc.nonneg_movie,
                    m_els.resize(1, 1, ds_ratio)], 
                   axis=2).play(fr=60, gain=15, magnification=2, offset=0)  # press q to exit

## Memory mapping 

The cell below memory maps the file in order `'C'` and then loads the new memory mapped file. The saved files from motion correction are memory mapped files stored in `'F'` order. Their paths are stored in `mc.mmap_file`.
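
The reshape used when loading the memory-mapped file can look opaque, so here is a toy NumPy check of the index bookkeeping. It assumes the layout implied by that reshape: a pixels x frames matrix in which each column is one frame flattened in column-major (`'F'`) order. All names in the sketch are illustrative.

```python
import numpy as np

n_frames, d1, d2 = 5, 3, 4
toy_stack = np.arange(n_frames * d1 * d2, dtype=np.float32).reshape(n_frames, d1, d2)

# Build a pixels x frames matrix: each column is one frame flattened in 'F' order.
Yr_toy = np.stack([frame.flatten(order='F') for frame in toy_stack], axis=1)

# The same reshape as in the demo recovers the (T x X x Y) movie.
images_toy = np.reshape(Yr_toy.T, [n_frames, d1, d2], order='F')
frames_match = np.array_equal(images_toy, toy_stack)
print(Yr_toy.shape, "->", images_toy.shape, "| frames recovered:", frames_match)
```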

In [None]:
#%% MEMORY MAPPING
# memory map the file in order 'C'
fname_new = cm.save_memmap(mc.mmap_file, base_name='memmap_', order='C',
                           border_to_0=border_to_0, dview=dview) # exclude borders

# now load the file
Yr, dims, T = cm.load_memmap(fname_new)
images = np.reshape(Yr.T, [T] + list(dims), order='F') 
    #load frames in python format (T x X x Y)

Now restart the cluster to clean up memory

In [None]:
#%% restart cluster to clean up memory
cm.stop_server(dview=dview)
c, dview, n_processes = cm.cluster.setup_cluster(
    backend='multiprocessing', n_processes=None, single_thread=False)

## Run CNMF on patches in parallel

- The FOV is split into different overlapping patches that are subsequently processed in parallel by the CNMF algorithm.
- The results from all the patches are merged with special attention to identified components on the border.
- The results are then refined by additional CNMF iterations.
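
To visualize how `rf` and `stride` tile the field of view, the sketch below computes patch extents along one axis. It follows the parameter comments above (patches of width `2 * rf`, with `stride` pixels of overlap), but the exact tiling, including how the last patch is placed, is an assumption made for illustration and may differ from what Caiman does internally. `patch_slices` is a hypothetical helper.

```python
def patch_slices(length, rf, stride):
    """Overlapping 1D patches of width 2*rf that share `stride` pixels with neighbours."""
    size = 2 * rf
    step = size - stride
    if step <= 0:
        raise ValueError("stride (overlap) must be smaller than the patch width")
    starts = list(range(0, max(length - size, 0) + 1, step))
    if starts[-1] + size < length:
        starts.append(length - size)        # final patch sits flush with the edge
    return [slice(s, min(s + size, length)) for s in starts]

# Toy axis of 60 pixels with the demo's rf=15 and stride=6.
slices = patch_slices(60, rf=15, stride=6)
covered = sorted({i for s in slices for i in range(s.start, s.stop)})
print([(s.start, s.stop) for s in slices])
```

Every pixel falls inside at least one patch, and neighbouring patches share a band of pixels so that neurons lying on a patch border can be merged afterwards.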

In [None]:
%%capture
#%% RUN CNMF ON PATCHES
# First extract spatial and temporal components on patches and combine them
# for this step deconvolution is turned off (p=0). If you want to have
# deconvolution within each patch change params.patch['p_patch'] to a
# nonzero value
cnm = cnmf.CNMF(n_processes, params=opts, dview=dview)
cnm = cnm.fit(images)

## Run the entire pipeline up to this point with one command
It is possible to run the combined steps of motion correction, memory mapping, and CNMF fitting in one step, as shown below. The command is commented out since the analysis has already been performed. It is recommended that you familiarize yourself with the individual steps and their results before using it.

In [None]:
# cnm1 = cnmf.CNMF(n_processes, params=opts, dview=dview)
# cnm1.fit_file(motion_correct=True)

### Inspecting the results
Briefly inspect the results by plotting the contours of the identified components against the correlation image.
The results of the algorithm are stored in the object `cnm.estimates`. More information can be found in the definition of the `estimates` object and in the [wiki](https://github.com/flatironinstitute/CaImAn/wiki/Interpreting-Results).

In [None]:
#%% plot contours of found components
Cn = cm.local_correlations(images.transpose(1,2,0))
Cn[np.isnan(Cn)] = 0
cnm.estimates.plot_contours_nb(img=Cn)

## Re-run (seeded) CNMF  on the full Field of View  
You can re-run the CNMF algorithm seeded on just the selected components from the previous step. Be careful, because components rejected on the previous step will not be recovered here.

In [None]:
%%capture
#%% RE-RUN seeded CNMF on accepted patches to refine and perform deconvolution 
cnm2 = cnm.refit(images, dview=dview)

## Component Evaluation

The processing in patches creates several spurious components. These are filtered out by evaluating each component using three different criteria:

- the shape of each component must be correlated with the data at the corresponding location within the FOV
- a minimum peak SNR is required over the length of a transient
- each shape passes a CNN based classifier
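
The sketch below shows one way such thresholds can be combined into an accept/reject decision. The combination rule used here (a component is kept if it clears at least one of the three acceptance thresholds and its CNN score is not below `cnn_lowest`) is an assumption made for illustration, so check Caiman's documentation for the exact rule. The function name and the toy scores are hypothetical.

```python
import numpy as np

def accept_components(snr, rval, cnn, min_SNR=2.0, rval_thr=0.85,
                      min_cnn_thr=0.99, cnn_lowest=0.1):
    """Return indices of components kept under the assumed rule described above."""
    snr, rval, cnn = (np.asarray(v, dtype=float) for v in (snr, rval, cnn))
    passes_any = (snr >= min_SNR) | (rval >= rval_thr) | (cnn >= min_cnn_thr)
    not_vetoed = cnn >= cnn_lowest          # very low CNN scores always reject
    return np.flatnonzero(passes_any & not_vetoed)

# Made-up scores for four components.
toy_snr = [3.0, 1.0, 5.0, 1.5]
toy_rval = [0.90, 0.50, 0.95, 0.50]
toy_cnn = [0.50, 0.995, 0.05, 0.20]
kept = accept_components(toy_snr, toy_rval, toy_cnn)
print("accepted components:", kept.tolist())
```

In this toy example the third component is rejected despite a high SNR because its CNN score falls below `cnn_lowest`, and the fourth is rejected because it clears none of the acceptance thresholds.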

In [None]:
#%% COMPONENT EVALUATION
# the components are evaluated in three ways:
#   a) the shape of each component must be correlated with the data
#   b) a minimum peak SNR is required over the length of a transient
#   c) each shape passes a CNN based classifier

cnm2.estimates.evaluate_components(images, cnm2.params, dview=dview)

Plot contours of selected and rejected components

In [None]:
#%% PLOT COMPONENTS
cnm2.estimates.plot_contours_nb(img=Cn, idx=cnm2.estimates.idx_components)

View traces of accepted and rejected components. Note that if you get a data rate error, you can start Jupyter Notebook using:
`jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10`

In [None]:
# accepted components
cnm2.estimates.nb_view_components(img=Cn, idx=cnm2.estimates.idx_components)

In [None]:
# rejected components
if len(cnm2.estimates.idx_components_bad) > 0:
    cnm2.estimates.nb_view_components(img=Cn, idx=cnm2.estimates.idx_components_bad)
else:
    print("No components were rejected.")

### Extract DF/F values

In [None]:
#%% Extract DF/F values
cnm2.estimates.detrend_df_f(quantileMin=8, frames_window=250)
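
For intuition about the call above, here is a simplified NumPy sketch of a $\Delta{F}/F$ computation with a running-percentile baseline. It assumes that `quantileMin` acts as the baseline percentile and `frames_window` as the window length in frames. It is not Caiman's implementation, and `running_percentile_dff` is a hypothetical name.

```python
import numpy as np

def running_percentile_dff(trace, percentile=8, window=250):
    """(F - F0) / F0 with F0 taken as a running low percentile of the trace."""
    trace = np.asarray(trace, dtype=float)
    half = window // 2
    baseline = np.empty_like(trace)
    for t in range(trace.size):
        lo, hi = max(0, t - half), min(trace.size, t + half + 1)
        baseline[t] = np.percentile(trace[lo:hi], percentile)
    return (trace - baseline) / baseline    # assumes a strictly positive baseline

# Toy trace: flat fluorescence of 10 with a brief transient up to 20.
toy_trace = np.full(1000, 10.0)
toy_trace[500:510] = 20.0
toy_dff = running_percentile_dff(toy_trace)
print("peak dF/F:", toy_dff.max())
```

Because the transient occupies only a small fraction of each window, the low percentile tracks the resting fluorescence rather than the peaks.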

### Select only high quality components

In [None]:
cnm2.estimates.select_components(use_object=True)

## Display final results

In [None]:
cnm2.estimates.nb_view_components(img=Cn, denoised_color='red')
print('you may need to change the data rate to generate this one: use jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10 before opening jupyter notebook')

## Closing, saving, and creating denoised version
### You can save an hdf5 file with all the fields of the cnmf object

In [None]:
save_results = False
if save_results:
    cnm2.save('analysis_results.hdf5')

### Stop cluster and clean up log files

In [None]:
#%% STOP CLUSTER and clean up log files
cm.stop_server(dview=dview)
log_files = glob.glob('*_LOG_*')
for log_file in log_files:
    os.remove(log_file)

### View movie with the results
We can inspect the denoised results by reconstructing the movie and playing it alongside the original data and the resulting (amplified) residual movie.

In [None]:
cnm2.estimates.play_movie(images, q_max=99.9, gain_res=2,
                                  magnification=2,
                                  bpx=border_to_0,
                                  include_bck=False)

The denoised movie can also be explicitly constructed using:

In [None]:
#%% reconstruct denoised movie
denoised = cm.movie(cnm2.estimates.A.dot(cnm2.estimates.C) + \
                    cnm2.estimates.b.dot(cnm2.estimates.f)).reshape(dims + (-1,), order='F').transpose([2, 0, 1])
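
The shapes involved in this reconstruction are easier to follow on a toy example. The sketch below uses small random dense matrices in place of the real estimates, so all sizes and values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, n_frames = 4, 5, 6       # toy field of view and number of frames
K, nb = 3, 2                     # neural and background components

A = rng.random((d1 * d2, K))     # spatial footprints (pixels x components)
C = rng.random((K, n_frames))    # temporal traces (components x frames)
b = rng.random((d1 * d2, nb))    # background spatial components
f = rng.random((nb, n_frames))   # background temporal components

Y_hat = A @ C + b @ f            # pixels x frames
denoised_toy = Y_hat.reshape((d1, d2, n_frames), order='F').transpose(2, 0, 1)
print(Y_hat.shape, "->", denoised_toy.shape)
```

Pixel `(i, j)` of frame `t` in the reshaped movie corresponds to row `i + d1 * j` and column `t` of the matrix product, which is why the reshape uses `order='F'`.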