
# CNMF demo pipeline: Intro
This demo presents a full pipeline for the analysis of a two-photon calcium imaging dataset using the CaImAn (**Ca**lcium **Im**aging **An**alysis) software package. Starting with loading the original movie, it demonstrates how to use Caiman's built-in tools for the following analysis steps:

![temporary workflow image](../../docs/img/quickintro.png)

- Use the NoRMCorre (nonrigid motion correction) algorithm for motion correction.
- Use the constrained nonnegative matrix factorization (CNMF) algorithm to extract an initial estimate of the neurons' locations, calcium traces, and firing rates.
- Use quality control metrics to evaluate the initial estimates and narrow them down to the final set of estimates.
- Extract normalized traces $\Delta F/F$.

The CNMF algorithm is best for data with relatively low background noise, such as most two-photon data and *some* one-photon data (for example, certain light-sheet data). For a demo analysis pipeline of a one-photon microendoscopic data set see `demo_pipeline_cnmfE.ipynb`.

The dataset used in this demo was provided courtesy of Sue Ann Koay and David Tank (Princeton University). 

<div class="alert alert-info">
    <h2 style="margin-top: 0;">Getting more help</h2>
    More detailed background information about CNMF can be found in the <a href="https://pubmed.ncbi.nlm.nih.gov/26774160/">original CNMF paper</a> and <a href="https://pubmed.ncbi.nlm.nih.gov/30652683/">the Caiman paper</a>. If you have specific questions about this demo or the underlying algorithms, you can ask them at our <a href="https://app.gitter.im/#/room/#agiovann_Constrained_NMF:gitter.im">Gitter channel</a>. If you find a bug or have a feature request, feel free to <a href="https://github.com/flatironinstitute/CaImAn/issues">open an issue at our GitHub repo</a>.
</div>

## Imports and general setup
We first need to import the Python libraries we will use in the rest of the notebook and tweak some general settings. Don't worry about these details now: we will explain the important things when they come up.

In [None]:
import bokeh.plotting as bpl
import cv2
import glob
import logging
import matplotlib.pyplot as plt
import numpy as np
import os
import psutil
from pathlib import Path

try:
    cv2.setNumThreads(0)
except Exception:
    pass

try:
    if __IPYTHON__:
        # reloads modules automatically when they are changed
        get_ipython().run_line_magic('load_ext', 'autoreload')
        get_ipython().run_line_magic('autoreload', '2')
except NameError:
    pass

import caiman as cm
from caiman.motion_correction import MotionCorrect
from caiman.source_extraction.cnmf import cnmf as cnmf
from caiman.source_extraction.cnmf import params as params
from caiman.utils.utils import download_demo
from caiman.utils.visualization import plot_contours, nb_view_patches, nb_plot_contour

bpl.output_notebook()

Continuing with our basic setup, we will set up a logger, and also set environment variables in case that wasn't done already in your shell. If you want to learn more about Caiman's logger functionality, or to tweak your logger, see [Appendix 1: Logging](#logging_explained). 

In [None]:
# set up logging
logging.basicConfig(format="{asctime} - {levelname} - [{filename} {funcName}() {lineno}] - pid {process} - {message}",
                    filename=None, 
                    level=logging.DEBUG, style="{") #logging level can be DEBUG, INFO, WARNING, ERROR, CRITICAL

# set env variables 
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["VECLIB_MAXIMUM_THREADS"] = "1"

## Specify data to be processed
Now that our general setup is done, it's time to specify what movie file we want to process. 

For this demo, we are going to use Caiman's built-in `download_demo()` function to download our demo data file, `Sue_2x_3000_40_-46.tif`. The file is saved to `~/caiman_data/example_movies/`, where `~` is your home directory (the path format and home directory will depend on your operating system).

This movie, provided by Sue Koay and David Tank, is two-photon data from the parietal cortex of a transgenic GCaMP6f-expressing mouse during a virtual reality task. It was collected at 30 Hz, originally with dimensions of 512 x 512 pixels (500 um x 500 um FOV imaged at a depth of 125 um). Note that to save space, the demo movie has been spatially cropped and downsampled by a factor of 2, so the resolution is lower than the original.

In [None]:
movie_path = download_demo('Sue_2x_3000_40_-46.tif')
print(f"Original movie for demo is in {movie_path}")

If you are adapting this demo for your own data, you can drop in a path to your movie in the `movie_path` variable in the above cell. E.g.:

    movie_path = 'full/path/to/your/movie.filetype'

If you have a recording system that breaks up the data across multiple files, see [Appendix 2: Working with multiple files](#multiple_files). 

<div class="alert alert-info">
    <h2 style="margin-top: 0;">File types that Caiman can read</h2>
    While this demo uses a movie that has been stored in <i>tif</i> format, Caiman can handle movies in multiple common (and not so common) formats, including:

    hdf5/h5, n5, zarr, avi, nwb, npz
    
</div>

## Load and play the movie
Caiman has a built-in movie class that you can use to view your movie. 

The movie object has many convenient features. Once you have loaded the movie (using `cm.load()`), you can view your raw data using `movie.play()`. This `play()` function has multiple parameters you can explore, including: 

    gain: brightness 
    fr:  frame rate
    magnification: scale the size of the display  
    q_max, q_min: percentiles for setting vmax, vmin -- below vmin is black, above vmax is white
    plot_text (Bool): show the frame number
    
The movie object also has a `resize()` method, which we use below to downsample the movie in time by a factor of 5 (`downsampling_ratio = 0.2`) before playing it.

Playing the movie uses the OpenCV library, so if you set `display_movie` to `True`, the following cell will run a blocking function (a function that blocks execution of all other code until it is stopped), opening a separate window which doesn't run in Jupyter. You will need to press `q` on that window to close it. 

In [None]:
display_movie = True
# loaded outside the if-block because later cells use movie_orig
movie_orig = cm.load(movie_path)  # add subindices here if you want to load partial movie for viewing
if display_movie:
    downsampling_ratio = 0.2
    movie_orig.resize(fz=downsampling_ratio).play(gain=1.3,
                                                  q_max=99.5, 
                                                  fr=30, 
                                                  plot_text=True,
                                                  magnification=2,
                                                  backend='opencv')

In [None]:
movie_orig.shape

<div class="alert alert-info" markdown="1">
    <h2 style="margin-top: 0;">Displaying large files</h2>
    Loading a movie pulls all of the data into memory, so you might need to adapt the above code when working with extremely large files. For tips on how to display large files, see <a href="#display_large">Appendix 3: Displaying large files</a>.
</div>

## Set up parameters
Algorithms like motion correction and CNMF are run using estimators that are initialized with a set of parameters. Here, we'll define a parameter object that will subsequently be used to initialize our estimators. You will notice they are broken up into different categories (we will not discuss them in detail here, but will go over them when we reach the relevant stages of the pipeline, with this notebook focused on those relevant for CNMF):

In [None]:
# generic dataset-dependent parameters
fr = 30                     # imaging rate in frames per second
decay_time = 0.4            # length of a typical transient in seconds
dxy = (2., 2.)              # spatial resolution in x and y in (um per pixel)

# motion correction parameters
strides = (48, 48)          # start a new patch for pw-rigid motion correction every x pixels
overlaps = (24, 24)         # overlap between patches (size of patch is strides+overlaps)
max_shifts = (6,6)          # maximum allowed rigid shifts (in pixels)
max_deviation_rigid = 3     # maximum shifts deviation allowed for patch with respect to rigid shifts
pw_rigid = True             # flag for performing non-rigid motion correction

# parameters for source extraction and deconvolution
p = 1                       # order of the autoregressive system
gnb = 2                     # number of global background components
merge_thr = 0.85            # merging threshold, max correlation allowed
rf = 15                     # half-size of the patches in pixels. e.g., if rf=25, patches are 50x50
stride_cnmf = 6             # amount of overlap between the patches in pixels 
K = 4                       # number of components per patch
gSig = [4, 4]               # expected half size of neurons in pixels
method_init = 'greedy_roi'  # initialization method (if analyzing dendritic data, use 'sparse_nmf')
ssub = 1                    # spatial subsampling during initialization
tsub = 1                    # temporal subsampling during initialization

# parameters for component evaluation
min_SNR = 2.0               # signal to noise ratio for accepting a component
rval_thr = 0.85             # space correlation threshold for accepting a component
cnn_thr = 0.99              # threshold for CNN based classifier
cnn_lowest = 0.1            # neurons with cnn probability lower than this value are rejected

We place the above parameter values in a dictionary, which we then pass to the `CNMFParams` class that defines our parameters object (the parameters *not* explicitly defined in the dictionary will assume default values): 

In [None]:
parameter_dict = {'fnames': movie_path,
            'fr': fr,
            'dxy': dxy,
            'decay_time': decay_time,
            'strides': strides,
            'overlaps': overlaps,
            'max_shifts': max_shifts,
            'max_deviation_rigid': max_deviation_rigid,
            'pw_rigid': pw_rigid,
            'p': p,
            'nb': gnb,
            'rf': rf,
            'K': K, 
            'gSig': gSig,
            'stride': stride_cnmf,
            'method_init': method_init,
            'rolling_sum': True,
            'only_init': True,
            'ssub': ssub,
            'tsub': tsub,
            'merge_thr': merge_thr, 
            'min_SNR': min_SNR,
            'rval_thr': rval_thr,
            'use_cnn': True,
            'min_cnn_thr': cnn_thr,
            'cnn_lowest': cnn_lowest}

parameters = params.CNMFParams(params_dict=parameter_dict) # CNMFParams is the parameters class

This parameters object (`parameters`) is effectively a collection of dictionaries that each contains parameters relevant to different settings, and these different collections can be accessed using dot notation.  Some are related to the dataset in general (`parameters.data`), while most are related to specific aspects of the workflow depicted in the introductory section above such as motion correction (`parameters.motion`), quality evaluation (`parameters.quality`), and others.

For instance, if you want to inspect the dataset-dependent params:

In [None]:
# check a parameter set, if you want
parameters.data
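
To make the "collection of dictionaries" idea concrete, here is a minimal sketch of a grouped parameters object. `ToyParams`, its two groups, and its default values are hypothetical and chosen only for illustration; this is not how `CNMFParams` is implemented, it only mimics the behavior described above (dot access to groups, defaults for anything not passed in).

```python
class ToyParams:
    """Toy stand-in for a grouped parameters object (not Caiman's CNMFParams)."""

    def __init__(self, params_dict=None):
        # Illustrative groups and default values, chosen only for this sketch
        self.data = {"fr": 30, "decay_time": 0.4}
        self.motion = {"pw_rigid": False, "max_shifts": (6, 6)}
        for key, value in (params_dict or {}).items():
            self.set_everywhere(key, value)

    def set_everywhere(self, key, value):
        """Overwrite `key` in every group that already defines it."""
        groups = [g for g in (self.data, self.motion) if key in g]
        if not groups:
            raise KeyError(f"unknown parameter: {key}")
        for group in groups:
            group[key] = value


toy = ToyParams(params_dict={"fr": 15, "pw_rigid": True})
print(toy.data)    # 'fr' was overridden, 'decay_time' keeps its default
print(toy.motion)  # 'pw_rigid' was overridden, 'max_shifts' keeps its default
```

The flat input dictionary is convenient to write, while the grouped storage makes it easy to hand each pipeline stage only the settings it needs.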

<div class="alert alert-info" markdown="1">
    <h2 style="margin-top: 0;">To dig deeper into this design</h2>  
    To see more about the design of Caiman, and the decoupling of estimators and parameters, see <a href="#caiman_estimators">Appendix 4: Estimators and parameters</a>.
</div>

## Setting up a cluster
Caiman is optimized for parallel computing, using multiple CPU cores for motion correction and CNMF (there is also an option to run motion correction on the GPU, but we will not focus on that here). Setting up the multicore processing is done with the `setup_cluster()` function below. We will just set it up quickly with the defaults, but if you want more details, please see [Appendix 5: Cluster Setup](#the_cluster). 

In [None]:
print(f"You have {psutil.cpu_count()} CPUs available in your current environment")
num_processors_to_use = 8  # set to None to use one less than the total number of CPU cores

In [None]:
#start a cluster for parallel processing 
# note if a cluster already exists it will be closed so a new session will be opened
if 'dview' in locals():  # locals contains list of current local variables
    print('Closing previous cluster')
    cm.stop_server(dview=dview)
print("Setting up new cluster")
c, dview, n_processes = cm.cluster.setup_cluster(backend='multiprocessing', 
                                                 n_processes=num_processors_to_use, 
                                                 single_thread=False,
                                                 ignore_preexisting=False)
print(f"Successfully set up cluster with {n_processes} processes")

We won't focus much on the outputs, except for `dview`: `dview` is the understated name of the multicore processing object that will be fed into Caiman's subsequent processing steps. In these later steps, if you set `dview=dview`, then parallel processing will be used. If instead you use `dview=None`, then no parallel processing will be used. This latter option is crucial when debugging because, among other reasons, the logger doesn't work for many multiprocessing functions.

<div class="alert alert-info" markdown="1">
    <h2 style="margin-top: 0;">Optimizing performance</h2>  
If you hit memory issues later, there are a few things you can do. First, you may want to lower the number of processors you are using. Each processor uses more RAM, and on a workstation with many processors, you can sometimes get better performance by reducing <em>num_processors_to_use</em>. Unfortunately, this is a bit of an art form, so the best way to determine the optimal number is by trial and error. When you set <em>num_processors_to_use</em> variable to <em>None</em>, it defaults to <i>one</i> less than the total number of CPU cores available (the reason we don't automatically set it to the total number of cores is because in practice this almost universally leads to worse performance). 

Second, if your system has less than 32GB of RAM, and things are running slowly or you are running out of memory, then get more RAM. While you can sometimes get away with less, we recommend a *bare minimum* level of 16GB of RAM, but more is better. 32GB RAM is acceptable, 64GB is good, 128GB is great. Obviously, this will depend on the size of your data sets. 
</div>
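
The default described above (one less than the total number of cores) can be sketched with the standard library. `default_n_processes` is a hypothetical helper written for this illustration, not a Caiman function, and the floor of one process on a single-core machine is an assumption of the sketch.

```python
import os


def default_n_processes(total_cores=None):
    """Return one less than the number of cores, but never less than 1.

    Hypothetical helper mirroring the default described above; not a Caiman function.
    """
    if total_cores is None:
        total_cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, total_cores - 1)


print(default_n_processes(8))  # 8 cores -> 7 processes
print(default_n_processes(1))  # single-core machine -> 1 process
print(default_n_processes())   # value for the current machine
```

When memory is the bottleneck, you would pass an explicit, smaller number to `setup_cluster()` instead of relying on this default.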

## Motion Correction
First, we initialize a motion correction object by providing the filename(s), the multiprocessing pool `dview`, and the motion parameters as inputs. Note that the file is not loaded into memory: it is passed to the object, which will later perform out-of-core computations on memory mapped files when the algorithm is run. This is a general feature of all larger-scale memory-intensive computations in Caiman.

In [None]:
# first we create a motion correction object with the parameters specified
mot_correct = MotionCorrect(movie_path, dview=dview, **parameters.motion)

Now perform motion correction. From the movie above we see that the dataset exhibits non-uniform motion. We will perform piecewise rigid motion correction using the NoRMCorre algorithm. This has already been selected by setting `pw_rigid=True` when defining the parameters object.

You may see some warnings about negative movie averages: you can ignore them.

In [None]:
parameters.motion

In [None]:
# For motion correction patches (patch width is the sum of overlap and stride)

In [None]:
# strides = (48, 48)          # start a new patch for pw-rigid motion correction every x pixels
# overlaps = (24, 24)         # overlap between patches (size of patch is strides+overlaps)

In [None]:
# Grid position and top-left corner (x, y) of every motion correction patch
dims = movie_orig.shape[1:]              # (rows, columns) of the field of view
window_size = np.add(overlaps, strides)  # patch size = overlap + stride
range_1 = list(range(0, dims[0] - window_size[0], strides[0])) + [dims[0] - window_size[0]]
range_2 = list(range(0, dims[1] - window_size[1], strides[1])) + [dims[1] - window_size[1]]
patch_corners = [(dim_1, dim_2, x, y)
                 for dim_1, x in enumerate(range_1)
                 for dim_2, y in enumerate(range_2)]
patch_corners

In [None]:
# For CNMF patches

In [None]:
dims = movie_orig.shape[1:]
rf, stride_cnmf, dims

In [None]:
# patch centers along each dimension (rf and stride_cnmf are scalars in this notebook)
iters = [list(range(rf, dims[i] - rf, 2 * rf - stride_cnmf)) + [dims[i] - rf] for i in range(len(dims))]

In [None]:
# rf = 15                     # half-size of the patches in pixels. e.g., if rf=25, patches are 50x50
# stride_cnmf = 6             # amount of overlap between the patches in pixels 
# actual stride: 2*rf-stride_cnmf

In [None]:
range_i = list(range(rf, dims[0] - rf, 2*rf - stride_cnmf)) + [dims[0] - rf] # stride_cnmf is actually overlap
range_i
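
The expression in the cell above can be worked through by hand. The sketch below wraps it in a small function (`patch_centers` is a hypothetical name) and prints the resulting patch centers and extents. The values `rf=15` and an overlap of 6 match the parameters set earlier; the axis length of 170 pixels is an assumption made for the example.

```python
def patch_centers(dim, rf, overlap):
    """Centers of patches of half-size `rf` along one axis of length `dim`.

    Consecutive centers are 2*rf - overlap pixels apart, and a final patch is
    placed so that it ends at the edge of the field of view.
    """
    step = 2 * rf - overlap
    return list(range(rf, dim - rf, step)) + [dim - rf]


rf_demo, overlap_demo, dim_demo = 15, 6, 170  # dim_demo is an assumed axis length
centers = patch_centers(dim_demo, rf_demo, overlap_demo)
extents = [(c - rf_demo, c + rf_demo) for c in centers]
print(centers)
print(extents)
```

Neighboring patches share 6 pixels here, except for the last one, which is shifted inward to fit the field of view and therefore overlaps its neighbor by more.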

In [None]:
%%time
#%% Run piecewise-rigid motion correction using NoRMCorre
mot_correct.motion_correct(save_movie=True)

Optionally inspect the results by comparing them with the original movie. Note that we are turning the gain up here to highlight motion.

In [None]:
#%% compare with original movie
display_movie = False
if display_movie:
    movie_corrected = cm.load(mot_correct.fname_tot_els)
    movie_orig = cm.load(movie_path)
    ds_ratio = 0.2
    cm.concatenate([movie_orig.resize(1, 1, ds_ratio) - mot_correct.min_mov*mot_correct.nonneg_movie,
                    movie_corrected.resize(1, 1, ds_ratio)], 
                    axis=2).play(fr=30, 
                                 gain=5, 
                                 magnification=2)  # press q to exit

<div class="alert alert-info" markdown="1">
    <h2 style="margin-top: 0;">For more on motion correction</h2>  
In this CNMF demo, we are mainly <em>applying</em> motion correction without going in-depth about how it is done. For a demo that provides more fine-grained analysis of Caiman's motion correction pipeline, see <a href="./demo_motion_correction.ipynb">demo_motion_correction.ipynb</a>.
</div>

## Memory (re)mapping 
When you ran motion correction, many things went on behind the scenes. One of them: the motion corrected data was saved as a memory mapped file in `'F'` order, and the name of that file is stored in `mot_correct.mmap_file`. `'F'` order is optimal for motion correction. The cell below saves the same motion corrected data in another memory mapped file in `'C'` order, which is optimal for CNMF, and stores the new filename in `motion_corrected_fname`.

In [None]:
#%% MEMORY MAPPING
# memory map the file in order 'C'
border_to_0 = 0 if mot_correct.border_nan == 'copy' else mot_correct.border_to_0 # trim border against NaNs
motion_corrected_fname = cm.save_memmap(mot_correct.mmap_file, 
                                       base_name='memmap_', 
                                       order='C',
                                       border_to_0=border_to_0,  # exclude borders, if that was done
                                       dview=dview)

In [None]:
print(f"Memory mapped file inventory:")
print(f"\nF-order from motion correction:\n{mot_correct.mmap_file}")
print(f"\nC-order to be used for CNMF:\n{motion_corrected_fname}")

In [None]:
# now load the file
Yr, dims, T = cm.load_memmap(motion_corrected_fname)
images = np.reshape(Yr.T, [T] + list(dims), order='F') #reshape frames in standard 3d format (T x X x Y)

# note images are memmapped, so not loaded into memory
print(images.shape, type(images))
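
To see what the reshape in the cell above does, here is a small NumPy-only sketch on a made-up movie of 4 frames with 2 x 3 pixels. The names `movie`, `Yr_demo`, and `images_demo` are hypothetical. It assumes, as the reshape above implies, that each frame is flattened column by column (`'F'` order) into one column of a pixels x time matrix.

```python
import numpy as np

T, d1, d2 = 4, 2, 3  # tiny movie: 4 frames of 2 x 3 pixels
movie = np.arange(T * d1 * d2).reshape(T, d1, d2)

# Flatten every frame column by column ('F' order): result is pixels x time
Yr_demo = np.reshape(movie, (T, d1 * d2), order='F').T

# Same reshape as in the cell above: back to time x rows x columns
images_demo = np.reshape(Yr_demo.T, [T, d1, d2], order='F')

print(Yr_demo.shape)                       # (6, 4)
print(Yr_demo[:, 0])                       # first frame, read down its columns
print(np.array_equal(images_demo, movie))  # True
```

The round trip recovers the original movie exactly, which is why the pipeline can switch between the 2D matrix that CNMF works on and the 3D stack used for display.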

Restart the cluster to clean up memory in prep for CNMF run.

In [None]:
#%% restart cluster to clean up memory
cm.stop_server(dview=dview)
c, dview, n_processes = cm.cluster.setup_cluster(backend='multiprocessing', 
                                                 n_processes=num_processors_to_use, 
                                                 single_thread=False)

## Run CNMF on patches in parallel
- The FOV is split into different overlapping patches that are subsequently processed in parallel by the CNMF algorithm.
- The results from all the patches are merged with special attention to identified components on the border.
- The results are then refined by additional CNMF iterations.

Note that CNMF first extracts spatial and temporal components on patches and combines them. For this step, deconvolution is turned off (`p=0`). If you want deconvolution within each patch, change `parameters.patch['p_patch']` to a nonzero value.

To do:
- Make a general patch drawing mechanism: maybe put into utils, have it take in stride, overlap.
- Plot stuff. 
- Put caveats in main text about this. 
- Explain these params better

Places in code:
- [CNMF Patch](https://github.com/flatironinstitute/CaImAn/blob/80e1681bbce8fb4a36b04c57d9e42ffb75a32d58/caiman/cluster.py#LL66C1-L66C119)
- [Motion patch](https://github.com/flatironinstitute/CaImAn/blob/80e1681bbce8fb4a36b04c57d9e42ffb75a32d58/caiman/motion_correction.py#L2197)

For figure: combine the stride and overlap picture from the motion correction slide with the patch picture used for CNMF.

In [None]:
parameters.patch

In [None]:
[slice(None)] * 2

In [None]:
from caiman.motion_correction import sliding_window

In [None]:
sliding_window?

In [None]:
def motion_patch_coords(template, patch_stride, patch_overlap):
    """
    Split a template image into the patches used for piecewise-rigid motion correction.

    template: ndarray
        reference image, e.g. mot_correct.total_template_els

    patch_stride: int
        stride of the patches into which the FOV is subdivided (same along both dimensions)

    patch_overlap: int
        number of pixels overlapping between patches (same along both dimensions)

    Returns the list of patch images and the list of (x, y) top-left patch corners.
    """
    overlaps = (patch_overlap, patch_overlap)
    strides = (patch_stride, patch_stride)
    # sliding_window yields (grid row, grid column, x, y, patch image) for each patch
    windows = list(sliding_window(template, overlaps=overlaps, strides=strides))
    imgs = [window[-1] for window in windows]
    xy_corners = [(window[2], window[3]) for window in windows]
    return imgs, xy_corners
print('done')

In [None]:
patch_stride = 15
patch_overlap = 5
patch_width = patch_stride + patch_overlap
window_dim = 150

In [None]:
range_col = list(range(0, window_dim - patch_width, patch_stride)) + [window_dim - patch_width] # concatenates final elt

In [None]:
range_col

In [None]:
patch_stride = 15
patch_overlap = 6

In [None]:
imgs, other_thing = motion_patch_coords(correlation_image,
                           patch_stride,
                           patch_overlap)


In [None]:
other_thing

In [None]:
def get_patch_ranges(im_dims, overlap, stride):
    """
    Start positions of the motion correction patches along rows and columns.

    overlap (int): overlap between patches in pixels, e.g. 24 (patch size is stride + overlap)
    stride (int): a new patch starts every `stride` pixels, e.g. 48
    """
    overlaps = (overlap, overlap)
    strides = (stride, stride)
    window_size = np.add(overlaps, strides)
    range_1 = list(range(
        0, im_dims[0] - window_size[0], strides[0])) + [im_dims[0] - window_size[0]]
    range_2 = list(range(
        0, im_dims[1] - window_size[1], strides[1])) + [im_dims[1] - window_size[1]]
    return np.array(range_1), np.array(range_2)

In [None]:
def get_patch_ranges(im_dims, patch_width, patch_overlap):
    """
    Simple patch extraction (assumes overlap is less than width)
    Assumes x/y width/overlap are symmetric
    
    im_dims (tuple): image dimensions as (rows, cols)
    patch_width (int): patch width in pixels
    patch_overlap (int): overlap between neighboring patches in pixels
    """
    step_size = patch_width - patch_overlap
    start_range_rows = np.arange(0, im_dims[0], step_size) 
    stop_range_rows = np.arange(patch_width, im_dims[0], step_size)
    range_rows = (start_range_rows[:-1], stop_range_rows)
    
    start_range_cols = np.arange(0, im_dims[1], step_size) 
    stop_range_cols = np.arange(patch_width, im_dims[1], step_size)
    range_cols = (start_range_cols[:-1], stop_range_cols)
    return range_rows, range_cols
print('done')

In [None]:
dims

In [None]:
patch_overlap = parameters.patch['stride']
patch_width = parameters.patch['rf']*2
print(patch_overlap, patch_width)

In [None]:
range_rows, range_cols = get_patch_ranges(dims, patch_width, patch_overlap)
print('done')

In [None]:
# Get the correlation image (Cn)

In [None]:
cm.local_correlations?

In [None]:
correlation_image = cm.local_correlations(images, swap_dim=False)
correlation_image[np.isnan(correlation_image)] = 0

In [None]:
plt.imshow(correlation_image, cmap='gray', vmax=0.6)
for startloc in range_rows[0]:
    plt.axhline(startloc, color='pink', linewidth=2)
for stoploc in range_rows[1]:
    plt.axhline(stoploc, color='pink', linestyle='--', linewidth=0.5)
    
for startloc in range_cols[0]:
    plt.axvline(startloc, color='green')
for stoploc in range_cols[1]:
    plt.axvline(stoploc, color='lime', linestyle='--', linewidth=0.5)

In [None]:
%%time
cnm = cnmf.CNMF(n_processes, 
                params=parameters, 
                dview=dview)

images and such might want to futz with parameters: show table of parameters

In [None]:
cnm = cnm.fit(images)

## Run the entire pipeline up to this point with one command
It is possible to run the combined steps of motion correction, memory mapping, and CNMF fitting in one step, as shown below. The command is commented out since the analysis has already been performed. We recommend that you familiarize yourself with the individual steps and their results before using it.

In [None]:
# cnm1 = cnmf.CNMF(n_processes, params=parameters, dview=dview)
# cnm1.fit_file(motion_correct=True)

### Inspecting the results
Briefly inspect the results by plotting contours of identified components against correlation image.
The results of the algorithm are stored in the object `cnm.estimates`. More information can be found in the definition of the `estimates` object and in the [wiki](https://github.com/flatironinstitute/CaImAn/wiki/Interpreting-Results).

In [None]:
#%% plot contours of found components
Cn = cm.local_correlations(images.transpose(1,2,0))
Cn[np.isnan(Cn)] = 0
cnm.estimates.plot_contours_nb(img=Cn)

## Re-run (seeded) CNMF on the full Field of View
You can re-run the CNMF algorithm seeded on just the selected components from the previous step. Be careful, because components rejected on the previous step will not be recovered here.

In [None]:
cnmf.CNMF.refit?

In [None]:
%%capture
#%% RE-RUN seeded CNMF on accepted patches to refine and perform deconvolution 
cnm2 = cnm.refit(images, dview=dview)

## Component Evaluation

The processing in patches creates several spurious components. These are filtered out by evaluating each component using three different criteria:

- the shape of each component must be correlated with the data at the corresponding location within the FOV
- a minimum peak SNR is required over the length of a transient
- each shape passes a CNN based classifier

In [None]:
#%% COMPONENT EVALUATION
# the components are evaluated in three ways:
#   a) the shape of each component must be correlated with the data
#   b) a minimum peak SNR is required over the length of a transient
#   c) each shape passes a CNN based classifier

cnm2.estimates.evaluate_components(images, cnm2.params, dview=dview)
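
The three criteria and the thresholds defined earlier (`min_SNR`, `rval_thr`, `min_cnn_thr`, `cnn_lowest`) can be combined in more than one way. The sketch below assumes one plausible rule: a component is kept if it passes at least one acceptance threshold and does not fall below any rejection threshold. The function `accept_component` and the placeholder values for `SNR_lowest` and `rval_lowest` are hypothetical; this only illustrates the logic and is not Caiman's implementation.

```python
def accept_component(snr, rval, cnn_prob,
                     min_SNR=2.0, rval_thr=0.85, min_cnn_thr=0.99,
                     SNR_lowest=0.5, rval_lowest=-1.0, cnn_lowest=0.1):
    """Toy decision rule: pass at least one high threshold and fail no low threshold.

    The SNR_lowest and rval_lowest defaults are placeholder values for this sketch.
    """
    passes_any_high = snr >= min_SNR or rval >= rval_thr or cnn_prob >= min_cnn_thr
    fails_any_low = snr < SNR_lowest or rval < rval_lowest or cnn_prob < cnn_lowest
    return passes_any_high and not fails_any_low


print(accept_component(snr=3.0, rval=0.5, cnn_prob=0.5))   # high SNR, nothing below a floor
print(accept_component(snr=3.0, rval=0.9, cnn_prob=0.05))  # CNN probability below cnn_lowest
print(accept_component(snr=1.0, rval=0.5, cnn_prob=0.5))   # passes no acceptance threshold
```

Under this assumed rule, a strict `min_cnn_thr` such as 0.99 mostly rescues components with a clear neuronal shape, while `cnn_lowest` removes components the classifier considers very unlikely to be neurons.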

Plot contours of selected and rejected components

In [None]:
#%% PLOT COMPONENTS
cnm2.estimates.plot_contours_nb(img=Cn, idx=cnm2.estimates.idx_components)

View traces of accepted and rejected components. Note that if you get a data rate error, you can start Jupyter Notebook with a higher limit using:
`jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10`

In [None]:
# accepted components
cnm2.estimates.nb_view_components(img=Cn, idx=cnm2.estimates.idx_components)

In [None]:
# rejected components
if len(cnm2.estimates.idx_components_bad) > 0:
    cnm2.estimates.nb_view_components(img=Cn, idx=cnm2.estimates.idx_components_bad)
else:
    print("No components were rejected.")

### Extract DF/F values

In [None]:
#%% Extract DF/F values
cnm2.estimates.detrend_df_f(quantileMin=8, frames_window=250)
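
The idea behind the two arguments above can be illustrated with a simplified calculation: for each frame, the baseline $F_0$ is taken as a low percentile (`quantileMin=8`) of the trace in a window of `frames_window=250` frames around it, and the normalized trace is $(F - F_0)/F_0$. The function `running_percentile_dff` is a hypothetical, pure-Python sketch that assumes a positive baseline. Caiman's `detrend_df_f` is more involved, so treat this as an illustration of the percentile baseline only.

```python
def running_percentile_dff(trace, quantile_min=8, frames_window=250):
    """Simplified dF/F: baseline F0 is a percentile of the trace in a window around each frame."""
    half = frames_window // 2
    dff = []
    for t, value in enumerate(trace):
        window = sorted(trace[max(0, t - half): t + half + 1])
        # nearest-rank percentile of the window serves as the baseline F0
        rank = round(quantile_min / 100 * (len(window) - 1))
        f0 = window[rank]
        dff.append((value - f0) / f0)
    return dff


# Synthetic trace: constant fluorescence of 100 with one transient of amplitude 50
trace = [100.0] * 600
trace[300] = 150.0
dff = running_percentile_dff(trace)
print(dff[0], dff[300])
```

A low percentile is used instead of the mean so that transients, which only push the signal upward, have little influence on the baseline estimate.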

### Select only high quality components

In [None]:
cnm2.estimates.select_components(use_object=True)

## Display final results

In [None]:
cnm2.estimates.nb_view_components(img=Cn, denoised_color='red')
print('you may need to change the data rate to generate this one: use jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10 before opening jupyter notebook')

## Closing, saving, and creating denoised version
### You can save an hdf5 file with all the fields of the cnmf object

In [None]:
save_results = False
if save_results:
    cnm2.save('analysis_results.hdf5')

### Stop cluster and clean up log files

In [None]:
#%% STOP CLUSTER and clean up log files
cm.stop_server(dview=dview)
log_files = glob.glob('*_LOG_*')
for log_file in log_files:
    os.remove(log_file)

### View movie with the results
We can inspect the denoised results by reconstructing the movie and playing it alongside the original data and the resulting (amplified) residual movie.

In [None]:
cnm2.estimates.play_movie(images, q_max=99.9, gain_res=2,
                                  magnification=2,
                                  bpx=border_to_0,
                                  include_bck=False)

The denoised movie can also be explicitly constructed using:

In [None]:
#%% reconstruct denoised movie
denoised = cm.movie(cnm2.estimates.A.dot(cnm2.estimates.C) + \
                    cnm2.estimates.b.dot(cnm2.estimates.f)).reshape(dims + (-1,), order='F').transpose([2, 0, 1])
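
Written as an equation, the reconstruction in the cell above is a sum of a neural term and a background term. The symbols $Y$, $d$, $N$, $T$, and $n_b$ are notation introduced here for illustration; the shapes follow from the matrix products in the code.

$$
Y \approx A\,C + b\,f, \qquad A \in \mathbb{R}^{d \times N},\; C \in \mathbb{R}^{N \times T},\; b \in \mathbb{R}^{d \times n_b},\; f \in \mathbb{R}^{n_b \times T}
$$

Here $d$ is the number of pixels in a frame, $N$ the number of components, $T$ the number of frames, and $n_b$ the number of background components (`gnb = 2` in this notebook). Each column of the $d \times T$ product is one flattened frame, which is why the code reshapes the result to `dims + (-1,)` and then moves the time axis to the front.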

# Appendices

<a id=logging_explained></a>
## Appendix 1: Logging
Python has a powerful built-in [logging module](https://docs.python.org/3/library/logging.html) for generating log messages while a program is running. It lets you generate custom log messages and set a threshold to determine which logs you will see. You will only receive messages at or above the severity threshold you set, which you can choose from `logging.DEBUG`, `logging.INFO`, `logging.WARNING`, `logging.ERROR`, or `logging.CRITICAL`. For instance, setting the threshold to `logging.DEBUG` will print out every logging statement, while setting it to `logging.ERROR` will print out only errors and critical messages. This system gives much more flexibility and control than interspersing `print()` statements throughout your code when debugging.

Our custom formatted log string is defined in the `log_format` parameter below, which draws from a predefined [set of attributes](https://docs.python.org/3/library/logging.html#logrecord-attributes) provided by the logging module. We have set each log to display the time, severity level, filename/function name/line number of the file creating the log, process ID, and the actual log message. 

While logging is especially helpful when running code on a server, it can also be helpful to get feedback in real time on your personal machine, either to audit progress or diagnose problems when debugging. If you set this feature up by running the following cell, the logs will by default go to console. If you want to direct your log to file (which you can indicate with `use_logfile = True`), then it will automatically be directed to your `caiman_data/temp` directory as defined in the `caiman.paths` module. You can set another path with the `filename` parameter.

In [None]:
use_logfile = False # If set to True, will log to file
if use_logfile:
    log_file = Path(cm.paths.get_tempdir()) / 'cnmf_demo.log'
    print(f"Will save logging data to {log_file}")
else:
    log_file = None
log_format = "{asctime} - {levelname} - [{filename} {funcName}() {lineno}] - pid {process} - {message}"
logging.basicConfig(format=log_format,
                    filename=log_file, 
                    level=logging.WARNING, style="{") #DEBUG, INFO, WARNING, ERROR, CRITICAL

<div class="alert alert-info">
   
Once you have set up your logging configuration, you can change the level (say, from `WARNING` to `DEBUG`) using the following: `logging.getLogger().setLevel(logging.DEBUG)`. 
    
</div> 
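
To make the threshold behavior concrete, here is a minimal, self-contained sketch that uses only the standard library. It is an illustration rather than part of the Caiman pipeline: the logger name `threshold_demo` and the in-memory stream are assumptions made so the example does not interfere with the root logger configured above.

```python
import io
import logging

# Same format string as in the cell above.
log_format = "{asctime} - {levelname} - [{filename} {funcName}() {lineno}] - pid {process} - {message}"

# Collect the log output in memory so we can inspect it afterwards.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(log_format, style="{"))

# "threshold_demo" is a hypothetical logger name used only for this sketch.
demo_logger = logging.getLogger("threshold_demo")
demo_logger.addHandler(handler)
demo_logger.propagate = False          # keep demo messages out of the root logger
demo_logger.setLevel(logging.WARNING)  # threshold: WARNING and above

demo_logger.info("info message")        # below the threshold: dropped
demo_logger.warning("warning message")  # at the threshold: emitted

demo_logger.setLevel(logging.DEBUG)     # lower the threshold at runtime
demo_logger.debug("debug message")      # now emitted

captured = stream.getvalue()
print(captured)
```

Only the warning and debug messages should appear in the printed output, because the info message was issued while the threshold was still `WARNING`.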

<a id=multiple_files></a>
## Appendix 2: Working with multiple files
Many acquisition systems break up data from a single continuous session across multiple files. It is relatively easy to adapt the current demo to work with multiple files, so you can see what the workflow would be like. You would need to make a couple of changes. First, instead of downloading a single demo file, you would create a *list* of files that will later be treated as a single continuous data stream. Whereas the main demo has a `movie_path` variable that contains a single path, in this case we want a *list* of such paths in a `movie_paths` variable. We have split the main demo movie into two files, `Sue_Split1.tif` and `Sue_Split2.tif`, to show how this can be done using mechanisms similar to those in the main one-file pipeline:

    movie_path1 = download_demo('Sue_Split1.tif')
    movie_path2 = download_demo('Sue_Split2.tif')
    movie_paths = [movie_path1, movie_path2]

Then, when creating the movie object, instead of using `cm.load()` you would use `cm.load_movie_chain()`, which takes a list as an argument:

    movie_orig = cm.load_movie_chain(movie_paths)

If your data is too large to fit in RAM and you only want to load a subset, please see Appendix 3. 

<div class="alert alert-info">
   
If you have <b>noncontiguous</b> recording sessions, for instance data in files from sessions separated by many days or weeks, and you need to register/match the neurons from these sessions, this is a different use case. We do have a demo for that: see [demo_multisession_registration.ipynb](./demo_multisession_registration.ipynb).
    
</div> 



<a id=display_large></a>
## Appendix 3: Displaying large files
Loading movie objects requires loading all of the data you want to view into memory, and this will not be possible with extremely large data sets. But even with very large data sets, you typically want to visualize what is going on, make sure things seem reasonable, etc. Caiman has built-in tools to just load some of a movie into a movie object using the `subindices` argument to the `load()` function. For example, if you just want to load the first 500 frames of a movie, you can send it `subindices=np.arange(0,500)`. 

<div class="alert alert-info">
   
If you are working with a list of movies, the `subindices` filter can also be used with `load_movie_chain()`: the filter is applied to each movie in the list, and `load_movie_chain()` returns the concatenated result.
    
</div> 
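
The following NumPy-only sketch illustrates the behavior described above: the same `subindices` filter is applied to each movie, and the results are concatenated along the time axis. It is not Caiman's implementation. The helpers `load_frames()` and `load_chain()` are hypothetical stand-ins for `load()` and `load_movie_chain()`, and the small arrays stand in for movies that would normally be read from disk.

```python
import numpy as np

def load_frames(movie, subindices=None):
    """Hypothetical stand-in for a loader: keep only the requested frames."""
    if subindices is None:
        return movie
    return movie[subindices]  # index along the first (time) axis

def load_chain(movies, subindices=None):
    """Hypothetical stand-in for a chain loader: filter each movie, then concatenate."""
    return np.concatenate([load_frames(m, subindices) for m in movies], axis=0)

# Two toy "movies" with shape (frames, height, width).
movie1 = np.zeros((1000, 4, 4), dtype=np.float32)
movie2 = np.ones((800, 4, 4), dtype=np.float32)

first_500 = np.arange(0, 500)
chained = load_chain([movie1, movie2], subindices=first_500)
print(chained.shape)  # (1000, 4, 4): 500 frames from each movie
```

Note that the result contains the first 500 frames of *each* movie, not the first 500 frames of the combined stream.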

<a id=caiman_estimators></a>
## Appendix 4: Estimators and parameters

For the main stages of the pipeline -- like motion correction and CNMF -- Caiman breaks things into two steps:

- Construct an estimator object (e.g., `MotionCorrect`, `CNMF`) by passing it the set of parameters it will use. 
- Run the method on the object to generate the results. For `CNMF` this will be the `fit()` method, for motion correction it is `motion_correct()`.

This modular architecture, where models are initialized with parameters and the actual computation is then run with a separate method call, is useful for a few reasons. Most importantly, it allows for more efficient exploration of parameter space. Often, after setting some *initial* set of parameters, you will want to modify them after exploring your original data (e.g., after viewing the size of the neurons, or looking at the effects of changing correlation thresholds when running `CNMFE`). 
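
As a rough illustration of this pattern, here is a toy estimator that separates construction from computation. Everything in it is hypothetical: `ThresholdEstimator`, its `update_params()` method, and the `threshold` parameter are invented for this sketch and are not part of Caiman's API.

```python
import numpy as np

class ThresholdEstimator:
    """Toy estimator (hypothetical): the constructor only stores parameters."""

    def __init__(self, params):
        self.params = dict(params)  # no data is seen at construction time
        self.estimates = None       # filled in by fit()

    def update_params(self, updates):
        """Modify parameters without rebuilding the object."""
        self.params.update(updates)
        return self

    def fit(self, data):
        """Run the actual computation: flag samples above the threshold."""
        self.estimates = data > self.params["threshold"]
        return self

data = np.array([0.1, 0.5, 0.9])

# Step 1: construct the estimator from an initial set of parameters.
estimator = ThresholdEstimator({"threshold": 0.4})

# Step 2: run the computation with a separate method call.
n_initial = int(estimator.fit(data).estimates.sum())

# After inspecting the results, adjust a parameter and refit the same object.
n_refit = int(estimator.update_params({"threshold": 0.8}).fit(data).estimates.sum())
print(n_initial, n_refit)  # 2 1
```

Because the parameters live on the object rather than being baked into the computation, exploring a new setting is a matter of updating one value and calling `fit()` again.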

Note that our API is similar to the interface used by the [scikit-learn](https://scikit-learn.org/stable/) machine learning library. From their [manuscript on API design](https://arxiv.org/abs/1309.0238):

    Estimator initialization and actual learning are strictly separated...The constructor of an estimator does 
    not see any actual data, nor does it perform any actual learning. All it does is attach the given parameters 
    to the object....Actual learning is performed by the `fit` method. p 4-5

Thanks to Kushal Kolar for pointing out this document.

<a id=the_cluster></a>
## Appendix 5: More on cluster setup
Caiman is optimized for parallelization and works well at HPC centers as well as laptops with multiple CPU cores. The cluster is set up with the `setup_cluster()` function, which takes in multiple parameters.

The `backend` parameter determines the type of cluster used. The default value `'multiprocessing'` uses the multiprocessing package. The `ipyparallel` option is also available. More information on these choices can be found [here](https://github.com/flatironinstitute/CaImAn/blob/master/docs/CLUSTER.md). You can set the number of processes (CPU cores) to use with the `n_processes` parameter: with the default value `None`, the function selects one less than the total number of logical cores available. 

The output variable `dview` returned by the function is the multicore processing object that will be used in subsequent processing steps (for the multiprocessing backend, it is a multiprocessing pool). The name is something of a misnomer: it stands for `DirectView`, which comes from the ipyparallel package.

Despite its underwhelming name, `dview` is an important variable: it is passed to subsequent stages in Caiman's processing pipeline, and it is the engine that drives parallelization. 
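
The sketch below shows, in simplified form, what it means to pass a pool-like `dview` object between processing steps. It is only an illustration under several assumptions: `default_n_processes()`, `frame_mean()`, and `process_chunks()` are hypothetical helpers, not Caiman functions, and a standard-library `ThreadPool` is used as a stand-in because it exposes the same `map()` interface as a multiprocessing pool while being simple to run anywhere.

```python
import os
from multiprocessing.pool import ThreadPool

def default_n_processes(n_processes=None):
    """Hypothetical helper: None means one less than the logical core count."""
    if n_processes is None:
        total_cores = os.cpu_count() or 1
        n_processes = max(total_cores - 1, 1)  # never go below one worker
    return n_processes

def frame_mean(chunk):
    """Toy per-chunk computation standing in for a real processing step."""
    return sum(chunk) / len(chunk)

def process_chunks(chunks, dview=None):
    """Run serially when no pool is given, otherwise distribute work via dview.map()."""
    if dview is None:
        return list(map(frame_mean, chunks))
    return dview.map(frame_mean, chunks)

chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

serial = process_chunks(chunks)  # no pool: plain serial loop

# ThreadPool is a stand-in for the pool that a real cluster setup would provide.
with ThreadPool(default_n_processes()) as dview:
    parallel = process_chunks(chunks, dview=dview)

print(serial, parallel)  # [2.0, 5.0, 8.0] [2.0, 5.0, 8.0]
```

The point of the design is that the processing function does not need to know how the pool was created: it only needs an object with a `map()` method, so the same code can run serially or in parallel.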