# Pure Python Pipeline
Ripple noise removal, motion correction, trace deconvolution and extraction in one notebook. This should replace demo_pipeline as the standard analysis notebook!

## TODO list (from demo_pipeline)
- TODO: test string attributes in moco_pars.h5 file. Are they complete? (Check dimensions of arrays like coord_shifts_els.)
- TODO: clean up memmap and temporary files after they are opened in CaImAn (the F-order mmap stays until the end). Need good naming here! Create an OK/Cancel dialog to confirm deleting temporary files.
- TODO: first, the MoCo result is exported to an F-order mmap (motion_correct(save_movie=True)), then to a C-order mmap, which is the one opened in CaImAn. Do we need both?
- TODO: rnr save file: should have rnr in the hdf5 file name, otherwise too confusing!
- TODO: create a small txt file with the file names and their purpose (whether they can be deleted, how to use them, what they contain) at some point (early) in the analysis, save it in export folder.
- FIXME: _pars.json and _results.hdf5 contain nd2 file name twice: T301_tmev_d1T301_tmev_d1.270820.1110_22-10-20_14-18-40_pars.json and T301_tmev_d1T301_tmev_d1.270820.1110_22-10-20_14-18-40_results.hdf5
- IMPORTANT: make it more convenient to enter pipeline from any point. This includes defining parameters in one location, naming variables appropriately (F memmap, C memmap, nd2 file, hd5 file...) so user is aware which file they are supposed to open at which point of entry into analysis!
- memmap_ results in conflicting names if recordings are from same day. Include date time of analysis in name? Not a big problem as it is temporary file
- Study parallel processing of caiman (start server step, cleaning up server). It might be useful for RNR too.
- Evaluating components: in cnm2.estimates.evaluate_components(images, cnm2.params, dview=dview), CaImAn does not use the contents of its own model/ folder but looks for the model files in another directory (in my case, Users/Bence/caiman_data/model/)
- Export h5 file should have date time in filename to avoid overwriting. Both raw data and final results!
- Plot with slider: watch all the frames, compare RNR and original, then MC and original/RNR... QC
- Save RNR directly to memmap (opening as caiman movie, save to memmap?)? Although the problem is how slow RNR is...
- Maybe working with numpy array in motion correction (movie.motion_correct) is not that bad? Although no parameters...
- Plot frame before RNR and after RNR to set parameters... Interactive?
- Read Tips on analysis: https://caiman.readthedocs.io/en/master/CaImAn_Tips.html#motion-correction-tips
- RNR results in 4x size (uint16 to float64)! Need to clean up or use uint16 again.
- Check 2-channel recordings. Might want to save red channel, too, for matching?
- Save memmap files is inconsistent in naming (C order is memmap__d1_512_d2_512_d3_1_order_C_frames_577_ instead of T386_20211202_green_ex_els__d1_512_d2_512_d3_1_order_C_frames_577_)
- Include nd2 to h5 here (from nd2 to multipage tiff test.ipynb)
- It takes a lot of time to open nd2 file. Useful to copy data to be analyzed to local HDD on a previous day?
- way to manually reject/accept components
- IMPORTANT: https://caiman.readthedocs.io/en/master/On_file_types_and_sizes.html caiman works best when files are 1-2 GB big! It means we might want to split them in small pieces, or make sure they are multi-page tiff files!

## Import packages

In [None]:
#Auto-reload modules (used to develop functions outside this notebook)
%load_ext autoreload
%autoreload 2

In [None]:
from RippleNoiseRemoval import RNR
import h5py
from time import time

import bokeh.plotting as bpl
import cv2
import glob
import logging
import matplotlib.pyplot as plt
import numpy as np
import os
from labrotation import file_handling as fh
from copy import deepcopy
try:
    cv2.setNumThreads(0)
except Exception:
    pass

import caiman as cm
from caiman.motion_correction import MotionCorrect
from caiman.source_extraction.cnmf import cnmf as cnmf
from caiman.source_extraction.cnmf import params as params
from caiman.utils.utils import download_demo
from caiman.utils.visualization import plot_contours, nb_view_patches, nb_plot_contour

from movie_splitting import numpy_to_hdf5

import json  # for exporting parameters

# for exporting moco data:
from caiman.motion_correction import sliding_window

import pandas as pd  # for opening data documentation
import warnings
import uuid  # for generating UUID in case of missing value
bpl.output_notebook()

# If it exists, load environment variables from the .env file

In [None]:
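env_dict = dict()
if not os.path.exists("./.env"):
    print(".env does not exist")
else:
    with open("./.env", "r") as f:
        for line in f.readlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip empty lines, comments and malformed entries
            key, value = line.split("=", 1)  # split at the first "=" only, values may contain "="
            env_dict[key.strip()] = value.strip()
print(env_dict.keys())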

In [None]:
log_fname = fh.get_filename_with_date("caim_log", ".txt")
if "LOG_FOLDER" in env_dict.keys():
    log_fname = os.path.join(env_dict["LOG_FOLDER"], log_fname)
else:
    log_fname = fh.choose_dir_for_saving_file("Select folder to save log file", log_fname)
print(f"Saving log file to\n{log_fname}")

## Set up logging (optional)

In [None]:
logging.basicConfig(format="%(relativeCreated)12d [%(filename)s:%(funcName)20s():%(lineno)s] [%(process)d] %(message)s",
                    filename=log_fname,
                    level=logging.WARNING)

## Set input and output files

In [None]:
# nd2 input file
nd2_fpath = fh.open_file("Select nd2 file")

# set folder to export temporary and result files
export_folder = fh.open_dir("Select folder to save results", True)

In [None]:
nd2_fname = os.path.split(nd2_fpath)[-1]
# export_fname: get rid of .nd2 extension, append date and .h5 extension
#export_fname = fh.get_filename_with_date(os.path.splitext(os.path.split(nd2_fpath)[1])[0] + "_caim", ".h5")
#export_hd5_fpath = os.path.join(export_folder, export_fname)
#print(f"Export file selected: {export_hd5_fpath}")

results_root = fh.get_filename_with_date(os.path.splitext(os.path.split(nd2_fpath)[1])[0], "")

rnr_fname = results_root + "_rnr.hdf5"
rnr_fpath = os.path.join(export_folder, rnr_fname)

# input for motion correction; moco comes after RNR
moco_fnames = [rnr_fpath]

# rnr_fpath should be HDF5 for now. Not sure whether MoCo/CaImAn handles the .h5 extension as well as .hdf5.
assert rnr_fpath.split(".")[-1] in ["h5", "hdf5"], f"Invalid file extension: .{rnr_fpath.split('.')[-1]}, expected .h5 or .hdf5"

cnmf_results_save_path = os.path.join(export_folder, results_root + "_cnmf.hdf5")  # caiman only supports saving hdf5, not h5

json_fname = results_root + "_pars.json"
json_fpath = os.path.join(export_folder, json_fname)

moco_pars_fname = results_root + "_moco_pars.h5"
moco_pars_fpath = os.path.join(export_folder, moco_pars_fname)

denoised_optional_fpath = os.path.join(export_folder, results_root + "_denoised.tif") 


print(f"Input file selected:\n\t{nd2_fpath}")

print(f"Temporary file after RNR will be saved as\n\t{rnr_fpath}")
print(f"Going to perform MoCo on\n\t{moco_fnames}")
print(f"Results of trace extraction will be saved as\n\t{cnmf_results_save_path}")
print(f"Parameters will be saved as\n\t{json_fpath}")
print(f"MoCo parameters will be saved as\n\t{moco_pars_fpath}")
print(f"\nOptional denoised results will be saved as\n\t{denoised_optional_fpath}")

# Add UUID

In [None]:
if "DATA_DOCU_FOLDER" in env_dict:  # try default location
    data_docu_folder = env_dict["DATA_DOCU_FOLDER"]
else:
    data_docu_folder = fh.open_dir("Open Data Documentation folder")

In [None]:
docu_files_list = []
session_uuid = None
for root, dirs, files in os.walk(data_docu_folder):
    for name in files:
        if "grouping" in name:
            if "~" in name:  # on Windows, "~" marks temporary files of workbooks that are open in Excel
                raise Exception(f"Please close all Excel files and try again. Found temporary file:\n{os.path.join(root, name)}")
            fpath = os.path.join(root, name)
            df = pd.read_excel(fpath)
            df = df[df["nd2"] == nd2_fname]
            if len(df) > 1:
                raise Exception(f"File name appears several times in data documentation:\n\t{nd2_fname}\n{df}")
            if len(df) == 1:
                session_uuid = df["uuid"].iloc[0]
                break  # leaves only the loop over files
            docu_files_list.append(fpath)
    if session_uuid is not None:
        break  # also stop walking the folder tree once the entry has been found
if session_uuid is None:
    session_uuid = uuid.uuid4().hex
    warnings.warn(f"Movie does not have an entry (uuid) in the data documentation!\nYou should add it to the documentation. The generated uuid for this session is: {session_uuid}", UserWarning)
print(f"UUID is {session_uuid}")

## Ripple Noise Removal

In [None]:
win = 40
amplitude_threshold = 10.8

In [None]:
rnr = RNR(win, amplitude_threshold) 

In [None]:
t0_open = time()
rnr.open_recording(nd2_fpath)  # opens usual recording size (8.8-9 GB) in about 830 s
print(f"File opened in {time() - t0_open} s")

In [None]:
t0_single = time()
rnr_data = rnr.rnr_singlethread()  # a bit faster than opening file, around 500s for 8.8-9 GB
t1_single = time()
print(f"RNR single thread finished in {t1_single - t0_single} s")
print(f"Result is a {type(rnr_data)} with datatype {rnr_data.dtype}")
print(f"Shape: {rnr_data.shape[0]} frames of {rnr_data.shape[1]}x{rnr_data.shape[2]} pixels")

In [None]:
type(rnr_data)

In [None]:
rnr_data.shape

### Export RNR movie to HDF5 file
This otherwise unnecessary step is needed because motion correction, as used below (`MotionCorrect`), works from files rather than from a NumPy array. The in-memory alternative, `movie.motion_correct()`, offers far fewer options. See https://caiman.readthedocs.io/en/master/core_functions.html#movie-handling (motion_correct).
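The next cell casts the float64 RNR result with `astype(np.uint16)`. That cast does not saturate: negative or out-of-range float values give platform-dependent results instead of 0 or 65535. Whether RNR output actually contains such values is not checked here (assumption). A minimal sketch of a saturating conversion; `to_uint16_saturating` is a hypothetical helper, not part of `RippleNoiseRemoval` or `movie_splitting`:

```python
import numpy as np

def to_uint16_saturating(movie: np.ndarray) -> np.ndarray:
    """Round a float movie and clip it to the uint16 range before casting.

    Hypothetical helper: plain astype(np.uint16) does not map negative values
    to 0 or values above 65535 to 65535.
    """
    info = np.iinfo(np.uint16)
    return np.clip(np.rint(movie), info.min, info.max).astype(np.uint16)

frames = np.array([[[-3.2, 10.6], [70000.0, 123.4]]])  # 1 frame of 2x2 pixels
converted = to_uint16_saturating(frames)
print(converted.dtype, converted.tolist())  # uint16 [[[0, 11], [65535, 123]]]
```

As a side effect, the uint16 copy is a quarter of the size of the float64 array (2 instead of 8 bytes per pixel).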

In [None]:
numpy_to_hdf5(rnr_data.astype(np.uint16), rnr_fpath)

## Motion Correction

### Define splitting variables, for compatibility of the exported MoCo parameter file with the pure python pipeline splitting notebook

In [None]:
moco_intervals = [(1, len(rnr.nd2_data))]
moco_flags = [True]
cnmf_intervals = moco_intervals.copy()
cnmf_flags = [True]

# in RNR (first import of nd2 file), there is the choice to import part of the file
begin_end_frames = (1, len(rnr.nd2_data))  

### Optional: Play the movie

In [None]:
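display_movie = False
if display_movie:
    movie = cm.movie(rnr_data)  # wrap the in-memory RNR result (frames, x, y) as a CaImAn movie
    ds_ratio = 0.2
    movie.resize(1, 1, ds_ratio).play(
        q_max=99.5, fr=30, magnification=2)  # resize() returns a new movie; this should not change the size of movie itself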

### Setup some parameters
We set some parameters that are relevant to the file, and then parameters for motion correction, processing with CNMF and component quality evaluation. The values are based on CaImAn's demo pipeline. Note that the demo dataset `Sue_2x_3000_40_-46.tif` has been spatially downsampled by a factor of 2 and has a lower than usual spatial resolution (2 µm/pixel). As a result, several parameters (`gSig, strides, max_shifts, rf, stride_cnmf`) have lower values (halved compared to a dataset with a spatial resolution of 1 µm/pixel). Check them against the spatial resolution of your own recordings.
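The halving rule above generalizes to any pair of resolutions: a structure of fixed physical size covers `ref_um_per_px / um_per_px` times as many pixels at the new resolution. A sketch of that arithmetic; `scale_pixel_params` and the example values are illustrative assumptions, and rounding to whole pixels is a choice made here, not something CaImAn does for you:

```python
def scale_pixel_params(params: dict, ref_um_per_px: float, um_per_px: float) -> dict:
    """Rescale pixel-valued parameters from a reference resolution to another one.

    Every value is multiplied by ref_um_per_px / um_per_px and rounded to whole
    pixels (at least 1). Tuples/lists are scaled element-wise, None is kept.
    """
    factor = ref_um_per_px / um_per_px

    def scale(value):
        if value is None:
            return None
        if isinstance(value, (tuple, list)):
            return type(value)(scale(v) for v in value)
        return max(1, round(value * factor))

    return {name: scale(value) for name, value in params.items()}

# Illustrative demo-style values for 2 um/pixel, converted to 1 um/pixel
demo_values = {"strides": (48, 48), "max_shifts": (6, 6), "gSig": [4, 4], "rf": 15}
print(scale_pixel_params(demo_values, ref_um_per_px=2.0, um_per_px=1.0))
# {'strides': (96, 96), 'max_shifts': (12, 12), 'gSig': [8, 8], 'rf': 30}
```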

In [None]:
# dataset dependent parameters
fr = 15                             # imaging rate in frames per second
decay_time = 0.4                    # length of a typical transient in seconds

# motion correction parameters
strides = (48, 48)          # start a new patch for pw-rigid motion correction every x pixels
overlaps = (24, 24)         # overlap between patches (size of patch: strides+overlaps)
max_shifts = (6,6)          # maximum allowed rigid shifts (in pixels)
max_deviation_rigid = 3     # maximum shifts deviation allowed for patch with respect to rigid shifts
pw_rigid = True             # flag for performing non-rigid motion correction

# parameters for source extraction and deconvolution
# see https://www.youtube.com/watch?v=wUhKkNtSu_s 21:10
p = 2                       # order of the autoregressive system (original: 1 (2 advised for slow signal like GCaMP6s))
gnb = 2                     # number of global background components
merge_thr = 0.8             # merging threshold, max correlation allowed. (original: 0.85)
# WARNING: for photostim, seizures etc., this might be the reason why neurons (all highly correlated) get drawn together.

rf = 15                     # half-size of the patches in pixels. e.g., if rf=25, patches are 50x50
stride_cnmf = 6             # amount of overlap between the patches in pixels
K = 4                       # number of components per patch
gSig = [6, 6]               # expected half size of neurons in pixels (original value: 4, 4)
method_init = 'greedy_roi'  # initialization method (if analyzing dendritic data using 'sparse_nmf')
ssub = 1                    # spatial subsampling during initialization
tsub = 1                    # temporal subsampling during initialization

# parameters for component evaluation
min_SNR = 2.0               # signal to noise ratio for accepting a component
rval_thr = 0.85             # space correlation threshold for accepting a component
cnn_thr = 0.99              # threshold for CNN based classifier
cnn_lowest = 0.5            # neurons with CNN probability lower than this value are rejected
                            # (original: 0.1; found a lot of artefacts accepted with CNN classifier predicting <0.2)

### Create a parameters object
You can create a parameters object by passing all the parameters as a single dictionary. Parameters not defined in the dictionary will assume their default values. The resulting `params` object is a collection of subdictionaries pertaining to the dataset to be analyzed `(params.data)`, motion correction `(params.motion)`, data pre-processing `(params.preprocess)`, initialization `(params.init)`, patch processing `(params.patch)`, spatial and temporal components `(params.spatial), (params.temporal)`, quality evaluation `(params.quality)` and online processing `(params.online)`.
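Conceptually, a flat dictionary is routed to the sub-dictionaries that define each key, and everything not given keeps its default. The toy model below only illustrates that idea: the groups and values in `DEFAULTS` are made up, `build_params` is hypothetical, and CaImAn's real defaults and its handling of unknown keys may differ.

```python
from copy import deepcopy

# Hypothetical defaults, grouped like CNMFParams sub-dictionaries (values are illustrative only).
DEFAULTS = {
    "data": {"fr": 30, "decay_time": 0.4},
    "motion": {"max_shifts": (6, 6), "pw_rigid": False},
    "quality": {"min_SNR": 2.5, "rval_thr": 0.8},
}

def build_params(flat: dict) -> dict:
    """Route each flat key to every group that defines it; other entries keep defaults."""
    grouped = deepcopy(DEFAULTS)  # never modify the shared defaults
    for key, value in flat.items():
        groups = [g for g, entries in grouped.items() if key in entries]
        if not groups:
            raise KeyError(f"unknown parameter: {key}")  # toy choice; CaImAn may only warn
        for g in groups:
            grouped[g][key] = value
    return grouped

opts_toy = build_params({"fr": 15, "pw_rigid": True})
print(opts_toy["data"]["fr"], opts_toy["motion"]["pw_rigid"], opts_toy["quality"]["min_SNR"])
# 15 True 2.5
```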

In [None]:
opts_dict = {'fnames': moco_fnames, 
            'fr': fr,
            'decay_time': decay_time,
            'strides': strides,
            'overlaps': overlaps,
            'max_shifts': max_shifts,
            'max_deviation_rigid': max_deviation_rigid,
            'pw_rigid': pw_rigid,
            'p': p,
            'nb': gnb,
            'rf': rf,
            'K': K, 
            'stride': stride_cnmf,
            'method_init': method_init,
            'rolling_sum': True,
            'only_init': True,
            'ssub': ssub,
            'tsub': tsub,
            'merge_thr': merge_thr, 
            'min_SNR': min_SNR,
            'rval_thr': rval_thr,
            'use_cnn': True,
            'min_cnn_thr': cnn_thr,
            'cnn_lowest': cnn_lowest,
            'var_name_hdf5': 'data',}  # FIXME: does not work! Check where does this setting get lost?

opts = params.CNMFParams(params_dict=opts_dict)

### Setup a cluster
To enable parallel processing a (local) cluster needs to be set up. This is done with a cell below. The variable `backend` determines the type of cluster used. The default value `'local'` uses the multiprocessing package. The `ipyparallel` option is also available. More information on these choices can be found [here](https://github.com/flatironinstitute/CaImAn/blob/master/CLUSTER.md). The resulting variable `dview` expresses the cluster option. If you use `dview=dview` in the downstream analysis then parallel processing will be used. If you use `dview=None` then no parallel processing will be employed.
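The `dview=dview` / `dview=None` convention boils down to "map over a pool if one is given, otherwise loop in the current process". A standard-library sketch of that pattern; `parallel_map` is a hypothetical helper, and CaImAn's own dispatch is more involved (it also supports ipyparallel views):

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, Iterable, List, Optional

def parallel_map(func: Callable, items: Iterable, dview: Optional[object] = None) -> List:
    """Apply func to every item, in parallel if a pool-like dview with .map() is given."""
    if dview is None:
        return [func(item) for item in items]  # serial fallback, like dview=None
    return list(dview.map(func, items))      # results keep the input order

def patch_mean(patch):
    return sum(patch) / len(patch)

patches = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
serial = parallel_map(patch_mean, patches)
with ThreadPool(processes=2) as pool:  # threads avoid pickling issues in notebooks
    parallel = parallel_map(patch_mean, patches, dview=pool)
print(serial == parallel, serial)  # True [2.0, 5.0, 8.0]
```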

In [None]:
#%% start a cluster for parallel processing (if a cluster already exists it will be closed and a new session will be opened)
if 'dview' in locals():
    cm.stop_server(dview=dview)
c, dview, n_processes = cm.cluster.setup_cluster(
    backend='local', n_processes=None, single_thread=False)

In [None]:
mc = MotionCorrect(moco_fnames, dview=dview, **opts.get_group('motion'))

In [None]:
# TODO: ALTERNATIVE to exporting h5 and importing it again: movie.motion_correct() on the in-memory movie.
# Its arguments (from the CaImAn docstring):
#   max_shift_w, max_shift_h: maximum pixel shifts allowed when correcting
#                             in the width and height direction (default: 5, 5)
#   template: if a good template for frame by frame correlation exists,
#             it can be passed. If None, it is automatically computed (default: None)
#   method: depends on what is installed, 'opencv' or 'skimage'. 'skimage'
#           is an order of magnitude slower (default: 'opencv')
#   num_frames_template: if only a subset of the movie needs to be loaded
#                        for efficiency/speed reasons (default: None)
#   remove_blanks (default: False), interpolation (default: 'cubic')

# movie.motion_correct()   # this might change movie itself! Alternative: extract_shifts, apply_shifts

### Perform motion correction and save as F-order memmap
The file names are stored in `mc.fname_tot_els` and `mc.mmap_file`.

In [None]:
#%%capture
#%% Run piecewise-rigid motion correction using NoRMCorre
mc.motion_correct(save_movie=True)
m_els = cm.load(mc.fname_tot_els)
# maximum shift to be used for trimming against NaNs
border_to_0 = 0 if mc.border_nan == 'copy' else mc.border_to_0

### Optional: show comparison with original movie

In [None]:
#%% compare with original movie
display_movie = False  # TODO: does not seem to work. Create own function to show result?
if display_movie:
    m_orig = cm.load_movie_chain(moco_fnames)
    ds_ratio = 0.2
    cm.concatenate([m_orig.resize(1, 1, ds_ratio) - mc.min_mov*mc.nonneg_movie,
                    m_els.resize(1, 1, ds_ratio)], 
                   axis=2).play(fr=60, gain=15, magnification=2, offset=0)  # press q to exit

### Save C-order memmap
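`cm.load_memmap` returns `Yr` as a 2-D array of shape (pixels, frames), in which each frame is flattened in Fortran (column-major) order. That is why the loading cell further down reshapes `Yr.T` with `order='F'`. A NumPy-only sketch of the round trip on a tiny random movie (no CaImAn involved; `to_yr` is a hypothetical helper):

```python
import numpy as np

def to_yr(movie: np.ndarray) -> np.ndarray:
    """(T, d1, d2) movie -> (d1*d2, T) matrix with each frame flattened in F order."""
    n_frames = movie.shape[0]
    return np.reshape(movie, (n_frames, -1), order="F").T

rng = np.random.default_rng(0)
movie = rng.random((5, 4, 3))          # T=5 frames of 4x3 pixels
Yr = to_yr(movie)
dims, T = movie.shape[1:], movie.shape[0]

# Same expression as the loading cell of this notebook:
images = np.reshape(Yr.T, [T] + list(dims), order="F")

print(Yr.shape, np.array_equal(images, movie))                 # (12, 5) True
print(np.array_equal(Yr[:, 2], movie[2].flatten(order="F")))   # True
```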

In [None]:
#%% MEMORY MAPPING
fname_mmap_f = mc.mmap_file
# memory map the file in order 'C'
fname_mmap_c = cm.save_memmap(fname_mmap_f, base_name='memmap', order='C',
                           border_to_0=border_to_0, dview=dview) # exclude borders

In [None]:
if 'fname_mmap_c' not in locals():
    fname_mmap_c = fh.open_file("Select C-memmap file.")
print(f"Working with C-memmap\n{fname_mmap_c}")

In [None]:
#%% restart cluster to clean up memory
if "dview" in locals():
    cm.stop_server(dview=dview)
c, dview, n_processes = cm.cluster.setup_cluster(
    backend='local', n_processes=None, single_thread=False)

In [None]:
# now load the file
Yr, dims, T = cm.load_memmap(fname_mmap_c)
# load frames in python format (T x X x Y)
images = np.reshape(Yr.T, [T] + list(dims), order='F')

### Clean up memory now

### Run CNMF on patches in parallel

In [None]:
%%capture
#%% RUN CNMF ON PATCHES
# First extract spatial and temporal components on patches and combine them
# for this step deconvolution is turned off (p=0). If you want to have
# deconvolution within each patch change params.patch['p_patch'] to a
# nonzero value
cnm = cnmf.CNMF(n_processes, params=opts, dview=dview)
cnm = cnm.fit(images)

### Inspecting the results
Briefly inspect the results by plotting contours of identified components against correlation image.
The results of the algorithm are stored in the object `cnm.estimates`. More information can be found in the definition of the `estimates` object and in the [wiki](https://github.com/flatironinstitute/CaImAn/wiki/Interpreting-Results).
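In `cnm.estimates`, `A` holds one spatial footprint per column (pixels × components, stored as a sparse matrix) and `C` one temporal trace per row (components × frames). A footprint becomes an image by reshaping its column with `order='F'`, consistent with the memmap layout. A NumPy-only sketch with a dense toy `A`; `footprint_image` and `center_of_mass` are hypothetical helpers, not CaImAn functions:

```python
import numpy as np

def footprint_image(A, k: int, dims: tuple) -> np.ndarray:
    """Reshape column k of A (pixels x components) into a dims-shaped image."""
    column = A[:, k]
    if hasattr(column, "toarray"):  # scipy.sparse column -> dense (d, 1) array
        column = column.toarray()
    return np.reshape(np.asarray(column).ravel(), dims, order="F")

def center_of_mass(img: np.ndarray) -> tuple:
    """Intensity-weighted (row, col) centroid of a non-negative image."""
    rows, cols = np.indices(img.shape)
    total = img.sum()
    return float((rows * img).sum() / total), float((cols * img).sum() / total)

dims = (4, 5)
blob = np.zeros(dims)
blob[1:3, 2:4] = 1.0                     # 2x2 square covering rows 1-2, cols 2-3
A = blob.reshape(-1, 1, order="F")       # one component, pixels flattened in F order
img = footprint_image(A, 0, dims)
print(np.array_equal(img, blob), center_of_mass(img))  # True (1.5, 2.5)
```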

In [None]:
#%% plot contours of found components
Cn = cm.local_correlations(images.transpose(1,2,0))
Cn[np.isnan(Cn)] = 0
cnm.estimates.plot_contours_nb(img=Cn)

In [None]:
cnm.estimates.nb_view_components(img=Cn, idx=cnm.estimates.idx_components)

## Re-run (seeded) CNMF on the full Field of View
You can re-run the CNMF algorithm seeded on just the selected components from the previous step. Be careful, because components rejected on the previous step will not be recovered here.

In [None]:
%%capture
#%% RE-RUN seeded CNMF on accepted patches to refine and perform deconvolution 
cnm2 = cnm.refit(images, dview=dview)  
# cnm and cnm2 reference the same object! Still useful, as existence of cnm2 implies that this step was made.

## Component Evaluation

The processing in patches creates several spurious components. These are filtered out by evaluating each component using three different criteria:

- the shape of each component must be correlated with the data at the corresponding location within the FOV
- a minimum peak SNR is required over the length of a transient
- each shape passes a CNN based classifier

In [None]:
#%% COMPONENT EVALUATION
# the components are evaluated in three ways:
#   a) the shape of each component must be correlated with the data
#   b) a minimum peak SNR is required over the length of a transient
#   c) each shape passes a CNN based classifier

# if performed re-run:
if "cnm2" in locals():
    cnm2.estimates.evaluate_components(images, cnm2.params, dview=dview)
else:
    cnm.estimates.evaluate_components(images, cnm.params, dview=dview)

Plot contours of selected and rejected components

In [None]:
#%% PLOT COMPONENTS
if "cnm2" in locals():
    cnm2.estimates.plot_contours_nb(img=Cn, idx=cnm2.estimates.idx_components)
else:
    cnm.estimates.plot_contours_nb(img=Cn, idx=cnm.estimates.idx_components)

View traces of accepted and rejected components. If you get a data rate error, start Jupyter notebook with:
`jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10`

In [None]:
# accepted components
plot_accepted = True
if plot_accepted:
    if "cnm2" in locals():
        cnm2.estimates.nb_view_components(img=Cn, idx=cnm2.estimates.idx_components)
    else:
        cnm.estimates.nb_view_components(img=Cn, idx=cnm.estimates.idx_components)

In [None]:
# rejected components
plot_rejected = False
if plot_rejected:
    if "cnm2" in locals():
        if len(cnm2.estimates.idx_components_bad) > 0:
            cnm2.estimates.nb_view_components(img=Cn, idx=cnm2.estimates.idx_components_bad)
        else:
            print("No components were rejected.")
    else:
        if len(cnm.estimates.idx_components_bad) > 0:
            cnm.estimates.nb_view_components(img=Cn, idx=cnm.estimates.idx_components_bad)
        else:
            print("No components were rejected.")

# Manually review and correct classifications
It is only possible to check rejected and accepted components separately. A unified view cannot be shown, probably due to Bokeh (JavaScript) limitations.

In [None]:
from labrotation.two_photon_session import nb_view_components_manual_control, reopen_manual_control

In [None]:
cnm2.estimates.idx_components

## Check rejected components first

In [None]:
manual_accepted_fname = nb_view_components_manual_control(cnm2.estimates, img=Cn, denoised_color=None, cmap='jet', thr=0.99, mode="rejected")

## Move selected components to accepted
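The index bookkeeping in this section and in the "Check accepted components" section amounts to a set difference plus a set union. A compact sketch with `np.setdiff1d` / `np.union1d` (both return sorted unique values); `move_components` is a hypothetical helper that could stand in for the explicit loops below:

```python
import numpy as np

def move_components(source, target, to_move):
    """Move the indices in to_move from source to target; both results are sorted."""
    source = np.asarray(source)
    to_move = np.asarray(to_move, dtype=source.dtype)
    missing = np.setdiff1d(to_move, source)
    if missing.size:
        raise ValueError(f"indices not in source list: {missing.tolist()}")
    return np.setdiff1d(source, to_move), np.union1d(target, to_move)

idx_bad = np.array([1, 4, 6, 9])
idx_good = np.array([0, 2, 3, 5])
idx_bad, idx_good = move_components(idx_bad, idx_good, [4, 9])  # falsely rejected
print(idx_good.tolist(), idx_bad.tolist())  # [0, 2, 3, 4, 5, 9] [1, 6]
```

For falsely accepted components, swap the roles: `move_components(idx_good, idx_bad, false_accepted)`.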

In [None]:
false_rejected = reopen_manual_control(manual_accepted_fname)
print(false_rejected)

In [None]:
n_old_accepted = len(cnm2.estimates.idx_components)
n_old_rejected = len(cnm2.estimates.idx_components_bad)
n_false_rejected = len(false_rejected)

arr_accepted = np.zeros(shape=(n_old_accepted + n_false_rejected,), dtype=cnm2.estimates.idx_components.dtype) 
arr_rejected = np.zeros(shape=(n_old_rejected - n_false_rejected,), dtype=cnm2.estimates.idx_components_bad.dtype) 

In [None]:
# get new accepted list. Consists of old accepted components plus elements of false_rejected.
# TODO: alternative is np.concatenate((cnm2.estimates.idx_components, false_rejected))
for i in range(n_old_accepted):
    arr_accepted[i] = cnm2.estimates.idx_components[i]
for i_extra in range(n_false_rejected):
    arr_accepted[n_old_accepted + i_extra] = false_rejected[i_extra]
arr_accepted = np.sort(arr_accepted)

# copy all rejected elements minus falsely rejected into new array of rejected components
i_new_rejected = 0
for i in range(n_old_rejected):
    if cnm2.estimates.idx_components_bad[i] not in false_rejected:
        arr_rejected[i_new_rejected] = cnm2.estimates.idx_components_bad[i]
        i_new_rejected += 1

In [None]:
for false_rejection in false_rejected:
    assert false_rejection not in arr_rejected

## Warning: old classification data will be overwritten here!

In [None]:
if len(false_rejected) > 0:
    cnm2.estimates.idx_components = arr_accepted.copy()
    cnm2.estimates.idx_components_bad = arr_rejected.copy()
    print("Replaced cnm2.estimates fields with new component lists")
del arr_accepted
del arr_rejected

## Check accepted components, reject falsely classified neurons

In [None]:
manual_rejected_fname = nb_view_components_manual_control(cnm2.estimates, img=Cn, denoised_color=None, cmap='jet', thr=0.99, mode="accepted")

In [None]:
false_accepted = reopen_manual_control(manual_rejected_fname)
print(false_accepted)

In [None]:
n_old_accepted = len(cnm2.estimates.idx_components)
n_old_rejected = len(cnm2.estimates.idx_components_bad)
n_false_accepted = len(false_accepted)

arr_accepted = np.zeros(shape=(n_old_accepted - n_false_accepted,), dtype=cnm2.estimates.idx_components.dtype) 
arr_rejected = np.zeros(shape=(n_old_rejected + n_false_accepted,), dtype=cnm2.estimates.idx_components_bad.dtype) 

In [None]:
# get new rejected list. Consists of old rejected components plus elements of false_accepted.
for i in range(n_old_rejected):
    arr_rejected[i] = cnm2.estimates.idx_components_bad[i]
for i_extra in range(n_false_accepted):
    arr_rejected[n_old_rejected + i_extra] = false_accepted[i_extra]
arr_rejected = np.sort(arr_rejected)

# copy all accepted elements minus falsely accepted into new array of accepted components
i_new_accepted = 0
for i in range(n_old_accepted):
    if cnm2.estimates.idx_components[i] not in false_accepted:
        arr_accepted[i_new_accepted] = cnm2.estimates.idx_components[i]
        i_new_accepted += 1

## Warning: old classification data will be overwritten here!

In [None]:
if len(false_accepted) > 0:
    cnm2.estimates.idx_components = arr_accepted.copy()
    cnm2.estimates.idx_components_bad = arr_rejected.copy()
    print("Replaced cnm2.estimates fields with new component lists")
del arr_accepted
del arr_rejected

## Experimental: merge components
WARNING: right now, this step takes a lot of memory (around 100 GB for a 9 GB video). With optimizations, this could be reduced significantly, maybe to 18 GB or even 9 GB. The main sources of the RAM requirement are working with 64-bit floats instead of 16-bit integers and keeping each component (`AC`, `bf1`, `bf2`) as a separate full-size array instead of just one.
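Back-of-the-envelope arithmetic for the memory figure above: a 512 × 512 uint16 frame takes 524,288 bytes, and each full-size float64 intermediate costs four times as much as the uint16 movie. The helper only does this arithmetic (assuming 1 GB = 10^9 bytes); it does not measure what CaImAn actually allocates:

```python
import numpy as np

def movie_nbytes(n_frames: int, frame_shape: tuple, dtype) -> int:
    """Bytes needed to hold a (n_frames, *frame_shape) array of the given dtype."""
    return n_frames * int(np.prod(frame_shape)) * np.dtype(dtype).itemsize

frame_shape = (512, 512)
n_frames = 9 * 10**9 // movie_nbytes(1, frame_shape, np.uint16)  # frames in ~9 GB of uint16
for dtype in (np.int16, np.uint16, np.float32, np.float64):
    gb = movie_nbytes(n_frames, frame_shape, dtype) / 1e9
    print(f"{np.dtype(dtype).name:>8}: {gb:5.1f} GB per full-size copy")
# int16 and uint16: 9.0 GB, float32: 18.0 GB, float64: 36.0 GB
```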

In [None]:
if cnm2.skip_refinement:
    print("Merging did not take place. It makes sense to merge components.")
    to_merge = True
else:
    print("Merging did take place in fit()! No need to merge again.")
    to_merge = False

# Set to true to save movie of background for checking algorithm correctness 
save_background = False

In [None]:
if to_merge:
    # A.dot(C) has shape (pixels, frames); pixels are flattened in F order (as for the denoised movie below)
    AC = cnm2.estimates.A.dot(cnm2.estimates.C)
    AC = AC.reshape(dims + (-1,), order='F')
    # create global background components (assumes gnb = 2 background components)
    bf1 = np.outer(cnm2.estimates.b[:, 0], cnm2.estimates.f[0]).reshape(dims + (-1,), order='F')
    bf2 = np.outer(cnm2.estimates.b[:, 1], cnm2.estimates.f[1]).reshape(dims + (-1,), order='F')
    
    # the arrays should have shape (x, y, n_frames), for example, (512, 512, n_frames)
    assert bf1.shape[-1] == AC.shape[-1]
    assert images.shape[0] == bf1.shape[-1]  # assert that images is of dimensions (n_frames, x, y)
    print(bf1.shape)
    
    # reformat all information-carrying components to same shape as original movie (images)
    AC = np.moveaxis(AC, [0, 1], [-2, -1])
    bf1 = np.moveaxis(bf1, [0, 1], [-2, -1])
    bf2 = np.moveaxis(bf2, [0, 1], [-2, -1])
    
    # Get residual movie: Y_res = Y - A*C - b*f
    Y_res = images - AC - bf1 - bf2
    Y_res = Y_res.astype(np.int16)  # reduce variable size
    
    del bf1, bf2, AC
    
    # merge_comps() expects the residual as (pixels x frames), each frame flattened in F order
    cnm2.merge_comps(Y=np.reshape(Y_res, (Y_res.shape[0], -1), order='F').T)
    
    # Optional: save background as tif to check correctness
    if save_background:
        import tifffile as tif
        tif.imwrite(os.path.join(export_folder, 'background.tif'), Y_res, bigtiff=True)
else:
    print("No merging took place.")

### Extract DF/F values

In [None]:
#%% Extract DF/F values
#FIXME: "Oops!" printed when cnm2 not in locals (i.e. no refitting was done). Possibly this function never returns.
if "cnm2" in locals():
    cnm2.estimates.detrend_df_f(quantileMin=8, frames_window=250)
else:
    cnm.estimates.detrend_df_f(quantileMin=8, frames_window=250)

### Select only high quality components
**IMPORTANT**: up until running `select_components()`, `cnm2.estimates.idx_components` and `cnm2.estimates.idx_components_bad` contain the indices of the accepted and rejected components, respectively. After running `select_components()`, these entries are set to None, and `cnm2.estimates` contains only the accepted components (their number is `cnm2.estimates.nr`).

In [None]:
print(f"{len(cnm2.estimates.idx_components)} accepted, {len(cnm2.estimates.idx_components_bad)} rejected components")

In [None]:
# explicitly state to save discarded components (default is also True)
if "cnm2" in locals():
    cnm2.estimates.select_components(use_object=True, save_discarded_components=True)
else:
    cnm.estimates.select_components(use_object=True, save_discarded_components=True)

In [None]:
#print(f"{len(cnm2.estimates.idx_components)} accepted, {len(cnm2.estimates.idx_components_bad)} rejected components")
cnm2.estimates.nr

## Optional: Display final results

In [None]:
if "cnm2" in locals():
    cnm2.estimates.nb_view_components(img=Cn, denoised_color='red')
else:
    cnm.estimates.nb_view_components(img=Cn, denoised_color='red')
print('you may need to change the data rate to generate this one: use jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10 before opening jupyter notebook')

## Saving, closing, and creating denoised version
### You can save an hdf5 file with all the fields of the cnmf object. Use load_CNMF() to open the results again

In [None]:
save_results = True
if save_results:
    if "cnm2" in locals():
        cnm2.save(cnmf_results_save_path)
    else:
        cnm.save(cnmf_results_save_path)
    print(f"saved to\n{cnmf_results_save_path}")

### Add uuid as attribute to cnmf file.

In [None]:
with h5py.File(cnmf_results_save_path, 'r+') as hf:
    hf.attrs["uuid"] = session_uuid

### Stop cluster and clean up log files

In [None]:
#%% STOP CLUSTER and clean up log files
cm.stop_server(dview=dview)
log_files = glob.glob('*_LOG_*')
for log_file in log_files:
    os.remove(log_file)

### Export parameters and metadata as json
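`json.dump` writes tuples such as `strides`, `overlaps` and `max_shifts` as JSON lists, so parameters read back from `_pars.json` come back as lists. If downstream code expects tuples, a small loader can convert them back. A standard-library sketch; `load_pars_json` and the set of keys treated as tuples are assumptions:

```python
import json
import os
import tempfile

TUPLE_KEYS = ("strides", "overlaps", "max_shifts")  # assumed tuple-valued parameters

def load_pars_json(fpath: str, tuple_keys=TUPLE_KEYS) -> dict:
    """Read a parameters JSON file and turn the listed list values back into tuples."""
    with open(fpath, "r") as f:
        pars = json.load(f)
    for key in tuple_keys:
        if isinstance(pars.get(key), list):
            pars[key] = tuple(pars[key])
    return pars

original = {"strides": (48, 48), "max_shifts": (6, 6), "fr": 15, "uuid": "abc123"}
with tempfile.TemporaryDirectory() as tmp:
    fpath = os.path.join(tmp, "example_pars.json")
    with open(fpath, "w") as f:
        json.dump(original, f, indent=4)
    with open(fpath) as f:
        print(json.load(f)["strides"])  # [48, 48]
    restored = load_pars_json(fpath)
print(restored == original)  # True
```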

In [None]:
json_dict = opts_dict.copy()
json_dict["original_fnames"] = nd2_fpath
json_dict["rnr_win"] = win
json_dict["amplitude_threshold"] = amplitude_threshold
json_dict["uuid"] = session_uuid

In [None]:
with open(json_fpath, 'w') as f:
    json.dump(json_dict, f, indent=4)
print(f"Saved parameters to\n{json_fpath}")

### Optional: View movie with the results
We can inspect the denoised results by reconstructing the movie and playing alongside the original data and the resulting (amplified) residual movie
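The reconstruction used in these cells follows the CNMF model: denoised = A·C + b·f, and the residual is Y minus the denoised movie. A NumPy-only sketch on made-up toy matrices, using this notebook's pixel ordering (pixels flattened in F order) and the same reshape/transpose as the denoised-movie cell:

```python
import numpy as np

rng = np.random.default_rng(1)
dims, T, K, nb = (6, 5), 8, 3, 2           # image size, frames, components, background comps
d = dims[0] * dims[1]

A = rng.random((d, K))                      # spatial footprints (pixels x components)
C = rng.random((K, T))                      # temporal traces (components x frames)
b = rng.random((d, nb))                     # background spatial components
f = rng.random((nb, T))                     # background temporal components
noise = 0.01 * rng.standard_normal((d, T))
Yr = A @ C + b @ f + noise                  # toy data in (pixels x frames) layout

denoised_2d = A @ C + b @ f
denoised = denoised_2d.reshape(dims + (-1,), order="F").transpose([2, 0, 1])  # (T, d1, d2)
residual = Yr - denoised_2d

print(denoised.shape, np.allclose(residual, noise))  # (8, 6, 5) True
```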

In [None]:
play_movie = False
if play_movie:
    if "cnm2" in locals():
        cnm2.estimates.play_movie(images, q_max=99.9, gain_res=2,
                                          magnification=2,
                                          bpx=border_to_0,
                                          include_bck=False)
    else:
        cnm.estimates.play_movie(images, q_max=99.9, gain_res=2,
                                      magnification=2,
                                      bpx=border_to_0,
                                      include_bck=False)

The denoised movie can also be explicitly constructed using:

In [None]:
#%% reconstruct denoised movie
if "cnm2" in locals():
    denoised = cm.movie(cnm2.estimates.A.dot(cnm2.estimates.C) + \
                        cnm2.estimates.b.dot(cnm2.estimates.f)).reshape(dims + (-1,), order='F').transpose([2, 0, 1])
else:
    denoised = cm.movie(cnm.estimates.A.dot(cnm.estimates.C) + \
                        cnm.estimates.b.dot(cnm.estimates.f)).reshape(dims + (-1,), order='F').transpose([2, 0, 1])

In [None]:
save_denoised = False
if save_denoised:
    denoised.save(denoised_optional_fpath)
    print(f"Denoised movie saved to\n\t{denoised_optional_fpath}")

# Save moco parameters

In [None]:
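# Adapted from CaImAn's motion_correction.py (line 503 onwards; the case at line 524 is the relevant one)
from caiman.motion_correction import sliding_window

Y = cm.load(moco_fnames[0]).astype(np.float32)
ymin = Y.min()
if ymin < 0:
    Y -= ymin  # shift the movie so that it is non-negative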

xy_grid = [(it[0], it[1]) for it in sliding_window(Y[0], mc.overlaps, mc.strides)]
dims_grid = tuple(np.max(np.stack(xy_grid, axis=1), axis=1) - np.min(
                    np.stack(xy_grid, axis=1), axis=1) + 1)
shifts_x = np.stack([np.reshape(_sh_, dims_grid, order='C').astype(
                    np.float32) for _sh_ in mc.x_shifts_els], axis=0)
shifts_y = np.stack([np.reshape(_sh_, dims_grid, order='C').astype(
                    np.float32) for _sh_ in mc.y_shifts_els], axis=0)
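`dims_grid` is the number of piecewise-rigid patches along each axis, and each entry of `mc.x_shifts_els` / `mc.y_shifts_els` holds one shift per patch for a single frame, which is reshaped in C order into that grid. The sketch below recomputes the grid size assuming that CaImAn's `sliding_window` starts a patch of size `overlaps + strides` every `strides` pixels and adds one final patch flush with the far edge (my reading of the source; check it against the installed version). Both helper names are hypothetical:

```python
def patch_grid_dims(frame_shape, overlaps, strides):
    """Number of piecewise-rigid patches along each axis of a 2D frame.

    Assumes patches of size overlap + stride start every `stride` pixels,
    plus one extra patch flush against the far edge of the frame.
    """
    dims = []
    for size, overlap, stride in zip(frame_shape, overlaps, strides):
        window = overlap + stride
        starts = list(range(0, size - window, stride)) + [size - window]
        dims.append(len(starts))
    return tuple(dims)


def reshape_c_order(values, dims_grid):
    """Reshape one frame's flat list of patch shifts into rows (C order)."""
    n_rows, n_cols = dims_grid
    if len(values) != n_rows * n_cols:
        raise ValueError("number of shifts must match the patch grid")
    return [values[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
```

For a 512 x 512 frame with `strides=(48, 48)` and `overlaps=(24, 24)`, this gives an 11 x 11 grid, i.e. 121 shifts per frame and axis.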

In [None]:
moco_params_lis = [
    "max_shifts",
    "niter_rig",
    "splits_rig",
    "num_splits_to_process_rig",
    "num_splits_to_process_els",
    "strides",
    "overlaps",
    "splits_els",
    "upsample_factor_grid",
    "max_deviation_rigid",
    "shifts_opencv",
    "min_mov",
    "nonneg_movie",
    "gSig_filt",
    "use_cuda",
    "border_nan",
    "pw_rigid",
    "var_name_hdf5",
    "is3D",
    "indices",
    "total_template_rig",
    "templates_rig",
    "fname_tot_rig",
    "shifts_rig",
    "total_template_els",
    "fname_tot_els",
    "templates_els",
    "x_shifts_els",
    "y_shifts_els",
    "coord_shifts_els",
    "border_to_0",
    "mmap_file",  # also fname_mmap_f
]

# Of the entries above, min_mov, total_template_rig, total_template_els and border_to_0 have a .shape attribute

In [None]:
list_types = False
if list_types:
    # Print the type (and shape, where available) of every parameter to be saved
    for dset in moco_params_lis:
        data = getattr(mc, dset)
        print(f"{dset}: {type(data)}")
        try:
            print(f"\t{data.shape}")
        except AttributeError:
            print("\tno shape")

In [None]:
# TODO: for compatibility with Pure Python Pipeline Splitting, add 
# moco_intervals, moco_flags, cnmf_flags. As there is no splitting, these are:
# moco_intervals = [(1, len(rnr.nd2_data))]
# moco_flags = [True]
# cnmf_flags = [True]
utf8_type = h5py.string_dtype('utf-8', 30)
def append_dataset(h5_file, name, data):
    """Store `data` in `h5_file` under `name`.

    Entries that cannot be stored as numeric arrays (None, strings, tuples of
    slices such as `indices`, and lists like [None, None, ...] or ["..."]) are
    converted to a UTF-8 encoded string attribute, the easiest way to preserve
    information about their format. Everything else is written as a dataset.
    """
    is_str_like = (
        data is None
        or isinstance(data, str)
        or (isinstance(data, tuple) and len(data) > 0 and isinstance(data[0], slice))
        or (isinstance(data, list) and len(data) > 0
            and (data[0] is None or isinstance(data[0], str)))
    )
    if is_str_like:
        h5_file.attrs[name] = str(data).encode("utf-8")
    else:
        # create_dataset(data=...) writes the whole array, including 0-d (scalar) values
        h5_file.create_dataset(name, data=np.array(data))
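The type test at the top of `append_dataset` decides which moco parameters end up as string attributes and which as datasets. Pulled out as a standalone predicate (hypothetical name; it uses `isinstance` and adds a guard so empty lists and tuples do not raise `IndexError`), it can be used to check the parameter list before writing, which is relevant to the TODO about string attributes in `moco_pars.h5`:

```python
def stored_as_string_attr(data):
    """Return True if `data` would be saved as a string attribute.

    None, str, tuples of slices and lists starting with None or a str are
    stringified; everything else (numbers, arrays, numeric lists) becomes
    a dataset.
    """
    if data is None or isinstance(data, str):
        return True
    if isinstance(data, tuple) and data and isinstance(data[0], slice):
        return True
    if isinstance(data, list) and data and (data[0] is None or isinstance(data[0], str)):
        return True
    return False
```

Usage: `[name for name in moco_params_lis if stored_as_string_attr(getattr(mc, name))]` lists the parameters that will be stored as attributes rather than datasets.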

In [None]:
with h5py.File(moco_pars_fpath, 'w') as hf:
    print("Adding uuid")
    hf.attrs["uuid"] = session_uuid
    print("Adding moco_intervals")
    append_dataset(hf, "moco_intervals", moco_intervals)
    print("Adding moco_flags")
    append_dataset(hf, "moco_flags", moco_flags)
    print("Adding cnmf_intervals")
    append_dataset(hf, "cnmf_intervals", cnmf_intervals)
    print("Adding cnmf_flags")
    append_dataset(hf, "cnmf_flags", cnmf_flags)
    print("Adding begin_end_frames")
    append_dataset(hf, "begin_end_frames", begin_end_frames)
    print("Saving moco params...")
    for dset_name in moco_params_lis:
        print("\t" + dset_name)
        data = getattr(mc, dset_name)
        append_dataset(hf, dset_name, data)
print(f"Saved listed parameters in\n\t{moco_pars_fpath}")

# (Optional) Match to LabView

In [None]:
match_to_lv = True

In [None]:
if "MATLAB_2P_FOLDER" in env_dict:
    matlab_2p_path = env_dict["MATLAB_2P_FOLDER"]

In [None]:
if match_to_lv:
    import labrotation.two_photon_session as tps
    nd2_meta_path = fh.open_file("Choose Nikon metadata file (.txt)!")
    labview_path = fh.open_file("Choose LabView file (xy.txt, NOT xytime.txt)! Press cancel if not available.")
    lfp_path = fh.open_file("Choose LFP file (.abf)! Press cancel if none available.")
    if lfp_path == ".":
        lfp_path = None
    if labview_path == ".":
        labview_path = None
        labview_timestamps_path = None
    else:
        labview_timestamps_path = labview_path[:-4] + "time.txt"
    export_fpath = os.path.join(export_folder, os.path.splitext(nd2_fname)[0] + "_session.h5")
    session = tps.TwoPhotonSession.init_and_process(nd2_fpath, nd2_meta_path, labview_path, labview_timestamps_path, lfp_path, matlab_2p_path)
    session.export_hdf5(export_fpath)
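The cell relies on two conventions: `fh.open_file` apparently returns `"."` when the dialog is cancelled (inferred from the checks above), and the LabView timestamp file sits next to the `xy.txt` file with `time` inserted before the extension. A sketch of both as small helpers, with an added check against accidentally picking the `xytime.txt` file; the helper names are hypothetical:

```python
import os


def optional_path(path):
    """Map the "." returned on a cancelled file dialog to None."""
    return None if path == "." else path


def labview_timestamps_path(labview_path):
    """Derive the LabView timestamp file path from the xy.txt file path.

    Example: '/data/T301xy.txt' -> '/data/T301xytime.txt'.
    Returns None if no LabView file was chosen.
    """
    if labview_path is None:
        return None
    root, ext = os.path.splitext(labview_path)
    if ext.lower() != ".txt":
        raise ValueError(f"expected a .txt LabView file, got {labview_path!r}")
    if root.endswith("time"):
        raise ValueError("choose the xy.txt file, not the xytime.txt file")
    return root + "time" + ext
```

With these, the cell would read `labview_path = optional_path(fh.open_file(...))` followed by `labview_timestamps_path = labview_timestamps_path(labview_path)` (renaming one of the two if both are kept in the same namespace).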

# Opening results (data fields and attributes)

In [None]:
with h5py.File(moco_pars_fpath, "r") as hf:
    for key in hf.attrs.keys():
        print(f"{key}:\n\t{hf.attrs[key]}")
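`append_dataset` stored non-numeric parameters as `str(data).encode("utf-8")`, so they come back as bytes. A sketch that turns them back into Python objects where the string is a Python literal (`None`, `[None, None]`, `['a.mmap']`) and otherwise returns the decoded text, e.g. for tuples of `slice` objects. `decode_attr` is a hypothetical helper, and it assumes h5py returns these attributes as `bytes` (or a subclass such as `numpy.bytes_`):

```python
import ast


def decode_attr(value):
    """Decode a string attribute written by append_dataset.

    Non-bytes values (e.g. the uuid, stored as str) are returned unchanged.
    Bytes are decoded and parsed with ast.literal_eval when possible;
    non-literal reprs such as "(slice(None, None, None), ...)" are
    returned as the decoded string.
    """
    if not isinstance(value, bytes):
        return value
    text = value.decode("utf-8")
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text
```

Usage: `{key: decode_attr(val) for key, val in hf.attrs.items()}` inside the `with` block. Because `str()` does not quote a bare string, a stored string that itself looks like a literal (e.g. `"None"`) is decoded as that literal.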

In [None]:
with h5py.File(moco_pars_fpath, "r") as hf:
    for key in hf.keys():
        print(f"{key}:\n\t{hf[key]}")

In [None]:
rnr.rnr_data[0].shape

In [None]:
cnm2.estimates.discarded_components.idx_components_bad

In [None]:
cnm.estimates.idx_components_bad