## NWB-Datajoint tutorial 1

**Note: make a copy of this notebook and run the copy to avoid git conflicts in the future**

This is the first in a multi-part tutorial on the NWB-Datajoint pipeline used in Loren Frank's lab, UCSF. It demonstrates how to run spike sorting within the pipeline.

If you have not done [tutorial 0](0_intro.ipynb) yet, make sure to do so before proceeding.

Let's start by importing the `nwb_datajoint` package, along with a few others. 

In [1]:
from pathlib import Path
import os
import numpy as np

import nwb_datajoint as nd

import warnings
warnings.simplefilter('ignore')

# Comment these if you have already set these environment variables
# data_dir = Path('/stelmo/nwb') # CHANGE ME TO THE BASE DIRECTORY FOR DATA STORAGE ON YOUR SYSTEM
# os.environ['DJ_SUPPORT_FILEPATH_MANAGEMENT'] = 'TRUE'
# os.environ['KACHERY_P2P_API_PORT'] = '14747'
# os.environ['NWB_DATAJOINT_BASE_DIR'] = str(data_dir)
# os.environ['KACHERY_STORAGE_DIR'] = str(data_dir / 'kachery-storage')
# os.environ['SPIKE_SORTING_STORAGE_DIR'] = str(data_dir / 'spikesorting')

Connecting jhbak@lmf-db.cin.ucsf.edu:3306


In [3]:
data_dir = Path(os.environ['NWB_DATAJOINT_BASE_DIR'])
os.environ['SPIKE_SORTING_STORAGE_DIR'] = str(data_dir / 'spikesorting')

# port required for spike sorting
# set by kachery_p2p daemon
# os.environ['KACHERY_P2P_API_PORT']
os.environ['KACHERY_P2P_API_PORT'] = '20431'
os.environ['KACHERY_P2P_CONFIG_DIR'] = '/home/jhbak/.kachery-p2p'

In [4]:
# We also import a bunch of tables so that we can call them easily
from nwb_datajoint.common import RawPosition, HeadDir, Speed, LinPos, StateScriptFile, VideoFile,\
                                 DataAcquisitionDevice, CameraDevice, Probe,\
                                 DIOEvents,\
                                 ElectrodeGroup, Electrode, Raw, SampleCount,\
                                 LFPSelection, LFP, LFPBandSelection, LFPBand,\
                                 SortGroup, SpikeSorting, SpikeSorter, SpikeSorterParameters, SpikeSortingWaveformParameters, SpikeSortingParameters, SpikeSortingMetrics, CuratedSpikeSorting,\
                                 FirFilter,\
                                 IntervalList, SortInterval,\
                                 Lab, LabMember, Institution,\
                                 BrainRegion,\
                                 SensorData,\
                                 Session, ExperimenterList,\
                                 Subject,\
                                 Task, TaskEpoch,\
                                 Nwbfile, AnalysisNwbfile, NwbfileKachery, AnalysisNwbfileKachery

In this tutorial, we will continue to work with the copy of `beans20190718.nwb` that you created in tutorial 0. If you deleted it from `Session`, make sure to re-insert before proceeding.

In [5]:
# Define the name of the file that you copied and renamed; make sure it's something unique. 
# nwb_file_name = 'beans20190718.nwb'
# nwb_file_name = 'beans20190718_jhbak.nwb'
nwb_file_name = 'beans20190718_jhbak2.nwb'
filename, file_extension = os.path.splitext(nwb_file_name)
# This is a copy of the original nwb file, except it doesn't contain the raw data (for storage reasons)
nwb_file_name2 = filename + '_' + file_extension

In [6]:
# Run if you need to reinsert the data
# nd.insert_sessions(nwb_file_name)

In [7]:
Session()

nwb_file_name  name of the NWB file,subject_id,institution_name,lab_name,session_id,session_description,session_start_time,timestamps_reference_time,experiment_description
beans20190718_.nwb,Beans,"University of California, San Francisco",Loren Frank,beans_01,Reinforcement leaarning,2019-07-18 15:29:47,1970-01-01 00:00:00,Reinforcement learning
beans20190718_jhbak2_.nwb,Beans,"University of California, San Francisco",Loren Frank,beans_01,Reinforcement leaarning,2019-07-18 15:29:47,1970-01-01 00:00:00,Reinforcement learning
beans20190718_jhbak_.nwb,Beans,"University of California, San Francisco",Loren Frank,beans_01,Reinforcement leaarning,2019-07-18 15:29:47,1970-01-01 00:00:00,Reinforcement learning
despereaux20191125_.nwb,Despereaux,"University of California, San Francisco",Loren Frank,4,Sungod,2019-11-25 10:17:29,1970-01-01 00:00:00,Sungod control
despereaux20191125_2_.nwb,Despereaux,"University of California, San Francisco",Loren Frank,4,Sungod,2019-11-25 10:17:29,1970-01-01 00:00:00,Sungod control
peanut20201117_.nwb,peanut,"University of California, San Francisco",Loren Frank,peanut_20201117,spatial alternation memory task,2020-11-17 08:50:38,1970-01-01 00:00:00,spatial alternation memory task


### Spike sorting

In general, running spike sorting means making decisions about the following:
1. which eletrodes to sort together (e.g. electrodes that form a tetrode should be sorted together, but tetrodes that are far apart need not be);
2. which time interval to sort (e.g. there may a long period in the recording where nothing happens, and we might want to exclude that);
3. which spike sorter to use (e.g. Mountainsort? Kilosort? IronClust?);
4. given choice of the spike sorter in 3, which parameter set to use.

In our Datajoint framework, everything that we do is an interaction with a table. This is true for spike sorting as well - i.e. we think of spike sorting as a process where we enter parameters of spike sorting (i.e. our decisions about the four questions above) into tables, and use that information to populate another table that will hold the result of spike sorting. Under the hood, we use a number of packages, notably `spikeinterface`. But the user need not know this - they just have to interact with the table. This makes spike sorting straightforward. In addition, the entries in these tables serve as a record of exactly which decisions you made.

#### Define sort group
We start with the first question: which electrodes do we want to sort together? We first inspect the `Electrode` table.

In [8]:
Electrode & {'nwb_file_name': nwb_file_name2}

nwb_file_name  name of the NWB file,electrode_group_name  electrode group name from NWBFile,electrode_id  the unique number for this electrode,probe_type,probe_shank  shank number within probe,probe_electrode  electrode,region_id,name  unique label for each contact,original_reference_electrode  the configured reference electrode for this electrode,x  the x coordinate of the electrode position in the brain,y  the y coordinate of the electrode position in the brain,z  the z coordinate of the electrode position in the brain,filtering  description of the signal filtering,impedance  electrode impedance,bad_channel  if electrode is 'good' or 'bad' as observed during recording,x_warped  x coordinate of electrode position warped to common template brain,y_warped  y coordinate of electrode position warped to common template brain,z_warped  z coordinate of electrode position warped to common template brain,contacts  label of electrode contacts used for a bipolar signal -- current workaround
beans20190718_jhbak2_.nwb,0,0,128c-4s8mm6cm-20um-40um-sl,0,0,1,0,-1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_jhbak2_.nwb,0,1,128c-4s8mm6cm-20um-40um-sl,0,1,1,1,-1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_jhbak2_.nwb,0,3,128c-4s8mm6cm-20um-40um-sl,0,3,1,3,-1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_jhbak2_.nwb,0,4,128c-4s8mm6cm-20um-40um-sl,0,4,1,4,-1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_jhbak2_.nwb,0,5,128c-4s8mm6cm-20um-40um-sl,0,5,1,5,-1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_jhbak2_.nwb,0,7,128c-4s8mm6cm-20um-40um-sl,0,7,1,7,-1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_jhbak2_.nwb,0,8,128c-4s8mm6cm-20um-40um-sl,0,8,1,8,-1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_jhbak2_.nwb,0,9,128c-4s8mm6cm-20um-40um-sl,0,9,1,9,-1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_jhbak2_.nwb,0,11,128c-4s8mm6cm-20um-40um-sl,0,11,1,11,-1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_jhbak2_.nwb,0,12,128c-4s8mm6cm-20um-40um-sl,0,12,1,12,-1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,


This recording was done with polymer probes. Here `electrode_group_name` refers to a probe. We can see that there were two probes, `0` and `1`.

In [9]:
# get unique probe id
np.unique((Electrode & {'nwb_file_name': nwb_file_name2}
          ).fetch('electrode_group_name')
         )

array(['0', '1'], dtype=object)

Each probe has four shanks, as you can see:

In [10]:
# get unique shank id for the first probe
np.unique((Electrode & {'nwb_file_name': nwb_file_name2, 'electrode_group_name': 0}
          ).fetch('probe_shank')
         )

array([0, 1, 2, 3])

Our job is to identify the electrodes that we want to sort together, and add them as a sort group in the `SortGroup` table. One natural way to do this is to set each shank as a sort group (for tetrode recordings, each tetrode can be thought of as a "shank" with four electrodes). Use `set_group_by_shank` method for this:

In [11]:
SortGroup().set_group_by_shank(nwb_file_name2)

About to delete:
`common_spikesorting`.`spike_sorting_parameters`: 1 items
`common_spikesorting`.`sort_group__sort_group_electrode`: 200 items
`common_spikesorting`.`sort_group`: 9 items


Proceed? [yes, No]:  yes


Committed.


This generates 8 sort groups, one for each of the four shanks in the two probes.

In [12]:
SortGroup & {'nwb_file_name': nwb_file_name2}

nwb_file_name  name of the NWB file,sort_group_id  identifier for a group of electrodes,"sort_reference_electrode_id  the electrode to use for reference. -1: no reference, -2: common median"
beans20190718_jhbak2_.nwb,0,-1
beans20190718_jhbak2_.nwb,1,-1
beans20190718_jhbak2_.nwb,2,-1
beans20190718_jhbak2_.nwb,3,-1
beans20190718_jhbak2_.nwb,4,-1
beans20190718_jhbak2_.nwb,5,-1
beans20190718_jhbak2_.nwb,6,-1
beans20190718_jhbak2_.nwb,7,-1


`SortGroup` has a *parts table* called `SortGroupElectrode` - think of this as child table that contains information auxiliary to the parent table. As you can see, it contains two extra attributes: `electrode_group_name` and `electrode_id`.

In [13]:
SortGroup.SortGroupElectrode & {'nwb_file_name': nwb_file_name2}

nwb_file_name  name of the NWB file,sort_group_id  identifier for a group of electrodes,electrode_group_name  electrode group name from NWBFile,electrode_id  the unique number for this electrode
beans20190718_jhbak2_.nwb,0,0,0
beans20190718_jhbak2_.nwb,0,0,1
beans20190718_jhbak2_.nwb,0,0,3
beans20190718_jhbak2_.nwb,0,0,4
beans20190718_jhbak2_.nwb,0,0,5
beans20190718_jhbak2_.nwb,0,0,7
beans20190718_jhbak2_.nwb,0,0,8
beans20190718_jhbak2_.nwb,0,0,9
beans20190718_jhbak2_.nwb,0,0,11
beans20190718_jhbak2_.nwb,0,0,12


What if you don't want to sort by shank? Maybe you want to select specific electrodes across shanks and sort them. To do so, you just have to manually `insert` a new entry into the `SortGroup` and `SortGroupElectrode` tables. 

In [14]:
sort_group_id = 8

In [15]:
# First we make a new entry in the SortGroup table, and give it sort_group_id of 8
SortGroup.insert1({'nwb_file_name': nwb_file_name2, 
                   'sort_group_id': sort_group_id, 
                   'sort_reference_electrode_id': -1}, 
                  skip_duplicates = True)

# # Next, we will associate with the sort group 
# # that we just created every fourth electrode of the first shank
# SortGroup.SortGroupElectrode.insert(
#     [[nwb_file_name2, 8, 0, elec] for elec in range(0,32,4)], 
#     skip_duplicates = True)

In [16]:
SortGroup & {'nwb_file_name': nwb_file_name2}

nwb_file_name  name of the NWB file,sort_group_id  identifier for a group of electrodes,"sort_reference_electrode_id  the electrode to use for reference. -1: no reference, -2: common median"
beans20190718_jhbak2_.nwb,0,-1
beans20190718_jhbak2_.nwb,1,-1
beans20190718_jhbak2_.nwb,2,-1
beans20190718_jhbak2_.nwb,3,-1
beans20190718_jhbak2_.nwb,4,-1
beans20190718_jhbak2_.nwb,5,-1
beans20190718_jhbak2_.nwb,6,-1
beans20190718_jhbak2_.nwb,7,-1
beans20190718_jhbak2_.nwb,8,-1


In [17]:
# Next, we will associate with the sort group 
# that we just created every fourth electrode of the first shank
SortGroup.SortGroupElectrode.insert(
    [[nwb_file_name2, 8, 0, elec] for elec in range(0,32,4)], 
    skip_duplicates = True)

Note that `insert` is a method, just like `fetch`. You can insert an entry in the form of a dictionary or a list in the order of the attributes. We can look at the new entries we just made.

In [18]:
SortGroup & {'nwb_file_name' : nwb_file_name2, 'sort_group_id' : sort_group_id}

nwb_file_name  name of the NWB file,sort_group_id  identifier for a group of electrodes,"sort_reference_electrode_id  the electrode to use for reference. -1: no reference, -2: common median"
beans20190718_jhbak2_.nwb,8,-1


In [19]:
SortGroup.SortGroupElectrode & {'nwb_file_name': nwb_file_name2, 
                                'sort_group_id': sort_group_id}

nwb_file_name  name of the NWB file,sort_group_id  identifier for a group of electrodes,electrode_group_name  electrode group name from NWBFile,electrode_id  the unique number for this electrode
beans20190718_jhbak2_.nwb,8,0,0
beans20190718_jhbak2_.nwb,8,0,4
beans20190718_jhbak2_.nwb,8,0,8
beans20190718_jhbak2_.nwb,8,0,12
beans20190718_jhbak2_.nwb,8,0,16
beans20190718_jhbak2_.nwb,8,0,20
beans20190718_jhbak2_.nwb,8,0,24
beans20190718_jhbak2_.nwb,8,0,28


#### Define sort interval
Next, we make a decision about the time interval for our spike sorting. Let's re-examine `IntervalList`.

In [20]:
IntervalList & {'nwb_file_name' : nwb_file_name2}

nwb_file_name  name of the NWB file,interval_list_name  descriptive name of this interval list,valid_times  numpy array with start and end times for each interval
beans20190718_jhbak2_.nwb,01_s1,=BLOB=
beans20190718_jhbak2_.nwb,02_r1,=BLOB=
beans20190718_jhbak2_.nwb,03_s2,=BLOB=
beans20190718_jhbak2_.nwb,04_r2,=BLOB=
beans20190718_jhbak2_.nwb,pos 0 valid times,=BLOB=
beans20190718_jhbak2_.nwb,pos 1 valid times,=BLOB=
beans20190718_jhbak2_.nwb,pos 2 valid times,=BLOB=
beans20190718_jhbak2_.nwb,pos 3 valid times,=BLOB=
beans20190718_jhbak2_.nwb,raw data valid times,=BLOB=


For our example, let's just decide the first 10 seconds of the first run interval (`02_r1`) as our sort interval. To do so, we first fetch `valid_times` of this interval, define our new sort interval, and add this to the `SortInterval` table.

In [21]:
interval_list_name = '02_r1'

In [22]:
interval = (IntervalList & {'nwb_file_name' : nwb_file_name2,
                            'interval_list_name' : interval_list_name}).fetch1('valid_times')
print(interval)

[[1.56349063e+09 1.56349340e+09]]


In [23]:
sort_interval = np.asarray([interval[0][0]+10, interval[0][0]+20])
print(sort_interval)

[1.56349064e+09 1.56349065e+09]


In [24]:
# Check out SortInterval
SortInterval & {'nwb_file_name' : nwb_file_name2}

nwb_file_name  name of the NWB file,sort_interval_name  name for this interval,sort_interval  1D numpy array with start and end time for a single interval to be used for spike sorting
beans20190718_jhbak2_.nwb,beans_02_r1_10s,=BLOB=


In [25]:
sort_interval_name = 'beans_02_r1_10s'

In [26]:
# Specify the required attributes
SortInterval.insert1({'nwb_file_name' : nwb_file_name2,
                      'sort_interval_name' : sort_interval_name,
                      'sort_interval' : sort_interval}, skip_duplicates=True)

In [27]:
# See results
SortInterval & {'nwb_file_name' : nwb_file_name2}

nwb_file_name  name of the NWB file,sort_interval_name  name for this interval,sort_interval  1D numpy array with start and end time for a single interval to be used for spike sorting
beans20190718_jhbak2_.nwb,beans_02_r1_10s,=BLOB=


In [28]:
# (SortInterval & {'nwb_file_name': nwb_file_name2}).fetch('sort_interval')
(SortInterval & {'nwb_file_name': nwb_file_name2}).fetch1('sort_interval')

array([1.56349064e+09, 1.56349065e+09])

#### Define sorter
Next we decide which spike sorter to use. This boils down to looking at the `SpikeSorter` table and choosing the one we like. Initially, `SpikeSorter` may not be populated; in that case, we insert some sorters to it by checking which ones are available via `spikeinterface`, the package that we will be using implicitly for spike sorting.

In [29]:
SpikeSorter().insert_from_spikeinterface()

In [30]:
SpikeSorter()

sorter_name  the name of the spike sorting algorithm
combinato
hdsort
herdingspikes
ironclust
kilosort
kilosort2
kilosort2_5
kilosort3
klusta
mountainsort4


For our example, we will be using `mountainsort4`.

In [31]:
sorter_name = 'mountainsort4'

#### Define sorter parameters
Once we have decided on a spike sorter, we have to set parameters. Some of these parameters are common to all sorters (e.g. frequency band to filter the raw data before sorting begins) but most are specific to the sorter that we chose. Again, we populate `SpikeSorterParameters` table with some default parameters for each sorter, and then we add our version as a new entry.

In [32]:
SpikeSorterParameters().insert_from_spikeinterface()

In [33]:
SpikeSorterParameters()

sorter_name  the name of the spike sorting algorithm,parameter_set_name  label for this set of parameters,parameter_dict  dictionary of parameter names and values,frequency_min  high pass filter value,frequency_max  low pass filter value,filter_width  the number of coefficients in the filter,filter_chunk_size  the size of the chunk for the filtering
combinato,default,=BLOB=,300,6000,1000,30000
hdsort,default,=BLOB=,300,6000,1000,30000
herdingspikes,default,=BLOB=,300,6000,1000,30000
ironclust,default,=BLOB=,300,6000,1000,30000
kilosort,default,=BLOB=,300,6000,1000,30000
kilosort2,default,=BLOB=,300,6000,1000,30000
kilosort2_5,default,=BLOB=,300,6000,1000,30000
kilosort3,default,=BLOB=,300,6000,1000,30000
klusta,default,=BLOB=,300,6000,1000,30000
mountainsort4,default,=BLOB=,300,6000,1000,30000


Define a new set of spike sorter parameters from default and add to table.

In [34]:
# Let's look at the default params
ms4_default_params = (SpikeSorterParameters & {'sorter_name': sorter_name,
                                               'parameter_set_name' : 'default'}).fetch1()
print(ms4_default_params)

{'sorter_name': 'mountainsort4', 'parameter_set_name': 'default', 'parameter_dict': {'detect_sign': -1, 'adjacency_radius': -1, 'freq_min': 300, 'freq_max': 6000, 'filter': True, 'whiten': True, 'curation': False, 'num_workers': None, 'clip_size': 50, 'detect_threshold': 3, 'detect_interval': 10, 'noise_overlap_threshold': 0.15}, 'frequency_min': 300, 'frequency_max': 6000, 'filter_width': 1000, 'filter_chunk_size': 30000}


In [35]:
# Change the default params
param_dict = ms4_default_params['parameter_dict']
# We will just sort electrodes one by one
param_dict['adjacency_radius'] = 0
param_dict['curation'] = False
# Turn filter off since we will filter it prior to starting sort
param_dict['filter'] = False
# set num_workers to be the same number as the number of electrodes
param_dict['num_workers'] = len((SortGroup.SortGroupElectrode & {'sort_group_id':sort_group_id}).fetch('electrode_id'))
param_dict['verbose'] = True
# set clip size as number of samples for 2 milliseconds
param_dict['clip_size'] = np.int(2e-3 * (Raw & {'nwb_file_name' : nwb_file_name2}).fetch1('sampling_rate'))
param_dict['noise_overlap_threshold'] = 0

In [36]:
# Give a unique name here
# parameter_set_name = 'test'
# parameter_set_name = 'test3'
parameter_set_name = 'test4'

In [37]:
# Insert
SpikeSorterParameters.insert1({'sorter_name' : sorter_name,
                               'parameter_set_name' : parameter_set_name,
                               'parameter_dict' : param_dict}, skip_duplicates = True)

In [38]:
# Check that insert was successful
SpikeSorterParameters & {'sorter_name' : sorter_name, 'parameter_set_name' : parameter_set_name}

sorter_name  the name of the spike sorting algorithm,parameter_set_name  label for this set of parameters,parameter_dict  dictionary of parameter names and values,frequency_min  high pass filter value,frequency_max  low pass filter value,filter_width  the number of coefficients in the filter,filter_chunk_size  the size of the chunk for the filtering
mountainsort4,test4,=BLOB=,300,6000,1000,30000


In [39]:
SpikeSorterParameters & {'sorter_name': sorter_name}

sorter_name  the name of the spike sorting algorithm,parameter_set_name  label for this set of parameters,parameter_dict  dictionary of parameter names and values,frequency_min  high pass filter value,frequency_max  low pass filter value,filter_width  the number of coefficients in the filter,filter_chunk_size  the size of the chunk for the filtering
mountainsort4,default,=BLOB=,300,6000,1000,30000
mountainsort4,test,=BLOB=,300,6000,1000,30000
mountainsort4,test2,=BLOB=,300,6000,1000,30000
mountainsort4,test3,=BLOB=,300,6000,1000,30000
mountainsort4,test4,=BLOB=,300,6000,1000,30000
mountainsort4,tetrode,=BLOB=,300,6000,1000,30000
mountainsort4,tetrode2,=BLOB=,300,6000,1000,30000


#### Define qualtiy metric parameters

We're almost done. There are more parameters related to how to compute the quality metrics for curation. We just use the default options here. 

In [40]:
# we'll use `test`
SpikeSortingMetrics()

cluster_metrics_list_name  the name for this list of cluster metrics,metrics_dict  a dict of SpikeInterface metrics with True / False elements to indicate whether a given metric should be computed.,isi_threshold  Interspike interval threshold in s for ISI metric (default 0.003),snr_mode  SNR mode: median absolute deviation ('mad) or standard deviation ('std') (default 'mad'),snr_noise_duration  length of data to use for noise estimation (default 10.0),max_spikes_per_unit_for_snr  Maximum number of spikes to compute templates for SNR from (default 1000),template_mode  Use 'mean' or 'median' to compute templates,"max_channel_peak  direction of the maximum channel peak: 'both', 'neg', or 'pos' (default 'both')",max_spikes_per_unit_for_noise_overlap  Maximum number of spikes to compute templates for noise overlap from (default 1000),noise_overlap_num_features  Number of features to use for PCA for noise overlap,noise_overlap_num_knn  Number of nearest neighbors for noise overlap,drift_metrics_interval_s  length of period in s for evaluating drift (default 60 s),drift_metrics_min_spikes_per_interval  minimum number of spikes in an interval for evaluation of drift (default 10),max_spikes_for_silhouette  Max spikes to be used for silhouette metric,num_channels_to_compare  number of channels to be used for the PC extraction and comparison (default 7),max_spikes_per_cluster  Max spikes to be used from each unit,max_spikes_for_nn  Max spikes to be used for nearest-neighbors calculation,n_neighbors  number of nearest clusters to use for nearest neighbor calculation (default 4),n_jobs  Number of parallel jobs (default 96),"memmap  If True, waveforms are saved as memmap object (recommended for long recordings with many channels)",max_spikes_per_unit  Max spikes to use for computing waveform,seed  Random seed for reproducibility,"verbose  If nonzero (True), will be verbose in metric computation"
default,=BLOB=,0.003,mad,10.0,500,mean,both,500,5,1,60.0,10,500,7,500,500,1,64,0,2000,47,1
franklab_default,=BLOB=,0.003,mad,10.0,1000,mean,both,1000,5,1000,60.0,10,1000,7,1000,1000,4,96,0,2000,47,1
test,=BLOB=,0.003,mad,10.0,1000,mean,both,100,3,10,60.0,10,1000,7,1000,1000,4,1,0,2000,47,1


In [41]:
cluster_metrics_list_name = 'test'
# cluster_metrics_list_name = 'test4'

#### Bringing everything together

We now collect all the decisions we made up to here and put it into `SpikeSortingParameters` table (note: this is different from spike sor*ter* parameters defined above).

In [42]:
# collect the params
key = dict()
key['nwb_file_name'] = nwb_file_name2
key['sort_group_id'] = sort_group_id
key['sort_interval_name'] = sort_interval_name
key['interval_list_name'] = interval_list_name
key['sorter_name'] = sorter_name
key['parameter_set_name'] = parameter_set_name
key['cluster_metrics_list_name'] = cluster_metrics_list_name

In [43]:
key

{'nwb_file_name': 'beans20190718_jhbak2_.nwb',
 'sort_group_id': 8,
 'sort_interval_name': 'beans_02_r1_10s',
 'interval_list_name': '02_r1',
 'sorter_name': 'mountainsort4',
 'parameter_set_name': 'test4',
 'cluster_metrics_list_name': 'test'}

In [44]:
SpikeSortingParameters()

nwb_file_name  name of the NWB file,sort_group_id  identifier for a group of electrodes,sorter_name  the name of the spike sorting algorithm,parameter_set_name  label for this set of parameters,sort_interval_name  name for this interval,cluster_metrics_list_name  the name for this list of cluster metrics,interval_list_name  descriptive name of this interval list,import_path  optional path to previous curated sorting output
beans20190718_jhbak_.nwb,8,mountainsort4,test3,beans_02_r1_10s,test,02_r1,
despereaux20191125_.nwb,0,mountainsort4,tetrode,02_r1,default,02_r1,
despereaux20191125_.nwb,1,mountainsort4,tetrode,02_r1,default,02_r1,
despereaux20191125_.nwb,2,mountainsort4,tetrode,02_r1,default,02_r1,
despereaux20191125_.nwb,3,mountainsort4,tetrode,02_r1,default,02_r1,
despereaux20191125_.nwb,4,mountainsort4,tetrode,02_r1,default,02_r1,
despereaux20191125_.nwb,5,mountainsort4,tetrode,02_r1,default,02_r1,
despereaux20191125_.nwb,6,mountainsort4,tetrode,02_r1,default,02_r1,
despereaux20191125_.nwb,7,mountainsort4,tetrode,02_r1,default,02_r1,
despereaux20191125_.nwb,8,mountainsort4,tetrode,02_r1,default,02_r1,


In [45]:
# insert
SpikeSortingParameters.insert1(key, skip_duplicates = True)

In [46]:
# inspect
SpikeSortingParameters & {'nwb_file_name' : nwb_file_name2}

nwb_file_name  name of the NWB file,sort_group_id  identifier for a group of electrodes,sorter_name  the name of the spike sorting algorithm,parameter_set_name  label for this set of parameters,sort_interval_name  name for this interval,cluster_metrics_list_name  the name for this list of cluster metrics,interval_list_name  descriptive name of this interval list,import_path  optional path to previous curated sorting output
beans20190718_jhbak2_.nwb,8,mountainsort4,test4,beans_02_r1_10s,test,02_r1,


#### Running spike sorting
Now we can run spike sorting. As we said it's nothing more than populating another table (`SpikeSorting`) from the entries of `SpikeSortingParameters`.

In [47]:
(SpikeSortingParameters & {'nwb_file_name' : nwb_file_name2}).proj()

nwb_file_name  name of the NWB file,sort_group_id  identifier for a group of electrodes,sorter_name  the name of the spike sorting algorithm,parameter_set_name  label for this set of parameters,sort_interval_name  name for this interval
beans20190718_jhbak2_.nwb,8,mountainsort4,test4,beans_02_r1_10s


In [48]:
[(SpikeSortingParameters & {'nwb_file_name' : nwb_file_name2}).proj()]

[*nwb_file_name *sort_group_id *sorter_name   *parameter_set *sort_interval
 +------------+ +------------+ +------------+ +------------+ +------------+
 beans20190718_ 8              mountainsort4  test4          beans_02_r1_10
  (Total: 1)]

In [49]:
# Specify entry (otherwise runs everything in SpikeSortingParameters); 
# `proj` gives you primary key
SpikeSorting.populate(
    [(SpikeSortingParameters & {'nwb_file_name' : nwb_file_name2}).proj()]
)

Getting ready...
Writing new NWB file beans20190718_jhbak2_000003.nwb

Running spike sorting on {'nwb_file_name': 'beans20190718_jhbak2_.nwb', 'sort_group_id': 8, 'sorter_name': 'mountainsort4', 'parameter_set_name': 'test4', 'sort_interval_name': 'beans_02_r1_10s', 'analysis_file_name': 'beans20190718_jhbak2_000003.nwb'}...
'end_frame' set to 200000
'end_frame' set to 200000
'end_frame' set to 200000
Using 24 workers.
Using tmpdir: /tmp/tmpcpq6jkbf
Num. workers = 24
Preparing /tmp/tmpcpq6jkbf/timeseries.hdf5...
'end_frame' set to 200000
'end_frame' set to 200000
Preparing neighborhood sorters (M=8, N=200000)...
Preparing output...
Done with ms4alg.
Cleaning tmpdir::::: /tmp/tmpcpq6jkbf
mountainsort4 run time 32.11s


Extracting waveforms in chunks:   0%|          | 0/1 [00:00<?, ?it/s]


Computing quality metrics...
Computing waveforms
Number of chunks: 1 - Number of jobs: 1
'end_frame' set to 200000


Extracting waveforms in chunks: 100%|##########| 1/1 [00:02<00:00,  2.75s/it]


Fitting PCA of 3 dimensions on 3305 waveforms
Projecting waveforms on PC
'end_frame' set to 200000
'end_frame' set to 200000
'end_frame' set to 200000
'end_frame' set to 200000
'end_frame' set to 200000
'end_frame' set to 200000
'end_frame' set to 200000
'end_frame' set to 200000
'end_frame' set to 200000
 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 100% 
 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 100% 
Elapsed time for compute metrics: 44.264819383621216 sec

Saving sorting results...
Adding metric noise_overlap : [0.14900000000000002, 0.019512195121951237, 0.020312499999999956, 0.26249999999999996, 0.5045, 0.25149999999999995, 0.258, 0.20550000000000002, 0.5115000000000001, 0.29100000000000004]
Adding metric nn_hit_rate : [0.7384615384615385, 0.7160493827160493, 0.6354166666666666, 0.6128205128205129, 0.27450980392156865, 0.46228710462287104, 0.3879781420765027, 0.6491885143570537, 0.25, 0.3601190476190476]

Done - entry inserted to table.


In [50]:
SpikeSorting & {'nwb_file_name' : nwb_file_name2}

nwb_file_name  name of the NWB file,sort_group_id  identifier for a group of electrodes,sorter_name  the name of the spike sorting algorithm,parameter_set_name  label for this set of parameters,sort_interval_name  name for this interval,analysis_file_name  name of the file,units_object_id  the object ID for the units for this sort group,time_of_sort  This is when the sort was done.,curation_feed_uri  {curation_workspace_name} Name of labbox-ephys workspace for curation
beans20190718_jhbak2_.nwb,8,mountainsort4,test4,beans_02_r1_10s,beans20190718_jhbak2_000003.nwb,88e54c5e-81cf-41a1-ae84-2b1692f8cb07,1616198868,
