# Check spykingcircus2

Note: Buccino's original code used spikeinterface 0.9.9, whereas we use 0.96.1, so we updated the code accordingly.
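
Because the API differs between these two releases, it can help to confirm which version is installed before running the cells. The following is a minimal sketch, not part of the original notebook: the helper names `parse_version` and `installed_version` are hypothetical, and the parser only handles plain dotted numeric versions (it stops at the first non-numeric component).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a dotted version string such as '0.96.1' into a tuple of ints.

    Parsing stops at the first non-numeric component (e.g. 'dev0').
    """
    parts = []
    for part in text.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Versions quoted in the note above, compared component by component.
original = parse_version("0.9.9")
current = parse_version("0.96.1")
is_newer = current > original

found = installed_version("spikeinterface")
print("spikeinterface installed:", found)
print("0.96.1 is newer than 0.9.9:", is_newer)
```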

Ground-truth comparison and ensemble sorting of a synthetic Neuropixels recording
This notebook reproduces figures 2 and 3 from the paper SpikeInterface, a unified framework for spike sorting.

The data set for this notebook is available on the Dandi Archive: https://gui.dandiarchive.org/#/dandiset/000034.

The entire data archive (about 75 GB) can be downloaded with the command `dandi download https://gui.dandiarchive.org/#/dandiset/000034/draft`.

The data file required to run the code is:

* the raw data: `sub-MEAREC-250neuron-Neuropixels_ecephys.nwb`

This file should be in the same directory as the notebook (otherwise adjust the paths below).

Author: Matthias Hennig, University of Edinburgh, 22 Aug 2020

Requirements
To run this notebook, you will need the following Python packages:

* numpy
* pandas
* matplotlib
* seaborn
* spikeinterface
* dandi
* matplotlib-venn

To run the MATLAB-based sorters, you also need a MATLAB license. For other sorters, please refer to the documentation on how to [install sorters](https://spikeinterface.readthedocs.io/en/latest/install_sorters.html).
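
Before running the notebook, the package list above can be checked with a short script. This is a sketch under stated assumptions: the helper `missing_packages` is hypothetical, and the import names (for example `matplotlib_venn` for `matplotlib-venn`) are assumed to match the installed distributions.

```python
from importlib.util import find_spec

# Map of pip package name -> top-level import name (assumed).
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "spikeinterface": "spikeinterface",
    "dandi": "dandi",
    "matplotlib-venn": "matplotlib_venn",
}


def missing_packages(required):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if find_spec(module_name) is None
    ]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```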

1. Activate spack environment:

```bash
cd /gpfs/bbp.cscs.ch/project/proj68/home/laquitai/spike-sorting/
module load spack
. /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/spack/share/spack/setup-env.sh
spack env activate spack_env -p
spack load python@3.9.7
```

2. Create and activate buccino_env:

```bash
python3.9 -m venv buccino_env
source buccino_env/bin/activate
pip3.9 install -r requirements_buccino.txt
```

3. Download `sub-MEAREC-250neuron-Neuropixels_ecephys.nwb` file (28 GB):

```bash
dandi download https://api.dandiarchive.org/api/assets/6d94dcf4-0b38-4323-8250-04fdc7039a66/download/
```


TODO:
- Access non-Python sorters (only Python-based sorters are available for now)

In [70]:
import os

# Matlab sorter paths:
# change these to match your environment
os.environ["IRONCLUST_PATH"] = "/gpfs/bbp.cscs.ch/project/proj68/home/laquitai/spike-sorting/sorters_package/ironclust/"
os.environ["KILOSORT2_PATH"] = "/gpfs/bbp.cscs.ch/project/proj68/home/laquitai/spike-sorting/sorters_package/Kilosort/"
os.environ["HDSORT_PATH"] = "/gpfs/bbp.cscs.ch/project/proj68/home/laquitai/spike-sorting/sorters_package/HDsort/"

import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path
import pandas as pd
import seaborn as sns
from collections import defaultdict
from matplotlib_venn import venn3

import spikeinterface as si
import spikeinterface.full as si_full
import spikeinterface.extractors as se
import spikeinterface.sorters as ss
import spikeinterface.comparison as sc
from spikeinterface.comparison import GroundTruthStudy
import spikeinterface.widgets as sw

%matplotlib inline

def clear_axes(ax):
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)

# print version information
# si.print_spikeinterface_version()
si.sorters.installed_sorters()
si.sorters.print_sorter_versions()

MS_BEFORE = 3
MS_AFTER = 3

RUNNING SHELL SCRIPT: /tmp/tmp_shellscript_okxa1yg/script.sh
herdingspikes: 0.3.102

### Set up ground truth study and run all sorters

In [41]:
# WARNING: this cell takes about 50 minutes the first time it is run
study_path = Path('/gpfs/bbp.cscs.ch/project/proj68/scratch/laquitai/raw/')
data_path = Path('/gpfs/bbp.cscs.ch/project/proj68/scratch/laquitai/raw/')
study_folder = study_path / 'study_mearec_250cells_Neuropixels-384chans_duration600s_noise10uV_2020-02-28/'

# the original data
# this NWB file contains both the ground truth spikes and the raw data
data_filename = data_path / 'sub-MEAREC-250neuron-Neuropixels_ecephys.nwb'
SX_gt = se.NwbSortingExtractor(str(data_filename))
RX = se.NwbRecordingExtractor(str(data_filename))

# bandpass (note: This is an update to the original code to enable waveform extraction)
RX = si_full.bandpass_filter(RX, freq_min=300, freq_max=6000)

# creating the study is the slowest step; an existing study folder is reused
if not os.path.isdir(study_folder):
    gt_dict = {'rec0' : (RX, SX_gt) }
    study = GroundTruthStudy.create(study_folder, gt_dict)
else:
    study = GroundTruthStudy(study_folder)

# get Waveform extractor
WaveformExtractor = study.get_waveform_extractor(RX)
WaveformExtractor.set_params(ms_before=MS_BEFORE, ms_after=MS_AFTER)
WaveformExtractor.run_extract_waveforms()

  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."


In [69]:
# sorting (10 min)
sorter_list = ['herdingspikes', 'spykingcircus']
sorter_names = ['HerdingSpikes', 'SpykingCircus']
sorter_names_short = ['HS', 'SC']

study.run_sorters(sorter_list, mode_if_folder_exists='keep', remove_sorter_folders=False, engine='loop', verbose=True)
study.copy_sortings()

# Generating new position and neighbor files from data file
# Not Masking any Channels
# Sampling rate: 32000
# Localization On
# Number of recorded channels: 384
# Analysing frames: 19200000; Seconds: 600.0
# Frames before spike in cutout: 10
# Frames after spike in cutout: 58
# tcuts: 42 90
# tInc: 100000
# Detection completed, time taken: 0:09:08.050980
# Time per frame: 0:00:00.028544
# Time per sample: 0:00:00.000074
Loaded 661977 spikes.
Fitting dimensionality reduction using all spikes...
...projecting...
...done
Clustering...
Clustering 661977 spikes...
number of seeds: 3882
seeds/job: 1942
using 2 cpus


[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.


Error running herdingspikes
RUNNING SHELL SCRIPT: /gpfs/bbp.cscs.ch/project/proj68/scratch/laquitai/raw/study_mearec_250cells_Neuropixels-384chans_duration600s_noise10uV_2020-02-28/sorter_folders/rec0/spykingcircus/run_spykingcircus.sh
Traceback (most recent call last):

  File "/gpfs/bbp.cscs.ch/project/proj68/home/laquitai/spike-sorting/buccino_env/bin/spyking-circus", line 5, in <module>

    from circus.scripts.launch import main

  File "/gpfs/bbp.cscs.ch/project/proj68/home/laquitai/spike-sorting/buccino_env/lib/python3.9/site-packages/circus/scripts/launch.py", line 31, in <module>

    from circus.shared.files import data_stats

  File "/gpfs/bbp.cscs.ch/project/proj68/home/laquitai/spike-sorting/buccino_env/lib/python3.9/site-packages/circus/shared/files.py", line 3, in <module>

    from circus.shared.utils import get_tqdm_progressbar

  File "/gpfs/bbp.cscs.ch/project/proj68/home/laquitai/spike-sorting/buccino_env/lib/python3.9/site-packages/circus/shared/utils.py", line 20,
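
The numbers reported in the HerdingSpikes log above can be cross-checked against the waveform window set earlier (`MS_BEFORE = 3`, `MS_AFTER = 3`). The following sketch only redoes that arithmetic, using the sampling rate and duration printed in the log; the variable names other than `MS_BEFORE` and `MS_AFTER` are illustrative.

```python
# Values taken from the log output and the first cell of this notebook.
SAMPLING_RATE_HZ = 32000
DURATION_S = 600.0
MS_BEFORE = 3
MS_AFTER = 3

# Total number of frames in the recording: rate (frames/s) * duration (s).
n_frames = int(SAMPLING_RATE_HZ * DURATION_S)

# Waveform window converted from milliseconds to samples.
n_before = int(MS_BEFORE * SAMPLING_RATE_HZ / 1000)
n_after = int(MS_AFTER * SAMPLING_RATE_HZ / 1000)

print("frames in recording:", n_frames)
print("samples before peak:", n_before)
print("samples after peak:", n_after)
```

The frame count agrees with the `Analysing frames: 19200000` line in the log, and a 3 ms window corresponds to 96 samples on each side of the spike.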

In [43]:
# compute or load SNR for the ground truth units
snr_file = study_folder / 'snr.npy'
if os.path.isfile(snr_file):
    snr = np.load(snr_file, allow_pickle=True)
else:
    print('computing snr')
    ## note this is quite slow for a NWB file as the data is arranged as channels:time
    ## it is faster to first write out a binary file in time:channels order
    # snr = st.validation.compute_snrs(SX_gt, RX, apply_filter=False, verbose=False, 
    #                                  memmap=True, max_spikes_per_unit_for_snr=500)
    snr = si.qualitymetrics.misc_metrics.compute_snrs(WaveformExtractor)
    np.save(snr_file, snr)
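
One detail of the caching pattern above is worth noting. Assuming `compute_snrs` returns a dict keyed by unit id (as we understand it does in spikeinterface 0.96), `np.save` wraps that dict in a zero-dimensional object array, so the value returned by `np.load` is not a dict until `.item()` is called. The sketch below illustrates this with made-up SNR values; it does not use the study data.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical SNR values keyed by unit id, for illustration only.
snr = {"unit_0": 12.5, "unit_1": 4.2}

with tempfile.TemporaryDirectory() as tmp_dir:
    snr_file = Path(tmp_dir) / "snr.npy"

    # Saving a dict stores it as a pickled 0-d object array.
    np.save(snr_file, snr)

    # allow_pickle=True is required to read object arrays back.
    loaded = np.load(snr_file, allow_pickle=True)

# The loaded value is an array of shape (), not a dict.
loaded_shape = loaded.shape

# .item() unwraps the 0-d array and returns the original dict.
snr_restored = loaded.item()
print(type(loaded).__name__, loaded_shape, snr_restored)
```

If the cached branch is taken in the cell above, calling `snr = snr.item()` after loading would give the same type as the freshly computed result.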

In [65]:
# WaveformExtractor = study.get_waveform_extractor(RX, sorter_name=["HerdingSpikes"])
study.run_comparisons(exhaustive_gt=True, match_score=0.1)
for (rec_name, sorter_name), comp in study.comparisons.items():
    print('*' * 10)
    print(rec_name, sorter_name)
    print(comp.count_score)  # raw counting of tp/fp/...
    comp.print_summary()
    perf_unit = comp.get_performance(method='by_unit')
    perf_avg = comp.get_performance(method='pooled_with_average')
    m = comp.get_confusion_matrix()
    w_comp = sw.plot_agreement_matrix(comp)
    w_comp.ax.set_title(rec_name  + ' - ' + sorter_name)
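
The `match_score=0.1` argument above is a threshold on the agreement score between a ground-truth unit and a sorted unit. As we understand the SpikeInterface documentation, the score is the number of matched spikes divided by the number of spikes in either train, counting matches once. The sketch below is a simplified illustration of that definition, not the library's implementation: the greedy matcher, the function names, and the spike times are all assumptions.

```python
import numpy as np


def count_matches(times1, times2, delta):
    """Greedy one-to-one matching of two sorted spike trains.

    Two spikes match when they are at most `delta` frames apart.
    """
    i = j = n_match = 0
    while i < len(times1) and j < len(times2):
        diff = times1[i] - times2[j]
        if abs(diff) <= delta:
            n_match += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # spike in train 1 is too early to match; skip it
        else:
            j += 1  # spike in train 2 is too early to match; skip it
    return n_match


def agreement_score(times1, times2, delta):
    """n_match / (n1 + n2 - n_match), or 0.0 when both trains are empty."""
    n_match = count_matches(times1, times2, delta)
    denom = len(times1) + len(times2) - n_match
    return n_match / denom if denom > 0 else 0.0


# Hypothetical spike times in frames.
gt_train = np.array([100, 500, 900, 1300])
sorted_train = np.array([102, 498, 2000])

n_match = count_matches(gt_train, sorted_train, delta=5)
score = agreement_score(gt_train, sorted_train, delta=5)
print("matches:", n_match, "agreement score:", score)
```

With a threshold of 0.1, this pair of trains would count as a match, since two of the five distinct events are shared.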

### Run the ground truth comparison and summarise the results

In [44]:
comparisons = study.comparisons
dataframes = study.aggregate_dataframes()

ValueError: No objects to concatenate
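
The message `No objects to concatenate` is what `pandas.concat` raises when it is given an empty list. A plausible reading, given that both sorters failed in the log above, is that no sorting results were available to aggregate, although we have not confirmed this in the spikeinterface source. The sketch below reproduces the error in isolation and shows one possible guard; `safe_concat` is a hypothetical helper.

```python
import pandas as pd


def safe_concat(frames):
    """Concatenate a list of DataFrames, returning an empty one if none exist."""
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames)


# Reproduce the error seen above with an empty list of results.
try:
    pd.concat([])
except ValueError as err:
    message = str(err)

print("pandas error:", message)

# The guarded version returns an empty DataFrame instead of raising.
empty_result = safe_concat([])
combined = safe_concat([pd.DataFrame({"a": [1]}), pd.DataFrame({"a": [2]})])
print(len(empty_result), len(combined))
```

In the notebook itself, checking that at least one sorter finished successfully before calling `study.aggregate_dataframes()` would avoid the same failure.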

## References

(1) https://spikeinterface.github.io/blog/ground-truth-comparison-and-ensemble-sorting-of-a-synthetic-neuropixels-recording/