# Data processing


author: steeve.laquitaine@epfl.ch  
date: 2023.08.29  
last modified: 2023.08.29  
status: OK  
display-status: OK  
regression: None  
duration: 3 hours (first time)


## Setup

Create and activate the environment from `npx_10m_384ch_unit_classes.txt`.

In [1]:
# listen to changes
%load_ext autoreload
%autoreload 2

import os
import spikeinterface as si
import spikeinterface.extractors as se 
import shutil 

# move to project path
PROJ_PATH = "/gpfs/bbp.cscs.ch/project/proj68/home/laquitai/bernstein_2023/"
os.chdir(PROJ_PATH)

from src.nodes.utils import get_config
from src.nodes.prepro import preprocess
from src.nodes.truth.silico import ground_truth
from src.nodes.load import load_campaign_params

# SETUP PARAMETERS
EXPERIMENT = "buccino_2020"   # the experiment 
SIMULATION_DATE = "2020"      # the run (date)
data_conf, param_conf = get_config(EXPERIMENT, SIMULATION_DATE).values()
NWB_PATH = data_conf["recording"]["input"]
WRITE_PATH = data_conf["probe_wiring"]["output"]
GT_SORTING_PATH = data_conf["sorting"]["simulation"]["ground_truth"]["input"]

2023-10-13 19:02:39,321 - root - utils.py - get_config - INFO - Reading experiment config.
2023-10-13 19:02:39,331 - root - utils.py - get_config - INFO - Reading experiment config. - done
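
The paths above are read from nested dictionaries, so a typo in a key only surfaces as a bare `KeyError`. A minimal sketch of a lookup helper that reports the full key path is shown here; `get_nested` and `example_conf` are hypothetical names used for illustration and are not part of `src.nodes.utils`.

```python
from typing import Any, Mapping


def get_nested(conf: Mapping[str, Any], *keys: str) -> Any:
    """Walk a nested mapping; on a missing key, report the path walked so far."""
    node: Any = conf
    for depth, key in enumerate(keys):
        if not isinstance(node, Mapping) or key not in node:
            path = " -> ".join(keys[: depth + 1])
            raise KeyError(f"missing config entry: {path}")
        node = node[key]
    return node


# Hypothetical stand-in for data_conf; the real one comes from get_config().
example_conf = {
    "recording": {"input": "/path/to/recording.nwb"},
    "probe_wiring": {"output": "/path/to/wired"},
}

nwb_path = get_nested(example_conf, "recording", "input")
```

With the real configuration, the same call would read `get_nested(data_conf, "recording", "input")`.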


### Get raw data

Download the `sub-MEAREC-250neuron-Neuropixels_ecephys.nwb` file (28 GB) from the DANDI archive:

```bash
dandi download https://api.dandiarchive.org/api/assets/6d94dcf4-0b38-4323-8250-04fdc7039a66/download/
```
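
Because the file is large, an interrupted download is easy to miss. The sketch below checks that the file exists and is at least a given size; `check_download` is a hypothetical helper, and the size threshold you pass is your own assumption (the 28 GB figure above may be in GB or GiB).

```python
from pathlib import Path

GIB = 1024 ** 3  # bytes in one gibibyte


def check_download(path, min_size_gib):
    """Return the file size in GiB; raise if the file is missing or too small."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"{file} not found; did the download finish?")
    size_gib = file.stat().st_size / GIB
    if size_gib < min_size_gib:
        raise ValueError(
            f"{file} is {size_gib:.2f} GiB, expected at least {min_size_gib} GiB"
        )
    return size_gib


# Example call (the threshold is an assumption, not a documented value):
# check_download(NWB_PATH, min_size_gib=25)
```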

### Wire probe to recording

The probe is already wired to the open-source recording, so we only cast the recording as a SpikeInterface `RecordingExtractor` for further processing.

In [2]:
# Reading and writing the recording takes 2 h 20 min: run it once, then load the saved copy

# # read recording
# wired_recording = se.NwbRecordingExtractor(NWB_PATH)

# # write
# shutil.rmtree(WRITE_PATH, ignore_errors=True)
# wired_recording.save(folder=WRITE_PATH, format="binary")

# or load
wired_recording = si.load_extractor(WRITE_PATH)
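
The cell above follows a "write once, then load" pattern by commenting lines in and out. A minimal sketch of the same idea as a reusable function is given here; `load_or_compute` is a hypothetical helper, not part of this project or of SpikeInterface.

```python
import shutil
from pathlib import Path


def load_or_compute(folder, compute, save, load, overwrite=False):
    """Run the slow step only if `folder` does not exist yet, then load from it."""
    folder = Path(folder)
    if overwrite:
        # Same cleanup as shutil.rmtree(WRITE_PATH, ignore_errors=True) above.
        shutil.rmtree(folder, ignore_errors=True)
    if not folder.exists():
        result = compute()
        save(result, folder)
    return load(folder)


# Possible use with the calls from the cell above (not executed here):
# wired_recording = load_or_compute(
#     WRITE_PATH,
#     compute=lambda: se.NwbRecordingExtractor(NWB_PATH),
#     save=lambda rec, folder: rec.save(folder=folder, format="binary"),
#     load=si.load_extractor,
# )
```

Note that the existence check treats any existing folder as a complete cache, so a write that was interrupted midway needs `overwrite=True`.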

### Preprocess recording

I found no indication that the data have been preprocessed, either in the SpikeInterface blog post (https://spikeinterface.github.io/blog/ground-truth-comparison-and-ensemble-sorting-of-a-synthetic-neuropixels-recording/) or on the dandiset page (https://dandiarchive.org/dandiset/000034), so I preprocess it here.

In [3]:
# preprocess once (takes 28 min), then load the saved result on later runs
# Preprocessed = preprocess.run(data_conf, param_conf)
Preprocessed = preprocess.load(data_conf)

# write
preprocess.write(Preprocessed, data_conf)
 
# sanity check: the recording should be flagged as filtered
print(Preprocessed.is_filtered())

write_binary_recording with n_jobs = 1 and chunk_size = None
True
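
`is_filtered()` only reports a flag, so a few extra comparisons against the raw recording can catch an accidental channel drop or resampling. The sketch below is one possible check: `check_preprocessed` is a hypothetical helper, and it assumes that preprocessing is meant to keep the channel count, sampling frequency and number of samples unchanged, which I have not verified for `preprocess.run`.

```python
def check_preprocessed(raw, preprocessed):
    """Return a list of problems found when comparing raw and preprocessed recordings.

    Both arguments only need the SpikeInterface recording methods used below.
    """
    problems = []
    if not preprocessed.is_filtered():
        problems.append("recording is not flagged as filtered")
    if raw.get_num_channels() != preprocessed.get_num_channels():
        problems.append("channel count changed")
    if raw.get_sampling_frequency() != preprocessed.get_sampling_frequency():
        problems.append("sampling frequency changed")
    if raw.get_num_samples() != preprocessed.get_num_samples():
        problems.append("number of samples changed")
    return problems


# Possible use with the objects from the cells above (not executed here):
# problems = check_preprocessed(wired_recording, Preprocessed)
# assert not problems, problems
```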


### Sort ground truth spikes

In [6]:
# takes 1 sec

# cast ground truth spikes as a SpikeInterface Sorting Extractor object (1.5h for 534 units)
SortedTrue = se.NwbSortingExtractor(GT_SORTING_PATH)

# write
ground_truth.write(SortedTrue, data_conf)

  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."
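
A quick look at the ground truth before writing it can confirm that the expected number of units was loaded. The sketch below counts spikes per unit using `get_unit_ids()` and `get_unit_spike_train()`, which SpikeInterface sorting objects provide; `count_spikes_per_unit` is a hypothetical helper name, and the single-segment assumption is mine.

```python
def count_spikes_per_unit(sorting):
    """Map each unit id to its number of spikes (assumes a single segment)."""
    return {
        unit_id: len(sorting.get_unit_spike_train(unit_id=unit_id))
        for unit_id in sorting.get_unit_ids()
    }


# Possible use with the object from the cell above (not executed here):
# counts = count_spikes_per_unit(SortedTrue)
# print(len(counts), "units,", sum(counts.values()), "spikes")
```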
