# SpikeInterface Processing Pipeline for OpenEphys Neuropixels 2 & raw Axona recordings
### Jake Swann, 2024

##### This notebook takes as input a spreadsheet describing NP2 OpenEphys recordings and sorts every unsorted recording in a loop. All recordings made for each animal on each unique day are concatenated and sorted together, then split apart afterwards.
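Splitting the combined sorting back into trials only needs each trial's start sample in the concatenated recording. The following is a minimal pure-Python sketch. It assumes spike times are sample indices into the concatenated recording and that the trials were concatenated in a known order. `split_spike_train` is a hypothetical helper, not part of pyscan or SpikeInterface.

```python
from bisect import bisect_right
from itertools import accumulate


def split_spike_train(spike_samples, trial_lengths):
    """Assign spike samples from a concatenated recording back to their trials.

    spike_samples: sample indices into the concatenated recording.
    trial_lengths: number of samples in each trial, in concatenation order.
    Returns one list per trial of sample indices relative to that trial's start.
    """
    # Start sample of each trial in the concatenated recording: [0, n0, n0 + n1, ...]
    starts = [0] + list(accumulate(trial_lengths))[:-1]
    total = sum(trial_lengths)
    per_trial = [[] for _ in trial_lengths]
    for s in spike_samples:
        if not 0 <= s < total:
            raise ValueError(f"spike sample {s} outside concatenated recording")
        # Index of the last trial whose start is <= s
        i = bisect_right(starts, s) - 1
        per_trial[i].append(s - starts[i])
    return per_trial


# Three trials of 100, 50 and 200 samples, concatenated in that order
print(split_spike_train([5, 99, 100, 149, 150, 349], [100, 50, 200]))
# [[5, 99], [0, 49], [0, 199]]
```

With SpikeInterface objects, the trial lengths would be each recording's `get_num_samples()`, taken before `si.concatenate_recordings` is called.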
##### Each path in the spreadsheet should point to a folder containing all recordings from a given day, with the file structure `base_folder/rXXXX/YYYY-MM-DD/`. Trial folder names should match the `trial_name` entries in the spreadsheet.
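The trial names printed in the output below follow a `YYMMDD_rXXXX_<task>_<n>` pattern (e.g. `240315_r1503_open-field-ml_1`). That pattern is enough to recover both the session key the notebook groups trials by and the folder each trial is expected in. The following is a minimal sketch. It assumes that naming pattern and a 20YY date. `trial_location` is a hypothetical helper.

```python
import os
from datetime import datetime


def trial_location(base_folder, trial_name):
    """Return (session_key, trial_folder) for a trial named like '240315_r1503_open-field-ml_1'.

    Assumes the first two underscore-separated fields are the date (YYMMDD) and the animal ID.
    """
    date_code, animal = trial_name.split('_')[:2]
    # Same grouping key as the notebook's session_list: '<YYMMDD>_<animal>'
    session_key = f'{date_code}_{animal}'
    # '240315' -> '2024-03-15' (strptime's %y maps 00-68 to 2000-2068)
    day = datetime.strptime(date_code, '%y%m%d').strftime('%Y-%m-%d')
    trial_folder = os.path.join(base_folder, animal, day, trial_name)
    return session_key, trial_folder


key, folder = trial_location('/data/recording_data', '240315_r1503_open-field-ml_1')
print(key)     # 240315_r1503
print(folder)  # /data/recording_data/r1503/2024-03-15/240315_r1503_open-field-ml_1
```

`os.path.join` is used rather than string concatenation so that a base folder with or without a trailing slash gives the same result.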
##### Required spreadsheet columns are: `trial_name`, `path` (animal and date parts only, e.g. `r1503/2024-03-15`), `probe_type` (`'NP2_openephys'`, or `'5x12_buz'` for Axona), `num_channels` (384 for NP2) and `Include` (`'Y'` to sort).
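The notebook indexes the sheet by exact column name; for example, the code filters on `Include` with a capital I. A quick check before the sorting loop can therefore catch a misnamed header early. The following is a minimal sketch that works on a list of column names, such as `sheet.columns`. `check_columns` is a hypothetical helper.

```python
REQUIRED_COLUMNS = ['trial_name', 'path', 'probe_type', 'num_channels', 'Include']


def check_columns(columns, required=REQUIRED_COLUMNS):
    """Return a list of problems with the sheet's column names (empty if all good)."""
    present = set(columns)
    lowered = {c.lower(): c for c in columns}
    problems = []
    for name in required:
        if name in present:
            continue
        # Flag near-misses that differ only in case, e.g. 'include' vs 'Include'
        if name.lower() in lowered:
            problems.append(f"column '{lowered[name.lower()]}' should be named '{name}'")
        else:
            problems.append(f"missing column '{name}'")
    return problems


print(check_columns(['trial_name', 'path', 'probe_type', 'num_channels', 'include']))
# ["column 'include' should be named 'Include'"]
```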
##### The script loads each recording as a [SpikeInterface](https://github.com/SpikeInterface) object and attaches the probe geometry. It then spike sorts with [Kilosort 2 (Axona) / Kilosort 4 (Neuropixels)](https://github.com/MouseLand/Kilosort) and allows curation of the output in the [phy](https://github.com/cortex-lab/phy/) template-gui.
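After curation, phy normally saves the cluster labels to `cluster_group.tsv` in the sorting output folder. This is a tab-separated file with a `cluster_id` and a `group` column. The following is a minimal standard-library sketch that keeps only the clusters with a chosen label. It assumes that file layout. `good_clusters` is a hypothetical helper.

```python
import csv
import io


def good_clusters(tsv_file, keep=('good',)):
    """Return cluster IDs whose phy 'group' label is in `keep`.

    tsv_file: an open text file, assumed to follow phy's cluster_group.tsv layout
    (tab-separated, header row 'cluster_id<TAB>group').
    """
    reader = csv.DictReader(tsv_file, delimiter='\t')
    return [int(row['cluster_id']) for row in reader if row['group'] in keep]


# Stand-in for open('<sorting folder>/cluster_group.tsv')
example = io.StringIO('cluster_id\tgroup\n0\tgood\n1\tnoise\n2\tmua\n3\tgood\n')
print(good_clusters(example))  # [0, 3]
```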
##### **N.B.** This requires a Python 3.8 environment with SpikeInterface v0.101+ installed
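These version requirements can be checked in the first cell. The following is a minimal standard-library sketch that takes the minimum versions from the note above. It assumes that "Python 3.8" means 3.8 or newer. `check_environment` and `version_tuple` are hypothetical helpers.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """'0.101.2' -> (0, 101, 2); stops at the first non-numeric field (e.g. '0rc1')."""
    parts = []
    for field in text.split('.'):
        if not field.isdigit():
            break
        parts.append(int(field))
    return tuple(parts)


def check_environment(min_python=(3, 8), min_spikeinterface=(0, 101)):
    """Return a list of problems with the current environment (empty if OK)."""
    problems = []
    # Treat the Python requirement as a minimum version (assumption)
    if sys.version_info[:2] < min_python:
        problems.append(f'Python {sys.version.split()[0]} is older than {min_python}')
    try:
        si_version = version('spikeinterface')
    except PackageNotFoundError:
        problems.append('spikeinterface is not installed')
    else:
        if version_tuple(si_version) < min_spikeinterface:
            problems.append(f'spikeinterface {si_version} is older than {min_spikeinterface}')
    return problems


print(check_environment())
```

An empty list means both checks passed. Anything else can be raised as an error before the sorting loop starts.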
---

In [1]:
import os
import numpy as np
import pandas as pd
import spikeinterface as si
from pyscan.session_utils import gs_to_df
from pyscan.sorting_utils.np2_preprocessing import sort_np2
from pyscan.sorting_utils.axona_preprocessing import sort_axona
from pyscan.sorting_utils.collect_sessions import collect_sessions

# Load Google sheet
sheet = gs_to_df('https://docs.google.com/spreadsheets/d/1cZxgOw7worcVZq8wIPslmU2jD__xm1MXnNgbs1-9ros/edit#gid=0')
path_to_data = '/home/isabella/Documents/isabella/jake/recording_data/'
sorting_suffix = 'sorting_ks4'
probe_to_sort = 'NP2_openephys'

# Format sheet, collect trials and sessions
sheet['path'] = path_to_data + sheet['path']
sheet_inc = sheet[sheet['Include'] == 'Y']
sheet_inc = sheet_inc[sheet_inc['probe_type'] == probe_to_sort]
trial_list = sheet_inc['trial_name'].to_list()
session_list = np.unique([f"{i.split('_')[0]}_{i.split('_')[1]}" for i in trial_list])

# Collect recordings for concatenation and sorting
recording_list = collect_sessions(session_list, trial_list, sheet_inc, probe_to_sort)

# Concatenate over a single session and sort
for recording in recording_list:
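	session = pd.DataFrame(recording)
	# Columns as used below: 0 = recording object, 1 = recording name, 2 = base folder, 3 = probe type
	base_folder = session.iloc[0,2]
	probe_type = session.iloc[0,3]

	# Concatenate all trials from this session into one (lazy) recording
	recordings_concat = si.concatenate_recordings(session.iloc[:,0].to_list())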

	if probe_type == 'NP2_openephys':
		# Write the concatenated recording to a single binary file, unless it already exists
		concat_path = os.path.join(base_folder, 'concat.dat')
		if os.path.exists(concat_path):
			print(f'{concat_path} already exists, skipping concatenation')
		else:
			si.write_binary_recording(recordings_concat, concat_path)
			print(f'Concatenated recording saved to {concat_path}')
		# Sort the concatenated recording with Kilosort 4
		print(f'Sorting {recordings_concat}')
		sorting = sort_np2(recording = recordings_concat, 
				recording_name = session.iloc[0,1], 
				base_folder = base_folder,
				sorting_suffix = sorting_suffix)
		
	elif probe_type == '5x12_buz':
		# Sort the concatenated Axona recording with Kilosort 2
		print(f'Sorting {recordings_concat}')
		sorting = sort_axona(recording = recordings_concat, 
				recording_name = session.iloc[0,1], 
				base_folder = base_folder,
				electrode_type = probe_type,
				sorting_suffix = sorting_suffix)

	else:
		raise ValueError(f'Unknown probe_type: {probe_type}')

	# Save session trial info to .csv in the sorting output folder (for both probe types)
	session.to_csv(f'{base_folder}/{session.iloc[0,1][:6]}_{sorting_suffix}/session.csv')

/home/isabella/Documents/isabella/jake/recording_data/r1503/2024-03-15/240315_r1503_open-field-ml_1
/home/isabella/Documents/isabella/jake/recording_data/r1503/2024-03-20/240320_r1503_sleep-ml_1
/home/isabella/Documents/isabella/jake/recording_data/r1503/2024-03-20/240320_r1503_open-field-ml_2
/home/isabella/Documents/isabella/jake/recording_data/r1503/2024-03-20/240320_r1503_open-field-sl_3
/home/isabella/Documents/isabella/jake/recording_data/r1503/2024-03-15/concat.dat already exists, skipping concatenation
Sorting ConcatenateSegmentRecording: 384 channels - 30.0kHz - 1 segments - 18,613,123 samples 
                             620.44s (10.34 minutes) - int16 dtype - 13.31 GiB
Sorting loaded from file /home/isabella/Documents/isabella/jake/recording_data/r1503/2024-03-15/240315_sorting_ks4



write_binary_recording:   0%|          | 0/1330 [00:00<?, ?it/s]

Concatenated recording saved to /home/isabella/Documents/isabella/jake/recording_data/r1503/2024-03-20/concat.dat
Sorting ConcatenateSegmentRecording: 384 channels - 30.0kHz - 1 segments - 39,882,613 samples 
                             1,329.42s (22.16 minutes) - int16 dtype - 28.53 GiB
Loading recording with SpikeInterface...
number of samples: 39882613
number of channels: 384
numbef of segments: 1
sampling rate: 30000.0
dtype: int16
Preprocessing filters computed in  3.57s; total  3.57s

computing drift
Re-computing universal templates from data.


100%|██████████| 665/665 [08:45<00:00,  1.27it/s]


drift computed in  531.77s; total  535.34s

Extracting spikes using templates
Re-computing universal templates from data.


 88%|████████▊ | 582/665 [07:34<01:05,  1.27it/s]
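The summary printed for each concatenated recording can be reproduced from its sample count. This is handy for checking free disk space before `concat.dat` is written. The following is a minimal sketch built on several assumptions:

* samples are int16 (2 bytes), with 384 channels at 30 kHz;
* `write_binary_recording` uses 1 s chunks, which is SpikeInterface's default `chunk_duration`;
* Kilosort 4 uses its default `batch_size` of 60000 samples.

`recording_stats` is a hypothetical helper. For the second session it gives 665 Kilosort batches, which matches the `665/665` progress bars above.

```python
import math


def recording_stats(n_samples, n_channels=384, fs=30000.0, bytes_per_sample=2,
                    chunk_s=1.0, ks_batch_size=60000):
    """Duration (s), binary size (GiB), write-chunk count and Kilosort batch count."""
    duration_s = n_samples / fs
    size_gib = n_samples * n_channels * bytes_per_sample / 2**30
    n_write_chunks = math.ceil(n_samples / (chunk_s * fs))
    n_ks_batches = math.ceil(n_samples / ks_batch_size)
    return duration_s, size_gib, n_write_chunks, n_ks_batches


for n in (18_613_123, 39_882_613):  # the two sessions sorted above
    duration_s, size_gib, n_chunks, n_batches = recording_stats(n)
    print(f'{n:,} samples: {duration_s:,.2f} s, {size_gib:.2f} GiB, '
          f'{n_chunks} write chunks, {n_batches} KS4 batches')
```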

#### Unused Code

In [None]:
import spikeinterface as si
import spikeinterface.extractors as se
import spikeinterface.widgets as sw

recording_path = '/data/isabella/jake/recording_data/NP2 data/2024-03-15/test/2024-03-15_13-05-49'
sorting_path = '/data/isabella/jake/recording_data/NP2 data/2024-03-15/test/kilosort4'

recording = se.read_openephys(folder_path=recording_path, stream_id = '0')
sorting = se.read_phy(sorting_path, exclude_cluster_groups=['noise', 'mua'])


import spikeinterface.postprocessing  # provides the template_metrics and unit_locations extensions

# Build a SortingAnalyzer and compute extensions in dependency order
sorting_analyzer = si.create_sorting_analyzer(sorting=sorting, recording=recording)
sorting_analyzer.compute('random_spikes')
sorting_analyzer.compute('waveforms')
sorting_analyzer.compute('templates')
sorting_analyzer.compute('template_metrics')
unit_locations = sorting_analyzer.compute('unit_locations', method='monopolar_triangulation')

# Raster plot of the first 10 s
sw.plot_rasters(sorting, time_range=[0, 10])



Loading recording with SpikeInterface...
number of samples: 18613123
number of channels: 384
numbef of segments: 1
sampling rate: 30000.0
dtype: int16
Interpreting binary file as default dtype='int16'. If data was saved in a different format, specify `data_dtype`.
Using GPU for PyTorch computations. Specify `device` to change this.


TypeError: string indices must be integers
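The `unit_locations` call above uses `monopolar_triangulation`. SpikeInterface also offers a simpler `center_of_mass` method, whose core idea is an amplitude-weighted average of channel positions. The following is a minimal pure-Python sketch of that idea, not SpikeInterface's implementation, and it omits any channel-selection radius the library may apply. It assumes per-channel template peak-to-peak amplitudes and x/y channel positions (in µm) are already available. `center_of_mass` is a hypothetical helper, and the 2 × 2 layout is a toy example.

```python
def center_of_mass(channel_positions, ptp_amplitudes):
    """Amplitude-weighted mean (x, y) of channel positions for one unit.

    channel_positions: list of (x, y) positions in micrometres, one per channel.
    ptp_amplitudes: peak-to-peak template amplitude on each channel (same order).
    """
    if len(channel_positions) != len(ptp_amplitudes):
        raise ValueError('need one amplitude per channel')
    total = sum(ptp_amplitudes)
    if total <= 0:
        raise ValueError('amplitudes must sum to a positive value')
    x = sum(a * px for (px, _), a in zip(channel_positions, ptp_amplitudes)) / total
    y = sum(a * py for (_, py), a in zip(channel_positions, ptp_amplitudes)) / total
    return x, y


# Toy 2 x 2 layout; the unit is largest on the channel at the origin
positions = [(0.0, 0.0), (32.0, 0.0), (0.0, 15.0), (32.0, 15.0)]
print(center_of_mass(positions, [60.0, 20.0, 15.0, 5.0]))  # (8.0, 3.0)
```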