# Automated spike sorting and quality metrics
This pipeline spike-sorts Neuropixels data acquired with [SpikeGLX](https://billkarsh.github.io/SpikeGLX/).

It uses the [SpikeInterface](https://spikeinterface.readthedocs.io/en/latest/) framework and assumes
that [Kilosort 3](https://github.com/MouseLand/Kilosort) is installed on this machine.

This version was created by [Thom Elston](https://www.thomelston.com/) in December 2023.

In [1]:
import spikeinterface.full as si
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import os
import multiprocessing

These are the only user-defined parameters; everything else is automated.

The things you need to change are `base_folder`, `brain_areas`, and `kilosort3_path`.
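Each probe subfolder is matched to an entry of `brain_areas` by position, so the folder order and the list length both matter. The following is a minimal sketch of a check that could be run before sorting. The helper name `pair_probes_with_areas` is hypothetical, and it assumes that probe folders end in `imec<N>`, as they do in the paths used in this notebook.

```python
from pathlib import Path
import re


def pair_probes_with_areas(base_folder, brain_areas):
    """Return (probe_folder, brain_area) pairs ordered by probe index."""
    base_folder = Path(base_folder)

    # keep only subfolders whose name ends in 'imec<N>' (assumed naming scheme)
    probe_folders = [p for p in base_folder.iterdir()
                     if p.is_dir() and re.search(r'imec(\d+)$', p.name)]

    # sort numerically by probe index so that imec10 comes after imec2
    probe_folders.sort(key=lambda p: int(re.search(r'imec(\d+)$', p.name).group(1)))

    # refuse to continue if the user-supplied list does not match the folders
    if len(probe_folders) != len(brain_areas):
        raise ValueError(
            f'Found {len(probe_folders)} probe folders '
            f'but {len(brain_areas)} brain areas')

    return list(zip(probe_folders, brain_areas))
```

Raising an error on a length mismatch is a design choice: it stops the run before several hours of sorting are attributed to the wrong brain area.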

In [2]:
# highest level folder for this recording
base_folder = Path('E:/D20231214_Rec04_g0/')

# subfolders for each probe, sorted so imec0, imec1, ... line up with brain_areas
probe_folders = sorted(folder for folder in base_folder.glob('*') if folder.is_dir())

# brain area recorded by each probe, in probe order (imec0 first)
brain_areas = ['CdN', 'OFC']

# set the path to kilosort 3, which we'll use to spike sort
kilosort3_path = 'C:/Users/Thomas Elston/Documents/MATLAB/Kilosort-3'
si.Kilosort3Sorter.set_kilosort3_path(kilosort3_path)

# get the default sorting parameters for ks3 (kept for reference) and override a few for our purposes
default_ks3_params = si.get_default_sorter_params('kilosort3')
params_kilosort3 = dict(projection_threshold=[10, 4])

# figure out how many cores are available on this machine 
num_cores = multiprocessing.cpu_count()

# set parameters for parallelized operations
job_kwargs = dict(n_jobs=num_cores-2, chunk_duration='1s', progress_bar=True)

# should we delete the intermediate files generated by preprocessing + sorting?
delete_intermediate = True

Setting KILOSORT3_PATH environment variable for subprocess calls to: C:\Users\Thomas Elston\Documents\MATLAB\Kilosort-3
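The loop in the next cell needs the name of the action-potential (AP) stream for each probe. Matching on the name is safer than relying on position: an expression such as `'.ap' in stream_names` only tests whether some element equals `'.ap'` exactly, so it evaluates to `False` and indexing with it returns the first stream, whatever that is. A minimal sketch follows; the helper `select_ap_stream` and the example stream names are hypothetical.

```python
def select_ap_stream(stream_names):
    """Return the single stream name that ends in '.ap'."""
    ap_streams = [name for name in stream_names if name.endswith('.ap')]
    if len(ap_streams) != 1:
        raise ValueError(f'Expected exactly one AP stream, found {ap_streams}')
    return ap_streams[0]


# hypothetical stream names, deliberately ordered with the LFP stream first
example_streams = ['imec0.lf', 'imec0.ap']

# name-based selection finds the AP stream regardless of order
print(select_ap_stream(example_streams))

# positional selection via a boolean index silently returns element 0
print(example_streams['.ap' in example_streams])
```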


In [3]:
# Start looping over each probe
for i in range(len(probe_folders)):

    i_probe = probe_folders[i]
    i_brain_area = brain_areas[i]

    print('Loading ' + str(i_probe) + ' in ' + str(i_brain_area) + '\n')

    # read and verify the data streams for this probe
    stream_names, stream_ids = si.get_neo_streams('spikeglx', i_probe)

    # get the action-potential (AP) data stream, i.e. the stream whose name ends in '.ap'
    ap_stream = next(name for name in stream_names if name.endswith('.ap'))

    # we do not load the sync channel, so the probe is automatically loaded
    raw_rec = si.read_spikeglx(i_probe, stream_name=ap_stream, load_sync_channel=False)

    print('Bandpassing the signal.')
    # do a series of signal preprocessing steps:
    # 1. high-pass filter the data at 400 Hz (no low-pass cutoff is applied)
    rec1 = si.highpass_filter(raw_rec, freq_min=400.)

    # 2. find and remove bad channels
    print('Finding and removing bad channels...')
    bad_channel_ids, channel_labels = si.detect_bad_channels(rec1)
    rec2 = rec1.remove_channels(bad_channel_ids)
    print('bad_channel_ids:', bad_channel_ids)

    # 3. apply a shift correction to account for multiplexing error
    print('Correcting multiplexing temporal shift...')
    rec3 = si.phase_shift(rec2)

    rec = rec3

    # now save the preprocessed data for use in kilosort 3
    print('Saving preprocessed data... \n')
    rec = rec.save(folder=i_probe / 'preprocess', format='binary', **job_kwargs)

    # run kilosort 3
    print('Running kilosort 3... \n')
    out_name = i_probe / 'ks3_out'
    sorting = si.run_sorter('kilosort3', rec, output_folder=out_name, verbose=True, **params_kilosort3)

    # now extract waveforms and spike positions to compute quality metrics
    print('Extracting waveforms for QC metrics...')
    we = si.extract_waveforms(rec, sorting, folder=i_probe / 'waveforms_ks3',
                              sparse=True, max_spikes_per_unit=1000,
                              ms_before=1.5, ms_after=2.,
                              **job_kwargs)
    si.compute_spike_locations(we)

    # compute quality metrics
    print('Computing QC metrics...')
    metrics = si.compute_quality_metrics(we, metric_names=['firing_rate', 'presence_ratio', 'snr',
                                                       'isi_violation', 'drift','amplitude_median', 'amplitude_cutoff'])
    
    # save the quality metrics
    metrics_save_name = i_probe / 'ks3_out' / 'sorter_output' / 'quality_metrics.csv'
    metrics.to_csv(metrics_save_name)

    # check if we should delete the intermediate files
    if delete_intermediate:
        print('Deleting intermediate files...')
        files_to_delete = [i_probe / 'ks3_out' / 'sorter_output' / 'temp_wh.dat',
                           i_probe / 'preprocess' / 'traces_cached_seg0.raw']
        
        # Delete each file
        for file_path in files_to_delete:
            try:
                os.remove(file_path)
                print(f"Deleted: {file_path}")
            except OSError as e:
                print(f"Error deleting {file_path}: {e}")

    print('Finished preprocessing and sorting in ' + i_brain_area + '\n')

print('Finished all files. :)')          

Loading E:\D20231214_Rec04_g0\D20231214_Rec04_g0_imec0 in CdN

Bandpassing the signal.
Correcting multiplexing temporal shift...
Saving preprocessed data... 

write_binary_recording with n_jobs = 9 and chunk_size = 30000


write_binary_recording:   0%|          | 0/5717 [00:00<?, ?it/s]

Running kilosort 3... 

RUNNING SHELL SCRIPT: E:\D20231214_Rec04_g0\D20231214_Rec04_g0_imec0\ks3_out\sorter_output\run_kilosort3.bat


(base) c:\Users\Thomas Elston\Documents\PYTHON\Neuropixel_preprocessing>E:



(base) E:\>cd E:\D20231214_Rec04_g0\D20231214_Rec04_g0_imec0\ks3_out\sorter_output 



(base) E:\D20231214_Rec04_g0\D20231214_Rec04_g0_imec0\ks3_out\sorter_output>matlab -nosplash -wait -r "kilosort3_master('E:\D20231214_Rec04_g0\D20231214_Rec04_g0_imec0\ks3_out\sorter_output', 'C:\Users\Thomas Elston\Documents\MATLAB\Kilosort-3')" 

kilosort3 run time 21638.42s
Extracting waveforms for QC metrics...


extract waveforms shared_memory multi buffer:   0%|          | 0/5717 [00:00<?, ?it/s]

extract waveforms shared_memory multi buffer:   0%|          | 0/5717 [00:00<?, ?it/s]

extract waveforms shared_memory multi buffer:   0%|          | 0/5717 [00:00<?, ?it/s]

extract waveforms memmap multi buffer:   0%|          | 0/5717 [00:00<?, ?it/s]

localize peaks using center_of_mass:   0%|          | 0/5717 [00:00<?, ?it/s]

Computing QC metrics...




Deleting intermediate files...
Deleted: E:\D20231214_Rec04_g0\D20231214_Rec04_g0_imec0\ks3_out\sorter_output\temp_wh.dat
Error deleting E:\D20231214_Rec04_g0\D20231214_Rec04_g0_imec0\preprocess\traces_cached_seg0.raw: [WinError 32] The process cannot access the file because it is being used by another process: 'E:\\D20231214_Rec04_g0\\D20231214_Rec04_g0_imec0\\preprocess\\traces_cached_seg0.raw'
Finished preprocessing and sorting in CdN

Loading E:\D20231214_Rec04_g0\D20231214_Rec04_g0_imec1 in OFC

Bandpassing the signal.
Correcting multiplexing temporal shift...
Saving preprocessed data... 

write_binary_recording with n_jobs = 9 and chunk_size = 30000


write_binary_recording:   0%|          | 0/5717 [00:00<?, ?it/s]

Running kilosort 3... 

RUNNING SHELL SCRIPT: E:\D20231214_Rec04_g0\D20231214_Rec04_g0_imec1\ks3_out\sorter_output\run_kilosort3.bat


(base) c:\Users\Thomas Elston\Documents\PYTHON\Neuropixel_preprocessing>E:



(base) E:\>cd E:\D20231214_Rec04_g0\D20231214_Rec04_g0_imec1\ks3_out\sorter_output 



(base) E:\D20231214_Rec04_g0\D20231214_Rec04_g0_imec1\ks3_out\sorter_output>matlab -nosplash -wait -r "kilosort3_master('E:\D20231214_Rec04_g0\D20231214_Rec04_g0_imec1\ks3_out\sorter_output', 'C:\Users\Thomas Elston\Documents\MATLAB\Kilosort-3')" 

kilosort3 run time 21751.44s
Extracting waveforms for QC metrics...


extract waveforms shared_memory multi buffer:   0%|          | 0/5717 [00:00<?, ?it/s]

extract waveforms shared_memory multi buffer:   0%|          | 0/5717 [00:00<?, ?it/s]

extract waveforms shared_memory multi buffer:   0%|          | 0/5717 [00:00<?, ?it/s]

extract waveforms memmap multi buffer:   0%|          | 0/5717 [00:00<?, ?it/s]

localize peaks using center_of_mass:   0%|          | 0/5717 [00:00<?, ?it/s]

Computing QC metrics...
Deleting intermediate files...
Deleted: E:\D20231214_Rec04_g0\D20231214_Rec04_g0_imec1\ks3_out\sorter_output\temp_wh.dat
Error deleting E:\D20231214_Rec04_g0\D20231214_Rec04_g0_imec1\preprocess\traces_cached_seg0.raw: [WinError 32] The process cannot access the file because it is being used by another process: 'E:\\D20231214_Rec04_g0\\D20231214_Rec04_g0_imec1\\preprocess\\traces_cached_seg0.raw'
Finished preprocessing and sorting in OFC

Finished all files. :)




In [2]:
num_cores = multiprocessing.cpu_count()
print(num_cores)

20
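The job settings reserve two cores for the rest of the system through `num_cores-2`. On a machine with two or fewer cores that expression is zero or negative, so a small guard can help. This is a minimal sketch; the helper `choose_n_jobs` and its `reserve` argument are hypothetical.

```python
import multiprocessing


def choose_n_jobs(reserve=2):
    """Leave `reserve` cores free, but always use at least one worker."""
    return max(1, multiprocessing.cpu_count() - reserve)


# same keys as the job_kwargs defined earlier, with a guarded worker count
job_kwargs = dict(n_jobs=choose_n_jobs(), chunk_duration='1s', progress_bar=True)
print(job_kwargs)
```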
