# Creating a New Clustering Task in the Utah Organoids Pipeline


### **Overview**

This notebook guides users through the process of adding a new `ClusteringTask` entry to the DataJoint pipeline. A **clustering task** pairs a selected **ephys session** with a **parameter set**, enabling spike sorting for neural data analysis.

By the end of this notebook, you will:
- Select a session for clustering
- Inspect available clustering parameter sets
- Insert a new `ClusteringTask` entry into the database

**_Note:_**

- This notebook uses example data; replace the example values with entries from your own database.
- If connected to the cloud, inserting a `ClusteringTask` **will automatically trigger computations** in downstream tables.


### **Key Steps**


- **Setup**

- **Step 1: Select Session of Interest**

- **Step 2: Insert a New `ClusteringTask` Entry**


### **Setup**


First, import the necessary packages for the data pipeline and essential schemas.


In [1]:
import os

if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")

In [2]:
import datajoint as dj

In [3]:
from workflow.pipeline import ephys

[2025-03-03 12:00:29,481][INFO]: Connecting milagros@db.datajoint.com:3306
[2025-03-03 12:00:31,306][INFO]: Connected milagros@db.datajoint.com:3306


#### **Step 1: Select Session of Interest**


Each clustering task must be linked to an existing ephys session stored in the `EphysSession` table.

Let's define a session key for an example session:

In [4]:
session_key = {
    "organoid_id": "MB07",
    "experiment_start_time": "2024-09-07 14:49:00",
    "insertion_number": 0,
    "start_time": "2024-09-07 14:49:00",
    "end_time": "2024-09-07 14:54:00",
}

session_key

{'organoid_id': 'MB07',
 'experiment_start_time': '2024-09-07 14:49:00',
 'insertion_number': 0,
 'start_time': '2024-09-07 14:49:00',
 'end_time': '2024-09-07 14:54:00'}
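Before querying the database, it can help to confirm that the timestamps in the key parse correctly and that the session window is the length you expect. The following is a minimal sketch using only the standard library; the helper name `session_duration_minutes` and the format constant are illustrative and not part of the pipeline.

```python
from datetime import datetime

# Timestamp layout used by the example session key in this notebook.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def session_duration_minutes(key):
    """Parse start_time/end_time from a session key and return the duration in minutes."""
    start = datetime.strptime(key["start_time"], TIME_FORMAT)
    end = datetime.strptime(key["end_time"], TIME_FORMAT)
    if end <= start:
        raise ValueError("end_time must be later than start_time")
    return (end - start).total_seconds() / 60


session_key = {
    "organoid_id": "MB07",
    "experiment_start_time": "2024-09-07 14:49:00",
    "insertion_number": 0,
    "start_time": "2024-09-07 14:49:00",
    "end_time": "2024-09-07 14:54:00",
}

duration = session_duration_minutes(session_key)
print(f"Session length: {duration} minutes")
```

A malformed timestamp raises `ValueError` from `datetime.strptime`, so typos surface before the key is used in a restriction.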

Check if the session exists in the database:

In [5]:
ephys.EphysSession * ephys.EphysSessionProbe & session_key

| organoid_id (e.g. O17) | experiment_start_time | insertion_number | start_time | end_time | session_type | probe (unique identifier for this model of probe, e.g. serial number) | port_id | used_electrodes (list of electrode IDs used in this session; if null, all electrodes are used) |
|---|---|---|---|---|---|---|---|---|
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | spike_sorting | Q983 | C | =BLOB= |


#### **Step 2: Insert a New `ClusteringTask` Entry**


#### Inspect Available Clustering Parameter Sets

Before inserting a new task, check existing clustering parameter sets:

In [6]:
ephys.ClusteringParamSet()

| paramset_idx | clustering_method | paramset_desc | param_set_hash | params (dictionary of all applicable parameters) |
|---|---|---|---|---|
| 0 | spykingcircus2 | Default parameters for spyking circus2 using SpikeInterface v0.100.1 | b6fb9ec2-768c-66b0-2b71-9b8ac91e94da | =BLOB= |
| 1 | spykingcircus2 | Default parameter set for spyking circus2 using SpikeInterface v0.101.* | 434894d0-eb7b-db6c-80e6-638a1322c568 | =BLOB= |
| 2 | kilosort2 | kilosort2 with SpikeInterface version 0.101+ | 79a731f3-f1b6-c110-5f8a-e25227464de7 | =BLOB= |
| 5 | spykingcircus2 | Spyking circus2 with a detection threshold 5 (neg direction) | 4c895afd-a1b1-5d64-b747-e8489078e2e3 | =BLOB= |
| 11 | spykingcircus2 | waveform>threshold: .25->2 | 17d41d84-067d-791c-8706-8cab83020b84 | =BLOB= |
| 12 | spykingcircus2 | waveform>threshold: .25->2 attempt 2 | 2b28cf23-2456-8202-b70f-96871b837a26 | =BLOB= |
| 13 | spykingcircus2 | waveform>threshold: .25->2 attempt 2 | 1faf6aee-71d6-fe26-74ec-6bb7cdc0f30f | =BLOB= |
| 14 | spykingcircus2 | apply_preprocessing = False | ce720015-b59a-08d6-198e-def81c860f46 | =BLOB= |
| 15 | spykingcircus2 | apply_preprocessing, matched_filtering, and apply_motion_correction = False | 5f7a8362-c31c-061e-14b2-74ad55466546 | =BLOB= |
| 16 | spykingcircus2 | default parameters, different format | 0a3d0360-c0de-6c30-9c35-7c931a9a6f62 | =BLOB= |


To view default parameters for **Kilosort2.5**, a common spike sorting algorithm:

In [7]:
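# Import the sorters submodule explicitly; a bare `import spikeinterface`
# does not guarantee that `spikeinterface.sorters` is loaded.
import spikeinterface.sorters as ss

ss.Kilosort2_5Sorter.default_params()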

{'detect_threshold': 6,
 'projection_threshold': [10, 4],
 'preclust_threshold': 8,
 'whiteningRange': 32.0,
 'momentum': [20.0, 400.0],
 'car': True,
 'minFR': 0.1,
 'minfr_goodchannels': 0.1,
 'nblocks': 5,
 'sig': 20,
 'freq_min': 150,
 'sigmaMask': 30,
 'lam': 10.0,
 'nPCs': 3,
 'ntbuff': 64,
 'nfilt_factor': 4,
 'NT': None,
 'AUCsplit': 0.9,
 'do_correction': True,
 'wave_length': 61,
 'keep_good_only': False,
 'skip_kilosort_preprocessing': False,
 'scaleproc': None,
 'save_rez_to_mat': False,
 'delete_tmp_files': ('matlab_files',),
 'delete_recording_dat': False,
 'n_jobs': 1,
 'chunk_duration': '1s',
 'progress_bar': True,
 'mp_context': None,
 'max_threads_per_process': 1}
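A custom parameter set usually starts from defaults like these with a few values changed. The sketch below shows one way to apply overrides while rejecting misspelled parameter names. The helper `override_params` is hypothetical (it is not part of SpikeInterface or the pipeline), and the `defaults` dictionary is a small excerpt of the output above, copied by hand so the example runs without SpikeInterface installed.

```python
def override_params(defaults, overrides):
    """Return a copy of `defaults` with `overrides` applied, refusing unknown keys."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown parameter(s): {sorted(unknown)}")
    # Build a new dict so the defaults themselves are left untouched.
    return {**defaults, **overrides}


# Excerpt of the Kilosort2.5 defaults printed above.
defaults = {
    "detect_threshold": 6,
    "car": True,
    "minFR": 0.1,
    "freq_min": 150,
}

custom = override_params(defaults, {"detect_threshold": 5})
print(custom)
```

Checking the override names against the defaults catches typos such as `detect_treshold`, which would otherwise be stored in the parameter set without any effect on the sorter.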

#### Define and Insert a New Clustering Task

Check existing tasks for this session:

In [14]:
ephys.ClusteringTask & session_key

| organoid_id (e.g. O17) | experiment_start_time | insertion_number | start_time | end_time | paramset_idx | clustering_output_dir (clustering output directory relative to the clustering root data directory) |
|---|---|---|---|---|---|---|
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 2 | MB5-8_raw/202409071449_202409071454/MB07/kilosort2_2 |
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 23 | MB5-8_raw/202409071449_202409071454/MB07/kilosort3_23 |
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 24 | MB5-8_raw/202409071449_202409071454/MB07/kilosort2_24 |
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 27 | MB5-8_raw/202409071449_202409071454/MB07/kilosort3_27 |
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 250 | MB5-8_raw/202409071449_202409071454/MB07/kilosort2-5_250 |
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 401 | MB5-8_raw/202409071449_202409071454/MB07/kilosort4_401 |


To create a task that uses a different parameter set, fetch the primary key of an existing `ClusteringTask` for this session, then change its `paramset_idx` to the index of the parameter set you want to use:

In [26]:
MB07_key = (ephys.ClusteringTask & session_key & "paramset_idx=2").fetch1("KEY")

In [27]:
MB07_key


{'organoid_id': 'MB07',
 'experiment_start_time': datetime.datetime(2024, 9, 7, 14, 49),
 'insertion_number': 0,
 'start_time': datetime.datetime(2024, 9, 7, 14, 49),
 'end_time': datetime.datetime(2024, 9, 7, 14, 54),
 'paramset_idx': 2}

In [28]:
MB07_key["paramset_idx"] = 250
MB07_key

{'organoid_id': 'MB07',
 'experiment_start_time': datetime.datetime(2024, 9, 7, 14, 49),
 'insertion_number': 0,
 'start_time': datetime.datetime(2024, 9, 7, 14, 49),
 'end_time': datetime.datetime(2024, 9, 7, 14, 54),
 'paramset_idx': 250}
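Assigning to `MB07_key["paramset_idx"]` changes the fetched dictionary in place, so the original key (with `paramset_idx` 2) is no longer available afterwards. If you want to keep both, one option is to build the new key as a copy. This is a plain-Python sketch; the names `existing_key` and `new_key` are illustrative.

```python
from datetime import datetime

# Primary key of the existing task, as returned by fetch1("KEY") above.
existing_key = {
    "organoid_id": "MB07",
    "experiment_start_time": datetime(2024, 9, 7, 14, 49),
    "insertion_number": 0,
    "start_time": datetime(2024, 9, 7, 14, 49),
    "end_time": datetime(2024, 9, 7, 14, 54),
    "paramset_idx": 2,
}

# Copy every field and replace only paramset_idx; existing_key is unchanged.
new_key = {**existing_key, "paramset_idx": 250}

print(existing_key["paramset_idx"], new_key["paramset_idx"])
```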

Insert the new `ClusteringTask` entry:

In [29]:
# Insert a new ClusteringTask entry
ephys.ClusteringTask.insert1(
    MB07_key,
    skip_duplicates=True,
)
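With `skip_duplicates=True`, an insert whose primary key already exists is skipped without an error, so the call alone does not tell you whether a row was added. The sketch below shows one way to make that explicit by checking for a matching row first. `insert_task_if_new` is a hypothetical helper, and `InMemoryTable` is only a stand-in that mimics the three operations used here (`&`, `len()`, `insert1`) so the example runs without a database connection; in the notebook you would pass `ephys.ClusteringTask` instead.

```python
class InMemoryTable:
    """Stand-in for a DataJoint table; NOT part of the pipeline."""

    def __init__(self, rows=None):
        self.rows = list(rows or [])

    def __and__(self, key):
        # Restrict to rows whose fields match every item in `key`.
        matches = [r for r in self.rows if all(r.get(k) == v for k, v in key.items())]
        return InMemoryTable(matches)

    def __len__(self):
        return len(self.rows)

    def insert1(self, row):
        self.rows.append(dict(row))


def insert_task_if_new(table, key):
    """Insert `key` unless a matching row exists; return True if a row was inserted."""
    if len(table & key) > 0:
        return False
    table.insert1(key)
    return True


tasks = InMemoryTable([{"organoid_id": "MB07", "paramset_idx": 2}])
new_key = {"organoid_id": "MB07", "paramset_idx": 250}

first_attempt = insert_task_if_new(tasks, new_key)
second_attempt = insert_task_if_new(tasks, new_key)
print(first_attempt, second_attempt)
```

The second call returns `False` because the row added by the first call already matches the key.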

Verify task insertion:

In [30]:
ephys.ClusteringTask & session_key & MB07_key

| organoid_id (e.g. O17) | experiment_start_time | insertion_number | start_time | end_time | paramset_idx | clustering_output_dir (clustering output directory relative to the clustering root data directory) |
|---|---|---|---|---|---|---|
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 250 | MB5-8_raw/202409071449_202409071454/MB07/kilosort2-5_250 |
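The `clustering_output_dir` values shown in this notebook appear to follow a regular pattern: experiment directory, then `<start>_<end>` timestamps, then the organoid ID, then `<method>_<paramset_idx>`. The sketch below reconstructs that pattern so you can predict where results should appear. It is inferred from the displayed rows only, not taken from the pipeline source; the function name, the `experiment_dir` argument, and the assumption that the method is stored as `kilosort2.5` with the dot replaced by a hyphen are all illustrative.

```python
from datetime import datetime
from pathlib import PurePosixPath


def expected_output_dir(experiment_dir, organoid_id, start, end, method, paramset_idx):
    """Build the relative output path following the pattern seen in the tables above."""
    stamp = "%Y%m%d%H%M"
    session_dir = f"{start.strftime(stamp)}_{end.strftime(stamp)}"
    # Assumption: dots in the method name are replaced with hyphens.
    method_dir = f"{method.replace('.', '-')}_{paramset_idx}"
    return str(PurePosixPath(experiment_dir) / session_dir / organoid_id / method_dir)


output_dir = expected_output_dir(
    experiment_dir="MB5-8_raw",
    organoid_id="MB07",
    start=datetime(2024, 9, 7, 14, 49),
    end=datetime(2024, 9, 7, 14, 54),
    method="kilosort2.5",
    paramset_idx=250,
)
print(output_dir)
```

Comparing this string with the `clustering_output_dir` of the inserted row is a quick check that the task points at the session and parameter set you intended.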


### **Next Steps**

Now that the `ClusteringTask` is created, you can:
- Run spike sorting locally using `RUN` notebooks
- Inspect results using `EXPLORE` notebooks