# DataJoint Workflow Guide for Creating a New Clustering Task


This notebook guides users through the process of adding a new `ClusteringTask` entry to the DataJoint pipeline.


**_Note:_**

- The examples in this notebook use a sample dataset. Replace these entries with your actual database entries to access and analyze your data.



### **Key Steps**


- **Setup**

- **Step 1: Select Session of Interest**

- **Step 2: Insert a New `ClusteringTask` Entry**


### **Setup**


First, import the necessary packages for the data pipeline and essential schemas.


In [1]:
import os

# When the notebook is launched from the `notebooks/` folder, move up to the parent directory.
if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")

In [2]:
import datajoint as dj

In [3]:
from workflow.pipeline import ephys

[2024-07-24 22:49:20,017][INFO]: Connecting milagros@db.datajoint.com:3306
[2024-07-24 22:49:21,832][INFO]: Connected milagros@db.datajoint.com:3306


#### **Step 1: Select Session of Interest**


Let's select one session as an example and create a key:


In [5]:
session_key = dict(
    organoid_id="O09",
    experiment_start_time="2023-05-18 12:25:00",
    insertion_number=0,
    start_time="2023-05-18 12:25:00",
    end_time="2023-05-18 12:26:30",
)

session_key

{'organoid_id': 'O09',
 'experiment_start_time': '2023-05-18 12:25:00',
 'insertion_number': 0,
 'start_time': '2023-05-18 12:25:00',
 'end_time': '2023-05-18 12:26:30'}
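The timestamps in `session_key` are plain strings, so a typo may only show up later as a restriction that matches nothing. Below is a minimal sketch of a sanity check, assuming the `YYYY-MM-DD HH:MM:SS` format used in the example key. `check_session_key` and `TIME_FORMAT` are hypothetical names introduced here for illustration and are not part of the pipeline.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed from the example key above


def check_session_key(key):
    """Parse the datetime strings in a session key and check their order.

    Hypothetical helper for illustration; returns the session duration in seconds.
    """
    # Parse experiment_start_time too, so a malformed string fails early.
    datetime.strptime(key["experiment_start_time"], TIME_FORMAT)
    start = datetime.strptime(key["start_time"], TIME_FORMAT)
    end = datetime.strptime(key["end_time"], TIME_FORMAT)
    if end <= start:
        raise ValueError(f"end_time {end} is not after start_time {start}")
    return (end - start).total_seconds()


example_key = dict(
    organoid_id="O09",
    experiment_start_time="2023-05-18 12:25:00",
    insertion_number=0,
    start_time="2023-05-18 12:25:00",
    end_time="2023-05-18 12:26:30",
)
duration_s = check_session_key(example_key)
print(duration_s)
```

For the example key, the check passes and reports a 90-second session.
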

In [6]:
# Join the session, session info, and probe tables, then restrict to the selected session
ephys.EphysSession * ephys.EphysSessionInfo * ephys.EphysSessionProbe & session_key

| organoid_id | experiment_start_time | insertion_number | start_time | end_time | session_type | session_info | probe | port_id | used_electrodes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| O09 | 2023-05-18 12:25:00 | 0 | 2023-05-18 12:25:00 | 2023-05-18 12:26:30 | spike_sorting | =BLOB= | Q983 | A | =BLOB= |

Column comments:

- `organoid_id`: e.g. O17

- `session_info`: Session header info from intan .rhd file. Get this from the first session file.

- `probe`: unique identifier for this model of probe (e.g. serial number)

- `used_electrodes`: list of electrode IDs used in this session (if null, all electrodes are used)


#### **Step 2: Insert a New `ClusteringTask` Entry**


In [7]:
ephys.ClusteringTask.heading

# Manual table for defining a clustering task ready to be run
organoid_id          : varchar(4)                   # e.g. O17
experiment_start_time : datetime                     # 
insertion_number     : tinyint unsigned             # 
start_time           : datetime                     # 
end_time             : datetime                     # 
paramset_idx         : smallint                     # 
---
clustering_output_dir="" : varchar(255)                 # clustering output directory relative to the clustering root data directory

The `ephys.ClusteringTask` table facilitates pairing a specific `ephys.ClusteringParamSet` with a particular session. Each entry in the `ClusteringTask` table represents a clustering task awaiting execution.
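According to the heading above, the primary key of `ClusteringTask` is the session key plus `paramset_idx`. The following sketch shows one way to assemble such a key and confirm that no primary-key attribute is missing before inserting. `build_task_key` is a hypothetical helper, and the attribute list is copied by hand from the heading. With a live connection, `ephys.ClusteringTask.primary_key` should return the same names. The value `paramset_idx=101` is the one used later in this notebook.

```python
# Primary-key attributes copied from the ClusteringTask heading shown above.
# Assumption: with a live connection, ephys.ClusteringTask.primary_key gives the same list.
CLUSTERING_TASK_PK = [
    "organoid_id",
    "experiment_start_time",
    "insertion_number",
    "start_time",
    "end_time",
    "paramset_idx",
]


def build_task_key(session_key, paramset_idx, primary_key=CLUSTERING_TASK_PK):
    """Merge a session key with a paramset_idx and check primary-key coverage.

    Hypothetical helper for illustration only.
    """
    key = dict(session_key, paramset_idx=paramset_idx)
    missing = [attr for attr in primary_key if attr not in key]
    if missing:
        raise KeyError(f"missing primary-key attributes: {missing}")
    return key


task_key = build_task_key(
    dict(
        organoid_id="O09",
        experiment_start_time="2023-05-18 12:25:00",
        insertion_number=0,
        start_time="2023-05-18 12:25:00",
        end_time="2023-05-18 12:26:30",
    ),
    paramset_idx=101,
)
print(task_key)
```

The resulting dictionary has the same shape as the entry passed to `insert1` later in this notebook.
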

Before creating a new `ClusteringTask`, let's inspect the existing parameter sets:


In [8]:
ephys.ClusteringParamSet()

| paramset_idx | clustering_method | paramset_desc | param_set_hash | params |
| --- | --- | --- | --- | --- |
| 0 | spykingcircus2 | Default parameters for spyking circus2 using SpikeInterface v0.100.1 | b6fb9ec2-768c-66b0-2b71-9b8ac91e94da | =BLOB= |
| 1 | spykingcircus2 | Default parameter set for spyking circus2 using SpikeInterface v0.101.* | 434894d0-eb7b-db6c-80e6-638a1322c568 | =BLOB= |
| 2 | kilosort2 | kilosort2 with SpikeInterface version 0.101+ | 79a731f3-f1b6-c110-5f8a-e25227464de7 | =BLOB= |
| 101 | spykingcircus2 | Spyking circus2 using SpikeInterface v0.101.* and `include_multi_channel_metrics=False` | fd4eb67f-5784-a6ae-6cd8-25a429cad653 | =BLOB= |

Column comments:

- `params`: dictionary of all applicable parameters
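When several parameter sets share a clustering method, it can help to list the candidate `paramset_idx` values before picking one. The sketch below does this in plain Python on rows copied from the preview above. `paramsets_for_method` is a hypothetical helper. With a live connection, a similar list of dictionaries could presumably be obtained with `ephys.ClusteringParamSet.fetch("paramset_idx", "clustering_method", "paramset_desc", as_dict=True)`, or the table could be restricted directly with `ephys.ClusteringParamSet & 'clustering_method = "spykingcircus2"'`.

```python
# Rows copied from the ClusteringParamSet preview above (hash and BLOB columns omitted).
paramsets = [
    {
        "paramset_idx": 0,
        "clustering_method": "spykingcircus2",
        "paramset_desc": "Default parameters for spyking circus2 using SpikeInterface v0.100.1",
    },
    {
        "paramset_idx": 1,
        "clustering_method": "spykingcircus2",
        "paramset_desc": "Default parameter set for spyking circus2 using SpikeInterface v0.101.*",
    },
    {
        "paramset_idx": 2,
        "clustering_method": "kilosort2",
        "paramset_desc": "kilosort2 with SpikeInterface version 0.101+",
    },
    {
        "paramset_idx": 101,
        "clustering_method": "spykingcircus2",
        "paramset_desc": "Spyking circus2 using SpikeInterface v0.101.* "
        "and `include_multi_channel_metrics=False`",
    },
]


def paramsets_for_method(rows, method):
    """Return the sorted paramset_idx values whose clustering_method matches."""
    return sorted(
        row["paramset_idx"] for row in rows if row["clustering_method"] == method
    )


spykingcircus_ids = paramsets_for_method(paramsets, "spykingcircus2")
print(spykingcircus_ids)  # [0, 1, 101]
```
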


**Attention:**

- The next code cell will insert a new entry for this experiment.

- If connected to the cloud, this will trigger a series of computations in downstream tables. Please double-check the session attributes and `paramset_idx`.


Let's insert a new entry in the `ClusteringTask` table with the selected session and `paramset_idx=101`:


In [9]:
# Insert a new ClusteringTask entry
ephys.ClusteringTask.insert1(
    dict(
        **session_key,
        paramset_idx=101,
    ),
    skip_duplicates=True,
)

In [10]:
# Confirm that the new entry is present
ephys.ClusteringTask & session_key & "paramset_idx=101"

| organoid_id | experiment_start_time | insertion_number | start_time | end_time | paramset_idx | clustering_output_dir |
| --- | --- | --- | --- | --- | --- | --- |
| O09 | 2023-05-18 12:25:00 | 0 | 2023-05-18 12:25:00 | 2023-05-18 12:26:30 | 101 | O09-12_raw/202305181225_202305181226/O09/spykingcircus2_101 |

Column comments:

- `organoid_id`: e.g. O17

- `clustering_output_dir`: clustering output directory relative to the clustering root data directory
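The `clustering_output_dir` in the row above appears to encode the session time span, the organoid, and the clustering method with its `paramset_idx`. The sketch below splits the path back into those parts. The four-level layout and the `YYYYMMDDHHMM` timestamp format are inferred from this single example row and are not confirmed by the pipeline. `parse_output_dir` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import PurePosixPath

STAMP_FORMAT = "%Y%m%d%H%M"  # assumed from the example path (minute resolution)


def parse_output_dir(output_dir):
    """Split a clustering_output_dir into its apparent components.

    Assumes the layout <top folder>/<start>_<end>/<organoid>/<method>_<paramset_idx>,
    as inferred from the single example row above.
    """
    top_folder, time_span, organoid_id, method_and_idx = PurePosixPath(output_dir).parts
    start_str, end_str = time_span.split("_")
    method, idx = method_and_idx.rsplit("_", 1)
    return dict(
        top_folder=top_folder,
        start=datetime.strptime(start_str, STAMP_FORMAT),
        end=datetime.strptime(end_str, STAMP_FORMAT),
        organoid_id=organoid_id,
        clustering_method=method,
        paramset_idx=int(idx),
    )


parsed = parse_output_dir("O09-12_raw/202305181225_202305181226/O09/spykingcircus2_101")
print(parsed["clustering_method"], parsed["paramset_idx"])
```

For the example path, the parsed method and index match the `spykingcircus2` parameter set with `paramset_idx=101` selected for this task.
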
