# Getting familiar with the electrophysiology (Ephys) pipeline

In this notebook, we will take a tour of the Moser group electrophysiology (ephys) pipeline, taking a close look at how each table is defined and how the tables relate to each other to represent various kinds of experimental information.

Before we begin, let's import the essential packages, in particular DataJoint (`datajoint`), and establish a connection to the Moser lab database where the entire pipeline resides.

In [1]:
import datajoint as dj

In [2]:
dj.config['display.limit'] = 10  # show at most 10 rows when previewing a table

If you have completed all the setup steps for the Python environment, DataJoint and the Ephys pipeline as described on the [Zero to DataJoint Wiki page](https://github.com/kavli-ntnu/dj-elphys/wiki/Zero-to-datajoint), you should be able to simply run the next cell, which imports the pipeline definitions from the `ephys` package.

In [3]:
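# import the pipeline schemas from the `ephys` package
from ephys import reference, acquisition, tracking, behavior, ephys, analysis

# animal information lives in the MLIMS database, accessed through a virtual module
animal = dj.create_virtual_module('mlims', dj.config['custom'].get('mlims.database', 'prod_mlims_data'))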


If you run into any issues importing the pipeline, please refer to the [Wiki page and guide](https://github.com/kavli-ntnu/dj-elphys/wiki/Zero-to-datajoint) again or ask the administrators for help.

The entire pipeline is quite elaborate and contains many parts, with related tables grouped together to ease navigation and understanding. Rather than looking at everything at once, we will go through each of these groups of tables, or **schemas**, one at a time.

## Animal information

Just as no experiment can take place without an animal, the entire pipeline begins with capturing information about the animal. Naturally, information pertaining to the animal is found in the `animal` schema.

Among other things, the `animal.Animal` table contains a listing of all animals in the lab and, as we will see, serves as the starting point for the entire ephys pipeline.

In [None]:
animal.Animal()

Each animal is uniquely identified by the combination of `animal_id` and `datasource_id`. `datasource_id=0` indicates that the animal comes from the MLIMS colony management system.

### Querying tables

In DataJoint, you can probe into or **query** tables to get a specific subset of data that you may be interested in. The most common type of query is **restriction**, where you'll subselect entries based on specific criteria.
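Conceptually, a restriction keeps only the rows that satisfy a condition. The plain-Python sketch below illustrates the idea with hypothetical toy rows (not real pipeline data) and a hypothetical `restrict` helper; it is an analogy, not the DataJoint API.

```python
from datetime import date

# Hypothetical toy rows standing in for entries of animal.Animal (NOT real data)
animals = [
    {"animal_id": "a1", "datasource_id": 0, "animal_species": "rat", "animal_dob": date(2016, 5, 1)},
    {"animal_id": "a2", "datasource_id": 0, "animal_species": "mouse", "animal_dob": date(2017, 3, 9)},
    {"animal_id": "a3", "datasource_id": 0, "animal_species": "rat", "animal_dob": date(2018, 1, 20)},
]

def restrict(rows, condition):
    """Keep only rows for which condition(row) is True (analogy for `table & condition`)."""
    return [row for row in rows if condition(row)]

rats = restrict(animals, lambda r: r["animal_species"] == "rat")
born_2017_or_later = restrict(animals, lambda r: r["animal_dob"] >= date(2017, 1, 1))

print([r["animal_id"] for r in rats])                # ['a1', 'a3']
print([r["animal_id"] for r in born_2017_or_later])  # ['a2', 'a3']
```

In DataJoint itself, a restriction can be written as an SQL-like string, as in the questions that follow, or as a dictionary such as `{'animal_species': 'rat'}`.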

#### Question 1: Find only rats

In [None]:
animal.Animal & 'animal_species = "rat"'

#### Question 2: Find only animals born after 2017

In [None]:
animal.Animal & 'animal_dob >= "2017-01-01"'

### Getting data out of the table

Once you form your desired query result, you can **fetch** back the data from the tables by calling `fetch` on the query result.

In [None]:
# get all information about animals born on or after 2017
data = (animal.Animal & 'animal_dob >= "2017-01-01"').fetch()

In [None]:
data

You can also fetch only specific columns by passing the column names to `fetch`:

In [None]:
names, dobs = (animal.Animal & 'animal_dob >= "2017-01-01"').fetch('animal_name', 'animal_dob')

In [None]:
names

In [None]:
dobs
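When several attribute names are passed, `fetch` returns one array per attribute, aligned row by row, so `names[i]` and `dobs[i]` always describe the same animal. The plain-Python sketch below mimics that shape with hypothetical rows and a hypothetical `fetch_columns` helper (DataJoint returns NumPy arrays rather than tuples).

```python
from datetime import date

# Hypothetical rows standing in for the restricted animal.Animal query
rows = [
    {"animal_id": "a2", "animal_name": "Alpha", "animal_dob": date(2017, 3, 9)},
    {"animal_id": "a3", "animal_name": "Bravo", "animal_dob": date(2018, 1, 20)},
]

def fetch_columns(rows, *attrs):
    """Return one tuple per requested attribute, aligned by row (analogy for fetch('a', 'b'))."""
    return tuple(tuple(row[a] for row in rows) for a in attrs)

names, dobs = fetch_columns(rows, "animal_name", "animal_dob")
for name, dob in zip(names, dobs):
    print(name, dob.isoformat())
```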

### Getting a list of unique animals

You can fetch just the identifying (primary key) information of the table by fetching `'KEY'`, and then use it to **restrict** the table.

In [None]:
keys = (animal.Animal & 'animal_dob >= "2017-01-01"').fetch('KEY')

In [None]:
keys[:5]  # first five keys

Get a specific animal back:

In [None]:
# get the first animal back
animal.Animal & keys[0]
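`fetch('KEY')` returns a list of dictionaries holding only the primary-key attributes, here `animal_id` and `datasource_id`. Because a primary key identifies exactly one entry, restricting by one of these dictionaries yields a single animal. A plain-Python sketch with hypothetical rows and hypothetical helper names:

```python
# Hypothetical rows; the primary key of animal.Animal is (animal_id, datasource_id)
PRIMARY_KEY = ("animal_id", "datasource_id")
animals = [
    {"animal_id": "a2", "datasource_id": 0, "animal_name": "Alpha"},
    {"animal_id": "a3", "datasource_id": 0, "animal_name": "Bravo"},
]

def fetch_keys(rows):
    """Analogy for fetch('KEY'): keep only the primary-key attributes of each row."""
    return [{k: row[k] for k in PRIMARY_KEY} for row in rows]

def restrict_by_dict(rows, key):
    """Analogy for `table & key`: keep rows matching every attribute in key."""
    return [row for row in rows if all(row.get(k) == v for k, v in key.items())]

keys = fetch_keys(animals)
print(keys[0])                             # {'animal_id': 'a2', 'datasource_id': 0}
print(restrict_by_dict(animals, keys[0]))  # exactly one row
```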

## Electrophysiology Session

In order to understand how the entire `ephys` pipeline is organized, it is essential to first understand what we mean by "a session".

The following figure illustrates the structure of an electrophysiology experiment session.

<img src="../images/Ephys_Session_Structure.png" alt="drawing" width="700"/>

A single experimental **session** consists of one or more of the following:
* tracking data stream (recording from an animal tracker), referred to as **tracking** data
* electrophysiology data stream (recording from a single probe), referred to as **recording** data

Within a single **session**, you can have any number of the above, possibly with multiple probe recordings occurring simultaneously (e.g. when recording from two or more probes at the same time).

While experimenters are free to split recordings and trackings into as many sessions as they like, an **experimental session** is typically a single coherent collection of recordings and is usually the basic unit for downstream analysis.

Additionally, a single **session** may be associated with one or more **tasks**, usually labeling the particular behavioral task or environment the animal is exposed to.

To represent this complexity, we use multiple related tables, starting with `Session`, which represents a single experimental session in the pipeline.

An experimental `Session` can be uniquely identified by knowing the **animal** and **the exact date time** of the experiment session. This is reflected in the definition of the `Session` table.

In [None]:
acquisition.Session()

Looking at the *definition* of the table, we see that the `Session` table **refers to the `Animal` table**, indicating that each `Session` depends on an `Animal`. In other words, you cannot have an experiment session without a corresponding animal.

In [None]:
acquisition.Session.describe();

This relationship is captured by the following table **diagram**:

<img src="../images/ephys_pipeline/animal_session.png" alt="drawing" width="300"/>

The power of DataJoint query shines when you start combining multiple tables in your query to ask more complex questions!
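Restricting one table by another (e.g. `animal.Animal & acquisition.Session`) keeps the animals whose primary-key values match at least one entry of the other table, without adding any columns. The plain-Python sketch below illustrates this with hypothetical rows and a hypothetical `semijoin` helper; only `session_time` is taken from the queries in this notebook.

```python
from datetime import datetime

animals = [
    {"animal_id": "a1", "datasource_id": 0},
    {"animal_id": "a2", "datasource_id": 0},
    {"animal_id": "a3", "datasource_id": 0},
]
sessions = [
    {"animal_id": "a1", "datasource_id": 0, "session_time": datetime(2017, 6, 1, 10, 0)},
    {"animal_id": "a3", "datasource_id": 0, "session_time": datetime(2018, 2, 3, 14, 30)},
    {"animal_id": "a3", "datasource_id": 0, "session_time": datetime(2018, 2, 4, 9, 15)},
]

def semijoin(left, right, on):
    """Keep rows of `left` whose `on` attributes match at least one row of `right`."""
    matched = {tuple(r[a] for a in on) for r in right}
    return [row for row in left if tuple(row[a] for a in on) in matched]

on = ("animal_id", "datasource_id")
with_sessions = semijoin(animals, sessions, on)
recent = [s for s in sessions if s["session_time"] >= datetime(2018, 1, 1)]
with_recent_sessions = semijoin(animals, recent, on)
print([a["animal_id"] for a in with_sessions])         # ['a1', 'a3']
print([a["animal_id"] for a in with_recent_sessions])  # ['a3']
```

Note that `a3` appears only once even though it has two sessions: the result is a subset of the animals, not a combination of animals and sessions.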

#### Question 3: Find all animals with one or more experimental sessions

In [None]:
animal.Animal & acquisition.Session

#### Question 4: Find all animals with sessions run on or after 2018.

In [None]:
animal.Animal & (acquisition.Session & 'session_time >= "2018-01-01"')

## Recording and Tracking

As mentioned above, we expect to find one or more electrode probe recordings and/or tracking data streams. This information is represented by the `Recording` and `Tracking` tables, respectively.

<img src="../images/ephys_pipeline/session_simple.png" alt="drawing" width="600"/>

The extended diagram now captures the relationship between `Session` and the two new tables `Recording` and `Tracking`. However, notice that `Recording` also *depends* on another table -- `ProbeInsertion`.

## Understanding recording - working with probes

In [None]:
acquisition.Recording.describe();

Looking at the definition of `Recording` table, we can see that it depends on two parent tables -- `acquisition.Session` and `acquisition.ProbeInsertion`. Dependency on `acquisition.Session` ensures that each recording must be associated with exactly one `Session` entry. What is the dependency on `ProbeInsertion` about? 

If you guessed that this captures exactly which electrode probe the recording was performed with, you guessed right! The `ProbeInsertion` table is the final table in a chain of tables capturing precise information about the probe that was inserted into the animal.

<img src="../images/ephys_pipeline/probe_insert.png" alt="drawing" width="450"/>

In [None]:
acquisition.ProbeInsertion.describe();

Conceptually, an entry in `ProbeInsertion` table represents a particular event of probe insertion into the animal, and this is uniquely identified by knowing on **which animal** and **when** the insertion was performed. You can also see that `ProbeInsertion` *refers* to `Probe` table, thereby indicating what `Probe` was actually inserted.

In [None]:
reference.Probe()

In [None]:
reference.Probe.describe();

An entry in the `Probe` table refers to a specific *physical instance* of a probe, identified by a unique string of characters (e.g. the name or serial number of the probe). In turn, `Probe` *refers* to the `ProbeModel` table, which captures the **model** of the probe, such as a Neuropixels probe version 1.0 or a 4-probe tetrode array.

In [None]:
reference.ProbeModel()

In [None]:
reference.ProbeModel.describe();

Finally, each `ProbeModel` *refers* to the type of probe, a more general classification such as Neuropixels or tetrode array.

In [None]:
reference.Probe * reference.ProbeModel 

<img src="../images/ephys_pipeline/probe_insert.png" alt="drawing" width="450"/>

Put together, `ProbeInsertion` allows experimenters to capture the information about the probe that was inserted into an animal.
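Following the chain `ProbeInsertion` → `Probe` → `ProbeModel` amounts to a sequence of lookups on shared attributes. The plain-Python sketch below traces that chain; apart from `probe_type`, which appears in the query for Question 5, all attribute names and values are hypothetical.

```python
# Hypothetical lookup tables; only `probe_type` appears in the notebook's queries.
probe_models = {
    "np1.0": {"probe_type": "neuropixels"},
    "tetrode4": {"probe_type": "tetrode_array"},
}
probes = {
    "SN-001": {"probe_model": "np1.0"},
    "SN-002": {"probe_model": "tetrode4"},
}
insertions = [
    {"animal_id": "a1", "insertion_time": "2018-02-01 09:00", "probe": "SN-002"},
    {"animal_id": "a3", "insertion_time": "2018-03-12 11:30", "probe": "SN-001"},
]

def probe_type_of(insertion):
    """Follow ProbeInsertion -> Probe -> ProbeModel to get the probe type."""
    model = probes[insertion["probe"]]["probe_model"]
    return probe_models[model]["probe_type"]

tetrode_animals = sorted({ins["animal_id"] for ins in insertions
                          if probe_type_of(ins) == "tetrode_array"})
print(tetrode_animals)  # ['a1']
```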

#### Question 5: Find all animals with tetrode array implants

In [None]:
tetrode_probes = reference.Probe & (reference.ProbeModel & 'probe_type = "tetrode_array"')

In [None]:
animal.Animal * acquisition.ProbeInsertion & tetrode_probes

## Recordings

Now that we understand what the `ProbeInsertion` table captures, let's get back to the `Recording` table.

<img src="../images/ephys_pipeline/session_simple.png" alt="drawing" width="700"/>

In [None]:
acquisition.Recording.describe();

Notice that an entry in `Recording` is uniquely identified by the combination of a `Session`, a `ProbeInsertion`, **and the recording time**! It is this last primary key attribute that allows more than one recording for each combination of `Session` and `ProbeInsertion`. In other words, within a `Session`, you can have more than one recording from the same probe, as long as they are separated in time (`recording_time`).
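A plain-Python sketch of why the extra `recording_time` attribute matters: treating the primary key as a tuple, two recordings from the same session and probe insertion collide unless their `recording_time` differs. The tuple layout, the string identifiers and the `insert_recording` helper are hypothetical simplifications; the actual key attributes are those shown by `describe()`.

```python
# Simplified primary key of Recording: (session, probe insertion, recording_time).
# The string identifiers below are hypothetical placeholders.
recordings = {}

def insert_recording(session, insertion, recording_time, info):
    """Insert a recording, rejecting duplicate primary keys as a database would."""
    key = (session, insertion, recording_time)
    if key in recordings:
        raise ValueError(f"duplicate primary key: {key}")
    recordings[key] = info

insert_recording("a3/2018-02-03 14:30", "SN-001", "14:30:00", "first recording")
insert_recording("a3/2018-02-03 14:30", "SN-001", "15:10:00", "second recording")  # OK: later time

try:
    insert_recording("a3/2018-02-03 14:30", "SN-001", "14:30:00", "duplicate")
except ValueError as err:
    print(err)
print(len(recordings))  # 2
```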

Now you may have noticed that `Recording` also *refers* to a number of other tables. This is graphically depicted by the following diagram.

<img src="../images/ephys_pipeline/recording.png" alt="drawing" height="400"/>

`RecordingSystem`, as the name indicates, keeps track of the different recording systems.

In [None]:
reference.RecordingSystem()

`ElectrodeConfig` is a bit more complex: it describes the exact grouping of channels on the probe used for the recording. We will revisit this table and other related tables later in this notebook.

#### Question 6: Pick a mouse and find all recordings performed on that mouse.

In [None]:
animal.Animal() & 'animal_id = "61fd2ac184c13c73"'

In [None]:
acquisition.Recording & 'animal_id = "61fd2ac184c13c73"'

## Tracking data

Whereas the `Recording` table describes an electrophysiology data stream, the `Tracking` table contains information about a particular tracking data stream within a session.

In [None]:
tracking.Tracking().describe();

In [None]:
tracking.Tracking()

Again, `Tracking` *refers* to `Session`, but it also has its own primary key attribute `tracking_time`, allowing more than one tracking per session as long as they are separated in time.

### Where is the data?

While the `Recording` and `Tracking` tables both represent data recordings, you may have noticed that the actual data points are not in these tables. This is because these tables describe information **about the recording and tracking**; the actual data are loaded and handled by downstream tables, as we will see shortly.

## Clustering

While a stream of electrical activity can be interesting in itself (see `LFP` below), we are often interested in working with individual spikes, and therefore perform spike detection/clustering on the raw signal to extract them. The ephys pipeline is built to express very complex clustering configurations.

Namely, **arbitrary time segments from one or more recordings across one or more sessions** can be combined to be the target of clustering. Let's now take a look at how that is represented in the tables.

### Specification of clustering time windows

To allow for flexible combinations of one or more sessions, the `ClusterSessionGroup` table is used.

<img src="../images/ephys_pipeline/cluster_group.png" alt="drawing" width="700"/>

In [None]:
acquisition.ClusterSessionGroup.describe();

The `ClusterSessionGroup` table simply defines a group of one or more sessions, and the part table `ClusterSessionGroup.GroupMember` ties each member session to the group it belongs to.

Let's take a look at an example group:

In [None]:
keys = acquisition.ClusterSessionGroup().fetch('KEY')

In [None]:
acquisition.ClusterSessionGroup() & keys[0]

and the members of this session group:

In [None]:
acquisition.ClusterSessionGroup.GroupMember & keys[0]  # same group as above

### Specifying time windows

<img src="../images/ephys_pipeline/group_windows.png" alt="drawing" width="900"/>

Once a group of one or more sessions is defined, `ClusterTimeWindows` and `ClusterTimeWindows.TimeWindow` are used to subselect one or more time windows within the **concatenated session data**.

In [None]:
acquisition.ClusterTimeWindows.TimeWindow.describe();

In the most common scenario, the entire duration of the concatenated sessions is used, which is indicated by `window_start = 0` and `window_stop = -1`.
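A plain-Python sketch of how the `window_start = 0`, `window_stop = -1` convention could be resolved against concatenated sessions. Assumptions: times are in seconds relative to the start of the concatenated data, and the session durations and helper names are hypothetical.

```python
from itertools import accumulate

def resolve_window(window_start, window_stop, session_durations):
    """Return (start, stop) in seconds on the concatenated timeline.

    window_stop == -1 is read as "until the end of the concatenated sessions".
    """
    total = sum(session_durations)
    stop = total if window_stop == -1 else window_stop
    if not 0 <= window_start < stop <= total:
        raise ValueError("window must lie within the concatenated sessions")
    return window_start, stop

def session_offsets(session_durations):
    """Start time of each session on the concatenated timeline."""
    return [0.0] + list(accumulate(session_durations))[:-1]

durations = [1200.0, 900.0]                  # two hypothetical sessions of 20 and 15 minutes
print(session_offsets(durations))            # [0.0, 1200.0]
print(resolve_window(0, -1, durations))      # (0, 2100.0)
print(resolve_window(300, 1500, durations))  # (300, 1500)
```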

In [None]:
acquisition.ClusterTimeWindows.TimeWindow()

# Organization of Clustering - ClusteringSetup and Clustering tables

The `ClusteringSetup` table captures a valid combination of a particular electrode probe (`ProbeInsertion`), clustering time windows across one or more concatenated sessions (`ClusterTimeWindows`) and clustering-specific parameters (`ClusterParam`); these dependencies are captured in the diagram below.

<img src="../images/ephys_pipeline/clustering.png" alt="drawing" width="900"/>

While `ClusteringSetup` serves to specify the combination of clustering configurations, the `Clustering` table represents completed clustering.

Again, clustering can be performed on:
+ Multiple concatenated sessions (although in most cases, just one session)
+ Different time window(s) over the selected session(s)

Each ***clustering*** yields its own set of units.

`ClusteringSetup` is for users to specify:

+ the `ClusterSessionGroup` and `ClusterTimeWindows`
+ the probe this clustering is performed on - `ProbeInsertion`
+ the parameters to perform the clustering - `ClusterParam`
+ the directory to output this clustering results 
+ the electrode configuration information

Note: all of this is taken care of by the ingestion routine (more on that later).

`Clustering` is a processing step that either triggers the clustering or ingests the results of a clustering that has already been performed.

### Curated clustering

Oftentimes, an experimenter follows the automatic clustering step with **manual curation**. In the pipeline, this is represented by the `CuratedClustering` table.

<img src="../images/ephys_pipeline/curated_clustering.png" alt="drawing" width="350"/>

In [None]:
ephys.CuratedClustering.describe();

**CuratedClustering** captures the following information:
+ the curator
+ the time of curation
+ the directory of the new curation results

All downstream analyses are then performed on `CuratedClustering`, allowing different curated results to be compared side by side.

# Unit & Spike times

From the `CuratedClustering` results, the identified `Units`, their representative `Waveforms` and their spike times (`UnitSpikeTimes`) are extracted and stored in the corresponding tables. Notice that the `UnitSpikeTimes` are matched back to the particular recording file during which the identified unit's spikes occurred.
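The notebook does not show how this matching is done. As a plain-Python sketch, assuming that spike times are expressed in seconds on the concatenated timeline and that each recording file's start offset is known (both assumptions, with hypothetical values and file labels), the recording a spike belongs to can be found with a binary search over the offsets.

```python
from bisect import bisect_right

# Hypothetical start offsets (seconds) of three recording files on the concatenated timeline
recording_offsets = [0.0, 1200.0, 2100.0]
recording_names = ["rec_A", "rec_B", "rec_C"]  # hypothetical file labels

def assign_spikes(spike_times, offsets):
    """Map each spike time to (recording index, time relative to that recording's start)."""
    assigned = []
    for t in spike_times:
        idx = bisect_right(offsets, t) - 1  # last recording starting at or before t
        if idx < 0:
            raise ValueError(f"spike at {t} s precedes the first recording")
        assigned.append((idx, t - offsets[idx]))
    return assigned

spikes = [12.5, 1199.9, 1200.0, 2500.25]
for idx, local_t in assign_spikes(spikes, recording_offsets):
    print(recording_names[idx], local_t)
```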

<img src="../images/ephys_pipeline/unit_spikes.png" alt="drawing" width="500"/>

In [None]:
ephys.UnitSpikeTimes()

#### Question 7: Find all animals with more than 100,000 spikes identified in a recording

In [None]:
animal.Animal & (ephys.UnitSpikeTimes & 'spike_counts > 100000')

## Summary diagram of the clustering structure
![](../images/clustering_detail_erd.png)

# Tracking

<img src="../images/ephys_pipeline/process_track.png" alt="drawing" width="900"/>

# Toward the results - Spikes and Tracking

<img src="../images/ephys_pipeline/task_spikes.png" alt="drawing" width="600"/>

***UnitSpikeTimes*** and ***Tracking*** data can be matched together, resulting in ***SpikesTracking***:
> The spike times and tracking data (e.g. position, speed, head angle, etc.) associated with each spike, per unit

These ***SpikesTracking*** entries can then be further narrowed down by task in ***TaskSpikesTracking***.
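The notebook does not say how each spike is associated with tracking samples. One common approach, assumed here and not necessarily the pipeline's method, is to linearly interpolate each tracked variable at the spike times. A plain-Python sketch with hypothetical tracking samples and a hypothetical `interpolate_at` helper:

```python
from bisect import bisect_left

def interpolate_at(spike_times, track_times, track_values):
    """Linearly interpolate a tracked variable (e.g. x position) at each spike time.

    Spikes outside the tracked interval get None.
    """
    out = []
    for t in spike_times:
        if t < track_times[0] or t > track_times[-1]:
            out.append(None)
            continue
        i = bisect_left(track_times, t)
        if track_times[i] == t:  # spike falls exactly on a tracking sample
            out.append(track_values[i])
            continue
        t0, t1 = track_times[i - 1], track_times[i]
        v0, v1 = track_values[i - 1], track_values[i]
        out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out

# Hypothetical 50 Hz tracking of x position (cm) and a few spike times (s)
track_t = [0.00, 0.02, 0.04, 0.06]
track_x = [10.0, 11.0, 13.0, 12.0]
print(interpolate_at([0.01, 0.04, 0.05, 0.10], track_t, track_x))
```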

In [None]:
analysis.TaskSpikesTracking()

# Electrode Configuration

#### Electrode configuration represents the electrode grouping used in a particular recording from a probe:

<img src="../images/ephys_pipeline/econfig.png" alt="drawing" width="600"/>

For a ***Neuropixels*** probe:
+ 1 group, specifying which 384 channels are used

For a ***tetrode array*** probe:
+ each tetrode is a group
+ the configuration records which channel is assigned to which tetrode
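A plain-Python sketch of this grouping logic, assuming consecutive channels are wired to the same tetrode. This mapping and the `tetrode_groups` helper are hypothetical; the actual wiring is recorded in `ElectrodeConfig.Electrode`.

```python
def tetrode_groups(n_channels, channels_per_group=4):
    """Assign channels 0..n_channels-1 to consecutive groups of `channels_per_group`."""
    if n_channels % channels_per_group:
        raise ValueError("channel count must be a multiple of the group size")
    return {
        group: list(range(group * channels_per_group, (group + 1) * channels_per_group))
        for group in range(n_channels // channels_per_group)
    }

groups = tetrode_groups(16)
print(groups[0])    # [0, 1, 2, 3]
print(len(groups))  # 4

# A Neuropixels configuration, by contrast, is a single group of 384 channels.
neuropixels = tetrode_groups(384, channels_per_group=384)
print(len(neuropixels), len(neuropixels[0]))  # 1 384
```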

In [None]:
reference.ElectrodeConfig()

In [None]:
reference.ElectrodeConfig.ElectrodeGroup * reference.ElectrodeConfig & 'electrode_config_name = "g_5678-4_chn"'

In [None]:
reference.ElectrodeConfig.Electrode * reference.ElectrodeConfig & 'electrode_config_name = "g_5678-4_chn"'

In [None]:
dj.ERD(reference.ElectrodeConfig) + ephys.ClusteringSetup + acquisition.Recording

# Probe Insertion Location 

In [None]:
dj.ERD(acquisition.ProbeInsertion.InsertionLocation) - 1

In [None]:
acquisition.ProbeInsertion.InsertionLocation.describe();

# Probe Adjustment

In [None]:
dj.ERD(acquisition.ProbeAdjustment) - 1

In [None]:
acquisition.ProbeAdjustment.describe();