This notebook walks you through the important tables in the ephys and histology schemas and introduces some useful queries.

In [None]:
# import datajoint and modules from ibl_pipeline
import datajoint as dj
from ibl_pipeline import reference, subject, acquisition, behavior
from ibl_pipeline.analyses import behavior as behavior_analyses
import numpy as np

Ephys and histology tables are still under active development, so we recommend accessing them with `dj.create_virtual_module()`.

In [None]:
ephys = dj.create_virtual_module('ephys', 'ibl_ephys')
histology = dj.create_virtual_module('histology', 'ibl_histology')

# Ephys tables

In [None]:
dj.Diagram(ephys)

Tables in the diagram that are not represented as classes are leftovers from development. We clean them up periodically; you can ignore them for now.

Here is a list of important tables:
    
>* ProbeModel: model of a probe, ingested from the alyx table experiments.probemodel  
>* ProbeInsertion: ingested from the alyx table experiments.probeinsertion  
>* ChannelGroup: raw index and local coordinates of each channel group
>* DefaultCluster: Cluster properties obtained from the default clustering method.
>* DefaultCluster.Metrics: metrics exported from the spike sorting software.
>* DefaultCluster.Metric: same contents as Metrics, each metric is a separate entry (metric_name, metric_value), to support filtering on each of the metrics.  
>* DefaultCluster.Ks2Label: label given by kilosort2, ‘good’ or ‘mua’
>* GoodClusterCriterion: Criterion to identify whether a cluster is good.
>* GoodCluster: whether a cluster is good based on the criterion defined in GoodClusterCriterion
>* Event: Different behavioral events, including 'go cue', 'stim on', 'response', 'feedback', and 'movement'
>* AlignedTrialSpikes: spike times of each trial aligned to different events


Detailed table definitions can easily be checked with the method `describe()`, for example:

In [None]:
ephys.DefaultCluster.describe();

`blob@ephys` denotes a blob field kept in external storage; as a user, you can work with it just like an internal blob field.

Preview the contents of the table:

In [None]:
ephys.DefaultCluster()

# Histology tables

In [None]:
dj.Diagram(histology) + ephys.DefaultCluster + ephys.ProbeInsertion + acquisition.Session

Here is a list of important histology tables:

>* Provenance: method to estimate the probe trajectory, including Ephys aligned histology track, Histology track, Micro-manipulator, and Planned  
>* ProbeTrajectoryTemp: probe trajectory estimated with each method, ingested from Alyx table experiments.probetrajectory  
>* ChannelBrainLocationTemp: brain coordinates and region assignment of each channel, ingested from Alyx table experiments.channel  
>* ClusterBrainRegionTemp: Brain region assignment to each cluster  
>* ProbeBrainRegionTemp: Brain region assignment to each probe, including the regions of finest granularity and their upper-level areas.  
>* DepthBrainRegionTemp: For each ProbeTrajectoryTemp, assign depth boundaries relative to the probe tip to each brain region covered by the trajectory

Tables that are not in active use and will be deleted or redefined:
>* InputDataSource: will be deleted
>* ProbeTrajectory: will be redefined to reflect the final probe trajectory with multiple users' approval, ingested from FlatIron data.
>* ChannelBrainLocation: will be redefined to reflect the final brain location assignment of each channel, ingested from FlatIron data.
>* ClusterBrainRegion: will be redefined to reflect the final brain region assignment of each cluster, based on ChannelBrainLocation data on FlatIron.
>* ProbeBrainRegion: will be redefined to reflect the final brain region assignment to each probe, including the regions of finest granularity and their upper-level areas.
>* DepthBrainRegion: will be redefined to reflect the final depth boundaries relative to the probe tip to each brain region covered by the trajectory

# Useful queries

## Select clusters from a particular session

In [None]:
# which sessions have cluster data?
acquisition.Session & ephys.DefaultCluster

In [None]:
# fetch the key of one of them
key = (acquisition.Session & ephys.DefaultCluster).fetch('KEY', limit=1)

In [None]:
# get all clusters of one session
ephys.DefaultCluster & key
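Restricting by `key` works because DataJoint treats a dict as an AND of equality conditions on its fields, and a list of dicts (which is what `fetch('KEY', limit=1)` returns) as an OR over its elements; restricting by an empty list, which you would get if no session had cluster data, yields an empty result. The plain-Python sketch below mimics those semantics on a few invented, simplified rows so they can be checked without a database connection. The `matches` and `restrict` helpers are hypothetical and not part of DataJoint.

```python
# Invented, simplified rows standing in for ephys.DefaultCluster
example_rows = [
    {"subject_uuid": "s1", "session_start_time": "2020-01-05", "cluster_id": 0},
    {"subject_uuid": "s1", "session_start_time": "2020-01-05", "cluster_id": 1},
    {"subject_uuid": "s2", "session_start_time": "2020-02-10", "cluster_id": 0},
]


def matches(row, key):
    """True if the row agrees with every field of the key dict (AND)."""
    return all(row.get(field) == value for field, value in key.items())


def restrict(rows, condition):
    """Mimic `table & condition` for a dict (AND) or a list of dicts (OR)."""
    if isinstance(condition, dict):
        return [r for r in rows if matches(r, condition)]
    # A list of keys: keep rows that match at least one key; an empty list keeps nothing
    return [r for r in rows if any(matches(r, k) for k in condition)]


# Shaped like the output of fetch('KEY', limit=1): a list holding one session key
example_key = [{"subject_uuid": "s1", "session_start_time": "2020-01-05"}]
example_session_clusters = restrict(example_rows, example_key)
print([r["cluster_id"] for r in example_session_clusters])
print(restrict(example_rows, []))
```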

In [None]:
# fetch spike times, takes a little while
clusters_spikes_times = (ephys.DefaultCluster & key).fetch('cluster_spikes_times')

## Filter clusters with particular metrics

`ephys.DefaultCluster` has three part tables that store cluster quality information: `Metrics`, `Metric`, and `Ks2Label`.

In [None]:
ephys.DefaultCluster.Metrics.describe();

Because `metrics` is a longblob, we cannot query on its contents.
We therefore created another part table, `ephys.DefaultCluster.Metric`, that stores each field of the metrics as a separate entry.

In [None]:
ephys.DefaultCluster.Metric()
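Conceptually, `Metric` is a "long" reshaping of the `metrics` blob: one row per (cluster, metric_name) pair. Below is a minimal sketch of that reshaping in plain Python, assuming the blob decodes to a dict mapping metric names to numbers; the shortened cluster key and the metric values are invented, and this is not the pipeline's ingestion code.

```python
# Invented content of one `metrics` blob, assumed to decode to a dict
example_cluster_key = {"cluster_id": 3}  # the real primary key has more fields
example_metrics_blob = {"firing_rate": 4.2, "presence_ratio": 0.9, "missed_spikes_est": 0.05}

# One row per metric, carrying the cluster's key along so rows can be joined back
example_metric_rows = [
    {**example_cluster_key, "metric_name": name, "metric_value": value}
    for name, value in example_metrics_blob.items()
]

# Each metric can now be filtered on individually, e.g. firing_rate > 1
example_selected = [
    r for r in example_metric_rows
    if r["metric_name"] == "firing_rate" and r["metric_value"] > 1
]
print(example_selected)
```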

To check which metric names are available, we can use `dj.U()`.

`dj.U('field_name')` is the universal set of all possible values of `field_name`; restricting it by a table (`dj.U('field_name') & table`) returns the unique values of that field present in the table.

In [None]:
dj.U('metric_name') & ephys.DefaultCluster.Metric()
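`dj.U('metric_name') & table` behaves much like SQL `SELECT DISTINCT metric_name`, and `dj.U('metric_name').aggr(ephys.DefaultCluster.Metric, n='count(*)')` additionally counts the entries per name on the server. The sketch below shows the plain-Python equivalents on a few invented fetched rows, only to make the semantics concrete; on the real table, `dj.U` avoids transferring every row first.

```python
from collections import Counter

# Invented rows shaped like ephys.DefaultCluster.Metric entries
example_metric_table = [
    {"cluster_id": 0, "metric_name": "firing_rate", "metric_value": 4.2},
    {"cluster_id": 0, "metric_name": "presence_ratio", "metric_value": 0.9},
    {"cluster_id": 1, "metric_name": "firing_rate", "metric_value": 0.4},
    {"cluster_id": 1, "metric_name": "presence_ratio", "metric_value": 0.3},
]

# Distinct values of one field, like dj.U('metric_name') & table
example_unique_names = sorted({r["metric_name"] for r in example_metric_table})

# Entries per value, like dj.U('metric_name').aggr(table, n='count(*)')
example_name_counts = Counter(r["metric_name"] for r in example_metric_table)
print(example_unique_names, dict(example_name_counts))
```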

Check [this document](https://docs.google.com/document/d/1ba_krsfm4epiAd0zbQ8hdvDN908P9VZOpTxkkH3P_ZY/edit) for the meaning of each metric.

Now let's filter on some metrics. For example, keep only clusters with firing_rate > 1 spikes/s:

In [None]:
ephys.DefaultCluster & (ephys.DefaultCluster.Metric & 'metric_name="firing_rate"' & 'metric_value>1')

As another example, keep only clusters with presence_ratio > 0.5:

In [None]:
ephys.DefaultCluster & (ephys.DefaultCluster.Metric & 'metric_name="presence_ratio"' & 'metric_value > 0.5')
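Because each `Metric` entry holds a single `metric_name`, two metric conditions cannot go into one restriction of `Metric`: no entry has both names at once, so the result is empty. Instead, restrict `DefaultCluster` once per metric; chained restrictions combine with AND, e.g. `ephys.DefaultCluster & (ephys.DefaultCluster.Metric & 'metric_name="firing_rate"' & 'metric_value > 1') & (ephys.DefaultCluster.Metric & 'metric_name="presence_ratio"' & 'metric_value > 0.5')`. The plain-Python sketch below reproduces both the pitfall and the fix on invented rows; `clusters_where` is a hypothetical helper.

```python
# Invented Metric rows: one metric per row
example_metric_rows = [
    {"cluster_id": 0, "metric_name": "firing_rate", "metric_value": 4.2},
    {"cluster_id": 0, "metric_name": "presence_ratio", "metric_value": 0.9},
    {"cluster_id": 1, "metric_name": "firing_rate", "metric_value": 2.5},
    {"cluster_id": 1, "metric_name": "presence_ratio", "metric_value": 0.3},
    {"cluster_id": 2, "metric_name": "firing_rate", "metric_value": 0.4},
    {"cluster_id": 2, "metric_name": "presence_ratio", "metric_value": 0.8},
]


def clusters_where(rows, name, predicate):
    """Ids of clusters having a row with this metric_name whose value passes predicate."""
    return {r["cluster_id"] for r in rows
            if r["metric_name"] == name and predicate(r["metric_value"])}


# Pitfall: requiring both names on the same row matches nothing
example_same_row = [
    r for r in example_metric_rows
    if r["metric_name"] == "firing_rate" and r["metric_name"] == "presence_ratio"
]

# Fix: one restriction per metric, intersected (AND)
example_good = (clusters_where(example_metric_rows, "firing_rate", lambda v: v > 1)
                & clusters_where(example_metric_rows, "presence_ratio", lambda v: v > 0.5))
print(example_same_row, example_good)
```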

## Visualize the distribution of a metric

Now let's visualize a metric for a subset of clusters, for example, all clusters from cortexlab:

In [None]:
clusters_cortexlab = ephys.DefaultCluster & (acquisition.Session & 'session_lab="cortexlab"')
clusters_cortexlab

In [None]:
# fetch the values of a metric for these clusters only, for example, "missed_spikes_est"
values = (ephys.DefaultCluster.Metric & clusters_cortexlab & 'metric_name="missed_spikes_est"').fetch('metric_value')

In [None]:
# plot histogram
import matplotlib.pyplot as plt
plt.hist(values, bins=100);
plt.xlabel('Missed spikes estimate');
plt.ylabel('Cluster counts');
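Some metrics may be undefined for some clusters and come back as NaN; that is an assumption worth checking on your own fetch. The defensive sketch below uses NumPy to drop non-finite values, report how many were dropped, and compute the histogram counts. The `example_values` array is invented; in the notebook you would apply the same steps to the fetched `values`.

```python
import numpy as np

# Invented metric values standing in for the fetched `values` array
example_values = np.array([0.01, 0.03, np.nan, 0.2, 0.05, np.nan, 0.12])

# Keep only finite values and remember how many were dropped
finite_mask = np.isfinite(example_values)
example_clean_values = example_values[finite_mask]
example_n_dropped = int((~finite_mask).sum())

# Counts per bin, the same numbers plt.hist would draw
example_counts, example_edges = np.histogram(example_clean_values, bins=5)
print(f"dropped {example_n_dropped} non-finite values; "
      f"{example_counts.sum()} clusters histogrammed")
```

Passing `example_clean_values` (or the cleaned fetch) to `plt.hist` then plots only the defined values.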

## Filter ephys data based on behavioral performance

Now let's address a common question: how to fetch ephys data from sessions with good behavioral performance that were recorded from a particular brain region.

In [None]:
# get clusters from sessions with performance > 80% on easy trials
good_performance_clusters = ephys.DefaultCluster & (behavior_analyses.PsychResults & 'performance_easy > 0.8')
good_performance_clusters

To further filter by brain region, we need to bring in several histology-related tables.
The brain region assigned to each cluster is stored in the table `histology.ClusterBrainRegionTemp`.

In [None]:
histology.ClusterBrainRegionTemp.describe();

The brain region assignment depends on the Provenance:

In [None]:
histology.Provenance()

The brain region assignment is usually only meaningful for the "Ephys aligned histology track" provenance, which has the highest provenance value, 70.

In [None]:
histology.ClusterBrainRegionTemp & 'provenance=70'

Next, find the entries for your region of interest. For example, to fetch clusters from the "frontopolar cortex", we can do a fuzzy search in the table `reference.BrainRegion` with the `like` keyword:

In [None]:
fronto_pole = reference.BrainRegion() & 'brain_region_name like "%front%"'
fronto_pole
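In SQL's `LIKE`, `%` matches any run of characters (including none) and `_` matches exactly one character; under MySQL's commonly used case-insensitive collations the match also ignores case. The sketch below translates a `LIKE` pattern into a Python regular expression so you can preview what a pattern would catch. It ignores `LIKE` escape characters, assumes a case-insensitive collation, and uses a small illustrative list of region names rather than anything fetched from `reference.BrainRegion`; `like_to_regex` is a hypothetical helper.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (% and _ wildcards, no escapes) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    # IGNORECASE assumes a case-insensitive collation on the database side
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


example_region_names = [
    "Frontal pole, cerebral cortex",
    "Frontal pole, layer 1",
    "Primary visual area",
    "Anterior cingulate area",
]

example_pattern = like_to_regex("%front%")
example_hits = [n for n in example_region_names if example_pattern.fullmatch(n)]
print(example_hits)
```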

Restrict `histology.ClusterBrainRegionTemp` with these region entries and the good-performance clusters:

In [None]:
histology.ClusterBrainRegionTemp & 'provenance = 70' & fronto_pole & good_performance_clusters

## Compute the firing rate in a time window aligned to a behavioral event

Now let's introduce another table `ephys.AlignedTrialSpikes` that might be helpful for your exploration.

In [None]:
ephys.AlignedTrialSpikes.describe();

The spike times of each cluster are split into trials and aligned to one of the following events:

In [None]:
dj.U('event') & ephys.AlignedTrialSpikes()
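To make "aligned" concrete: for each trial, the spikes near the event are selected and the event time is subtracted, so time 0 is the event. Below is a minimal NumPy sketch of that idea. It is not the pipeline's code: it assumes a fixed window around each event, and the `window` argument, the `align_spikes` helper, and the spike and event times are all hypothetical.

```python
import numpy as np


def align_spikes(spike_times, event_times, window=(-0.5, 1.0)):
    """Return one array per trial with spike times relative to that trial's event.

    spike_times: sorted 1-D array in seconds; window: hypothetical (start, end) in seconds.
    """
    aligned = []
    for t_event in event_times:
        # Index range of spikes falling inside [t_event + start, t_event + end]
        lo = np.searchsorted(spike_times, t_event + window[0], side="left")
        hi = np.searchsorted(spike_times, t_event + window[1], side="right")
        aligned.append(spike_times[lo:hi] - t_event)
    return aligned


example_spikes = np.array([0.95, 1.02, 1.05, 1.30, 2.85, 3.04, 3.08])
example_stim_on = np.array([1.0, 3.0])
example_aligned = align_spikes(example_spikes, example_stim_on, window=(-0.1, 0.2))
print(example_aligned)
```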

The table has a very large number of entries, so we **do not** recommend loading the whole table; always restrict it first.

Now let's compute the firing rate of one cluster in a 0-100 ms window after stim on, averaged across trials:

In [None]:
# First, pick clusters from recent ephys sessions, for example, cortexlab sessions recorded after 2020-01-01
clusters_recent_sessions = ephys.DefaultCluster & (acquisition.Session & 'session_lab="cortexlab"' & 'session_start_time > "2020-01-01"')
clusters_recent_sessions

In [None]:
# Then get the key of the first cluster listed above
cluster = clusters_recent_sessions.fetch('KEY', limit=1)
cluster

In [None]:
# fetch `trial_spike_times` of all trials aligned to the stim on event, and count the trials
trials_spike_times = (ephys.AlignedTrialSpikes & cluster & 'event="stim on"').fetch('trial_spike_times')
n_trials = len(trials_spike_times)
n_trials

In [None]:
# stack the spikes of all trials and keep only those between 0 and 0.1 s; these spike times are already aligned to stim on
trials_spike_times_all = np.hstack(trials_spike_times)
spike_times_window = trials_spike_times_all[np.logical_and(trials_spike_times_all>0, trials_spike_times_all<0.1)]
spike_times_window

In [None]:
# firing rate (spikes/s) = total spike count / window length (0.1 s) / number of trials
firing_rate = len(spike_times_window)/0.1/n_trials
firing_rate
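The same arithmetic extends to a peri-stimulus time histogram (PSTH): count the pooled aligned spikes in consecutive bins and divide each count by the number of trials times the bin width, giving a firing rate in spikes/s per bin. Below is a minimal NumPy sketch on invented trials; `psth` is a hypothetical helper, and its default window and bin width are illustrative choices. Averaged over the bins from 0 to 0.1 s, it matches the `firing_rate` computed above, except that `np.histogram` counts spikes lying exactly on a bin edge while the strict inequalities above do not.

```python
import numpy as np


def psth(trials_spike_times, t_start=0.0, t_stop=0.1, bin_width=0.02):
    """Trial-averaged firing rate (spikes/s) per bin between t_start and t_stop (seconds).

    trials_spike_times: sequence of 1-D arrays, one per trial, already aligned to the event.
    """
    n_trials = len(trials_spike_times)
    # Half a bin of slack so t_stop is included despite floating-point rounding
    edges = np.arange(t_start, t_stop + bin_width / 2, bin_width)
    pooled = np.concatenate([np.asarray(t, dtype=float) for t in trials_spike_times])
    counts, edges = np.histogram(pooled, bins=edges)  # spikes outside the edges are ignored
    rates = counts / (n_trials * bin_width)
    return rates, edges


# Invented aligned spike times for three trials (the last trial has no spikes)
example_trials = [np.array([0.01, 0.05, 0.15]), np.array([0.03]), np.array([])]
example_rates, example_edges = psth(example_trials)
print(example_rates, example_rates.mean())
```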

# Summary

In this notebook, we listed a few query examples related to the ephys and histology schemas that might be helpful for your research. For a full-fledged introduction to the major types of queries and fetches, please refer to [this notebook](../201909_code_camp/1-Explore%20IBL%20data%20pipeline%20with%20DataJoint.ipynb) from the 2019 IBL Code Camp.

Note that the ephys and histology schemas are still under active development, so the data and pipeline very likely contain bugs. We hope this tutorial helps users access the current datasets more easily and contributes to detecting and fixing bugs.