# Introduction to the workflow structure

This notebook gives a brief overview of the workflow structure and introduces some useful DataJoint tools to facilitate exploration.
+ DataJoint needs to be configured before running this notebook. If you haven't set up the configuration yet, refer to notebook [00-Set-up-configuration](00-Set-up-configuration.ipynb).
+ If you are familiar with DataJoint and the workflow structure, proceed directly to the next notebook, [02-process-ephys-workflow](02-process-ephys-workflow.ipynb), to run the workflow.
+ For a more thorough introduction to DataJoint, please visit the [DataJoint tutorial site](https://playground.datajoint.io) or the [tutorials for the BrainCogs U19 team](../tutorials/202103).
+ The current workflow is composed of multiple database schemas, each of which corresponds to a Python module.
+ Some of the schemas and tables come from `u19_pipeline`, while the ephys-related schemas come from the [DataJoint Array Ephys Element](https://github.com/datajoint/element-array-ephys) installed in `u19_pipeline`.

## Connect to database

In [None]:
import datajoint as dj
import numpy as np
from matplotlib import pyplot

### User setup

In [None]:
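import getpass
# database host and account used for this workflow
dj.config['database.host'] = 'datajoint00.pni.princeton.edu'
dj.config['database.user'] = '<username>'
# prompt for the password so it is not stored in the notebook
dj.config['database.password'] = getpass.getpass()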

#### Import the schemas and tables in this workflow

In [None]:
# create Python modules whose classes mirror the tables of existing schemas on the server
subject = dj.create_virtual_module('subject', 'u19_subject')
acquisition = dj.create_virtual_module('acquisition', 'u19_acquisition')
ephys = dj.create_virtual_module('ephys', 'u19_ephys')
probe_element = dj.create_virtual_module('probe_element', 'u19_probe_element')
ephys_element = dj.create_virtual_module('ephys_element', 'u19_ephys_element')

### Developer setup
+ Connect to the database using a `dj_local_conf.json` file, or set up the connection with the `U19-pipeline_python/notebooks/00-Datajoint-configuration.ipynb` notebook.
+ Once `dj_local_conf.json` has been created, change the working directory to the package root so that this local configuration can be loaded.
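`dj_local_conf.json` is a plain JSON file that DataJoint looks for in the current working directory. A minimal sketch of writing one with the standard library, assuming only the three `database.*` keys used in the user setup are needed; `write_local_config` is a hypothetical helper and the values are placeholders. DataJoint can also write this file itself with `dj.config.save_local()`.

```python
import json
import tempfile
from pathlib import Path


def write_local_config(directory, host, user, password):
    """Hypothetical helper: write a minimal dj_local_conf.json and return its path."""
    config = {
        "database.host": host,
        "database.user": user,
        "database.password": password,  # stored in plain text: keep this file out of version control
    }
    path = Path(directory) / "dj_local_conf.json"
    path.write_text(json.dumps(config, indent=4))
    return path


# Write into a throwaway directory here; in practice the file sits in the package root.
demo_dir = tempfile.mkdtemp()
conf_path = write_local_config(demo_dir, "datajoint00.pni.princeton.edu", "<username>", "<password>")
print(conf_path.read_text())
```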

In [None]:
import os
os.chdir('../..')

#### Import the schemas and tables in this workflow

In [None]:
from u19_pipeline import subject, acquisition, ephys
from u19_pipeline.ephys import probe_element, ephys_element

## Schemas and tables

+ A `schema` is a container/folder that houses `tables`

+ DataJoint's schema/table convention

     | Python      | Database          | example
     | ----------- | ------------------|------------
     | Module      | Schema            | `ephys`
     | Class       | Table             | `EphysRecording`


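+ Each module contains a `schema` object that enables interaction with the corresponding schema in the database.
+ The module `ephys` is native to `u19_pipeline`, whereas `probe_element` and `ephys_element` are schemas from the DataJoint Array Ephys Element.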


In [None]:
probe_element.schema

In [None]:
ephys_element.schema

+ Each table class in a module corresponds to a table in the matching schema in the database.
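DataJoint derives each table's database name from its class name: CamelCase becomes snake_case, and the table tier adds a prefix (none for Manual, `#` for Lookup, `_` for Imported, `__` for Computed). A part table is named after its master, followed by `__` and the part's own name. The sketch below mimics that convention in plain Python; `table_name` and `camel_to_snake` are hypothetical helpers, not DataJoint functions, and the tiers passed in the example are assumptions about the Element's tables. The real name can be checked with `ephys_element.EphysRecording.table_name`.

```python
import re

# Prefix DataJoint adds to the table name for each tier.
TIER_PREFIX = {"manual": "", "lookup": "#", "imported": "_", "computed": "__"}


def camel_to_snake(name):
    """Insert an underscore before every capital letter except the first, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def table_name(class_name, tier, master=None):
    """Hypothetical helper: approximate the table name DataJoint assigns to a class.

    For a part table, pass the master's class name in `master` and the master's tier.
    """
    base = TIER_PREFIX[tier] + camel_to_snake(master or class_name)
    return f"{base}__{camel_to_snake(class_name)}" if master else base


print(table_name("EphysRecording", "imported"))
print(table_name("Unit", "imported", master="CuratedClustering"))
```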

In [None]:
# preview table columns and contents in a table
ephys_element.EphysRecording()

+ When the pipeline modules are imported for the first time, their schemas and tables are created in the database.
+ Once created, importing the modules again does not re-create the schemas and tables; the modules simply provide access to the existing ones for querying and manipulation.

## Data Exploration of Ephys Workflow
+ For the purpose of this notebook, we have already run the workflow for a few data sets. As a result, the processed data are already in the database and ready to be explored. 

### Generate a raster plot

In [None]:
# spike times of all "good" units from one probe insertion and curation of a single session
units, unit_spiketimes = (ephys_element.CuratedClustering.Unit
                          & 'subject_fullname="ms81_M005"' & 'session_date="2021-05-05"'
                          & 'insertion_number = 0' & 'curation_id=1'
                          & 'cluster_quality_label = "good"').fetch('unit', 'spike_times')

# x: every spike time; y: the unit each spike belongs to
x = np.hstack(unit_spiketimes)
y = np.hstack([np.full_like(s, u) for u, s in zip(units, unit_spiketimes)])

fig, ax = pyplot.subplots(1, 1, figsize=(32, 16))
ax.plot(x, y, '|')
ax.set_xlabel('Time (s)');
ax.set_ylabel('Unit');
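The raster coordinates come from flattening the per-unit spike-time arrays into one x vector and repeating each unit id once per spike for the y vector. A database-free sketch of the same construction; the unit ids and spike times below are made up for illustration, and `raster_coordinates` is a hypothetical helper.

```python
import numpy as np


def raster_coordinates(units, unit_spiketimes):
    """Return (x, y) for a raster plot: x = every spike time, y = the unit it came from."""
    x = np.hstack(unit_spiketimes)
    # full_like copies each spike array's shape and dtype, filled with the unit id
    y = np.hstack([np.full_like(s, u) for u, s in zip(units, unit_spiketimes)])
    return x, y


# Made-up example: three units with different spike counts (times in seconds).
units = np.array([3, 7, 12])
unit_spiketimes = [np.array([0.10, 0.50]), np.array([0.20]), np.array([0.05, 0.30, 0.90])]

x, y = raster_coordinates(units, unit_spiketimes)
print(x)
print(y)
```

Matplotlib's `ax.eventplot(unit_spiketimes, lineoffsets=units)` draws a similar raster directly from the list of arrays, without the flattening step.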

### Plot waveform of a unit

In [None]:
unit_key = (ephys_element.CuratedClustering.Unit & 'subject_fullname="ms81_M005"' & 'session_date="2021-05-05"'
            & 'insertion_number = 0' & 'curation_id=1' & 'unit = 26').fetch1('KEY')

# fetch the waveform recorded on this unit's peak electrode
unit_data = (ephys_element.CuratedClustering.Unit * ephys_element.WaveformSet.PeakWaveform & unit_key).fetch1()

# preview the per-electrode waveforms of this unit (shown because it is the last expression in the cell)
ephys_element.CuratedClustering.Unit * ephys_element.WaveformSet.Waveform & unit_key

In [None]:
sampling_rate = (ephys_element.EphysRecording & 'subject_fullname="ms81_M005"' & 'session_date="2021-05-05"' & 'insertion_number = 0').fetch1('sampling_rate')/1000 # in kHz

pyplot.plot(np.r_[:unit_data['peak_electrode_waveform'].size] * 1/sampling_rate, unit_data['peak_electrode_waveform'])
pyplot.xlabel('Time (ms)')
pyplot.ylabel(r'Voltage ($\mu$V)')
pyplot.title("Waveform of a Unit")
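The x axis of the waveform plot converts sample indices to milliseconds by dividing by the sampling rate in kHz, i.e. samples per millisecond. A small sketch of that conversion, assuming a 30 kHz recording and a 60-sample waveform; both numbers are placeholders rather than values fetched from the database, and `waveform_time_axis_ms` is a hypothetical helper.

```python
import numpy as np


def waveform_time_axis_ms(n_samples, sampling_rate_hz):
    """Time in ms of each waveform sample, with the first sample at t = 0."""
    sampling_rate_khz = sampling_rate_hz / 1000  # samples per millisecond
    return np.arange(n_samples) / sampling_rate_khz


t_ms = waveform_time_axis_ms(n_samples=60, sampling_rate_hz=30000.0)  # placeholder values
print(t_ms[:3], t_ms[-1])
```

This is equivalent to `np.r_[:n] * 1/sampling_rate` in the cell above, where `sampling_rate` is already in kHz.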

## DataJoint tools to explore schemas and tables

+ `dj.list_schemas()`: list all schemas the current user has access to on the database server

In [None]:
dj.list_schemas()

+ `dj.Diagram()`: plot tables and dependencies. 

In [None]:
# plot diagram for all tables in a schema
dj.Diagram(ephys_element)

**Table tiers**

+ Green box - Manual table
     + Entries are inserted manually, often on a daily basis, e.g. Subject, ProbeInsertion.

+ Gray box - Lookup table
     + Pre-inserted table, commonly used for general facts or parameters, e.g. Strain, ClusteringMethod, ClusteringParamSet.

+ Blue oval - Imported table
     + Auto-populated table whose processing depends on importing external files, e.g. populating Clustering requires the output files from Kilosort2.

+ Red circle - Computed table
     + Auto-populated table whose processing does not depend on files external to the database; commonly used for data analyses.

+ Plain text - Part table
     + An appendix to a master table: all the part entries of a given master entry together form a complete set for that entry, e.g. Unit of CuratedClustering.

**Dependencies**

+ Thick solid line - One-to-one primary
     + Parent and child share exactly the same primary key: the child inherits all of the parent's primary key fields as its own primary key.

+ Thin solid line - One-to-many primary
     + The child inherits the parent's primary key and adds further field(s) to its own primary key.

+ Dashed line - Secondary dependency
     + The child inherits the parent's primary key fields as its own secondary attributes.
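In a DataJoint table definition, these line styles follow from where a `-> Parent` reference sits relative to the `---` separator: above it, the parent's key becomes part of the child's primary key; below it, the parent's key becomes secondary attributes. The sketch below applies that rule to definition strings. `classify_references` is a hypothetical helper, not DataJoint's own parser, the two definitions are invented for illustration, and it simplifies by treating a reference as one-to-one only when it is the sole entry above `---`.

```python
def classify_references(definition):
    """Hypothetical helper: map each referenced parent to the diagram line it implies."""
    primary_refs, primary_attrs, secondary_refs = [], [], []
    in_primary = True
    for raw in definition.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if line.startswith("---"):  # separator between primary and secondary attributes
            in_primary = False
        elif line.startswith("->"):
            parent = line[2:].strip()
            (primary_refs if in_primary else secondary_refs).append(parent)
        elif in_primary:
            primary_attrs.append(line.split(":", 1)[0].strip())

    # Simplification: one-to-one only if the single reference is the entire primary key.
    one_to_one = len(primary_refs) == 1 and not primary_attrs
    styles = {}
    for parent in primary_refs:
        styles[parent] = "thick solid (one-to-one)" if one_to_one else "thin solid (one-to-many)"
    for parent in secondary_refs:
        styles[parent] = "dashed (secondary)"
    return styles


session_detail = """
# hypothetical one-to-one child of Session
-> Session
---
data_dir : varchar(255)
"""

insertion = """
# hypothetical one-to-many child of Session
-> Session
insertion_number : tinyint unsigned
---
-> Probe
"""

print(classify_references(session_detail))
print(classify_references(insertion))
```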

In [None]:
# plot diagram of tables in multiple schemas
dj.Diagram(subject) + dj.Diagram(acquisition) + dj.Diagram(ephys) + dj.Diagram(ephys_element)

In [None]:
# plot diagram of selected tables and schemas
dj.Diagram(subject.Subject) + dj.Diagram(acquisition.Session) + dj.Diagram(ephys) + dj.Diagram(ephys_element)

In [None]:
# plot diagram with 1 additional level of dependency downstream
dj.Diagram(subject.Subject) + 1

In [None]:
# plot diagram with 2 additional levels of dependency upstream
dj.Diagram(ephys_element.EphysRecording) - 2
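`dj.Diagram(table) + n` grows the diagram by n levels toward descendants, and `- n` grows it by n levels toward ancestors. The same level-by-level expansion can be sketched on a plain dictionary. The toy dependency graph below is an assumption for illustration and is not read from the actual schemas, and `expand` is a hypothetical helper rather than part of DataJoint.

```python
# Toy graph: each table maps to the tables it depends on (its parents).
# These edges are assumed for illustration only.
PARENTS = {
    "Session": ["Subject"],
    "ProbeInsertion": ["Session"],
    "EphysRecording": ["ProbeInsertion"],
    "CuratedClustering": ["EphysRecording"],
}

# Invert the graph so it can also be walked downstream.
CHILDREN = {}
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN.setdefault(parent, []).append(child)


def expand(start, levels, neighbors):
    """Hypothetical helper: return `start` plus every table within `levels` hops."""
    seen = {start}
    frontier = {start}
    for _ in range(levels):
        # next ring of tables one hop further out, excluding ones already collected
        frontier = {n for node in frontier for n in neighbors.get(node, [])} - seen
        seen |= frontier
    return seen


print(sorted(expand("Subject", 1, CHILDREN)))        # analogous to Diagram(Subject) + 1
print(sorted(expand("EphysRecording", 2, PARENTS)))  # analogous to Diagram(EphysRecording) - 2
```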

+ `describe()`: show table definition with foreign key references.

In [None]:
ephys_element.EphysRecording.describe();

+ `heading`: show all attribute definitions, listing attributes inherited through foreign keys individually rather than as references.

In [None]:
ephys_element.EphysRecording.heading

## Major tables for the ephys workflow

In [None]:
# subject, session, and ephys session
dj.Diagram(subject.Subject) + dj.Diagram(acquisition.Session) + dj.Diagram(ephys.EphysSession)

In [None]:
subject.Subject.describe();

In [None]:
acquisition.SessionStarted.describe();

In [None]:
acquisition.Session.describe();

In [None]:
# stores the data directory of the ephys session
ephys.EphysSession.describe();

+ [`ephys_element`](https://github.com/datajoint/element-array-ephys): Neuropixels-based probe and ephys information. See [this link](https://github.com/datajoint/element-array-ephys/tree/main/element_array_ephys) for the table definitions.

In [None]:
dj.Diagram(probe_element) + dj.Diagram(ephys_element)

## Summary and next step

+ This notebook introduced the overall structure of the schemas and tables in the workflow, along with tools to explore the schema structure and table definitions.

+ In the next notebook, [02-process-ephys-workflow](02-process-ephys-workflow.ipynb), we will walk through the detailed steps of running the pipeline and examine the resulting table contents.