# DataJoint Introductory Tutorial
In this tutorial we will use DataJoint to replicate the analysis we conducted in the ONE tutorial.

This tutorial assumes that you have set up the unified IBL environment for Python and have set up your DataJoint credentials.
First, let's import DataJoint.

In [None]:
import datajoint as dj
# For the purposes of this tutorial, limit the table print output to 3 rows
dj.config['display.limit'] = 3

We can access DataJoint tables by importing schemas from IBL-pipeline. Let's import the subject schema.

In [None]:
from ibl_pipeline import subject

Within this schema there is a DataJoint table called Subject. It holds all the information about subjects registered on Alyx under IBL projects. Let's access this table and look at the first few entries.

In [None]:
subjects = subject.Subject()
subjects

Let's find the entry for the subject we looked at in the ONE tutorial, KS022. To do this we restrict the Subject table by the subject nickname, using the & operator.

In [None]:
subjects & 'subject_nickname = "KS022"'

We now want to find information about the behavioural sessions. This information is stored in a table called Session in the acquisition schema. Let's import this schema, access the table and display the first few entries.

In [None]:
from ibl_pipeline import acquisition
sessions = acquisition.Session()
sessions

If we look at the primary key columns (the columns with black headings) in the subjects and sessions tables, we notice that both contain subject_uuid as a primary key. This means that the two tables can be combined with the * operator. We want to find information about all the sessions that KS022 did in the training phase of the IBL training pipeline. When combining the tables we therefore restrict the subjects table by the subject nickname, as we did before, and the sessions table by the task protocol.

In [None]:
(subjects & 'subject_nickname = "KS022"') * (sessions & 'task_protocol LIKE "%training%"')
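To make the two operators concrete, here is a minimal plain-Python sketch of what the restriction (&) and the join (*) above do conceptually. It does not use DataJoint or the IBL database: the rows, the shortened UUID strings and the helper name `join_on_shared_keys` are all invented for illustration.

```python
# Toy stand-ins for the Subject and Session tables (all values are invented).
subject_rows = [
    {"subject_uuid": "aaa", "subject_nickname": "KS022"},
    {"subject_uuid": "bbb", "subject_nickname": "KS023"},
]
session_rows = [
    {"subject_uuid": "aaa", "session_start_time": "2019-12-05",
     "task_protocol": "trainingChoiceWorld"},
    {"subject_uuid": "aaa", "session_start_time": "2019-12-01",
     "task_protocol": "habituationChoiceWorld"},
    {"subject_uuid": "bbb", "session_start_time": "2019-12-05",
     "task_protocol": "trainingChoiceWorld"},
]


def join_on_shared_keys(left, right):
    """Pair up rows that agree on every column name the two tables share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[k] == r_row[k] for k in shared):
                joined.append({**l_row, **r_row})
    return joined


# Restriction: keep only the rows that satisfy a condition.
ks022 = [r for r in subject_rows if r["subject_nickname"] == "KS022"]
training = [r for r in session_rows if "training" in r["task_protocol"]]

# Join: combine the two restricted tables on their shared column, subject_uuid.
ks022_training = join_on_shared_keys(ks022, training)
print(ks022_training)
```

Only the rows that survive both restrictions and share the same subject_uuid end up in the combined result, which is the behaviour we rely on in the query above.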

There is a lot of information in this table and we are not interested in all of it for the purposes of our analysis. Let's keep just the information about the session_uuid using the proj method. We do not want any columns from the subjects table (apart from the primary keys, which are kept by default) and only want 'session_uuid' from the sessions table. So we can write

In [None]:
(subjects & 'subject_nickname = "KS022"').proj() * (sessions & 'task_protocol LIKE "%training%"').proj('session_uuid')

 Note: primary keys are always retained. We could also have combined the tables first and then projected, and ended up with the same result, e.g.
 ((subjects & 'subject_nickname = "KS022"') * (sessions & 'task_protocol LIKE "%training%"')).proj('session_uuid')
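The claim that the order of combining and projecting does not matter can be checked on a toy example. The sketch below is plain Python, not DataJoint: the helpers `project` and `join` and all the row values are hypothetical, and `project` simply mimics the rule that primary key columns are always kept.

```python
def project(rows, primary_key, *keep):
    """Keep the primary-key columns plus any explicitly named columns."""
    cols = list(primary_key) + list(keep)
    return [{c: row[c] for c in cols} for row in rows]


def join(left, right):
    """Pair up rows that agree on every column name the two tables share."""
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right
        if all(l_row[k] == r_row[k] for k in set(l_row) & set(r_row))
    ]


# Invented rows; the real tables hold many more columns.
subject_rows = [{"subject_uuid": "aaa", "subject_nickname": "KS022", "sex": "M"}]
session_rows = [
    {"subject_uuid": "aaa", "session_start_time": "2019-12-05",
     "session_uuid": "s1", "task_protocol": "trainingChoiceWorld"},
    {"subject_uuid": "aaa", "session_start_time": "2019-12-06",
     "session_uuid": "s2", "task_protocol": "trainingChoiceWorld"},
]

subject_pk = ["subject_uuid"]
session_pk = ["subject_uuid", "session_start_time"]

# Project each table first, then join.
project_then_join = join(
    project(subject_rows, subject_pk),
    project(session_rows, session_pk, "session_uuid"),
)

# Join first, then project the combined table.
join_then_project = project(join(subject_rows, session_rows), session_pk, "session_uuid")

print(project_then_join == join_then_project)
```

Projecting first keeps the intermediate tables small, while joining first can be easier to read; in this toy model both routes give the same rows.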

If we compare this with the ONE tutorial, we will notice that we find the same number of sessions and that session_uuid corresponds to what we previously called the eID.

Up until now we have been inspecting the content of the tables, but we do not yet have access to that content because we have not read it into memory. For this we need to use the fetch method. Let's fetch the session UUIDs into a pandas dataframe.

In [None]:
eids = ((subjects & 'subject_nickname = "KS022"').proj() * (sessions & 'task_protocol LIKE "%training%"').proj('session_uuid')
       ).fetch(format='frame')

In [None]:
eids
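The difference between building a query and fetching its result can be illustrated with a small stand-alone sketch. The `LazyQuery` class below is hypothetical and is not how DataJoint is implemented; it only mimics the idea that restrictions are recorded first and the rows are read only when fetch is called.

```python
class LazyQuery:
    """Hypothetical stand-in for a query expression: it records conditions
    and only touches the data when fetch() is called."""

    def __init__(self, source, conditions=()):
        self._source = source                # callable that returns the stored rows
        self._conditions = tuple(conditions)

    def __and__(self, condition):
        # Restricting returns a new query; nothing is read yet.
        return LazyQuery(self._source, self._conditions + (condition,))

    def fetch(self):
        # Only now are the rows read and filtered in memory.
        rows = self._source()
        return [r for r in rows if all(cond(r) for cond in self._conditions)]


reads = []  # grows by one entry each time the toy "database" is read


def read_sessions():
    reads.append("read")
    return [
        {"session_uuid": "s1", "task_protocol": "trainingChoiceWorld"},
        {"session_uuid": "s2", "task_protocol": "habituationChoiceWorld"},
    ]


query = LazyQuery(read_sessions) & (lambda r: "training" in r["task_protocol"])
reads_before_fetch = len(reads)   # the query exists, but no rows have been read
rows = query.fetch()
reads_after_fetch = len(reads)    # fetch() triggered exactly one read
print(reads_before_fetch, reads_after_fetch, rows)
```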

Now that we have found the sessions, we want to get the trial information associated with them. One advantage of DataJoint is that when data is ingested into the tables, common quantities that can be computed from the data, such as performance or reaction time, can be calculated automatically and stored in tables alongside the raw data.

The results computed from the trials dataset are stored in a table called PsychResults. We can import the schema that contains it and access the table.

In [None]:
from ibl_pipeline.analyses import behavior
trials = behavior.PsychResults()
trials

Let's find the results for the first training day of KS022. Notice again how the primary keys of all the tables are consistent, so we can combine the tables.

In [None]:
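# Take the first fetched session by position; the dataframe is indexed by primary key, so use .iloc
eid_day1 = dict(session_uuid=eids['session_uuid'].iloc[0])
trials_day1 = ((subjects & 'subject_nickname = "KS022"').proj() * (sessions & eid_day1).proj() * trials).fetch(format='frame')
trials_day1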

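Note that here we restricted the sessions table by a dictionary rather than by a string. A dictionary restriction keeps the entries whose attributes match the given key-value pairs.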
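As a rough illustration of what a dictionary restriction means, the plain-Python sketch below keeps only the rows whose values equal every key-value pair in the dictionary. The helper `restrict_by_dict` and the rows are invented for this example and do not reproduce DataJoint's implementation.

```python
def restrict_by_dict(rows, condition):
    """Keep rows whose values equal those in `condition` for every key given."""
    return [
        row for row in rows
        if all(row[key] == value for key, value in condition.items())
    ]


# Invented rows standing in for the sessions table.
session_rows = [
    {"session_uuid": "s1", "subject_uuid": "aaa", "task_protocol": "trainingChoiceWorld"},
    {"session_uuid": "s2", "subject_uuid": "aaa", "task_protocol": "trainingChoiceWorld"},
    {"session_uuid": "s3", "subject_uuid": "bbb", "task_protocol": "trainingChoiceWorld"},
]

# A single key-value pair, as in eid_day1 above.
day1 = restrict_by_dict(session_rows, dict(session_uuid="s1"))

# Several pairs are combined with a logical AND.
aaa_training = restrict_by_dict(
    session_rows, dict(subject_uuid="aaa", task_protocol="trainingChoiceWorld")
)
print(day1)
print(aaa_training)
```

A dictionary is convenient when the value comes from a Python variable, because no string formatting or quoting is needed.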

From this table we can extract the same quantities that we computed in the previous tutorial. Let's look at the contrasts presented and the number of trials at each contrast.

In [None]:
contrasts = trials_day1['signed_contrasts'].to_numpy()[0]
n_contrasts = trials_day1['n_trials_stim'].to_numpy()[0]
print(contrasts)
print(n_contrasts)

We can compute the performance at each contrast by dividing the n_trials_stim_right counts by the number of trials at each contrast.

In [None]:
contrast_performance = trials_day1['n_trials_stim_right'].to_numpy()[0]/n_contrasts
print(contrast_performance)
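The division above is an element-wise ratio of two arrays of equal length. The sketch below reproduces the arithmetic with made-up counts in plain Python. It assumes that n_trials_stim_right counts the trials with a rightward choice at each contrast, as the column name suggests; this is an assumption worth checking against the table definition, because under that reading the ratio is the fraction of rightward choices rather than the fraction of correct trials.

```python
# Made-up values standing in for one row of the fetched table.
signed_contrasts = [-1.0, -0.5, 0.5, 1.0]
n_trials_stim = [40, 35, 38, 42]        # trials presented at each contrast
n_trials_stim_right = [4, 7, 30, 40]    # assumed: trials with a rightward choice

# Element-wise ratio, one value per contrast.
fraction_right = [
    right / total for right, total in zip(n_trials_stim_right, n_trials_stim)
]

for contrast, fraction in zip(signed_contrasts, fraction_right):
    print(f"contrast {contrast * 100:+.0f}%: {fraction * 100:.1f}%")
```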

In [None]:
import matplotlib.pyplot as plt
# Plot the computed values (%) against stimulus contrast (%), with a marker at each contrast
plt.plot(contrasts * 100, contrast_performance * 100)
plt.scatter(contrasts * 100, contrast_performance * 100)
plt.ylim([0, 100])
plt.xticks(contrasts * 100)
plt.xlabel('Stimulus Contrast (%)')
plt.ylabel('Performance (%)')
plt.show()