# Welcome to Inscopy
This notebook guides you through the most common functions. You will:

1. Load data using the functions load_cells and load_TTL.
2. Inspect data using the function plot_cells.
3. Construct peri-event data using the function peri_event.
4. Normalize the peri-event data using any of the provided normalization methods.
5. Plot (sorted) peri-event data using the function plot_PE.


  

In [None]:
# Here we import the libraries that we'll use.
import inscopy.main as inx # The main Inscopy library
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


# Loading the data
In order to run this example notebook, two example data files have been included. They are:

    - Mouse3_AC1_test.csv (containing all the identified cells of one recording)   
    - Mouse3_AC1_TTL_test.csv (containing the TTL pulses in that same recording)
    
The files were obtained during a simple conditioning task in which a CS+ predicts a stimulus and a CS- has no scheduled consequences. The CS+, CS- and stimulus all have a duration of 2 sec.


In [None]:
# Two load data functions are included in Inscopy, one for data and one for TTL pulses
cells = inx.load_cells('Mouse3_AC1_test.csv')
TTL = inx.load_TTL('Mouse3_AC1_TTL_test.csv')


Let's have a look at how the data is organized. The cells are stored in a [Pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html). Every column contains one cell, and the index contains the timeline.

In [None]:
# Print the head of the dataframe
cells.head()
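
To make that layout concrete, here is a minimal sketch using a made-up DataFrame in place of the real recording. The cell names follow this notebook, but the values, the third cell, and the 20 Hz sampling rate are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the output of load_cells:
# 5 samples at an assumed 20 Hz, one column per cell, time (s) in the index.
time = np.arange(5) / 20.0
mock_cells = pd.DataFrame(
    {"C014": [0.1, 0.2, 0.3, 0.2, 0.1],
     "C015": [1.0, 0.9, 0.8, 0.9, 1.0],
     "C016": [0.0, 0.0, 0.5, 0.0, 0.0]},
    index=pd.Index(time, name="Time"),
)

# Selecting a column by name gives one cell as a Series indexed by time.
one_cell = mock_cells["C014"]

# Label-based slicing on the index selects a time window (both ends inclusive).
window = mock_cells.loc[0.05:0.15]
```

Because the timeline lives in the index, time windows can be selected with plain pandas indexing, without a separate time vector.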

The TTL timestamps, meanwhile, are stored in a Python dict. The keys of the dict are the TTL channels on the nVista system. The values are again DataFrames, but in this case every row is a pulse on the channel.

In [None]:
# Look at the available keys
print('The available channels are:')
for channel in TTL.keys():
    print('   - ' + channel)

# Look at one in particular
print('\nThis is for instance what GPIO-1 looks like in this datafile:')
TTL['GPIO-1'].head()
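
As a rough picture of that structure, the sketch below builds a made-up dict of the same shape. The channel names and the columns Start, Stop and Duration are the ones used in this notebook; the pulse times are invented and the real DataFrames may hold more columns.

```python
import pandas as pd

# Hypothetical stand-in for the output of load_TTL: one DataFrame per channel,
# one row per pulse. All pulse times below are made up.
mock_TTL = {
    "GPIO-1": pd.DataFrame({"Start": [10.0, 40.0], "Stop": [12.0, 42.0]}),
    "GPIO-2": pd.DataFrame({"Start": [7.0, 20.5, 37.0], "Stop": [8.0, 21.0, 38.0]}),
}
for pulses in mock_TTL.values():
    pulses["Duration"] = pulses["Stop"] - pulses["Start"]

# Count the pulses on every channel.
pulse_counts = {name: len(pulses) for name, pulses in mock_TTL.items()}
```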

# First look at the data
In the experiment from the example data, the stimulus is indicated by the pulses on channel GPIO-1, while the CS+ is indicated by the <i>offset</i> of a 1sec pulse on channel GPIO-2 and the CS- by the <i>offset</i> of a 0.5sec pulse on channel GPIO-2. Below we take a first look at the data. We use the function "plot_cells" to plot two cells as well as the stimulus on GPIO-1.

In [None]:
# Let's look at the first two cells. Here we use 'iloc' to index the first two columns, but using the column
# names of the first two cells (e.g. [['C014', 'C015']]) would also work.
first_two_cells = cells.iloc[:, :2]
stimulus = TTL['GPIO-1']

# We use the function 'plot_cells' to plot the data
inx.plot_cells(first_two_cells, stimulus)


In [None]:
# Higher time resolution of a series of 3 stimuli:
inx.plot_cells(first_two_cells, stimulus, window=[280, 380])



Interestingly, from looking at this raw data, you can already tell that cell C014 and cell C015 respond oppositely to the stimulus.

# Making peri-event histograms
A very straightforward way of exploring the data is by plotting peri-event histograms. Here we will build peri-event histograms of the activity of all the cells in the DataFrame 'cells' around the stimulus timestamps that we collected on channel GPIO-1.

The built-in function 'peri_event' produces peri-event data for one or multiple cells. If the input is a DataFrame with data from multiple cells (i.e. multiple columns), the output is a dictionary with the cell names as keys. If only one cell (one column) is passed to peri_event, the output is a single DataFrame. For every cell, the output is a DataFrame with a timeline in the index and the individual trials as columns.

The second argument of peri_event is 'stamps', the timestamps that will be T=0 in our peri-event data. In this case that is the stimulus onset.
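
For intuition, the idea behind peri-event data can be sketched in plain pandas. This is not the Inscopy implementation: the helper `peri_event_sketch` and the mock trace are hypothetical, and the sketch assumes a regularly sampled trace with every stamp at least one window away from both ends of the recording.

```python
import numpy as np
import pandas as pd

def peri_event_sketch(trace, stamps, window=10.0):
    """Cut a window around every stamp out of the trace of one cell."""
    times = trace.index.to_numpy()
    values = trace.to_numpy()
    dt = times[1] - times[0]
    n = int(round(window / dt))              # samples on each side of a stamp
    timeline = np.arange(-n, n + 1) * dt     # shared timeline, T=0 at the stamp
    trials = {}
    for i, stamp in enumerate(stamps):
        center = int(np.argmin(np.abs(times - stamp)))  # sample nearest the stamp
        trials[i] = values[center - n:center + n + 1]
    return pd.DataFrame(trials, index=pd.Index(timeline, name="Time"))

# Mock data: a 100 s trace at 10 Hz that steps up for 2 s after every stamp.
t = np.arange(1000) / 10.0
mock_stamps = [20.0, 50.0, 80.0]
signal = np.zeros_like(t)
for s in mock_stamps:
    signal[(t >= s) & (t < s + 2)] = 1.0
mock_trace = pd.Series(signal, index=t)

pe = peri_event_sketch(mock_trace, mock_stamps, window=5.0)
```

The result has one column per stamp and a timeline that runs from -5 s to +5 s, which is the same layout described above for a single cell.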


In [None]:
# We're going to look at the stimulus onset (i.e. the 'Start' column)
stamps = TTL['GPIO-1'].Start

# Grab PE_data
PE_data = inx.peri_event(cells, stamps, window=10) 

# The output is a dictionary with all the cells as keys. Let's have a look at the first cell:
PE_data['C014'].head()


As you can see, the output is a DataFrame. The index is a timeline (T=0 is the stamp) and the columns are the individual trials. In this case the stimulus was presented 19 times, so the peri-event data has 19 columns. We'll plot the peri-event data below, but first we have to talk about normalization.

Inscopy has 4 built-in ways to normalize peri-event data. Z-score, baseline subtraction and auROC are based on a baseline (by default all the datapoints in the peri-event data before T=0), while min-max is not.

### Z-score
Z-score normalization is probably the best method if you want to look at one individual cell, or if you want to combine multiple cells in one dataset but don't have enough trials per cell to do auROC. A Z-score of 1 corresponds to one standard deviation of the baseline, so the signal value depends on how much variation you have in your baseline. In addition, the mean value of the baseline is subtracted, so the baseline ends up at 0 by default.
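
A minimal sketch of the calculation, assuming every trial is normalized against its own baseline (all samples before T=0). Whether Inscopy computes the baseline per trial or pooled over trials is not stated here, so treat the helper `zscore_sketch` as an illustration only.

```python
import numpy as np
import pandas as pd

def zscore_sketch(pe):
    """Z-score every trial (column) against its own baseline (index < 0)."""
    baseline = pe[pe.index < 0]
    # pandas uses the sample standard deviation (ddof=1) by default.
    return (pe - baseline.mean()) / baseline.std()

# Mock peri-event data: 2 trials, baseline before T=0, response from T=0 on.
timeline = np.array([-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0])
mock_pe = pd.DataFrame(
    {0: [1.0, 3.0, 1.0, 3.0, 6.0, 6.0, 2.0],
     1: [10.0, 14.0, 10.0, 14.0, 12.0, 20.0, 12.0]},
    index=timeline,
)
z = zscore_sketch(mock_pe)
```

After this step the baseline of every trial has mean 0 and standard deviation 1, so a quiet baseline inflates the Z-scores of the response.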

### min-max
Min-max normalization simply rescales all the data to fit in the interval 0:1. If you want to fit the data to a different interval, simply multiply by the desired interval width and add the floor of the desired interval (e.g. multiply by 2 and subtract 1 to scale between -1 and 1).
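
The rescaling to another interval can be sketched as follows. The helper `minmax_sketch` is hypothetical and rescales every column separately, which is an assumption; the library may rescale differently.

```python
import pandas as pd

def minmax_sketch(pe, low=0.0, high=1.0):
    """Rescale every column to the interval [low, high]."""
    unit = (pe - pe.min()) / (pe.max() - pe.min())   # now between 0 and 1
    return unit * (high - low) + low                 # stretch, then shift

mock_pe = pd.DataFrame({0: [2.0, 4.0, 6.0], 1: [-1.0, 0.0, 3.0]})
scaled = minmax_sketch(mock_pe, low=-1.0, high=1.0)
```

With `low=-1` and `high=1` this is the "multiply by 2 and subtract 1" case from the text.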

### baseline subtraction
Baseline subtraction does not really normalize the data, but it does subtract the baseline, so all values are increases or decreases relative to baseline.

### auROC
The area under the receiver operating characteristic (auROC) is a measure of how much information can be gained from any particular data point. It automatically scales the data to values between 0 and 1, with the baseline at 0.5. A value of 1 at T=x means that a threshold exists which perfectly separates all datapoints at T=x from the baseline. A value of 0 means the same, but in this case the values are lower than the baseline. An important difference between auROC and the other methods is that it takes all the trials into consideration to produce a single 1-D vector for each cell. This vector is not very sensitive to outliers or a low baseline (unlike Z-score normalization), and the values truly say something about how much information can be gained from the cell activity at any given time point. If enough trials are available, it is most likely the best normalization method. For more information see [Cohen et al. Nature 2012](https://www.nature.com/articles/nature10754), especially supplementary figure S1.
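
The sketch below illustrates the idea with NumPy and pandas only. It uses the fact that the area under the ROC curve equals the probability that a randomly drawn value at T=x exceeds a randomly drawn baseline value (ties count half). The helpers `auc` and `auroc_sketch` are hypothetical, the baseline is assumed to be all samples before T=0 pooled over trials, and this is not necessarily how Inscopy computes it.

```python
import numpy as np
import pandas as pd

def auc(sample, baseline):
    """Probability that a random sample value exceeds a random baseline value."""
    sample = np.asarray(sample)[:, None]
    baseline = np.asarray(baseline)[None, :]
    return np.mean((sample > baseline) + 0.5 * (sample == baseline))

def auroc_sketch(pe):
    """One auROC value per time point, computed across all trials."""
    baseline = pe[pe.index < 0].to_numpy().ravel()   # pooled baseline samples
    return pe.apply(lambda row: auc(row.to_numpy(), baseline), axis=1)

# Mock peri-event data: 3 trials, excited at T=0 and suppressed at T=0.5.
timeline = np.array([-1.0, -0.5, 0.0, 0.5])
mock_pe = pd.DataFrame(
    {0: [1.0, 2.0, 5.0, 0.0],
     1: [2.0, 1.0, 6.0, 0.5],
     2: [1.5, 1.5, 7.0, -1.0]},
    index=timeline,
)
roc = auroc_sketch(mock_pe)
```

The output is a single vector with one value per time point, which is why auROC needs enough trials to give a stable estimate.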


In [None]:
# Since we want to plot a heatmap of an individual cell (as an example) we'll use z-score for now.
PE_data_normalized = inx.normalize_PE(PE_data, method='z-score')

# The output is again a dictionary with all the cells as keys. Let's have a look at the second cell:
PE_data_normalized['C015'].head()

Finally, to plot the data, we use the function plot_PE. Note that this function takes a single DataFrame as input (so you can't feed it the entire dict). Below we plot the second cell (cell C015). The second argument for the plot is the colormap for the heatmap. See an overview of all possible colormaps [here](https://matplotlib.org/stable/tutorials/colors/colormaps.html).

Note that plot_PE returns the figure and axes handles, which can be used to format the figure to your liking if you run this code from a terminal or an IDE such as Spyder or PyCharm.

In [None]:
# Plot the peri-event data
inx.plot_PE(PE_data_normalized['C015'], cmap='plasma')

In [None]:
# Interestingly, the same example dataset also has cells that are inhibited by the stimuli:
inx.plot_PE(PE_data_normalized['C014'], cmap='plasma')

# Combining multiple cells
The function plot_PE will plot one DataFrame. It does not care whether the columns are individual trials or average responses of individual cells. So you can either average the responses for every cell individually and combine them into one DataFrame, or use the built-in function multi_cell_PE.

Note that the normalization method we use here (auROC) might take a few minutes.
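
The manual route (average every cell, then combine) could look roughly like this. The dict `mock_PE_data` is a made-up stand-in for the output of peri_event, and no normalization is applied in this sketch.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by peri_event: one DataFrame
# per cell, timeline in the index, trials in the columns.
timeline = [-1.0, 0.0, 1.0]
mock_PE_data = {
    "C014": pd.DataFrame({0: [0.0, 2.0, 4.0], 1: [0.0, 4.0, 6.0]}, index=timeline),
    "C015": pd.DataFrame({0: [1.0, 1.0, 0.0], 1: [1.0, 0.0, 0.0]}, index=timeline),
}

# Average over trials (columns) for every cell, then put the averages
# side by side: one column per cell.
manual_combined = pd.DataFrame(
    {name: trials.mean(axis=1) for name, trials in mock_PE_data.items()}
)
```

The resulting DataFrame has the timeline in the index and one column per cell, so it has the same shape of input that plot_PE expects.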


In [None]:
# Grab the data
combined_data = inx.multi_cell_PE(cells, stamps=TTL['GPIO-1'].Start, norm_method='auroc')

# Let's have a look at the data
combined_data.head()

Finally we plot the combined data in the same way we did before.

In [None]:
combined_data.columns = PE_data.keys()
inx.plot_PE(combined_data, cmap='plasma')

As you can see there is quite some variation, with some cells going down and others going up during the stimulus. There is also something happening before the stimulus onset. Presumably this is because of the presentation of the CS+. Let's have a look at that as well. Note that in this dataset, the onset of the CS+ is signalled by the *offset* of a 1sec pulse on channel GPIO-2. Channel GPIO-2 also has 0.5sec pulses that signal the CS-, so we have to do a little bit more work to grab the timestamps we are interested in.
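
Separating the two pulse types comes down to a boolean mask on the pulse duration. Here is a small sketch on made-up pulses; the column names follow this notebook, and the 0.6 sec threshold is simply any value between the two pulse lengths.

```python
import pandas as pd

# Mock GPIO-2 channel: two 1 sec pulses (CS+) and two 0.5 sec pulses (CS-).
mock_gpio2 = pd.DataFrame({
    "Start": [10.0, 30.0, 50.0, 70.0],
    "Stop": [11.0, 30.5, 51.0, 70.5],
})
mock_gpio2["Duration"] = mock_gpio2["Stop"] - mock_gpio2["Start"]

# True for the long pulses, False for the short ones.
is_cs_plus = mock_gpio2["Duration"] > 0.6

# The pulse offset ('Stop') marks the onset of the CS.
cs_plus_onsets = mock_gpio2.loc[is_cs_plus, "Stop"]
cs_minus_onsets = mock_gpio2.loc[~is_cs_plus, "Stop"]
```

Inverting the mask with `~` gives the CS- timestamps without a second threshold.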

In [None]:
# Grab the offset (stop) of stamps with a duration of >0.6sec (i.e. 1 sec) on channel GPIO-2.
timestamps = TTL['GPIO-2'].Stop[TTL['GPIO-2'].Duration>0.6]

# Do the same thing as before to create the peri-event data.
response_to_CS = inx.multi_cell_PE(cells, stamps=timestamps, norm_method='auROC')

# Plot the data
response_to_CS.columns = PE_data.keys()
inx.plot_PE(response_to_CS, cmap='plasma')


The heatmap is not very clear and neither are the average responses. This is because some cells go down in response to the stimulus while other cells go up. In that sense it probably does not make sense to combine all the cells into one figure. Regardless of the average responses, let's sort the cells by their response to the stimulus.

In [None]:
# We know that the stimulus is from T=2s to T=4s (because the 2sec CS is from T=0s to T=2s.)
indexer = (response_to_CS.index>=2) & (response_to_CS.index<4)

# Calculate the area under the curve over that interval (by summing).
sorter = response_to_CS[indexer].sum(axis=0)

# Plot, but this time pass a sorter as well
inx.plot_PE(response_to_CS, sorter=sorter, cmap='plasma')
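
To check which cells end up where, the same kind of sorter can also be applied by hand. The sketch below uses made-up responses and sorts from strongest to weakest summed response; the direction in which plot_PE orders the cells is not documented here, so the descending order is an assumption.

```python
import pandas as pd

# Mock combined data: timeline (s) in the index, one column per cell.
mock_response = pd.DataFrame(
    {"C014": [0.5, 0.2, 0.1, 0.5],
     "C015": [0.5, 0.9, 0.8, 0.5],
     "C016": [0.5, 0.6, 0.4, 0.5]},
    index=[1.0, 2.0, 3.0, 4.0],
)

# Sum every cell over the stimulus window (T=2s up to, not including, T=4s).
in_window = (mock_response.index >= 2) & (mock_response.index < 4)
mock_sorter = mock_response[in_window].sum(axis=0)

# Reorder the columns from the strongest to the weakest summed response.
order = mock_sorter.sort_values(ascending=False).index
sorted_response = mock_response[order]
```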

