In [1]:
# %pip install --upgrade xarray seaborn pandas numpy requests tqdm

In [1]:
# Ignore future warnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import xarray as xr
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt 

# ERP Analysis With Pandas And Seaborn

## Overview

We will continue to use the dataset from [Steinmetz et al., 2019 in Nature](https://www.nature.com/articles/s41586-019-1787-x). In the experiment, a mouse was presented with two gradients of varying intensities, and its task was to turn a wheel to center the brighter gradient on the screen. Simultaneously, Local Field Potential (LFP) measurements were recorded across various brain areas. Each LFP recording contains 250 samples spanning 2.5 seconds, which corresponds to one sample every 0.01 seconds.


**Analysis goals**

In these exercises, our primary objective is to analyze and visualize Local Field Potential (LFP) data collected from distinct brain regions separately. Through this analysis, we aim to:
  - compute trial statistics on LFP amplitudes (e.g. mean, min, max)
  - compare these statistics between different brain areas
  

**Learning goals**

In this notebook, we'll focus on learning Seaborn's:
  - `sns.catplot()` function for categorical plots
  - `sns.lineplot()` function for plotting time series
  - `sns.relplot()` function for splitting a figure into faceted rows and columns of subplots
  - `sns.heatmap()` function for using colors to compare trends.

---

#### Download the dataset

In [2]:
from pathlib import Path
import requests
from tqdm import tqdm

def download_from_sciebo(public_url, to_filename, is_file=True):
    """
    Downloads a file or folder from a shared URL on Sciebo.
    """
    # Create the folder if a longer path was described
    path = Path(to_filename)
    if len(path.parts) > 1:
        Path(to_filename).parent.mkdir(parents=True, exist_ok=True)

    r = requests.get(public_url + "/download", stream=True)

    if 'Content-Length' in r.headers and is_file:
        total_size = int(r.headers['Content-Length'])
        progress_bar = tqdm(desc=f"Downloading {to_filename}", unit='B', unit_scale=True, total=total_size)
    else:
        progress_bar = None

    with open(to_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
            if progress_bar:
                progress_bar.update(len(chunk))

    if progress_bar:
        progress_bar.close()

download_from_sciebo('https://uni-bonn.sciebo.de/s/JFeueaaWCTVhTZh', 'data/steinmetz_2016-12-14_Cori.nc')

---
## Extracting Data from XArray Datasets into Tidy DataFrames
### Load Dataset

In this section, we'll work with a dataset from a single session recording of Cori the mouse 🐁 ('steinmetz_2016-12-14_Cori.nc'). 

Our primary objective is to read this data and convert it into a Pandas dataframe, which will serve as the foundation for the subsequent exercises.

**Load dataset and convert to Pandas dataframe:**

| Method/Code                                             | Description                                                                   |
|--------------------------------------------------------|-------------------------------------------------------------------------------|
| `dset = xr.load_dataset("path/to/file/like/this.nc")` | Loads the dataset from the specified file path using xarray (`xr`).      |
| `df = dset['column1'].to_dataframe()`                    | Extracts the 'column1' data variable from the dataset and converts it into a Pandas DataFrame (`df`). |
| `df.reset_index()`                                   | Resets the index of the 'df' DataFrame to create a default integer index.   |
| `dset['column1'].to_dataframe().reset_index()` | All of it, together! |
| `dset[['column1', 'column2']].to_dataframe().reset_index()` | Extracts column1 and column2, converts to dataframe, and resets index |
| `sns.catplot(data=df, x='categorical_column_1', y='continuous_column', kind='bar', col='categorical_column_2')` | Makes a categorical plot of the specified `kind` (`'bar'`, `'count'`, or `'box'`), split into columns based on the categories in `categorical_column_2`. For `kind='count'`, leave out `y`. |
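To make the conversion steps in the table concrete, here is a minimal sketch on a tiny, made-up dataset. The variable names `toy`, `signal`, `tidy`, and `both` and all values are hypothetical and only stand in for the real session file. It shows what `to_dataframe()` and `reset_index()` each contribute.

```python
import numpy as np
import xarray as xr

# Hypothetical miniature dataset: 3 trials x 4 time points (NOT the real Steinmetz data)
toy = xr.Dataset(
    data_vars={
        "feedback_type": ("trial", [1, -1, 1]),
        "signal": (("trial", "time"), np.arange(12.0).reshape(3, 4)),
    },
    coords={"trial": [1, 2, 3], "time": [0.0, 0.01, 0.02, 0.03]},
)

# One variable: its dimensions (trial, time) first become a MultiIndex ...
indexed = toy["signal"].to_dataframe()
# ... and reset_index() turns them into ordinary columns (tidy format)
tidy = indexed.reset_index()

# Several variables at once: a variable with fewer dimensions is
# repeated so that every (trial, time) row has a value
both = toy[["signal", "feedback_type"]].to_dataframe().reset_index()
print(both.head())
```

After `reset_index()`, each row is one observation and each dimension is a normal column, which is the layout Seaborn expects for its `x=`, `y=`, and `col=` arguments.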

**Exercises**

Make a variable called `dset` by calling Xarray's `xr.load_dataset()` function on the 'steinmetz_2016-12-14_Cori.nc' session file. Confirm that the "lfp" data variable is there.

**Example** Make a catplot for feedback_type counting number of values in each category.

In [4]:
df = dset['feedback_type'].to_dataframe().reset_index()
sns.catplot(data=df, x='feedback_type', kind='count')

Make a catplot for response_type counting number of values in each category.

Make a catplot for brain_area counting number of values in each category.

**Example** Make a bar plot visualizing how mean reaction time varies for different feedback types

In [7]:
df = dset[['feedback_type', 'reaction_time']].to_dataframe().reset_index()
sns.catplot(data=df, x='feedback_type', y='reaction_time', kind='bar')

Make a bar plot visualizing how mean response time varies for different feedback types

Make a bar plot visualizing how mean response time varies for different response types

Make a box plot visualizing how mean response time varies for different response types

Hint: Use `kind='box'`

Make a box plot visualizing how mean feedback time varies for different feedback types

**Example** Make a box plot visualizing how mean feedback time varies for different feedback types in different columns

In [12]:
df = dset[['feedback_type', 'feedback_time']].to_dataframe().reset_index()
sns.catplot(data=df, x='feedback_type', y='feedback_time', kind='box', col='feedback_type')
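The `col=` argument can also take a different column than `x=`, which is what several of the following exercises ask for. The sketch below illustrates this on a small, made-up DataFrame. The name `toy` and all values are hypothetical; only the column names mirror the dataset.

```python
import pandas as pd
import seaborn as sns

# Hypothetical values, chosen only to give each category a few points
toy = pd.DataFrame({
    "feedback_type": [1, 1, 1, -1, -1, -1],
    "response_type": [1, -1, 1, 0, 1, -1],
    "feedback_time": [0.9, 1.1, 1.0, 1.6, 1.4, 1.8],
})

# One subplot (facet) per feedback_type; inside each facet,
# one box per response_type
g = sns.catplot(
    data=toy, x="response_type", y="feedback_time",
    kind="box", col="feedback_type",
)
print(g.axes.shape)  # one row of facets, one column per feedback_type
```

`sns.catplot()` returns a `FacetGrid`, so the individual subplots are available through `g.axes` if you want to adjust them afterwards.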

Make a box plot visualizing how mean response time varies for different feedback types in different columns

Make a box plot visualizing how mean feedback time varies for different feedback types separated into columns based on response types

Let's plot this another way. Make a box plot visualizing how mean feedback time varies for different response types separated into columns based on feedback types

Make a box plot visualizing how mean lfp varies for different brain areas separated into columns based on feedback types

Make a box plot visualizing how mean lfp varies for different brain areas separated into columns based on response types

---

### Selecting Data based on its Values ("Logical Indexing" or "Masking") and Plotting it in MultiFaceted Line Plots with `sns.relplot()`

##### Selecting Data based on its Values ("Logical Indexing" or "Masking")

| Code                       | Description                                                         |
|----------------------------------------|---------------------------------------------------------------------|
| `mask = df["col_1"] == 'val_1'`     | Store which values of `col_1` are equal to `'val_1'` |
| `mask = mask1 & mask2` | Store which values are true for both `mask1` and `mask2` |
| `mask = mask1 \| mask2` | Store which values are true for at least one of `mask1` or `mask2` |
| `df[mask]` | Get only the rows of `df` for which the values in `mask` are `True`.  |
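As a rough illustration of the masking operations in the table, the sketch below builds three masks on a small, made-up DataFrame. The name `toy` and the values are hypothetical; the column names follow the exercises. Note that each comparison is wrapped in parentheses before combining, because `&` and `|` bind more tightly than `==` and `>` in Python.

```python
import pandas as pd

# Hypothetical trials table (made-up contrast values)
toy = pd.DataFrame({
    "trial": [1, 2, 3, 4, 5],
    "contrast_left": [100, 0, 50, 0, 25],
    "contrast_right": [0, 0, 100, 25, 100],
})

# A single condition gives a boolean Series with one True/False per row
left_full = toy["contrast_left"] == 100

# AND: both conditions must hold
no_stim = (toy["contrast_left"] == 0) & (toy["contrast_right"] == 0)

# OR: at least one condition must hold
either_high = (toy["contrast_left"] > 50) | (toy["contrast_right"] > 50)

# Indexing with a mask keeps only the rows where the mask is True
print(toy[left_full]["trial"].tolist())    # [1]
print(toy[no_stim]["trial"].tolist())      # [2]
print(toy[either_high]["trial"].tolist())  # [1, 3, 5]
```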


##### Plotting MultiFaceted Line Plots with Seaborn: `sns.relplot()`

| Code       | Description                                                         |
|-------------------|---------------------------------------------------------------------|
| `sns.relplot()` | Creates a relational plot using Seaborn. Commonly used parameters: |
| | `data=`: DataFrame that the plot will be made from. |
| | `x=`: Column to use for the x-axis of the plot. |
| | `y=`: Column to use for the y-axis of the plot. |
| | `kind=`: "line" for a line plot, "scatter" for a scatter plot. |
| | `col=`: Column to use to split the figure into columns. |
| | `col_wrap=`: The maximum number of columns per row. |
| | `n_boot=`: Number of bootstrap resamples used to compute confidence intervals. |

**Example** Make a line plot of `time` vs `lfp`, but only for trial numbers less than 50.

In [19]:
df = dset[['lfp']].to_dataframe().reset_index()
mask = df['trial'] < 50
sns.relplot(data=df[mask], x='time', y='lfp', kind='line', n_boot=20);

Make a line plot of `time` vs `lfp`, but only for trials where `contrast_left` was `100`

There seems to be a strong response right after t=0.5.  This is when the visual stimulus appeared in each trial.  Let's see if the response is still there when no stimulus was presented: 

Make a line plot of `time` vs `lfp`, but only for trials where `contrast_left` was `0` and `contrast_right` was `0`

Make a line plot of `time` vs `lfp`, but only for trials where either `contrast_left` was greater than `50` **or** `contrast_right` was greater than `50`

Make a line plot of `time` vs `lfp`, but only for `brain_area_lfp` measurements in the visual cortex area `'VISp'`.

Does the hippocampus have such a distinct response?  Make a line plot of `time` vs `lfp`, but only for `brain_area_lfp` measurements in either `'DG'` or `'CA3'`.

How does the mouse's response affect the lfp in the visual cortex?  Make a line plot of `time` vs `lfp`, but only for `brain_area_lfp` measurements in the visual cortex area `'VISp'`, and use `hue` to compare the lfp between different `response_type` values.

There are so many different brain areas; let's plot them all at once in different subplots.  Make a line plot of `time` vs `lfp`, where `col` is the brain area.  (if there are too many columns, you can set `col_wrap=3` to make new rows automatically).

For each brain area, compare the lfps to different response types.  Which brain areas seem most related to the subject's behavior?

---

## Visualizing Average LFP Data with Heatmap

Let's try to visualize the same information for all brain areas in a different format. Sometimes it is enough to see variations as changes in color rather than as numbers. In this case, a heatmap can be very informative for identifying patterns in the time series of the mean LFP signal across all trials.

We will use the `groupby` and `pivot_table` methods of Pandas DataFrames to aggregate the LFP data, and Seaborn's `sns.heatmap()` function to visualize it.

| Method | Description |
| --- | --- |
| `mask = df["col_1"] == 'val_1'` | Store which values of `col_1` are equal to `'val_1'`. |
| `mask = mask1 & mask2` | Store which values are true for both `mask1` and `mask2`. |
| `mask = mask1 \| mask2` | Store which values are true for at least one of `mask1` or `mask2`. |
| `df[mask]` | Get only the rows of `df` for which the values in `mask` are `True`. |
| `df.groupby(['column1','column2'])['column3'].mean().unstack()` | Aggregate `column3` with respect to `column1` and `column2` and unstack the table. |
| `df.pivot_table(index='column1', columns='column2', values='column3', aggfunc='mean')` | Does the same as above. |
|`sns.heatmap(grouped_df)`| Create heatmap of grouped_df |
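To see what the two aggregation routes in the table produce, here is a minimal sketch on a small, made-up table. The name `toy` and all values are hypothetical; the column names follow the dataset. Both routes end in a wide table with one row per brain area and one column per time point.

```python
import numpy as np
import pandas as pd

# Hypothetical tidy table: 2 areas x 2 trials x 2 time points
toy = pd.DataFrame({
    "brain_area_lfp": ["VISp", "VISp", "VISp", "VISp", "DG", "DG", "DG", "DG"],
    "trial": [1, 1, 2, 2, 1, 1, 2, 2],
    "time": [0.0, 0.01, 0.0, 0.01, 0.0, 0.01, 0.0, 0.01],
    "lfp": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})

# Route 1: group by area and time, average over trials,
# then unstack 'time' from the row index into columns
grouped = toy.groupby(["brain_area_lfp", "time"])["lfp"].mean().unstack()

# Route 2: pivot_table does the grouping and reshaping in one call
pivoted = toy.pivot_table(
    index="brain_area_lfp", columns="time", values="lfp", aggfunc="mean"
)

print(grouped)
print(np.allclose(grouped.to_numpy(), pivoted.to_numpy()))  # True
```

Either result can be passed straight to `sns.heatmap()`, which draws one colored cell per (row, column) entry of the wide table.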

**Example** Make a heatmap visualization of the mean Local Field Potential (LFP) data grouped by 'brain_area_lfp' and 'time'.

In [28]:
df = dset['lfp'].to_dataframe().reset_index()
group = df.groupby(['brain_area_lfp', 'time'])['lfp'].mean().unstack()
sns.heatmap(group)

Make a heatmap visualization of the median Local Field Potential (LFP) data grouped by 'brain_area_lfp' and 'time'

Make a heatmap visualization of the maximum Local Field Potential (LFP) data grouped by 'brain_area_lfp' and 'time'

Make a heatmap visualization of the minimum Local Field Potential (LFP) data grouped by 'brain_area_lfp' and 'time'

**Example** Make a heatmap visualization of the mean Local Field Potential (LFP) data grouped by 'brain_area_lfp' and 'time' but only for feedback_type == 1

In [32]:
df = dset[['lfp', 'feedback_type']].to_dataframe().reset_index()
mask = df['feedback_type'] == 1
group = df[mask].groupby(['brain_area_lfp', 'time'])['lfp'].mean().unstack()
sns.heatmap(group)

Make a heatmap visualization of the mean Local Field Potential (LFP) data grouped by 'brain_area_lfp' and 'time' but only for feedback_type == -1

We can get the same grouped table with a Pandas method called `pivot_table`.

**Example** Make a heatmap visualization of the mean Local Field Potential (LFP) data grouped by 'brain_area_lfp' and 'time' using pivot_table

In [34]:
df = dset['lfp'].to_dataframe().reset_index()
group = df.pivot_table(index='brain_area_lfp', columns='time', values='lfp', aggfunc='mean')
sns.heatmap(group)

Make a heatmap visualization of the mean Local Field Potential (LFP) data grouped by 'brain_area_lfp' and 'time' but only for feedback_type == 1 using pivot table

Make a heatmap visualization of the mean Local Field Potential (LFP) data grouped by 'brain_area_lfp' and 'time' but only for response_type == 1  using pivot table

Make a heatmap visualization of the mean Local Field Potential (LFP) data grouped by 'brain_area_lfp' and 'time' but only for feedback_type == -1 using pivot_table method

Make a heatmap visualization of the mean Local Field Potential (LFP) data grouped by 'brain_area_lfp' and 'time' but only for feedback_type == 1 and response_type == 1 using pivot_table method

Make a heatmap visualization of the mean Local Field Potential (LFP) data grouped by 'brain_area_lfp' and 'time' but only for feedback_type == 1 and response_type == -1 using pivot_table method

Make a heatmap visualization of the mean Local Field Potential (LFP) data grouped by 'brain_area_lfp' and 'time' but only for feedback_type == -1 and response_type == 0 using pivot_table method

Make a heatmap visualization of the median Local Field Potential (LFP) data grouped by 'brain_area_lfp' and 'time' but only for either VISp or DG brain areas using pivot_table method