In [None]:
# %pip install xarray matplotlib scipy pandas

In [None]:
# import external packages
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt

# Add path with self-created packages and import them
import sys
sys.path.append('./src')
import sciebo

# Spike Time Analysis with Pandas and Matplotlib


In the experiment reported by [Steinmetz et al., 2019 in Nature](https://www.nature.com/articles/s41586-019-1787-x), mice performed a discrimination task in which they moved a stimulus by turning a steering wheel. During the task, electrophysiological activity was recorded from several brain areas using Neuropixels probes, which carry a dense array of recording sites.



##### Analysis Goals
In this notebook we will visualize spiking events, examine their relationship with key experimental variables, and compare activity patterns across distinct brain regions.

##### Learning Goals
We will once again use the [**Pandas**](https://pandas.pydata.org/) Python library to carry out numerical operations and plot results with the [**matplotlib**](https://matplotlib.org/) plotting library.

---

### Practice with Pandas: Extracting information from a DataFrame

Each spike is characterised by 4 pieces of information:
* The identifier for that spike
* The trial in which the spike occurred
* The cell in which the spike occurred
* The time during the trial when the spike occurred


| Command                                    | Description                                                                                           |
|--------------------------------------------|-------------------------------------------------------------------------------------------------------|
| `df[df['column_B'] == 42]`                 | Filter a DataFrame where the values of `column_B` are 42.                                             |
| `df['column_A'].nunique()`                 | Count the number of unique values in `column_A` of a DataFrame.                                       |
| `df.groupby('column_B')['column_A'].nunique()` | Group a DataFrame according to values of `column_B` and count the number of unique values in `column_A`. |
| `df.max()`                                 | Find the maximum value of each column of a DataFrame (for a Series, the single maximum value).      |
| `df.idxmax()`                              | Find the index label at which the maximum value occurs (per column for a DataFrame).                  |
| `df.idxmin()`                              | Find the index label at which the minimum value occurs (per column for a DataFrame).                  |
| `plt.hist(data_values)`                    | Plot a histogram of `data_values` using Matplotlib.                                                   |
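
The commands in the table can be tried out on a small made-up table first. The sketch below uses invented values (not the Steinmetz data) and hypothetical variable names such as `toy_df`, so it does not touch the `df` used in the exercises.

```python
import pandas as pd

# Toy spike table with made-up values (not the Steinmetz data)
toy_df = pd.DataFrame({
    "spike_trial": [1, 1, 1, 2, 2, 2, 3],
    "spike_cell":  [10, 10, 11, 10, 12, 13, 12],
    "spike_time":  [0.10, 0.35, 0.20, 0.05, 0.40, 0.55, 0.15],
})

# Keep only the rows that belong to trial 1
trial_1 = toy_df[toy_df["spike_trial"] == 1]

# Number of distinct cells that spiked in trial 1
n_cells_trial_1 = trial_1["spike_cell"].nunique()

# Number of distinct cells per trial: a Series indexed by trial number
cells_per_trial = toy_df.groupby("spike_trial")["spike_cell"].nunique()

print(n_cells_trial_1)            # distinct cells in trial 1
print(cells_per_trial.max())      # largest number of spiking cells
print(cells_per_trial.idxmax())   # trial where that maximum occurs
print(cells_per_trial.idxmin())   # trial with the fewest spiking cells
```

Note that `groupby(...)[...].nunique()` returns a Series, so `max()`, `idxmax()` and `idxmin()` each give a single value here.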



In [None]:
# Download the dataset

sciebo.download_from_sciebo('https://uni-bonn.sciebo.de/s/3Uf2gScrvuTPQhB', 'data/steinmetz_2017-01-08_Muller.nc')

Create a `dataset` with `xr.load_dataset`

Make a DataFrame `df` with spiking data (spike_time, spike_cell, spike_trial)
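
As a rough illustration of the conversion step, the sketch below builds a tiny in-memory xarray Dataset and turns three of its variables into a DataFrame. The dimension name `spike_id` and all values are assumptions made for the example; the real file may be organised differently.

```python
import numpy as np
import xarray as xr

# Toy stand-in for the downloaded file.
# The dimension name "spike_id" is an assumption, and the values are made up.
toy_dataset = xr.Dataset(
    data_vars={
        "spike_time": ("spike_id", np.array([0.10, 0.35, 0.20, 0.05])),
        "spike_cell": ("spike_id", np.array([10, 10, 11, 12])),
        "spike_trial": ("spike_id", np.array([1, 1, 1, 2])),
    },
    coords={"spike_id": np.arange(4)},
)

# Select the variables of interest, convert to a DataFrame,
# and move the index (one entry per spike) into an ordinary column
spike_cols = ["spike_time", "spike_cell", "spike_trial"]
toy_spike_df = toy_dataset[spike_cols].to_dataframe().reset_index()
print(toy_spike_df)
```

With the real data, the Dataset would come from `xr.load_dataset` applied to the downloaded `.nc` file instead of being built by hand.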

**Exercises**

**Example Exercise**

How many cells spiked during trial 1?

In [None]:
df[df['spike_trial'] == 1]['spike_cell'].nunique()

How many cells spiked during trial 24?

How many cells spiked during the last trial?

Make a list of the number of spiking cells of each trial. **Hint** - use `df.groupby('spike_trial')`

Name the result `n_spiking_cells`

What was the maximum number of spiking cells?

Which trial had the most spiking cells?

Name this trial `trial_num_with_most_spikes`

Use the trial number you found above to index `n_spiking_cells`. What is the result?

Find the trial number in which the fewest cells spiked, and the number of cells that spiked in that trial.

Let's visualize the distribution of the number of spiking cells with a matplotlib histogram. 

Make the same histogram as above but with steps
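
For orientation, here is a minimal sketch of an outlined ("step") histogram on made-up counts. The `histtype` argument of `plt.hist` controls the drawing style; the values and the names `toy_counts`, `counts` and `bin_edges` are illustrative only.

```python
import matplotlib.pyplot as plt

# Made-up numbers of spiking cells per trial
toy_counts = [12, 15, 15, 18, 20, 20, 20, 22, 25, 30]

# histtype="step" draws only the outline of the histogram.
# plt.hist returns the bin counts, the bin edges, and the drawn patches.
counts, bin_edges, _ = plt.hist(toy_counts, bins=5, histtype="step")
plt.xlabel("Number of spiking cells")
plt.ylabel("Number of trials")
```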


## Visualize the spiking activity using spike times



Now that we have the spike timing data, let's visualize them, and some of their properties, using matplotlib:

| Command                          | Description                                                        |
|----------------------------------|--------------------------------------------------------------------|
| `plt.scatter(x, y)`              | Create a scatter plot of `x` vs `y`. `x` and `y` are both 1D arrays. |
| `plt.xlabel("your_text")`        | Label the x-axis of the plot with specified text.                  |
| `plt.ylabel("your_text")`        | Label the y-axis of the plot with specified text.                  |
| `plt.title("your_text")`         | Add a title to the plot with the specified text.                   |



**Example**

Create a scatter plot of the spikes of neuron number 12, with `spike_time` on the x-axis and `spike_trial` on the y-axis.

In [None]:
dd = df[df['spike_cell'] == 12]
plt.scatter(dd['spike_time'], dd['spike_trial'])

Create a scatter plot of the spikes of neuron number 11, with `spike_time` on the x-axis and `spike_trial` on the y-axis.

Instead of a dot, plot each spike as a `|` marker. Use `plt.scatter?` to find out how.

By default, matplotlib does not provide axis labels. 

Remake your plot with axis labels


Create a scatter plot of the spikes of neuron number 83, with `spike_time` on the x-axis and `spike_trial` on the y-axis.

We can vary the color of the points based on the spike time with the `c` argument of `plt.scatter`.

Remake the above plot setting the color of the points to the spike time.
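
A minimal sketch of how the `c` argument behaves, using made-up spikes for a hypothetical neuron: `c` takes one value per point, and matplotlib maps those values onto a colormap. The names `toy_times`, `toy_trials` and `points` are illustrative only.

```python
import matplotlib.pyplot as plt

# Made-up spikes for one hypothetical neuron
toy_times = [0.1, 0.4, 0.9, 0.2, 0.7, 1.3]
toy_trials = [1, 1, 1, 2, 2, 2]

# One color value per point; here the color encodes the spike time itself
points = plt.scatter(toy_times, toy_trials, c=toy_times)

# The colorbar shows which color corresponds to which value
plt.colorbar(points, label="Spike time")
plt.xlabel("Spike time")
plt.ylabel("Trial")
plt.title("Toy neuron: spikes colored by spike time")
```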

Remake the plot above, giving the plot an appropriate title 

---

## Relating spike timing to event-based variables

In this section, we will build on the previous section and analyse spike times jointly with other experimental events such as `response_time` or `feedback_time`.

| Command                                               | Description                                                                                     |
|-------------------------------------------------------|-------------------------------------------------------------------------------------------------|
| `pd.merge(df_1, df_2)`                                | Merge `df_1` and `df_2` DataFrames based on common columns.                                     |
| `pd.merge(df_1, df_2, left_on='column_on_df_1', right_on='column_on_df_2')`                                | Merge `df_1` and `df_2` DataFrames based on `column_on_df_1` on `df_1` and `column_on_df_2` on `df_2`.                                     |
| `df.rename(columns={"old_column_name": "new_column_name"})` | Rename the "old_column_name" column to "new_column_name" in the DataFrame.                      |
| `df.sort_values(by="column_b")`                       | Sort the DataFrame based on the values in the "column_b" column in ascending order.             |
| `df['column_a'].factorize()[0]`                       | Encode the "column_a" column as an enumerated type, replacing each unique value with an integer. `[0]` returns just the array of integers. |
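
The two ways of merging listed in the table can be compared on toy data. In this sketch the tables, their values, and the column name `trial` are all made up for illustration.

```python
import pandas as pd

# Made-up spike table and made-up per-trial table
toy_spike_table = pd.DataFrame({
    "spike_trial": [1, 1, 2, 3, 3],
    "spike_time":  [0.2, 0.6, 0.3, 0.1, 0.9],
})
toy_trial_table = pd.DataFrame({
    "trial": [1, 2, 3],
    "response_time": [1.4, 0.8, 1.1],
})

# Option 1: rename so that both tables share a column name, then merge on it
merged = pd.merge(
    toy_spike_table,
    toy_trial_table.rename(columns={"trial": "spike_trial"}),
)

# Option 2: keep the original names and state the key column of each table
merged_explicit = pd.merge(
    toy_spike_table, toy_trial_table, left_on="spike_trial", right_on="trial"
)

print(merged)
print(merged_explicit)
```

Both results attach the trial-level `response_time` to every spike of that trial; the second one additionally keeps both key columns.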


Create a DataFrame `experimental_df` containing experimental events `response_time`, `feedback_time`, `gocue` and `active_trials`

In [None]:

cols = ["response_time", "feedback_time", "gocue", "active_trials"]
experimental_df = dataset[cols].to_dataframe().reset_index().rename(columns={"trial": "spike_trial"})
experimental_df

**Example**

Merge the DataFrames `df` and `experimental_df` into one DataFrame

In [None]:
spike_time_df = pd.merge(df, experimental_df)
spike_time_df

Create a merged dataframe of `df` and `experimental_df` but only where `active_trials` is `True`

Name the result `spike_time_df`

Make a scatter plot showing the spike times of cell number 48 from `spike_time_df`

Overplot the above plot with another scatter plot of the response times against spike trials

We can better assess the relation between the response event and neuron spiking by sorting the trials by response time.

First sort the spike data for cell 48 by response time. Call the result `spike_cell_48_df`

Now add a new column called `spike_trial_sorted` to `spike_cell_48_df` that re-numbers the `spike_trial` values from 0 in their new, sorted order. **Hint** use `df['column_a'].factorize()[0]`
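
To see why factorizing after sorting re-numbers the trials, here is a small sketch on made-up data. `factorize()` assigns integers in order of first appearance, so after sorting by response time the trial with the fastest response gets 0, the next one 1, and so on. The names `toy_cell_df`, `toy_sorted` and `trial_rank` are illustrative only.

```python
import pandas as pd

# Made-up spikes of one hypothetical cell, with the response time of each trial
toy_cell_df = pd.DataFrame({
    "spike_trial":   [1, 1, 2, 3, 3],
    "spike_time":    [0.2, 0.6, 0.3, 0.1, 0.9],
    "response_time": [1.4, 1.4, 0.8, 1.1, 1.1],
})

# Sort rows so that trials with the fastest response come first
toy_sorted = toy_cell_df.sort_values(by="response_time")

# factorize() numbers the values in order of first appearance: 0, 1, 2, ...
# [0] selects the integer codes; [1] would hold the unique original values
toy_sorted["trial_rank"] = toy_sorted["spike_trial"].factorize()[0]

print(toy_sorted[["spike_trial", "response_time", "trial_rank"]])
```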

Make a scatter plot showing the spike time against `spike_trial_sorted` in black, then overplot a scatter plot with the response times on the x-axis and the `spike_trial_sorted` values on the y-axis in crimson. This kind of plot is known as a rasterplot.

Cell 48 clearly has a strong relationship to the subject's response. Can you think of why this may be? What brain region is cell 48 in?

**Hint** - use `dataset["brain_area"].to_dataframe().reset_index()`

Make another rasterplot but for cell 187

What do you notice about the above rasterplot? Do you think that anything special happens around 0.5s into each trial? **Hint** Look at the Attributes of the `dataset`

In [None]:
dataset

## Spiking activity in different brain areas

The dataset records the brain area in which each cell is located. In this section we will use pandas and matplotlib to find the most active brain areas.

| Command                                               | Description                                                                                     |
|-------------------------------------------------------|-------------------------------------------------------------------------------------------------|
| `pd.merge(df_1, df_2)`                                | Merge `df_1` and `df_2` DataFrames based on common columns.                                     |
| `pd.merge(df_1, df_2, left_on='column_on_df_1', right_on='column_on_df_2')`                                | Merge `df_1` and `df_2` DataFrames based on `column_on_df_1` on `df_1` and `column_on_df_2` on `df_2`.                                     |
| `df.sort_values(by="column_b")`                       | Sort the DataFrame based on the values in the "column_b" column in ascending order.             |
| `df['column_a'].factorize()[0]`                       | Encode the "column_a" column as an enumerated type, replacing each unique value with an integer. `[0]` returns just the array of integers. |
| `df['col_A'].value_counts()`                          | Count how many times each unique value appears in `col_A`.                                      |
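
As a small sketch of how merging and `value_counts()` combine, the example below attaches a brain area to every spike of a made-up table and then counts spikes per area. The column name `cell` and the area labels are placeholders chosen for illustration, not values taken from the dataset.

```python
import pandas as pd

# Made-up spikes (one row per spike) and a made-up cell-to-area lookup table
toy_spike_cells = pd.DataFrame({"spike_cell": [1, 1, 2, 3, 3, 3]})
toy_areas = pd.DataFrame({
    "cell": [1, 2, 3],
    "brain_area": ["VISp", "CA1", "VISp"],  # placeholder labels
})

# The key columns have different names, so each one is stated explicitly
toy_spike_areas = pd.merge(
    toy_spike_cells, toy_areas, left_on="spike_cell", right_on="cell"
)

# One row per spike, so counting rows per area gives spikes per area
spikes_per_area = toy_spike_areas["brain_area"].value_counts()
print(spikes_per_area)
```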


Create a dataframe that contains the brain area and call it `brain_region_df`.


Create a DataFrame with information about spikes and brain areas by merging `brain_region_df` and `df` (use the `left_on` and `right_on` parameters)

What brain areas were recorded from this specific mouse?

How many spikes occur in each brain area?

We will take the spikes for trial 187, sort them by `brain_area`, and create another column named `ordered_index` by factorizing `spike_cell`

Let's make a rasterplot for one brain area showing the `spike_time` on the x axis and `ordered_index` on the y axis

Which brain areas are in `dd`?

For each brain area make a rasterplot on the same figure 
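
One possible pattern for drawing several groups on the same figure is to loop over a `groupby` and call `scatter` once per group on a shared axes. The sketch below uses made-up data and placeholder area labels, and `toy_trial_df` is a hypothetical name.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up spikes from one hypothetical trial; area labels are placeholders
toy_trial_df = pd.DataFrame({
    "spike_time":    [0.1, 0.5, 0.3, 0.8, 0.2, 0.6],
    "ordered_index": [0, 0, 1, 1, 2, 2],
    "brain_area":    ["CA1", "CA1", "CA1", "CA1", "VISp", "VISp"],
})

fig, ax = plt.subplots()

# Each scatter call on the same axes gets the next color of the color cycle
for area, area_df in toy_trial_df.groupby("brain_area"):
    ax.scatter(area_df["spike_time"], area_df["ordered_index"],
               marker="|", label=area)

ax.set_xlabel("Spike time")
ax.set_ylabel("Cell (ordered by brain area)")
ax.legend(title="Brain area")
```

Because the cells were numbered after sorting by brain area, each area occupies its own band of rows in the plot.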

Make the same plot in seaborn