## Introduction

#### The Spatial Selectivity of Cortical Neurons

Sound localization is important for both communication and survival, as it plays a key role in deciding to whom and where to direct attention. It is also useful for picking out individual sounds in loud environments. How well a person or animal perceives surrounding sounds depends on how the incoming audio is processed.

There are several places along the ascending auditory pathway where neurons react to spatial cues. One of these areas is the auditory cortex, though its neurons tend to be more broadly spatially tuned than neurons elsewhere in the auditory system. However, it has also been found that inactivation of the auditory cortex can impair sound localization.

One way to determine how spatial processing takes place in the auditory cortex is by observing the tuning curve and PSTH created by neurons when exposed to stimuli, as this gives us an idea of their spatial selectivity. The A1 cortical neurons have been said to react best to stimuli from the contralateral side, while other cortical neurons may react more or less to both lateral stimuli.

<p style="text-align: center;"><img src="images/spatial_tuning_curves.JPG" alt="drawing" width="600"/></p>

The dataset of this notebook was pulled from a larger set of neurons recorded from ferret auditory cortex. In this experiment, two noise sounds were played from speakers positioned left and right of the midline. Both noises had the same spectral properties, but the level of the sounds changed randomly and independently over the course of each trial. Changing the relative level of the two sounds creates a dynamic inter-aural level difference (ILD), which produces a percept of a single source moving in space. This approach allows the study of "fusion", the process by which two sounds are perceived as a single sound, combining properties of the two sources.

For auditory neurophysiology, we usually refer to spatial sources as contralateral (*contra*) or ipsilateral (*ipsi*) to the cortical hemisphere where the data were recorded.

<p style="text-align: center;"><img src="images/speaker_setup.JPG" alt="drawing" width="300"/></p>

We'll be looking at five simultaneously recorded neurons today.

## Getting started

This notebook will require a new library, **scikit-learn**, which contains several useful functions for modeling and statistical analysis. Today we'll be using it to perform multivariate linear regression in order to identify the relative contribution of each sound to the neural response.

```conda install scikit-learn```

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import tables as tb
import pandas as pd
from sklearn import linear_model
from sklearn.linear_model import LinearRegression

## Cartoon regression example

### Generate dummy data 

To demonstrate how multivariate linear regression works, we'll start with a cartoon example. Here's a very simple dataset, where you record output $y$ while inputs $x_0$ and $x_1$ fluctuate randomly. In this dataset, the value of $y$ depends on $x_0$ but not $x_1$.

In [None]:
X = np.random.randn(100,2)
y = X[:,0:1] + np.random.randn(100,1)/2

### Display it

In [None]:
f,ax=plt.subplots(1,2)
ax[0].scatter(X[:,0],y,s=3)
ax[0].set_xlabel('$X_0$ (Independent variable)')
ax[0].set_ylabel('y (Dependent variable)')
ax[1].scatter(X[:,1],y,s=3)
ax[1].set_xlabel('$X_1$ (Independent variable)');

### Fit a line

The `LinearRegression` model is built into scikit-learn. Given a set of one or more inputs (X) and an output (y), it will fit coefficients for a linear model where y is the weighted sum of the values of X in each sample, plus a constant offset $y_0$:

$y = \beta_0 x_0 + \beta_1 x_1 + y_0$

The slopes of the regression fit lines ($\beta$) are often called "beta weights".
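As a rough cross-check of what the fit computes, the sketch below compares `LinearRegression` with a plain least-squares solve from NumPy on synthetic data. The names `X_demo`, `y_demo` and `demo_regr` are illustrative and not part of the notebook; the comparison relies on `LinearRegression` being an ordinary least-squares fit, with a column of ones added by hand to play the role of $y_0$.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative only): y depends on x0 with slope 1, not on x1
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((100, 2))
y_demo = X_demo[:, 0] + rng.standard_normal(100) / 2

# Fit with scikit-learn
demo_regr = LinearRegression().fit(X_demo, y_demo)

# Same fit by least squares: append a column of ones for the intercept y_0
A = np.column_stack([X_demo, np.ones(len(X_demo))])
coef, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

print("scikit-learn betas:", demo_regr.coef_, "intercept:", demo_regr.intercept_)
print("lstsq betas:       ", coef[:2], "intercept:", coef[2])
```

The two sets of numbers should agree to numerical precision, with the beta for $x_0$ close to 1 and the beta for $x_1$ close to 0.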

In [None]:
regr = linear_model.LinearRegression()
regr.fit(X, y)
slopes=regr.coef_
print("Slope (beta) for y vs. x0:", slopes[0][0])
print("Slope (beta) for y vs. x1:", slopes[0][1])

### Find the intercept

The `intercept_` property tells you $y_0$, where the regression line crosses the y-axis (i.e., the value of $y$ when $x_0 = x_1 = 0$).

In [None]:
intercept=regr.intercept_
print("Y intercept:", intercept)
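One way to convince yourself of this interpretation is to ask a fitted model for its prediction at $x_0 = x_1 = 0$. The sketch below does so on synthetic data with a known offset of 3; all names (`X_zero_demo`, `zero_regr`, and so on) are hypothetical and chosen to avoid clobbering the notebook's variables.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X_zero_demo = rng.standard_normal((100, 2))
# Synthetic output with a true intercept of 3 (illustrative value)
y_zero_demo = 3.0 + X_zero_demo[:, 0] + rng.standard_normal(100) / 2

zero_regr = LinearRegression().fit(X_zero_demo, y_zero_demo)

# Predict the output for a single sample where both inputs are zero
y_at_zero = zero_regr.predict(np.zeros((1, 2)))[0]

print("intercept_:", zero_regr.intercept_)
print("prediction at x0 = x1 = 0:", y_at_zero)
```

Both printed values should match each other and land near 3.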

### Create a Scatterplot with regression line

We can visualize the results by overlaying the regression line on the scatter plots of input-output data for the two channels.

In [None]:
f,ax=plt.subplots(1,2, figsize=(6,3), sharex=True, sharey=True)
x_lim=np.array([X.min(),X.max()])
ax[0].scatter(X[:,0],y,s=3)
ax[0].plot(x_lim,intercept+x_lim*slopes[0][0])
ax[0].set_ylabel('y (Dependent variable)')
ax[0].set_xlabel('$X_0$ (Independent variable)')
ax[1].scatter(X[:,1],y,s=3, color='orange')
ax[1].plot(x_lim,intercept+x_lim*slopes[0][1], color='orange')
ax[1].set_xlabel('$X_1$ (Independent variable)')

## Analyze binaural auditory data

### Load the data

The data/ folder contains stimulus and response data for 5 neurons. The same stimulus was actually presented to each neuron, but for generality, there's a separate file with the stimulus for each one.

In [None]:
stim_a1 = np.loadtxt(open('data/por026c-a1_stim.csv'), delimiter=",")
raster_a1= np.loadtxt(open('data/por026c-a1_raster.csv'), delimiter=",")
stim_b1=np.loadtxt(open('data/por026c-b1_stim.csv'), delimiter=",")
raster_b1=np.loadtxt(open('data/por026c-b1_raster.csv'), delimiter=",")

In [None]:
# compute average PSTH across trials and adjust from spikes/samples to spikes/sec
fs = 100  # samples/sec
psth_a1=np.mean(raster_a1, axis=1) * fs
psth_b1=np.mean(raster_b1, axis=1) * fs

The response data has length T, indicating the total number of distinct stimulus-response samples. Note the stimulus data is Tx2, corresponding to the sound level contra (channel 0) and ipsi (channel 1) to the recording site.
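To make the shapes concrete, here is a small sketch of the PSTH calculation on a made-up raster. The layout (time bins in rows, trials in columns) is an assumption inferred from the `axis=1` averaging above, and the sizes and firing rate are invented for illustration.

```python
import numpy as np

fs_demo = 100                 # samples/sec, matching the notebook's fs
T_demo, n_trials = 500, 20    # hypothetical sizes, not those of the real data

rng = np.random.default_rng(2)
# Hypothetical raster: spike count per time bin (rows) and trial (columns)
raster_demo = rng.poisson(lam=0.1, size=(T_demo, n_trials))

# Averaging across trials (axis=1) gives spikes/sample;
# multiplying by the sampling rate converts to spikes/sec
psth_demo = raster_demo.mean(axis=1) * fs_demo

print("Raster shape:", raster_demo.shape)
print("PSTH shape:", psth_demo.shape)
```

The PSTH has one value per time bin, so its length matches the number of rows in the stimulus array.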

In [None]:
print("Stimulus shape:",stim_a1.shape)
print("PSTH shape:",psth_a1.shape)

### Exercise 1 - Generate a PSTH plot and a stimuli plot to show the data for Neuron b1

Peri-Stimulus Time Histograms (PSTH) are used to visualize the timing and rate of neuronal spiking in response to external stimuli. To generate this plot we will be using `plt.subplots`. Since we will be creating two plots on one figure, ensure that you adjust the arguments to reflect this. The data for the stimulus plot will be `stim_b1` and the data for the PSTH plot will be `psth_b1`. The stimulus plot should be at the top of the stack and so should be drawn on `ax[0]`, and the PSTH plot should be drawn on `ax[1]`.

In [None]:
# Answer 
fig,ax=plt.subplots(nrows=2,ncols=1,figsize=(15,5))
ax[0].plot(stim_b1)
ax[1].plot(psth_b1)

### Exercise 2 - Create a figure legend and labels for your axes

We've created two quick and dirty plots. Now we need to label our axes accurately. Since the plots are stacked one over the other, one plot can share the x-axis of the other. The axes should be labeled as follows:

* Y axis (stimulus): "Amplitude (fraction 80dB)" 
* X axis (PSTH): "Time (sec)"
* Y axis (PSTH): "Spikes/sec"

The sound is calibrated so that a level of 1 is equivalent to 80 decibels (dB), which corresponds to something that is just about uncomfortably loud to a normal listener. Levels of 60-65 dB are equivalent to conversational speech.
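If you want to relate stimulus values to decibels, a conversion along the following lines may help. This is a sketch only: it assumes the stimulus values are linear amplitudes relative to the 80 dB reference, which the notebook does not state explicitly, and `level_to_db` is a hypothetical helper rather than part of the dataset's tooling.

```python
import numpy as np

def level_to_db(level, ref_db=80.0):
    """Hypothetical helper: convert a level expressed as a fraction of the
    reference amplitude into dB, ASSUMING a linear amplitude scale."""
    return ref_db + 20.0 * np.log10(level)

# A level of 1 maps to the 80 dB reference; a tenfold drop in
# amplitude corresponds to a 20 dB decrease under this assumption
print(level_to_db(1.0))
print(level_to_db(0.1))
```

Under this assumption, a level of 0.1 would sit near the conversational range quoted above. A level of exactly 0 maps to negative infinity, so silent samples would need separate handling.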

We also need to have a sense of what the colors of the lines in the stimulus plot mean, and we can do so with a figure legend. Using the `ax[0].legend` function, label the line for channel 0 as "contra" and the line for channel 1 as "ipsi".

Here, `t` provides the x-axis values for both plots: dividing `np.arange(len(psth_b1))` by `fs` converts the x axis from samples to seconds, since 100 samples correspond to 1 sec of data.

In [None]:
t = np.arange(len(psth_b1))/fs

In [None]:
# Answer
fig,ax=plt.subplots(nrows=2,ncols=1,figsize=(15,5))
ax[0].plot(t,stim_b1)
ax[0].set_ylabel('Amplitude (fraction 80dB)')
ax[0].legend(('contra','ipsi'))
ax[1].plot(t,psth_b1)
ax[1].set_ylabel('Spikes/sec')
ax[1].set_xlabel('Time (sec)')

### Exercise 3 - Separate the two stimulus plots to see which signal is likely causing the neuron b1 response

We have now visualized the data for neuron b1. Which signal do you think has more of an effect on the neuron?

This isn't an easy question to answer just from looking at these two graphs, so a good idea would be to separate the two stimulus signals so we can get a better look at the peaks of each stimulus. To do so, we can simply slice out the columns we want to see.

You can copy the previous code and amend it to your needs. The difference this time is that we offset one of the stimulus lines by a value of 0.2. The y-axis is no longer accurate in terms of amplitude, but we get a better view of which stimulus channel is dictating the response.

In [None]:
# Answer
fig,ax=plt.subplots(nrows=2,ncols=1,figsize=(15,5))
ax[0].plot(t,stim_b1[:,0])
ax[0].plot(t,stim_b1[:,1]+0.2)
ax[0].legend(('contra','ipsi'))
ax[0].set_ylabel('Amplitude (fraction 80dB)')

ax[1].plot(t,psth_b1)
ax[1].set_ylabel('Spikes/sec')
ax[1].set_xlabel('Time (sec)')

### Exercise 4 - Find the slopes for the scatterplot

Even after separating the two graphs, it can still be hard to tell which signal is driving the neuronal response. So we'll try another way: linear regression. This will use the provided data points to generate a regression line that shows the correlation between the stimuli and the response.

To start off, we will be saving `stim_b1` and `psth_b1` as `X_b1` and `y_b1`.
scikit-learn makes it easy to fit the regression line. First, create a model with `linear_model.LinearRegression()` and save it to `regr_b1`. Then call `regr_b1.fit()` to find the line of best fit, passing your x and y values as arguments. Finally, read the slopes from the `.coef_` property and save them as `slopes_b1`.

This code can be modified from the cartoon example above.

In [None]:
X_b1 = stim_b1
y_b1 = psth_b1

In [None]:
# Answer
regr_b1 = linear_model.LinearRegression()
regr_b1.fit(X_b1, y_b1)
slopes_b1=regr_b1.coef_

In [None]:
print(slopes_b1)

### Exercise 5 - Find the intercept of the scatterplot 

As with most linear fits, this one needs an intercept. Determine the intercept from the `.intercept_` property and save it to `intercept_b1`.

In [None]:
# Answer
intercept_b1 = regr_b1.intercept_
intercept_b1

### Exercise 6 - Create the scatterplots

Using the information gathered from Exercises 4 and 5, generate the scatterplot. Remember that we have generated two points of data with each function.

`Hint:` This chunk of code can be found at the top of the notebook. Go grab it and change the arguments to match the variables used here.

In [None]:
# Answer
f,ax=plt.subplots(1,2)
x_lim=np.array([X_b1.min(),X_b1.max()])
ax[0].scatter(X_b1[:,0],psth_b1,s=3)
ax[0].plot(x_lim,intercept_b1+x_lim*slopes_b1[0])
ax[1].scatter(X_b1[:,1],psth_b1,s=3, color='orange')
ax[1].plot(x_lim,intercept_b1+x_lim*slopes_b1[1], color='orange')


### Exercise 7 - Label and tidy the two plots

Looking good but not quite there yet. Let's make some aesthetic improvements: add some labels and put some distance between the plots.
The x label should be `Amplitude (fraction 80dB)`, and the y label for both plots should be `Spikes/sec`.

`Hint:` Since the two plots share the same x and y, we can make use of the `sharex` and the `sharey` parameters for `subplots`.

In [None]:
# Answer
f,ax=plt.subplots(1,2, figsize=(6,3), sharex=True, sharey=True)
x_lim=np.array([X_b1.min(),X_b1.max()])
ax[0].scatter(X_b1[:,0],psth_b1,s=3)
ax[0].plot(x_lim,intercept_b1+x_lim*slopes_b1[0])
ax[0].set_ylabel('Spikes/sec')
ax[0].set_xlabel('Amplitude (fraction 80dB)')
ax[0].set_title('Contra')
ax[1].scatter(X_b1[:,1],psth_b1,s=3, color='orange')
ax[1].plot(x_lim,intercept_b1+x_lim*slopes_b1[1], color='orange')
ax[1].set_xlabel('Amplitude (fraction 80dB)')
ax[1].set_title('Ipsi');

The larger slope for channel 0 indicates that this neuron is driven more by the contra channel. But the slope for channel 1 is also positive, indicating that the ipsi channel also has some effect on the response.

### Exercise 8 - Measure variance explained

We have successfully generated a pair of scatterplots for the ipsilateral and contralateral data points for neuron_b1. However, is the variance in the spikes truly caused by the changes in amplitude? One way to find out is to make use of this equation:

$\bar{y} = \text{mean}(y)$

$\text{var}(y) = \text{mean}[(y-\bar{y})^2]$

Error: $E = y - y_{\text{pred}}$

Fraction variance explained = $\frac{\text{var}(y) - \text{var}(E)}{\text{var}(y)}$

The fraction variance explained is equivalent to the square of the correlation coefficient between $y$ and $y_{\text{pred}}$, often called "r-squared". So $r^2 = 1$ is equivalent to explaining all of the variance. In ANOVA terminology, $r^2$ is also known as "eta squared", $\eta^2$.
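The sketch below works through these formulas by hand on synthetic data and compares the result with `explained_variance_score` and with the squared correlation coefficient. The variable names (`X_ve`, `y_ve`, and so on) and the simulated weights are illustrative; the equality with $r^2$ shown here applies to a least-squares fit with an intercept, evaluated on the same data used for fitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import explained_variance_score

# Synthetic data with two inputs and additive noise (illustrative only)
rng = np.random.default_rng(3)
X_ve = rng.standard_normal((200, 2))
y_ve = 2.0 * X_ve[:, 0] + 0.5 * X_ve[:, 1] + rng.standard_normal(200)

y_ve_pred = LinearRegression().fit(X_ve, y_ve).predict(X_ve)

# Fraction variance explained, following the formulas step by step
err = y_ve - y_ve_pred                                # E = y - y_pred
frac_var = (np.var(y_ve) - np.var(err)) / np.var(y_ve)

# Squared correlation coefficient between y and y_pred
r = np.corrcoef(y_ve, y_ve_pred)[0, 1]

# scikit-learn's built-in metric
sk_score = explained_variance_score(y_ve, y_ve_pred)

print("By hand:        ", frac_var)
print("r squared:      ", r ** 2)
print("scikit-learn:   ", sk_score)
```

All three numbers should agree to numerical precision.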

Scikit has us covered in that regard and so we are able to use `.predict()` to help create a plot to help explain the variance. 

In [None]:
y_pred_b1=regr_b1.predict(stim_b1)

Now we can use the `y_pred_b1` to generate our prediction model. For these sets of plots, we can use the answer from exercise 3 as a base. Remember to change the x and y values to reflect the use of the variance explained equation.

Remember, `y_pred_b1` replaces only one of the y-axis data sets. When you are done tweaking the code, import the `explained_variance_score` function from `sklearn.metrics`. Use the function to compare the two y values ($y$ and `y_pred_b1`) and save the result to `r2`. Print the value with the label "Variance explained:".

In [None]:
# Answer
f,ax = plt.subplots(2,1,figsize=(15,5))
ax[0].set_ylabel('Amplitude (fraction 80dB)')
ax[0].plot(t,X_b1[:,0], label='contra')
ax[0].plot(t,X_b1[:,1]+0.2, label='ipsi')
ax[0].legend()

ax[1].set_ylabel('Spikes/sec')
ax[1].set_xlabel('Time (sec)')
ax[1].plot(t,y_b1,label='actual')
ax[1].plot(t,y_pred_b1,label='predicted')
ax[1].legend()

from sklearn.metrics import explained_variance_score
r2=explained_variance_score(y_b1, y_pred_b1)
print("Variance explained:", r2)

### Exercise 9 - Create another PSTH plot and stimulus plot to show the data for Neuron a1

Now that we have gone through each individual step, we have the tools to make the process easier for future data sets. Let's begin again by creating another set of plots for `neuron_a1`. Since we've been through the process already, let's immediately generate a set of plots with the stimulus channels separated.

In [None]:
fig,ax=plt.subplots(nrows=2,ncols=1,figsize=(15,5))
ax[0].plot(t,stim_a1[:,0])
ax[0].plot(t,stim_a1[:,1]+0.2)
ax[0].legend(('contra','ipsi'))
ax[0].set_ylabel('Amplitude (fraction 80dB)')
ax[1].plot(t,psth_a1)
ax[1].set_xlabel('Time (sec)')
ax[1].set_ylabel('Spikes/sec')

If you study this plot, you might guess that the response for this neuron is much more strongly driven by the contra channel than the ipsi channel. Note how the peaks in the response curve align with peaks in the contra stimulus.

### Exercise 10 - Cut down on the work and create a function for this whole process

Let's make quantification easier. We are going to create a prediction model function that takes two inputs, `X` and `Y`, and an optional parameter `fs=100` (which is used to generate the correctly labeled time axis).

We have already worked through all of the parts of the function, so your job is to look through the notebook and assemble them. The function should return a tuple of results: `(slopes, variance_explained)`, so that the user has access to these values for subsequent analysis. `slopes` is a vector of two beta weights for the fit.

`Hint: ` use the code for `variance explained`, and `PSTH/Stimulus` plots. Also remember to add in the set of code that uses the `sklearn.metrics` from above.

In [None]:
# Answer
from sklearn.metrics import explained_variance_score

def prediction_model(X, Y, fs=100):

    regr = linear_model.LinearRegression()
    regr.fit(X, Y)

    Y_pred=regr.predict(X)

    f,ax = plt.subplots(2,1,figsize=(15,5))
    t=np.arange(X.shape[0])/fs
    ax[0].plot(t,X[:,0], label='contra')
    ax[0].plot(t,X[:,1]+0.2, label='ipsi')
    ax[0].set_ylabel('Amplitude (fraction 80dB)')
    ax[0].legend()

    ax[1].plot(t,Y,label='actual')
    ax[1].plot(t,Y_pred,label='predicted')
    ax[1].set_ylabel('Spikes/sec')
    ax[1].set_xlabel('Time (sec)')
    ax[1].legend()
    
    # print() returns None, so return the score itself alongside the slopes
    r2 = explained_variance_score(Y, Y_pred)
    print("Variance explained:", r2)
    slopes = regr.coef_
    return slopes, r2
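To check that a function of this kind returns sensible values, it can help to run the same fit on simulated data where the answer is known. The sketch below uses a hypothetical plot-free helper, `fit_summary`, and made-up weights of 20 and 5 for the two channels; neither comes from the recorded data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import explained_variance_score

def fit_summary(X, Y):
    """Hypothetical helper: return (slopes, variance_explained) without plotting."""
    regr = LinearRegression().fit(X, Y)
    r2 = explained_variance_score(Y, regr.predict(X))
    return regr.coef_, r2

# Simulated stimulus (T x 2) and response with known channel weights
rng = np.random.default_rng(4)
stim_demo = rng.random((300, 2))
resp_demo = 20.0 * stim_demo[:, 0] + 5.0 * stim_demo[:, 1] + rng.standard_normal(300)

# Unpack the returned tuple for later analysis
slopes_demo, var_demo = fit_summary(stim_demo, resp_demo)
print("Slopes (channel 0, channel 1):", slopes_demo)
print("Variance explained:", var_demo)
```

The recovered slopes should be close to the simulated weights, with the larger one on channel 0.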

## Show the prediction model for neuron a1

Now, use `prediction_model` to generate the stimulus and PSTH plots, with the predicted response overlaid, using the $X$ and $Y$ values, which are the stimulus and the recalculated PSTH for neuron a1.

In [None]:
prediction_model(stim_a1, psth_a1)

### Exercise 11 - Create a function to display the scatterplot

This is another process that can be streamlined with a function. Name the function `scatter_plt`, with parameters $X$ and $Y$, the independent and dependent variables, respectively. It can also be built by reusing earlier code from the notebook.

`Hint:` Remember to use the code associated with `regr`, as well as the previous code for our neuron_b1 scatterplot. Change the units as needed.

In [None]:
# Answer
def scatter_plt(X,Y):
    """
    parameters:
       X - stimulus (independent variables)
       Y - response (dependent variable)
    """
    regr = linear_model.LinearRegression()
    regr.fit(X, Y)
    
    f,ax=plt.subplots(1,2, figsize=(6,3), sharex=True, sharey=True)
    intercept = regr.intercept_
    slopes = np.array(regr.coef_)
    x_lim=np.array([X.min(),X.max()])
    ax[0].scatter(X[:,0],Y,s=3)
    ax[0].plot(x_lim,intercept+x_lim*slopes[0])
    ax[0].set_title("Contra")
    ax[0].set_ylabel('Spikes/sec')
    ax[0].set_xlabel('Amplitude (fraction 80dB)')
    ax[1].scatter(X[:,1],Y,s=3, color='orange')
    ax[1].plot(x_lim,intercept+x_lim*slopes[1], color='orange')
    ax[1].set_title("Ipsi")
    ax[1].set_ylabel('Spikes/sec')
    ax[1].set_xlabel('Amplitude (fraction 80dB)')
    
    # Display the figure once; the function has nothing to return
    plt.show()

## Show the newly created scatterplots for neuron a1

Now, use `scatter_plt` to generate a scatterplot using the $X$ and $Y$ values, which are still the stimulus and the recalculated PSTH of neuron a1.

In [None]:
scatter_plt(stim_a1,psth_a1)

Note that the slope for the ipsi channel is very nearly 0, consistent with the observation that neuron a1 is largely driven by the contra stimulus.

### Comparison of the slopes of all five neuronal data sets

Creating the functions means that we can now rapidly go through the rest of the data sets. Let's take a look at how the other three neurons differ in their response to the same stimulus.

### Load the extra data

In [None]:
stim_c1 = np.loadtxt(open('data/por026c-c1_stim.csv'), delimiter=",")
raster_c1= np.loadtxt(open('data/por026c-c1_raster.csv'), delimiter=",")
stim_b2=np.loadtxt(open('data/por026c-b2_stim.csv'), delimiter=",")
raster_b2=np.loadtxt(open('data/por026c-b2_raster.csv'), delimiter=",")
stim_d1=np.loadtxt(open('data/por026c-d1_stim.csv'), delimiter=",")
raster_d1=np.loadtxt(open('data/por026c-d1_raster.csv'), delimiter=",")

In [None]:
# compute PSTH across trials and adjust from spikes/samples to spikes/sec
fs = 100  # samples/sec
psth_c1=np.mean(raster_c1, axis=1) * fs
psth_b2=np.mean(raster_b2, axis=1) * fs
psth_d1=np.mean(raster_d1, axis=1) * fs

### Fit the model and create stimulus/PSTH plots for the three new neurons

In [None]:
slopes_b2,var_b2=prediction_model(stim_b2,psth_b2)

In [None]:
slopes_c1,var_c1=prediction_model(stim_c1,psth_c1)

In [None]:
slopes_d1,var_d1=prediction_model(stim_d1,psth_d1)

In [None]:
scatter_plt(stim_b2,psth_b2)

In [None]:
scatter_plt(stim_c1,psth_c1)

In [None]:
scatter_plt(stim_d1,psth_d1)

Let's also re-run the analysis for the first two neurons.

In [None]:
slopes_a1,var_a1=prediction_model(stim_a1,psth_a1)
slopes_b1,var_b1=prediction_model(stim_b1,psth_b1)

At this point, we want to compare the slopes of all five data sets so that we can see which stimulus channel drove the response of each neuron.

In [None]:
slopes_all=np.array([slopes_a1, slopes_b1,slopes_b2,slopes_c1,slopes_d1])
df = pd.DataFrame({'cell': ['a1','b1','b2','c1','d1'],
                   'contra': slopes_all[:,0], 'ipsi': slopes_all[:,1]})
df

In [None]:
slope_max=slopes_all.max()
f,ax=plt.subplots(1,1,figsize=(3,3))
df.plot.scatter('contra','ipsi', ax=ax)
ax.plot([0,slope_max],[0,slope_max],'k--')
for i,r in df.iterrows():
    ax.text(r['contra'],r['ipsi'],r['cell'])
