# TAD Week 6: Shock the Monkey


Data are from Salzman et al. (1992) "Microstimulation in visual area MT:
Effects on direction discrimination performance," J. Neurosci.
12(6):2331-2355.

RTB wrote it, 26 January 2017
RTB adapted for self-test, 09 October 2017, rainy day

EB translated to Python from Matlab, 29 September 2021

What to do: Login to Learning Catalytics (LC) and join the session for
the module entitled "Shock the Monkey". You will answer a series of
questions based on the guided programming below. Each section begins with
a '##' heading. Read through the text and code comments and follow the
instructions provided. In some cases you will be asked to answer a
question, clearly indicated by 'QUESTION' and a corresponding 'Q#' that
directs you to answer the relevant question in LC. In other cases, you
will be asked to supply missing code, indicated by 'TODO'.

In [None]:
import numpy as np
import scipy.stats
from scipy.stats import norm
import scipy.io as sio
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.optimize
import statsmodels.api as sm
import statsmodels.stats as stats
matplotlib.rcParams.update({'font.size': 18})

## Read in the data from a Newsome lab microstimulation experiment


In [None]:
# Load in data file 
!gdown --id 1zD2X7df8hhgg1dYc1LFT6g4R_LeljxgS


In [None]:
filename = 'es5bRaw.xlsx'
data = pd.read_excel(f'/content/{filename}')

data.head()

Each row is data from a single trial during which the monkey viewed a
stochastic motion display whose signal strength was varied systematically
(Coh) and whose direction was chosen according to the preferred direction
of neurons at the stimulation site. At the end of each trial, the monkey
chose one of two possible directions: preferred direction (PDchoice = 1)
or the null direction (PDchoice = 0). Positive values of Coh indicate a
stimulus moving in the preferred direction; negative values indicate null
direction motion. On any given trial, microstimulation was applied to the
electrode with a probability of 0.5 (Mstim = 1 on microstim trials; 0 on
control trials).

Columns:

1) Mstim: 0/1 for absence/presence of microstimulation

2) Coh: Signed strength of the visual stimulus (% coherent motion)
   positive values indicate motion in the neuron's 'preferred' direction;
   negative values correspond to the opposite, or 'null', direction
   
3) PDchoice: 0/1 for monkey choice in the null/preferred direction

Our scientific questions are "Did the microstimulation influence the
monkey's perceptual decisions? If so, by how much?"


## Plot the data

TODO: We want a plot in which the x-axis is the signed motion coherence
(Coh) and the y-axis is the proportion of preferred decisions, with stim
trials in red and no-stim trials in black. NOTE: Because this step is
rather time-consuming, I've provided the code here, so you don't need to
do anything except execute it to get the figure. But please read through
the code and make sure you understand how it works. There are LC
questions to be answered afterwards.
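
As an alternative to an explicit loop, the same per-condition tally can be sketched with a pandas `groupby`. The example below is a minimal sketch on a small synthetic table: the simulated choices and the coefficients used to generate them are assumptions made for illustration, not the experimental data. It computes the proportion of preferred choices and the Clopper-Pearson ('beta') interval for every Mstim/Coh combination in one pass.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.proportion import proportion_confint

# Synthetic stand-in for the raw trial table (NOT the experimental data)
rng = np.random.default_rng(0)
n_toy = 600
toy = pd.DataFrame({
    'Mstim': rng.integers(0, 2, n_toy),
    'Coh': rng.choice([-20.0, 0.0, 20.0], n_toy),
})
# Simulated choices: P(preferred) rises with Coh and with Mstim (made-up values)
p_pref = 1 / (1 + np.exp(-(0.1 * toy['Coh'] + 1.0 * toy['Mstim'])))
toy['PDchoice'] = rng.binomial(1, p_pref)

# One row per (Mstim, Coh) condition: preferred-choice count and trial count
summary = (toy.groupby(['Mstim', 'Coh'])['PDchoice']
              .agg(n_pref='sum', n_trials='count')
              .reset_index())
summary['prop'] = summary['n_pref'] / summary['n_trials']

# Clopper-Pearson ('beta') 95% interval for all conditions at once
lower, upper = proportion_confint(summary['n_pref'].to_numpy(),
                                  summary['n_trials'].to_numpy(),
                                  alpha=0.05, method='beta')

# errorbar expects distances from the point, not the interval endpoints
summary['err_low'] = summary['prop'] - lower
summary['err_high'] = upper - summary['prop']
print(summary)
```

The `err_low` and `err_high` columns play the same role as the converted 'Lower CI' and 'Upper CI' columns in the loop-based code.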

In [None]:
# We first need to find all of the different stimulus conditions by dropping duplicate rows of Mstim/Coh
all_data_prop = data[['Mstim', 'Coh']].drop_duplicates()

# Sort values by coherence (for plotting later)
all_data_prop = all_data_prop.sort_values(by = 'Coh', ignore_index = True)

# Get number of conditions
n_conds = all_data_prop.shape[0]

# Add three columns of zeros to all_data_prop to hold the calculated proportions
# and confidence intervals (floats, so the values assigned below match the dtype)
all_data_prop['Proportion Preferred'] = 0.0
all_data_prop['Lower CI'] = 0.0
all_data_prop['Upper CI'] = 0.0

# For error bars of the 95% CI
my_alpha = 0.05  

# For each unique condition, we need to find all of the corresponding rows
# in our original raw data file and tally the proportion of those trials on
# which the monkey chose the preferred direction choice target
for k in range(n_conds):

    # Get rows of original data that correspond to this condition
    this_cond = data[(data['Mstim'] == all_data_prop['Mstim'].iloc[k]) & (data['Coh'] == all_data_prop['Coh'].iloc[k])]
    
    # Get proportion of preferred direction choices
    all_data_prop.loc[k, 'Proportion Preferred'] = this_cond['PDchoice'].sum() / this_cond.shape[0]

    # Get confidence intervals of this binomial proportion
    [all_data_prop.loc[k, 'Lower CI'], all_data_prop.loc[k, 'Upper CI']] = stats.proportion.proportion_confint(this_cond['PDchoice'].sum(), this_cond.shape[0], alpha = my_alpha, method = 'beta')

# NOTE: The 'errorbar' function plots error bars that are L(i) + U(i) long.
# That is, it doesn't treat our CI as an interval, but rather as a distance
# from the mean to the end of each error bar. So we need to subtract each
# from the mean:
all_data_prop['Lower CI'] = all_data_prop['Proportion Preferred'] - all_data_prop['Lower CI']
all_data_prop['Upper CI'] = all_data_prop['Upper CI'] - all_data_prop['Proportion Preferred'] 


# Convert Mstim column to a logical for indexing
all_data_prop['Mstim'] = all_data_prop['Mstim'].astype('bool')

# Make plot
fig, ax = plt.subplots(1, 1, figsize = (10, 10))

ax.errorbar(x = all_data_prop.loc[all_data_prop['Mstim'], 'Coh'], 
            y = all_data_prop.loc[all_data_prop['Mstim'], 'Proportion Preferred'],
            yerr = pd.concat([all_data_prop.loc[all_data_prop['Mstim'], 'Lower CI'], 
                              all_data_prop.loc[all_data_prop['Mstim'], 'Upper CI']], axis = 1).values.T, 
            linestyle = '', marker = 'o', color = 'r', label = 'Stim')

ax.errorbar(x = all_data_prop.loc[~all_data_prop['Mstim'], 'Coh'], 
            y = all_data_prop.loc[~all_data_prop['Mstim'], 'Proportion Preferred'],
            yerr = pd.concat([all_data_prop.loc[~all_data_prop['Mstim'], 'Lower CI'], 
                              all_data_prop.loc[~all_data_prop['Mstim'], 'Upper CI']], axis = 1).values.T, 
            linestyle = '', marker = 'o', color = 'k', label = 'No stim')
ax.legend();

ax.set(xlabel = 'Motion strength (%coh)', 
       ylabel = 'Proportion preferred decisions')

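We wanted you to understand what's actually being plotted by going through the code above, but note that we could have made this plot in one line in seaborn (see the next cell). Note that seaborn's default error bars are bootstrapped 95% CIs of the mean, so they can differ slightly from the exact binomial intervals computed above. The beauty of seaborn...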

In [None]:
fig, ax = plt.subplots(1, 1, figsize = (10, 10))

sns.lineplot(data = data, x = 'Coh', y = 'PDchoice', hue = 'Mstim', 
             err_style='bars', palette = ['k', 'r'], linestyle = '', marker = 'o');

**QUESTION (Q1)**: How many trials did the monkey perform in this experiment?

**QUESTION (Q2)**: How many unique types of trial were there?

**QUESTION (Q3)**: How many repetitions of each trial type did the monkey
perform?

**QUESTION (Q4)**: The error bars in the figure correspond to the 95% CI from
the binomial distribution. Name two different ways in which we could make
the error bars smaller.


## Fit full model using statsmodels

TODO: Write down the full regression model, including an interaction term
for microstimulation and signal strength (we will make this a new pandas column). Remember that you want to
pass the RAW data as your arguments to 'sm.GLM'


In [None]:
data['Interaction'] = ...

model = sm.GLM(...)

results = model.fit()

print(results.summary())

**QUESTION (Q5)**: Is the interaction term statistically significant at p < 0.05?


TODO: If your answer to the question is 'no', re-do the fit without the interaction term.

In [None]:
# The p-value for the interaction term is stored in results.pvalues
if results.pvalues['Interaction'] > 0.05:
    model = sm.GLM(...)

    results = model.fit()

    print(results.summary())

**QUESTION (Q6)**: What is the scientific meaning of beta0 (i.e. the
y-intercept) in our model?

**QUESTION (Q7)**: Look at your graph. What would a value of 0 for the beta0
coefficient correspond to in terms of the probability of the monkey
making a preferred decision choice on trials where there was no
microstimulation and the motion strength was zero?

**QUESTION (Q8)**: Did microstimulation affect the monkey's choices at a
significance level of p < 0.05?

**QUESTION (Q9)**: What is the p-value for the model parameter capturing the
effect of microstimulation on the monkey's choices?


## Plot the model fits

TODO: Plot the regression lines on top of the raw data. Make a separate
line for the stim (red line) and no-stim (black line) predictions.
Remember that our y-axis is in units of 'proportion preferred decisions'
and NOT in log(P/(1-P)). HINT: You need to solve for 'P':

P = 1 / {1 + exp[-(b0 + b1 * stim + b2 * coh)]}
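
The conversion from log-odds to probability can be sketched numerically. The coefficients below are hypothetical values chosen for illustration, not fitted results, and `p_preferred` is a helper name introduced only for this example.

```python
import numpy as np
from scipy.special import expit  # expit(z) = 1 / (1 + exp(-z))

# Hypothetical coefficients, for illustration only (not fitted values)
b0, b1, b2 = -0.5, 1.5, 0.1

coh_grid = np.linspace(-30, 30, 7)

def p_preferred(stim, coh):
    """Probability implied by the log-odds b0 + b1*stim + b2*coh."""
    return 1 / (1 + np.exp(-(b0 + b1 * stim + b2 * coh)))

p_ctrl_demo = p_preferred(0, coh_grid)   # stim = 0
p_stim_demo = p_preferred(1, coh_grid)   # stim = 1

# scipy's expit computes the same inverse-logit transform
same_as_expit = np.allclose(p_stim_demo, expit(b0 + b1 + b2 * coh_grid))

print(np.round(p_ctrl_demo, 3))
print(np.round(p_stim_demo, 3))
print(same_as_expit)
```

With a positive `b1`, the stim curve lies above the control curve at every coherence, which corresponds to a leftward shift of the sigmoid.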


In [None]:
# Make coherence vec to compute predictions and get better plots
coh = np.arange(data['Coh'].min(), data['Coh'].max(), .01)

fig, ax = plt.subplots(1, 1, figsize = (10, 10))

sns.lineplot(data = data, x = 'Coh', y = 'PDchoice', hue = 'Mstim', 
             err_style='bars', palette = ['k', 'r'], linestyle = '', marker = 'o', ax = ax)


p_stim = ...
p_no_stim = ...

...  # your plotting code here. Hint: probably easiest to just use ax.plot here instead of seaborn

## Determine the "equivalent visual stimulus" for microstimulation

We want to know not just "if" microstimulation had an effect; we would
also like to estimate the magnitude of the effect. Look at the plot you
made. You can see that there is a certain signal strength at which the
monkey is equally likely to choose the preferred vs. the non-preferred
direction of motion. To find it, we would draw a horizontal line from the
y-axis value of 0.5 to our curve, then drop a vertical line to the x-axis
to get a value of the motion strength. We call this value the "Point of
Subjective Equality" (PSE), because it represents the visual stimulus for
which the monkey was indifferent to the two choices. We can calculate the
effect of microstimulation in units of the visual stimulus (%coh) by
subtracting the PSE during microstimulation trials from that during
control trials.

TODO: Determine the "equivalent visual stimulus" for microstimulation

Two possible approaches:

1) Algebra. Using the regression equation, solve for the signal strength
at which the animal is equally likely to report preferred or null. This
is referred to in the psychophysical literature as the "Point of
Subjective Equality" or "PSE". We calculate the signal strength at PSE
for the stim curve and subtract this from the signal strength at PSE for
the ctrl curve. This will give us an equation in terms of the beta
parameters in our model. Very elegant!

2) Brute force. We have the regression equation, so we can input a very
finely spaced set of coh values and find the one that gives us a value of
0.5. Since we're unlikely to get exactly 0.5, we would choose some narrow
range straddling 0.5 and then take the average. You'll need to use your
indexing chops on this one.
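
Both approaches can be sketched with hypothetical coefficients (illustrative values, not fitted results), assuming the model without an interaction term. For the algebra: P = 0.5 exactly when the log-odds are zero, i.e. b0 + b1 * stim + b2 * coh = 0, so each PSE is the coh value that solves that equation. The helper `pse_brute` and its tolerance `tol` are names introduced only for this sketch.

```python
import numpy as np

# Hypothetical coefficients, for illustration only (not fitted values)
b0, b1, b2 = -0.5, 1.5, 0.1

# Approach 1, algebra: P = 0.5 when b0 + b1*stim + b2*coh = 0
pse_ctrl = -b0 / b2            # stim = 0
pse_stim = -(b0 + b1) / b2     # stim = 1
shift_algebra = pse_ctrl - pse_stim   # simplifies to b1 / b2

# Approach 2, brute force: scan a fine grid of coh values and average
# those whose predicted P falls in a narrow band around 0.5
coh_fine = np.arange(-50, 50, 0.001)

def pse_brute(stim, tol=1e-3):
    p = 1 / (1 + np.exp(-(b0 + b1 * stim + b2 * coh_fine)))
    in_band = np.abs(p - 0.5) < tol
    return coh_fine[in_band].mean()

shift_brute = pse_brute(0) - pse_brute(1)

print(shift_algebra, shift_brute)
```

The brute-force answer depends on the grid spacing and the band width, so it only approximates the algebraic one. The grid must also span both PSEs, otherwise the band is empty.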


In [None]:
... # Your code here

**QUESTION (Q10)**: How much signal would need to be added to the random dot
display in order to match the effect of microstimulation on the monkey's
choices? Give your answer as a positive percentage to 1 decimal place.

**QUESTION (Q11)**: Upload your final figure to Learning Catalytics.

Be sure to also answer questions 12-14 on LC. They refer to the same
analysis on a different data set (js92dRaw.xlsx); one where there is a
significant interaction term. You don't need to do any additional coding,
just think about the resulting model and answer the 3 questions about the
interaction term. Hint: Write down the equation!!!


In [None]:
!gdown --id 1doXgs3xKi4EdbkN-WvIqTBXHdAoEnFYv

filename = 'js92dRaw.xlsx'
data2 = pd.read_excel(f'/content/{filename}')

data2.head()

## Bonus questions

1) The GLM is a beautiful framework, because we get so much "for free,"
such as standard errors on our betas as well as significance tests for
whether or not they are different from 0. But let's suppose this wasn't
the case. How would you design your own procedure to explicitly test our
primary null hypothesis, which is that microstimulation has no effect on
the monkey's choices?

2) We have a nice point estimate of an "effect size" (or "equivalent visual
stimulus") of 14.4% coh, but we would also like to know its precision.
Design a procedure to determine the standard error for the effect size.

3) The logistic regression model you built can be used as a classifier.
The idea is that you would fit the model to some random subset of the
data, then ask how well that model predicts the animal's performance on
the remaining trials (i.e. those NOT used to fit the model). This is
referred to as the 'cross-validated performance' of our classifier
(similar to the cross-validation we did in week #4). Determine the
performance of the logistic regression model using 5-fold
cross-validation and compare this performance with that of linear
discriminant analysis and that of a support vector machine. (HINT: the
original Matlab exercise used 'crossvalind', 'classify' and 'fitcsvm';
in Python, scikit-learn's 'KFold', 'LinearDiscriminantAnalysis' and
'SVC' play the same roles.)