## CCNSS 2018 module 2
# Tutorial 3 - Psychophysics and probabilistic modelling



*Please execute the cell below to initialize the notebook environment*

In [2]:
from IPython.display import HTML
from IPython.display import display
import matplotlib.pyplot as plt    # import matplotlib
import numpy as np                 # import numpy
import scipy as sp                 # import scipy
import pandas as pd                # import pandas
import math                        # import basic math functions
import random                      # import basic random number generator functions
import time                        # import time function to time minimize
from scipy.optimize import minimize #import optimization functions
from scipy.stats import beta       # import beta distribution for BetaBinomial model

fig_w, fig_h = (6, 4)
plt.rcParams.update({'figure.figsize': (fig_w, fig_h)})
plt.style.use('ggplot')

In [3]:
# Defines the function 'hide_toggle', which shows/hides the solution for each exercise

def hide_toggle(for_next=False):
    this_cell = """$('div.cell.code_cell.rendered.selected')"""
    next_cell = this_cell + '.next()'

    toggle_text = 'Show/hide Solution below'  # text shown on toggle link
    target_cell = this_cell  # target cell to control with toggle
    js_hide_current = ''  # bit of JS to permanently hide code in current cell (only when toggling next cell)

    if for_next:
        target_cell = next_cell
        toggle_text += ' '
        js_hide_current = this_cell + '.find("div.input").hide();'

    js_f_name = 'code_toggle_{}'.format(str(random.randint(1,2**64)))

    html = """
        <script>
            function {f_name}() {{
                {cell_selector}.find('div.input').toggle();
            }}

            {js_hide_current}
        </script>

        <a href="javascript:{f_name}()">{toggle_text}</a>
    """.format(
        f_name=js_f_name,
        cell_selector=target_cell,
        js_hide_current=js_hide_current, 
        toggle_text=toggle_text
    )

    return HTML(html)

## Objectives

In this notebook we'll clean up data from a 2-Alternative Forced Choice (2-AFC) task and plot psychometric functions. 

We will also analyse behavioural data from a task that measures participants' innate priors, and implement a conjugate Beta-Bernoulli model that performs the task optimally (an ideal observer) and can recover the participants' priors from their behaviour.

## Background

Psychophysics quantitatively investigates the relationship between physical stimuli and the sensations and perceptions they produce. It is a general class of methods that can be applied to study a perceptual system. Modern applications rely heavily on threshold measurement, ideal observer analysis, and signal detection theory.

In psychophysics, experiments seek to determine whether the subject can detect a stimulus, identify it, differentiate between it and another stimulus, or describe the magnitude or nature of this difference.

A psychometric function is an inferential model applied in detection and discrimination tasks. It models the relationship between a given feature of a physical stimulus, (e.g. velocity, duration, brightness, weight etc.), and forced-choice responses of a human test subject. The psychometric function therefore is a specific application of the generalized linear model (GLM) to psychophysical data. The probability of response is related to a linear combination of predictors by means of a sigmoid link function (e.g. probit, logit, etc.).

We analyze data from a two-alternative forced choice task (e.g. the random-dot task). In these tasks, ambiguous evidence for two alternative choices is presented to an observer. The ambiguity results in imperfect performance that varies with the strength of the ambiguity. This relationship is quantified by the "psychometric function".
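As a minimal sketch of the idea (not the model fitted later in this notebook), a psychometric function can be written as a logistic function of signed stimulus strength. The parameter names `bias` and `slope` and the coherence values below are illustrative assumptions, not values taken from the tutorial data.

```python
import numpy as np

def logistic_psychometric(signed_coherence, bias=0.0, slope=0.1):
    """P(choose right) as a logistic function of signed coherence.

    `bias` and `slope` are hypothetical illustration parameters,
    not values estimated from the tutorial data.
    """
    x = np.asarray(signed_coherence, dtype=float)
    return 1.0 / (1.0 + np.exp(-slope * (x - bias)))

# Hypothetical coherence levels: negative = leftward, positive = rightward motion
coherences = np.array([-51.2, -12.8, 0.0, 12.8, 51.2])
p_right = logistic_psychometric(coherences)
print(np.round(p_right, 3))
```

With `bias=0`, the curve passes through 0.5 at zero coherence; a non-zero `bias` shifts the curve left or right, and `slope` controls how quickly performance improves with stimulus strength.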

***EXERCISE 1 : Data wrangling/munging with Pandas***

Loading the data

***Suggestions***

* Read the data from file 'dots_psychophysics.txt'
* Each line is one trial, the columns encode
    - coherence of random dot pattern
    - direction of random dot pattern
    - the direction the monkey chose
    - if the monkey was rewarded
    - the monkey's reaction time
* direction / choice is encoded as: 1 = 0% coherence, 2 = left stimuli, 3 = right stimuli
* on 0% coherence trials the monkey was rewarded randomly
* replace numbers '2' and '3' by 'left' and 'right' respectively, and rewarded '1' and '0' into booleans (hint: look up the function 'replace()' in pandas)
* print the first 5 rows of the data (hint: look up function 'head()' in pandas )
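
The recoding step can be tried on a small made-up table first. The sketch below uses a hypothetical `toy` DataFrame (not the real data file) to show how `replace()` accepts a per-column dictionary and how a 0/1 column is converted to booleans.

```python
import pandas as pd

# Hypothetical toy data using the same numeric codes as the task description
toy = pd.DataFrame({
    'direction': [1, 2, 3, 2],
    'choice':    [1, 2, 2, 3],
    'rewarded':  [1, 1, 0, 0],
})

# One mapping reused for both columns: 1 = 0% coherence, 2 = left, 3 = right
code_to_label = {1: '0%', 2: 'left', 3: 'right'}
toy = toy.replace({'direction': code_to_label, 'choice': code_to_label})

# Convert the 0/1 reward flag into booleans
toy['rewarded'] = toy['rewarded'].astype(bool)
print(toy.head())
```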


In [4]:
#insert your code here

hide_toggle(for_next=True)

In [None]:
### Solution

dotsData = pd.read_csv('data/dots_psychophysics.txt', 
                       delimiter=' ', skipinitialspace=True, index_col=None, header=None, 
                       names=['coherence', 'direction', 'choice', 'rewarded', 'rt'])

dotsData.replace({'direction': {1.: '0%', 2.: 'left', 3.: 'right'},
                  'choice': {1.: '0%', 2.: 'left', 3.: 'right'},
                 }, inplace=True)

dotsData['rewarded'] = dotsData['rewarded'].astype(bool)  # the builtin bool works across NumPy versions

dotsData.head()

***Expected Output***

![](./figures/expected_ex1.png)

***EXERCISE 2: Data manipulation with Pandas***

Checking the data

***Suggestions***

* Check that whenever coherence is 0, direction also encodes the 0% stimulus, and print 'True' or 'False'
* Plot the distribution of choices when coherence == 0
* Check that for non-zero-coherence trials, whenever the direction == choice then rewarded == True. Print 'True' or 'False'

In [5]:
#insert your code here

hide_toggle(for_next=True)

In [None]:
### Solution

check1 = dotsData[dotsData['coherence'] == 0.] \
        ['direction'] \
        .unique() \
        == '0%'
        
coherentTrials = dotsData[dotsData['coherence'] != 0.] 
check2 = coherentTrials[coherentTrials['direction'] == coherentTrials['choice']] \
        ['rewarded'] \
        .unique()

print('- Check 1 (coherence 0 -> 0% stim): ' + str(check1))
print('- Check 2 (direction == choice -> rewarded?): ' + str(check2))

nChoicesAt0Coherence = dotsData[dotsData['coherence'] == 0.] \
                                .groupby('choice') \
                                .count() \
                                ['coherence'] # all columns contain the same information

# normalize to proportions
fracChoicesAt0Coherence = nChoicesAt0Coherence / nChoicesAt0Coherence.sum()
fracChoicesAt0Coherence.plot(kind='bar')
plt.ylabel('fraction of trials chosen')

***Expected Output***

![](./figures/expected_ex2.png)

***EXERCISE 3 : Data Wrangling and manipulation in Pandas***

Checking the data

***Suggestions***

* Using a bar plot, plot the number of trials, broken down by stimulus direction and coherence. 

In [6]:
#insert your code here

hide_toggle(for_next=True)

In [None]:
### Solutions

for direction, directionData in dotsData.groupby('direction'):
    plt.figure()
    plt.title('Number of available trials for direction == %s' % direction)
    directionData.groupby('coherence')['coherence'] \
                    .count() \
                    .plot(kind='bar')
    plt.ylabel('number of trials')

***Expected Output***

![](./figures/expected_ex3.png)

***EXERCISE 4: Psychometric function***

***Suggestions***

* Remove aborted trials, i.e. non-zero-coherence trials in which the monkey chose the stimulus direction but wasn't rewarded
* Plot the psychometric function, i.e. the fraction of rightward choices vs. signed coherence (plot the fraction of 'right' choices for all coherence levels, where coherence is set to negative values for leftward stimuli)
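
The signed-coherence idea can be sketched on a few made-up trials. The `toy` table and the column names `signed_coherence` and `chose_right` are assumptions for illustration only; this is one possible route, not necessarily the one used in the solution.

```python
import pandas as pd

# Hypothetical toy trials (not the real data)
toy = pd.DataFrame({
    'coherence': [12.8, 12.8, 12.8, 12.8, 0.0, 0.0],
    'direction': ['left', 'left', 'right', 'right', '0%', '0%'],
    'choice':    ['left', 'right', 'right', 'right', 'left', 'right'],
})

# Leftward stimuli get negative coherence, rightward stimuli positive
sign = toy['direction'].map({'left': -1.0, '0%': 0.0, 'right': 1.0})
toy['signed_coherence'] = sign * toy['coherence']

# Fraction of 'right' choices at each signed coherence level
toy['chose_right'] = toy['choice'] == 'right'
p_right = toy.groupby('signed_coherence')['chose_right'].mean()
print(p_right)
```

Because the grouping key is already signed, the result comes out sorted from strong leftward to strong rightward evidence and can be plotted directly.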

In [7]:
#insert your code here

hide_toggle(for_next=True)

In [None]:
### Solution

cond_aborted = (dotsData['coherence'] > 0) & \
         (dotsData['direction'] == dotsData['choice']) & \
         (dotsData['rewarded'] == False)
dotsData = dotsData[~cond_aborted]

zeroPsychometricFunction = pd.Series((dotsData[dotsData['direction'] == '0%']['choice'] == 'right').mean(), index=[0.])
    
rightPsychometricFunction = dotsData[dotsData['direction'] == 'right'] \
                            .groupby('coherence') \
                            .apply(lambda df: (df['choice'] == 'right').mean())
        
leftPsychometricFunction = dotsData[dotsData['direction'] == 'left'] \
                            .groupby('coherence') \
                            .apply(lambda df: (df['choice'] == 'right').mean())
# flip sign of coherence (which is the index) for leftPsychometricFunction to combine left and right afterwards
leftPsychometricFunction.index *= -1

# combine the three psychometric functions in one Series
psychometricFunction = pd.concat([leftPsychometricFunction, zeroPsychometricFunction, rightPsychometricFunction]) \
                            .sort_index()

psychometricFunction.plot()
plt.xlabel('Coherence')
plt.ylabel('Fraction of right choices')
plt.title('Psychometric function')

***Expected Output***

![](./figures/expected_ex4.png)

# Probabilistic modelling

## Background : Optimism Task

Optimists hold positive a priori beliefs about the future. In Bayesian statistical theory, a priori beliefs can be overcome by experience. However, optimistic beliefs can at times appear surprisingly resistant to evidence, suggesting that optimism might also influence how new information is selected and learned. 

Here, we will model how innate optimistic biases influence behaviour in a Pavlovian conditioning task. That is, how participants' prior beliefs about *'How likely is something good to happen'* will change their behaviour in a Pavlovian instrumental task. 

***Task***

The experiment contained two types of screens (see figure below):

1. A series of observation screens, which subjects have to passively observe. On each of these screens a fractal stimulus was shown to be associated with a binary reward (the presentation of the fractal was followed after 700 ms by the presentation of a full treasure chest) or not (the fractal led to an empty chest).
2. A decision screen, on which participants were asked to choose between a fractal stimulus (that was observed a couple of times, paired with treasures or not) and a blue square with a known probability of winning. In this decision phase, the subjects were told to maximize reward gains.

![](./figures/Optimism_task.png)

In the example above, the subject sees the yellow fractal twice: once associated with a reward (full treasure chest), and once without a reward (empty treasure chest). That is, the expected value of the yellow fractal is 0.5 (i.e. (1 reward + 0 rewards) / 2 presentations). The participant then arrives at a decision screen (D1), where he/she needs to make a 2-Alternative Forced Choice (2-AFC) between the yellow fractal and the blue target. 

The probability that the blue target results in a reward (its expected value) is 0.6, denoted by the number of black dots out of 10 (i.e. 6 black dots out of 10). In this particular example, the participant should choose the blue target if he/she wants to maximize the chance of getting a reward on that trial. This is because the expected value of the yellow fractal is 0.5, while the known expected value of the target is 0.6. This means that the target has a higher probability of leading to a reward than the fractal.
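
The arithmetic of this example can be written out in a few lines. The variable names are illustrative; the numbers are those of the example above.

```python
# Fractal: rewarded once out of two observed presentations
n_rewarded, n_presented = 1, 2
fractal_value = n_rewarded / n_presented      # empirical reward rate

# Target: 6 black dots out of 10 signal the known reward probability
target_value = 6 / 10

# A reward-maximizing observer picks the option with the higher value
best_option = 'fractal' if fractal_value > target_value else 'target'
print(fractal_value, target_value, best_option)
```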

Participants see 60 different fractals, each presented between 3 and 8 times. Since participants cannot accurately count and remember the number of times each fractal was presented with a reward, they must rely on heuristics, an approximation and/or a 'gut feeling' about the *'goodness of a fractal'* based on previous experience with that fractal. This is where we believe the participants' *'optimism bias'* will come into play and shift their decisions.

[Ref] http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003605

***EXERCISE 5***

Loading the data

***Suggestions***

* Read the data from file 'Optimism_data.csv' or 'Optimism_data.txt'
* Each line is one decision trial, the columns encode
    - Column 1: the subject number
    - Column 2: the number of times a fractal was rewarded
    - Column 3: the number of times a fractal was presented
    - Column 4: (ignore for now) 
    - Column 5: the target probability of reward
    - Column 6: Whether participant chose the fractal (1) or target (2)
    - ignore other columns
* decision / choice is encoded as: 1 = chooses fractal stimulus, 2 = chooses target stimulus
* replace numbers '1' and '2' by 'fractal' and 'target' respectively
* print the first 5 rows of the data (hint: look up function 'head()' in pandas )

In [8]:
#insert your code here

hide_toggle(for_next=True)

In [None]:
### Solution

# Path relative to the notebook directory (same 'data/' folder as the dots data)
df = pd.read_csv('data/Optimism_data.csv', delimiter=',')

# Recode choice so that fractal = 1 and target = 0
df.replace({'choice': {1.: '1', 2.: '0'},
                 }, inplace=True)

# Builtin float: the np.float alias was removed in NumPy 1.24
df['choice'] = df['choice'].astype(float)

df.head()

***Expected Output***

![](./figures/expected_ex5.png)

***EXERCISE 6***

Calculate and plot the probability of choosing the fractal as a function of the difference between the true expected value of the fractal and that of the target.

***Suggestions***

* For each trial, calculate the difference between the fractals' and the targets' expected reward
* Bin the difference values into bins of [fractal-target] with the following range : freq_bins=[-1.01, -0.5, 0, 0.5, 1.01]
* Calculate the mean rate of choosing the fractal per bin and plot it for subject 24.
* Fit a sigmoidal response curve to the participant data and plot it (hint: use the sigmoid function below, you may want to use `scipy.optimize.curve_fit`)
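
The binning step can be sketched on made-up values before applying it to the real data. The `toy` table and its column names are assumptions for illustration; `pd.cut` assigns each difference to a half-open interval `(left, right]` defined by `freq_bins`.

```python
import numpy as np
import pandas as pd

freq_bins = np.array([-1.01, -0.5, 0, 0.5, 1.01])

# Hypothetical toy trials: value difference (fractal - target) and choice (1 = fractal)
toy = pd.DataFrame({
    'diff':   [-0.8, -0.6, -0.2, -0.1, 0.3, 0.4, 0.9, 0.7],
    'choice': [0,    0,    0,    1,    1,   0,   1,   1],
})

# Assign each trial to a bin, then average the choices within each bin
toy['binned'] = pd.cut(toy['diff'], bins=freq_bins)
p_fractal = toy.groupby('binned', observed=False)['choice'].mean()
print(p_fractal)
```

The outer edges are set to ±1.01 rather than ±1 so that a difference of exactly -1 still falls inside the first bin, whose left edge is exclusive.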

In [9]:
#insert your code here    

hide_toggle(for_next=True)

In [None]:
### Solution

from scipy.optimize import curve_fit

def sigmoid(x, x0, k):
    y = 1 / (1 + np.exp(-k*(x-x0)))
    return y

hide_toggle(for_next=True)

In [None]:
### Solution

# Create column 'diff_frac2targ' which computes difference in expected value between fractal and target
df['diff_frac2targ']=df['bin_coef']-df['target_coef']

# Define the bins for plotting psychometric curve
freq_bins= np.array([-1.01, -0.5, 0, 0.5, 1.01])

#Create 'binned' column and bin each trial as belonging to either bin defined in freq_bins
df['binned']= pd.cut(df['diff_frac2targ'], bins=freq_bins)

#Select data for subject 24 only
df1 = df.loc[(df['subject_id']==24)]

#Group the choices of subject 24 by bin
choice_groupby_binned = df1['choice'].groupby(df1['binned'])
df1['choice'].groupby(df1['binned']).describe()

#Plot p(choose fractal) as function of difference in value between fractal & target
plt.figure()
my_vals=np.array(choice_groupby_binned.mean())
choice_groupby_binned.mean().plot(xticks=range(len(my_vals)))
plt.ylabel('Probability of choosing `fractal`')
plt.xlabel('Binned: Difference in reward probability (Fractal-Target)')
plt.title('Psychometric function, Subject: '+ str(24))

#Fit a sigmoid to the binned data and plot it
my_vals=np.array(choice_groupby_binned.mean())
my_vals=my_vals[~np.isnan(my_vals)]
popt, pcov = curve_fit(sigmoid, range(len(my_vals)), my_vals)
plt.plot(range(len(my_vals)), my_vals, 'o', label='data')
plt.plot(np.linspace(0, len(my_vals)-1, 100),sigmoid(np.linspace(0, len(my_vals)-1, 100), *popt), label='Sigmoidal fit')
plt.legend(loc='best')
plt.show()

***Expected Output***

![](./figures/expected_ex6.png)

***(BONUS) EXERCISE 7***

Calculate and plot the psychometric function for all subjects (1 to 50)

***Suggestions***

* Loop over participants, and process their data sequentially
    * Reuse your previous code, which calculates the psychometric curve for a given participant's data.
    * Plot each participant's psychometric curve with alpha=0.4
* Note how subjects can have different psychometric functions that are shifted either left or right. What do you think this means? (hint: think of it in terms of a bias -- optimism/pessimism)

In [10]:
# insert your code here

hide_toggle(for_next=True)

In [None]:
### Solution

plt.figure()
plt.ylabel('Probability of choosing `fractal`')
plt.xlabel('Binned: Difference in reward probability (Fractal-Target)')
plt.title('Psychometric function for all subjects')

#Loop over subjects and process their data one at a time
for i in range(len(df['subject_id'].unique())):
    
    df_subj = df.loc[(df['subject_id']==i+1)]

    #Groupby data for subject i
    choice_groupby_binned = df_subj['choice'].groupby(df_subj['binned'])
    
    #Mean choice per bin (empty bins removed)
    my_vals=np.array(choice_groupby_binned.mean())
    my_vals=my_vals[~np.isnan(my_vals)]
    
    plt.plot(range(len(my_vals)), my_vals, '-b', alpha=0.1, label='data')
    plt.xticks(range(len(my_vals)), ['[-1,-0.5]' ,'[-0.5,0]' ,'[0,0.5]' ,'[0.5,1]'])

***Expected Output***

![](./figures/expected_ex7.png)

## Background : Probabilistic model

We will now simulate an optimal observer model for this task.

To do so we will use a Beta-Binomial model, with a softmax link function to add some noise in the responses. 

The probability mass function of the Binomial distribution is defined as:

\begin{align*} Bin\left(n_i \mid N_i, c_i\right) = \binom{N_i}{n_i} \cdot c_{i}^{n_{i}} (1-c_{i})^{N_{i}-n_{i}}  \end{align*}

where $n_i$ is the number of times fractal $i$ was rewarded, $N_i$ is the number of times it was presented, and $c_i$ is the probability of being rewarded for that fractal. The maximum likelihood estimate of $c_i$ can be calculated analytically and corresponds to:

\begin{align*} \hat{c}_i^{MLE} = \frac{n_i}{N_i}\end{align*}
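
The analytical maximum likelihood estimate can be checked numerically. The sketch below evaluates the Binomial likelihood on a grid of candidate reward probabilities and picks the maximum; the values of `n` and `N` and the grid resolution are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import binom

n, N = 3, 8                                  # hypothetical: 3 rewards in 8 presentations
c_grid = np.linspace(0.001, 0.999, 999)      # candidate reward probabilities

# Likelihood of observing n rewards out of N for each candidate c
likelihood = binom.pmf(n, N, c_grid)
c_mle = c_grid[np.argmax(likelihood)]

print(c_mle, n / N)
```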

---

The standard $Beta$ distribution gives the probability density of a value $x$ on the interval (0,1):

\begin{align*} Beta\left(x \mid \alpha,\beta\right) = \frac{x^{\alpha-1}(1-x)^{\beta-1}} {B(\alpha,\beta)}
\end{align*}

where $B(\alpha,\beta)$ is the Beta function, which normalises the density. As you can see, the numerator of the $Beta$ distribution has the same functional form as the Binomial distribution (with $x$ playing the role of $c_i$). 

This shared functional form means that the $Beta$ prior is a 'conjugate prior' for the Binomial likelihood: multiplying the prior by the likelihood gives a posterior that is another Beta distribution, $Beta\left(\alpha+n_i,\ \beta+N_i-n_i\right)$. As a result we can simplify our model to taking the expectation (mean) of the posterior. That is:

\begin{align*} \hat{C_i}=\frac{n_i+\alpha}{N_i+\alpha+\beta}\end{align*}
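
As a quick check of this expression, the sketch below builds the conjugate posterior, a Beta distribution with parameters $\alpha+n_i$ and $\beta+N_i-n_i$, using `scipy.stats.beta` and compares its mean with the closed form. The parameter values are arbitrary illustrative choices.

```python
from scipy.stats import beta

a, b = 2.0, 5.0      # hypothetical prior shape parameters
n, N = 5, 10         # hypothetical data: 5 rewards in 10 presentations

# Conjugate update: the posterior is Beta(a + n, b + N - n)
posterior = beta(a + n, b + N - n)

# Closed-form posterior mean from the equation above
closed_form = (n + a) / (N + a + b)

print(posterior.mean(), closed_form)
```

Here the posterior mean lies between the prior mean $a/(a+b)$ and the observed reward rate $n/N$.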
    
The Softmax link function is defined as:
    
\begin{align*} P(action=fractal)= \frac{e^{\left(\hat{C_i}/ \tau\right)}}{e^{\left(\hat{C_i} / \tau \right)} + e^{\left({C_{target}}/ \tau \right)}} \end{align*}

where $\tau$ is a temperature parameter that controls stochasticity (the higher $\tau$ the more random the behaviour, the lower $\tau$, the more deterministic the behaviour is).
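
With only two options, the softmax reduces to a logistic function of the value difference, $1 / (1 + e^{-(\hat{C_i} - C_{target})/\tau})$. The sketch below verifies this numerically; the function names and test values are illustrative assumptions.

```python
import numpy as np

def p_fractal_softmax(c_hat, c_target, tau):
    """Two-option softmax probability of choosing the fractal."""
    z = np.array([c_hat, c_target]) / tau
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e[0] / e.sum()

def p_fractal_logistic(c_hat, c_target, tau):
    """Equivalent logistic form: depends only on the value difference."""
    return 1.0 / (1.0 + np.exp(-(c_hat - c_target) / tau))

for tau in [0.05, 0.2, 1.0]:
    print(tau, p_fractal_softmax(0.5, 0.6, tau), p_fractal_logistic(0.5, 0.6, tau))
```

This makes explicit that only the difference between the two values matters, scaled by $\tau$.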

**EXERCISE 8**

Implement the Softmax function 

**Suggestions**

* Implement the Softmax function described above that takes an array of 2 values $x$ as input, as well as a parameter $\tau$. (hint: for numerical stability you may want to subtract the maximum of the array $x$ from each item in $x$)
* Now test your function using the following values, and plot the probability of choosing option 2 as a function of $\tau$ on the same graph :
    * $x=[0,1],\tau=0.01$, changing $\tau$ in steps of 0.01 up to 1.2
    * $x=[0.2,0.8],\tau=0.01$, changing $\tau$ in steps of 0.01 up to 1.2
    * $x=[0.4,0.6],\tau=0.01$, changing $\tau$ in steps of 0.01 up to 1.2
    * $x=[0.49,0.51],\tau=0.01$, changing $\tau$ in steps of 0.01 up to 1.2
* See how increasing the temperature leads to more 'random' behaviour (the best option, option 2, is chosen less reliably), while a low temperature leads to 'greedy-like' behaviour (always choosing the option with the highest value). This is why we say that the temperature parameter $\tau$ controls the exploration/exploitation trade-off.
* (optional) Create an interactive bar plot of the softmax probabilities for $x=[0.4, 0.6, 0.5]$ with an interactive value of $\tau$. See how the differences between the options' probabilities grow as $\tau$ decreases and shrink as $\tau$ increases. 
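
To see why the numerical-stability hint matters, the sketch below compares a naive softmax with one that subtracts the maximum first. The function names and the very small temperature are illustrative assumptions, chosen so that the naive exponentials overflow.

```python
import numpy as np

def softmax_naive(x, tau):
    """Softmax without any shift: can overflow when x / tau is large."""
    e = np.exp(np.asarray(x, dtype=float) / tau)
    return e / e.sum()

def softmax_stable(x, tau):
    """Softmax with the maximum subtracted before exponentiating."""
    z = np.asarray(x, dtype=float) / tau
    e = np.exp(z - z.max())
    return e / e.sum()

# exp(1 / 0.001) overflows to inf, so the naive version returns nan
with np.errstate(over='ignore', invalid='ignore'):
    naive = softmax_naive([0.0, 1.0], tau=0.001)
stable = softmax_stable([0.0, 1.0], tau=0.001)

print(naive, stable)
```

Subtracting the maximum does not change the result mathematically, because the common factor cancels between numerator and denominator.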

In [11]:
# insert your code here

def get_softmax(x,tau):
    """
    Function that implements the softmax link function

    Parameters
    ----------
    x: array of 2 values 
        Success rates Ci for each option 1 & 2 (in our case success probability of Fractal vs. Target)
    tau: float > 0 
        Temperature parameter
    
    Returns
    -------
    likelihood: 
        probability of choosing each value of x respectively, as a function of tau, and values of x
    """

hide_toggle(for_next=True)

In [None]:
### Solution

def get_softmax(x,tau):
    """
    Function that implements the softmax link function

    Parameters
    ----------
    x: array of 2 values 
        Success rates Ci for each option 1 & 2 (in our case success probability of Fractal vs. Target)
    tau: float > 0 
        Temperature parameter
    
    Returns
    -------
    likelihood: 
        probability of choosing each value of x respectively, as a function of tau, and values of x
    """
    
    # Subtract the maximum before exponentiating for numerical stability
    z = (np.asarray(x, dtype=float) - np.max(x)) / tau
    return np.exp(z) / np.sum(np.exp(z), axis=0)

plt.figure()
tau_sim=np.arange(0.01,1.2,0.01)

#print(tau_sim)

val01 = list()
val28 = list()
val46 = list()
val55 = list()

for i in range(len(tau_sim)):
    probs=get_softmax([0,1],tau_sim[i])
    val01.append(probs[1])
    
for i in range(len(tau_sim)):
    probs=get_softmax([0.2,0.8],tau_sim[i])
    val28.append(probs[1])
    
for i in range(len(tau_sim)):
    probs=get_softmax([0.4,0.6],tau_sim[i])
    val46.append(probs[1])

for i in range(len(tau_sim)):
    probs=get_softmax([0.49,0.51],tau_sim[i])
    val55.append(probs[1])

plt.plot(tau_sim,val01,label='values=[0,1]', lw=5, alpha=0.6)
plt.plot(tau_sim,val28,label='values=[0.2,0.8]', lw=5, alpha=0.6)
plt.plot(tau_sim,val46,label='values=[0.4,0.6]', lw=5, alpha=0.6)
plt.plot(tau_sim,val55,label='values=[0.49,0.51]', lw=5, alpha=0.6)
plt.xlabel('Tau')
plt.ylabel('Probability of choosing x[1] vs. x[0]')
plt.legend(loc='best')

***Expected Output***

![](./figures/expected_ex8.png)

In [12]:
# Optional question --- insert your code here

hide_toggle(for_next=True)

In [None]:
# Solution
from ipywidgets import interact
import ipywidgets as widgets

def plot_hist(tau):
    x=[0.4, 0.6, 0.5]
    plt.bar(range(3), get_softmax(x,tau))
    plt.ylim([0,1])
    plt.ylabel(r"P( choose x_i | values (x1,x2,xN) )")
    plt.xlabel('Action')
    plt.xticks(range(3), ('x1', 'x2', 'x3'))
    plt.show()

interact(plot_hist, tau=(0.01,1.5,0.01));


***EXERCISE 9***

Let's implement the BetaBinomial model and see how the amount of evidence for the Likelihood, or the strength (peakiness) of the prior affects the posterior mean of the BetaBinomial model.

In our case, the prior location (mean) defines the participants' optimism/pessimism, and the prior strength (peakiness) defines the 'strength' of that optimism/pessimism bias. 


***Suggestions***

* Implement the `get_mean_BetaBinomial` function described above that takes the parameters $a$, $b$, $n$, and $N$ as arguments, where $a$ and $b$ are the shape parameters of the Beta distribution, and $n$ and $N$ are the number of rewarded presentations and the total number of presentations respectively.
* Now test your function using the following values, and use the plot function provided to plot the probability distribution of Beta given the parameters $a$ and $b$, as well as the mean of the posterior returned by your function :
    * $a=5,b=5$, $n=5$, $N=10$
    * $a=5,b=5$, $n=1$, $N=2$. Although the ratio $\frac{n}{N}$ is the same, see how the number of observations affects the likelihood (try also $a=1.1,b=1.1$, $n=5$, $N=10$)
* Let's explore how having a pessimistic bias (i.e. low Beta prior) affects our estimations
    * $a=2,b=5$, $n=5$, $N=10$. 
    * $a=2,b=5$, $n=1$, $N=2$. 
* Let's explore how having a strong optimistic bias (i.e. High Beta prior) affects our estimations
    * $a=5,b=1.1$, $n=5$, $N=10$. 
    * $a=5,b=1.1$, $n=1$, $N=2$. 
* For each type of prior (non-informative, pessimistic, optimistic), plot the distribution of the prior, the likelihood (its maximum, $\frac{n}{N}$) as a vertical line, and the mean of the posterior as a dot-dashed vertical line.
* See how the relative evidence of the likelihood, and the 'strength' (i.e. peakiness) of the prior affect the estimate of the posterior mean (i.e. the dot-dashed line)
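
One way to read the posterior mean is as a weighted average of the prior mean $\frac{a}{a+b}$ and the observed reward rate $\frac{n}{N}$, with weight $w = \frac{a+b}{N+a+b}$ on the prior. The sketch below checks this identity numerically; the helper names are illustrative and the parameter settings are taken from the suggestions above.

```python
def posterior_mean(a, b, n, N):
    """Closed-form mean of the Beta posterior."""
    return (n + a) / (N + a + b)

def prior_weight(a, b, N):
    """Weight given to the prior mean: large when a + b is large relative to N."""
    return (a + b) / (N + a + b)

def weighted_average(a, b, n, N):
    """Prior mean and observed rate n / N, mixed according to the prior weight."""
    w = prior_weight(a, b, N)
    return w * a / (a + b) + (1 - w) * n / N

for a, b, n, N in [(5, 5, 5, 10), (2, 5, 1, 2), (2, 5, 5, 10), (5, 1.1, 1, 2)]:
    print(a, b, n, N, posterior_mean(a, b, n, N), weighted_average(a, b, n, N))
```

With few observations the prior dominates; with many observations the estimate moves toward $\frac{n}{N}$.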

In [13]:
# insert your code here

def get_mean_BetaBinomial(a, b, n, N):
    """
    Function that implements the BetaBinomial conjugate model and returns the mean of the posterior
    
    Parameters
    ----------
    a: float > 0
        First shape parameter of the Beta prior
    b: float > 0
        Second shape parameter of the Beta prior
    n: integer >= 0 & <= N
        Number of rewarded presentations of the fractal
    N: integer > 0
        Number of presentations for fractal
    
    Returns
    -------
    c: float in [0..1]
        Mean of the Beta posterior, i.e. the estimated reward probability of the fractal
    
    """

hide_toggle(for_next=True)

In [None]:
### Solution for functions

def get_mean_BetaBinomial(a, b, n, N):
    """
    Function that implements the BetaBinomial conjugate model and returns the mean of the posterior
    
    Parameters
    ----------
    a: float > 0
        First shape parameter of the Beta prior
    b: float > 0
        Second shape parameter of the Beta prior
    n: integer >= 0 & <= N
        Number of rewarded presentations of the fractal
    N: integer > 0
        Number of presentations for fractal
    
    Returns
    -------
    c: float in [0..1]
        Mean of the Beta posterior, i.e. the estimated reward probability of the fractal
    
    """
    c = ( n + a ) / (N + a + b)
    
    return c

def plot_prior_likelihood_posterior(a, b, n, N, ax, subplot_ids):
    x = np.linspace(0.01,0.99, 100)
    ax[subplot_ids[0],subplot_ids[1]].plot(x, beta.pdf(x, a, b), lw=5, alpha=0.6, label='Prior: a='+ str(a)+' ,b='+str(b))
    ax[subplot_ids[0],subplot_ids[1]].axvline((n/N), label='Likelihood: n='+str(n)+' ,N='+str(N))
    ax[subplot_ids[0],subplot_ids[1]].axvline(get_mean_BetaBinomial(a,b,n,N),linestyle='-.', label='Posterior mean: '+str(round(get_mean_BetaBinomial(a,b,n,N),2)))
    ax[subplot_ids[0],subplot_ids[1]].legend(loc='best', frameon=False)
   
hide_toggle(for_next=True)

In [None]:
### Solution

fig, ax = plt.subplots(3, 2,figsize=(20,15))

plot_prior_likelihood_posterior(1.1,1.1,5,10, ax, [0,0])
plot_prior_likelihood_posterior(1.1,1.1,5,10, ax, [0,1])
plot_prior_likelihood_posterior(5,5,1,2, ax, [0,1])
plot_prior_likelihood_posterior(2,5,1,2, ax, [1,0])
plot_prior_likelihood_posterior(2,5,5,10, ax, [1,1])
plot_prior_likelihood_posterior(5,1.1,1,2, ax, [2,0])
plot_prior_likelihood_posterior(5,1.1,5,10, ax, [2,1])

***Expected Output***

![](./figures/expected_ex9.png)

***EXERCISE 10***

We will now use our Softmax link function and our BetaBinomial function to estimate the $\alpha$, $\beta$, and $\tau$ parameters for each subject.


***Suggestions***

* Complete the function `get_negll_mean_BetaBinom(a, b, tau, data)`; a wrapper taking `parameters=[a, b, tau]` makes it usable with the optimizer
* For each subject, use the optimization function `sp.optimize.minimize` to find the MLE parameters. Use `initial_guess=[1,1,0.1]` and `bounds=[(0.01,6),(0.01,6),(0.01,0.8)]`. (hint: you may want to use a wrapper as used in the model fitting tutorial)
* For all subjects, plot the true model parameters ($\tau$ and the prior mean $\frac{a}{a+b}$) against the estimated model parameters. The true prior mean and temperature parameter for each subject are in the dataFrame columns 8 and 9 respectively. Does the recovery of parameters look acceptable?
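
The wrapper-plus-`minimize` pattern can be tried on a much simpler problem first. The sketch below fits a single Bernoulli probability by minimizing a negative log-likelihood; the toy data, the function names and the choice of `method='L-BFGS-B'` are illustrative assumptions, not the model or settings of this exercise.

```python
import numpy as np
from scipy.optimize import minimize

def bernoulli_nll(p, outcomes):
    """Negative log-likelihood of 0/1 outcomes under success probability p."""
    outcomes = np.asarray(outcomes, dtype=float)
    return -np.sum(outcomes * np.log(p) + (1 - outcomes) * np.log(1 - p))

# Wrapper: minimize passes one parameter array, so unpack it here
nll_wrapper = lambda parameters, outcomes: bernoulli_nll(parameters[0], outcomes)

# Hypothetical toy data: 30 successes out of 100 trials
outcomes = np.array([1] * 30 + [0] * 70)

res = minimize(nll_wrapper, x0=[0.5], args=(outcomes,),
               bounds=[(0.01, 0.99)], method='L-BFGS-B')
p_hat = res.x[0]
print(p_hat, outcomes.mean())
```

The bounds keep the optimizer away from $p=0$ and $p=1$, where the logarithm is undefined; the exercise bounds play a similar role for `a`, `b` and `tau`.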
    

In [14]:
# insert your code here

def get_negll_mean_BetaBinom(a,b,tau, data):
    '''
    Determines the negative loglikelihood of the BetaBinomial model with softmax link function
    
    Parameters
    ----------
    a : float
        1st shape parameter of the Beta prior
    b : float
        2nd shape parameter of the Beta prior
    tau : float
        temperature parameter of the softmax
        Note: a, b and tau are packed into one parameter array by a wrapper
        to make this function compatible with sp.optimize.minimize
    data : array_like of decision trials.
        contains n, N, target_c, and choice
        
    Returns
    -------
    nll : float
        negative log-likelihood
    '''

hide_toggle(for_next=True)

In [None]:
### Solution for functions

def get_negll_mean_BetaBinom(a,b,tau, data):
    '''
    Determines the negative loglikelihood of the BetaBinomial model with softmax link function
    
    Parameters
    ----------
    a : float
        1st shape parameter of the Beta prior
    b : float
        2nd shape parameter of the Beta prior
    tau : float
        temperature parameter of the softmax
        Note: a, b and tau are packed into one parameter array by a wrapper
        to make this function compatible with sp.optimize.minimize
    data : array_like of decision trials.
        contains n, N, target_c, and choice
        
    Returns
    -------
    nll : float
        negative log-likelihood
    '''
    
    l=list()
    
    for i_trial in range(len(data)):
        n=data[i_trial,1]
        N=data[i_trial,2]
        target=data[i_trial,4]
        c_hat= get_mean_BetaBinomial(a, b, n, N)
        choice_likelihoods=get_softmax([target,c_hat],tau)+1e-100     #target always 1st elem (0), and fractal 2nd elem (1), so that it matches df['choice']=0 or 1 for target/fractal respectively  
        l.append(choice_likelihoods[int(data[i_trial,5])])        
        
    return -np.sum(np.log(l))

hide_toggle(for_next=True)

In [None]:
### Solution

get_nll_mean_BetaBinomial_wrapper = lambda parameters, data: get_negll_mean_BetaBinom(parameters[0], parameters[1], parameters[2], data)

# Store parameters found
est_a, est_b, est_tau = list(), list(), list()

fig,ax = plt.subplots(1,2,figsize=(15,7))

# Optimization (find parameters that minimize nll)
initial_guess = [1,1,0.1]
bounds = [(0.01,6),(0.01,6),(0.01,0.8)] # Optimization bounds

#Select data 1 subject at a time
for i_subj in range(len(df['subject_id'].unique())):
    
    df_subj = df.loc[(df['subject_id']==i_subj+1)]
    np_subj = np.array(df_subj)
    
    t_start_optim = time.time()
    res = minimize(get_nll_mean_BetaBinomial_wrapper, 
               args=(np_subj,),  # trailing comma makes this a one-element tuple
               method='SLSQP',x0=initial_guess, bounds=bounds)
    t_end_optim = time.time()
    est_a.append(res.x[0])
    est_b.append(res.x[1])
    est_tau.append(res.x[2])
    ax[0].plot((np_subj[0,6]/(np_subj[0,6]+np_subj[0,7])),(res.x[0]/(res.x[0]+res.x[1])),'or',alpha=0.6)
    ax[1].plot(np_subj[0,9],res.x[2],'or',alpha=0.6)

ax[0].set_ylim([-0.05,1.05])
ax[0].set_xlim([-0.05,1.05])
ax[1].set_ylim([-0.05,1.05])
ax[1].set_xlim([-0.05,1.05])
ax[0].plot(np.arange(0.1,0.95,0.1), 1*np.arange(0.1,0.95,0.1),'--r',linewidth=2,label='Ideal recovery X=Y')
ax[0].set_ylabel('Estimated mean of prior');
ax[0].set_xlabel('True mean of prior');
ax[1].plot(np.arange(0.1,0.95,0.1), 1*np.arange(0.1,0.95,0.1),'--r',linewidth=2,label='Ideal recovery X=Y')
ax[1].set_ylabel('Estimated Temperature');
ax[1].set_xlabel('True Temperature');
ax[0].legend(loc='best', frameon=False)
ax[1].legend(loc='best', frameon=False)
    

***Expected Output***

![](./figures/expected_ex10.png)

***EXERCISE 11***

We have now shown that we can recover the parameters used to generate the data (suggesting that we should normally be able to model participants' behaviour correctly).

As a sanity check, however, it is usually good practice to plot the fitted model predictions (also called the posterior predictive distribution) against the subjects' behaviour.

***Suggestions***

* For subject 24:
    * Complete the function `sim_mean_BetaBinomial` described below. (hint: you may want to copy your subject data frame, use your previous function `get_mean_BetaBinomial`, and replace the subject's choice with a choice drawn using `numpy.random.choice`)
    * Run the model with the estimated subject parameters 1000 times and record the choices (hint: you may want to simulate the same dataset with the same parameters 1000 times to plot the mean and standard deviation of the model predictions per bin)
    * Calculate the psychometric function from the simulated choices and plot it with error bars (hint: you may want to convert your simulated data into a DataFrame)
    * Plot the true psychometric function for that participant
* Does the model do a good job of recapitulating the data (try subjects 46 and 49 as well)?

In [15]:
# insert your code here

def sim_mean_BetaBinomial(a, b, tau, df_data):
    '''
    Simulates choices (target=0, fractal=1) under the BetaBinomial model, using the trials as presented
    to the participant and the parameters estimated by MLE
    
    Parameters
    ----------
    a : a (shape parameter of Beta prior)
    b : b (shape parameter of Beta prior)
    tau : temperature parameter
    df_data : DataFrame for one subject only, which contains one trial per row with:
              n, N, target_c, and choice
        
    Returns
    -------
    DataFrame containing simulated choices (i.e. replacing the true choices) for the same sequence of trials presented to the participant
    '''

hide_toggle(for_next=True)

In [None]:
### Solution for function

def sim_mean_BetaBinomial(a, b, tau, df_data):
    '''
    Simulates choices (target=0, fractal=1) under the BetaBinomial model, using the trials as presented
    to the participant and the parameters estimated by MLE
    
    Parameters
    ----------
    a : a (shape parameter of Beta prior)
    b : b (shape parameter of Beta prior)
    tau : temperature parameter
    df_data : DataFrame for one subject only, which contains one trial per row with:
              n, N, target_c, and choice
        
    Returns
    -------
    DataFrame containing simulated choices (i.e. replacing the true choices) for the same sequence of trials presented to the participant
    '''
    
    np_sim = np.array(df_data)

    for i_trial in range(len(np_sim)):
        n=np_sim[i_trial,1]
        N=np_sim[i_trial,2]
        target=np_sim[i_trial,4]
        c_hat= get_mean_BetaBinomial(a, b, n, N)
        choice_likelihoods=get_softmax([target,c_hat],tau)     #target always 1st elem (0), and fractal 2nd elem (1), so that it matches df['choice']=0 or 1 for target/fractal respectively  
        np_sim[i_trial,5]=np.random.choice([0,1], p=choice_likelihoods)   # draw a single scalar choice
    
    df_sim = pd.DataFrame(np_sim)
    df_sim[5] = df_sim[5].astype(float)   # np.float was removed from NumPy; the builtin float is equivalent
    return df_sim

hide_toggle(for_next=True)

In [None]:
### Solution

# Select which participant's data to plot & simulate
subject_id=24
df1 = df.loc[(df['subject_id']==subject_id)]

# Group the data for subject 'subject_id' by binned value difference
choice_groupby_binned = df1['choice'].groupby(df1['binned'])
choice_groupby_binned.describe()
my_vals=np.array(choice_groupby_binned.mean())   # observed p(choose fractal) per bin

# Simulate n_repeats repeats of the same trial structure
n_repeats=1000
p_accept_sim = np.full([n_repeats,len(my_vals)],np.nan)

for i_repeats in range(n_repeats):
    df_sim1_temp = sim_mean_BetaBinomial(est_a[subject_id-1], est_b[subject_id-1], est_tau[subject_id-1], df1)
    # Columns are positional after the NumPy round-trip: 5 holds the choice, 11 is used as the bin
    choice_groupby_sim_binned = df_sim1_temp[5].groupby(df_sim1_temp[11])
    p_accept_sim[i_repeats,:]=np.array(choice_groupby_sim_binned.mean())

# Plot p(choose fractal) as a function of the difference in value between fractal & target
plt.figure()
choice_groupby_binned.mean().plot(xticks=range(len(my_vals)))
plt.ylabel('Probability of choosing `fractal`')
plt.xlabel('Binned: Difference in reward probability (Fractal-Target)')
plt.title('Psychometric function, Subject: '+ str(subject_id))

# Plot the fit to the data
plt.plot(range(len(my_vals)), my_vals, 'o', label='data')
# Mean simulated choice per bin; error bars show the std of the simulated choice proportion across repeats
plt.errorbar(range(len(my_vals)), np.mean(p_accept_sim,axis=0), yerr=np.std(p_accept_sim,axis=0), label='Model posterior predictive fit')
plt.legend(loc='best')
plt.show()

***Expected Output***

![](./figures/expected_ex11.png)