**Course organisers**

Jan Grohn (jan.grohn@psy.ox.ac.uk), Miriam Klein-Flügge (miriam.klein-flugge@psy.ox.ac.uk)  


# Introduction and recap
**Aims for today’s session:**
- Prepare figures, statistical tests and your thoughts for your write-up

To wrap things up, we will go over how to visualise and compute some results that you might want to include in your reports. Before doing so, however, we will go over what's needed for your report, and revisit the key equations that we encountered throughout this block practical.

## Report Guidelines

Your report should be as concise and to the point as possible. As this block practical is concerned with data analysis using computational models, we advise you to focus mostly on the Methods, Results and Discussion sections and to keep the Introduction to a minimum.

### Abstract
Keep this very short (<300 words). Allocate one or two sentences to summarize each section of the report. Ideally, write the abstract last, after you have written all other sections.

### Introduction
Keep this very short: 200-400 words, including a maximum of 4 citations. The introduction should cover:


1.   What our experimental questions are and why we ask them (by citing relevant previous work)
2.   How exactly each experimental question will be tested in the experiment and what you predict the outcomes will be

### Methods
This is an important section of the report. A key aspect we are looking for here is that it is clear why you do the analyses you do and how you do them. We recommend that you look back through your notes and the questions and answers from the previous sessions (including today) to help you with this section. It can also be helpful to look at the style of the Methods sections in related publications.
This section should cover:

1.   Participants: where your data is coming from, the number of participants, whether you decided to exclude any participants, and if so, why you excluded them (or if not, why you decided not to exclude anybody)

2.   Experimental task design: Look back at the slides and notebooks from the previous weeks. These will already include descriptions of the rationale behind designing the task the way it is. Feel free to adapt them and make sure to put them down in your own words in your Methods section. Also look at Blain and Rutledge (2020) for the specific version of the task that we are analysing. Include a figure capturing the main aspects of the task.

3.   Description of the computational model (with equations) used to analyse the data: Again, look at the descriptions from the previous notebooks (sessions 1-3) to complete this section. Do not forget to mention the advantage of using a computational model compared to other simpler ways of analysing the data. Also include how the model has been fitted to the data.

4.   Application of the computational model to the data: describe which measures (i.e., model parameters) derived from the computational model you will use to test your experimental hypotheses.

5. Statistical tests / model comparison: Include what statistical tests and/or model comparisons you will do and why.

6.   Parameter and model recovery: describe why you do these tests (included at the end of this session) and how exactly you do them (including the number of simulated participants and the statistical tests; code to help you make the figures is covered in session 4 today).



### Results
This should follow the same structure as the methods section above, i.e. you should show the results of the analyses that you describe in the methods section.

For each analysis we expect you to summarize briefly what you want to test with the analysis and how the result of the analysis relates to your question:

1.   Figures: Each figure should have a number, a title and a short description of what is shown. You should be able to make most figures that you might need using the notebook from today’s session. Make sure that the axes are labelled. If there are several different variables within the same plot, add a legend (in the figure) to describe what they mean.

2.   Statistical tests (see today's session): report all aspects of the test; e.g. for a t-test the degrees of freedom, the t-value and the p-value, as well as the effect size.

Many of the figures produced in today's Colab notebook will be very relevant to include here.

### Discussion
Keep this section to no more than 1000-1500 words. Given that this block practical is about using modelling for data analysis, the main focus of your discussion should be on interpreting the results, considering the limitations of that interpretation (given the method), and discussing additional analyses you could do in the future if you had more time.

1.   Start this section with a short summary (about 2-4 sentences) of the experiment and the main findings.
2.   Then describe in detail how you interpret the different analyses; what do they mean? Were the results as you expected? If not, why?
3.   Discuss how the findings relate to the previous literature: Here, you could compare how the task design that Blain and Rutledge (2020) used is different from the original Behrens et al. (2007) design. Do you think the changes Blain and Rutledge (2020) made make the task more or less suited for answering our questions? You can also discuss how the way Blain and Rutledge (2020) fitted and analysed their data differed from what you did. Do you think their model fits and analyses are more or less suited for answering the questions we are interested in?
4.   Limitations to the data and analyses and future analyses: Describe whether any of the analyses done were not yet decisive and what analyses you could have done additionally to corroborate your findings further. For this, it might be useful to look back at your answers in the previous sessions where you were asked to reflect on additional analyses you could have done.


### Summary of marking criteria
Below is a summary of the criteria that will be applied to mark your reports:
-	demonstrate a clear understanding of the experimental design and aims of the experiment
-	show understanding of the rationale for and the principles of the computational methods used to analyse the experiment
-	demonstrate ability to use Python to:
  - generate meaningful diagrams (with appropriate labels) to reflect results
  - run statistical tests for evaluating the key experimental hypotheses
  - output figures and statistics to generate a coherent results section
-	critically interpret these results and the limitations of the data and the performed analyses, also with reference to the relevant literature


## Recap – Task and reinforcement learning model
During the last weeks, we modelled and analysed data from a task that was heavily inspired by [Behrens et al., 2007](https://www.nature.com/articles/nn1954). The actual data we used was taken from a paper by [Blain and Rutledge, 2020](https://elifesciences.org/articles/57977), who ran a task similar to Behrens et al. in order to study its effects on moment-to-moment subjective self-reports of happiness. In our analysis, we disregarded these happiness measures and instead went back to the question that Behrens et al. asked: do participants adapt their beliefs about reward faster in a volatile than in a stable environment?

To answer this question, we have motivated and built a computational model of learning and decision making. On each trial $t$, the model computes a prediction error, which is the difference between the observed and the predicted outcome:

$$
\underbrace{\delta_t}_\textrm{prediction error} = \underbrace{o_t}_\textrm{outcome} - \underbrace{p_t}_\textrm{model prediction} \tag{Equation 1}
$$

The model then uses this prediction error to make a new prediction for the next trial. It does so by adding the prediction error, scaled by a constant $\alpha$ that we call the learning rate, to the old prediction:

$$
\underbrace{p_{t+1}}_\textrm{new prediction} = \underbrace{p_t}_\textrm{old prediction} + \underbrace{\alpha \delta_t}_\textrm{scaled prediction error} \tag{Equation 2}
$$
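As an illustration, Equations 1 and 2 can be written as a short Python loop. This is a minimal sketch, not the fitting code used in this practical: the function name `delta_rule_predictions`, the 0/1 outcome coding and the starting prediction `p0 = 0.5` are assumptions made for illustration.

```python
import numpy as np

def delta_rule_predictions(outcomes, alpha, p0=0.5):
    """Return the prediction p_t the model holds on each trial, before seeing o_t.

    outcomes: sequence of 0/1 outcomes (coding assumed for this sketch)
    alpha:    learning rate
    p0:       initial prediction (assumed value, not taken from the practical)
    """
    predictions = np.empty(len(outcomes))
    p = p0
    for t, o in enumerate(outcomes):
        predictions[t] = p
        delta = o - p          # Equation 1: prediction error
        p = p + alpha * delta  # Equation 2: move the prediction towards the outcome
    return predictions

# with alpha = 0.5, each update closes half of the gap to the latest outcome
preds = delta_rule_predictions([1, 1, 0, 1], alpha=0.5)
print(preds)  # predictions 0.5, 0.75, 0.875, 0.4375
```

A larger `alpha` makes the prediction track recent outcomes more closely. This is why a higher learning rate is expected to help in a volatile environment, where the reward probability changes often.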

During the task we are modelling, participants had to take two variables into account when making their choices: the probability that an option is rewarded, which we are modelling according to the above equations, and the number of reward points on offer. To make choices, they have to integrate these two variables in some way. During Session 2, we discussed two ways of integrating reward probability and magnitude. Multiplicative utility assumes that participants multiply the magnitude and probability:

$$
\underbrace{u}_\textrm{utility} = \underbrace{m}_\textrm{reward magnitude} \times \underbrace{p}_\textrm{reward probability} \tag{Equation 3}
$$

whereas additive utility assumes that participants compute a weighted average of magnitude and probability:

$$
\underbrace{u}_\textrm{utility} = \overbrace{\omega}^\textrm{magnitude weight} \times \underbrace{m}_\textrm{reward magnitude} + \overbrace{(1-\omega)}^\textrm{probability weight} \times \underbrace{p}_\textrm{reward probability} \tag{Equation 4}
$$
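The two utility rules can favour different options on the same trial. The sketch below compares them for two hypothetical options. It assumes that magnitudes have been rescaled to the 0-1 range so that they are on a scale comparable to probabilities. This is an assumption of this example, because raw point values would dominate the additive sum. The values of `m`, `p` and `omega` are made up.

```python
def multiplicative_utility(m, p):
    # Equation 3: magnitude times probability
    return m * p

def additive_utility(m, p, omega):
    # Equation 4: omega weights magnitude, (1 - omega) weights probability
    return omega * m + (1 - omega) * p

# two hypothetical options: A = large but unlikely reward, B = smaller but likely reward
m_a, p_a = 0.9, 0.3
m_b, p_b = 0.4, 0.8

mul_a, mul_b = multiplicative_utility(m_a, p_a), multiplicative_utility(m_b, p_b)
add_a, add_b = additive_utility(m_a, p_a, omega=0.7), additive_utility(m_b, p_b, omega=0.7)

print("multiplicative prefers", "A" if mul_a > mul_b else "B")        # B (0.32 vs 0.27)
print("additive (omega=0.7) prefers", "A" if add_a > add_b else "B")  # A (0.72 vs 0.52)
```

Disagreements like this one are what allow the two utility models to be distinguished from choice data.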

To fit data from the participants that Blain and Rutledge collected, we need to translate the utilities we computed into a predicted probability of making a choice. To do so, we assume that participants pick each option with a certain probability. This probability depends on the difference in utility between the two options and on a 'randomness factor' called the inverse temperature $\beta$. The higher $\beta$, the less random (i.e. the more deterministic) the choices:
$$
\underbrace{P(c_1)}_\textrm{probability of choosing option 1} = \frac{1}{1+ e^{-\beta(u_1 - u_2)}} \tag{Equation 5}
$$
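Equation 5 is a logistic function of the utility difference. The sketch below computes it and sums the log-probabilities of the observed choices. The function names and the 0/1 choice coding are assumptions, not the practical's code. Fitting a model by maximum likelihood means finding the parameter values that maximise this summed log-likelihood.

```python
import numpy as np

def p_choose_option1(u1, u2, beta):
    # Equation 5: probability of choosing option 1 given the utility difference
    return 1.0 / (1.0 + np.exp(-beta * (np.asarray(u1) - np.asarray(u2))))

def choice_log_likelihood(choices, u1, u2, beta):
    """Summed log-probability of the observed choices.

    choices: array with 1 where option 1 was chosen and 0 where option 2 was chosen
    """
    p1 = p_choose_option1(u1, u2, beta)
    p_chosen = np.where(np.asarray(choices) == 1, p1, 1.0 - p1)
    return np.sum(np.log(p_chosen))

# hypothetical utilities for three trials
u1 = np.array([0.6, 0.2, 0.5])
u2 = np.array([0.4, 0.7, 0.5])
for beta in (0.0, 5.0, 20.0):
    print(beta, p_choose_option1(u1, u2, beta).round(3))
```

With `beta = 0`, every choice is a coin flip. As `beta` grows, choices increasingly follow the option with the higher utility, which is why $\beta$ is called an *inverse* temperature.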

Using all of this, we then fitted the model to the 75 participants that Blain and Rutledge collected.

## Import libraries

In [1]:
# check if we are running on colab
try:
    from google.colab import files
    _ON_COLAB = True
except ImportError:
    _ON_COLAB = False

if not _ON_COLAB:
    %pip install -r ./session3/requirements.txt

# numpy is a library used to do all kinds of mathematical operations
import numpy as np

# pandas allows us to organise data as tables (called "dataframes")
import pandas as pd

# we are using the chi2 distribution for some statistical tests
from scipy.stats import chi2

# this function allows us to perform one sample t-tests
from scipy.stats import ttest_1samp

# seed the random number generator
rng = np.random.default_rng(12345)

if _ON_COLAB:
    # this allows us to make interactive figures
    from google.colab import output
    output.enable_custom_widget_manager()

    # load in some custom functions for this block practical
    !rm -r *
    !git clone https://github.com/jangrohn/ComputationalModelingBlockPractical
    !cp -R ComputationalModelingBlockPractical/session4/ session4
    !rm -rf ComputationalModelingBlockPractical

    # download the dataset from Blain & Rutledge 2020.
    !wget "https://github.com/BastienBlain/MSWB_LearningNotReward/raw/main/PublicCode/Blain_MoodTracksLearning_data.mat"

from session4 import loading, plotting, fitting

Note: you may need to restart the kernel to use updated packages.


# Section 1: Preparing for your report

## Loading the model fits
In the interest of time, we will not fit the computational models to the data again here. Instead, the code below loads in the parameters that were fitted by the end of last week's session:

In [2]:
data1AlphaMul, data2AlphaMul, data1AlphaAdd, data2AlphaAdd = loading.load_model_fits()

In total, we have fitted 4 different models to each participant:


1.   A model assuming the same learning rate in the stable and the volatile session and multiplicative utility
2.   A model assuming different learning rates in the stable and the volatile session and multiplicative utility
3.   A model assuming the same learning rate in the stable and the volatile session and additive utility
4.   A model assuming different learning rates in the stable and the volatile session and additive utility

The four outputs of the function we ran in the previous code cell correspond to these four models, in the order listed above.

## Plotting a schedule

If you want to showcase an example experimental schedule in your report, you can visualise the data of a participant using the function below, passing in the ID of the participant you want to plot. In the example below, the data of the participant with ID 0 is plotted. People usually pick a representative participant to plot.

In [3]:
plotting.plot_schedule(0)

You can also plot a model fit on top of the participant data by additionally passing the corresponding model parameters to the function. In the example below, we plot the fit of a model with one learning rate and multiplicative utility. If you decide to also showcase an example model fit, think about which model you would like to visualise and why.

In [4]:
plotting.plot_schedule(0, data1AlphaMul)

## Excluding participants

At the end of last week's session you should have made some choices about which (if any) participants you would like to exclude from the analysis. Fill in the subject IDs of the participants you want to exclude below. This is important for visualising the model fits and conducting statistical tests, which we will do next.

In [5]:
exclude = []

You should also have good reasons, or a plausible general principle, for excluding participants. If you have excluded participants, try to formulate how you selected them. If you have not excluded participants, argue why you think it was not necessary to exclude anybody.

→ Type your answer here

## Visualising fitted parameter distributions

You can visualise fitted learning rate distributions using the function below. In the example below, we plot learning rates fitted assuming multiplicative utility, but you can also adapt the function call to plot learning rates fitted assuming additive utility.

In [6]:
plotting.visualise_alpha_distributions(data2AlphaMul[~data2AlphaMul.index.isin(exclude)].alphaStable, data2AlphaMul[~data2AlphaMul.index.isin(exclude)].alphaVolatile, 'learning rates assuming multiplicative utility')


If you want to visualise the difference between the learning rate in the volatile and the stable session, you can use the function below.

In [7]:
plotting.visualise_alpha_difference(data2AlphaMul[~data2AlphaMul.index.isin(exclude)].alphaStable, data2AlphaMul[~data2AlphaMul.index.isin(exclude)].alphaVolatile, 'difference between learning rates assuming multiplicative utility')

Finally, you can plot the distributions of all fitted model parameters, and their correlations, using the function below.

In [8]:
plotting.plot_parameter_corrs(data2AlphaMul[~data2AlphaMul.index.isin(exclude)])

Use the above three functions to make figures that you want to use in your report. Think about which models you want to visualise and why, and what the most appropriate visualisations are.

## Running statistical tests and model comparisons

Last week, we discussed 3 different ways to run statistical tests and/or distinguish between different model fits.

### T-tests

We discussed t-tests to assess whether the learning rate in the volatile session is larger than in the stable session. This can be implemented as shown below, where we run this test assuming multiplicative utility.

In [9]:
ttest_1samp(data2AlphaMul[~data2AlphaMul.index.isin(exclude)].alphaVolatile - data2AlphaMul[~data2AlphaMul.index.isin(exclude)].alphaStable, 0, alternative = "greater")

TtestResult(statistic=np.float64(3.7243820752107633), pvalue=np.float64(0.0001898850641762056), df=np.int64(74))

To report a t-test, we would usually write $t(\text{degrees of freedom}) = \text{t-statistic}, p = \text{p-value}$. P-values below 0.001 are usually just reported as $p < 0.001$, and the p-value and t-statistic can be rounded to 3 decimal places.
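The report guidelines ask for an effect size alongside the test statistic, so a small helper like the one below can format the complete result. This is a sketch: `report_paired_t` is a hypothetical helper name. Cohen's d is computed here as the mean difference divided by the standard deviation of the differences, which is one common convention for paired designs.

```python
import numpy as np
from scipy.stats import ttest_1samp

def report_paired_t(differences):
    """One-sided one-sample t-test of differences > 0, formatted for a report."""
    diff = np.asarray(differences, dtype=float)
    res = ttest_1samp(diff, 0, alternative="greater")
    df = len(diff) - 1
    cohens_d = diff.mean() / diff.std(ddof=1)  # effect size for paired differences
    p_text = "p < 0.001" if res.pvalue < 0.001 else f"p = {res.pvalue:.3f}"
    return f"t({df}) = {res.statistic:.3f}, {p_text}, d = {cohens_d:.3f}"

print(report_paired_t([1, 2, 3, 4, 5]))
```

In this notebook, you could pass `data2AlphaMul[~data2AlphaMul.index.isin(exclude)].alphaVolatile - data2AlphaMul[~data2AlphaMul.index.isin(exclude)].alphaStable` as the differences.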

### Likelihood ratio tests

Likelihood ratio tests can be run using the code below, which tests whether a model assuming different learning rates and multiplicative utility is a better fit than a model assuming the same learning rate and multiplicative utility.

In [10]:
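# the two-learning-rate model has one extra free parameter per included participant
degrees_of_freedom = 75 - len(exclude)
# likelihood ratio statistic: twice the difference in summed log-likelihoods
lambda_LR = 2*sum(data2AlphaMul[~data2AlphaMul.index.isin(exclude)].LL - data1AlphaMul[~data1AlphaMul.index.isin(exclude)].LL)
# p-value from the upper tail (survival function) of the chi-squared distribution
p_value = chi2.sf(lambda_LR, degrees_of_freedom)
print('Chi2(' + str(degrees_of_freedom) + ') = ' + str(lambda_LR) + ', p = ' + str(p_value))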

Chi2(75) = 392.64721782755305, p = 1.4682134222934281e-44


To report a likelihood ratio test, we would usually write $\chi^2(\text{degrees of freedom})=\text{LR-statistic}, p=\text{p-value}$, and round the same way as for the t-test.

### BIC comparisons

The BIC for a model, summed across participants, can be calculated as shown below for a model assuming two learning rates and multiplicative utility.

In [11]:
print(sum(data2AlphaMul[~data2AlphaMul.index.isin(exclude)].BIC))

10645.212704314894


The BIC is a heuristic that balances goodness of fit against model complexity, and it allows you to assess which model is better. The model with the lower BIC is considered to describe the data better. You can adapt the code above to calculate the BIC of the other models. Because the magnitude of the BIC by itself is not particularly meaningful, people sometimes report the difference in BIC between the models they compare rather than the individual BICs.
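For reference, the standard definition is $\mathrm{BIC} = k \ln n - 2 \ln \hat{L}$, where $k$ is the number of free parameters, $n$ the number of observations (here, trials) and $\hat{L}$ the maximised likelihood. The sketch below applies this definition to made-up numbers. We are assuming, not confirming, that the `BIC` column in the loaded fits follows the same convention.

```python
import numpy as np

def bic(log_likelihood, n_params, n_trials):
    # standard BIC: complexity penalty k*ln(n) minus twice the maximised log-likelihood
    return n_params * np.log(n_trials) - 2 * log_likelihood

# hypothetical participant with 160 trials (made-up numbers)
bic_one_alpha = bic(log_likelihood=-100.0, n_params=2, n_trials=160)
bic_two_alpha = bic(log_likelihood=-95.0, n_params=3, n_trials=160)

delta_bic = bic_two_alpha - bic_one_alpha
print(round(delta_bic, 2))  # negative: the extra parameter is worth its penalty here
```

The equivalent summed over participants in this notebook would be `sum(data2AlphaMul[~data2AlphaMul.index.isin(exclude)].BIC) - sum(data1AlphaMul[~data1AlphaMul.index.isin(exclude)].BIC)`. A negative value favours the two-learning-rate model.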

### Choosing what comparison/test to use

Think through what we discussed last week and then use t-tests, likelihood ratio tests and/or BIC comparisons to analyse the data from Blain and Rutledge. Remember, the question we would like to answer is whether participants have a higher learning rate in the volatile than in the stable session. To assess this, however, we might first have to determine which utility function we should use, etc. Once you have decided which tests to run, adapt the code above to generate outcomes you can include in your report.


# Section 2: Validating the model fits

During session 2, we discussed how we can use simulations to validate our computational model. We will do so again now, but this time we will use the specific schedules used by Blain and Rutledge and base our simulations on the parameters we fitted earlier. The function in the code cell below simulates data of artificial participants completing the task Blain and Rutledge designed. The first argument we give to the function is a table of parameters fitted to the actual participants. The function then randomly shuffles these parameters across participants, so that each simulated participant's choices are generated from a different combination of parameters. For example, a simulated participant might have the learning rate of participant 3 and the inverse temperature of participant 50. The function then fits all 4 models that we discussed to each simulated participant. This allows us to verify that the correct model (the one we actually simulated with) is also the model that fits the data better than the other 3.

The second input to the function, `nReps`, indicates how often the whole procedure described above should be repeated. Repeating the procedure multiple times allows us to ensure that anything we find is not just a fluke but holds up robustly.

Because running this function with a large `nReps` takes a couple of hours, we have already run it for you. Thus, instead of running the next code cell, skip over it for now.

In [12]:
recov1AlphaMul = fitting.parameter_recovery(data1AlphaMul, nReps = 100)

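The shuffling step described above can be sketched as an independent permutation of each parameter column across participants. We have not inspected how `fitting.parameter_recovery` implements this. The helper `shuffle_parameters` and the column names below are hypothetical.

```python
import numpy as np
import pandas as pd

def shuffle_parameters(fits, rng):
    """Permute each parameter column independently across participants.

    Each simulated participant then receives, e.g., the learning rate of one
    real participant and the inverse temperature of another.
    """
    shuffled = pd.DataFrame(index=range(len(fits)))
    for col in fits.columns:
        shuffled[col] = rng.permutation(fits[col].to_numpy())
    return shuffled

rng = np.random.default_rng(12345)
fits = pd.DataFrame({"alpha": [0.1, 0.2, 0.3, 0.4], "beta": [1.0, 2.0, 3.0, 4.0]})
simulated_params = shuffle_parameters(fits, rng)
print(simulated_params)
```

Because each column is permuted separately, the distribution of every individual parameter is preserved. Any correlations between parameters across participants are broken, however.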

We ran the following command.

`recov1AlphaMul = fitting.parameter_recovery(data1AlphaMul, nReps = 100)`

and saved the output, which can be loaded in by running the next code cell. Because the dataset contained data from 75 participants and we repeated the simulation 100 times, this amounts to 7500 simulated participants, and we fit all 4 models to each of them. We also ran the following 3 commands, each of which likewise runs 7500 simulations and fits 4 models to each.

`recov2AlphaMul = fitting.parameter_recovery(data2AlphaMul, nReps = 100)`

`recov1AlphaAdd = fitting.parameter_recovery(data1AlphaAdd, nReps = 100)`

`recov2AlphaAdd = fitting.parameter_recovery(data2AlphaAdd, nReps = 100)`

All of this is loaded in the next code cell.

In [17]:
recov1AlphaMul, recov2AlphaMul, recov1AlphaAdd, recov2AlphaAdd = loading.load_parameter_recovery()

In your own words, describe what the difference between `recov1AlphaMul` and `recov2AlphaMul` is.

→ Type your answer here

## Parameter recovery

We can visualise the values of the simulated and recovered parameters using the function below. You can change the input argument to plot different models. The black line in the plot is the identity line (simulated = recovered), which indicates what we expect in the ideal case of perfect parameter recovery. The red line is the least-squares linear regression line. Its slope and offset can help you determine how good the parameter recovery is.

Run the function below to assess the parameter recovery for different models, and consider which of these results might be important or useful to include in your report.

In [18]:
plotting.plot_recovered_parameters(recov2AlphaMul)
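To put numbers on the red line, you can fit the same least-squares regression yourself. You can then report its slope and intercept, together with the correlation between simulated and recovered values. The sketch below uses `scipy.stats.linregress` on made-up values. Which columns of `recov2AlphaMul` hold the simulated and recovered parameters is not shown in this notebook, so the input arrays are hypothetical.

```python
import numpy as np
from scipy.stats import linregress

def recovery_summary(simulated, recovered):
    """Least-squares fit recovered ~ simulated; slope 1 and intercept 0 mean perfect recovery."""
    fit = linregress(simulated, recovered)
    return {"slope": fit.slope, "intercept": fit.intercept, "r": fit.rvalue}

rng = np.random.default_rng(0)
simulated = rng.uniform(0.05, 0.5, size=200)                   # hypothetical true learning rates
recovered = 0.8 * simulated + 0.03 + rng.normal(0, 0.02, 200)  # hypothetical, slightly shrunk estimates
print(recovery_summary(simulated, recovered))
```

In this made-up example, the slope is below 1 and the intercept is positive. This pattern would indicate that extreme parameter values are pulled towards the middle of the range.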

## Model recovery

Because we have fitted all 4 models to each simulated dataset, we can also compare which *model* (as opposed to *model parameters* in the previous section) describes a simulated dataset best.

In total, we have simulated 100 datasets from each of the 4 models and then fitted all 4 models again to each dataset. For each dataset, we can therefore use the BIC to identify which fitted model describes it best, and we can do so for all 100 datasets. The function below plots the proportion of times each model was identified as the best-fitting model according to the BIC. Run the next code cell now.

In [19]:
plotting.visualise_BIC_recovery(recov1AlphaMul, recov2AlphaMul, recov1AlphaAdd, recov2AlphaAdd)
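The logic behind this plot can be sketched as a 'confusion matrix'. For every generating model and simulated dataset, find the fitted model with the lowest BIC, then count how often each fitted model wins. The array layout below is an assumption for illustration, not the format returned by `loading.load_parameter_recovery`. It stacks generating model × dataset × fitted model, with each BIC presumably summed over the simulated participants of a dataset.

```python
import numpy as np

def bic_confusion_matrix(bics):
    """bics: array of shape (n_generating_models, n_datasets, n_fitted_models).

    Returns an (n_generating_models, n_fitted_models) array whose rows give the
    proportion of datasets in which each fitted model had the lowest BIC.
    """
    n_gen, n_datasets, n_fit = bics.shape
    winners = np.argmin(bics, axis=2)  # index of the best (lowest-BIC) fitted model
    confusion = np.zeros((n_gen, n_fit))
    for g in range(n_gen):
        confusion[g] = np.bincount(winners[g], minlength=n_fit) / n_datasets
    return confusion

# hypothetical BICs: 4 generating models, 100 datasets, 4 fitted models
rng = np.random.default_rng(1)
bics = rng.normal(1000, 5, size=(4, 100, 4))
bics[np.arange(4), :, np.arange(4)] -= 10  # make the generating model usually win
print(bic_confusion_matrix(bics).round(2))
```

Large off-diagonal entries would mean that data generated by one model is often better described by another. This would limit how confidently the model comparison on the real data can be interpreted.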

Describe what the plot above shows, and why it is important to run this kind of model recovery.

→ Type your answer here

**Well done, you have now completed the block practical. Feel free to ask us any questions you might still have. Otherwise, you are ready to start working on your report.** If you were very fast and have a lot of time left, we have also included two optional sections below that you can complete. In these, we use our simulated data to check how well our t-tests and likelihood ratio tests perform.

## OPTIONAL Validating t-tests

We can also use the simulated data to check how well our statistical tests perform. For a model that we simulated with, we can run a t-test on the fitted learning rates to determine whether the learning rate in the volatile session is larger than in the stable session. This can be repeated 100 times as we have simulated 100 datasets, and thus we can plot how often, out of 100, a t-test is significant. This is done by the function below:

In [20]:
plotting.visualise_t_test_recovery(recov1AlphaMul, recov2AlphaMul, recov1AlphaAdd, recov2AlphaAdd, p = 0.05)

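The counting step described above can be sketched with `scipy.stats.ttest_1samp`, which accepts an `axis` argument so that all simulated datasets are tested at once. The learning-rate differences below are drawn from made-up distributions. In the practical, they would come from the recovered `alphaVolatile - alphaStable` values in the recovery outputs loaded earlier, whose exact layout we have not assumed.

```python
import numpy as np
from scipy.stats import ttest_1samp

def proportion_significant(alpha_diffs, threshold=0.05):
    """alpha_diffs: (n_datasets, n_participants) array of alphaVolatile - alphaStable.

    Runs a one-sided t-test per dataset and returns the proportion of significant
    tests together with its binomial standard error.
    """
    res = ttest_1samp(alpha_diffs, 0, axis=1, alternative="greater")
    rate = np.mean(res.pvalue < threshold)
    std_err = np.sqrt(rate * (1 - rate) / alpha_diffs.shape[0])
    return rate, std_err

rng = np.random.default_rng(2)
no_effect = rng.normal(0.0, 0.1, size=(100, 75))     # like a one-learning-rate generating model
with_effect = rng.normal(0.05, 0.1, size=(100, 75))  # like a two-learning-rate generating model
print(proportion_significant(no_effect))    # false-positive rate and its standard error
print(proportion_significant(with_effect))  # hit rate and its standard error
```

The binomial standard error shrinks with the square root of the number of simulated datasets. This gives one way to think about how precisely 100 simulations can estimate these proportions.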

The blue bars show the proportion of tests that are significant when we simulate with a model *that should not produce a significant difference*. Therefore, the height of the blue bars shows the proportion of times we wrongly rejected the null hypothesis (that there is no difference between the learning rates in the two conditions). By contrast, the height of the orange bars shows how often we rejected the null hypothesis when we simulated with a model that has two different learning rates. Think about how, and to what degree, these four bars relate to [type I and type II errors](https://en.wikipedia.org/wiki/Type_I_and_type_II_errors) in statistical hypothesis testing. In the function above, you can also adjust the significance threshold `p` to further wrap your head around these concepts. Do you think 100 simulations are enough to estimate these error rates?

→ Type your answer here

## OPTIONAL Validating likelihood ratio tests

Just as with the t-test, we can run likelihood ratio tests on the simulated data using the same logic:

In [21]:
plotting.visualise_LR_recovery(recov1AlphaMul, recov2AlphaMul, recov1AlphaAdd, recov2AlphaAdd, p = 0.05)


Again, work through the meaning of the proportions shown (particularly in relation to type I and type II errors), and what they mean for the reliability of the likelihood ratio tests you ran on the actual data.

→ Type your answer here