# Introduction

In this tutorial, we will collect data for a 2-armed bandit experiment and then fit a Q-learning model to each subject's data.

# Notebook setup

## Instructions

- Import numpy, scipy and matplotlib
- Configure inline plots

```python
%matplotlib inline
import csv

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
```

# 2-armed bandit experiment

To run this experiment, you need PsychoPy installed. The easiest option is the standalone package: http://psychopy.org/installation.html

Generate the trial data for the task with `generate_trial_data.py`, then run the experiment itself with `2_armed_bandit_experiment.py`. Make sure you understand this code so that you can modify and improve it.

Collect data from at least 5 different subjects. To fit a learning model reliably, you need a substantial amount of data per subject (on the order of 100 trials).
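Once the data are collected, each subject's file has to be read into arrays of choices and rewards before fitting. The sketch below uses the standard `csv` module; the function name `load_subject` and the column names `choice` (0 or 1, the arm picked) and `reward` are hypothetical, so rename them to match whatever your copy of the experiment script actually writes.

```python
import csv

import numpy as np


def load_subject(path):
    """Read one subject's trials from a CSV file with a header row.

    Hypothetical format: columns named 'choice' (0 or 1, the arm chosen)
    and 'reward' (the outcome on that trial). Adjust the keys to your data.
    """
    choices, rewards = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            choices.append(int(row["choice"]))
            rewards.append(float(row["reward"]))
    return np.array(choices, dtype=int), np.array(rewards, dtype=float)
```

For example, `choices, rewards = load_subject("subject_01.csv")`, where the file name is again only illustrative. Returning NumPy arrays keeps the later likelihood code simple.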

# Fitting the Q-learning model using MLE

You will now fit a Q-learning model to each subject's data using maximum likelihood estimation. For each subject, we want to obtain the values of the three free parameters in the model: $\alpha$ (the learning rate), $\beta$ (the inverse temperature) and $\gamma$ (the temporal discount factor).
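One common way to write this model for a single-state two-armed bandit is shown below. It is a sketch under assumptions: a softmax choice rule, only the chosen arm being updated, and $\gamma$ entering through a bootstrapped term $\gamma \max_{a'} Q_t(a')$ on the same two Q-values. Check it against the exact definition your course uses. Here $c_t$ is the arm chosen on trial $t$ and $r_t$ the reward received.

$$
\begin{aligned}
P(c_t = a) &= \frac{\exp\big(\beta\, Q_t(a)\big)}{\sum_{a'} \exp\big(\beta\, Q_t(a')\big)} \\
\delta_t &= r_t + \gamma \max_{a'} Q_t(a') - Q_t(c_t) \\
Q_{t+1}(c_t) &= Q_t(c_t) + \alpha\, \delta_t \\
-\log L(\alpha, \beta, \gamma) &= -\sum_{t=1}^{T} \log P(c_t \mid \alpha, \beta, \gamma)
\end{aligned}
$$

The last line is the quantity that the optimizer minimizes.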

Use the same MLE approach you learned in the Module 1 model fitting tutorial:
- Write a function that returns the total negative log-likelihood for any parameter values.
- Use an optimization library to numerically find the parameters that minimize the negative log-likelihood (or, equivalently, maximize the log-likelihood of the data given the model). Tip: use the `minimize` function from the `scipy.optimize` module.
- Plot three likelihood heatmaps: first as a function of $\alpha$ and $\beta$, then of $\alpha$ and $\gamma$, then of $\beta$ and $\gamma$. For each heatmap, hold the third parameter fixed (for example, at its best-fit value).
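The three steps above can be sketched as follows. This is a minimal illustration, not a reference solution. It assumes the model form given earlier (Q-values starting at 0, softmax choice, TD update with $\gamma \max Q$), and it fits synthetic data from a hypothetical `simulate_subject` helper so that the code runs without PsychoPy output. The true parameter values, reward probabilities (0.7 and 0.3), bounds, and starting points are all illustrative choices. Swap in real `choices` and `rewards` arrays for each subject.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize


def q_learning_nll(params, choices, rewards):
    """Total negative log-likelihood of a choice sequence (assumed model form)."""
    alpha, beta, gamma = params
    q = np.zeros(2)  # assumption: both arms start at Q = 0
    nll = 0.0
    for c, r in zip(choices, rewards):
        # Log-softmax choice probabilities, shifted by the max for stability
        logits = beta * q
        logits -= logits.max()
        log_p = logits - np.log(np.exp(logits).sum())
        nll -= log_p[c]
        # TD update of the chosen arm only
        delta = r + gamma * q.max() - q[c]
        q[c] += alpha * delta
    return nll


def simulate_subject(alpha, beta, gamma, n_trials=200, p_reward=(0.7, 0.3), seed=0):
    """Hypothetical helper: synthetic choices from the same model."""
    rng = np.random.default_rng(seed)
    q = np.zeros(2)
    choices = np.zeros(n_trials, dtype=int)
    rewards = np.zeros(n_trials)
    for t in range(n_trials):
        p = np.exp(beta * q - (beta * q).max())
        p /= p.sum()
        c = rng.choice(2, p=p)
        r = float(rng.random() < p_reward[c])
        choices[t], rewards[t] = c, r
        q[c] += alpha * (r + gamma * q.max() - q[c])
    return choices, rewards


choices, rewards = simulate_subject(alpha=0.3, beta=3.0, gamma=0.2)

# Bounded fit from several starting points; keep the lowest NLL found
bounds = [(0.01, 1.0), (0.01, 20.0), (0.0, 0.99)]
best = None
for x0 in [(0.5, 1.0, 0.5), (0.1, 5.0, 0.1), (0.9, 10.0, 0.8)]:
    res = minimize(q_learning_nll, x0, args=(choices, rewards),
                   method="L-BFGS-B", bounds=bounds)
    if best is None or res.fun < best.fun:
        best = res
print("alpha, beta, gamma =", best.x, " NLL =", best.fun)


def nll_grid(i, j, grid_i, grid_j, fixed, choices, rewards):
    """NLL over a grid of parameters i and j; the third stays at fixed[k]."""
    surface = np.zeros((len(grid_i), len(grid_j)))
    for a, vi in enumerate(grid_i):
        for b, vj in enumerate(grid_j):
            params = np.array(fixed, dtype=float)
            params[i], params[j] = vi, vj
            surface[a, b] = q_learning_nll(params, choices, rewards)
    return surface


names = [r"$\alpha$", r"$\beta$", r"$\gamma$"]
grids = [np.linspace(lo, hi, 20) for lo, hi in bounds]
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, (i, j) in zip(axes, [(0, 1), (0, 2), (1, 2)]):
    surface = nll_grid(i, j, grids[i], grids[j], best.x, choices, rewards)
    # Rows index parameter i (y-axis), columns index parameter j (x-axis)
    im = ax.imshow(-surface, origin="lower", aspect="auto",
                   extent=[grids[j][0], grids[j][-1], grids[i][0], grids[i][-1]])
    ax.set_xlabel(names[j])
    ax.set_ylabel(names[i])
    fig.colorbar(im, ax=ax, label="log-likelihood")
plt.tight_layout()
plt.show()
```

Restarting from several initial points is a cheap guard against local minima. Fixing the third parameter at its best-fit value makes each heatmap a slice through the optimum rather than an arbitrary cross-section.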