[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shawnrhoads/gu-psyc-347/blob/master/docs/module-03-02_RL-Exercises.ipynb)

# RL Exercises

These exercises were inspired by and adapted from [Models of Learning](http://www.hannekedenouden.ruhosting.nl/RLtutorial/Instructions.html) by Jill O'Reilly and Hanneke den Ouden, [NSCS 344 - Modeling the Mind](http://u.arizona.edu/~bob/web_NSCS344/index.htm) by Robert C. Wilson, [NSCI 526 - Tutorial 2 (Reinforcement Learning)](https://github.com/shawnrhoads/gu-nsci-526) by Shawn Rhoads, the [Gambling Game tutorial](https://github.com/cloudssty/Gambling-Game), and the [Neuromatch Academy tutorials](https://github.com/NeuromatchAcademy/course-content) ([CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)).

In these exercises, we will fit the Rescorla-Wagner model of reinforcement learning to subjects' learning behavior. The data can be downloaded from GitHub, Canvas, or by using the following code:

In [None]:
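import os, requests

# create list of URLs (subjects 00-14)
urls = [f'https://raw.githubusercontent.com/shawnrhoads/gu-psyc-347/master/docs/static/data/sub-{x:02}_RLdata.csv' for x in range(0,15)]

# make sure the target folder exists before writing to it
os.makedirs('static/data', exist_ok=True)

# loop through list and download data
for url in urls:
    r = requests.get(url, allow_redirects=True)
    r.raise_for_status()  # stop early if a download failed
    filename = os.path.join('static/data', os.path.basename(url))
    with open(filename, 'wb') as f:
        f.write(r.content)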

Now that we've downloaded our data, let's explore it together!


<hr>

## Part 1
You will break out in groups of 2-3 to discuss the following questions (<font color='red'>highlighted in red font</font>) and implement code to answer them. I have prepared a few functions that will help you along the way.

Then, we will re-group with the entire class to discuss what we've learned. Please remember to save your work. This will count towards your Jupyter Notebook Exercise #3 grade.

In [None]:
# first let's import our packages

from scipy.optimize import minimize # finding optimal params in models
from scipy import stats             # statistical tools
import numpy as np                  # matrix/array functions
import pandas as pd                 # loading and manipulating data
import ipywidgets as widgets        # interactive display
import matplotlib.pyplot as plt     # plotting
%matplotlib inline

np.random.seed(2021)                # set seed for reproducibility

In [None]:
# this function will load the data into memory (assuming that
# the data are downloaded)
def load_subjects(how_many=15):
    '''
    input: number of subjects' data to load (1-15)
    output: dictionary of DataFrames containing the data
    '''
    
    assert (how_many > 0) and (how_many <= 15), "0 < how_many <= 15"
    files = [f'static/data/sub-{x:02}_RLdata.csv' for x in range(0,how_many)]
    
    subject_data = {}
    for index, file in enumerate(files):
        subject_data[index] = pd.read_csv(file, index_col=0)
    
    return subject_data

In [None]:
# let's load in our data using the function above
# feel free to adjust `how_many` (the default is all 15 subjects)
subject_data = load_subjects()

### 1. Getting to know your stimuli

Subjects played a few rounds of the two-armed bandit task, in which they learned the reward probability distribution of two slot machines (**stim_A** and **stim_B**) through trial-and-error. 
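
To make the task concrete, here is a minimal sketch that simulates a two-armed bandit with a player who chooses at random. The function name `simulate_bandit` and the reward probabilities `[0.2, 0.8]` are assumptions chosen for illustration only; they are not the values used to generate the subjects' data.

```python
import numpy as np

def simulate_bandit(p_reward, n_trials, rng):
    """Simulate random choices on a two-armed bandit (hypothetical helper)."""
    p_reward = np.asarray(p_reward)
    # pick arm 0 or arm 1 with equal probability on every trial
    choices = rng.integers(0, 2, size=n_trials)
    # each arm pays out 1 with its own fixed probability, otherwise 0
    outcomes = (rng.random(n_trials) < p_reward[choices]).astype(int)
    return choices, outcomes

rng = np.random.default_rng(0)
# illustrative probabilities only, NOT those of stim_A and stim_B
choices, outcomes = simulate_bandit([0.2, 0.8], n_trials=1000, rng=rng)

# the observed reward rate of each arm approximates its true probability
for arm in (0, 1):
    print(f'arm {arm}: mean reward = {outcomes[choices == arm].mean():.2f}')
```

Because rewards are random, the observed mean reward for each arm only approximates its underlying probability, and the approximation gets worse when an arm is sampled rarely.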

<font color="red">1a. How many trials did each subject complete?</font> (*Hint: explore the Dictionary of DataFrames*)
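
If you have not worked with a dictionary of DataFrames before, the sketch below shows a few ways to inspect one. It uses a small made-up dictionary called `toy_data` (an assumption for illustration, with random values) rather than the real `subject_data`, so the numbers it prints are not the answer to the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# made-up data: two "subjects" with 5 and 8 trials (hypothetical values)
toy_data = {
    i: pd.DataFrame({
        'choice': rng.integers(0, 2, size=n),
        'outcome': rng.integers(0, 2, size=n),
    })
    for i, n in enumerate([5, 8])
}

# the keys of the dictionary identify each subject
print(list(toy_data.keys()))

# each value is a DataFrame; .shape gives (number of rows, number of columns)
for subject_id, df in toy_data.items():
    print(subject_id, df.shape, list(df.columns))

# look at the first few rows of one subject's DataFrame
toy_data[0].head()
```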

In [None]:
# insert your code here

**ANSWER 1a: *insert your answer here***

Each slot machine was associated with a different mean probability (i.e., **stim_A** yielded rewards according to a constant probability and **stim_B** yielded rewards according to a different constant probability). 

<font color="red">1b. What were the probabilities of each stimulus?<br>
1c. Did **stim_A** have the same probability for every subject? **stim_B**? Why or why not?</font>

In [None]:
# insert your code here

**ANSWER 1b: *insert your answer here***

**ANSWER 1c: *insert your answer here***

### 2. Exploring behavior

People learn (or don't) in many different ways. Some people are extremely sensitive when outcomes aren't what they expected. Others aren't willing to update their behaviors so quickly.

People also make decisions differently. Some people are more explorative and are even willing to try a riskier action just to see what happens. Others are more "deterministic" with their actions and tend to stick with what they know is best.

While there are plenty more ways people vary in their learning and decision-making behavior, we are going to explore these two aspects. 
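
As a refresher, here is a minimal sketch of how these two aspects are commonly formalized: a Rescorla-Wagner value update paired with a softmax choice rule. The function names and the parameter values are assumptions chosen for illustration, not the code used to generate or fit the data, and the softmax rule is a common companion to the model rather than part of the Rescorla-Wagner update itself.

```python
import numpy as np

def rescorla_wagner_update(value, outcome, alpha):
    """Move the value estimate toward the outcome by a fraction
    alpha (learning rate) of the prediction error."""
    prediction_error = outcome - value
    return value + alpha * prediction_error

def softmax_choice_prob(values, beta):
    """Probability of choosing each option; a larger beta
    (inverse temperature) makes choices more deterministic."""
    values = np.asarray(values, dtype=float)
    z = beta * (values - values.max())  # subtract max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

# the same surprising reward moves the estimate more when alpha is larger
v_slow = rescorla_wagner_update(0.5, 1, alpha=0.1)
v_fast = rescorla_wagner_update(0.5, 1, alpha=0.9)
print(v_slow, v_fast)

# the same value difference yields more decisive choices when beta is larger
p_low = softmax_choice_prob([0.4, 0.6], beta=1)
p_high = softmax_choice_prob([0.4, 0.6], beta=10)
print(p_low, p_high)
```

Keep these two knobs in mind while looking at the plots: one governs how quickly values change after surprising outcomes, the other governs how consistently the higher-valued option is chosen.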

*Hint: please use the `plot_behavior()` function to explore different aspects of subjects' behavior and outcomes*

In [None]:
def plot_behavior(subject_data, subject_id, choices=False, outcomes=False, probability=False):
    '''
    input: 
        subject_data: dictionary of DataFrames returned by `load_subjects()`
        subject_id: integer from 0-14 corresponding to an ID number
        choices: boolean indicating whether to plot choices or not
        outcomes: boolean indicating whether to plot outcomes or not
        probability: boolean indicating whether to plot the mean reward over trials for both stimuli or not
    output:
        plot of behavior
    '''
    assert type(subject_data) is dict, "`subject_data` should be a dictionary, run the `load_subjects()` function above to load the data into memory"
    assert type(subject_id) is int and 0 <= subject_id <= 14, "`subject_id` should be an integer between 0 and 14"
    
    data = subject_data[subject_id]
    
    if probability:
        plt.axhline(np.mean(data[data.choice==0].outcome), color="orange", alpha=.4, label=data.columns[0])
        plt.axhline(np.mean(data[data.choice==1].outcome), color="purple", alpha=.4, label=data.columns[1])
        
    if outcomes:
        plt.plot(range(len(data)), data.outcome, 'r--', alpha=.6)
    if choices:
        if np.mean(data.choice) < .5:
            choice_data = [0 if x == 1 else 1 for x in data.choice]
        else:
            choice_data = [x for x in data.choice]
        plt.plot(range(len(data)), choice_data, '+', label='choice')
    
    plt.xlabel('trials')
    plt.ylabel('outcome')
    plt.title(f'Behavior from subject #{subject_id}')
    plt.legend()
    plt.show()

Plot everyone's behavior and answer the following questions (*Hint: there's a way to plot everyone's data using only two lines of code*).
<br><br>

<font color="red">
    2a. Which subjects were most sensitive to previous unexpected outcomes? List the subject ID numbers. Describe which aspect(s) of the data led you to this conclusion. What parameter from the Rescorla-Wagner Model captures this tendency?<br>
    2b. Which subjects were least explorative in their behavior? List the subject ID numbers. Describe which aspect(s) of the data led you to this conclusion. What parameter from the Rescorla-Wagner Model captures this tendency?</font>

In [None]:
# insert your code here

**ANSWER 2a: *insert your answer here***

**ANSWER 2b: *insert your answer here***

### 3. Exploring outcomes

<font color="red">Earlier, we learned that the reward probabilities of each stimulus were fixed. How do these values compare with the actual mean reward over trials across subjects (according to their choices)? Are they similar? Why or why not?</font> (*Hint: see plots above and/or explore different subjects' "outcome" column*)<br>


In [None]:
# insert your code here

**ANSWER 3: *insert your answer here***

Great job! Don't forget to save any of your work. It will also be useful for **Part 2**!

<hr>

## Part 2