In [None]:
import matplotlib.pyplot as plt
import matplotlib
import pandas as pd
import numpy as np
import json
import os 

### The probabilistic reversal learning task

During the fMRI scanning session, participants carried out the PRL task in two conditions: (1) reward-seeking and (2) punishment-avoiding. Participants were instructed to repeatedly choose between a yellow and a blue box in order to collect as many points as possible in the reward-seeking condition or lose as few points as possible in the punishment-avoiding condition (**Fig. 1**). One of the boxes was correct (rewarding or non-punishing, depending on the condition) with probability p = 0.8 and the other with probability p = 0.2. This reward contingency changed four times throughout each task condition. Reward probabilities were unknown to the subjects and had to be learned from experience. Each box also had an associated reward magnitude, randomly selected at the beginning of each trial. These magnitudes, displayed as numbers within the boxes, indicated the possible gain in the reward-seeking condition or the possible loss in the punishment-avoiding condition. To be successful in this task, the decision maker had to correctly estimate reward probabilities from experience and take reward magnitudes into account in order to choose the option with the higher expected value.
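
As a rough illustration of the last point, the sketch below compares the two boxes on a single reward-seeking trial. It assumes that the expected value of a box is the product of its estimated probability of being correct and its magnitude; the function name `expected_values` and the numbers are hypothetical and are not taken from the task code.

```python
import numpy as np

def expected_values(p_left, magn_left, magn_right):
    """Expected value of each box, assuming EV = probability * magnitude.

    p_left is the estimated probability that the left box is correct;
    the right box is assumed to have probability 1 - p_left.
    """
    p = np.array([p_left, 1.0 - p_left])
    x = np.array([magn_left, magn_right], dtype=float)
    return p * x

# Hypothetical trial: the left box is probably correct but carries few points
ev = expected_values(p_left=0.8, magn_left=10, magn_right=45)
best = ['left', 'right'][int(np.argmax(ev))]
```

Here the less likely box wins the comparison (about 9 vs 8 points), which is why magnitudes cannot be ignored.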
    
Each task condition was run in a separate fMRI run and consisted of N = 110 trials. Each trial began with the decision phase, indicated by a question mark appearing within the fixation circle. During the decision phase, the subject had 2 s to choose one of the boxes by pressing a button on the response grip with either the left or the right thumb. The decision phase was followed by a variable inter-stimulus interval (ISI; 3-7 s, jittered), after which the outcome was presented for 2 s. During the outcome phase, the fixation circle was colored according to the rewarded or punished box, and the number within the circle represented the number of gained or lost points (see **Fig. 1**). The outcome phase was followed by a variable inter-trial interval (ITI; 3-7 s, jittered).
    
    
The number of points the subject had gathered in the reward-seeking condition, or the number of points remaining in the punishment-avoiding condition, was represented by a gray account bar at the bottom of the screen. Subjects were informed that if they managed to fill half of the bar or the entire bar in the reward-seeking condition, they would receive 10 PLN or 20 PLN, respectively. In the punishment-avoiding condition, subjects were informed that they would receive 20 PLN if they were left with more than half of the bar, 10 PLN if they were left with less than half of the bar, and no money if they lost all of their points. To maintain a constant level of motivation throughout the task, incentive thresholds were set such that all participants earned 10 PLN in each task condition.


PsychoPy software (v. 1.90.1, www.psychopy.org; Peirce, 2007) was used to present the task on MRI-compatible NNL goggles (NordicNeuroLab, Bergen, Norway). Behavioral responses were collected using MRI-compatible NNL response grips (NordicNeuroLab, Bergen, Norway), which were held in both hands. Each condition lasted approximately 24 min. The order of task conditions and the colors of the left and right boxes (yellow and blue) were counterbalanced across subjects. Before the MRI scan, subjects practiced both task conditions on a lab computer.

Heterogeneity in prior expectations about the task structure may lead to heterogeneity in behavior even in simple tasks, which in turn leads to inaccurate behavioral modelling (Shteingart and Loewenstein, 2014). To tackle this challenge, we explicitly instructed participants that one of the boxes would be more frequently rewarded in the reward-seeking condition or punished in the punishment-avoiding condition, and that this contingency might reverse several times during the task. To further ensure that participants formed a correct model of the task environment, during the first phase of training they were given feedback indicating which box was more frequently correct.

# Behavioral data conventions

## Files
Behavioral data are stored in two separate files: `behavioral_data_clean_all.npy` and `behavioral_data_clean_all.json`. The `*.npy` file is an array containing the actual data, and the corresponding `*.json` file contains the names of its fields.
1. Aggregated behavioral data loaded from `behavioral_data_clean_all.npy` are represented by the `beh` variable in the code.
2. The corresponding metadata dictionary is represented by the `meta` variable in the code.

## Fields
Aggregated behavioral data contains **all task information, all behavioral responses / recordings and timing of all scanner events**. The multidimensional array `beh` aggregates data across subjects, conditions, trials and variables. The shape of the array is $N_{subjects} \times N_{conditions} \times N_{trials} \times N_{variables}$.
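
As a minimal sketch of how the array and its metadata fit together, the snippet below builds a small toy array with the same axis order and looks up one variable by its labels. The toy sizes, the three-field `dim4` list, the reaction times and the helper name `get_variable` are made up for illustration; the `dim1` to `dim4` keys mirror the metadata written later in this notebook.

```python
import numpy as np

# Toy stand-ins: 2 subjects, 2 conditions, 4 trials, 3 variables
meta_toy = {
    'dim1': ['m02', 'm03'],
    'dim2': ['rew', 'pun'],
    'dim3': [f'trial_{i+1}' for i in range(4)],
    'dim4': ['response', 'rt', 'won_bool'],
}
beh_toy = np.zeros((2, 2, 4, 3))
beh_toy[1, 0, :, 1] = [0.61, 0.48, np.nan, 0.75]  # made-up reaction times

def get_variable(beh, meta, subject, condition, name):
    """Return the trial-wise vector of one variable, addressed by labels."""
    i_sub = meta['dim1'].index(subject)
    i_con = meta['dim2'].index(condition)
    i_var = meta['dim4'].index(name)
    return beh[i_sub, i_con, :, i_var]

rt_toy = get_variable(beh_toy, meta_toy, 'm03', 'rew', 'rt')
```

Looking fields up by name rather than by position keeps analysis code valid if the column order of the array ever changes.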

### Task structure variables
| variable name | code | values |
|--|--|--|
| Correct side (block) | `block` | -1 / 1 |
| Chosen side (block) | `block_bci` | -1 / 1 |
| Correct side | `side` | -1 / 1 |
| Chosen side | `side_bci` | -1 / 1 |
| Left-side magnitude | `magn_left` | `int` from -45 to 45 |
| Right-side magnitude | `magn_right` | `int` from -45 to 45 |

Note the distinction between the correct and the chosen side. *Correct* follows the human utility interpretation, where correct means *rewarded or not punished*. *Chosen* follows the **being chosen interpretation**, where chosen means *rewarded or punished* (chosen by the task, not by the subject; it is used for RL algorithm calculations). In the reward condition, every side that is chosen is also correct, whereas in the punishment condition every side that is chosen is incorrect. Different parts of the code use different interpretations, so the two are defined separately to avoid confusion.
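
The relation between the two codings can be sketched as a sign flip in the punishment condition, which mirrors what `process_log_df()` does further down in this notebook. The helper name `correct_side` and the example vector are hypothetical.

```python
import numpy as np

def correct_side(side_bci, condition):
    """Convert 'being chosen' coding into 'correct' coding.

    condition is 'rew' or 'pun', as in the task logs.
    """
    side_bci = np.asarray(side_bci)
    return -side_bci if condition == 'pun' else side_bci

side_bci = np.array([-1, 1, 1, -1])
side_rew = correct_side(side_bci, 'rew')   # identical to side_bci
side_pun = correct_side(side_bci, 'pun')   # sign flipped
```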

### Subject's behavioral variables
| variable name | code | values |
|--|--|--|
| Response | `response` | -1 / 0 / 1 |
| Reaction time (s) | `rt` | `float` from 0 to 1.5 / `nan` |
| Correct choice label | `won_bool` | 0 / 1 |
| Account update | `won_magn` | `int` from -45 to 45 |
| Account balance (after trial) | `acc_after_trial` | `int` from 0 to 2300 |

The response variable is coded in the same way as the task structure variables. It is often desirable to convert the -1 / 1 coding into a 0 / 1 coding, because it is more convenient for JAGS to represent directly the probability that one side will be chosen. However, to keep an intuitive representation of missing responses as 0, the default coding is -1 / 0 / 1 for left / miss / right.
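
One possible way to do this conversion is sketched below. Marking misses as `nan` is an assumption made for this example; how missing responses should be passed to JAGS is not specified here. The helper name `recode_response` is hypothetical.

```python
import numpy as np

def recode_response(response):
    """Map -1 / 0 / 1 (left / miss / right) to 0 / nan / 1."""
    response = np.asarray(response, dtype=float)
    recoded = (response + 1) / 2        # -1 -> 0, 1 -> 1, 0 -> 0.5
    recoded[response == 0] = np.nan     # mark misses explicitly
    return recoded

recoded = recode_response([-1, 1, 0, 1])
```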

### Scanner timing variables
| variable name | code |
|--|--|
| Main fMRI clock time (used for analysis) | `onset_[iti/dec/isi/out]` |
| Planned time of stimulus presentation (always slightly behind registered time) | `onset_[iti/dec/isi/out]_plan` |
| Time registered with global clock (only for synchronization validation, not used for analysis) | `onset_[iti/dec/isi/out]_glob` |

The labels in brackets denote the different events within a single trial: `iti` is the inter-trial-interval onset (before each trial), `dec` is the decision phase onset, `isi` is the inter-stimulus-interval onset (waiting phase onset) and `out` is the outcome phase onset.
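
Because only onsets are stored, phase durations have to be derived from them. The sketch below assumes that each phase lasts until the onset of the next event in the same trial; the onset values are made up for illustration.

```python
import numpy as np

# Made-up onsets (in seconds) for three trials
onset_dec = np.array([4.0, 17.5, 30.2])
onset_isi = np.array([6.0, 19.5, 32.2])
onset_out = np.array([10.1, 25.0, 36.0])

dur_dec = onset_isi - onset_dec   # decision phase duration
dur_isi = onset_out - onset_isi   # waiting phase duration
```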

## Latent algorithm variables

### Algorithm parameters
| variable name | code | text |
|--|--|--|
| learning rate | `alpha` | $\alpha$ |
| learning rate for positive PE | `alpha_plus` / `alpha[first_index]` | $\alpha_+$ |
| learning rate for negative PE | `alpha_minus` / `alpha[second_index]` | $\alpha_-$ |
| learning rate for reward condition | `alpha_rew` / `alpha[first_index]` | $\alpha_{rew}$ |
| learning rate for punishment condition | `alpha_pun` / `alpha[second_index]` | $\alpha_{pun}$ |
| inverse-temperature | `beta` | $\beta$ |
| loss aversion (prospect theory) | `gamma` | $\gamma$ |
| risk aversion (prospect theory) | `delta` | $\delta$ |

Note that learning rates can be represented either as separate vectors or as arrays with columns corresponding to different learning rates. `first_index` and `second_index` are used instead of specific values because the first index is 0 in Python but 1 in MATLAB (and so on). In the four-learning-rate model, the first dimension corresponds to task condition and the second dimension to PE sign.
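
For the four-learning-rate case, the array layout described above could look as follows in Python. The numerical values are hypothetical, the helper name `select_alpha` is not part of the project code, and treating a prediction error of exactly zero as positive is an assumption of this sketch.

```python
import numpy as np

# Hypothetical values for a four-learning-rate model
# rows: task condition (rew, pun); columns: PE sign (positive, negative)
alpha = np.array([[0.30, 0.15],
                  [0.25, 0.40]])

def select_alpha(alpha, condition, pe):
    """Pick the learning rate for a condition index and a prediction error."""
    pe_index = 0 if pe >= 0 else 1
    return alpha[condition, pe_index]

alpha_rew_plus = select_alpha(alpha, condition=0, pe=0.5)
alpha_pun_minus = select_alpha(alpha, condition=1, pe=-0.2)
```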


### Tracked and computed variables
| variable name | code | text |
|--|--|--|
| Expected probability of a side being chosen | `wch` / `wch_l` and `wch_r` | $\rho$ |
| Expected probability of a side being correct | `wco` / `wco_l` and `wco_r` | $p$|
| Expected value (utility) | `util` / `util_l` and `util_r` | $v$|
| Reward magnitude | `magn` / `magn_l` and `magn_r` | $x$|
| Choice probability | `prob` / `prob_l` and `prob_r` | $P$|

Whenever the `l` and `r` suffixes are not used, the **array representation** of tracked variables is assumed. In the array representation, an $N_{trials} \times N_{sides}$ array holds the state of a variable across the task time course for both sides simultaneously.
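
The two representations can be converted into each other by slicing columns, as in the sketch below. The probabilities are made up, and the column order (left first, right second) is an assumption of this example.

```python
import numpy as np

# Array representation: one row per trial, one column per side
wco = np.array([[0.5, 0.5],
                [0.6, 0.4],
                [0.7, 0.3],
                [0.6, 0.4]])

# Suffixed representation: one vector per side
# (column order assumed to be left, right)
wco_l, wco_r = wco[:, 0], wco[:, 1]
```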


The function `process_log_df()` cleans the behavioral response dataframe: it changes data types, inverts the interpretation of certain variables depending on the task condition, and drops irrelevant columns.

In [None]:
def process_log_df(df):
    '''Cleaning and pre-processing of log dataframe.
    
    Args:
        df (pd.DataFrame): raw log dataframe
        
    Returns:
        info (dict): contains task metadata
        df_clean (pd.DataFrame): pre-processed log dataframe
    '''

    df_clean = df.copy(deep=True)

    # Grab additional info                                                   
    info = {}
    info['n_trials'] = df_clean.shape[0]
    info['n_blocks'] = 5                                                     
    info['condition'] = df_clean['condition'][0]
    info['subject'] = df_clean['subject_id'][0]
    info['group'] = df_clean['group'][0]

    # Rename non-intuitive columns according to guidelines
    df_clean.rename(columns={'block': 'block_bci'}, inplace=True)
    df_clean.rename(columns={'rwd': 'side_bci'}, inplace=True)

    if info['condition'] == 'pun':
        df_clean['block'] = (-1) * df_clean['block_bci']  
        df_clean['side'] = (-1) * df_clean['side_bci']
    else:
        df_clean['block'] = df_clean['block_bci']  
        df_clean['side'] = df_clean['side_bci']


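    # Convert subject responses to integers (left = -1, miss = 0, right = 1).
    # Masks are computed on the raw `df`, so they do not depend on the
    # mixed dtype of the partially converted column.
    df_clean.loc[df['response'] == 'a', 'response'] = -1
    df_clean.loc[df['response'] == 'd', 'response'] = 1
    # Misses are logged as 'None'; recent pandas versions may parse this as NaN
    miss = (df['response'] == 'None') | df['response'].isna()
    df_clean.loc[miss, 'response'] = 0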

    # Convert reaction time to float
    df_clean['rt'] = pd.to_numeric(df_clean['rt'], errors='coerce')

    # Reverse incorrect sign for punishment variables 
    if info['condition'] == 'pun':
        # Cast to bool first: bitwise `~` on 0 / 1 integers would give -1 / -2
        df_clean['won_bool'] = ~df_clean['won_bool'].astype(bool)
        df_clean['won_magn'] *= (-1)  
        df_clean['magn_left'] *= (-1)
        df_clean['magn_right'] *= (-1)

    # Drop unnecessary columns
    df_clean = df_clean[['block', 'block_bci', 'side', 'side_bci', 
                         'magn_left', 'magn_right',
                         'response', 'rt', 
                         'won_bool', 'won_magn', 'acc_after_trial',
                         'onset_iti', 'onset_iti_plan', 'onset_iti_glob',
                         'onset_dec', 'onset_dec_plan', 'onset_dec_glob',
                         'onset_isi', 'onset_isi_plan', 'onset_isi_glob',
                         'onset_out', 'onset_out_plan', 'onset_out_glob']]

    return info, df_clean

(1) Load logs for all subjects.

(2) Clean them using the `process_log_df()` function.

(3) Aggregate them into lists (one per condition) containing all dataframes accompanied by their metadata.

In [None]:
path_logs = '/home/kmb/Desktop/Neuroscience/Projects/BONNA_decide_net/' \
            'data/main_fmri_study/sourcedata/behavioral/task_logs'

subjects = [f'm{sub:02}' for sub in range(2, 34)]

df_all_rew, df_all_pun = [], []

for subject in subjects:

    # Load behavioral responses for single subject
    path_rew = f"{path_logs}/sub-{subject}/{subject}_prl_DecideNet_rew.csv"
    path_pun = f"{path_logs}/sub-{subject}/{subject}_prl_DecideNet_pun.csv"

    df_rew = pd.read_csv(path_rew)
    df_pun = pd.read_csv(path_pun)

    # Clean behavioral responses
    info_rew, df_rew = process_log_df(df_rew)
    info_pun, df_pun = process_log_df(df_pun)

    df_all_rew.append((info_rew, df_rew))
    df_all_pun.append((info_pun, df_pun))

### Create single variable to represent all behavioral responses
Use the aggregated lists of clean dataframes (`df_all_rew` and `df_all_pun`) and convert them to a single numpy array representing all behavioral responses and task onsets. The size of the aggregated array is n_subjects x n_conditions x n_trials x 23. The metadata describing the array dimensions is stored in the variable `meta`.

In [None]:
out_path = '/home/kmb/Desktop/Neuroscience/Projects/BONNA_decide_net/' \
           'data/main_fmri_study/sourcedata/behavioral'
filename = "behavioral_data_clean_all_REF"

beh = np.zeros((len(subjects), 2, 110, 23))

# create & save metadata
meta = {}
meta['dim1'] = subjects
meta['dim2'] = ['rew', 'pun']
meta['dim3'] = [f'trial_{i+1}' for i in range(110)]
meta['dim4'] = list(df_all_rew[0][1].keys())

meta_path = os.path.join(out_path, f"{filename}.json")
with open(meta_path, 'w') as f:
    json.dump(meta, f, indent=4)

# create & save numpy aggregated array
for i, (df_rew, df_pun) in enumerate(zip(df_all_rew, df_all_pun)):
    beh[i, 0] = np.array(df_rew[1], dtype='float')
    beh[i, 1] = np.array(df_pun[1], dtype='float')
    
beh_path = os.path.join(out_path, f"{filename}.npy")
np.save(beh_path, beh)

### Visualisation of subject responses
The function `plot_response()` creates a reader-friendly visualisation of a subject's responses throughout the task. The visualisation consists of:
- **top panel**: represents internal task structure
    - blue and yellow blocks show stable phases during which box reward probabilities do not change (color codes the more profitable side; blue=left, yellow=right)
    - dark blue line shows reward magnitude for the left box
    - yellow and blue dots show winning sides (rewarded / not punished)
- **middle panel**: represents subject's reaction times and account balance
    - red line: reaction time
    - red rectangles: highlight misses
    - black dashed line: account balance throughout the task
    - dark shaded area: trials for which subject crossed reward threshold
- **bottom panel**: represents subject's trial-wise responses
    - green dot = rewarded / not punished; red dot = not rewarded / punished
    - black line: idle time (how many consecutive stable trials the subject has experienced)
    - colored rectangles: which side is more profitable in terms of reward magnitude

Load aggregated behavioral data.

In [None]:
import sys
sys.path.append('/home/kmb/Desktop/Neuroscience/Projects/BONNA_decide_net/code')
from dn_utils.behavioral_models_REF import load_behavioral_data

beh_path = "/home/kmb/Desktop/Neuroscience/Projects/BONNA_decide_net/" \
           "data/main_fmri_study/sourcedata/behavioral/"
# Assumption: load_behavioral_data() returns the (beh, meta) pair used below
beh, meta = load_behavioral_data(beh_path)

In [None]:
def plot_response(beh, meta, subject, condition, save=False, **kwargs):
    '''Visualising useful aspects of subjects responses.
    
    Args:
        beh (np.array): aggregated behavioral responses
        meta (dict): description of beh array coding
        subject (int): subject index
        condition (int): task condition index
            0 for reward condition or 1 for punishment condition
        save (bool): if True, save the plot to file and close the figure
        ...
        **out_path (str): path to folder to save plot
    '''
    col_blu = "#56B4E9" # left
    col_yel = "#F0E442" # right
    col_blu_d = "#0B3A54"
    
    # Get proper task & response features
    block_bci = beh[subject, condition, :, meta['dim4'].index('block_bci')]
    side_bci = beh[subject, condition, :, meta['dim4'].index('side_bci')]
    magn_left = beh[subject, condition, :, meta['dim4'].index('magn_left')]
    magn_right = beh[subject, condition, :, meta['dim4'].index('magn_right')]
    response = beh[subject, condition, :, meta['dim4'].index('response')]
    rt = beh[subject, condition, :, meta['dim4'].index('rt')]
    won_bool = beh[subject, condition, :, meta['dim4'].index('won_bool')]
    acc_after_trial = beh[subject, condition, :, meta['dim4'].index('acc_after_trial')]
    magn_diff = magn_right - magn_left

    n_blocks = np.nonzero(np.diff(block_bci))[0].shape[0] + 1
    n_trials = beh.shape[2]
    x_trials = np.arange(1, n_trials+1)
    
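    # Determine begin and end of the blocks and rewarded side
    # (block_bci is looked up by name above, so this does not depend on
    # the column order of the aggregated array)
    blocks = np.zeros((2, n_blocks+1), dtype='int')
    blocks[0, 0:n_blocks] = np.hstack((
        np.ones((1), dtype=int), 
        np.nonzero(np.diff(block_bci))[0] + 2
    ))
    blocks[0, n_blocks] = n_trials
    blocks[1, 0:n_blocks] = block_bci[blocks[0][:-1]]

    # In the punishment condition the more profitable side is the opposite
    # of the side that is more often chosen to be punished
    if condition == 1: blocks[1, :] *= (-1) 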

    # Create plot
    fig, (ax1, ax2, ax3) = plt.subplots(3, 1, sharex=True, 
                                        figsize=(20, 10), facecolor='w')

    ### Subplot 1 ###########################################################
    # True reward contingencies (more profitable side)
    for i in range(n_blocks):

        if blocks[1, i] == -1:  col = col_blu
        else:                   col = col_yel

        ax1.fill_between(
            x=[blocks[0, i], blocks[0, i+1]], 
            y1=-1, y2=1, 
            color=col, alpha=.5)

    # Rewarded / punished side
    if condition == 1:    
        ax1.scatter(x_trials, (-1)*side_bci*.9, c=(-1)*side_bci, 
                    cmap='cividis', vmin=-1.5, vmax=1.5)
    else: 
        ax1.scatter(x_trials, side_bci*.9, c=side_bci, 
                    cmap='cividis', vmin=-1.5, vmax=1.5)
    ax1.set_ylim(-1, 1)
    ax1.set_yticks([-0.9, 0.9])
    ax1.set_yticklabels(['left', 'right'])
    ax1.set_ylabel('Better option')

    # Magnitude for left box
    ax1b = ax1.twinx()
    ax1b.plot(x_trials, magn_left, color=col_blu_d)
    ax1b.set_ylabel('$x(t)$ for left box', color=col_blu_d)
    if condition == 1: 
        ax1b.set_ylim(-50, 0)
    else:               
        ax1b.set_ylim(0, 50)
    ax1b.set_xlim(1, n_trials)

    ### Subplot 2 ###########################################################
    # Misses
    for miss in np.argwhere(np.isnan(rt)):

        ax2.fill_between(
            x=[miss[0]+.5, miss[0]+1.5],
            y1=0, y2=1.5,
            color='r', alpha=.5)

    # Reaction times
    ax2.plot(x_trials, rt, 'r')
    ax2.set_ylabel('reaction time $[s]$', color='r')
    ax2.set_xticks(blocks[0, 1:-1])
    ax2.set_ylim(0, 1.5)
    ax2.grid(axis='x')

    # Account
    ax2b = ax2.twinx()
    ax2b.plot(x_trials, acc_after_trial, 'k--')
    ax2b.set_ylabel('account balance')

    # Crossing predefined task threshold
    if condition == 1:
        acc_thr = np.ones(x_trials.shape) * 650
        ax2b.fill_between(x_trials, acc_after_trial, acc_thr, 
                          where=acc_after_trial <=acc_thr,
                          color='k', alpha=.2)
    else:
        acc_thr = np.ones(x_trials.shape) * 1150
        ax2b.fill_between(x_trials, acc_after_trial, acc_thr, 
                      where=acc_after_trial >=acc_thr,
                      color='k', alpha=.2)

    ### Subplot 3 ###########################################################
    # Idle time (repeated winning / not losing side)
    idle = np.zeros(side_bci.shape)
    for i in range(1, len(side_bci)):
        current = side_bci[i]
        last_trials = np.flip(side_bci[:i] == current)
        t = 0
        while last_trials[t] == True: 
            t += 1
            if t == len(last_trials): 
                break
        if condition == 1:  idle[i] = t * current * (-1)
        else:               idle[i] = t * current

    ax3.plot(x_trials, idle, 'k')
    ax3.set_ylim(-np.max(np.abs(idle)) - 2, np.max(np.abs(idle)) + 2)
    ax3.set_ylabel('Idle time')

    # Difference in magnitude
    norm = matplotlib.colors.Normalize(-45, 45)
    colors = [[norm(-45), col_blu],
              [norm(0), "white"],
              [norm(45), col_yel]]
    cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", colors)

    ax3b = ax3.twinx()
    for trial in range(n_trials):
        ax3b.fill_between(
            x=[trial+.5, trial+1.5], 
            y1=-1, y2=1, 
            color=cmap(norm(magn_diff[trial])), alpha=.7)

    # Subject responses
    ax3b.scatter(x_trials, response*.75, c=won_bool, 
                cmap='RdYlGn', vmin=-.2, vmax=1.2, s=50)
    ax3b.set_ylim(-1, 1)
    ax3b.set_yticks([-0.75, 0, .75])
    ax3b.set_yticklabels(['left', 'miss', 'right'])
    ax3b.grid(axis='both')    
    ax3b.spines['top'].set_color(col_yel)
    ax3b.spines['top'].set_linewidth(3)
    ax3b.spines['bottom'].set_color(col_blu)
    ax3b.spines['bottom'].set_linewidth(3)

    if save:

        if "out_path" in kwargs:  out_path = kwargs["out_path"] + "/"
        else:                     out_path = ""

        filename = f"{out_path}sub-{meta['dim1'][subject]}_{meta['dim2'][condition]}_respplot"
        plt.savefig(filename)    
        
        plt.close()

Show example plot.

In [None]:
plot_response(beh, meta, 5, 0)

Generate and save response plots.

In [None]:
out_path = "/home/kmb/Desktop/Neuroscience/Projects/"\
           "BONNA_decide_net/code/behavioral_analysis/figures/respplots"

# Number of subjects is the first dimension of the aggregated array
n_subjects = beh.shape[0]

for i in range(n_subjects):
    # Save respplots to file
    plot_response(beh, meta, i, 0, 
                  save=True, out_path=out_path)
    plot_response(beh, meta, i, 1,
                  save=True, out_path=out_path)