# Optimal bias 

Script to calculate the optimal decision boundary for all participants in all sessions. Results are stored in subdirectories named '/dayxx/sessionxx/', in files named 'optimalH_Subxxxx_Dayxx_Sessxx.json'. Run this script after running single_session_analysis for every session.

## Non-opt-out trials

Proportion of rightward answers vs. signed stimuli, modelled as:

$y = \frac{1}{1+\exp{-(\beta_0+\beta_1 x)}}$

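This is the sigmoid function, $sigmoid(z)$, evaluated at $z=\beta_0+\beta_1 x$, where $x$ is the signed stimulus and $y$ is the proportion of rightward answers.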

- Bias: $\beta_0$

- Sensitivity: $ \beta_1 $

If $\beta_0=0$ (no bias), then $y(x=0)=1/2$. If $\beta_0>0$, then $y(x=0)>1/2$, revealing a bias toward rightward answers. If $\beta_0<0$, there is a bias toward the left option.
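As a quick numerical illustration of how $\beta_0$ shifts the curve at $x=0$, here is a minimal sketch that uses only the Python standard library. The function name `psychometric_logistic` and the coefficient values are illustrative assumptions, not values from the analysis.

```python
import math

def psychometric_logistic(x, beta0, beta1):
    """Probability of a rightward answer for signed stimulus x."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

# Hypothetical sensitivity, chosen only for illustration (not a fitted value).
beta1 = 10.0

p_unbiased = psychometric_logistic(0.0, beta0=0.0, beta1=beta1)
p_right_bias = psychometric_logistic(0.0, beta0=0.5, beta1=beta1)
p_left_bias = psychometric_logistic(0.0, beta0=-0.5, beta1=beta1)

print(p_unbiased, p_right_bias, p_left_bias)
```

With no bias the curve passes through $1/2$ at $x=0$; a positive or negative $\beta_0$ moves that point above or below $1/2$.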

Under the assumption that we have the correct stimulus scale, we can estimate the noise in the internal response $\sigma^2$ and the decision boundary $H$ from the non-opt-out trials.

If $x$ is the stimulus strength (i.e. signed signal), the participant observes $\hat{x} = x + \eta$, where $\eta \sim N(0,\sigma^2)$, and the participant answers to the right if $\hat{x}>H$.

Thus, we have $p(rightward|x) = p(\hat{x}>H|x) = \int_{H}^{+\infty} N(\hat{x}|x,\sigma^2)\, d\hat{x} = \int_{H-x}^{+\infty} N(\eta|0,\sigma^2)\, d\eta = 1-\Phi(\frac{H-x}{\sigma}) = \Phi(\frac{x-H}{\sigma})$, with $\Phi$ the standard normal cumulative distribution function.
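The relation $p(rightward|x)=\Phi(\frac{x-H}{\sigma})$ can be checked by simulating the decision rule directly. The sketch below is only an illustration: the values of `x`, `H` and `sigma` are hypothetical, and the helper name `p_rightward` is not part of the analysis code.

```python
import random
from statistics import NormalDist

def p_rightward(x, H, sigma):
    """Analytic probability that x_hat = x + eta exceeds the boundary H."""
    return NormalDist().cdf((x - H) / sigma)

# Hypothetical values chosen only for illustration.
x, H, sigma = 0.1, 0.02, 0.15

rng = random.Random(0)  # fixed seed so the check is reproducible
n_trials = 200_000
# Simulate the rule "answer rightward if x + eta > H" with eta ~ N(0, sigma^2).
n_right = sum(1 for _ in range(n_trials) if x + rng.gauss(0.0, sigma) > H)

p_simulated = n_right / n_trials
p_analytic = p_rightward(x, H, sigma)
print(p_simulated, p_analytic)
```

The simulated fraction of rightward answers should agree with the analytic value up to sampling error.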

By fitting the psychometric curve we could estimate the noise $\sigma$ and the decision boundary $H$. 

$p(rightward|x) = \Phi(\beta_0 + \beta_1 x)$, with $\beta_1=1/\sigma$ and $\beta_0=-H/\sigma$

We fitted the psychometric curve with the logistic regression:

$p(rightward|x) = \frac{1}{1+e^{-(\beta_0+\beta_1 x)}}$, 

with $\sigma=1/\beta_1$ and $H=-\beta_0\sigma$.
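The mapping from fitted coefficients to $\sigma$ and $H$ can be sketched as follows. The coefficient values are hypothetical and the function names are illustrative; in the analysis the coefficients would come from the logistic regression fit.

```python
import math

def boundary_and_noise(beta0, beta1):
    """Map fitted coefficients to (sigma, H) as defined in the text."""
    sigma = 1.0 / beta1
    H = -beta0 * sigma
    return sigma, H

def p_rightward_logistic(x, beta0, beta1):
    """Logistic psychometric curve."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

# Hypothetical fitted coefficients, for illustration only.
beta0, beta1 = -0.4, 8.0
sigma, H = boundary_and_noise(beta0, beta1)

# At x = H the argument beta0 + beta1 * x vanishes, so the curve crosses 1/2.
p_at_boundary = p_rightward_logistic(H, beta0, beta1)
print(sigma, H, p_at_boundary)
```

A useful sanity check on any fit is that the curve evaluated at the estimated boundary returns $1/2$.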

## Opt-out trials

In these trials, the participant has 3 options: L (leftward), R (rightward) and O (opt-out), where the proportion of rightward (leftward) answers vs. the signed stimuli is plotted as psychometric curves. The noise model is assumed to be multinomial, that is:

$p(y_R,y_L,y_O|p_R,p_L,p_O) = \frac{n!}{(y_Ln)!(y_Rn)!(y_On)!}p_L^{y_Ln}p_R^{y_Rn}p_O^{y_On}$, with $y_L+y_R+y_O=1$, where $y_k=\frac{n_k}{n}$ is the fraction of trials in which the participant chose response $k$.

We assume that $p_L \sim 1-\Phi(\frac{x-H_L}{\sigma})$, $p_R \sim \Phi(\frac{x-H_R}{\sigma})$ and $p_O = 1-p_L-p_R$. The stimulus was rescaled to lie within the range $[-1,1]$.

We are going to estimate $H_L,H_R$ and $\sigma$ with maximum likelihood.
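To make the objective concrete, here is a minimal sketch of the multinomial log-likelihood as a function of $(H_L, H_R, \sigma)$. It is not the fitting code used for the results: the stimuli, the response counts and the function names are made up for illustration, and the combinatorial factor is dropped because it does not depend on the parameters.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function

def response_probs(x, H_L, H_R, sigma):
    """Return (p_L, p_R, p_O) for signed stimulus x, assuming H_L <= H_R."""
    p_L = 1.0 - PHI((x - H_L) / sigma)
    p_R = PHI((x - H_R) / sigma)
    return p_L, p_R, 1.0 - p_L - p_R

def log_likelihood(params, stimuli, counts):
    """Multinomial log-likelihood without the parameter-free n!/(...) term."""
    H_L, H_R, sigma = params
    total = 0.0
    for x, n_per_response in zip(stimuli, counts):
        probs = response_probs(x, H_L, H_R, sigma)
        for n_k, p_k in zip(n_per_response, probs):
            total += n_k * math.log(max(p_k, 1e-12))  # guard against log(0)
    return total

# Hypothetical stimuli in [-1, 1] and made-up counts of (L, R, O) responses.
stimuli = [-1.0, -0.5, -0.2, 0.2, 0.5, 1.0]
counts = [(38, 0, 2), (30, 1, 9), (18, 5, 17), (6, 17, 17), (1, 29, 10), (0, 37, 3)]

ll_plausible = log_likelihood((-0.2, 0.2, 0.4), stimuli, counts)
ll_implausible = log_likelihood((-0.8, 0.8, 0.4), stimuli, counts)
print(ll_plausible, ll_implausible)
```

A maximum-likelihood fit would pass the negative of `log_likelihood` to a numerical optimizer; here two parameter sets are simply compared to show that boundaries closer to the data score higher.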

### Logistic regression when the participant did not choose the opt-out option

Finally, if the participant never chose the opt-out option, then $p_O \sim 0$ and the multinomial distribution reduces to a binomial. Thus, if the mean proportion of opt-out choices over all stimuli is zero, we fit the psychometric curves with logistic regression over the points where the participant did not choose the opt-out option.

The x-axis: The 6 presented stimuli ordered as *[l1,l2,l3,r3,r2,r1]*, where 'l' means left, 'r' means right, and the numbers are the corresponding values for difficulty. Right stimuli are positive, while left stimuli are negative. 

## Optimal decision boundary in opt-out trials

In the deterministic stage, the participant receives $3$ points for every correct answer, $2$ points if she opts out, and $0$ points for an incorrect answer. Because she can opt out when she is uncertain, she should adopt a more conservative criterion for choosing between left and right responses. The rightward decision boundary ($H_R$) is defined as the $\hat{x}$ at which $p(x>0|\hat{x})=2/3$. At that point the expected reward of answering rightward, $3\,p(x>0|\hat{x})$, equals the $2$ points collected by opting out; below it, opting out is more profitable. Equivalently, the participant should answer rightward only when the probability of being correct is at least twice the probability of making a mistake, so at the boundary

$\frac{p(x>0|\hat{x})}{p(x<0|\hat{x})}=\frac{\sum_{x>0}p(x|\hat{x})}{\sum_{x<0}p(x|\hat{x})}=\frac{\sum_{x>0}p(\hat{x}|x)p(x)}{\sum_{x<0}p(\hat{x}|x)p(x)}=2. \qquad (1)$
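Equation (1) can be evaluated directly when the stimuli form a discrete set. The sketch below assumes equiprobable stimuli (so $p(x)$ cancels), an unbiased Gaussian likelihood $p(\hat{x}|x)=N(\hat{x}|x,\sigma^2)$, and hypothetical stimulus values and noise level; the function names are illustrative. It finds the $\hat{x}$ at which the odds equal $2$ by bisection.

```python
from statistics import NormalDist

def posterior_odds_right(x_hat, stimuli, sigma):
    """Odds p(x>0 | x_hat) / p(x<0 | x_hat) under a uniform prior over stimuli."""
    like = [NormalDist(x, sigma).pdf(x_hat) for x in stimuli]
    right = sum(lk for x, lk in zip(stimuli, like) if x > 0)
    left = sum(lk for x, lk in zip(stimuli, like) if x < 0)
    return right / left

def solve_boundary(stimuli, sigma, target_odds=2.0, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection for the x_hat at which the posterior odds equal target_odds."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if posterior_odds_right(mid, stimuli, sigma) < target_odds:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical stimulus values and noise level, for illustration only.
stimuli = [-0.3, -0.2, -0.1, 0.1, 0.2, 0.3]
sigma = 0.15

H_R = solve_boundary(stimuli, sigma)
odds_at_boundary = posterior_odds_right(H_R, stimuli, sigma)
# Expected reward of answering rightward at the boundary: 3 * p(x>0 | x_hat).
expected_reward = 3.0 * odds_at_boundary / (1.0 + odds_at_boundary)
print(H_R, odds_at_boundary, expected_reward)
```

At the boundary the expected reward of answering rightward matches the $2$ points obtained by opting out, which is the indifference condition behind Equation (1).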

## Perceptual bias and continuous stimuli

Assuming that the bias shown by the participant in the trials without the opt-out option is a perceptual bias, then

$p(\hat{x}|x)=\mathcal{N}(\hat{x}|x-H,\sigma^2)=\mathcal{N}(\hat{x}+H|x,\sigma^2)$,

where $\sigma$ is obtained from the resulting fit of the psychometric curves and $x$ corresponds to the presented stimulus.

If the participant does not know there are 6 stimulus values, then 

$p(x)=\mathcal{N}(x|0,\epsilon^2)$,

where $\epsilon=std(\textbf{x})$.
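With a Gaussian likelihood and a Gaussian prior, the posterior $p(x|\hat{x})$ is also Gaussian, with mean $\mu=(\hat{x}+H)\frac{\epsilon^2}{\sigma^2+\epsilon^2}$ and variance $s^2=\frac{\sigma^2\epsilon^2}{\sigma^2+\epsilon^2}$; these are the quantities `mu_` and `s_` computed in `find_HR4`. The sums in Equation (1) become integrals and the condition reduces to $\Phi(\mu/s)=2/3$. The sketch below, which uses hypothetical stimuli and fit values and illustrative function names, solves this condition in closed form and checks the resulting odds; `find_HR4` solves the same equation numerically with `fsolve`.

```python
import math
from statistics import NormalDist, pstdev

STD_NORMAL = NormalDist()

def posterior_params(x_hat, sigma, H, epsilon):
    """Mean and std of p(x | x_hat) for likelihood N(x_hat | x - H, sigma^2)
    and prior N(x | 0, epsilon^2)."""
    s2, e2 = sigma ** 2, epsilon ** 2
    mu = (x_hat + H) * e2 / (s2 + e2)
    s = math.sqrt(s2 * e2 / (s2 + e2))
    return mu, s

def odds_right(x_hat, sigma, H, epsilon):
    """Posterior odds p(x>0 | x_hat) / p(x<0 | x_hat)."""
    mu, s = posterior_params(x_hat, sigma, H, epsilon)
    p_left = STD_NORMAL.cdf((0.0 - mu) / s)
    return (1.0 - p_left) / p_left

def optimal_boundary(sigma, H, epsilon, target_odds=2.0):
    """Closed-form x_hat at which the posterior odds equal target_odds."""
    z = STD_NORMAL.inv_cdf(target_odds / (1.0 + target_odds))
    return z * sigma * math.sqrt(sigma ** 2 + epsilon ** 2) / epsilon - H

# Hypothetical stimuli and fit results, for illustration only.
stimuli = [-0.3, -0.2, -0.1, 0.1, 0.2, 0.3]
epsilon = pstdev(stimuli)  # population std, the same convention as np.std's default
sigma, H = 0.15, 0.02

H_R = optimal_boundary(sigma, H, epsilon)
odds_at_boundary = odds_right(H_R, sigma, H, epsilon)
print(H_R, odds_at_boundary)
```

Passing `target_odds=4.0` gives the boundary for the case where the odds threshold is $4$, which corresponds to the second `fsolve` call in the main loop.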

In [None]:
import pandas as pd
import os
import json
import numpy as np
from itertools import groupby
import matplotlib.pyplot as plt
from scipy import stats
import matplotlib as mpl
from sklearn.linear_model import LogisticRegression
import random
import re
import csv
from IPython.display import HTML, display, Image
import tabulate
import math as m
import warnings
warnings.filterwarnings('ignore')
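import scipy.optimize  # needed for scipy.optimize.fsolve; a bare `import scipy` does not load it on older SciPy versions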
from scipy.stats import norm

mpl.rcParams['lines.linewidth'] = 3
mpl.rcParams['axes.titlesize'] = 18
mpl.rcParams['axes.labelsize'] = 18
mpl.rcParams['lines.markersize'] = 10
mpl.rcParams['xtick.labelsize'] = 20
mpl.rcParams['ytick.labelsize'] = 20
mpl.rcParams['axes.linewidth'] = 3
#mpl.rcParams['xtick.major.size'] = 20
mpl.rcParams['xtick.major.width'] = 4
#mpl.rcParams['xtick.minor.size'] = 10
mpl.rcParams['xtick.minor.width'] = 2
mpl.rcParams['ytick.major.width'] = 4
mpl.rcParams['ytick.minor.width'] = 2

In [None]:
current_path = os.path.abspath(os.getcwd())
parent_path = os.path.abspath(os.path.join(current_path, os.pardir))
grand_parent_path = os.path.abspath(os.path.join(parent_path, os.pardir))
main_path = os.path.abspath(os.path.join(grand_parent_path, os.pardir))

path_results = main_path+'/results/dots/'

In [None]:
def cumul_norm(x,H,s):
    # cdf = (1/(s*sqrt(2*pi)))*int_-inf^x(exp(-((t-H)/s)^2/2)dt)
    return norm.cdf((x-H)/s)

In [None]:
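def find_HR4(xhat,xstims,sigma,H,exp_points):
    # Root function for fsolve: posterior odds p(x>0|xhat)/p(x<0|xhat) minus exp_points.
    # Prior on x: N(0, epsilon^2); likelihood: N(xhat | x-H, sigma^2).
    epsilon = np.std(xstims)
    e2 = epsilon*epsilon
    s2 = sigma*sigma
    # mean and std of the Gaussian posterior p(x|xhat)
    mu_ = (xhat+H)*e2/(s2+e2)
    s_ = np.sqrt(s2*e2/(s2+e2))
    # ncdf = p(x<0|xhat)
    ncdf = cumul_norm(0,mu_,s_)
    return (1-ncdf)/ncdf-exp_points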

In [None]:
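def pR4(xhat,xstims,sigma,H):
    # Posterior mass on rightward stimuli (x>0) given xhat, scaled by the factor ctte.
    # ctte is identical in pR4 and pL4, so it cancels in pR4/(pR4+pL4).
    epsilon = np.std(xstims)
    e2 = epsilon*epsilon
    s2 = sigma*sigma
    # mean and std of the Gaussian posterior p(x|xhat)
    mu_ = (xhat+H)*e2/(s2+e2)
    s_ = np.sqrt(s2*e2/(s2+e2))
    ncdf = cumul_norm(0,mu_,s_)
    ctte = scipy.stats.norm(xhat,np.sqrt(e2+s2)).pdf(0)
    return ctte*(1-ncdf)

def pL4(xhat,xstims,sigma,H):
    # Posterior mass on leftward stimuli (x<0) given xhat, scaled by the same factor ctte.
    epsilon = np.std(xstims)
    e2 = epsilon*epsilon
    s2 = sigma*sigma
    # mean and std of the Gaussian posterior p(x|xhat)
    mu_ = (xhat+H)*e2/(s2+e2)
    s_ = np.sqrt(s2*e2/(s2+e2))
    ncdf = cumul_norm(0,mu_,s_)
    ctte = scipy.stats.norm(xhat,np.sqrt(e2+s2)).pdf(0)
    return ctte*(ncdf)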

In [None]:
fday = [1,2,3,4,5,6,7,8,9,10]
fsession = [1,2]

In [None]:
xhat = np.arange(-0.2,0.2,0.05)
for Day in fday:
    for Ses in fsession:
        sessionid = 2*Day-2+Ses
        path = path_results+'day'+str(Day)+'/session'+str(Ses)+'/'
        # sort files
        NOfit_files = [f for f in os.listdir(path) if f.startswith('NO_fit')]
        subj_NOfit = [int(re.search('%s(.*)%s' % ('NO_fit_Sub', '_Day'), f).group(1)) for f in NOfit_files]
        sorted_subj_NOfit = sorted(subj_NOfit)
        index_subj_NOfit = [subj_NOfit.index(elem) for elem in sorted_subj_NOfit]
        sorted_NOfit_files = [NOfit_files[i] for i in index_subj_NOfit]
        
        theo_pR = np.zeros(len(xhat))
        ind = -1
        for part in sorted_NOfit_files:
            ind += 1
            partid = sorted_subj_NOfit[ind]
            part_sessid = str(partid)+'_'+str(sessionid)
            # psychometric curve NON-optout
            f = sorted_NOfit_files[ind]
            filename=path+f
            with open(filename) as f:
                data = json.load(f)
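            # Read the fitted quantities used below explicitly instead of injecting every key into globals()
            signed_st, Sigma, Hno = data['signed_st'], data['Sigma'], data['Hno']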
            HRopt_ = scipy.optimize.fsolve(find_HR4,x0=[0.1],args=(signed_st,Sigma,Hno,2))
            SO_HRopt_ = scipy.optimize.fsolve(find_HR4,x0=[0.1],args=(signed_st,Sigma,Hno,4))
            for h in range(len(xhat)):
                theo_pR[h]=pR4(xhat[h],signed_st,Sigma,Hno)/\
                        (pL4(xhat[h],signed_st,Sigma,Hno)+pR4(xhat[h],signed_st,Sigma,Hno))               
                
            # write the result in file
            filename=path+'optimalH_Sub'+str(partid)+'_Day'+str(Day)+'_Sess'+str(Ses)+'.json'
            dict_ = {
                "HRopt_":HRopt_[0],
                "SO_HRopt_":SO_HRopt_[0],
                "xhat":list(xhat),
                "theo_pR":list(theo_pR),
                "partid": partid,
                "sessionid":sessionid,
                "part_sessid":part_sessid                
            }
            # Serialize the results and write them to the optimalH_*.json file
            with open(filename, "w") as outfile:
                json.dump(dict_, outfile)

In [None]:
xhat = np.arange(-0.2,0.2,0.05)
path = path_results+'day'+str(10)+'/session'+str(2)+'/'
sessionid = 2*10-2+2
# sort files
NOfit_files = [f for f in os.listdir(path) if f.startswith('NO_fit')]
subj_NOfit = [int(re.search('%s(.*)%s' % ('NO_fit_Sub', '_Day'), f).group(1)) for f in NOfit_files]
sorted_subj_NOfit = sorted(subj_NOfit)
index_subj_NOfit = [subj_NOfit.index(elem) for elem in sorted_subj_NOfit]
sorted_NOfit_files = [NOfit_files[i] for i in index_subj_NOfit]

theo_pR = np.zeros(len(xhat))
theo_pL = np.zeros(len(xhat))
ind = -1
for part in sorted_NOfit_files:
    ind += 1
    partid = sorted_subj_NOfit[ind]
    part_sessid = str(partid)+'_'+str(sessionid)
    # psychometric curve NON-optout
    f = sorted_NOfit_files[ind]
    filename=path+f
    with open(filename) as f:
        data = json.load(f)
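    # Read the fitted quantities used below explicitly instead of injecting every key into globals()
    signed_st, Sigma, Hno = data['signed_st'], data['Sigma'], data['Hno']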
    # find_HR4 requires exp_points; 2 matches the HRopt_ computation in the main loop
    HRopt_ = scipy.optimize.fsolve(find_HR4,x0=[0.1],args=(signed_st,Sigma,Hno,2))
    if part == 'NO_fit_Sub3060_Day10_Sess2.json':
        print(part)
        for h in range(len(xhat)):
            theo_pR[h]=pR4(xhat[h],signed_st,Sigma,Hno)/\
                    (pL4(xhat[h],signed_st,Sigma,Hno)+pR4(xhat[h],signed_st,Sigma,Hno))
            theo_pL[h]=pL4(xhat[h],signed_st,Sigma,Hno)/\
                    (pL4(xhat[h],signed_st,Sigma,Hno)+pR4(xhat[h],signed_st,Sigma,Hno))    

In [None]:
print(theo_pL+theo_pR)

In [None]:
HRopt_

In [None]:
plt.plot(xhat,theo_pR,label='theo_pR')
plt.plot(xhat,theo_pL,label='theo_pL')
plt.xlabel('xhat')
plt.legend()