#    Simulation Equations

#### Assumptions

The two-alternative drift diffusion model [DDM] relies on three fundamental assumptions: 

I. Evidence favoring each alternative is integrated over time.

II. Decision process errors are normally distributed with variance $\sigma^2$ [$\epsilon \sim N(0,\sigma^2)$].

III. A decision is made when sufficient evidence has accumulated for one alternative over the other.

*(The physics of optimal decision making ~ Bogacz et al., 2006)*

#### Nomenclature

<center>**Bayesian belief model parameters**</center>
$$
\begin{align}
\mu = \textrm{mean reward difference between targets} && \Omega = \textrm{change point probability}\\
\hat{\delta} = \textrm{reward prediction error} && \delta = \textrm{dist. mean belief prediction error}\\
\sigma^2_n = \textrm{variance of the generative distribution} && \sigma^2_t = \textrm{estimated variance}\\ 
\phi = \textrm{model confidence} && H = \textrm{hazard rate}\\
r_t = \textrm{reward observed}\\ 
\end{align}
$$
<br>
<center>**DDM parameters** </center>
$$
\begin{align}
\theta= \textrm{execution state} && \tau = \textrm{time step}\\
a = \textrm{boundary} && v_{\theta} = \textrm{execution drift rate}\\  
 && \epsilon = \textrm{error} 
\end{align}
$$
<br>
<center>**Adaptive parameters**</center>
$$
\begin{align}
A = \textrm{boundary height learning rate}  && V = \textrm{drift learning rate} \\
\end{align}
$$

### Decision

For a given trial, 
$$\theta(\tau)= \theta(\tau-\Delta\tau) + v_{\theta}\Delta\tau + \epsilon(\tau)$$

where 
$$\theta = V_{t_1} - V_{t_2}$$ 

<br>
An action is executed when the decision process at a given time step crosses the boundary [$a$]: 
$$\textrm{response} = \begin{cases} 1 & \textrm{if} \; \theta(\tau) \geq a \\
0 & \textrm{otherwise}\end{cases}
$$
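
The accumulation and response rule can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `simulate_trial`, the starting point of 0, and the `max_steps` cutoff are assumptions that do not come from the equations above, and the noise is drawn once per step as $\epsilon \sim N(0,\sigma^2)$ with no additional scaling by the step size.

```python
import random


def simulate_trial(v, a, sigma, dt, max_steps, rng=None):
    """Accumulate evidence until it reaches the boundary `a` or steps run out.

    Returns (response, n_steps). response is 1 if theta reached the boundary
    and 0 otherwise, following the single-boundary rule in the text.
    """
    rng = rng or random.Random()
    theta = 0.0  # assumed starting point (not specified in the text)
    for step in range(1, max_steps + 1):
        eps = rng.gauss(0.0, sigma)  # epsilon ~ N(0, sigma^2), one draw per step
        theta += v * dt + eps        # theta(tau) = theta(tau - dt) + v*dt + eps
        if theta >= a:
            return 1, step
    return 0, max_steps              # assumed cutoff: no crossing within max_steps


if __name__ == "__main__":
    response, n_steps = simulate_trial(
        v=1.0, a=1.0, sigma=0.1, dt=0.01, max_steps=1000, rng=random.Random(0)
    )
    print(response, n_steps)
```

With `sigma=0` the process is deterministic, which is a convenient way to check that the boundary is reached after `a / (v * dt)` steps.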


### Bayesian belief and learning rate estimation

**Empirically derived learning rates**<br>
Participant learning rates are estimated for each trial as the ratio of the change in observed reward from one trial to the next to the prediction error on that trial. This ratio measures how much each new outcome influences the subsequent prediction.
$$\hat{\alpha_t} = \frac{r_{t+1} - r_t}{\hat\delta_t}$$

The reward prediction error used in this estimate is the difference between the highest mean reward on that trial [$\mu_t$] and the reward associated with the selected target [$r_t$]: 
$$\hat{\delta_t} = \mu_t - r_t$$
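
As a small illustration of the two expressions above, the following sketch computes the empirical learning rate for one trial. The function names are hypothetical, and returning `None` when the prediction error is zero (where the ratio is undefined) is an assumption, not something the text specifies.

```python
def reward_prediction_error(mu_t, r_t):
    """delta_hat_t = mu_t - r_t."""
    return mu_t - r_t


def empirical_learning_rate(r_t, r_next, mu_t):
    """alpha_hat_t = (r_{t+1} - r_t) / delta_hat_t.

    Returns None when delta_hat_t is zero (assumed handling of the
    undefined ratio).
    """
    delta_hat = reward_prediction_error(mu_t, r_t)
    if delta_hat == 0:
        return None
    return (r_next - r_t) / delta_hat
```

For example, with $r_t = 4$, $r_{t+1} = 6$, and $\mu_t = 8$, the prediction error is 4 and the estimated learning rate is $2/4 = 0.5$.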

**Model-based belief and learning rate estimation**<br>
The learning rate of the model [$\alpha$] is influenced by the change point probability [$\Omega$, the model's suspicion that the location of the mean has shifted] and the model confidence [$\phi$, which falls as the estimate of the mean becomes less precise]. The learning rate should be high if either 1) a change in the mean of the distribution of reward is likely [$\Omega$ is high] or 2) the estimate of the mean is highly imprecise [$\phi$ is low].
$$\alpha_t = \Omega_t + (1-\Omega_t)(1-\phi_t)$$

The belief estimate of the mean of the distribution of rewards on the next trial: 
$$B_{t+1} = B_t + \alpha_t\delta_t$$

The prediction error, $\delta$, is the difference between the model belief and the current sample: 
$$\delta_t = r_t - B_t$$

If $\alpha_t$ is 0, the current sample does not update the model's belief estimate at all; if $\alpha_t$ is 1, the current sample entirely dictates the model's belief estimate. 
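
The learning rate and the delta-rule belief update can be written as two short functions. This is a minimal sketch with hypothetical function names; it simply restates the equations for $\alpha_t$, $\delta_t$, and $B_{t+1}$ above.

```python
def learning_rate(omega, phi):
    """alpha_t = Omega_t + (1 - Omega_t) * (1 - phi_t)."""
    return omega + (1.0 - omega) * (1.0 - phi)


def update_belief(belief, reward, alpha):
    """B_{t+1} = B_t + alpha_t * delta_t, with delta_t = r_t - B_t."""
    delta = reward - belief
    return belief + alpha * delta
```

The two limiting cases follow directly: `update_belief(10.0, 20.0, 0.0)` leaves the belief at 10, while `update_belief(10.0, 20.0, 1.0)` moves it all the way to the sample, 20.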
***
<br>
**Changepoint probability**<br>
The changepoint probability is the likelihood that a new sample was drawn from a uniform distribution (that is, a changepoint has occurred) relative to the total likelihood of the sample under both the uniform distribution and the Gaussian distribution centered on the model's current belief estimate, with each likelihood weighted by its prior probability. The changepoint probability approaches 1 as the relative probability of the sample coming from the uniform distribution increases. $H$, the hazard rate, is the prior probability that the mean of the distribution has changed. 

$$\Omega_t = \frac{U(r_t)H}{U(r_t)H + N(r_t|B_t,\sigma^2_t)(1-H)}$$
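
A sketch of this calculation is given below. It assumes that rewards lie in a known range `[low, high]`, so that $U(r_t) = 1/(\textrm{high} - \textrm{low})$; the range parameters and function names are illustrative assumptions, not part of the model description.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of N(mean, variance) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def changepoint_probability(r_t, belief, variance, hazard, low, high):
    """Omega_t = U(r)H / (U(r)H + N(r | B, sigma^2_t)(1 - H)).

    Assumes low <= r_t <= high, so the uniform density is constant.
    """
    uniform_density = 1.0 / (high - low)
    change = uniform_density * hazard
    no_change = gaussian_pdf(r_t, belief, variance) * (1.0 - hazard)
    return change / (change + no_change)
```

With a belief of 50, an estimated variance of 25, a hazard rate of 0.1, and rewards assumed to span 0 to 100, a sample at 50 yields a changepoint probability close to 0, whereas a sample at 90 yields a value close to 1.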

**Estimated variance**<br>
$$\sigma^2_t = \sigma^2_n + \frac{(1-\phi_t)\sigma^2_n}{\phi_t}$$
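
As a quick algebraic check, both terms share the factor $\sigma^2_n$, so the expression collapses to a single ratio that makes the dependence on model confidence explicit:

$$\sigma^2_t = \sigma^2_n\left(1 + \frac{1-\phi_t}{\phi_t}\right) = \frac{\sigma^2_n}{\phi_t}$$

The estimated variance therefore equals the generative variance when $\phi_t = 1$ and grows without bound as $\phi_t$ approaches 0.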


**Model confidence**<br>
The model confidence [$\phi$] is a function of the changepoint probability [$\Omega$] and the variance of the generative distribution [$\sigma^2_n$]. The first term is the variance when a changepoint is assumed to have occurred. The second term is the variance conditional on no changepoint (slowly decaying uncertainty). The third term is the rise in uncertainty when the model is unsure whether a changepoint has occurred. The same terms are in the denominator with an added variance term to reflect uncertainty arising from noise. 

$$RU = \frac{\Omega_t\sigma^2_n + (1-\Omega_t)(1-\phi_t)\sigma^2_n + \Omega_t(1-\Omega_t)(\delta_t\phi_t)^2}{\Omega_t\sigma^2_n + (1-\Omega_t)(1-\phi_t)\sigma^2_n + \Omega_t(1-\Omega_t)(\delta_t\phi_t)^2+\sigma^2_n}$$
<br>
$$\phi_{t+1} =  1 - RU$$

_*Note that the quantity calculated in the paper is reward uncertainty rather than model confidence, so we take its complement, $1 - RU$.*_ <br>
*Vaghi et al., 2017*
<br>
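
The confidence update can be sketched as follows. The function names are hypothetical; the three numerator terms mirror the changepoint, no-changepoint, and ambiguity terms described above, and the denominator adds $\sigma^2_n$ to the same sum.

```python
def reward_uncertainty(omega, phi, delta, sigma2_n):
    """RU as defined in the text."""
    changepoint_term = omega * sigma2_n
    no_changepoint_term = (1.0 - omega) * (1.0 - phi) * sigma2_n
    ambiguity_term = omega * (1.0 - omega) * (delta * phi) ** 2
    numerator = changepoint_term + no_changepoint_term + ambiguity_term
    return numerator / (numerator + sigma2_n)


def next_confidence(omega, phi, delta, sigma2_n):
    """phi_{t+1} = 1 - RU."""
    return 1.0 - reward_uncertainty(omega, phi, delta, sigma2_n)
```

Two cases are easy to verify by hand. If a changepoint is certain ($\Omega_t = 1$), then $RU = \sigma^2_n / 2\sigma^2_n = 0.5$ and confidence resets to 0.5. If no changepoint is suspected ($\Omega_t = 0$) and $\phi_t = 0.5$, then $RU = 0.5/1.5 = 1/3$ and confidence rises to $2/3$.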



### Potential ways to update adaptive parameters for RADD model


<center>**Bound**</center>
Boundary height may adapt as a function of change point probability alone: 
$$\hat{A_t} = a_0 + \beta_1\Omega_t $$
Or boundary height may adapt as a function of an interaction between change point probability and model confidence: 
$$\hat{A_t} = a_0 + \beta_1\Omega_t + \beta_2\phi_t + \beta_3\Omega_t\phi_t$$
<br>

<center>**Drift**</center>
Drift rate may adapt as a function of the difference in mean target reward alone: 
$$\hat{V_t} = v_0 + \beta_1\mu_t$$
Or drift rate may adapt as a function of an interaction between the difference in mean target reward and model confidence: 
$$\hat{V_t}= v_0 + \beta_1\mu_t + \beta_2\phi_t + \beta_3\mu_t\phi_t$$
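
The candidate update rules for the bound and the drift share the same linear form, so they can be sketched with two small functions. The function and argument names are hypothetical, and the $\beta$ weights are free parameters whose values are not given here. Leaving `b2` and `b3` at zero recovers the simpler, single-predictor versions.

```python
def adapt_bound(a0, omega, phi, b1, b2=0.0, b3=0.0):
    """A_hat_t = a0 + b1*Omega_t + b2*phi_t + b3*Omega_t*phi_t."""
    return a0 + b1 * omega + b2 * phi + b3 * omega * phi


def adapt_drift(v0, mu, phi, b1, b2=0.0, b3=0.0):
    """V_hat_t = v0 + b1*mu_t + b2*phi_t + b3*mu_t*phi_t."""
    return v0 + b1 * mu + b2 * phi + b3 * mu * phi
```

For instance, `adapt_bound(1.0, 0.5, 0.8, b1=2.0)` uses change point probability alone and returns 2.0, regardless of the confidence value passed in.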


### Cost [to be updated]

The deviance of model predictions [$\hat{\mu}$] from the observed data [$\mu$] is evaluated according to the following chi-square function of residuals weighted by variance, where $N$ corresponds to the total number of epochs: 

$$\chi^2 = \sum_{i=1}^{N}W_{acc,\:i}\;(\mu_{acc,i} - \hat{\mu}_{acc,i})^2 + \sum_{i=1}^{N}W_{rt,\:i}\;(\mu_{rt,i} - \hat{\mu}_{rt,i})^2$$
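
Since this cost function is still marked as subject to change, the following is only a sketch of the expression as currently written. The function name and the use of plain lists (one entry per epoch) for the observed values, predictions, and weights are assumptions made for illustration.

```python
def chi_square_cost(acc_obs, acc_pred, acc_w, rt_obs, rt_pred, rt_w):
    """Weighted sum of squared residuals for accuracy and RT across epochs."""
    acc_term = sum(
        w * (obs - pred) ** 2 for w, obs, pred in zip(acc_w, acc_obs, acc_pred)
    )
    rt_term = sum(
        w * (obs - pred) ** 2 for w, obs, pred in zip(rt_w, rt_obs, rt_pred)
    )
    return acc_term + rt_term
```

With two epochs, accuracy residuals of 0.5 and 0 weighted by 2 and 1, and RT residuals of 0 and 1 weighted by 1 and 3, the cost is $2(0.5)^2 + 3(1)^2 = 3.5$.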