# Stimulus Coding Example

In this tutorial we illustrate how to use the regression approach to model the effect of stimulus coding on the drift rate parameter of the DDM.

### Import Modules

In [1]:
from copy import deepcopy

import arviz as az
import pandas as pd

import hssm

## What is Stimulus Coding?

There are two core approaches to coding the stimuli when fitting parameters of 2-choice SSMs. (The discussion here is simplified to bring across the core ideas; in practice, ideas from both approaches can be mixed.)

1. *Accuracy coding*: Responses are treated as **correct** or **incorrect**
2. *Stimulus coding*: Responses are treated as **stimulus_1** or **stimulus_2**

Take as a running example a simple random dot motion task with two conditions, `left` and `right`. Both conditions are equally *difficult*, but for half of the trials the correct motion direction is left, and for the other half it is right.

So it will be reasonable to assume that, ceteris paribus, nothing should really change in terms of participant behavior, apart from symmetrically preferring right to left when it is correct and vice versa. 

Now, when applying *Accuracy coding*, we would expect the drift rate to be the same for both conditions: any condition effect vanishes once we code responses as correct or incorrect.

When we apply *Stimulus coding*, on the other hand, we actively need to account for the direction change (since we now attach our *response values*, e.g. `-1` and `1`, permanently to specific choice options, regardless of correctness).

To formulate a model that is equivalent to the one described above in terms of *accuracy coding*, we again want to estimate only a single `v` parameter, but we have to respect the change in response direction between the experiment conditions `left` and `right`.

Note that an important aspect of what we describe above is that we want to estimate a single `v` parameter in each of the two *coding approaches*.

For *Accuracy coding* we simply estimate a single `v` parameter, and no extra work is necessary.

For *Stimulus coding* we need to account for the **symmetric** shift in direction between the two experiment conditions. One way to do this is the following:

We can simply assign a covariate, `direction`, which codes `-1` for `left` and `1` for `right`.
Then we use the following regression formula for the `v` parameter: `v ~ 0 + direction`. 

Note that we are *not using an intercept* here.
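
To make the role of the missing intercept concrete, here is a minimal sketch in plain Python (no HSSM involved). The value `beta_direction = 0.5` is an assumption chosen for illustration, and the helper `trial_drift` is hypothetical. With `v ~ 0 + direction`, the trial-wise drift is simply the coefficient times the covariate, so the two conditions receive drifts of equal magnitude and opposite sign.

```python
# Illustrative only: how `v ~ 0 + direction` maps one coefficient to trial-wise drifts.
beta_direction = 0.5  # assumed value, not an estimate


def trial_drift(direction, beta=beta_direction, intercept=0.0):
    """Linear predictor for v; intercept=0.0 mirrors the `0 +` in the formula."""
    return intercept + beta * direction


v_left = trial_drift(-1)   # `left` condition, direction = -1
v_right = trial_drift(1)   # `right` condition, direction = 1
print(v_left, v_right)
```

A single coefficient therefore describes both conditions, which is what makes this model comparable to the single-`v` accuracy-coded model.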

Let's see how this works in practice.

## Simulate Data


In [2]:
# Condition 1
stim_1 = hssm.simulate_data(
    model="ddm", theta=dict(v=-0.5, a=1.5, z=0.5, t=0.1), size=500
)

stim_1["stim"] = "C-left"
stim_1["direction"] = -1
# In the left condition a `-1` response is correct, so flip the sign for accuracy coding
stim_1["response_acc"] = (-1) * stim_1["response"]

# Condition 2
stim_2 = hssm.simulate_data(
    model="ddm", theta=dict(v=0.5, a=1.5, z=0.5, t=0.1), size=500
)

stim_2["stim"] = "C-right"
stim_2["direction"] = 1
# In the right condition a `1` response is already the correct one
stim_2["response_acc"] = stim_2["response"]

data_stim = pd.concat([stim_1, stim_2]).reset_index(drop=True)

# Accuracy-coded copy: `response` is now 1 for correct and -1 for incorrect
data_acc = deepcopy(data_stim)
data_acc["response"] = data_acc["response_acc"]

print(data_acc.head())
print(data_stim.head())

         rt  response    stim  direction  response_acc
0  1.935823       1.0  C-left         -1           1.0
1  1.392043       1.0  C-left         -1           1.0
2  2.474422       1.0  C-left         -1           1.0
3  1.984574       1.0  C-left         -1           1.0
4  6.067898       1.0  C-left         -1           1.0
         rt  response    stim  direction  response_acc
0  1.935823      -1.0  C-left         -1           1.0
1  1.392043      -1.0  C-left         -1           1.0
2  2.474422      -1.0  C-left         -1           1.0
3  1.984574      -1.0  C-left         -1           1.0
4  6.067898      -1.0  C-left         -1           1.0
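
The recoding used in the simulation cell can be summarized in one line: the accuracy-coded response equals the stimulus-coded response multiplied by `direction`. The sketch below checks this on a few hand-written rows. The rows are made up for illustration and are not taken from the simulated data.

```python
# Hand-written toy trials: (direction, stimulus-coded response).
toy_trials = [(-1, -1), (-1, 1), (1, 1), (1, -1)]

# A response that matches the stimulus direction counts as correct (+1).
acc_codes = [direction * response for direction, response in toy_trials]

for (direction, response), acc in zip(toy_trials, acc_codes):
    label = "correct" if acc == 1 else "incorrect"
    print(f"direction={direction:+d} response={response:+d} -> {acc:+d} ({label})")
```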


## Set up Models

### Accuracy Coding


In [3]:
m_acc_stim_dummy = hssm.HSSM(
    data=data_acc,
    model="ddm",
    include=[{"name": "v", "formula": "v ~ 1 + stim"}],
    z=0.5,
)

m_acc_stim_dummy.sample(sampler="mcmc", tune=500, draws=500)

m_acc_stim_dummy.summary()

Model initialized successfully.
Using default initvals. 



Initializing NUTS using adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [t, a, v_Intercept, v_stim]


Sampling 4 chains for 500 tune and 500 draw iterations (2_000 + 2_000 draws total) took 7 seconds.


| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| t | 0.125 | 0.021 | 0.086 | 0.164 | 0.001 | 0.0 | 1688.0 | 1333.0 | 1.0 |
| a | 1.546 | 0.03 | 1.493 | 1.603 | 0.001 | 0.001 | 1667.0 | 1600.0 | 1.0 |
| v_stim[C-right] | -0.052 | 0.048 | -0.14 | 0.04 | 0.001 | 0.001 | 1709.0 | 1342.0 | 1.0 |
| v_Intercept | 0.612 | 0.037 | 0.54 | 0.68 | 0.001 | 0.001 | 1594.0 | 1540.0 | 1.0 |


In [4]:
m_acc_simple = hssm.HSSM(
    data=data_acc,
    model="ddm",
    include=[
        {
            "name": "v",
            "formula": "v ~ 1",
            "prior": {"Intercept": {"name": "Normal", "mu": 0.0, "sigma": 3.0}},
        }
    ],
    z=0.5,
)

m_acc_simple.sample(sampler="mcmc", tune=500, draws=500)

m_acc_simple.summary()

Model initialized successfully.
Using default initvals. 



Initializing NUTS using adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [t, a, v_Intercept]


Sampling 4 chains for 500 tune and 500 draw iterations (2_000 + 2_000 draws total) took 8 seconds.


| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| t | 0.124 | 0.021 | 0.082 | 0.159 | 0.001 | 0.001 | 1216.0 | 893.0 | 1.0 |
| a | 1.548 | 0.03 | 1.491 | 1.602 | 0.001 | 0.001 | 1290.0 | 1204.0 | 1.0 |
| v_Intercept | 0.585 | 0.026 | 0.537 | 0.637 | 0.001 | 0.001 | 1282.0 | 1350.0 | 1.01 |


In [5]:
az.compare({"m_acc_simple": m_acc_simple.traces, 
            "m_acc_stim_dummy": m_acc_stim_dummy.traces})

| | rank | elpd_loo | p_loo | elpd_diff | weight | se | dse | warning | scale |
|---|---|---|---|---|---|---|---|---|---|
| m_acc_simple | 0 | -1984.071347 | 2.891709 | 0.0 | 0.89952 | 34.937772 | 0.0 | False | log |
| m_acc_stim_dummy | 1 | -1984.485112 | 3.889187 | 0.413765 | 0.10048 | 35.007229 | 1.018139 | False | log |


### Stimulus Coding

In [6]:
m_stim = hssm.HSSM(
    data=data_stim,
    model="ddm",
    include=[
        {
            "name": "v",
            "formula": "v ~ 0 + direction",
            "prior": {"direction": {"name": "Normal", "mu": 0.0, "sigma": 3.0}},
        }
    ],
    z=0.5,
)

m_stim.sample(sampler="mcmc", tune=500, draws=500)

m_stim.summary()

Model initialized successfully.
Using default initvals. 



Initializing NUTS using adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [t, a, v_direction]


Sampling 4 chains for 500 tune and 500 draw iterations (2_000 + 2_000 draws total) took 7 seconds.


| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| t | 0.126 | 0.021 | 0.087 | 0.165 | 0.001 | 0.0 | 1060.0 | 1018.0 | 1.0 |
| a | 1.544 | 0.03 | 1.487 | 1.597 | 0.001 | 0.001 | 1016.0 | 1112.0 | 1.0 |
| v_direction | 0.584 | 0.026 | 0.537 | 0.635 | 0.001 | 0.001 | 1297.0 | 1328.0 | 1.0 |


In [7]:
az.compare(
    {
        "m_acc_simple": m_acc_simple.traces,
        "m_acc_stim_dummy": m_acc_stim_dummy.traces,
        "m_stim": m_stim.traces,
    }
)

| | rank | elpd_loo | p_loo | elpd_diff | weight | se | dse | warning | scale |
|---|---|---|---|---|---|---|---|---|---|
| m_acc_simple | 0 | -1984.071347 | 2.891709 | 0.0 | 0.899303 | 34.937772 | 0.0 | False | log |
| m_stim | 1 | -1984.08376 | 2.914335 | 0.012413 | 0.0 | 34.94744 | 0.120808 | False | log |
| m_acc_stim_dummy | 2 | -1984.485112 | 3.889187 | 0.413765 | 0.100697 | 35.007229 | 1.018139 | False | log |


## Stimulus Coding: Advanced

So far we have focused on the `v` parameter. There are two relevant concepts concerning `bias` that we need to account for in the *stimulus coding* approach:

#### 1. Bias in `v`:

What is drift bias? Imagine our experimental design is such that the correct motion direction is left for half of the trials and right for the other half. However, the sensory stimuli are such that the participant nevertheless accumulates excess evidence toward the left direction, even on trials where the correct motion direction is right.
To account for drift bias, we simply include an `Intercept` term. The `Intercept` captures the drift bias, and the `direction` term captures the *direction effect*, a symmetric shift around the `Intercept`. Previously this `Intercept` was either set to 0, or it appeared in the model that operated on a dummy `stim` variable, which, remember, creates a model that is too complex (it has unnecessary extra parameters).
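
As a minimal numeric sketch of this idea (all values are assumed for illustration, nothing here is estimated): with an intercept, the trial-wise drift becomes `Intercept + coefficient * direction`, so the two conditions are shifted symmetrically around the intercept rather than around zero. In formula notation this would presumably read `v ~ 1 + direction`.

```python
# Assumed, illustrative values: a drift bias toward the `-1` (left) response.
intercept = -0.2       # drift bias
beta_direction = 0.5   # direction effect

v_by_condition = {
    name: intercept + beta_direction * direction
    for name, direction in {"left": -1, "right": 1}.items()
}
print(v_by_condition)

# The midpoint of the two drifts recovers the bias; half their gap recovers the effect.
midpoint = (v_by_condition["left"] + v_by_condition["right"]) / 2
half_gap = (v_by_condition["right"] - v_by_condition["left"]) / 2
print(midpoint, half_gap)
```

With a nonzero intercept the drift magnitudes differ between conditions, which is exactly what a model without an intercept cannot express.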

#### 2. Bias in `z`:

Bias in the `z` parameter gets a bit more tricky. What's the idea here? The `z` parameter represents the *starting point bias*. This notion is to some extent more intuitive when using *stimulus coding* than *accuracy coding*. A starting point bias under the stimulus coding approach is a bias toward a specific choice option (direction). A starting point bias under the accuracy coding approach is a ... bias toward a correct or incorrect response ... (?)

This is not a problem by itself. However, to create the often desired symmetry in the `z` parameter across `stim` conditions, keeping in mind that the bias takes values in the interval `[0, 1]`, we need to account for the direction effect in the `z` parameter: if $z_i = z$ in the `left` condition, then $z_i = 1 - z$ in the `right` condition.

How might we incorporate this into our regression framework?

Consider the indicator variable $\mathbb{1}_{(C_i = c)}$ for $c \in \{left, right\}$, which is 1 if the condition on trial $i$ is $c$ and 0 otherwise. Now we can write the following function for $z_i$:


$$  z_i = \mathbb{1}_{(C_i = left)} \cdot z + (1 - \mathbb{1}_{(C_i = left)}) \cdot (1 - z) $$

which after a bit of algebra can be rewritten as,

$$ z_i = \left((2 \cdot \mathbb{1}_{(C_i = left)}) - 1\right) \cdot z + (1 - \mathbb{1}_{(C_i = left)}) $$

or,

$$ z_i = \left((2 \cdot \mathbb{1}_{(C_i = left)}) - 1\right) \cdot z + \mathbb{1}_{(C_i = right)} $$

This is a linear function of the `z` parameter, so we will be able to include it in our model, with a little bit of care.
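
The algebra above can be checked numerically. The following sketch (plain Python, with hypothetical helper names `z_piecewise` and `z_linear`) compares the piecewise definition with the rewritten linear form over a few arbitrary values of `z`.

```python
def z_piecewise(z, is_left):
    """z in the left condition, 1 - z in the right condition."""
    return z if is_left else 1.0 - z


def z_linear(z, is_left):
    """Rewritten form: (2 * 1_left - 1) * z + 1_right."""
    ind_left = 1 if is_left else 0
    ind_right = 1 - ind_left
    return (2 * ind_left - 1) * z + ind_right


# Largest absolute disagreement between the two forms over a small grid.
max_gap = max(
    abs(z_piecewise(z, is_left) - z_linear(z, is_left))
    for z in [0.1, 0.25, 0.5, 0.8]
    for is_left in (True, False)
)
print(max_gap)
```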

You will see the use of the `offset` function to account for the `right` condition, and we will prepare our data beforehand by defining two covariates: `left.stimcoding`, which takes the value `1` in the `left` condition and `-1` in the `right` condition, and `right.stimcoding`, a dummy variable that identifies the `right` condition.

### Defining the new covariates

In [8]:
# Following the math above, we can define the new covariates as follows:
data_stim["left.stimcoding"] = (2 * (data_stim["stim"] == "C-left").astype(int)) - 1
data_stim["right.stimcoding"] = (data_stim["stim"] == "C-right").astype(int)

### Defining the model

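Below is an example of a model that accounts for the bias in `z`. Note that the `v` formula used here still omits the intercept, so drift bias is not estimated; to capture the bias in `v` as well, the formula would need to include an intercept (`v ~ 1 + direction`).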

In [9]:
m_stim_inc_z = hssm.HSSM(
    data=data_stim,
    model="ddm",
    include=[
        {
            "name": "v",
            "formula": "v ~ 0 + direction",
            "prior": {"direction": {"name": "Normal", "mu": 0.0, "sigma": 3.0}},
        },
        {
            "name": "z",
            "formula": "z ~ 0 + left.stimcoding + offset(right.stimcoding)",
            "prior": {
                "left.stimcoding": {"name": "Uniform", "lower": 0.0, "upper": 1.0},
            },
        },
    ],
)

m_stim_inc_z.sample(sampler="mcmc", tune=500, draws=500)

m_stim_inc_z.summary()

Model initialized successfully.
Using default initvals. 



Initializing NUTS using adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [t, a, v_direction, z_left.stimcoding]


Sampling 4 chains for 500 tune and 500 draw iterations (2_000 + 2_000 draws total) took 10 seconds.


| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| t | 0.121 | 0.025 | 0.077 | 0.17 | 0.001 | 0.001 | 918.0 | 891.0 | 1.0 |
| a | 1.547 | 0.031 | 1.487 | 1.602 | 0.001 | 0.001 | 971.0 | 1118.0 | 1.0 |
| z_left.stimcoding | 0.504 | 0.013 | 0.479 | 0.528 | 0.0 | 0.0 | 1273.0 | 1304.0 | 1.0 |
| v_direction | 0.591 | 0.033 | 0.53 | 0.653 | 0.001 | 0.001 | 1345.0 | 1200.0 | 1.0 |
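
To read the `z_left.stimcoding` coefficient in terms of the two conditions, the posterior mean from the summary above can be plugged into the linear form derived earlier. This is a back-of-the-envelope sketch using only the point estimate; it ignores posterior uncertainty, and the dictionary names are hypothetical.

```python
z_coef = 0.504  # posterior mean of z_left.stimcoding from the summary above

# Covariate values per condition: (left.stimcoding, right.stimcoding)
covariates = {"C-left": (1, 0), "C-right": (-1, 1)}

z_by_condition = {
    cond: left_sc * z_coef + right_sc  # the offset term enters with coefficient 1
    for cond, (left_sc, right_sc) in covariates.items()
}
print(z_by_condition)
```

The two implied starting points sum to 1 by construction, which is the symmetry the covariates were designed to enforce.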


In [10]:
az.compare(
    {
        "m_acc_simple": m_acc_simple.traces,
        "m_acc_stim_dummy": m_acc_stim_dummy.traces,
        "m_stim": m_stim.traces,
        "m_stim_inc_z": m_stim_inc_z.traces,
    }
)

| | rank | elpd_loo | p_loo | elpd_diff | weight | se | dse | warning | scale |
|---|---|---|---|---|---|---|---|---|---|
| m_acc_simple | 0 | -1984.071347 | 2.891709 | 0.0 | 0.8995027 | 34.937772 | 0.0 | False | log |
| m_stim | 1 | -1984.08376 | 2.914335 | 0.012413 | 0.0 | 34.94744 | 0.120808 | False | log |
| m_acc_stim_dummy | 2 | -1984.485112 | 3.889187 | 0.413765 | 0.1004973 | 35.007229 | 1.018139 | False | log |
| m_stim_inc_z | 3 | -1985.096863 | 3.933478 | 1.025517 | 5.662137e-15 | 34.972019 | 0.226523 | False | log |
