In [1]:
import hydra
from ddm_stride.pipeline.evaluate import load_experimental_data
from ddm_stride.pipeline.simulate import build_simulator
from ddm_stride.pipeline.infer import build_proposal
import warnings
warnings.filterwarnings('ignore')

## Pre-implemented DDMs

The config files contained in *config/ddm_model* provide information about the Drift-Diffusion Model you want to use for simulating data. The following DDMs have been pre-implemented:

- Basic DDM with parameters `drift`, `starting_point` of the decision variable and `boundary_separation`.    
    SDE: $dy = v dt + \sigma dW$  
    ddm_model file: *basic_ddm.yaml*  
    ddm implementation: `ddm_stride.ddm_models.basic_ddm.BasicDDM`
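
For intuition, one common way to simulate this SDE numerically is the Euler-Maruyama scheme. It is shown here only as an illustration: the time step $\Delta t$ and the noise variable $\epsilon_t$ are notation introduced for this sketch, and the pre-implemented models rely on a solver from sdeint rather than on this exact update.

$$y_{t+\Delta t} = y_t + v\,\Delta t + \sigma\sqrt{\Delta t}\,\epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, 1)$$

The decision variable $y$ is updated until it reaches one of the two decision boundaries. The crossing time determines the reaction time and the boundary that was reached determines the choice.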

If you want to use a pre-implemented DDM, copy the content of its ddm_model file to your ddm_model file. Take a look at the following steps to determine if you want to adapt the prior distributions of parameters and experimental conditions or if you need to change the observation names. You can skip the blue box **Implement a DDM**.

## Define inputs to the simulator and simulator results

Open your file in *config/ddm_model*.  
Below the simulator configuration, you can see configurations for `parameters`, `experimental conditions` and `observations`. A DDM models observations as a function of parameters and optionally experimental conditions.
- `parameters` represent the values that you want to infer from the experimental data, e.g. the drift rate, boundary height or non-decision time.  
- `experimental conditions` represent additional information that is relevant for the DDM to model observations, but that is known by the user. In contrast to the unknown parameter values, experimental conditions will thus not be inferred! Further information on experimental conditions can be found in the blue box below.
- `observations` usually consist of a reaction time and a choice.

#### Define parameters

You need to define a name and a prior distribution for each DDM parameter that you want to infer. The prior distribution specifies the range of sensible parameter values. In order to simulate data with the DDM, samples are drawn from the prior distribution and used as an input to the simulator. It is recommended to use one of the [PyTorch distributions](https://pytorch.org/docs/stable/distributions.html) as a prior.

Example:  
Define a parameter `drift` with a prior distribution consisting of a normal distribution with mean 2 and standard deviation 3 as well as a parameter `boundary_separation` with a uniform prior distribution between 2.5 and 5:

```
parameters:
- name: drift
  distribution: 
    _target_: torch.distributions.normal.Normal
    loc: 2
    scale: 3 
- name: boundary_separation
  distribution: 
    _target_: torch.distributions.uniform.Uniform
    low: 2.5
    high: 5
```
`_target_` specifies the name of the distribution function you want to use. `loc` and `scale` are the parameters of the [normal distribution](https://pytorch.org/docs/stable/distributions.html#normal) and must be specified in order to pass the required arguments to the distribution function. Similarly, `low` and `high` are parameters of the [uniform distribution](https://pytorch.org/docs/stable/distributions.html#uniform).
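
To make the role of `_target_` more concrete, the sketch below mimics the mechanism in plain Python: the dotted path is imported and called with the remaining keys as keyword arguments. It is an illustration under assumptions, not Hydra's implementation: `instantiate_from_config` is a hypothetical helper, and `statistics.NormalDist` (with arguments `mu` and `sigma`) stands in for a PyTorch distribution so that the snippet runs without PyTorch.

```python
import importlib


def instantiate_from_config(config: dict):
    """Hypothetical helper: resolve `_target_` and call it with the other keys."""
    kwargs = {key: value for key, value in config.items() if key != "_target_"}
    module_path, _, attribute = config["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attribute)
    return target(**kwargs)


# Stand-in for the `drift` prior: a normal distribution with mean 2 and
# standard deviation 3 (the real config points at torch.distributions).
drift_prior = instantiate_from_config(
    {"_target_": "statistics.NormalDist", "mu": 2, "sigma": 3}
)
sample = drift_prior.samples(1)[0]  # one draw from the stand-in prior
```

With Hydra installed, `hydra.utils.instantiate` performs this kind of resolution for config entries that contain a `_target_` key.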

#### Read in experimental data

In order to define the observations and experimental conditions, you need to take a look at your experimental data first. Take the CSV file containing your experimental data and drag it into the *data* folder.  
Subsequently, open your file in *config/tasks* and scroll to the bottom. You will find a configuration called `experimental_data_path`. Add the name of your experimental data file as a value. If everything works correctly, you should be able to see a preview of your data when running the next cell:


In [2]:
with hydra.initialize(config_path='../config'):
    cfg = hydra.compose(config_name='config')

load_experimental_data(cfg)

Unnamed: 0,monkey,rt,coh,correct,choice
0,1,0.355,0.512,1.0,0.0
1,1,0.359,0.256,1.0,1.0
2,1,0.525,0.128,1.0,1.0
3,1,0.332,0.512,1.0,1.0
4,1,0.302,0.032,0.0,0.0
...,...,...,...,...,...
6144,2,0.627,0.032,1.0,1.0
6145,2,0.581,0.256,1.0,1.0
6146,2,0.293,0.512,1.0,1.0
6147,2,0.373,0.128,1.0,0.0


<div style="display:flex"><div style="border-color:rgba(102,178,255,0.75); border-style:solid; padding: 7px; border-width:2px; margin-right:9px"> 
<h3>Experimental conditions</h3>

Experimental conditions might consist of task difficulty or previous choices of a subject.
They are specified similarly to parameters. Even though the experimental conditions are not inferred, a proposal distribution is necessary to sample experimental conditions that are subsequently used to simulate training data.
Make sure that the <code>name</code> of an experimental condition corresponds to its column name in the experimental data.<br><br>

Example:  
The experimental data specifies three levels of task difficulty. 50% of experiments have been performed with task difficulty 1 and 25% each with difficulty levels 2 and 3.  <br>
Use the class <code>ddm_stride.utils.distributions.Categorical</code> to sample the task difficulty levels. The distribution is based on the <a href="https://pytorch.org/docs/stable/distributions.html#categorical">PyTorch categorical distribution</a>, but samples the values specified in <code>discrete_values</code> instead of class indices. If you want the probability of experimental conditions to be taken into account, you can specify a probability for each value. Leave <code>probs</code> empty to sample each value with equal probability. Probabilities are normalized automatically. <br>

<pre>
<code>
- name: task_difficulty   
  distribution:
      _target_: ddm_stride.utils.distributions.Categorical
      discrete_values: [1, 2, 3]
      probs: [50, 25, 25]
</code>
</pre>

</div> </div>
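
The behaviour described in the box above (sampling the listed values rather than class indices, normalizing the probabilities, and falling back to equal probabilities) can be sketched in plain Python. `DiscreteValueSampler` is a hypothetical stand-in written for this illustration; it is not the code of `ddm_stride.utils.distributions.Categorical`.

```python
import random


class DiscreteValueSampler:
    """Hypothetical stand-in for a categorical distribution over given values."""

    def __init__(self, discrete_values, probs=None):
        self.discrete_values = list(discrete_values)
        # No probs given: every value is equally likely.
        weights = list(probs) if probs else [1.0] * len(self.discrete_values)
        total = sum(weights)
        # Normalize so that the probabilities sum to one.
        self.probs = [weight / total for weight in weights]

    def sample(self):
        # Returns one of the values themselves, not a class index.
        return random.choices(self.discrete_values, weights=self.probs, k=1)[0]


task_difficulty = DiscreteValueSampler(discrete_values=[1, 2, 3], probs=[50, 25, 25])
draws = [task_difficulty.sample() for _ in range(1000)]
```

Here `probs: [50, 25, 25]` is normalized to probabilities 0.5, 0.25 and 0.25, so roughly half of the draws should equal 1.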


#### Define observations

Add observations to the `observations` config such that `name` corresponds to the column name containing the observation and `variable_type` describes whether the observation is discrete (usually the choice) or continuous (usually the reaction time).  
If all observations and experimental conditions defined in the config file have been found in the experimental data, running the next cell will show you the experimental data again. Otherwise, you will get an error that tells you which observation or experimental condition can't be found.

Example:  
Define a discrete observation `choice` and a continuous observation `rt`:

```
observations:
- name: choice
  variable_type: discrete
- name: rt
  variable_type: continuous
```
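
The column check described above can be pictured with a small sketch. `find_missing_columns` is a hypothetical helper written for this illustration (it is not the check performed by `load_experimental_data`), and the CSV text is a shortened stand-in for the experimental data file.

```python
import csv
import io


def find_missing_columns(configured_names, csv_text):
    """Hypothetical helper: return configured names absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in configured_names if name not in header]


# Shortened stand-in for the experimental data file.
csv_text = "monkey,rt,coh,correct,choice\n1,0.355,0.512,1.0,0.0\n"

# Both configured observations exist as columns, so nothing is missing.
missing = find_missing_columns(["choice", "rt"], csv_text)
# A name that does not match any column is reported back.
missing_with_typo = find_missing_columns(["choice", "reaction_time"], csv_text)
```

If a name is reported as missing, either rename the entry in the config or rename the column in the data so that the two match exactly.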

In [3]:
with hydra.initialize(config_path='../config'):
    cfg = hydra.compose(config_name='config')

load_experimental_data(cfg)

Unnamed: 0,monkey,rt,coh,correct,choice
0,1,0.355,0.512,1.0,0.0
1,1,0.359,0.256,1.0,1.0
2,1,0.525,0.128,1.0,1.0
3,1,0.332,0.512,1.0,1.0
4,1,0.302,0.032,0.0,0.0
...,...,...,...,...,...
6144,2,0.627,0.032,1.0,1.0
6145,2,0.581,0.256,1.0,1.0
6146,2,0.293,0.512,1.0,1.0
6147,2,0.373,0.128,1.0,0.0


<div style="display:flex"><div style="border-color:rgba(102,178,255,0.75); border-style:solid; padding: 7px; border-width:2px; margin-right:9px"> 

<h2>Implement a DDM</h2>

<i>ddm_stride/ddm_models/base_simulator.py</i> defines an interface for the simulator class. All other files in <i>ddm_stride/ddm_models</i> contain implementations of this interface.

The file <i>ddm_stride/ddm_models/basic_ddm.py</i> contains the implementation of a simple DDM you can use as a reference. Duplicate <i>ddm_stride/ddm_models/basic_ddm.py</i>, rename the duplicated file and open it.   
Choose a name for your DDM and replace <code>BasicDDM</code> as a class name. Adapt the docstring such that it contains a description of your DDM.  

Usually, the only function you need to implement is <code>generate_data</code>. The function receives a dictionary <code>input_dict</code> that contains one value for each parameter and experimental condition you have defined in <i>config/ddm_model</i>. If a parameter is named <code>drift</code>, you can access its value via <code>input_dict['drift']</code>.  
Use the parameters and experimental conditions to simulate one observation. Return the observation in a numpy array or torch tensor that contains the continuous observations first (e.g. the reaction time) and the discrete observations second (e.g. the choice). 

When initializing the simulator (see below) the class attributes <code>self.inputs</code> and <code>self.simulator_results</code> are printed. <code>self.inputs</code> reminds you of the parameters and experimental conditions available for simulating one observation. All of them can be accessed via the <code>input_dict</code>. <code>self.simulator_results</code> shows you how the observations should be ordered. Please double check that the order of simulation results is correct.

The current DDMs are implemented by means of a Runge-Kutta solver provided by <a href="https://pypi.org/project/sdeint/">sdeint</a>. You can use the existing implementations as an example for your DDM implementation. 

</div> </div>
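
As a rough illustration of the steps in the box above, the following sketch shows what a `generate_data` function could look like. Several things here are assumptions made for the example: `MyDDM` is a standalone class that does not inherit from the real interface, a simple Euler-Maruyama loop replaces the sdeint solver, the boundaries are placed at 0 and `boundary_separation`, `starting_point` is treated as a fraction of the boundary separation, and `dt`, `noise_std` and `max_time` are hypothetical arguments. A plain list is returned to keep the sketch free of dependencies; a real implementation should return a numpy array or torch tensor.

```python
import math
import random


class MyDDM:
    """Hypothetical sketch; a real simulator implements the interface
    defined in ddm_stride/ddm_models/base_simulator.py."""

    def generate_data(self, input_dict, dt=0.001, noise_std=1.0, max_time=10.0):
        # One value per parameter / experimental condition, accessed by name.
        drift = float(input_dict["drift"])
        boundary_separation = float(input_dict["boundary_separation"])
        # Assumption: starting_point is a fraction of the boundary separation.
        y = float(input_dict["starting_point"]) * boundary_separation

        t = 0.0
        # Euler-Maruyama steps until a boundary is reached or time runs out.
        while 0.0 < y < boundary_separation and t < max_time:
            y += drift * dt + noise_std * math.sqrt(dt) * random.gauss(0.0, 1.0)
            t += dt

        # Upper boundary -> 1.0; lower boundary -> 0.0.
        # For simplicity, trials that time out are also labeled 0.0 here.
        choice = 1.0 if y >= boundary_separation else 0.0
        # Continuous observation (rt) first, discrete observation (choice) second.
        return [t, choice]


rt, choice = MyDDM().generate_data(
    {"drift": 1.0, "boundary_separation": 2.0, "starting_point": 0.5}
)
```

The order of the returned values matters: it has to match the order shown in `self.simulator_results`, here `rt` followed by `choice`.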

## Initialize the simulator

Open your file in *config/ddm_model* and add the path to your DDM model to the `simulator._target_` configuration. You will need to define the file name as well as the class name, e.g. `ddm_stride.ddm_models.my_ddm.MyDDM`.

Run the cell below to check if your DDM simulator can be used correctly. The simulator should print the names of the parameters and experimental conditions it uses to simulate data, as well as the names of the observations (simulation results).  
Additionally, one sample is drawn from the prior and passed through the simulator.

In [4]:
with hydra.initialize(config_path='../config'):
    cfg = hydra.compose(config_name='config')

simulator = build_simulator(cfg)
random_input = build_proposal(cfg, 'cpu').sample()
simulation_result = simulator(random_input)
print(f'simulation result for input {random_input}: {simulation_result}')

parameter names and experimental conditions:  ['drift', 'boundary_separation', 'starting_point']
simulation results:  ['rt', 'choice']
simulation result for input tensor([-0.7312,  1.5007,  0.5790]): tensor([[1.1972, 0.0000]])




## Specify the number of simulations

Open your file in *config/task*. In most cases, the only simulation config you need to adapt here is `sim_training_data_params.num_simulations`. DDMs with few parameters might only need 20000 to 50000 simulations, while more complex DDMs or very large prior spaces might need 100000 simulations or more. If you want to increase the number of training or test simulations, you can add new simulations to your previously simulated data later. Take a look at **tutorial_7_reuse_results.ipynb** to learn how to do this.

For some DDM implementations, simulating in parallel (i.e. with `num_workers` > 1) might throw an error.

<div style="display:flex"><div style="border-color:rgba(102,178,255,0.75); border-style:solid; padding: 7px; border-width:2px; margin-right:9px"> 
The value of <code>sim_iid_test_data_params.num_params</code> determines the number of plots in <i>compare_observations.png</i> and <i>posterior_predictive.png</i> that will be created in the diagnose stage. You might want to adapt the number of plots depending on the number of parameters and experimental conditions, since every plot will be based on only one sample of parameters and experimental conditions.

If your simulator is rather slow, it might be useful to play around with different values for <code>simulation_batch_size</code> and <code>num_workers</code> in order to improve simulation performance. You can find more information in the sbi function <a href="https://github.com/mackelab/sbi/blob/7799de5e4bc676ef4a9db304b225503126735f2c/sbi/inference/base.py#L478">simulate_for_sbi</a>. 

</div> </div>
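
To get a feeling for how `simulation_batch_size` relates to the progress bars, the sketch below computes the number of batches under the assumption that the sampled inputs are split into chunks of at most `simulation_batch_size` elements. `number_of_batches` is a hypothetical helper, and the batch size of 100 is an assumption chosen because it is consistent with the progress bar output shown in the simulation step of this tutorial.

```python
import math


def number_of_batches(num_simulations, simulation_batch_size):
    """Hypothetical helper: batches needed when simulating in fixed-size chunks."""
    return math.ceil(num_simulations / simulation_batch_size)


batches_small = number_of_batches(1000, 100)  # 1000 simulations
batches_large = number_of_batches(5000, 100)  # 5000 simulations
```

Larger batches mean fewer calls into the simulator per worker, which can help when the per-call overhead dominates, while smaller batches give finer-grained progress updates.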

## Simulation step

Progress bars show the current number of simulations that have been computed.
After completing all simulations, the results subfolder should contain a subfolder *simulation_data* containing *training_data.csv*, *test_data.csv* and *iid_test_data.csv*.  
A CSV file should only be missing if you load previously simulated data and do not add simulations to the file. For further information see **tutorial_7_reuse_results.ipynb**.
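
After the simulation step has finished, a quick check for the expected files could look like the sketch below. `missing_simulation_files` is a hypothetical helper written for this tutorial, not part of the package, and the result directory passed to it is whatever folder you configured for your run.

```python
from pathlib import Path

EXPECTED_FILES = ("training_data.csv", "test_data.csv", "iid_test_data.csv")


def missing_simulation_files(result_dir):
    """Hypothetical helper: list expected CSV files absent from simulation_data."""
    data_dir = Path(result_dir) / "simulation_data"
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]
```

Calling the function with the path of your results subfolder returns an empty list if all three files are present.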

In [5]:
dir = '../results/${result_folder}'

In [8]:
%run ../ddm_stride/run.py hydra.run.dir={dir} run_simulate=True 

parameter names and experimental conditions:  ['drift', 'boundary_separation', 'starting_point']
simulation results:  ['rt', 'choice']




Running 1000 simulations in 10 batches.:   0%|          | 0/10 [00:00<?, ?it/s]

Running 1000 simulations in 10 batches.:   0%|          | 0/10 [00:00<?, ?it/s]

Running 5000 simulations in 50 batches.:   0%|          | 0/50 [00:00<?, ?it/s]