In [1]:
import hydra
from ddm_stride.pipeline.evaluate import load_experimental_data

import warnings
warnings.filterwarnings('ignore')

### Load data

Open your *config/task* file. You should already have specified the `experimental_data_path` during the simulation phase. The next cell reloads the data in case you want to modify or plot it.

In [2]:
with hydra.initialize(config_path='../config'):
    cfg = hydra.compose(config_name='config')

experimental_data = load_experimental_data(cfg)
experimental_data

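| Unnamed: 0 | monkey | rt | coh | correct | choice |
|---|---|---|---|---|---|
| 0 | 1 | 0.355 | 0.512 | 1.0 | 0.0 |
| 1 | 1 | 0.359 | 0.256 | 1.0 | 1.0 |
| 2 | 1 | 0.525 | 0.128 | 1.0 | 1.0 |
| 3 | 1 | 0.332 | 0.512 | 1.0 | 1.0 |
| 4 | 1 | 0.302 | 0.032 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... |
| 6144 | 2 | 0.627 | 0.032 | 1.0 | 1.0 |
| 6145 | 2 | 0.581 | 0.256 | 1.0 | 1.0 |
| 6146 | 2 | 0.293 | 0.512 | 1.0 | 1.0 |
| 6147 | 2 | 0.373 | 0.128 | 1.0 | 0.0 |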
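
If you want to inspect the data before evaluating it, a short per-condition summary is often enough. The following is a minimal sketch and not part of DDM-Stride: it rebuilds the first five rows of the output above in a standalone DataFrame (in the notebook you would use `experimental_data` directly) and computes the number of trials, the mean reaction time and the accuracy per coherence level.

```python
import pandas as pd

# First five rows of the output above, rebuilt so the sketch runs standalone.
# In the notebook, use `experimental_data` instead of `sample`.
sample = pd.DataFrame({
    "monkey":  [1, 1, 1, 1, 1],
    "rt":      [0.355, 0.359, 0.525, 0.332, 0.302],
    "coh":     [0.512, 0.256, 0.128, 0.512, 0.032],
    "correct": [1.0, 1.0, 1.0, 1.0, 0.0],
    "choice":  [0.0, 1.0, 1.0, 1.0, 0.0],
})

# One row per coherence level: trial count, mean reaction time, accuracy.
summary = (
    sample.groupby("coh")
    .agg(n_trials=("rt", "size"), mean_rt=("rt", "mean"), accuracy=("correct", "mean"))
    .reset_index()
)
print(summary)
```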


## Posterior plots

If no experimental conditions have been specified, leave the `group_by` configuration in *config/task* empty. Otherwise, read the blue box below.

The evaluate step computes the posterior $P(\theta |x)$ on the experimental data, i.e. the probability of the parameters $\theta$ given the observed data $x$.  
Since an MCMC sampler is used to approximate the posterior, the posterior is plotted as a histogram of samples. An example plot is shown below, with the posterior for the parameters `drift`, `boundary_separation` and `starting_point`.  
Posteriors become narrower as more experimental data is used, since more data makes the posterior more certain about $\theta$.
Additionally, a large posterior variance indicates that the parameter has a low sensitivity and therefore less influence on the result than parameters with high sensitivity.  
If you chose a very wide prior because you were unsure about a reasonable prior space, you can use the posterior to narrow the prior towards the region of high posterior probability and run the pipeline again. This might improve results, since more training data will then be available for the region of interest.
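
One way to derive a narrower prior range by hand is to take central quantiles of the posterior samples and add a small margin. The sketch below is an assumption about how this could be done, not DDM-Stride functionality: the samples are synthetic stand-ins, and `narrowed_bounds`, the quantile levels and the margin are hypothetical choices.

```python
import numpy as np

def narrowed_bounds(samples, low_q=0.01, high_q=0.99, margin=0.1):
    """Return (lower, upper) bounds per parameter from posterior samples.

    samples: array of shape (n_samples, n_parameters).
    The central [low_q, high_q] range is widened by `margin` times its width
    on each side so the new prior does not cut off the posterior tails.
    """
    lower = np.quantile(samples, low_q, axis=0)
    upper = np.quantile(samples, high_q, axis=0)
    pad = margin * (upper - lower)
    return lower - pad, upper + pad

rng = np.random.default_rng(0)
# Synthetic stand-in for posterior samples of drift and boundary_separation.
posterior_samples = np.column_stack([
    rng.normal(1.2, 0.1, size=5000),
    rng.normal(2.0, 0.2, size=5000),
])

lower, upper = narrowed_bounds(posterior_samples)
for name, lo, hi in zip(["drift", "boundary_separation"], lower, upper):
    print(f"{name}: [{lo:.2f}, {hi:.2f}]")
```

Any bounds obtained this way should still be clipped to the original prior and to the valid range of each parameter before they are written to the configuration.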

A number of point estimates and metrics are computed on the posterior samples and saved to *evaluate/best_thetas.json*.  
The pink line in the plot shows the maximum a posteriori (MAP) estimate for each parameter. The MAP searches for the best point estimate $\hat{\theta}$ via gradient ascent and should lie close to the maximum of the posterior. Additionally, the mean, the median and a 98% confidence interval for the median are computed. The variance and the 5% and 95% quantiles aim at quantifying the width of the posterior. Keep in mind that mean, median and variance only return reasonable values if the posterior is unimodal and preferably similar to a Gaussian distribution. Therefore, the MAP should generally yield the best point estimate for $\theta$.  

<img src="tutorial_images/posterior.png" width=550>
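
To see why the mean can mislead on a multi-modal posterior, consider the following sketch. It uses synthetic samples rather than DDM-Stride output, and the histogram mode is only a crude stand-in for the MAP, which the pipeline finds via gradient ascent.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic bimodal "posterior": two well-separated modes at -2 and +2.
samples = np.concatenate([
    rng.normal(-2.0, 0.3, size=5000),
    rng.normal(2.0, 0.3, size=5000),
])

stats = {
    "mean": float(np.mean(samples)),
    "median": float(np.median(samples)),
    "variance": float(np.var(samples)),
    "q05": float(np.quantile(samples, 0.05)),
    "q95": float(np.quantile(samples, 0.95)),
}

# Crude mode estimate: centre of the fullest histogram bin.
counts, edges = np.histogram(samples, bins=50)
top = int(np.argmax(counts))
stats["histogram_mode"] = float(0.5 * (edges[top] + edges[top + 1]))

# Fraction of samples within 0.5 of the mean: close to zero, although the
# mean is nominally the "typical" value.
near_mean = float(np.mean(np.abs(samples - stats["mean"]) < 0.5))
print(stats)
print("fraction of samples near the mean:", near_mean)
```

Here the mean falls between the two modes, where almost no posterior mass lies, while the mode estimate sits on one of the peaks.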


## Verify the posterior performance

The diagnosis step verified that the MNLE can recover the parameters of data generated by the DDM simulator. However, the DDM might not be capable of explaining the experimental data. An example can be seen in the plots below. The plot on the left shows the experimental data as a grey histogram. In this example, the histogram shows the reaction times for choice 0 on the negative axis and the reaction times for choice 1 on the positive axis. The blue line visualizes the probability $q(x|\hat{\theta}, \pi) \cdot p(\hat{\theta})$ of observations $x$, using the MAP $\hat{\theta}$ as a point estimate for the best parameters. This curve should match the shape of the experimental data. In the figure below, the blue line does not match the shape of the observations, indicating that a different DDM is needed to explain the experimental data.  
The posterior predictive check plot confirms this assessment. Simulations are generated from posterior samples and plotted against the experimental data. If the DDM is able to generate the experimental data, the histogram of the simulations should look similar to the histogram of the experimental data. This holds especially if the posterior sample used to generate the simulations has a high probability under the posterior distribution.

Example plots of a DDM that explains the data well can be seen in the blue box below.  
The probability density function plot is currently only available when a single continuous measurement is used.

<img src="tutorial_images/pdf.png" width=700>

<img src="tutorial_images/posterior_predictive_eval.png" width=900>
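
The idea behind the posterior predictive check can be sketched with a toy simulator. The code below is not the DDM-Stride implementation: `simulate_ddm` is a hypothetical Euler-Maruyama simulator of a basic DDM with unit noise, the "observed" data are themselves simulated, and all parameter values are arbitrary. It follows the signed reaction time convention described above, with choice 0 on the negative and choice 1 on the positive axis.

```python
import numpy as np

def simulate_ddm(drift, boundary_separation, starting_point, n_trials,
                 dt=1e-3, max_time=5.0, rng=None):
    """Simulate a basic DDM with unit noise via Euler-Maruyama.

    starting_point is relative (0..1) to boundary_separation.
    Returns (rt, choice); trials that do not finish within max_time
    keep rt = nan and choice = -1.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.full(n_trials, starting_point * boundary_separation)
    rt = np.full(n_trials, np.nan)
    choice = np.full(n_trials, -1)
    active = np.ones(n_trials, dtype=bool)
    for step in range(1, int(max_time / dt) + 1):
        n_active = int(active.sum())
        if n_active == 0:
            break
        # Advance only the trials that have not yet hit a boundary.
        x[active] += drift * dt + np.sqrt(dt) * rng.standard_normal(n_active)
        hit_upper = active & (x >= boundary_separation)
        hit_lower = active & (x <= 0.0)
        rt[hit_upper | hit_lower] = step * dt
        choice[hit_upper] = 1
        choice[hit_lower] = 0
        active &= ~(hit_upper | hit_lower)
    return rt, choice

def signed_rt(rt, choice):
    """Map choice 0 to negative and choice 1 to positive reaction times."""
    done = choice >= 0
    return np.where(choice[done] == 1, rt[done], -rt[done])

def hist_distance(a, b, bins):
    """Summed absolute difference between two normalized histograms."""
    ha, _ = np.histogram(a, bins=bins, density=True)
    hb, _ = np.histogram(b, bins=bins, density=True)
    return float(np.abs(ha - hb).sum())

rng = np.random.default_rng(2)
observed = signed_rt(*simulate_ddm(1.0, 1.5, 0.5, 2000, rng=rng))
# Simulations from a well-matching and a poorly matching parameter set.
good = signed_rt(*simulate_ddm(1.0, 1.5, 0.5, 2000, rng=rng))
bad = signed_rt(*simulate_ddm(-1.0, 1.5, 0.5, 2000, rng=rng))

bins = np.linspace(-3, 3, 31)
print("matching parameters:   ", hist_distance(observed, good, bins))
print("mismatching parameters:", hist_distance(observed, bad, bins))
```

The histogram distance is only an illustrative summary; in the pipeline the comparison is made visually in the posterior predictive plot.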

<div style="display:flex"><div style="border-color:rgba(102,178,255,0.75); border-style:solid; padding: 7px; border-width:2px; margin-right:9px"> 
<h3>Group by experimental conditions</h3>

If you leave the <code>group_by</code> configuration in <i>config/task</i> empty, the posterior will be computed over all experimental conditions. This means that the experimental condition will be marginalized out to compute $P(\theta | x) = \sum_{\pi} P(\theta | x, \pi)$ for parameters $\theta$, data $x$ and experimental conditions $\pi$, and plots will be generated as shown above.
</br></br>
Some experimental conditions might have a large influence on the experimental data. In this case, you might want to compute the posterior for each of these experimental conditions separately.
The <code>group_by</code> configuration allows you to group the data by the specified experimental conditions and compute separate results for each group.
</br></br>
Example:  </br>
The experimental data specifies two levels of task difficulty via the experimental condition <code>coh</code>. The subsequent plot shows an example for <i>evaluate/pdf.png</i> when defining <code>group_by: coh</code>. The plots on the left visualize the experimental data as well as the probability $q(x|\hat{\theta}, \pi) \cdot p(\hat{\theta})$. The title of each plot indicates the experimental condition that the data has been grouped by. The plot on the right shows the posterior for each parameter $\theta$ and for each group of data. The posterior predictive plot is grouped similarly.  </br></br>
If you have an additional experimental condition, e.g. <code>previous_choice</code>, you can still group by <code>coh</code> alone via <code>group_by: coh</code>, or group by both experimental conditions via <code>group_by: [coh, previous_choice]</code>.

TODO: plots
 

</div> </div>
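
Conceptually, `group_by` splits the trials in the same way a pandas `groupby` does. The sketch below illustrates this on made-up data. It is not how DDM-Stride implements grouping internally: `split_by_conditions` is a hypothetical helper, and `previous_choice` is the hypothetical additional condition from the example in the box.

```python
import pandas as pd

# Made-up trials with two experimental conditions.
trials = pd.DataFrame({
    "rt":              [0.41, 0.38, 0.62, 0.57, 0.44, 0.66],
    "choice":          [1, 1, 0, 1, 0, 1],
    "coh":             [0.512, 0.512, 0.032, 0.032, 0.512, 0.032],
    "previous_choice": [0, 1, 0, 1, 1, 0],
})

def split_by_conditions(data, group_by):
    """Return {group key: sub-DataFrame}; an empty group_by keeps all trials together."""
    if not group_by:
        return {"all": data}
    return {key: group for key, group in data.groupby(group_by)}

# No grouping, grouping by one condition, grouping by both conditions.
for conditions in ([], ["coh"], ["coh", "previous_choice"]):
    groups = split_by_conditions(trials, conditions)
    print(conditions, {key: len(group) for key, group in groups.items()})
```

Each additional condition multiplies the number of groups, so every group contains fewer trials and its posterior tends to be wider.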

### Run evaluate step

After running the evaluate step, you can find *evaluate/posterior.png*, *evaluate/pdf.png*, *evaluate/posterior_predictive.png* and *evaluate/best_thetas.json* in the results folder.

In [3]:
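# Python does not expand `${result_folder}`: replace it with the name of your results folder.
results_dir = '../results/${result_folder}'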

In [None]:
%run ../ddm_stride/run.py run_evaluate=True    
# show output, interpretation/recommendations
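
Once the step has finished, the saved point estimates can be inspected directly. The helper below is a hypothetical convenience function and not part of DDM-Stride; it only assumes that *evaluate/best_thetas.json* is valid JSON inside the results folder and makes no assumption about its keys.

```python
import json
from pathlib import Path

def load_best_thetas(results_dir):
    """Load evaluate/best_thetas.json from results_dir, or return None if it is missing."""
    path = Path(results_dir) / "evaluate" / "best_thetas.json"
    if not path.is_file():
        return None
    with path.open() as f:
        return json.load(f)

# Replace the argument with the path to your own results folder.
best_thetas = load_best_thetas("../results/${result_folder}")
if best_thetas is None:
    print("best_thetas.json not found; has the evaluate step been run?")
else:
    print(json.dumps(best_thetas, indent=2))
```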