# Nuts & bolts of HDDMRegressor

In [1]:
%matplotlib inline

# Preparation
import os, time, csv
import glob
import datetime
from datetime import date
from copy import deepcopy

import pymc as pm
import hddm
import kabuki
print("The current HDDM version is: ", hddm.__version__)

import arviz as az
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
import seaborn as sns

# import sparse # test whether package `sparse` is installed; doesn't matter if not installed.

from p_tqdm import p_map
from functools import partial

# set the color of plots
from cycler import cycler
plt.rcParams['axes.prop_cycle'] = cycler(color='bgrcmykw')



The current HDDM version is:  0.8.0


## Related scripts

You can find the source code of `HDDMRegressor` at `https://github.com/hddm-devs/hddm/models/hddm_regression.py`.

There are three important classes/functions defined in this script:

First, the likelihood `wfpt_reg_like`, which relies on two important functions, `wiener_multi_like` and `random`. `random` generates random values (simulated RTs) from the likelihood given the current parameter values, which is very useful for posterior predictive checks.

Second, the `kabuki` knode `KnodeRegress` (lines 67-104). This class shows how the model description is translated into a hierarchical model.

Third, the `HDDMRegressor(HDDM)` class, which is the one we use to fit our data. It checks whether the input is correct and creates the regression knodes using `KnodeRegress`.

## Understand the design matrix

The design matrix is generated by `patsy`, whose formula syntax resembles R's model formulas (as used by `lme4`). In `HDDM` you can build different models/contrasts, but you should be careful about the priors: the current HDDM has only two priors for a regression, the prior of the parameter itself, which is assigned to the intercept, and a normal distribution, which is assigned to the slopes/coefficients.

In [2]:
from patsy import dmatrix, demo_data
demo = demo_data("a", "b", "x1", "x2", "y", "z column")

In [3]:
dmatrix("x1 + x2", demo)

DesignMatrix with shape (8, 3)
  Intercept        x1        x2
          1   1.76405  -0.10322
          1   0.40016   0.41060
          1   0.97874   0.14404
          1   2.24089   1.45427
          1   1.86756   0.76104
          1  -0.97728   0.12168
          1   0.95009   0.44386
          1  -0.15136   0.33367
  Terms:
    'Intercept' (column 0)
    'x1' (column 1)
    'x2' (column 2)

In [4]:
# cavanagh_theta_nn.csv ships with HDDM in its examples folder
data = hddm.load_csv(os.path.join(os.path.dirname(hddm.__file__), 'examples', 'cavanagh_theta_nn.csv'))
data.tail()

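| | subj_idx | stim | rt | response | theta | dbs | conf |
|---|---|---|---|---|---|---|---|
| 3983 | 13 | LL | 1.45 | 0.0 | -1.237166 | 0 | HC |
| 3984 | 13 | WL | 0.711 | 1.0 | -0.37745 | 0 | LC |
| 3985 | 13 | WL | 0.784 | 1.0 | -0.694194 | 0 | LC |
| 3986 | 13 | LL | 2.35 | 0.0 | -0.546536 | 0 | HC |
| 3987 | 13 | WW | 1.25 | 1.0 | 0.752388 | 0 | HC |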


We can easily check the design matrix of parameter `a` for the example in the official HDDM tutorial.

In [5]:
# show the design matrix of `a` for all trials;
# to inspect only the first rows, pass data.head(30) instead of data, e.g.:
# dmatrix("theta:C(conf, Treatment('LC'))", data.head(30))

pd.DataFrame(dmatrix("theta:C(conf, Treatment('LC')):C(dbs, Treatment('0'))", data))

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 1.0 | 0.000000 | 0.000000 | 0.656275 | 0.000000 |
| 1 | 1.0 | -0.000000 | -0.000000 | -0.000000 | -0.327889 |
| 2 | 1.0 | -0.000000 | -0.000000 | -0.480285 | -0.000000 |
| 3 | 1.0 | 0.000000 | 0.000000 | 0.000000 | 1.927427 |
| 4 | 1.0 | -0.000000 | -0.000000 | -0.213236 | -0.000000 |
| ... | ... | ... | ... | ... | ... |
| 3983 | 1.0 | -1.237166 | -0.000000 | -0.000000 | -0.000000 |
| 3984 | 1.0 | -0.000000 | -0.377450 | -0.000000 | -0.000000 |
| 3985 | 1.0 | -0.000000 | -0.694194 | -0.000000 | -0.000000 |
| 3986 | 1.0 | -0.546536 | -0.000000 | -0.000000 | -0.000000 |
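To see what patsy does with this interaction-only formula, here is a hand-rolled numpy sketch (the helper `interaction_design` is hypothetical, not part of patsy or HDDM). Because the formula has no main effects, each `conf` × `dbs` cell gets its own column holding `theta` on the trials of that cell and 0 elsewhere. The column order (conf varying fastest: HC/0, LC/0, HC/1, LC/1) is taken from the coefficient names HDDM reports in `model_descrs`; the inputs are the first five trials of the data set.

```python
import numpy as np


def interaction_design(theta, conf, dbs,
                       conf_levels=("HC", "LC"), dbs_levels=(0, 1)):
    """Illustrative rebuild of the design matrix for
    "theta:C(conf, Treatment('LC')):C(dbs, Treatment('0'))".

    Column 0 is the intercept; every further column holds `theta` on the
    trials of one conf x dbs cell and 0 on all other trials.
    """
    theta = np.asarray(theta, dtype=float)
    conf = np.asarray(conf)
    dbs = np.asarray(dbs)
    columns = [np.ones_like(theta)]  # intercept
    for d in dbs_levels:             # outer loop: dbs
        for c in conf_levels:        # inner loop: conf varies fastest
            in_cell = (conf == c) & (dbs == d)
            columns.append(theta * in_cell)
    return np.column_stack(columns)


# first five trials of cavanagh_theta_nn.csv (subject 0, dbs = 1)
theta = [0.656275, -0.327889, -0.480285, 1.927427, -0.213236]
conf = ["HC", "LC", "HC", "LC", "HC"]
dbs = [1, 1, 1, 1, 1]
X = interaction_design(theta, conf, dbs)
print(X)
```

The rows match rows 0 to 4 of the patsy output above, including the position of the single non-zero slope column per trial.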


## Fit a model
Here `a` uses the same regression model as in the official tutorial, and `v` has a within-subject effect of `conf`.

In [6]:
# M7: Regression for both parameters (a and v)
def run_m7(chain_id, df=None, samples=None, burn=None, save_name="ms7"):
    import hddm  # re-import inside the worker process started by p_map

    dbname = save_name + '_chain_%i.db' % chain_id  # database of the traces
    mname = save_name + '_chain_%i' % chain_id      # saved model object
    a_reg = {'model': "a ~ theta:C(conf, Treatment('LC')):C(dbs, Treatment('0'))", 'link_func': lambda x: x}
    v_reg = {'model': "v ~ C(conf, Treatment('LC'))", 'link_func': lambda x: x}
    reg_descr = [a_reg, v_reg]

    m = hddm.HDDMRegressor(df,  # use the data passed in, not a global variable
                           reg_descr,
                           group_only_regressors=False,
                           keep_regressor_trace=True,
                           include=['z', 'sv', 'st', 'sz'])
    m.find_starting_values()
    # a database backend is needed so that the model can be saved and reloaded
    m.sample(samples, burn=burn, dbname=dbname, db='pickle')
    m.save(mname)

    return m

# parameters for model fitting
samples = 2000
burn = 500
chains = 4


# below is for running multiple models

model_func = [run_m7]  # list of functions that define models

file_path = "/home/jovyan/hddm/temp/"  # directory that stores the model data

models = {"ms7": [],}  # dictionary that stores the list of chains for each model

models_name = ["ms7"]  # keys to retrieve the models

# run all models; currently we only have one.
for ii in range(len(model_func)):
    # "[!db]" excludes names ending in "d" or "b", i.e. the "*.db" trace files
    file_full_path = file_path + "*" + models_name[ii] + "_chain_*[!db]"
    file_names = glob.glob(file_full_path, recursive=False)

    # if the model data are already in the path, load them
    if file_names:
        for jj in file_names:
            print('current loading: ', jj, '\n')
            models[models_name[ii]].append(hddm.load(jj))

    # if there are no model data, fit the model (one chain per process)
    else:
        print('current estimating: ', models_name[ii])
        # save into file_path so that the files are found by glob on the next run
        models[models_name[ii]] = p_map(partial(model_func[ii], df=data, samples=samples,
                                                burn=burn, save_name=file_path + models_name[ii]),
                                        range(chains))

current loading:  /home/jovyan/hddm/temp/ms7_chain_3 

current loading:  /home/jovyan/hddm/temp/ms7_chain_2 

current loading:  /home/jovyan/hddm/temp/ms7_chain_1 

current loading:  /home/jovyan/hddm/temp/ms7_chain_0 



## Inside the model objects

Now that we have fitted a regression model with four chains, we select one chain and look inside the model object.

In [7]:
ms7_tmp = models['ms7'][0]
ms7_tmp

<hddm.models.hddm_regression.HDDMRegressor at 0x7f82e1c9ddd0>

## Model's input

In [8]:
ms7_tmp.data

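| | subj_idx | stim | rt | response | theta | dbs | conf |
|---|---|---|---|---|---|---|---|
| 0 | 0 | LL | 1.210 | 1.0 | 0.656275 | 1 | HC |
| 1 | 0 | WL | 1.630 | 1.0 | -0.327889 | 1 | LC |
| 2 | 0 | WW | 1.030 | 1.0 | -0.480285 | 1 | HC |
| 3 | 0 | WL | 2.770 | 1.0 | 1.927427 | 1 | LC |
| 4 | 0 | WW | -1.140 | 0.0 | -0.213236 | 1 | HC |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 3983 | 13 | LL | -1.450 | 0.0 | -1.237166 | 0 | HC |
| 3984 | 13 | WL | 0.711 | 1.0 | -0.377450 | 0 | LC |
| 3985 | 13 | WL | 0.784 | 1.0 | -0.694194 | 0 | LC |
| 3986 | 13 | LL | -2.350 | 0.0 | -0.546536 | 0 | HC |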
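Compare `rt` here with the raw CSV loaded earlier: in `ms7_tmp.data`, trials with `response == 0` carry a negative RT (trial 3983 has `rt` 1.45 in the CSV and -1.450 here). HDDM codes lower-boundary responses by flipping the sign of the RT; as far as we can tell, this is what the helper `hddm.utils.flip_errors` does. A minimal numpy sketch of the recoding, using trials 3983-3987 (`flip_lower_boundary` is a hypothetical name):

```python
import numpy as np


def flip_lower_boundary(rt, response):
    """Return RTs with a negative sign on lower-boundary (response == 0) trials.

    Taking abs() first makes the function safe to apply twice.
    """
    rt = np.abs(np.asarray(rt, dtype=float))
    response = np.asarray(response)
    return np.where(response == 1, rt, -rt)


# trials 3983-3987 of cavanagh_theta_nn.csv
rt = [1.45, 0.711, 0.784, 2.35, 1.25]
response = [0.0, 1.0, 1.0, 0.0, 1.0]
signed_rt = flip_lower_boundary(rt, response)
print(signed_rt)
```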


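### Model descriptor
`model_descrs` lists, for each regression, the outcome parameter, the formula, the names of the coefficient nodes, and the link function. It is very useful for understanding the model, and we need it when we want to reconstruct each parameter's value for each condition from the MCMC traces.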

In [9]:
# get the model information from the model objects
ms7_tmp.model_descrs

[{'outcome': 'a',
  'model': " theta:C(conf, Treatment('LC')):C(dbs, Treatment('0'))",
  'params': ['a_Intercept',
   "a_theta:C(conf, Treatment('LC'))[HC]:C(dbs, Treatment('0'))[0]",
   "a_theta:C(conf, Treatment('LC'))[LC]:C(dbs, Treatment('0'))[0]",
   "a_theta:C(conf, Treatment('LC'))[HC]:C(dbs, Treatment('0'))[1]",
   "a_theta:C(conf, Treatment('LC'))[LC]:C(dbs, Treatment('0'))[1]"],
  'link_func': <function hddm.models.hddm_regression.HDDMRegressor.__setstate__.<locals>.<lambda>(x)>},
 {'outcome': 'v',
  'model': " C(conf, Treatment('LC'))",
  'params': ['v_Intercept', "v_C(conf, Treatment('LC'))[T.HC]"],
  'link_func': <function hddm.models.hddm_regression.HDDMRegressor.__setstate__.<locals>.<lambda>(x)>}]
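With treatment coding and the identity link, `v_Intercept` is the drift rate in the reference condition `LC`, and `v_C(conf, Treatment('LC'))[T.HC]` is the difference HC minus LC, so the drift rate in HC is their sum. A small sketch of that reconstruction (the helper `v_by_condition` is hypothetical). The inputs are participant 13's first posterior draw: the HC coefficient (-0.51252481) is the first value of its trace printed later in this notebook, and the intercept (0.58331565) is the value of `v` on LC trials in the same draw.

```python
import numpy as np


def v_by_condition(v_intercept, v_hc, link_func=lambda x: x):
    """Drift rate per confidence condition under treatment coding (reference: LC).

    Works on scalars or on whole trace arrays of equal length.
    """
    v_intercept = np.asarray(v_intercept, dtype=float)
    v_hc = np.asarray(v_hc, dtype=float)
    return {
        "LC": link_func(v_intercept),         # reference level: intercept only
        "HC": link_func(v_intercept + v_hc),  # intercept + treatment effect
    }


# participant 13, first posterior draw
v = v_by_condition(0.58331565, -0.51252481)
print(v)
```

Passing whole trace columns instead of scalars (for example the `v_Intercept` and `v_C(conf, Treatment('LC'))[T.HC]` columns of `get_traces()`, assuming the group-level nodes carry those names) gives the posterior of each condition's drift rate.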

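Next, check all the nodes in the model with `nodes_db`.

The boolean columns `stochastic`, `observed`, and `hidden` show that there are stochastic nodes, observed nodes, and hidden nodes.

For stochastic nodes, the table also holds summary statistics of the posterior (mean, std, quantiles, and Monte Carlo error).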

In [10]:
ms7_tmp.nodes_db

| | knode_name | stochastic | observed | subj | node | tag | depends | hidden | subj_idx | stim | ... | dbs | conf | mean | std | 2.5q | 25q | 50q | 75q | 97.5q | mc err |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| t | t | True | False | False | t | () | [] | False |  |  | ... |  |  | 0.624151 | 0.0369645 | 0.555071 | 0.599169 | 0.622722 | 0.647556 | 0.707345 | 0.00196716 |
| t_std | t_std | True | False | False | t_std | () | [] | False |  |  | ... |  |  | 0.113356 | 0.0285804 | 0.0718465 | 0.093736 | 0.108587 | 0.127678 | 0.179078 | 0.0010597 |
| t_rate | t_rate | False | False | False | t_rate | () | [] | True |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| t_shape | t_shape | False | False | False | t_shape | () | [] | True |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| t_subj.0 | t_subj | True | False | True | t_subj.0 | () | [subj_idx] | False | 0 |  | ... |  |  | 0.707286 | 0.0390466 | 0.631725 | 0.680583 | 0.706971 | 0.731965 | 0.788962 | 0.00234585 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| wfpt.9 | wfpt | False | True | False | wfpt.9 | () | [subj_idx] | False | 9 |  | ... |  |  |  |  |  |  |  |  |  |  |
| wfpt.10 | wfpt | False | True | False | wfpt.10 | () | [subj_idx] | False | 10 |  | ... |  |  |  |  |  |  |  |  |  |  |
| wfpt.11 | wfpt | False | True | False | wfpt.11 | () | [subj_idx] | False | 11 |  | ... |  |  |  |  |  |  |  |  |  |  |
| wfpt.12 | wfpt | False | True | False | wfpt.12 | () | [subj_idx] | False | 12 |  | ... |  |  |  |  |  |  |  |  |  |  |


`get_stochastics()` returns the same summary restricted to the stochastic nodes. For the value of each MCMC draw, we need the traces.

In [11]:
ms7_tmp.get_stochastics()

| | knode_name | stochastic | observed | subj | node | tag | depends | hidden | subj_idx | stim | ... | dbs | conf | mean | std | 2.5q | 25q | 50q | 75q | 97.5q | mc err |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| t | t | True | False | False | t | () | [] | False |  |  | ... |  |  | 0.624151 | 0.0369645 | 0.555071 | 0.599169 | 0.622722 | 0.647556 | 0.707345 | 0.00196716 |
| t_std | t_std | True | False | False | t_std | () | [] | False |  |  | ... |  |  | 0.113356 | 0.0285804 | 0.0718465 | 0.093736 | 0.108587 | 0.127678 | 0.179078 | 0.0010597 |
| t_subj.0 | t_subj | True | False | True | t_subj.0 | () | [subj_idx] | False | 0 |  | ... |  |  | 0.707286 | 0.0390466 | 0.631725 | 0.680583 | 0.706971 | 0.731965 | 0.788962 | 0.00234585 |
| t_subj.1 | t_subj | True | False | True | t_subj.1 | () | [subj_idx] | False | 1 |  | ... |  |  | 0.55223 | 0.0361706 | 0.482611 | 0.527048 | 0.553322 | 0.576448 | 0.626342 | 0.00232477 |
| t_subj.2 | t_subj | True | False | True | t_subj.2 | () | [subj_idx] | False | 2 |  | ... |  |  | 0.606158 | 0.0217745 | 0.562911 | 0.592256 | 0.606589 | 0.619763 | 0.650331 | 0.00126748 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| v_C(conf, Treatment('LC'))[T.HC]_subj.9 | v_C(conf, Treatment('LC'))[T.HC]_subj | True | False | True | v_C(conf, Treatment('LC'))[T.HC]_subj.9 | () | [subj_idx] | False | 9 |  | ... |  |  | -0.587479 | 0.113156 | -0.83386 | -0.647768 | -0.582453 | -0.51215 | -0.391509 | 0.00678584 |
| v_C(conf, Treatment('LC'))[T.HC]_subj.10 | v_C(conf, Treatment('LC'))[T.HC]_subj | True | False | True | v_C(conf, Treatment('LC'))[T.HC]_subj.10 | () | [subj_idx] | False | 10 |  | ... |  |  | -0.502287 | 0.0999438 | -0.685379 | -0.571759 | -0.508993 | -0.440562 | -0.303152 | 0.0050307 |
| v_C(conf, Treatment('LC'))[T.HC]_subj.11 | v_C(conf, Treatment('LC'))[T.HC]_subj | True | False | True | v_C(conf, Treatment('LC'))[T.HC]_subj.11 | () | [subj_idx] | False | 11 |  | ... |  |  | -0.623354 | 0.101676 | -0.840236 | -0.684346 | -0.614123 | -0.552663 | -0.446389 | 0.0062167 |
| v_C(conf, Treatment('LC'))[T.HC]_subj.12 | v_C(conf, Treatment('LC'))[T.HC]_subj | True | False | True | v_C(conf, Treatment('LC'))[T.HC]_subj.12 | () | [subj_idx] | False | 12 |  | ... |  |  | -0.606482 | 0.1146 | -0.856969 | -0.672062 | -0.596611 | -0.528993 | -0.402171 | 0.00651002 |


In [12]:
ms7_tmp_traces = ms7_tmp.get_traces()
ms7_tmp_traces.columns

Index(['t', 't_std', 't_subj.0', 't_subj.1', 't_subj.2', 't_subj.3',
       't_subj.4', 't_subj.5', 't_subj.6', 't_subj.7',
       ...
       'v_C(conf, Treatment('LC'))[T.HC]_subj.4',
       'v_C(conf, Treatment('LC'))[T.HC]_subj.5',
       'v_C(conf, Treatment('LC'))[T.HC]_subj.6',
       'v_C(conf, Treatment('LC'))[T.HC]_subj.7',
       'v_C(conf, Treatment('LC'))[T.HC]_subj.8',
       'v_C(conf, Treatment('LC'))[T.HC]_subj.9',
       'v_C(conf, Treatment('LC'))[T.HC]_subj.10',
       'v_C(conf, Treatment('LC'))[T.HC]_subj.11',
       'v_C(conf, Treatment('LC'))[T.HC]_subj.12',
       'v_C(conf, Treatment('LC'))[T.HC]_subj.13'],
      dtype='object', length=147)
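Participant-level trace columns follow the pattern `<name>_subj.<subj_idx>`. A sketch for picking one participant's columns (the helper `subject_columns` is hypothetical). Matching the full `_subj.<id>` suffix with `endswith` avoids the pitfall of a substring test for `subj.1`, which would also pick up participants 10 to 13.

```python
def subject_columns(columns, subj_idx):
    """Return the trace column names that belong to participant `subj_idx`."""
    suffix = "_subj.%i" % subj_idx
    return [col for col in columns if col.endswith(suffix)]


# column names of the form listed above
cols = ["t", "t_std", "t_subj.1", "t_subj.13",
        "v_C(conf, Treatment('LC'))[T.HC]_subj.1",
        "v_C(conf, Treatment('LC'))[T.HC]_subj.13"]
print(subject_columns(cols, 1))
```

On the real traces this would be used as `ms7_tmp_traces[subject_columns(ms7_tmp_traces.columns, 13)]`.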

## Observeds

The HDDM model object is a `kabuki` hierarchical model whose nodes are `pymc` objects, so the nodes have almost all the properties of `pymc` objects, and the model object offers more on top.

The observed nodes are the nodes that hold the input data (`observed` = True).

The number of observed nodes depends on the model specification. In this regression model, which has no `depends_on` settings, there is one observed node per participant.

In [13]:
ms7_tmp.get_observeds()

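| | knode_name | stochastic | observed | subj | node | tag | depends | hidden | subj_idx | stim | ... | dbs | conf | mean | std | 2.5q | 25q | 50q | 75q | 97.5q | mc err |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wfpt.0 | wfpt | False | True | False | wfpt.0 | () | [subj_idx] | False | 0 |  | ... |  |  |  |  |  |  |  |  |  |  |
| wfpt.1 | wfpt | False | True | False | wfpt.1 | () | [subj_idx] | False | 1 |  | ... |  |  |  |  |  |  |  |  |  |  |
| wfpt.2 | wfpt | False | True | False | wfpt.2 | () | [subj_idx] | False | 2 |  | ... |  |  |  |  |  |  |  |  |  |  |
| wfpt.3 | wfpt | False | True | False | wfpt.3 | () | [subj_idx] | False | 3 |  | ... |  |  |  |  |  |  |  |  |  |  |
| wfpt.4 | wfpt | False | True | False | wfpt.4 | () | [subj_idx] | False | 4 |  | ... |  |  |  |  |  |  |  |  |  |  |
| wfpt.5 | wfpt | False | True | False | wfpt.5 | () | [subj_idx] | False | 5 |  | ... |  |  |  |  |  |  |  |  |  |  |
| wfpt.6 | wfpt | False | True | False | wfpt.6 | () | [subj_idx] | False | 6 |  | ... |  |  |  |  |  |  |  |  |  |  |
| wfpt.7 | wfpt | False | True | False | wfpt.7 | () | [subj_idx] | False | 7 |  | ... |  |  |  |  |  |  |  |  |  |  |
| wfpt.8 | wfpt | False | True | False | wfpt.8 | () | [subj_idx] | False | 8 |  | ... |  |  |  |  |  |  |  |  |  |  |
| wfpt.9 | wfpt | False | True | False | wfpt.9 | () | [subj_idx] | False | 9 |  | ... |  |  |  |  |  |  |  |  |  |  |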


We can also iterate over the observed nodes:

In [14]:
iter_data_tmp7 = ((name, ms7_tmp.data.loc[obs['node'].value.index]) for name, obs in ms7_tmp.iter_observeds())
iter_data_tmp7

<generator object <genexpr> at 0x7f82e1cb9a50>
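Note that `iter_data_tmp7` is a generator, so it can be consumed only once: a second loop over it runs zero times. Wrap it in `list(...)` if you need to iterate repeatedly. A plain-Python illustration, independent of HDDM (`make_pairs` is a hypothetical stand-in):

```python
def make_pairs():
    # stands in for the (name, data) pairs yielded by the generator expression above
    return ((name, i) for i, name in enumerate(["wfpt.0", "wfpt.1", "wfpt.2"]))


gen = make_pairs()
first_pass = [name for name, _ in gen]
second_pass = [name for name, _ in gen]  # the generator is already exhausted
print(first_pass, second_pass)

reusable = list(make_pairs())  # materialise once, then loop as often as needed
print([name for name, _ in reusable])
```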

In each iteration, we can extract the node and check its `parents`, `extended_parents`, and other properties.

In [15]:
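# note: the loop variable `data` overwrites the full data set loaded earlier;
# after the loop it holds the trials of the last participant
for name, data in iter_data_tmp7:
    print(name)
    node = ms7_tmp.get_data_nodes(data.index)  # get the node corresponding to data.index

    # print the extended parents of the last node only
    for i, parent in enumerate(node.extended_parents):
        if name == 'wfpt.13':
            print("Order of extended_parent: ", i)
            print(parent)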

wfpt.0
wfpt.1
wfpt.2
wfpt.3
wfpt.4
wfpt.5
wfpt.6
wfpt.7
wfpt.8
wfpt.9
wfpt.10
wfpt.11
wfpt.12
wfpt.13
Order of extended_parent:  0
st
Order of extended_parent:  1
a_Intercept_subj.13
Order of extended_parent:  2
a_theta:C(conf, Treatment('LC'))[HC]:C(dbs, Treatment('0'))[0]_subj.13
Order of extended_parent:  3
a_theta:C(conf, Treatment('LC'))[LC]:C(dbs, Treatment('0'))[0]_subj.13
Order of extended_parent:  4
sv
Order of extended_parent:  5
a_theta:C(conf, Treatment('LC'))[LC]:C(dbs, Treatment('0'))[1]_subj.13
Order of extended_parent:  6
v_C(conf, Treatment('LC'))[T.HC]_subj.13
Order of extended_parent:  7
sz
Order of extended_parent:  8
a_theta:C(conf, Treatment('LC'))[HC]:C(dbs, Treatment('0'))[1]_subj.13
Order of extended_parent:  9
v_Intercept_subj.13
Order of extended_parent:  10
t_subj.13
Order of extended_parent:  11
z_subj_trans.13


After the loop, `node` is `wfpt.13`, the last of the observed nodes.

You can also check the input data of each node.

In [16]:
isinstance(node, pm.Node) # check if the node is a pymc node.

True

In [17]:
node.value

| | rt |
|---|---|
| 3714 | 1.500 |
| 3715 | 0.929 |
| 3716 | 1.880 |
| 3717 | -1.180 |
| 3718 | 1.810 |
| ... | ... |
| 3983 | -1.450 |
| 3984 | 0.711 |
| 3985 | 0.784 |
| 3986 | -2.350 |


Or the node's name:

In [18]:
node.__name__

'wfpt.13'

In [19]:
node.parents

{'p_outlier': 0.05,
 'v': <pymc.PyMCObjects.Deterministic 'v_reg.13' at 0x7f82df495d90>,
 'sv': <pymc.distributions.new_dist_class.<locals>.new_class 'sv' at 0x7f82e1b6fe90>,
 'a': <pymc.PyMCObjects.Deterministic 'a_reg.13' at 0x7f82df542750>,
 'z': <pymc.CommonDeterministics.InvLogit 'z_subj.13' at 0x7f82df708c50>,
 'sz': <pymc.distributions.new_dist_class.<locals>.new_class 'sz' at 0x7f82e1c5acd0>,
 't': <pymc.distributions.new_dist_class.<locals>.new_class 't_subj.13' at 0x7f82df7391d0>,
 'st': <pymc.distributions.new_dist_class.<locals>.new_class 'st' at 0x7f82e1c9de10>,
 'reg_outcomes': frozenset({'a', 'v'})}

In [20]:
node.parents.value

{'v': 3714    0.037319
 3715    0.516295
 3716    0.516295
 3717    0.037319
 3718    0.516295
           ...   
 3983    0.037319
 3984    0.516295
 3985    0.516295
 3986    0.037319
 3987    0.037319
 Name: 0, Length: 274, dtype: float64,
 'sv': array(0.25823629),
 'a': 3714    2.068646
 3715    2.017323
 3716    2.032026
 3717    2.098875
 3718    2.025883
           ...   
 3983    1.916442
 3984    1.989934
 3985    1.957079
 3986    1.979324
 3987    2.097590
 Name: 0, Length: 274, dtype: float64,
 'z': array(0.53853632),
 'sz': array(0.13495805),
 't': array(0.68599143),
 'st': array(0.47464767),
 'reg_outcomes': {'a', 'v'},
 'p_outlier': 0.05}

In [21]:
node.extended_parents

{<pymc.distributions.new_dist_class.<locals>.new_class 'z_subj_trans.13' at 0x7f82df6e1fd0>,
 <pymc.distributions.new_dist_class.<locals>.new_class 't_subj.13' at 0x7f82df7391d0>,
 <pymc.distributions.new_dist_class.<locals>.new_class 'v_Intercept_subj.13' at 0x7f82df4eb710>,
 <pymc.distributions.new_dist_class.<locals>.new_class 'a_theta:C(conf, Treatment('LC'))[HC]:C(dbs, Treatment('0'))[1]_subj.13' at 0x7f82df5e5710>,
 <pymc.distributions.new_dist_class.<locals>.new_class 'sz' at 0x7f82e1c5acd0>,
 <pymc.distributions.new_dist_class.<locals>.new_class 'v_C(conf, Treatment('LC'))[T.HC]_subj.13' at 0x7f82df51ce90>,
 <pymc.distributions.new_dist_class.<locals>.new_class 'a_theta:C(conf, Treatment('LC'))[LC]:C(dbs, Treatment('0'))[1]_subj.13' at 0x7f82df615e90>,
 <pymc.distributions.new_dist_class.<locals>.new_class 'sv' at 0x7f82e1b6fe90>,
 <pymc.distributions.new_dist_class.<locals>.new_class 'a_theta:C(conf, Treatment('LC'))[LC]:C(dbs, Treatment('0'))[0]_subj.13' at 0x7f82df67b250>,
 

Note that `extended_parents` is a plain Python `set`, so, unlike `parents`, it has no `value` attribute:

In [22]:
node.extended_parents.value

AttributeError: 'set' object has no attribute 'value'

As mentioned above, the node's `random()` method generates random values based on the **current** values of its parents. Re-run the cell below a few times to see how the simulated RTs change each time.

In [23]:
node.random()

| | rt |
|---|---|
| 3714 | 2.273329 |
| 3715 | 1.932588 |
| 3716 | 0.800018 |
| 3717 | -3.566316 |
| 3718 | 1.294022 |
| ... | ... |
| 3983 | 1.725299 |
| 3984 | 1.544213 |
| 3985 | 1.047355 |
| 3986 | -4.020927 |


Conveniently, we can also retrieve the traces of a node's `parents` and `extended_parents` directly.

In [24]:
node.parents['v'].trace()

array([[ 0.07079084,  0.58331565,  0.58331565, ...,  0.58331565,
         0.07079084,  0.07079084],
       [-0.04693306,  0.43199648,  0.43199648, ...,  0.43199648,
        -0.04693306, -0.04693306],
       [ 0.00878048,  0.48895526,  0.48895526, ...,  0.48895526,
         0.00878048,  0.00878048],
       ...,
       [ 0.01091294,  0.53718705,  0.53718705, ...,  0.53718705,
         0.01091294,  0.01091294],
       [ 0.06403487,  0.56229785,  0.56229785, ...,  0.56229785,
         0.06403487,  0.06403487],
       [ 0.03731875,  0.51629492,  0.51629492, ...,  0.51629492,
         0.03731875,  0.03731875]])
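Each row of this array is one posterior draw and each column one trial of participant 13. Because `v` depends only on `conf`, all trials of the same condition share one value within a draw, so the per-trial trace can be collapsed to one trace per condition. A numpy sketch (`collapse_by_condition` is a hypothetical helper) using the first and last draws of the three leftmost and three rightmost columns shown above; the condition labels of those trials (3714: HC, 3715: LC, 3716: LC, 3985: LC, 3986: HC, 3987: HC) follow from the data table and the `v` values printed earlier.

```python
import numpy as np


def collapse_by_condition(trace, labels):
    """Collapse a (draws x trials) trace to one trace per condition label.

    Raises AssertionError if trials of one condition differ within a draw.
    """
    trace = np.asarray(trace, dtype=float)
    labels = np.asarray(labels)
    out = {}
    for lab in np.unique(labels):
        cols = trace[:, labels == lab]
        assert np.allclose(cols, cols[:, [0]]), "values differ within %s" % lab
        out[str(lab)] = cols[:, 0]
    return out


# first and last draws; trials 3714, 3715, 3716, 3985, 3986, 3987
v_trace = np.array([
    [0.07079084, 0.58331565, 0.58331565, 0.58331565, 0.07079084, 0.07079084],
    [0.03731875, 0.51629492, 0.51629492, 0.51629492, 0.03731875, 0.03731875],
])
labels = ["HC", "LC", "LC", "LC", "HC", "HC"]
per_cond = collapse_by_condition(v_trace, labels)
print(per_cond)
```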

In [25]:
# extended_parents is a set and cannot be indexed, but we can get each trace in a for loop
for i, parent in enumerate(node.extended_parents):
    print(parent)
    print(parent.trace())

st
[0.45661132 0.45625829 0.48352744 ... 0.51125925 0.52033121 0.47464767]
a_Intercept_subj.13
[2.14894803 2.04464093 2.24554997 ... 2.2302547  2.19780888 2.0290853 ]
a_theta:C(conf, Treatment('LC'))[HC]:C(dbs, Treatment('0'))[0]_subj.13
[ 0.12125852  0.19985113  0.0641674  ... -0.01330511  0.09896035
  0.09104917]
a_theta:C(conf, Treatment('LC'))[LC]:C(dbs, Treatment('0'))[0]_subj.13
[-0.06125472 -0.0779277  -0.04243819 ...  0.04367235  0.05348451
  0.10372685]
sv
[0.21481978 0.2732277  0.24124355 ... 0.25024603 0.29303582 0.25823629]
a_theta:C(conf, Treatment('LC'))[LC]:C(dbs, Treatment('0'))[1]_subj.13
[ 0.03394697 -0.11044463 -0.05594438 ...  0.03232501 -0.00100048
 -0.00782412]
v_C(conf, Treatment('LC'))[T.HC]_subj.13
[-0.51252481 -0.47892954 -0.48017478 ... -0.52627411 -0.49826299
 -0.47897617]
sz
[0.220693   0.19928462 0.14847611 ... 0.13398106 0.02876901 0.13495805]
a_theta:C(conf, Treatment('LC'))[HC]:C(dbs, Treatment('0'))[1]_subj.13
[-0.03347591 -0.01104332  0.01372463 ...  

## Check that changing `extended_parents` changes `parents`

It is not obvious that the values of a node's `parents` depend on the values of its `extended_parents`.

Again, let's check the `extended_parents` and `parents`:

In [26]:
node.extended_parents

{<pymc.distributions.new_dist_class.<locals>.new_class 'z_subj_trans.13' at 0x7f82df6e1fd0>,
 <pymc.distributions.new_dist_class.<locals>.new_class 't_subj.13' at 0x7f82df7391d0>,
 <pymc.distributions.new_dist_class.<locals>.new_class 'v_Intercept_subj.13' at 0x7f82df4eb710>,
 <pymc.distributions.new_dist_class.<locals>.new_class 'a_theta:C(conf, Treatment('LC'))[HC]:C(dbs, Treatment('0'))[1]_subj.13' at 0x7f82df5e5710>,
 <pymc.distributions.new_dist_class.<locals>.new_class 'sz' at 0x7f82e1c5acd0>,
 <pymc.distributions.new_dist_class.<locals>.new_class 'v_C(conf, Treatment('LC'))[T.HC]_subj.13' at 0x7f82df51ce90>,
 <pymc.distributions.new_dist_class.<locals>.new_class 'a_theta:C(conf, Treatment('LC'))[LC]:C(dbs, Treatment('0'))[1]_subj.13' at 0x7f82df615e90>,
 <pymc.distributions.new_dist_class.<locals>.new_class 'sv' at 0x7f82e1b6fe90>,
 <pymc.distributions.new_dist_class.<locals>.new_class 'a_theta:C(conf, Treatment('LC'))[LC]:C(dbs, Treatment('0'))[0]_subj.13' at 0x7f82df67b250>,
 

In [27]:
node.parents

{'p_outlier': 0.05,
 'v': <pymc.PyMCObjects.Deterministic 'v_reg.13' at 0x7f82df495d90>,
 'sv': <pymc.distributions.new_dist_class.<locals>.new_class 'sv' at 0x7f82e1b6fe90>,
 'a': <pymc.PyMCObjects.Deterministic 'a_reg.13' at 0x7f82df542750>,
 'z': <pymc.CommonDeterministics.InvLogit 'z_subj.13' at 0x7f82df708c50>,
 'sz': <pymc.distributions.new_dist_class.<locals>.new_class 'sz' at 0x7f82e1c5acd0>,
 't': <pymc.distributions.new_dist_class.<locals>.new_class 't_subj.13' at 0x7f82df7391d0>,
 'st': <pymc.distributions.new_dist_class.<locals>.new_class 'st' at 0x7f82e1c9de10>,
 'reg_outcomes': frozenset({'a', 'v'})}

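Note that `v` is a `pymc.PyMCObjects.Deterministic` named `v_reg.13`, and `a` is a `pymc.PyMCObjects.Deterministic` named `a_reg.13`. Likewise, `z` is the `InvLogit` deterministic `z_subj.13`, while `extended_parents` contains the underlying `z_subj_trans.13`.

Here `Deterministic` means that the object's value is computed from its own parents, i.e., from the coefficient nodes listed in `extended_parents`.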

Also note that `sv`, `st`, `sz`, and `t` (`t_subj.13`) are the same objects in both `extended_parents` and `parents`, as the memory addresses show. For example:

`sv` in `extended_parents`: `<pymc.distributions.new_dist_class.<locals>.new_class 'sv' at 0x7f82e1b6fe90>`

`sv` in `parents`: `<pymc.distributions.new_dist_class.<locals>.new_class 'sv' at 0x7f82e1b6fe90>`

We can verify that changing the values of `extended_parents` simultaneously changes the values of the deterministic parents, which are determined by the values of the `extended_parents` and the design matrix.

I will test this with 5 posterior draws. For each draw, every extended parent is set to its posterior value at that draw, and the values of both the `extended_parents` and the `parents` are recorded.

Then I will calculate the parents' values as the dot product of the design matrix and the corresponding `extended_parents`' values. The link function of `a` is the identity, so no further transformation is needed.

Finally, I will compare the parents' values recorded in each iteration with the values calculated from the `extended_parents` and the design matrix.
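Before running the check on the model objects, the same arithmetic can be done by hand for a few trials. A numpy sketch: the coefficients are participant 13's values at the first posterior draw (the first entries of the traces printed above), the rows of `X` are trials 3983-3987 from the data table (all with `dbs` = 0), and, with the identity link, `a` is simply `X @ beta`. The variable names are illustrative.

```python
import numpy as np

# participant 13, first posterior draw; column order:
# Intercept, [HC]:[0], [LC]:[0], [HC]:[1], [LC]:[1]
beta = np.array([2.14894803, 0.12125852, -0.06125472, -0.03347591, 0.03394697])

# trials 3983-3987: theta goes into the column of the trial's conf x dbs cell
X = np.array([
    [1.0, -1.237166, 0.0, 0.0, 0.0],   # HC, dbs 0
    [1.0, 0.0, -0.377450, 0.0, 0.0],   # LC, dbs 0
    [1.0, 0.0, -0.694194, 0.0, 0.0],   # LC, dbs 0
    [1.0, -0.546536, 0.0, 0.0, 0.0],   # HC, dbs 0
    [1.0, 0.752388, 0.0, 0.0, 0.0],    # HC, dbs 0
])


def identity(x):
    """The link_func used for `a` in run_m7."""
    return x


a_by_hand = identity(X @ beta)
print(np.round(a_by_hand, 6))
```

Up to rounding of the printed inputs, these values agree with rows 3983 to 3987 of the first column of `predictor_tmp` computed in the check.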

In [28]:
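# `iter_data_tmp7` is a generator that was already consumed in In [15],
# so this loop body does not run and `node` still refers to `wfpt.13`.
for name, data in iter_data_tmp7:
    print(name)
    node = ms7_tmp.get_data_nodes(data.index)  # get the node corresponding to data.index

    for i, parent in enumerate(node.extended_parents):
        print("Order of extended_parent: ", i)
        print(parent)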

In [29]:
##### First, record the values of extended_parents and parents for 5 posterior draws ####
# Note: this overwrites the current values of the model's nodes.

ls_10_ext_par = []
ls_10_par = []
for pos in range(5):
    print(pos)

    dicts = {}
    for i, parent in enumerate(node.extended_parents):
        assert len(parent.trace()) > pos, "pos larger than posterior sample size"
        dicts[parent.__name__] = parent.trace()[pos]  # note how the node's name is obtained
        parent.value = parent.trace()[pos]  # set the node to its value at draw `pos`

    # record the values of extended_parents
    ls_10_ext_par.append(dicts)

    # record the values of parents (dropping the non-numeric 'reg_outcomes' entry)
    tmp_dict = deepcopy(node.parents.value)
    del tmp_dict['reg_outcomes']
    tmp_par = pd.DataFrame.from_dict(tmp_dict)

    ls_10_par.append(tmp_par)

df_ls_10_ext_par = pd.DataFrame.from_dict(ls_10_ext_par)
print(df_ls_10_ext_par.head())

0
1
2
3
4
         st  a_Intercept_subj.13  \
0  0.456611             2.148948   
1  0.456258             2.044641   
2  0.483527             2.245550   
3  0.469041             2.219220   
4  0.470805             2.227721   

   a_theta:C(conf, Treatment('LC'))[HC]:C(dbs, Treatment('0'))[0]_subj.13  \
0                                           0.121259                        
1                                           0.199851                        
2                                           0.064167                        
3                                           0.128285                        
4                                           0.092162                        

   a_theta:C(conf, Treatment('LC'))[LC]:C(dbs, Treatment('0'))[0]_subj.13  \
0                                          -0.061255                        
1                                          -0.077928                        
2                                          -0.042

In [30]:
##### Second, select the extended_parents' values related to parameter "a"

filter_col = [col for col in df_ls_10_ext_par if col.startswith('a_')]
df_a_ext_par = df_ls_10_ext_par[filter_col]
print(df_a_ext_par)


#### Get the design matrix for the trials of `node` (participant 13)

subj_data = ms7_tmp.data.loc[node.value.index]
print(ms7_tmp.model_descrs[0]['outcome'])
design_matrix = dmatrix(ms7_tmp.model_descrs[0]['model'],
                        data=subj_data, return_type='dataframe', NA_action='raise')
print("Head of the design matrix:")
print(design_matrix.head())

# rename the columns so that they match the names of the extended_parents
design_matrix = design_matrix.add_prefix("a_").add_suffix("_subj.13")
print("Add parameter and participants' info to design matrix: ")
print(design_matrix.head())

# re-order the extended_parents' values to match the design-matrix columns
df_a_ext_par = df_a_ext_par[design_matrix.columns]

print(df_a_ext_par.head())

# identity link: a = X @ beta, one column per posterior draw
predictor_tmp = design_matrix.dot(df_a_ext_par.T)
predictor_tmp

   a_Intercept_subj.13  \
0             2.148948   
1             2.044641   
2             2.245550   
3             2.219220   
4             2.227721   

   a_theta:C(conf, Treatment('LC'))[HC]:C(dbs, Treatment('0'))[0]_subj.13  \
0                                           0.121259                        
1                                           0.199851                        
2                                           0.064167                        
3                                           0.128285                        
4                                           0.092162                        

   a_theta:C(conf, Treatment('LC'))[LC]:C(dbs, Treatment('0'))[0]_subj.13  \
0                                          -0.061255                        
1                                          -0.077928                        
2                                          -0.042438                        
3                                          -0.018326                    

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 3714 | 2.165100 | 2.049969 | 2.238928 | 2.209395 | 2.228267 |
| 3715 | 2.199981 | 1.878607 | 2.161448 | 2.163257 | 2.157646 |
| 3716 | 2.136189 | 2.086150 | 2.266576 | 2.233211 | 2.245240 |
| 3717 | 2.177441 | 2.054040 | 2.233868 | 2.201888 | 2.228684 |
| 3718 | 2.162842 | 1.999437 | 2.222653 | 2.203983 | 2.208643 |
| ... | ... | ... | ... | ... | ... |
| 3983 | 1.998931 | 1.797392 | 2.166164 | 2.060510 | 2.113702 |
| 3984 | 2.172069 | 2.074055 | 2.261568 | 2.226137 | 2.247216 |
| 3985 | 2.191471 | 2.098738 | 2.275010 | 2.231941 | 2.263575 |
| 3986 | 2.082676 | 1.935415 | 2.210480 | 2.149107 | 2.177351 |


In [31]:
# compare the values of the first draw:
ls_10_par[0]['a']

3714    2.165100
3715    2.199981
3716    2.136189
3717    2.177441
3718    2.162842
          ...   
3983    1.998931
3984    2.172069
3985    2.191471
3986    2.082676
3987    2.240182
Name: a, Length: 274, dtype: float64

In [32]:
predictor_tmp[0]

3714    2.165100
3715    2.199981
3716    2.136189
3717    2.177441
3718    2.162842
          ...   
3983    1.998931
3984    2.172069
3985    2.191471
3986    2.082676
3987    2.240182
Name: 0, Length: 274, dtype: float64

In [33]:
ls_10_par[0]['a'] == predictor_tmp[0]

3714    True
3715    True
3716    True
3717    True
3718    True
        ... 
3983    True
3984    True
3985    True
3986    True
3987    True
Length: 274, dtype: bool