# Numerical test of fitness model

In [1]:
import numpy as np
import pandas as pd
import bebi103 
import cmdstanpy
import arviz as az
import scipy.stats as st
import holoviews as hv
import bokeh.io
import evo_mwc
from bokeh.themes import Theme
hv.extension('bokeh')


ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'

  "Both pystan and cmdstanpy are importable in this environment. As per the cmdstanpy documentation, this is not advised."


In [2]:
# Set PBoC style
theme = Theme(json=evo_mwc.viz.pboc_style_bokeh())
hv.renderer('bokeh').theme = theme
bokeh.io.curdoc().theme = theme

In this notebook, we investigate the fitness model for tetracycline resistance. We take a Bayesian approach and define priors for the unknown parameters. Then we generate data from the likelihood and try to recover the parameters that were drawn from the prior distributions. This process is also known as simulation-based calibration.

We defined fitness as relative growth rate, compared to growth in the absence of antibiotic,

$$
F =\frac{c_\mathrm{R}^* - c_\mathrm{R}^\mathrm{min}}{c_\mathrm{R}^0 - c_\mathrm{R}^\mathrm{min}} = \frac{\frac{1}{1+e^{-\beta \mathcal{T}}}- c}{1 - c},
$$

where $c$ is the ratio of the minimal amount of ribosomes required for growth relative to the total number of ribosomes and $\mathcal{T}$ is a "free energy-like" term,

$$
\mathcal{T} = \Delta G - \frac{1}{\beta}\log c_\mathrm{T}^\mathrm{E} + \frac{1}{\beta}\log\left[1 + \tilde g p/K_\mathrm{T}\right],
$$

where $\Delta G$ is the free energy difference of tetracycline and ribosome binding, $c_\mathrm{T}^\mathrm{E}$ is the extracellular drug concentration, and $\tilde g p/K_\mathrm{T}$ is a dimensionless fraction describing the strength of pumps (combination of expression and pump activity) compared to the diffusion of drug through the cell membrane ($K_\mathrm{T}$).
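
As a minimal sketch, the two equations above can be written as plain Python functions. This assumes energies are in units of $k_\mathrm{B}T$ (so $\beta = 1$), that `log_cte` is the natural log of the extracellular drug concentration, and that `p` stands for $p/K_\mathrm{T}$. The function names are illustrative, and the Stan models used below may follow different conventions.

```python
import math


def free_energy_term(g, delta_G, log_cte, p):
    """Free energy-like term T in units of k_B T (assumes beta = 1)."""
    return delta_G - log_cte + math.log1p(g * p)


def fitness(g, delta_G, log_cte, p, c):
    """Relative growth rate F for expression g and minimal ribosome fraction c."""
    T = free_energy_term(g, delta_G, log_cte, p)
    free_fraction = 1.0 / (1.0 + math.exp(-T))
    return (free_fraction - c) / (1.0 - c)


# For T = 0 the free fraction is 1/2, so with c = 0.2 we get F = 0.3 / 0.8
print(fitness(g=0.0, delta_G=0.0, log_cte=0.0, p=1.0, c=0.2))
```

For a fixed drug concentration, increasing `g` or `p` raises $\mathcal{T}$ and therefore the fitness, which is the behavior the fitness curves below should show.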

## Attempt 1: Infer c

Let's define some priors. The **free energy difference** is given in the literature as $\Delta G = -13.8 k_\mathrm{B}T$ [[6](https://www.sciencedirect.com/science/article/pii/S107455210800361X#bib4)], which we use as the mean of a normal prior.
<br>
The **minimal fraction of free ribosomes** is bounded between 0 and 1, so we choose a beta distribution as prior. We assume that this fraction is more likely to be small, so we choose a prior that is slightly asymmetric towards smaller values.
<br>
Finally, we need a prior for **pump strength relative to diffusion**. We know that this ratio is positive, but not much more, so we choose a log-normal distribution to cover multiple orders of magnitude.
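
To get a feeling for what such priors imply, here is a small standard-library sketch. The hyperparameters (a Beta(1.1, 10) prior on $c$ and a normal prior with standard deviation 0.8 on $\log_{10} p/K_\mathrm{T}$) are assumed to match the values used in the plotting cell; the variable names are illustrative.

```python
from statistics import NormalDist

# Beta(alpha, beta) prior on c: closed-form mean and mode
alpha_c, beta_c = 1.1, 10.0
c_mean = alpha_c / (alpha_c + beta_c)
c_mode = (alpha_c - 1.0) / (alpha_c + beta_c - 2.0)

# Normal prior on log10(p/K_T): central 95% interval, mapped back to p/K_T
log10_p = NormalDist(mu=0.0, sigma=0.8)
p_low = 10 ** log10_p.inv_cdf(0.025)
p_high = 10 ** log10_p.inv_cdf(0.975)

print(f"c: mean {c_mean:.3f}, mode {c_mode:.3f}")
print(f"p/K_T central 95% prior interval: {p_low:.3f} to {p_high:.1f}")
```

With these values, the central 95% interval for $p/K_\mathrm{T}$ spans roughly three orders of magnitude, centered on 1.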

In [3]:
x_G = np.linspace(-28,0,100)
y_G = st.norm.pdf(x_G, -13.8, 2.5)

p_G = hv.Curve(
    (x_G, y_G)
).opts(
    padding=0.05,
    title="Free energy prior",
    xlabel="ΔG",
    ylabel="g(ΔG)"
)


x_c = np.linspace(0, 1, 100)
y_c = st.beta.pdf(x_c, 1.1, 10)

p_c = hv.Curve(
    (x_c, y_c)
).opts(
    padding=0.05,
    title="Relative minimum free ribosomes prior",
    xlabel="c",
    ylabel="g(c)"
)

x_p = np.linspace(-3, 3, 100)
y_p = st.norm.pdf(x_p, 0, .8)

p_p = hv.Curve(
    (x_p, y_p)
).opts(
    padding=0.05,
    title="Relative pump strength prior",
    xlabel="log10 p/K_T",
    ylabel="g(log10 p/K_T)"
)


(p_G + p_c + p_p).opts(shared_axes=False)

Since we have a theoretical prediction for fitness, and expect the data to spread around that prediction, we are going to use a regression model. The full generative model is then given by,

\begin{align}
\Delta G &\sim \mathrm{Normal}(-13.8, 3) \\[1em]
c &\sim \mathrm{Beta}(1.1, 1.5)\\[1em]
\log_{10} p/K_\mathrm{T} &\sim \mathrm{Normal}(0, .8) \\[1em]
p/K_\mathrm{T} &= 10^{\log_{10}p/K_\mathrm{T}}\\[1em]
\sigma &\sim \mathrm{HalfNormal}(0, 0.1)\\[1em]
F_i &\sim \mathrm{Normal}(F(g_i; \Delta G, c_\mathrm{T}^\mathrm{E}, p/K_\mathrm{T}, c), \sigma) \quad \forall i
\end{align}

Let's try a prior predictive check using Stan. Note that we take a uniform spread of expression values. We are just looking for a proof of principle here. If we find a usable approach, we can reduce the number of data points.
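
For readers without the Stan files at hand, the following is a rough pure-Python sketch of what one prior predictive draw looks like. It is not the contents of `prior_pred_pick_c.stan`: the hyperparameters are taken from the generative model as written above (the data dictionary passed to Stan may use different values), $\beta = 1$ is assumed, and the helper names are hypothetical.

```python
import math
import random


def fitness(g, delta_G, log_cte, p, c):
    """Fitness from the model equations, assuming beta = 1."""
    T = delta_G - log_cte + math.log1p(g * p)
    return (1.0 / (1.0 + math.exp(-T)) - c) / (1.0 - c)


def prior_predictive_draw(g_values, rng, log_cte=-6.0):
    """Draw one parameter set from the priors and simulate noisy fitness values."""
    delta_G = rng.gauss(-13.8, 3.0)
    c = rng.betavariate(1.1, 1.5)
    p = 10 ** rng.gauss(0.0, 0.8)
    sigma = abs(rng.gauss(0.0, 0.1))  # half-normal via absolute value
    F = [rng.gauss(fitness(g, delta_G, log_cte, p, c), sigma) for g in g_values]
    params = {"delta_G": delta_G, "c": c, "p": p, "sigma": sigma}
    return params, F


rng = random.Random(3252)  # arbitrary seed for reproducibility
g_values = [i / 99 for i in range(100)]
draws = [prior_predictive_draw(g_values, rng) for _ in range(100)]
```

Each element of `draws` holds the ground-truth parameters together with one simulated fitness curve, which is what a self-consistency check needs in order to compare the posterior against the known values.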

In [4]:
bebi103.stan.clean_cmdstan()
sm_prior_pred1 = cmdstanpy.CmdStanModel(
    stan_file='stan_code/prior_pred_pick_c.stan'
)
data1 = {
    "N": 100,
    "g": np.linspace(0, 1, 100),
    "mu_G": -13.8,
    "sigma_G": 3,
    "log_cte": -6,
    "mu_p": 1,
    "sigma_p": .8,
    "sigma_sigma": 0.01,
    "alpha_c":1.1,
    "beta_c":10
}

INFO:cmdstanpy:stan to c++ (/Users/tomroschinger/git/evo_mwc/code/exploratory/stan_code/prior_pred_pick_c.hpp)
INFO:cmdstanpy:compiling c++
INFO:cmdstanpy:compiled model file: /Users/tomroschinger/git/evo_mwc/code/exploratory/stan_code/prior_pred_pick_c


Sample using the stan model.

In [5]:
samples_prior_pred1 = sm_prior_pred1.sample(
    data=data1, sampling_iters=100, fixed_param=True
)

INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:finish chain 1


Load the samples using arviz.

In [6]:
samples_prior_pred1 = az.from_cmdstanpy(
    posterior=samples_prior_pred1, prior=samples_prior_pred1, prior_predictive=["F"]
)

Store the results of the prior predictive check in a data frame for easier plotting.

In [7]:
df_ppc1 = samples_prior_pred1.prior_predictive.F.to_dataframe().reset_index()

We are just going to plot all drawn fitness curves first.

In [8]:
g = np.linspace(0, 1, 100)
df_ppc1['expression'] = g[df_ppc1['F_dim_0']]
hv.Curve(
    df_ppc1,
    kdims='expression',
    vdims=['F', 'draw'],
).groupby(
    'draw'
).opts(
    frame_height=200,
    frame_width=400,
    line_width=0.5
).overlay(
)

We see that most curves look quite good. Some dip a little too far into negative values, but that could be because some of the drawn values for $c$ were poor. Let's try sampling from the posterior using a data set that we created in the prior predictive check, and see if we can get an estimate for the parameters.

In [9]:
# Choosing a set that shows some variation
F1 = df_ppc1.loc[df_ppc1["draw"] == 6, "F"].values
g1 = df_ppc1.loc[df_ppc1["draw"] == 6, "expression"].values
F1

array([0.888492, 0.938005, 0.956522, 0.965699, 0.979064, 0.969123,
       0.992321, 0.987553, 0.982376, 1.00525 , 0.989982, 0.994691,
       0.987924, 0.983838, 0.985433, 0.991993, 0.998999, 0.993301,
       0.995135, 0.996067, 0.987513, 0.998934, 0.987294, 0.988336,
       0.985554, 0.999298, 1.0004  , 0.998811, 0.997371, 0.993943,
       0.987251, 0.995654, 0.998113, 1.005   , 0.989688, 0.985374,
       1.00506 , 0.997242, 1.0027  , 0.996927, 1.0099  , 1.00258 ,
       0.999732, 0.997725, 0.99888 , 0.996137, 1.00708 , 0.989959,
       1.00011 , 0.99067 , 0.996049, 0.991241, 1.01091 , 1.01164 ,
       1.0022  , 0.996524, 0.989893, 0.99403 , 0.999364, 1.00441 ,
       0.99767 , 1.00459 , 0.987163, 1.00054 , 0.999562, 0.996631,
       0.995122, 1.00067 , 1.00041 , 0.996614, 0.99429 , 0.995236,
       1.0063  , 0.9987  , 1.00299 , 0.994926, 0.992784, 1.00337 ,
       0.999032, 0.985012, 0.988627, 1.00138 , 0.998695, 0.998777,
       0.997608, 1.00207 , 0.999591, 1.00035 , 0.99839 , 1.005

In [10]:
sm1 = cmdstanpy.CmdStanModel(
    stan_file='stan_code/model_pick_c.stan'
)

data_ppc1 = {
    "N":100,
    "g": g1,
    "F": F1,
    "log_cte": -6,  # same drug concentration as used to generate the data
}

INFO:cmdstanpy:stan to c++ (/Users/tomroschinger/git/evo_mwc/code/exploratory/stan_code/model_pick_c.hpp)
INFO:cmdstanpy:compiling c++
INFO:cmdstanpy:compiled model file: /Users/tomroschinger/git/evo_mwc/code/exploratory/stan_code/model_pick_c


Spoiler alert: there will be many divergences. To reduce the chance that they are caused by insufficient warm-up, we increase the number of warm-up iterations right away.

In [12]:
samples_ppc1 = sm1.sample(
    data=data_ppc1, 
    warmup_iters=100000, 
    sampling_iters=1000, 
    chains=4, 
    cores=4
)

INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:start chain 2
INFO:cmdstanpy:start chain 3
INFO:cmdstanpy:start chain 4
INFO:cmdstanpy:finish chain 3
INFO:cmdstanpy:finish chain 4
INFO:cmdstanpy:finish chain 1
INFO:cmdstanpy:finish chain 2


Load the samples using arviz.

In [13]:
samples_az_ppc1 = az.from_cmdstanpy(
    posterior=samples_ppc1, posterior_predictive=["F_ppc"]
)

First, let's check some diagnostics. 

In [14]:
bebi103.stan.check_all_diagnostics(samples_az_ppc1)

Effective sample size looks reasonable for all parameters.
Rhat looks reasonable for all parameters.
2254 of 4000 (56.35%) iterations ended with a divergence.
  Try running with larger adapt_delta to remove divergences.
0 of 4000 (0.0%) iterations saturated the maximum tree depth of 10.
E-BFMI indicated no pathological behavior.


4

I usually get a lot of divergences here (~50%). Let's take some graphical approaches to identify the problem.

In [15]:
bokeh.io.show(
    bebi103.viz.parcoord_plot(
        samples_az_ppc1,
        pars=["delta_G", "p", "c"],
        xtick_label_orientation="vertical",
        transformation="minmax",
    )
)

Well, that looks absolutely terrible. Let's take a look at a corner plot.

In [16]:
bokeh.io.show(
    bebi103.viz.corner(
        samples_az_ppc1,
        pars=["delta_G", "p", "sigma", "c"],
        xtick_label_orientation="vertical",
    )
)

Here we get some information. The histogram of $c$ looks exactly like the prior, so the data do not seem to inform us about that parameter at all. We need to identify it from a different source.
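
One possible reason, sketched from the fitness equation rather than from the Stan fit itself: writing $r = 1/(1+e^{-\beta\mathcal{T}})$, the sensitivity of fitness to $c$ is $\partial F/\partial c = (r - 1)/(1 - c)^2$, which vanishes as $r \to 1$. The chosen data set has $F \approx 1$ almost everywhere, so changing $c$ would barely move the predicted curve. The helper names below are illustrative.

```python
def fitness_from_fraction(r, c):
    """Fitness as a function of the free ribosome fraction r and threshold c."""
    return (r - c) / (1.0 - c)


def dF_dc(r, c):
    """Analytical sensitivity of fitness to c at fixed r."""
    return (r - 1.0) / (1.0 - c) ** 2


# Compare how much F changes between c = 0.05 and c = 0.3 for different r
for r in (0.5, 0.9, 0.999):
    spread = fitness_from_fraction(r, 0.05) - fitness_from_fraction(r, 0.3)
    print(f"r = {r}: F(c=0.05) - F(c=0.3) = {spread:.4f}")
```

The spread shrinks towards zero as `r` approaches 1, so a data set that sits on the plateau cannot distinguish a small $c$ from a large one.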

## Attempt 2: Fixed c

Let's pretend for a moment that we magically know the value of $c$, and redo the procedure.

Load the Stan model.

In [64]:
bebi103.stan.clean_cmdstan(path="stan_code")
sm_prior_pred2 = cmdstanpy.CmdStanModel(
    stan_file='stan_code/prior_pred_fix_c.stan'
)
data2 = {
    "N": 100,
    "g": np.linspace(0, 1, 100),
    "mu_G": -13.8,
    "sigma_G": 3,
    "log_cte": -6,
    "mu_p": 1,
    "sigma_p": .8,
    "sigma_sigma": 0.01,
    "c":0.2
}

INFO:cmdstanpy:stan to c++ (/Users/tomroschinger/git/evo_mwc/code/exploratory/stan_code/prior_pred_fix_c.hpp)
INFO:cmdstanpy:compiling c++
INFO:cmdstanpy:compiled model file: /Users/tomroschinger/git/evo_mwc/code/exploratory/stan_code/prior_pred_fix_c


Sample data using parameters from the priors.

In [65]:
samples_prior_pred2 = sm_prior_pred2.sample(
    data=data2, sampling_iters=100, fixed_param=True
)

INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:finish chain 1


Load the samples using arviz.

In [66]:
samples_prior_pred2 = az.from_cmdstanpy(
    posterior=samples_prior_pred2, prior=samples_prior_pred2, prior_predictive=["F"]
)

Store results in data frame for easier plotting.

In [67]:
df_ppc2 = samples_prior_pred2.prior_predictive.F.to_dataframe().reset_index()

Again we just plot the fitness curves.

In [68]:
g2 = np.linspace(0, 1, 100)
df_ppc2['expression'] = g2[df_ppc2['F_dim_0']]
hv.Curve(
    df_ppc2,
    kdims='expression',
    vdims=['F', 'draw'],
).groupby(
    'draw'
).opts(
    frame_height=200,
    frame_width=400,
    line_width=0.5
).overlay(
)

Choose a fitness curve.

In [69]:
# Choosing a set that shows some variation
F2 = df_ppc2.loc[df_ppc2["draw"] == 14, "F"].values
g2 = df_ppc2.loc[df_ppc2["draw"] == 14, "expression"].values
F2

array([-0.191531  , -0.177869  , -0.175432  , -0.132227  , -0.150238  ,
       -0.0777037 , -0.0221533 , -0.03898   , -0.00795834,  0.00812149,
        0.0335181 ,  0.0478695 ,  0.0322589 ,  0.079417  ,  0.0909975 ,
        0.089245  ,  0.0961328 ,  0.148986  ,  0.126321  ,  0.148416  ,
        0.174202  ,  0.193041  ,  0.231135  ,  0.196413  ,  0.248707  ,
        0.242391  ,  0.25424   ,  0.240176  ,  0.290259  ,  0.2952    ,
        0.284988  ,  0.302131  ,  0.323138  ,  0.346031  ,  0.330276  ,
        0.326335  ,  0.337987  ,  0.334423  ,  0.381719  ,  0.363852  ,
        0.393354  ,  0.377152  ,  0.379001  ,  0.405153  ,  0.425335  ,
        0.42004   ,  0.426554  ,  0.418674  ,  0.397452  ,  0.463848  ,
        0.454904  ,  0.48046   ,  0.474187  ,  0.479455  ,  0.447882  ,
        0.461035  ,  0.445316  ,  0.485125  ,  0.499402  ,  0.475308  ,
        0.475614  ,  0.546371  ,  0.515206  ,  0.543404  ,  0.502956  ,
        0.513477  ,  0.518534  ,  0.552241  ,  0.521512  ,  0.52

Load the model to sample out of the posterior and perform posterior predictive checks.

In [70]:
sm2 = cmdstanpy.CmdStanModel(
    stan_file='stan_code/model_fix_c.stan'
)

data_ppc2 = {
    "N":100,
    "g": g2,
    "F": F2,
    "log_cte": -6,
    "c":0.2
}

INFO:cmdstanpy:stan to c++ (/Users/tomroschinger/git/evo_mwc/code/exploratory/stan_code/model_fix_c.hpp)
INFO:cmdstanpy:compiling c++
INFO:cmdstanpy:compiled model file: /Users/tomroschinger/git/evo_mwc/code/exploratory/stan_code/model_fix_c


Perform the sampling and load the samples using arviz.

In [71]:
samples_ppc2 = sm2.sample(
    data=data_ppc2, 
    warmup_iters=100000, 
    sampling_iters=1000, 
    chains=4, 
    cores=4
)

samples_az_ppc = az.from_cmdstanpy(
    posterior=samples_ppc2, posterior_predictive=["F_ppc"]
)

INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:start chain 2
INFO:cmdstanpy:start chain 3
INFO:cmdstanpy:start chain 4
INFO:cmdstanpy:finish chain 4
INFO:cmdstanpy:finish chain 3
INFO:cmdstanpy:finish chain 1
INFO:cmdstanpy:finish chain 2


First check diagnostics.

In [75]:
bebi103.stan.check_all_diagnostics(samples_az_ppc)

Effective sample size looks reasonable for all parameters.
Rhat looks reasonable for all parameters.
0 of 4000 (0.0%) iterations ended with a divergence.
0 of 4000 (0.0%) iterations saturated the maximum tree depth of 10.
E-BFMI indicated no pathological behavior.


0

This looks much better already. Let's look at a parallel coordinate plot and check if we still see the same behavior.

In [76]:
bokeh.io.show(
    bebi103.viz.parcoord_plot(
        samples_az_ppc,
        pars=["delta_G", "p"],
        xtick_label_orientation="vertical",
        transformation="minmax",
    )
)

Oh yeah, there seems to be a clear relationship between $\Delta G$ and $p$. Let's investigate that in a corner plot.

In [77]:
bokeh.io.show(
    bebi103.viz.corner(
        samples_az_ppc,
        pars=["delta_G", "p", "sigma"],
        xtick_label_orientation="vertical",
    )
)

The correlation between these parameters does not seem to be very important, since the free energy only varies within about 1%. However, we have to keep it in mind if we reduce the number of data points, because then it might become an issue.
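
A correlation of this kind is what the form of $\mathcal{T}$ would suggest: $\Delta G$ and $\log(1 + g\,p/K_\mathrm{T})$ enter as a sum, so a larger pump strength can be offset by a more negative free energy. The sketch below (assuming $\beta = 1$, with hypothetical parameter values and helper names) shifts $\Delta G$ so that $\mathcal{T}$ is unchanged at $g = 1$ and then checks how well the compensation holds at other expression levels.

```python
import math


def free_energy_term(g, delta_G, log_cte, p):
    """Free energy-like term T in units of k_B T (assumes beta = 1)."""
    return delta_G - log_cte + math.log1p(g * p)


def compensating_delta_G(g, delta_G, p_old, p_new):
    """Shift delta_G so that T at expression g is unchanged when p changes."""
    return delta_G - (math.log1p(g * p_new) - math.log1p(g * p_old))


delta_G, log_cte = -13.8, -6.0
p_old, p_new = 10.0, 20.0  # hypothetical pump strengths
shifted = compensating_delta_G(1.0, delta_G, p_old, p_new)

for g in (0.25, 0.5, 1.0):
    diff = (free_energy_term(g, shifted, log_cte, p_new)
            - free_energy_term(g, delta_G, log_cte, p_old))
    print(f"g = {g:.2f}: T changes by {diff:+.3f} k_B T")
```

The compensation is exact only at the expression level it was computed for. The small residual differences at other values of $g$ are what allow the data to separate the two parameters, which is why fewer expression levels could make the correlation more of a problem.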

## Attempt 3: Compute c

Instead of magically knowing $c$, let's try to compute it. Suppose we know the MIC, i.e., the concentration of drug that inhibits growth of tetracycline-sensitive cells for the time of observation. This might actually be measurable if we fix a time of observation and use a very fine drug gradient to find a good approximation of that concentration. Setting $F = 0$ for sensitive cells (no pumps) at the MIC, we can then solve the fitness equation for $c$,

$$
c = \frac{1}{1+\exp\left[ -\beta\Delta G + \log(c^*_\mathrm{MIC}) \right]}.
$$

For concentrations higher than the MIC, the model gives negative fitness values, and I am not sure we can actually measure those. So let's assume for now that we stay below that concentration.
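
As a quick numerical check of this relation, the sketch below computes $c$ from a hypothetical MIC and confirms that the fitness of sensitive cells ($g = 0$) is zero at the MIC, positive below it, and negative above it. It assumes $\beta = 1$ and natural-log concentrations; the numbers and function names are made up for illustration and are not taken from the Stan models.

```python
import math


def c_from_mic(delta_G, log_mic):
    """Minimal ribosome fraction implied by F = 0 for sensitive cells at the MIC."""
    return 1.0 / (1.0 + math.exp(-delta_G + log_mic))


def fitness(g, delta_G, log_cte, p, c):
    """Fitness from the model equations, assuming beta = 1."""
    T = delta_G - log_cte + math.log1p(g * p)
    return (1.0 / (1.0 + math.exp(-T)) - c) / (1.0 - c)


delta_G, log_mic = -13.8, -12.0  # hypothetical values
c = c_from_mic(delta_G, log_mic)

for log_cte in (-13.0, -12.0, -11.0):
    F = fitness(g=0.0, delta_G=delta_G, log_cte=log_cte, p=5.0, c=c)
    print(f"log c_T^E = {log_cte}: F = {F:+.3f}")
```

Cells that express pumps ($g > 0$) have a larger $\mathcal{T}$ at the same drug concentration, so their fitness at the MIC of the sensitive strain is positive.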

In [29]:
bebi103.stan.clean_cmdstan(path="stan_code")
sm_prior_pred = cmdstanpy.CmdStanModel(
    stan_file='stan_code/prior_pred_comp_c.stan'
)
data = {
    "N": 100,
    "g": np.linspace(0, 1, 100),
    "mu_G": -13.8,
    "sigma_G": 3,
    "log_cte": -6,
    "mu_p": 1,
    "sigma_p": .8,
    "sigma_sigma": 0.01
}

INFO:cmdstanpy:stan to c++ (/Users/tomroschinger/git/evo_mwc/code/exploratory/stan_code/prior_pred_comp_c.hpp)
INFO:cmdstanpy:compiling c++
INFO:cmdstanpy:compiled model file: /Users/tomroschinger/git/evo_mwc/code/exploratory/stan_code/prior_pred_comp_c


In [30]:
samples_prior_pred = sm_prior_pred.sample(
    data=data, sampling_iters=100, fixed_param=True
)

INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:finish chain 1


In [31]:
samples_prior_pred = az.from_cmdstanpy(
    posterior=samples_prior_pred, prior=samples_prior_pred, prior_predictive=["F"]
)

In [32]:
df_ppc = samples_prior_pred.prior_predictive.F.to_dataframe().reset_index()

In [33]:
g = np.linspace(0, 1, 100)
df_ppc['expression'] = g[df_ppc['F_dim_0']]
hv.Curve(
    df_ppc,
    kdims='expression',
    vdims=['F', 'draw'],
).groupby(
    'draw'
).opts(
    frame_height=200,
    frame_width=400,
    line_width=0.5
).overlay(
)

Amazing, we don't get any negative fitness values anymore. This is exactly what we would expect, since the chosen concentration is below the MIC. Let's do a posterior check with one of the data sets.

In [34]:
# Choosing a set that shows some variation
F = df_ppc.loc[df_ppc["draw"] == 16, "F"].values
g = df_ppc.loc[df_ppc["draw"] == 16, "expression"].values
F

array([0.0316775, 0.117763 , 0.125549 , 0.156978 , 0.180504 , 0.17821  ,
       0.247326 , 0.239905 , 0.287231 , 0.332027 , 0.355631 , 0.320165 ,
       0.407038 , 0.391997 , 0.397836 , 0.409383 , 0.417177 , 0.447225 ,
       0.473752 , 0.516302 , 0.443204 , 0.507435 , 0.524747 , 0.606288 ,
       0.521365 , 0.521923 , 0.561642 , 0.590387 , 0.578315 , 0.603453 ,
       0.585546 , 0.602137 , 0.593338 , 0.614172 , 0.64491  , 0.624486 ,
       0.658461 , 0.635472 , 0.660791 , 0.661276 , 0.695253 , 0.662536 ,
       0.676909 , 0.664137 , 0.653689 , 0.690484 , 0.705187 , 0.699513 ,
       0.723638 , 0.653057 , 0.711758 , 0.695671 , 0.727566 , 0.708366 ,
       0.733836 , 0.734961 , 0.763896 , 0.737593 , 0.755615 , 0.714864 ,
       0.747551 , 0.755814 , 0.774498 , 0.782817 , 0.794253 , 0.737826 ,
       0.719896 , 0.789331 , 0.766933 , 0.756422 , 0.731024 , 0.818541 ,
       0.781246 , 0.784399 , 0.7691   , 0.803815 , 0.8047   , 0.802398 ,
       0.79798  , 0.801204 , 0.793468 , 0.814789 , 

In [35]:
sm = cmdstanpy.CmdStanModel(
    stan_file='stan_code/model_comp_c.stan'
)

INFO:cmdstanpy:stan to c++ (/Users/tomroschinger/git/evo_mwc/code/exploratory/stan_code/model_comp_c.hpp)
INFO:cmdstanpy:compiling c++
INFO:cmdstanpy:compiled model file: /Users/tomroschinger/git/evo_mwc/code/exploratory/stan_code/model_comp_c


In [36]:
data_ppc = {
    "N":100,
    "g": g,
    "F": F,
    "log_cte": -6
}

In [37]:
samples_ppc = sm.sample(
    data=data_ppc, 
    warmup_iters=100000, 
    sampling_iters=1000, 
    chains=4, 
    cores=4
)

samples_az_ppc = az.from_cmdstanpy(
    posterior=samples_ppc, posterior_predictive=["F_ppc"]
)

INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:start chain 2
INFO:cmdstanpy:start chain 3
INFO:cmdstanpy:start chain 4
INFO:cmdstanpy:finish chain 3
INFO:cmdstanpy:finish chain 1
INFO:cmdstanpy:finish chain 2
INFO:cmdstanpy:finish chain 4


In [38]:
bebi103.stan.check_all_diagnostics(samples_az_ppc)

Effective sample size looks reasonable for all parameters.
Rhat looks reasonable for all parameters.
0 of 4000 (0.0%) iterations ended with a divergence.
0 of 4000 (0.0%) iterations saturated the maximum tree depth of 10.
E-BFMI indicated no pathological behavior.


0

In [39]:
bokeh.io.show(
    bebi103.viz.parcoord_plot(
        samples_az_ppc,
        pars=["delta_G", "p"],
        xtick_label_orientation="vertical",
        transformation="minmax",
    )
)

Oh yeah, there seems to be a clear relationship between $\Delta G$ and $p$. Let's investigate that in a corner plot.

In [40]:
bokeh.io.show(
    bebi103.viz.corner(
        samples_az_ppc,
        pars=["delta_G", "p", "sigma"],
        xtick_label_orientation="vertical",
    )
)

## Simulation based calibration

In [41]:
#df = bebi103.stan.sbc(
#     prior_predictive_model=sm_prior_pred,
#     posterior_model=sm,
#     prior_predictive_model_data=data,
#     posterior_model_data=data_ppc,
#     measured_data=['F'],
#     parameters=['delta_G', 'p'],
#     cores=8,
#     progress_bar=True
#    
#)
#df.to_csv("sbc_data/20200309_1_comp_c.csv")

In [57]:
df2 = pd.read_csv("sbc_data/20200309_1_comp_c.csv")
points = hv.Points(
    data=df2,
    kdims=['shrinkage', ('z_score', 'z-score')]
).groupby(
    'parameter'
).opts(
    size=2,
    padding=0.1,
).overlay()#.options({'Points': {'color':hv.Cycle('Set2'), 'alpha': 0.3, 'size': 4}})

points

In [58]:
np.sum(df2["warning_code"] != 0)/len(df2)
bokeh.io.show(
    bebi103.viz.sbc_rank_ecdf(
        df2, 
        diff=True,
        parameters=["delta_G", "p"],
    )
)

About one percent of the cases give weird results, which seems good. However, let's repeat the SBC with fewer data points and see if we get more inconsistency.
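
For reference, here is a minimal sketch of the per-simulation quantities plotted above, using the commonly used definitions: the z-score is the posterior mean minus the ground truth in units of the posterior standard deviation, the shrinkage is one minus the ratio of posterior to prior variance, and the rank counts posterior draws below the ground truth. That these match the `z_score` and `shrinkage` columns produced by `bebi103.stan.sbc` is an assumption, and the posterior draws here are synthetic.

```python
import random
from statistics import fmean, stdev


def sbc_summary(true_value, posterior_draws, prior_sd):
    """Posterior z-score, shrinkage, and rank statistic for one SBC trial."""
    post_mean = fmean(posterior_draws)
    post_sd = stdev(posterior_draws)
    z_score = (post_mean - true_value) / post_sd
    shrinkage = 1.0 - (post_sd / prior_sd) ** 2
    rank = sum(draw < true_value for draw in posterior_draws)
    return {"z_score": z_score, "shrinkage": shrinkage, "rank": rank}


# Synthetic example: a narrow posterior for delta_G, slightly off the true value
rng = random.Random(0)
draws = [rng.gauss(-13.5, 0.3) for _ in range(1000)]
summary = sbc_summary(true_value=-13.8, posterior_draws=draws, prior_sd=3.0)
print(summary)
```

A shrinkage close to 1 means the data narrowed the posterior substantially relative to the prior, and z-scores that mostly stay within a few units of zero indicate that the posterior covers the ground truth.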

In [47]:
data_ppc = {
    "N":24,
    "g": np.linspace(0, 1, 24),
    "F": np.linspace(0, 1, 24),
    "log_cte": -6
}

data = {
    "N": 24,
    "g": np.linspace(0, 1, 24),
    "mu_G": -13.8,
    "sigma_G": 3,
    "log_cte": -6,
    "mu_p": 1,
    "sigma_p": .8,
    "sigma_sigma": 0.01
}

In [48]:
df = bebi103.stan.sbc(
     prior_predictive_model=sm_prior_pred,
     posterior_model=sm,
     prior_predictive_model_data=data,
     posterior_model_data=data_ppc,
     measured_data=['F'],
     parameters=['delta_G', 'p'],
     cores=8,
     progress_bar=True
    
)
df.to_csv("sbc_data/20200309_2_comp_c.csv")

100%|██████████| 400/400 [01:34<00:00,  4.25it/s]


In [59]:
df3 = pd.read_csv("sbc_data/20200309_2_comp_c.csv")
points = hv.Points(
    data=df3,
    kdims=['shrinkage', ('z_score', 'z-score')]
).groupby(
    'parameter'
).opts(
    size=2,
    padding=0.1
).overlay()#.options({'Points': {'color':hv.Cycle('Set2'), 'alpha': 0.3, 'size': 4}})

points

Wow, even better results!

In [60]:
bokeh.io.show(
    bebi103.viz.sbc_rank_ecdf(
        df3, 
        diff=True,
        parameters=["delta_G", "p"],
    )
)