In [1]:
import subprocess

# Simulation study 
In this notebook we show the pipeline for carrying out the simulation study presented in Section 4.3:

1) Simulate data sets
2) Run inference with the MCMC algorithm
3) Assess convergence

##### 0. Define the experiment settings

In [4]:
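# simulation setup
rootpath = 'data/hercules_forewing_n=20.csv'  # shape at the root of the simulation (-root)
treepath = 'data/chazot_full_tree.nw'         # phylogeny the data are simulated on (-simtree)
outpath = '_simulation-study'                 # output folder for simulated data and MCMC runs
sigma_sim = 0.6                               # true sigma (gtheta in the code)
alpha_sim = 0.01                              # true alpha (kalpha in the code)
gamma = 0                                     # observation noise variance in the simulation (obs_var in the code)
n_datasets = 1                                # number of simulated data sets
dt = 0.05                                     # time step of the discretization
sti = 1                                       # 1: apply the Stratonovich-Ito correction

# MCMC setup
tau_sigma = 0.07          # proposal scale for sigma
tau_alpha = 0.003         # proposal scale for alpha
prior_sigma = (0.4, 1.0)  # prior bounds (lower, upper); mcmc.py receives (lower, width)
prior_alpha = (0, 0.03)   # prior bounds (lower, upper); mcmc.py receives (lower, width)
n_iter = 10               # number of MCMC iterations; in the paper we use 6000
lambd = 0.97              # passed to mcmc.py as -l
ls = 5                    # super root branch length
gamma_mcmc = 0.002        # observation noise variance assumed by the MCMC (-ov)
x_s = 'phylomean'         # how the super root x_s is set (-super_root)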
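
The settings `dt` and `sti` control the time discretization and whether a Stratonovich-Ito correction is applied when simulating. The sketch below illustrates these two ideas for a generic scalar SDE $dX_t = b(X_t)\,dt + \sigma(X_t) \circ dW_t$; it is not the shape model of the paper, and the function and argument names (`euler_maruyama`, `drift`, `diffusion`, `diffusion_deriv`) are hypothetical. For a Stratonovich SDE the equivalent Ito drift is $b + \tfrac{1}{2}\sigma\sigma'$, so the correction simply adds that term to the drift before taking Euler-Maruyama steps of size `dt`.

```python
import numpy as np

def euler_maruyama(x0, T, dt, drift, diffusion, diffusion_deriv=None,
                   stratonovich=False, rng=None):
    """Simulate a scalar SDE on [0, T] with Euler-Maruyama steps of size about dt.

    Hypothetical helper for illustration. If stratonovich=True, the SDE is read as
    dX = b dt + s(X) o dW and the Ito correction 0.5 * s(X) * s'(X) is added to the
    drift (this requires diffusion_deriv).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(np.ceil(T / dt))
    h = T / n_steps  # adjust the step so that n_steps * h == T
    xs = np.empty(n_steps + 1)
    xs[0] = x0
    for k in range(n_steps):
        x = xs[k]
        b = drift(x)
        if stratonovich:
            b = b + 0.5 * diffusion(x) * diffusion_deriv(x)  # Stratonovich-to-Ito drift term
        dW = rng.normal(0.0, np.sqrt(h))  # Brownian increment with variance h
        xs[k + 1] = x + b * h + diffusion(x) * dW
    return xs

# Example: dX = 0.6 X o dW (Stratonovich), whose exact solution is X_T = X_0 exp(0.6 W_T).
sigma = 0.6
path = euler_maruyama(1.0, 1.0, 0.05,
                      drift=lambda x: 0.0,
                      diffusion=lambda x: sigma * x,
                      diffusion_deriv=lambda x: sigma,
                      stratonovich=True,
                      rng=np.random.default_rng(0))
print(len(path))  # 21: the initial value plus 20 steps of size 0.05
```

Without the correction the same discretization targets the Ito SDE, whose mean stays at $X_0$, whereas the Stratonovich solution has mean $X_0 e^{\sigma^2 T/2}$; averaging many simulated end points shows the difference.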

##### 1. Simulate data set

In [5]:
subprocess.run(['python', 'simulate_data.py',
                '-dt', f'{dt}',
                '-a', f'{alpha_sim}',
                '-s', f'{sigma_sim}',
                '-ov', f'{gamma}',
                '-root', f'{rootpath}',
                '-o', f'{outpath}/simdata',
                '-simtree', f'{treepath}',
                '-sti', f'{sti}',  # whether to apply the Stratonovich-Ito correction
                '-rb', '0'
                ])

!! no data seed given
simulation seed: 7126186253058767


here() starts at /Users/lkn315/Documents/stoch_phyl_mod_shape


> args <- commandArgs(trailingOnly = TRUE)
> 
> # do what we want 
> tree = read.tree(paste(args[1], '.nw', sep=''))
> vcv_ = vcv(tree)
> write.table(vcv_, file=paste(args[1],'_vcv.csv', sep=''), row.names=F, col.names=F)
> 


CompletedProcess(args=['python', 'simulate_data.py', '-dt', '0.05', '-a', '0.01', '-s', '0.6', '-ov', '0', '-root', 'data/hercules_forewing_n=20.csv', '-o', '_simulation-study/simdata', '-simtree', 'data/chazot_full_tree.nw', '-sti', '1', '-rb', '0'], returncode=0)
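
The R output above shows that simulate_data.py computes the phylogenetic variance-covariance matrix of the tree with `vcv` from the R package ape and writes it to a `_vcv.csv` file with the same prefix as the tree file. Entry $C_{ij}$ is the branch length shared by the root-to-tip paths of tips $i$ and $j$, and $C_{ii}$ is the root-to-tip distance of tip $i$. The sketch below computes $C$ for a small hand-coded tree (a hypothetical parent-pointer representation, not the Newick reader used here) and, as one plausible reading of `x_s = 'phylomean'`, the generalized-least-squares estimate of the root state under Brownian motion, $\hat{x}_s = (\mathbf{1}^\top C^{-1}\mathbf{1})^{-1}\mathbf{1}^\top C^{-1} X$. That interpretation of 'phylomean' is an assumption; mcmc.py holds the actual definition.

```python
import numpy as np

# Hypothetical tree ((A:1,B:1):2,C:3); stored as child -> (parent, branch length).
parent = {
    'A': ('n1', 1.0),
    'B': ('n1', 1.0),
    'n1': ('root', 2.0),
    'C': ('root', 3.0),
}
tips = ['A', 'B', 'C']

def path_to_root(node):
    """Set of (child, branch length) edges on the path from node up to the root."""
    edges = set()
    while node in parent:
        par, length = parent[node]
        edges.add((node, length))
        node = par
    return edges

def phylo_vcv(tips):
    """C[i, j] = total length of the edges shared by the root-to-tip paths of tips i and j."""
    paths = [path_to_root(t) for t in tips]
    n = len(tips)
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            C[i, j] = sum(length for _, length in paths[i] & paths[j])
    return C

def phylo_mean(C, X):
    """GLS estimate of the root state under Brownian motion (one row of X per tip)."""
    ones = np.ones(C.shape[0])
    Cinv_ones = np.linalg.solve(C, ones)  # C^{-1} 1 without forming the inverse
    return Cinv_ones @ X / (ones @ Cinv_ones)

C = phylo_vcv(tips)
print(C)
# [[3. 2. 0.]
#  [2. 3. 0.]
#  [0. 0. 3.]]
X = np.array([[1.0, 0.0], [3.0, 0.0], [2.0, 6.0]])  # toy 2-d traits for tips A, B, C
print(phylo_mean(C, X))
```

Closely related tips (A and B) share covariance and are therefore down-weighted relative to the isolated tip C, which is the point of using a phylogenetic rather than an ordinary mean.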

##### 2. Run MCMC to infer the posterior

This notebook shows how to call the mcmc.py script, but in practice we do not run MCMC from a notebook: the cell below only writes a shell script that starts three chains, each in a detached `screen` session.

A few comments regarding the mcmc.py script: 

Some variables in the code are named differently from the paper (code name = paper name):
- gtheta = sigma
- kalpha = alpha
- obs_var = gamma
- mirrored Gaussian = reflected Gaussian
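
A reflected (mirrored) Gaussian is a random-walk proposal that keeps a bounded parameter inside its support by folding proposals that overshoot a boundary back into the interval. The sketch below assumes that the support is the prior interval (e.g. `prior_alpha = (0, 0.03)`) and that `tau_alpha` / `tau_sigma` act as the proposal standard deviation; both are assumptions about mcmc.py, and the helper names are hypothetical. Because reflection at the boundaries keeps the proposal density symmetric, the Metropolis-Hastings acceptance ratio needs no proposal correction term.

```python
import numpy as np

def reflect_into_interval(y, lower, upper):
    """Fold y back into [lower, upper] by repeated reflection at the boundaries."""
    width = upper - lower
    z = np.mod(y - lower, 2.0 * width)  # position within one period of length 2 * width
    return lower + np.where(z > width, 2.0 * width - z, z)

def reflected_gaussian_proposal(x, tau, lower, upper, rng):
    """Propose reflect(x + tau * N(0, 1)) inside [lower, upper]; a symmetric proposal."""
    return float(reflect_into_interval(x + tau * rng.standard_normal(), lower, upper))

rng = np.random.default_rng(1)
alpha, tau_alpha, prior_alpha = 0.001, 0.003, (0, 0.03)  # alpha close to the lower bound
proposals = [reflected_gaussian_proposal(alpha, tau_alpha, *prior_alpha, rng) for _ in range(5)]
print(proposals)  # every value lies in [0, 0.03]
```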

In the paper we update parameters by evaluating $g_s(x_s; \theta)$; in the code this quantity is called "logrhorilde" rather than g_s.

In [6]:
# mcmc.py command for one chain; the priors are passed as (lower bound, width)
mcmc_cmd = (f'python mcmc.py -N {n_iter} -l {lambd} -dt {dt} -datapath {outpath}/simdata '
            f'-tau_sigma {tau_sigma} -tau_alpha {tau_alpha} '
            f'-palpha {prior_alpha[0]} {round(prior_alpha[1]-prior_alpha[0],2)} '
            f'-psigma {prior_sigma[0]} {round(prior_sigma[1]-prior_sigma[0],2)} '
            f'-ov {gamma_mcmc} -super_root {x_s} -o {outpath}/runs -ds $seed')
n_chains = 3  # identical chains, each started in its own detached screen session

# the generated script reads the data seed (-ds) from stdin
with open(f'{outpath}.sh', 'w') as rsh:
    rsh.write('#!/bin/bash\nread seed\n\n')
    for _ in range(n_chains):
        rsh.write(f'screen -md -S {outpath} {mcmc_cmd}\n')
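
If `screen` is not available, the same chains can be launched directly from Python. The sketch below builds the mcmc.py argument list as a Python list, which avoids shell quoting issues; the flags are copied from the shell script above, while the helper name `mcmc_args`, the `'<seed>'` placeholder and the use of `subprocess.Popen` to run chains in parallel are our own illustrative choices. The launch itself is left commented out.

```python
import subprocess  # used by the (commented-out) launch below

def mcmc_args(seed, n_iter, lambd, dt, outpath, tau_sigma, tau_alpha,
              prior_alpha, prior_sigma, gamma_mcmc, x_s):
    """Argument list for one mcmc.py chain; priors are passed as (lower bound, width)."""
    return ['python', 'mcmc.py',
            '-N', str(n_iter), '-l', str(lambd), '-dt', str(dt),
            '-datapath', f'{outpath}/simdata',
            '-tau_sigma', str(tau_sigma), '-tau_alpha', str(tau_alpha),
            '-palpha', str(prior_alpha[0]), str(round(prior_alpha[1] - prior_alpha[0], 2)),
            '-psigma', str(prior_sigma[0]), str(round(prior_sigma[1] - prior_sigma[0], 2)),
            '-ov', str(gamma_mcmc), '-super_root', x_s,
            '-o', f'{outpath}/runs', '-ds', str(seed)]

# Same values as in the settings cell; replace '<seed>' with an actual data seed.
args = mcmc_args(seed='<seed>', n_iter=10, lambd=0.97, dt=0.05,
                 outpath='_simulation-study', tau_sigma=0.07, tau_alpha=0.003,
                 prior_alpha=(0, 0.03), prior_sigma=(0.4, 1.0),
                 gamma_mcmc=0.002, x_s='phylomean')
print(' '.join(args))

# Launch three chains in parallel and wait for all of them (uncomment to run):
# chains = [subprocess.Popen(args) for _ in range(3)]
# return_codes = [p.wait() for p in chains]
```

Unlike the `screen` sessions, `Popen(...).wait()` blocks the notebook until every chain has finished, so this variant suits short test runs such as `n_iter = 10` better than the 6000 iterations used in the paper.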

In [None]:
# echo <seed> | bash _simulation-study.sh   # the script reads the seed for -ds from stdin

##### 3. Assess convergence, plot convergence diagnostics and visualize the posterior

The posterior plots shown in the paper are produced by running the code below. 

In [None]:
# -MCMC_iter should match n_iter above (10 here, 6000 in the paper)
#! ls _simulation-study/runs | while read seed; do python diagnostics.py -folder_runs _simulation-study/runs/$seed -folder_simdata _simulation-study/simdata/$seed -MCMC_iter 10 -burnin 1 -nnodes 59 -simtree data/chazot_full_tree.nw; done
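
Running several chains per data set also allows a between-chain convergence check. We do not claim that diagnostics.py computes this particular statistic; as an illustration, the sketch below implements the classic Gelman-Rubin potential scale reduction factor $\hat{R}$ for one scalar parameter (e.g. the posterior samples of $\alpha$ or $\sigma$) from an array of shape (chains, iterations), after discarding burn-in. Values close to 1 indicate that the chains agree. The arrays here are synthetic, not output of mcmc.py.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for an array of shape (m chains, n draws)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)        # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()  # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n      # pooled estimate of the posterior variance
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)
burnin = 100
mixed = rng.normal(0.6, 0.05, size=(3, 1000))[:, burnin:]  # three chains sampling the same target
stuck = mixed + np.array([[0.0], [0.1], [0.2]])            # chains centred at different values
print(gelman_rubin(mixed), gelman_rubin(stuck))  # close to 1 vs. clearly above 1
```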