## Verifying that CmdStanPy works

### The official CmdStanPy "Hello World"

In [1]:
# import packages
import os
from cmdstanpy import cmdstan_path, CmdStanModel

# specify Stan program file
bernoulli_stan = os.path.join(cmdstan_path(), 'examples', 'bernoulli', 'bernoulli.stan')

# instantiate the model; compiles the Stan program as needed.
bernoulli_model = CmdStanModel(stan_file=bernoulli_stan)

# inspect model object
print(bernoulli_model)

INFO:cmdstanpy:found newer exe file, not recompiling
INFO:cmdstanpy:compiled model file: /root/.cmdstan/cmdstan-2.26.1/examples/bernoulli/bernoulli


CmdStanModel: name=bernoulli
	 stan_file=/root/.cmdstan/cmdstan-2.26.1/examples/bernoulli/bernoulli.stan
	 exe_file=/root/.cmdstan/cmdstan-2.26.1/examples/bernoulli/bernoulli
	 compiler_options=stanc_options=None, cpp_options=None


In [2]:
# specify data file
bernoulli_data = os.path.join(cmdstan_path(), 'examples', 'bernoulli', 'bernoulli.data.json')

# fit the model
bern_fit = bernoulli_model.sample(data=bernoulli_data, output_dir='.')

# printing the object reports sampler commands, output files
print(bern_fit)

INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:start chain 2
INFO:cmdstanpy:start chain 3
INFO:cmdstanpy:finish chain 2
INFO:cmdstanpy:finish chain 1
INFO:cmdstanpy:start chain 4
INFO:cmdstanpy:finish chain 3
INFO:cmdstanpy:finish chain 4


CmdStanMCMC: model=bernoulli chains=4['method=sample', 'algorithm=hmc', 'adapt', 'engaged=1']
 csv_files:
	/workdir/demo/bernoulli-202105071636-1.csv
	/workdir/demo/bernoulli-202105071636-2.csv
	/workdir/demo/bernoulli-202105071636-3.csv
	/workdir/demo/bernoulli-202105071636-4.csv
 output_files:
	/workdir/demo/bernoulli-202105071636-1-stdout.txt
	/workdir/demo/bernoulli-202105071636-2-stdout.txt
	/workdir/demo/bernoulli-202105071636-3-stdout.txt
	/workdir/demo/bernoulli-202105071636-4-stdout.txt


In [3]:
bern_fit.summary()

| name | Mean | MCSE | StdDev | 5% | 50% | 95% | N_Eff | N_Eff/s | R_hat |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -7.3 | 0.018 | 0.73 | -8.8 | -7.0 | -6.7 | 1600.0 | 760.0 | 1.0 |
| theta | 0.25 | 0.0031 | 0.12 | 0.08 | 0.24 | 0.47 | 1500.0 | 700.0 | 1.0 |
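Because `bernoulli.stan` pairs a `beta(1, 1)` prior with a Bernoulli likelihood, its posterior has a closed form, which gives a quick sanity check on the sampler's numbers. This sketch assumes the bundled `bernoulli.data.json` holds `N = 10` observations with `y = [0,1,0,0,0,0,0,0,0,1]`; check the file on your own installation (e.g. with `json.load(open(bernoulli_data))`) before relying on it.

```python
import json
import math

# Assumed contents of bernoulli.data.json (verify against your CmdStan install).
data = json.loads('{"N": 10, "y": [0,1,0,0,0,0,0,0,0,1]}')

# Prior theta ~ beta(1, 1); Beta-Bernoulli conjugacy gives posterior Beta(a, b).
successes = sum(data["y"])
a = 1 + successes              # prior alpha + number of successes
b = 1 + data["N"] - successes  # prior beta + number of failures

post_mean = a / (a + b)
post_sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

print(f"posterior Beta({a}, {b}): mean={post_mean:.3f}, sd={post_sd:.3f}")
```

Under that assumption the exact posterior mean is 0.25 and the standard deviation is about 0.12, in line with the `Mean` and `StdDev` of `theta` in the summary.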


In [4]:
bern_fit.diagnose()

INFO:cmdstanpy:Processing csv files: /workdir/demo/bernoulli-202105071636-1.csv, /workdir/demo/bernoulli-202105071636-2.csv, /workdir/demo/bernoulli-202105071636-3.csv, /workdir/demo/bernoulli-202105071636-4.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.


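The `Split R-hat values satisfactory` line refers to the split potential scale reduction statistic. Below is a minimal plain-Python sketch of the classic split-R-hat: each chain is cut in half and the variance between the halves is compared with the variance within them, so both non-mixing chains and drift inside a single chain push the value above 1. `split_rhat` is a hypothetical helper; CmdStan's own implementation may differ in details (for example, rank-normalized variants).

```python
import random
import statistics


def split_rhat(chains):
    """Classic split-R-hat for a list of equal-length chains (lists of floats)."""
    halves = []
    for chain in chains:
        n = len(chain) // 2
        # Split each chain into first and second half (middle draw dropped if odd length).
        halves.append(chain[:n])
        halves.append(chain[len(chain) - n:])
    n = len(halves[0])
    means = [statistics.fmean(h) for h in halves]
    # W: average within-half sample variance.
    w = statistics.fmean([statistics.variance(h) for h in halves])
    # B: between-half variance of the means, scaled by the half length.
    b = n * statistics.variance(means)
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5


rng = random.Random(0)
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
stuck = [[rng.gauss(mu, 1) for _ in range(1000)] for mu in (0, 0, 0, 3)]
print(round(split_rhat(mixed), 3))  # close to 1.0: chains agree
print(round(split_rhat(stuck), 3))  # well above 1: one chain sits elsewhere
```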

### 8schools
- Speed comparison with PyStan

In [5]:
%%time
model = CmdStanModel(stan_file="8schools.stan")

INFO:cmdstanpy:compiling stan program, exe file: /workdir/demo/8schools
INFO:cmdstanpy:compiler options: stanc_options=None, cpp_options=None
INFO:cmdstanpy:compiled model file: /workdir/demo/8schools


CPU times: user 4.04 ms, sys: 6.77 ms, total: 10.8 ms
Wall time: 17.2 s


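- Compilation took roughly 17 to 19 s of wall time. (The CPU time reported by `%%time` stays small because the C++ build runs in separate processes, which it does not count.)
- PyStan 2 needed more than a minute, so compilation is indeed considerably faster.
  - The compiled model is saved, so from the second run onward it loads in a few seconds. (Deleting `8schools` and `8schools.hpp` forces a full recompile, as on the first run.)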
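The `found newer exe file, not recompiling` message in the first cell hints at how this caching works: the executable is reused when it is newer than the `.stan` source. Here is a minimal sketch of such a timestamp check; `needs_compile` is a hypothetical helper, not CmdStanPy's actual code, which may also take include files and compiler options into account.

```python
import os
import tempfile
import time


def needs_compile(stan_file: str, exe_file: str) -> bool:
    """Hypothetical check: recompile if the exe is missing or older than the .stan file."""
    if not os.path.exists(exe_file):
        return True
    return os.path.getmtime(exe_file) < os.path.getmtime(stan_file)


with tempfile.TemporaryDirectory() as d:
    stan = os.path.join(d, "8schools.stan")
    exe = os.path.join(d, "8schools")
    open(stan, "w").close()
    first = needs_compile(stan, exe)       # True: no executable yet
    open(exe, "w").close()
    now = time.time()
    os.utime(stan, (now - 10, now - 10))   # make the source older than the exe
    os.utime(exe, (now, now))
    second = needs_compile(stan, exe)      # False: exe is newer than the source
    os.remove(exe)                         # like deleting `8schools` by hand
    third = needs_compile(stan, exe)       # True again: back to a first-run state

print(first, second, third)
```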

In [6]:
%%time
data = "8schools.data.json"
fit = model.sample(data=data, iter_sampling=1000, iter_warmup=500)

INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:start chain 2
INFO:cmdstanpy:start chain 3
INFO:cmdstanpy:finish chain 3
INFO:cmdstanpy:start chain 4
INFO:cmdstanpy:finish chain 1
INFO:cmdstanpy:finish chain 2
INFO:cmdstanpy:finish chain 4


CPU times: user 48.9 ms, sys: 38.5 ms, total: 87.3 ms
Wall time: 244 ms


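- Sampling took a little over 200 ms (244 ms wall time), so it may be slightly slower than PyStan 2 (163 ms), although the difference is small.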
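A single `%%time` run is noisy, so a gap between 244 ms and 163 ms is hard to interpret on its own. One fairer comparison repeats each call several times and compares medians. `median_runtime` is a hypothetical helper, and the commented call assumes the `model` and `data` objects from the cell above; the same helper would be applied to the PyStan call.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], object], repeats: int = 5) -> float:
    """Run fn `repeats` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# In the notebook this would be, e.g.:
# median_runtime(lambda: model.sample(data=data, iter_sampling=1000, iter_warmup=500))
t = median_runtime(lambda: sum(range(100_000)), repeats=7)
print(f"median: {t * 1000:.2f} ms")
```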

In [7]:
fit.summary()

| name | Mean | MCSE | StdDev | 5% | 50% | 95% | N_Eff | N_Eff/s | R_hat |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -4.8 | 0.071 | 2.6 | -9.5 | -4.6 | -0.91 | 1300.0 | 6500.0 | 1.0 |
| mu | 7.8 | 0.14 | 5.3 | -0.87 | 7.9 | 16.0 | 1400.0 | 6800.0 | 1.0 |
| tau | 6.8 | 0.19 | 5.8 | 0.55 | 5.5 | 18.0 | 970.0 | 4900.0 | 1.0 |
| eta[1] | 0.4 | 0.016 | 0.94 | -1.1 | 0.42 | 1.9 | 3567.0 | 17924.0 | 1.0 |
| eta[2] | -0.0089 | 0.014 | 0.86 | -1.4 | -0.013 | 1.4 | 3719.0 | 18687.0 | 1.0 |
| eta[3] | -0.17 | 0.016 | 0.94 | -1.7 | -0.18 | 1.4 | 3508.0 | 17630.0 | 1.0 |
| eta[4] | -0.036 | 0.015 | 0.87 | -1.5 | -0.05 | 1.4 | 3155.0 | 15856.0 | 1.0 |
| eta[5] | -0.36 | 0.015 | 0.86 | -1.8 | -0.37 | 1.1 | 3294.0 | 16552.0 | 1.0 |
| eta[6] | -0.22 | 0.015 | 0.89 | -1.7 | -0.22 | 1.3 | 3700.0 | 18591.0 | 1.0 |
| eta[7] | 0.36 | 0.016 | 0.88 | -1.1 | 0.37 | 1.8 | 3193.0 | 16044.0 | 1.0 |
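The summary reports `eta` rather than school effects, which suggests that `8schools.stan` uses the non-centered parameterization `theta[j] = mu + tau * eta[j]`. That is an assumption, since the Stan file is not shown here. Under it, school-level effects can be rebuilt draw by draw; `school_effects` is a hypothetical helper and the draws below are made up purely for illustration (with a real fit they would come from the sampler output).

```python
def school_effects(mu, tau, eta):
    """Rebuild theta[j] = mu + tau * eta[j] for every draw (non-centered parameterization).

    mu, tau: lists of S draws; eta: list of S draws, each a list of J values.
    """
    return [[m + t * e for e in eta_s] for m, t, eta_s in zip(mu, tau, eta)]


# Illustrative draws only (S = 2 draws, J = 3 schools), not output from the fit above.
mu = [7.0, 8.0]
tau = [5.0, 2.0]
eta = [[0.4, 0.0, -0.2], [1.0, -1.0, 0.5]]
theta = school_effects(mu, tau, eta)
print(theta)
```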


In [8]:
fit.diagnose()

INFO:cmdstanpy:Processing csv files: /tmp/tmpiwvlgs46/8schools-202105071636-1-fs7gms4y.csv, /tmp/tmpiwvlgs46/8schools-202105071636-2-4r9gtu6w.csv, /tmp/tmpiwvlgs46/8schools-202105071636-3-9g5fnvkc.csv, /tmp/tmpiwvlgs46/8schools-202105071636-4-pex8tj_l.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
3 of 4000 (0.075%) transitions ended with a divergence.
These divergent transitions indicate that HMC is not fully able to explore the posterior distribution.
Try increasing adapt delta closer to 1.
If this doesn't remove all divergences, try to reparameterize the model.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete.


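The `3 of 4000` figure is a count over the per-draw `divergent__` column in the Stan CSV files: 4 chains x 1000 post-warmup draws = 4000 transitions (warmup draws are not saved by default). The following is a minimal sketch that counts divergences directly from such files (the paths are the ones listed in the `Processing csv files` log line), assuming the usual layout of `#` comment lines plus a header row and one row per draw. `count_divergences` is a hypothetical helper, and the sample file below is synthetic. If divergences persist, the diagnostic's advice corresponds to passing a larger `adapt_delta` to `model.sample()`.

```python
import csv
import os
import tempfile


def count_divergences(csv_files):
    """Count draws with divergent__ == 1 across Stan CSV files; return (divergent, total)."""
    divergent = total = 0
    for path in csv_files:
        with open(path, newline="") as f:
            # Skip comment lines (config, adaptation info, timing) that start with '#'.
            rows = (line for line in f if not line.startswith("#"))
            for row in csv.DictReader(rows):
                total += 1
                divergent += int(float(row["divergent__"]))
    return divergent, total


# Tiny synthetic file in the Stan CSV layout (values are made up).
sample = """# model = 8schools
lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,mu
# Adaptation terminated
-4.1,0.95,0.3,3,7,0,6.2,7.5
-5.0,0.60,0.3,2,3,1,7.9,2.1
-3.9,0.88,0.3,3,7,0,5.8,9.0
# Elapsed Time: 0.01 seconds (Warm-up)
"""
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "chain-1.csv")
    with open(path, "w") as f:
        f.write(sample)
    div, tot = count_divergences([path])

print(f"{div} of {tot} ({100 * div / tot:.3f}%) transitions ended with a divergence")
```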