## Checking that CmdStanPy works

### CmdStanPy's official "Hello World"

In [1]:
# import packages
import os
from cmdstanpy import cmdstan_path, CmdStanModel

# specify Stan program file
bernoulli_stan = os.path.join(cmdstan_path(), 'examples', 'bernoulli', 'bernoulli.stan')

# instantiate the model; compiles the Stan program as needed.
bernoulli_model = CmdStanModel(stan_file=bernoulli_stan)

# inspect model object
print(bernoulli_model)

CmdStanModel: name=bernoulli
	 stan_file=/root/.cmdstan/cmdstan-2.30.0/examples/bernoulli/bernoulli.stan
	 exe_file=/root/.cmdstan/cmdstan-2.30.0/examples/bernoulli/bernoulli
	 compiler_options=stanc_options={}, cpp_options={}


In [2]:
# specify data file
bernoulli_data = os.path.join(cmdstan_path(), 'examples', 'bernoulli', 'bernoulli.data.json')

# fit the model
bern_fit = bernoulli_model.sample(data=bernoulli_data, output_dir='.')

# printing the object reports sampler commands, output files
print(bern_fit)

20:28:34 - cmdstanpy - INFO - CmdStan start processing


20:28:35 - cmdstanpy - INFO - CmdStan done processing.



CmdStanMCMC: model=bernoulli chains=4['method=sample', 'algorithm=hmc', 'adapt', 'engaged=1']
 csv_files:
	/workdir/demo/bernoulli-20220709202834_1.csv
	/workdir/demo/bernoulli-20220709202834_2.csv
	/workdir/demo/bernoulli-20220709202834_3.csv
	/workdir/demo/bernoulli-20220709202834_4.csv
 output_files:
	/workdir/demo/bernoulli-20220709202834_0-stdout.txt
	/workdir/demo/bernoulli-20220709202834_1-stdout.txt
	/workdir/demo/bernoulli-20220709202834_2-stdout.txt
	/workdir/demo/bernoulli-20220709202834_3-stdout.txt
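Each `.csv` file listed above holds one chain's draws in Stan CSV format. The file contains comment lines starting with `#` (run configuration, adaptation info, timing), a header row of column names such as `lp__` and `theta`, and one row per draw. CmdStanPy already parses these files for you through methods such as `draws_pd()` and `stan_variable()`. Still, a minimal stdlib reader makes the layout concrete. It assumes exactly that layout, `read_stan_csv` is a hypothetical helper, and the tiny example text uses made-up values.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def read_stan_csv(lines: Iterable[str]) -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse Stan CSV text: skip '#' comment lines, take the first remaining
    line as the header, and treat every later line as one draw."""
    data_lines = (line for line in lines if line.strip() and not line.startswith("#"))
    reader = csv.reader(data_lines)
    header = next(reader)
    columns: Dict[str, List[float]] = {name: [] for name in header}
    for row in reader:
        for name, value in zip(header, row):
            columns[name].append(float(value))
    return header, columns


# Hand-written example in the same layout (values are made up).
example = """# model = bernoulli_model
lp__,accept_stat__,theta
# Adaptation terminated
-6.8,0.92,0.25
-7.1,0.85,0.31
#  Elapsed Time: 0.01 seconds (Warm-up)
"""
header, cols = read_stan_csv(io.StringIO(example))
print(header)         # ['lp__', 'accept_stat__', 'theta']
print(cols["theta"])  # [0.25, 0.31]
```

To read a real output file, pass it an open file handle, e.g. `with open(path) as f: header, cols = read_stan_csv(f)`.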


In [3]:
bern_fit.summary()

| name | Mean | MCSE | StdDev | 5% | 50% | 95% | N_Eff | N_Eff/s | R_hat |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -7.26126 | 0.019761 | 0.706144 | -8.70115 | -6.98932 | -6.74972 | 1276.99 | 510.386 | 1.00175 |
| theta | 0.245416 | 0.003084 | 0.11783 | 0.079008 | 0.232547 | 0.462923 | 1459.87 | 583.48 | 1.00243 |
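The bernoulli example has a closed-form posterior, which offers a quick sanity check on the `theta` row. The sketch below assumes the bundled `bernoulli.data.json` holds `y = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]` and that `bernoulli.stan` puts a `beta(1, 1)` prior on `theta`. Under that assumption, Beta-Bernoulli conjugacy gives a Beta(3, 9) posterior. The quantiles are Monte Carlo estimates because the Python standard library has no Beta inverse CDF.

```python
import random
import statistics

# Assumed contents of the bundled example (bernoulli.data.json / bernoulli.stan):
# N = 10 Bernoulli outcomes and a beta(1, 1) prior on theta.
y = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
prior_a, prior_b = 1.0, 1.0

# Conjugacy: posterior is Beta(a + successes, b + failures) -> Beta(3, 9) here.
successes = sum(y)
failures = len(y) - successes
post_a, post_b = prior_a + successes, prior_b + failures

# Closed-form posterior mean and standard deviation of a Beta distribution.
total = post_a + post_b
post_mean = post_a / total
post_sd = (post_a * post_b / (total ** 2 * (total + 1))) ** 0.5

# Monte Carlo estimates of the 5%, 50%, 95% quantiles.
rng = random.Random(0)
draws = [rng.betavariate(post_a, post_b) for _ in range(100_000)]
cuts = statistics.quantiles(draws, n=100)  # 99 cut points; cuts[k - 1] is the k-th percentile
q05, q50, q95 = cuts[4], cuts[49], cuts[94]

print(f"mean={post_mean:.4f} sd={post_sd:.4f}")  # mean=0.2500 sd=0.1201
print(f"5%={q05:.3f} 50%={q50:.3f} 95%={q95:.3f}")
```

The sampled estimates in the table (Mean 0.245, StdDev 0.118) are close to these exact values.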


In [4]:
print(bern_fit.diagnose())

Processing csv files: /workdir/demo/bernoulli-20220709202834_1.csv, /workdir/demo/bernoulli-20220709202834_2.csv, /workdir/demo/bernoulli-20220709202834_3.csv, /workdir/demo/bernoulli-20220709202834_4.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.
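The "Split R-hat" line refers to a convergence diagnostic that compares variance between chains with variance within chains, after each chain has been cut in half. Below is a minimal sketch of the classic split R-hat formula from Gelman et al.'s *Bayesian Data Analysis*. CmdStan's own implementation, and the newer rank-normalized R-hat, may differ in details. `split_rhat` is a hypothetical helper, and the chains are simulated.

```python
import random
import statistics
from typing import List, Sequence


def split_rhat(chains: Sequence[Sequence[float]]) -> float:
    """Classic split R-hat: split each chain in half, then compare the
    between-half variance with the average within-half variance."""
    halves: List[Sequence[float]] = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])
        halves.append(chain[len(chain) - half:])  # drops the middle draw if length is odd
    n = len(halves[0])
    means = [statistics.fmean(h) for h in halves]
    within = statistics.fmean([statistics.variance(h) for h in halves])  # W
    between_over_n = statistics.variance(means)                           # B / n
    var_plus = (n - 1) / n * within + between_over_n
    return (var_plus / within) ** 0.5


rng = random.Random(1)
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
stuck = mixed[:3] + [[rng.gauss(5, 1) for _ in range(1000)]]  # one chain in a different region
print(round(split_rhat(mixed), 3))  # close to 1
print(round(split_rhat(stuck), 3))  # far above 1
```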



### 8schools
- Speed comparison with PyStan
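The cells below use `8schools.stan` and `8schools.data.json`, neither of which is shown. The sketch below writes one plausible version of each. It assumes the non-centered eight-schools model (the `eta` parameters in the summary suggest this form) and the classic eight-schools data used in the Stan and PyStan documentation. The author's actual files may differ, and `write_eight_schools` is a hypothetical helper.

```python
import json
from pathlib import Path

# Assumed model: non-centered eight schools (theta = mu + tau * eta).
EIGHT_SCHOOLS_STAN = """\
data {
  int<lower=0> J;
  array[J] real y;
  array[J] real<lower=0> sigma;
}
parameters {
  real mu;
  real<lower=0> tau;
  array[J] real eta;
}
transformed parameters {
  array[J] real theta;
  for (j in 1:J)
    theta[j] = mu + tau * eta[j];
}
model {
  eta ~ std_normal();
  y ~ normal(theta, sigma);
}
"""

# Assumed data: the classic eight-schools estimates and standard errors.
EIGHT_SCHOOLS_DATA = {
    "J": 8,
    "y": [28, 8, -3, 7, -1, 1, 18, 12],
    "sigma": [15, 10, 16, 11, 9, 11, 10, 18],
}


def write_eight_schools(directory: str = ".") -> None:
    """Write 8schools.stan and 8schools.data.json into `directory`."""
    out = Path(directory)
    (out / "8schools.stan").write_text(EIGHT_SCHOOLS_STAN)
    (out / "8schools.data.json").write_text(json.dumps(EIGHT_SCHOOLS_DATA))
```

Calling `write_eight_schools(".")` in the notebook's working directory creates files with the names the following cells expect.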

In [5]:
%%time
model = CmdStanModel(stan_file="8schools.stan")

20:28:49 - cmdstanpy - INFO - compiling stan file /workdir/demo/8schools.stan to exe file /workdir/demo/8schools
20:29:03 - cmdstanpy - INFO - compiled model executable: /workdir/demo/8schools


CPU times: user 6.57 ms, sys: 1.86 ms, total: 8.43 ms
Wall time: 14.3 s


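- Compilation took about 15 s (wall time 14.3 s).
- PyStan2 needed over a minute, so compilation is indeed considerably faster.
  - The compiled model is saved to disk, so later runs load it in a few seconds. (Deleting `8schools` and `8schools.hpp` restores the first-run behavior.)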
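Reusing the saved executable depends on deciding whether it is still up to date with the `.stan` source. The sketch below shows that kind of check as a plain modification-time comparison. `exe_is_stale` is a hypothetical helper, and CmdStanPy's real logic may consider more, such as included files or compiler options.

```python
import os


def exe_is_stale(stan_file: str, exe_file: str) -> bool:
    """Hypothetical helper: True if the executable is missing or older than the .stan source."""
    if not os.path.exists(exe_file):
        return True  # never compiled (or deleted): compile from scratch
    return os.path.getmtime(exe_file) < os.path.getmtime(stan_file)
```

For this notebook the call would be `exe_is_stale("8schools.stan", "8schools")`. Deleting the executable makes it return `True`, which matches the first-run behavior described above.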

In [6]:
%%time
data = "8schools.data.json"
fit = model.sample(data=data, iter_sampling=1000, iter_warmup=500)

20:29:05 - cmdstanpy - INFO - CmdStan start processing


20:29:06 - cmdstanpy - INFO - CmdStan done processing.
	Chain 1 had 5 divergent transitions (0.5%)
	Use function "diagnose()" to see further information.



CPU times: user 190 ms, sys: 26.7 ms, total: 217 ms
Wall time: 427 ms


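- CPU time was a little over 200 ms (total 217 ms, wall time 427 ms), so this may be slightly slower than PyStan2 (163 ms), though the difference is small.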
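The compile cell reported 8.43 ms of CPU time against 14.3 s of wall time. That gap suggests the CPU figures from `%%time` cover only the notebook's own Python process, not the CmdStan child processes that do the compiling and sampling. If so, the 217 ms here mostly reflects Python-side overhead, and wall time is the more meaningful number to compare. The sketch below reports wall time together with the CPU time of the current process and of finished child processes. It uses `resource.getrusage`, which is Unix-only. `time_with_children` is a hypothetical helper, and `busy_child` is a stand-in for a CmdStan run.

```python
import resource  # Unix-only
import subprocess
import sys
import time
from typing import Any, Callable, Dict, Tuple


def _cpu_seconds(who: int) -> float:
    """User + system CPU seconds for RUSAGE_SELF or RUSAGE_CHILDREN."""
    usage = resource.getrusage(who)
    return usage.ru_utime + usage.ru_stime


def time_with_children(func: Callable[[], Any]) -> Tuple[Any, Dict[str, float]]:
    """Run func() and report wall time, this process's CPU time, and the
    CPU time of child processes that finished (and were waited for) meanwhile."""
    self_start = _cpu_seconds(resource.RUSAGE_SELF)
    child_start = _cpu_seconds(resource.RUSAGE_CHILDREN)
    wall_start = time.perf_counter()
    result = func()
    timings = {
        "wall_s": time.perf_counter() - wall_start,
        "self_cpu_s": _cpu_seconds(resource.RUSAGE_SELF) - self_start,
        "children_cpu_s": _cpu_seconds(resource.RUSAGE_CHILDREN) - child_start,
    }
    return result, timings


def busy_child() -> int:
    """Burn CPU in a separate Python process, like an external sampler would."""
    code = "sum(i * i for i in range(3_000_000))"
    return subprocess.run([sys.executable, "-c", code], check=True).returncode


_, timings = time_with_children(busy_child)
print(timings)  # compare self_cpu_s with children_cpu_s
```

In the notebook, the same helper could wrap `lambda: model.sample(data=data, iter_sampling=1000, iter_warmup=500)`.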

In [7]:
fit.summary()

| name | Mean | MCSE | StdDev | 5% | 50% | 95% | N_Eff | N_Eff/s | R_hat |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -4.92376 | 0.072446 | 2.6047 | -9.6973 | -4.6693 | -1.1158 | 1292.67 | 5477.43 | 1.0019 |
| mu | 8.11119 | 0.188841 | 5.39953 | -0.17528 | 7.94529 | 17.0166 | 817.559 | 3464.23 | 1.00023 |
| tau | 6.56666 | 0.162449 | 5.38504 | 0.646541 | 5.28347 | 17.1337 | 1098.86 | 4656.19 | 1.00311 |
| eta[1] | 0.404048 | 0.015845 | 0.933542 | -1.16107 | 0.407255 | 1.92579 | 3471.06 | 14707.9 | 1.00052 |
| eta[2] | -0.025077 | 0.01546 | 0.922999 | -1.53605 | -0.031319 | 1.51418 | 3564.28 | 15102.9 | 1.00019 |
| eta[3] | -0.198637 | 0.014264 | 0.917975 | -1.71168 | -0.212369 | 1.35024 | 4141.77 | 17549.9 | 0.99953 |
| eta[4] | -0.058792 | 0.015147 | 0.905125 | -1.53491 | -0.082534 | 1.43891 | 3571.04 | 15131.5 | 0.999681 |
| eta[5] | -0.35435 | 0.017371 | 0.881159 | -1.73449 | -0.372865 | 1.11117 | 2573.25 | 10903.6 | 1.00002 |
| eta[6] | -0.224962 | 0.014916 | 0.900132 | -1.71331 | -0.25555 | 1.31839 | 3641.57 | 15430.4 | 0.999872 |
| eta[7] | 0.335047 | 0.014381 | 0.883725 | -1.13462 | 0.343099 | 1.77385 | 3775.98 | 15999.9 | 0.999689 |
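In these summaries the MCSE column equals StdDev divided by the square root of N_Eff, to the printed precision. For example, `theta` in the bernoulli fit gives 0.11783 / √1459.87 ≈ 0.003084. The quick check below uses values copied from the 8schools table.

```python
import math

# (StdDev, N_Eff, reported MCSE) copied from the 8schools summary above.
rows = {
    "mu": (5.39953, 817.559, 0.188841),
    "tau": (5.38504, 1098.86, 0.162449),
    "eta[1]": (0.933542, 3471.06, 0.015845),
}

for name, (sd, n_eff, reported) in rows.items():
    mcse = sd / math.sqrt(n_eff)  # Monte Carlo standard error of the mean
    print(f"{name}: StdDev/sqrt(N_Eff) = {mcse:.6f}, reported MCSE = {reported}")
```

This relationship is also why a low N_Eff (here `mu`, at about 818) translates directly into a less precise posterior-mean estimate.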


In [8]:
print(fit.diagnose())

Processing csv files: /tmp/tmpdgu8jduq/8schoolsyvoajs_d/8schools-20220709202906_1.csv, /tmp/tmpdgu8jduq/8schoolsyvoajs_d/8schools-20220709202906_2.csv, /tmp/tmpdgu8jduq/8schoolsyvoajs_d/8schools-20220709202906_3.csv, /tmp/tmpdgu8jduq/8schoolsyvoajs_d/8schools-20220709202906_4.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
5 of 4000 (0.12%) transitions ended with a divergence.
These divergent transitions indicate that HMC is not fully able to explore the posterior distribution.
Try increasing adapt delta closer to 1.
If this doesn't remove all divergences, try to reparameterize the model.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete.
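The two divergence figures use different denominators. The message printed after sampling counts 5 divergences out of Chain 1's 1,000 post-warmup draws (0.5%). `diagnose()` pools all 4 chains instead: 5 of 4,000 is 0.125%, printed as 0.12%. The sketch below computes both from per-chain 0/1 `divergent__` flags, the sampler-diagnostic column in the Stan CSV output. `divergence_report` is a hypothetical helper, and the flag lists are constructed only to mirror the counts reported for this run.

```python
from typing import Dict, List, Sequence


def divergence_report(divergent_by_chain: Sequence[Sequence[int]]) -> Dict[str, object]:
    """Per-chain and pooled divergence percentages from 0/1 divergent__ flags."""
    per_chain: List[float] = [100.0 * sum(flags) / len(flags) for flags in divergent_by_chain]
    total_div = sum(sum(flags) for flags in divergent_by_chain)
    total_draws = sum(len(flags) for flags in divergent_by_chain)
    return {
        "per_chain_pct": per_chain,
        "total": total_div,
        "draws": total_draws,
        "pooled_pct": 100.0 * total_div / total_draws,
    }


# Illustrative flags: 4 chains x 1000 draws, with 5 divergences in chain 1 only.
chains = [[1] * 5 + [0] * 995] + [[0] * 1000 for _ in range(3)]
report = divergence_report(chains)
print(report["per_chain_pct"])  # [0.5, 0.0, 0.0, 0.0]
print(report["pooled_pct"])     # 0.125
```

To follow the diagnostic's advice to raise adapt delta, CmdStanPy's `sample()` accepts an `adapt_delta` argument, e.g. `model.sample(data=data, adapt_delta=0.95)`.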

