## Verifying that CmdStanPy works

### The official CmdStanPy "Hello World"

In [1]:
# import packages
import os
from cmdstanpy import cmdstan_path, CmdStanModel

# specify Stan program file
bernoulli_stan = os.path.join(cmdstan_path(), 'examples', 'bernoulli', 'bernoulli.stan')

# instantiate the model; compiles the Stan program as needed.
bernoulli_model = CmdStanModel(stan_file=bernoulli_stan)

# inspect model object
print(bernoulli_model)

13:43:34 - cmdstanpy - INFO - compiling stan file /opt/conda/bin/cmdstan/examples/bernoulli/bernoulli.stan to exe file /opt/conda/bin/cmdstan/examples/bernoulli/bernoulli
13:43:54 - cmdstanpy - INFO - compiled model executable: /opt/conda/bin/cmdstan/examples/bernoulli/bernoulli


CmdStanModel: name=bernoulli
	 stan_file=/opt/conda/bin/cmdstan/examples/bernoulli/bernoulli.stan
	 exe_file=/opt/conda/bin/cmdstan/examples/bernoulli/bernoulli
	 compiler_options=stanc_options={}, cpp_options={}


In [2]:
# specify data file
bernoulli_data = os.path.join(cmdstan_path(), 'examples', 'bernoulli', 'bernoulli.data.json')

# fit the model
bern_fit = bernoulli_model.sample(data=bernoulli_data, output_dir='.')

# printing the object reports sampler commands, output files
print(bern_fit)

13:44:04 - cmdstanpy - INFO - CmdStan start processing



13:44:05 - cmdstanpy - INFO - CmdStan done processing.



CmdStanMCMC: model=bernoulli chains=4['method=sample', 'algorithm=hmc', 'adapt', 'engaged=1']
 csv_files:
	/workdir/demo/bernoulli-20220713134404_1.csv
	/workdir/demo/bernoulli-20220713134404_2.csv
	/workdir/demo/bernoulli-20220713134404_3.csv
	/workdir/demo/bernoulli-20220713134404_4.csv
 output_files:
	/workdir/demo/bernoulli-20220713134404_0-stdout.txt
	/workdir/demo/bernoulli-20220713134404_1-stdout.txt
	/workdir/demo/bernoulli-20220713134404_2-stdout.txt
	/workdir/demo/bernoulli-20220713134404_3-stdout.txt


In [3]:
bern_fit.summary()

| name | Mean | MCSE | StdDev | 5% | 50% | 95% | N_Eff | N_Eff/s | R_hat |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -7.30367 | 0.025262 | 0.789873 | -8.91474 | -7.00389 | -6.74982 | 977.662 | 432.402 | 1.00005 |
| theta | 0.249902 | 0.003312 | 0.121379 | 0.075015 | 0.235526 | 0.465092 | 1343.2 | 594.071 | 1.00166 |
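
As a quick sanity check on the summary above, the MCSE column appears to follow the usual relation MCSE = StdDev / sqrt(N_Eff). The sketch below recomputes it from the reported values; the helper name `mcse` is hypothetical and the formula is an assumption checked only against these two rows.

```python
import math


def mcse(std_dev: float, n_eff: float) -> float:
    """Monte Carlo standard error, assuming MCSE = StdDev / sqrt(N_Eff)."""
    return std_dev / math.sqrt(n_eff)


# Values copied from the summary table above.
rows = {
    "lp__": {"StdDev": 0.789873, "N_Eff": 977.662, "MCSE": 0.025262},
    "theta": {"StdDev": 0.121379, "N_Eff": 1343.2, "MCSE": 0.003312},
}

for name, row in rows.items():
    recomputed = mcse(row["StdDev"], row["N_Eff"])
    # Print the recomputed value next to the reported one for comparison.
    print(f"{name}: recomputed={recomputed:.6f} reported={row['MCSE']:.6f}")
```

If the recomputed and reported values agree to the printed precision, the table was read correctly and N_Eff is the quantity driving the reported uncertainty of the mean.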


In [4]:
print(bern_fit.diagnose())

Processing csv files: /workdir/demo/bernoulli-20220713134404_1.csv, /workdir/demo/bernoulli-20220713134404_2.csv, /workdir/demo/bernoulli-20220713134404_3.csv, /workdir/demo/bernoulli-20220713134404_4.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.



### 8schools
- Speed comparison with PyStan

In [5]:
%%time
model = CmdStanModel(stan_file="8schools.stan")

13:44:08 - cmdstanpy - INFO - compiling stan file /workdir/demo/8schools.stan to exe file /workdir/demo/8schools
13:44:34 - cmdstanpy - INFO - compiled model executable: /workdir/demo/8schools


CPU times: user 4.26 ms, sys: 5.17 ms, total: 9.43 ms
Wall time: 25.7 s


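- Compilation took about 15 s (although it seems to have become somewhat slower since switching to a conda-based setup; the run above shows a wall time of 25.7 s).
- PyStan 2 needed more than 1 min, so compilation is indeed considerably faster.
  - The compiled model is saved, so from the second run onward it loads in a few seconds. (Deleting `8schools` and `8schools.hpp` restores first-run behavior.)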
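
To repeat the first-run compile timing, the two artifacts named above have to be removed first. A minimal sketch using only the standard library follows; the helper name `remove_compiled_artifacts` is hypothetical, and it assumes the executable and the `.hpp` file sit next to the `.stan` file and share its stem, as in the log above (on Windows the executable would likely carry an `.exe` suffix, which this sketch does not handle).

```python
from pathlib import Path
from typing import List


def remove_compiled_artifacts(stan_file: str) -> List[Path]:
    """Delete the executable and generated .hpp next to a .stan file.

    Assumes artifacts are named after the Stan file's stem, e.g.
    `8schools.stan` -> `8schools` and `8schools.hpp`.
    Returns the paths that were actually removed.
    """
    stan_path = Path(stan_file)
    candidates = [stan_path.with_suffix(""), stan_path.with_suffix(".hpp")]
    removed = []
    for path in candidates:
        # Only touch regular files; missing artifacts are skipped silently.
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

Calling `remove_compiled_artifacts("8schools.stan")` before re-running the `CmdStanModel(...)` cell should then trigger a fresh compile. The `.stan` source itself is never deleted.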

In [6]:
%%time
data = "8schools.data.json"
fit = model.sample(data=data, iter_sampling=1000, iter_warmup=500)

13:44:34 - cmdstanpy - INFO - CmdStan start processing



13:44:34 - cmdstanpy - INFO - CmdStan done processing.
	Chain 2 had 1 divergent transitions (0.1%)
	Use function "diagnose()" to see further information.



CPU times: user 161 ms, sys: 32.7 ms, total: 194 ms
Wall time: 416 ms


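- Sampling took a little over 200 ms (the run above shows a wall time of 416 ms). The difference is small, but is this slightly slower than PyStan 2 (163 ms)?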
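
A single `%%time` measurement is noisy at this scale, and the output above reports 194 ms of CPU time against 416 ms of wall time. Since CmdStanPy runs CmdStan as external processes, the CPU figure for the notebook process probably excludes most of the sampling work, so wall time is the more meaningful number. The sketch below repeats a call several times and summarizes the wall-clock durations; `wall_times` and `summarize` are hypothetical helper names, not part of CmdStanPy.

```python
import statistics
import time
from typing import Callable, Dict, List


def wall_times(fn: Callable[[], object], repeats: int = 5) -> List[float]:
    """Run fn `repeats` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Reduce a list of durations to min, median, and max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

In the notebook this could be used as `summarize(wall_times(lambda: model.sample(data=data, iter_sampling=1000, iter_warmup=500)))`, with the same procedure applied to the PyStan 2 call so that medians rather than single runs are compared.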

In [7]:
fit.summary()

| name | Mean | MCSE | StdDev | 5% | 50% | 95% | N_Eff | N_Eff/s | R_hat |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -4.86047 | 0.071753 | 2.54823 | -9.48142 | -4.64316 | -1.068 | 1261.24 | 4121.7 | 1.00059 |
| mu | 7.88261 | 0.110283 | 5.00483 | -0.185577 | 7.83925 | 16.0439 | 2059.51 | 6730.44 | 1.00049 |
| tau | 6.37701 | 0.124605 | 5.20231 | 0.550699 | 5.16527 | 16.6497 | 1743.09 | 5696.39 | 1.00121 |
| eta[1] | 0.388874 | 0.014757 | 0.934857 | -1.16866 | 0.406352 | 1.92468 | 4013.24 | 13115.2 | 0.999978 |
| eta[2] | 0.016556 | 0.014734 | 0.875863 | -1.47287 | 0.017266 | 1.4308 | 3533.62 | 11547.8 | 0.99994 |
| eta[3] | -0.178025 | 0.015019 | 0.909732 | -1.65649 | -0.185702 | 1.30498 | 3668.83 | 11989.6 | 0.999763 |
| eta[4] | -0.047031 | 0.015551 | 0.899436 | -1.54314 | -0.033596 | 1.43543 | 3345.02 | 10931.5 | 1.00008 |
| eta[5] | -0.359469 | 0.015875 | 0.877671 | -1.79732 | -0.36868 | 1.10693 | 3056.7 | 9989.21 | 0.999882 |
| eta[6] | -0.216226 | 0.014556 | 0.883166 | -1.65913 | -0.230653 | 1.22917 | 3681.37 | 12030.6 | 1.00001 |
| eta[7] | 0.343554 | 0.014649 | 0.892823 | -1.162 | 0.367859 | 1.78423 | 3714.84 | 12140.0 | 0.999474 |


In [8]:
print(fit.diagnose())

Processing csv files: /tmp/tmpyyoo803l/8schoolscxz0gsyf/8schools-20220713134434_1.csv, /tmp/tmpyyoo803l/8schoolscxz0gsyf/8schools-20220713134434_2.csv, /tmp/tmpyyoo803l/8schoolscxz0gsyf/8schools-20220713134434_3.csv, /tmp/tmpyyoo803l/8schoolscxz0gsyf/8schools-20220713134434_4.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
1 of 4000 (0.03%) transitions ended with a divergence.
These divergent transitions indicate that HMC is not fully able to explore the posterior distribution.
Try increasing adapt delta closer to 1.
If this doesn't remove all divergences, try to reparameterize the model.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete.
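
The two percentages reported for the same divergence are consistent: the sampler log counts 1 of the 1000 post-warmup draws in chain 2 (0.1%), while `diagnose()` counts 1 of all 4000 draws (0.025%, printed as 0.03%). The sketch below pulls those numbers out of the diagnose text so the rate can be tracked across runs; `parse_divergences` is a hypothetical helper, and the regular expression assumes the exact wording shown in the output above.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "1 of 4000 (0.03%) transitions ended with a divergence."
DIVERGENCE_RE = re.compile(
    r"(\d+) of (\d+) \(([\d.]+)%\) transitions ended with a divergence"
)


def parse_divergences(diagnose_text: str) -> Optional[Tuple[int, int, float]]:
    """Return (divergent, total, fraction), or None if no divergence line is present."""
    match = DIVERGENCE_RE.search(diagnose_text)
    if match is None:
        return None
    divergent, total = int(match.group(1)), int(match.group(2))
    return divergent, total, divergent / total


report = "1 of 4000 (0.03%) transitions ended with a divergence."
print(parse_divergences(report))
```

If the rate stays above zero, the advice in the output can be followed by passing the `adapt_delta` argument to `sample`, for example `model.sample(data=data, adapt_delta=0.95)`; the value 0.95 is only illustrative.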

