## Verifying that CmdStanPy works

### The official CmdStanPy "Hello World"

In [1]:
# import packages
import os
from cmdstanpy import cmdstan_path, CmdStanModel

# specify Stan program file
bernoulli_stan = os.path.join(cmdstan_path(), 'examples', 'bernoulli', 'bernoulli.stan')

# instantiate the model; compiles the Stan program as needed.
bernoulli_model = CmdStanModel(stan_file=bernoulli_stan)

# inspect model object
print(bernoulli_model)

CmdStanModel: name=bernoulli
	 stan_file=/root/.cmdstan/cmdstan-2.30.0/examples/bernoulli/bernoulli.stan
	 exe_file=/root/.cmdstan/cmdstan-2.30.0/examples/bernoulli/bernoulli
	 compiler_options=stanc_options={}, cpp_options={}


In [2]:
# specify data file
bernoulli_data = os.path.join(cmdstan_path(), 'examples', 'bernoulli', 'bernoulli.data.json')

# fit the model
bern_fit = bernoulli_model.sample(data=bernoulli_data, output_dir='.')

# printing the object reports sampler commands, output files
print(bern_fit)

18:38:39 - cmdstanpy - INFO - CmdStan start processing



18:38:41 - cmdstanpy - INFO - CmdStan done processing.



CmdStanMCMC: model=bernoulli chains=4['method=sample', 'algorithm=hmc', 'adapt', 'engaged=1']
 csv_files:
	/workdir/demo/bernoulli-20220709183840_1.csv
	/workdir/demo/bernoulli-20220709183840_2.csv
	/workdir/demo/bernoulli-20220709183840_3.csv
	/workdir/demo/bernoulli-20220709183840_4.csv
 output_files:
	/workdir/demo/bernoulli-20220709183840_0-stdout.txt
	/workdir/demo/bernoulli-20220709183840_1-stdout.txt
	/workdir/demo/bernoulli-20220709183840_2-stdout.txt
	/workdir/demo/bernoulli-20220709183840_3-stdout.txt


In [3]:
bern_fit.summary()

| name | Mean | MCSE | StdDev | 5% | 50% | 95% | N_Eff | N_Eff/s | R_hat |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -7.28204 | 0.021632 | 0.735511 | -8.76462 | -6.989 | -6.75078 | 1156.08 | 495.957 | 1.00466 |
| theta | 0.249313 | 0.003317 | 0.11978 | 0.078098 | 0.236475 | 0.465121 | 1304.38 | 559.581 | 1.00201 |
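
As a quick sanity check on the `theta` row above, the Bernoulli model has a closed-form posterior. The sketch below assumes that the bundled `bernoulli.data.json` holds N = 10 observations with 2 successes and that `bernoulli.stan` places a Beta(1, 1) prior on `theta`. Neither file is shown in this notebook, so both are assumptions. Under them the posterior is Beta(3, 9), and its mean and standard deviation should be close to the sampled values.

```python
import math

# Assumed contents of the example (not shown in this notebook):
# Beta(1, 1) prior, N = 10 trials, 2 successes.
prior_a, prior_b = 1.0, 1.0
n_trials, n_success = 10, 2

# Conjugate update: Beta(a + successes, b + failures).
post_a = prior_a + n_success
post_b = prior_b + (n_trials - n_success)

# Mean and standard deviation of a Beta(post_a, post_b) distribution.
post_mean = post_a / (post_a + post_b)
post_var = (post_a * post_b) / ((post_a + post_b) ** 2 * (post_a + post_b + 1))
post_sd = math.sqrt(post_var)

print(f"analytic mean = {post_mean:.4f}, sd = {post_sd:.4f}")
```

If the assumptions hold, the analytic mean is 0.25 and the standard deviation is about 0.120, against 0.2493 and 0.1198 from the sampler.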


In [4]:
print(bern_fit.diagnose())

Processing csv files: /workdir/demo/bernoulli-20220709183840_1.csv, /workdir/demo/bernoulli-20220709183840_2.csv, /workdir/demo/bernoulli-20220709183840_3.csv, /workdir/demo/bernoulli-20220709183840_4.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.



### 8schools
- Speed comparison with PyStan

In [5]:
%%time
model = CmdStanModel(stan_file="8schools.stan")

18:38:44 - cmdstanpy - INFO - compiling stan file /workdir/demo/8schools.stan to exe file /workdir/demo/8schools
18:38:59 - cmdstanpy - INFO - compiled model executable: /workdir/demo/8schools


CPU times: user 7.15 ms, sys: 1.73 ms, total: 8.88 ms
Wall time: 15.3 s


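- Compilation took about 15 s.
- PyStan 2 needed more than 1 min, so compilation is indeed considerably faster.
  - The compiled model is saved to disk, so from the second run onward it loads in a few seconds. (Deleting `8schools` and `8schools.hpp` makes the next run behave like the first one again.)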
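
To repeat the first-run compile timing, the two cached files named above can be deleted before constructing the model again. The following is a minimal sketch: `remove_compiled_artifacts` is a hypothetical helper, and it assumes a Linux/macOS layout in which the executable has the same name as the `.stan` file minus its extension.

```python
from pathlib import Path


def remove_compiled_artifacts(stan_file):
    """Delete the executable and .hpp file that sit next to a Stan program.

    Hypothetical helper: assumes the executable is named like the .stan
    file without its extension (e.g. 8schools.stan -> 8schools).
    Returns the list of paths that were actually removed.
    """
    stan_path = Path(stan_file)
    candidates = [stan_path.with_suffix(""), stan_path.with_suffix(".hpp")]
    removed = []
    for path in candidates:
        # Only touch regular files that exist; the .stan source is never deleted.
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

Calling `remove_compiled_artifacts("8schools.stan")` before `CmdStanModel(stan_file="8schools.stan")` should then trigger a fresh compile.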

In [6]:
%%time
data = "8schools.data.json"
fit = model.sample(data=data, iter_sampling=1000, iter_warmup=500)

18:39:27 - cmdstanpy - INFO - CmdStan start processing



18:39:28 - cmdstanpy - INFO - CmdStan done processing.
	Chain 2 had 1 divergent transitions (0.1%)
	Chain 3 had 1 divergent transitions (0.1%)
	Chain 4 had 3 divergent transitions (0.3%)
	Use function "diagnose()" to see further information.



CPU times: user 159 ms, sys: 36.3 ms, total: 195 ms
Wall time: 347 ms


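- Sampling took a few hundred milliseconds (347 ms wall time in the run above). The difference is small, but is this slightly slower than PyStan 2 (163 ms)?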
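
A single `%%time` measurement at the sub-second scale is noisy, so one run cannot settle the comparison. Wall time is also the figure to compare, because CmdStanPy launches CmdStan as external processes and the reported CPU times cover only the Python side. The sketch below repeats a call and reports the median wall time. `time_call` is a hypothetical helper, and the workload is a stand-in so that the block runs on its own.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Run fn() `repeats` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload; in the notebook this would be something like
# lambda: model.sample(data=data, iter_sampling=1000, iter_warmup=500)
durations = time_call(lambda: sum(range(100_000)), repeats=5)
print(f"median wall time: {statistics.median(durations) * 1000:.2f} ms")
```

Using the median rather than the mean keeps one slow outlier, such as a first run with a cold cache, from dominating the result.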

In [7]:
fit.summary()

| name | Mean | MCSE | StdDev | 5% | 50% | 95% | N_Eff | N_Eff/s | R_hat |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -4.91433 | 0.077315 | 2.63487 | -9.73794 | -4.7135 | -1.06305 | 1161.43 | 4721.25 | 1.00213 |
| mu | 7.88518 | 0.11863 | 5.14434 | -0.496117 | 7.87497 | 16.1939 | 1880.5 | 7644.3 | 1.00036 |
| tau | 6.53019 | 0.141251 | 5.5098 | 0.468473 | 5.19635 | 17.196 | 1521.55 | 6185.17 | 1.00154 |
| eta[1] | 0.393664 | 0.01574 | 0.927746 | -1.16201 | 0.409985 | 1.85686 | 3473.96 | 14121.8 | 1.00013 |
| eta[2] | 0.010998 | 0.015033 | 0.87909 | -1.44683 | 0.006289 | 1.44258 | 3419.69 | 13901.2 | 0.999298 |
| eta[3] | -0.191749 | 0.015167 | 0.905373 | -1.6958 | -0.196473 | 1.30621 | 3563.14 | 14484.3 | 1.00226 |
| eta[4] | -0.047209 | 0.015636 | 0.908164 | -1.56302 | -0.034566 | 1.4439 | 3373.46 | 13713.3 | 1.00041 |
| eta[5] | -0.337758 | 0.014137 | 0.878224 | -1.75429 | -0.350363 | 1.15328 | 3859.15 | 15687.6 | 1.00013 |
| eta[6] | -0.220067 | 0.015122 | 0.893699 | -1.71048 | -0.220767 | 1.27306 | 3492.91 | 14198.8 | 0.99979 |
| eta[7] | 0.349998 | 0.01436 | 0.899337 | -1.15284 | 0.361596 | 1.79296 | 3922.06 | 15943.3 | 0.999737 |


In [8]:
print(fit.diagnose())

Processing csv files: /tmp/tmpzq6d9p9t/8schoolss6i_9sli/8schools-20220709183927_1.csv, /tmp/tmpzq6d9p9t/8schoolss6i_9sli/8schools-20220709183927_2.csv, /tmp/tmpzq6d9p9t/8schoolss6i_9sli/8schools-20220709183927_3.csv, /tmp/tmpzq6d9p9t/8schoolss6i_9sli/8schools-20220709183927_4.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
5 of 4000 (0.12%) transitions ended with a divergence.
These divergent transitions indicate that HMC is not fully able to explore the posterior distribution.
Try increasing adapt delta closer to 1.
If this doesn't remove all divergences, try to reparameterize the model.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete.
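
When several runs are being compared, it can help to pull the divergence count out of the `diagnose()` text programmatically. The sketch below is one way to do that with a regular expression. `count_divergences` is a hypothetical helper, and the pattern is matched to the wording shown above (CmdStan 2.30.0); other versions may phrase the line differently.

```python
import re

# Pattern for the divergence line as it appears in the output above.
_DIVERGENCE_RE = re.compile(
    r"(\d+) of (\d+) \(([\d.]+)%\) transitions ended with a divergence"
)


def count_divergences(diagnose_text):
    """Return (n_divergent, n_total) parsed from diagnose() text.

    Returns (0, None) when no divergence line is present.
    """
    match = _DIVERGENCE_RE.search(diagnose_text)
    if match is None:
        return 0, None
    return int(match.group(1)), int(match.group(2))


sample_text = "5 of 4000 (0.12%) transitions ended with a divergence."
print(count_divergences(sample_text))  # (5, 4000)
```

In the notebook this would be called as `count_divergences(fit.diagnose())`. If the count is non-zero, the diagnostic's own suggestion is to raise adapt delta; `CmdStanModel.sample` accepts this as the `adapt_delta` keyword argument (for instance `adapt_delta=0.95`, a value chosen here purely for illustration).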

