<a href="https://colab.research.google.com/github/stan-dev/example-models/knitr/cloud-compute-2020/CmdStanPy_example_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CmdStanPy Example Notebook

This notebook demonstrates how to install the [CmdStanPy](https://cmdstanpy.readthedocs.io/en/latest/index.html) toolchain on a Google Colab instance and verify the installation by running the Stan NUTS-HMC sampler on the example model and data that ship with CmdStan. Each code cell updates the Python environment, so you must run the cells in order, one at a time.

Step 1: Install CmdStanPy.

In [0]:
# Install package CmdStanPy
!pip install --upgrade cmdstanpy
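
To confirm that the install succeeded before moving on, one option is to query the installed package version. The sketch below uses the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical and introduced here only for illustration.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Prints a version string if the pip step succeeded, otherwise None
print(installed_version('cmdstanpy'))
```

Returning `None` rather than raising keeps the check usable in a cell that should not abort a "Run all".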

Step 2: Download and unpack the CmdStan binary for Google Colab instances.

In [0]:
# Download, unpack CmdStan binaries
import os
import urllib.request
import shutil
tgz_file = 'cmdstan-2-22-1.tgz'
tgz_url = 'https://storage.googleapis.com/cmdstan-2-22-tgz/cmdstan-2-22-1.tgz'

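# Download only if the archive is not already in the working directory
if not os.path.exists(tgz_file):
  urllib.request.urlretrieve(tgz_url, tgz_file)

# Unpack only if the install directory is not already present,
# so a rerun after a failed unpack still completes the step.
# Any download or unpack error propagates and stops the notebook here.
if not os.path.exists('cmdstan-2.22.1'):
  shutil.unpack_archive(tgz_file)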
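
`shutil.unpack_archive` infers the archive format from the file suffix, which is why the `.tgz` file needs no explicit format argument. The sketch below shows this on a tiny stand-in archive built locally, so it runs without a network connection; the `demo-1.0` names are illustrative and have nothing to do with the CmdStan tarball.

```python
import os
import shutil
import tarfile
import tempfile

# Build a tiny stand-in archive in a scratch directory (illustrative names)
work = tempfile.mkdtemp()
src = os.path.join(work, 'demo-1.0')
os.makedirs(src)
with open(os.path.join(src, 'README'), 'w') as fd:
    fd.write('hello')

tgz = os.path.join(work, 'demo-1.0.tgz')
with tarfile.open(tgz, 'w:gz') as tar:
    tar.add(src, arcname='demo-1.0')

# Unpack into a separate directory; the format is inferred from the .tgz suffix
dest = os.path.join(work, 'unpacked')
os.makedirs(dest)
shutil.unpack_archive(tgz, dest)

unpacked_files = sorted(os.listdir(os.path.join(dest, 'demo-1.0')))
print(unpacked_files)
```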


Step 3: Register the CmdStan install location.

In [0]:
# CmdStan is installed in current working directory
!ls
# Specify CmdStan location via environment variable
import os
os.environ['CMDSTAN'] = './cmdstan-2.22.1'
# Check CmdStan path
from cmdstanpy import cmdstan_path
cmdstan_path()
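
Because the `CMDSTAN` value above is a relative path, it resolves against the current working directory. One quick sanity check, sketched below, is to look for the bundled example files inside the directory. The helper `has_bernoulli_example` is hypothetical, and it checks only for the two example files that this notebook uses, not for a complete CmdStan installation.

```python
import os
import tempfile

def has_bernoulli_example(cmdstan_dir):
    """Return True if `cmdstan_dir` contains the bundled bernoulli example files."""
    example_dir = os.path.join(cmdstan_dir, 'examples', 'bernoulli')
    expected = ['bernoulli.stan', 'bernoulli.data.json']
    return all(os.path.isfile(os.path.join(example_dir, name)) for name in expected)

# In this notebook the argument would be os.environ['CMDSTAN'];
# an empty temporary directory stands in here so the sketch runs anywhere
print(has_bernoulli_example(tempfile.mkdtemp()))
```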

The CmdStan installation includes a simple example program `bernoulli.stan` and test data `bernoulli.data.json`. Both are in the `examples/bernoulli` subdirectory of the CmdStan installation.

The program `bernoulli.stan` takes an array `y` of `N` binary outcomes and uses a Bernoulli distribution to estimate `theta`, the probability of success.

In [0]:
bernoulli_stan = os.path.join(cmdstan_path(), 'examples', 'bernoulli', 'bernoulli.stan')
with open(bernoulli_stan, 'r') as fd:
  print(fd.read())

The data file `bernoulli.data.json` contains 10 observations: 2 successes (1) and 8 failures (0).

In [0]:
bernoulli_data = os.path.join(cmdstan_path(), 'examples', 'bernoulli', 'bernoulli.data.json')
with open(bernoulli_data, 'r') as fd:
  print(fd.read())
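
Because the model is so small, the expected answer can be worked out by hand, which gives a reference point for the sampler output. The sketch below assumes the uniform Beta(1, 1) prior on `theta` that the example program declares; under that prior the posterior is Beta(1 + successes, 1 + failures). The order of the observations in `y` is illustrative, since only the counts matter.

```python
# Data matching the description above: 10 observations, 2 successes
data = {'N': 10, 'y': [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]}

successes = sum(data['y'])
failures = data['N'] - successes

# Conjugate update of a Beta(1, 1) prior with Bernoulli observations
alpha = 1 + successes
beta = 1 + failures
posterior_mean = alpha / (alpha + beta)
posterior_sd = (alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))) ** 0.5

print(f'posterior: Beta({alpha}, {beta})')
print(f'mean = {posterior_mean:.3f}, sd = {posterior_sd:.3f}')
```

A sampler estimate of `theta` that lands far from this analytic mean would suggest a problem with the installation or the data path.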

The following code tests that the CmdStanPy toolchain is properly installed by compiling the example model, fitting it to the data, and printing a summary of the posterior estimates for all parameters and quantities of interest.



In [0]:
# Run CmdStanPy Hello, World! example
from cmdstanpy import cmdstan_path, CmdStanModel

# Compile example model bernoulli.stan
bernoulli_model = CmdStanModel(stan_file=bernoulli_stan)

# Condition on example data bernoulli.data.json
bern_fit = bernoulli_model.sample(data=bernoulli_data, seed=123)

# Print a summary of the posterior sample
bern_fit.summary()
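
The summary table reports, for each variable, statistics such as the mean, standard deviation, and 5%, 50%, and 95% quantiles of the draws. As a rough sketch of how such statistics are derived from a sample, the code below computes them with the standard library on stand-in draws. The draws come from Python's `random.betavariate(3, 9)`, chosen here as an assumption to mimic the shape of the `theta` posterior under a uniform prior; they are not Stan output.

```python
import random
import statistics

random.seed(123)
# Stand-in for sampler output: 4000 draws from Beta(3, 9), NOT real Stan draws
draws = [random.betavariate(3, 9) for _ in range(4000)]

mean = statistics.fmean(draws)
sd = statistics.stdev(draws)
# quantiles(n=20) returns 19 cut points at 5%, 10%, ..., 95%
cuts = statistics.quantiles(draws, n=20)
q05, q50, q95 = cuts[0], cuts[9], cuts[18]

print(f'mean={mean:.3f} sd={sd:.3f} 5%={q05:.3f} 50%={q50:.3f} 95%={q95:.3f}')
```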