# Generating new quantities of interest given an existing model, data, and sample


The [generated quantities block](https://mc-stan.org/docs/reference-manual/program-block-generated-quantities.html)
computes quantities of interest based on the data,
transformed data, parameters, and transformed parameters.
It can be used to:

-  generate simulated data for model testing by forward sampling
-  generate predictions for new data
-  calculate posterior event probabilities, including multiple
   comparisons, sign tests, etc.
-  calculate posterior expectations
-  transform parameters for reporting
-  apply full Bayesian decision theory
-  calculate log likelihoods, deviances, etc. for model comparison
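
Several of these uses reduce to simple arithmetic over posterior draws. As a rough illustration outside of Stan, the sketch below estimates a posterior expectation and a posterior event probability from a list of draws. The name `theta_draws` and its values are made up for illustration; they are not output from any model in this notebook.

```python
import statistics

# Hypothetical posterior draws of a probability parameter theta
# (illustrative values only, not real sampler output).
theta_draws = [0.12, 0.31, 0.25, 0.48, 0.19, 0.55, 0.22, 0.37]

# Posterior expectation: the average of the draws.
posterior_mean = statistics.mean(theta_draws)

# Posterior event probability Pr(theta > 0.5): the average of an
# indicator variable evaluated at each draw.
indicator = [1 if theta > 0.5 else 0 for theta in theta_draws]
prob_theta_gt_half = statistics.mean(indicator)

print(f"estimated E[theta]: {posterior_mean:.3f}")
print(f"estimated Pr(theta > 0.5): {prob_theta_gt_half:.3f}")
```

In a Stan program the same indicator would typically be declared in the generated quantities block, so that its posterior mean is reported alongside the other quantities.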

The `generate_quantities` method of the `CmdStanModel` class is useful once you
have successfully fit a model to your data and have a valid
sample from the posterior.
If you need to compute additional quantities of interest,
you can do so from the existing draws without refitting the model.
The method takes the existing sample as input, and for each draw it
runs the generated quantities block of the program using that
draw's parameter values to compute the quantities of interest.
In this way you add more columns of information to an existing sample.
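
The per-draw logic can be pictured with a small pure-Python sketch. This is only a conceptual stand-in, not how CmdStan implements the method: the function `generated_quantities`, the draws in `theta_draws`, and the value of `n_trials` are all hypothetical.

```python
import random

def generated_quantities(theta, n_trials, rng):
    """Stand-in for a generated quantities block: simulate
    n_trials Bernoulli outcomes for a single draw of theta."""
    return [1 if rng.random() < theta else 0 for _ in range(n_trials)]

rng = random.Random(1234)               # fixed seed for reproducibility
theta_draws = [0.15, 0.28, 0.22, 0.35]  # hypothetical posterior draws
n_trials = 10                           # hypothetical number of trials

# One row of new quantities per existing draw: these are the
# extra "columns" added to the sample.
new_columns = [generated_quantities(theta, n_trials, rng) for theta in theta_draws]

for theta, y_rep in zip(theta_draws, new_columns):
    print(theta, y_rep)
```

Each existing draw yields exactly one row of new values, which is why the result lines up with the original sample draw by draw.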

## Example:  add posterior predictive checks to `bernoulli.stan`


In this example we use the CmdStan example model [bernoulli.stan](https://github.com/stan-dev/cmdstanpy/blob/master/test/data/bernoulli.stan)
and data file [bernoulli.data.json](https://github.com/stan-dev/cmdstanpy/blob/master/test/data/bernoulli.data.json) as our existing model and data.

We instantiate the model `bernoulli`,
as in the "Hello World" section
of the CmdStanPy [tutorial](../../cmdstanpy_tutorial.ipynb) notebook.

In [None]:
import os
from cmdstanpy import CmdStanModel, cmdstan_path

bernoulli_dir = os.path.join(cmdstan_path(), 'examples', 'bernoulli')
bernoulli_path = os.path.join(bernoulli_dir, 'bernoulli.stan')

# instantiate bernoulli model, compile Stan program
bernoulli_model = CmdStanModel(stan_file=bernoulli_path)
bernoulli_model.compile()
print(bernoulli_model.code())

We create the program [bernoulli_ppc.stan](https://github.com/stan-dev/cmdstanpy/blob/master/test/data/bernoulli_ppc.stan)
by adding a `generated quantities` block to `bernoulli.stan`; this block generates a new data vector `y_rep` using the current estimate of `theta`.
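
For readers who do not have the file at hand, the sketch below shows what such a program might look like and writes it to disk from Python. The Stan source here is an assumption based on the description above, not a copy of the linked file, and it uses the `array[N] int` declaration syntax of recent Stan releases (older releases write `int y_rep[N];`). The names `BERNOULLI_PPC_SKETCH` and `write_stan_program` are hypothetical helpers.

```python
import tempfile
from pathlib import Path

# Assumed program text: bernoulli.stan plus a generated quantities
# block that draws a replicated data vector y_rep for each theta.
BERNOULLI_PPC_SKETCH = """\
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ beta(1, 1);
  y ~ bernoulli(theta);
}
generated quantities {
  array[N] int y_rep;
  for (n in 1:N) {
    y_rep[n] = bernoulli_rng(theta);
  }
}
"""

def write_stan_program(path):
    """Write the sketched Stan program to `path` and return the path."""
    path = Path(path)
    path.write_text(BERNOULLI_PPC_SKETCH)
    return path

# Write into a temporary directory so nothing in the working
# directory is overwritten.
stan_path = write_stan_program(Path(tempfile.mkdtemp()) / "bernoulli_ppc.stan")
print(stan_path)
```

The resulting path could then be passed as the `stan_file` argument when constructing the model.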

In [None]:
bernoulli_ppc_model = CmdStanModel(stan_file='bernoulli_ppc.stan')
bernoulli_ppc_model.compile()
print(bernoulli_ppc_model.code())

As in the "Hello World" tutorial, we produce a sample from the posterior of the model conditioned on the data:

In [None]:
# fit the model to the data
bern_data = os.path.join(bernoulli_dir, 'bernoulli.data.json')
bern_fit = bernoulli_model.sample(data=bern_data)

The input data consists of `N`, the number of Bernoulli trials, and `y`, the list of observed outcomes.

In [None]:
import json
import statistics

# load the input data and report the observed proportion of successes
with open(bern_data, 'r') as fp:
    data_dict = json.load(fp)
print(data_dict)
print('mean of y: {}'.format(statistics.mean(data_dict['y'])))

The arguments to the `generate_quantities` method are:
 + the data used to fit the model (`bern_data`)
 + the list of the resulting Stan CSV files (`bern_fit.runset.csv_files`)

In [None]:
new_quantities = bernoulli_ppc_model.generate_quantities(data=bern_data, csv_files=bern_fit.runset.csv_files)

The ``CmdStanGQ`` object contains the values for all variables in the generated quantities block of the program ``bernoulli_ppc.stan``. Unlike the output from the ``sample`` method, it doesn't contain any information on the joint log probability density, the sampler state, or the values of parameters and transformed parameters.

In [None]:
new_quantities.column_names

In [None]:
new_quantities.generated_quantities.shape

In [None]:
# print the mean of each generated quantity across all draws
for i, name in enumerate(new_quantities.column_names):
    print(name, new_quantities.generated_quantities[:, i].mean())
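
One way to use replicated data sets like these is a simple posterior predictive check: compare a statistic of each replicated data set with the same statistic of the observed data. The sketch below runs on a small hand-written matrix in plain Python. The values of `y_obs` and `y_rep` and the helper `ppc_p_value` are illustrative assumptions; with real output, the rows would come from the array of generated quantities.

```python
import statistics

# Illustrative observed outcomes and replicated data sets
# (one row per posterior draw); not real CmdStan output.
y_obs = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
y_rep = [
    [0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
]

def ppc_p_value(observed, replicated, stat=statistics.mean):
    """Fraction of replicated data sets whose statistic is at least
    as large as the statistic of the observed data."""
    t_obs = stat(observed)
    t_rep = [stat(row) for row in replicated]
    exceed = sum(1 for t in t_rep if t >= t_obs)
    return exceed / len(t_rep)

p_value = ppc_p_value(y_obs, y_rep)
print(f"posterior predictive p-value for the mean: {p_value}")
```

Values near 0 or 1 would suggest that the model has trouble reproducing the chosen statistic; with only a handful of rows, as here, the number is purely illustrative.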
