Quickstart
======

Here is a small example of how to do Bayesian model selection between a Poisson model and a negative binomial model.

Step by step explanation
----------------------------

### Defining a model


To do model selection, we first need some models. A model, in the simplest case,
is just a callable that takes a single `dict` as input and returns a single `dict` as output. The keys of the input dictionary are the parameters of the model; the keys of the output dictionary denote the summary statistics.
Here, the `dict` is passed as `parameters`. For the Poisson model it has the entry `lam`, the Poisson rate. For the negative binomial model it has the entries `k` and `theta`, the shape and scale of the Gamma distribution from which the Poisson rates are drawn.
Both models return the summary statistics `y1` and `y2`, which are the mean and the variance of the simulated sample.
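
The dict-in, dict-out contract can be illustrated without pyabc. The following is a minimal sketch, assuming a plain Python `dict` as input (hence `parameters["lam"]` rather than attribute access); the name `toy_poisson_model` and the seeded generator are illustrative choices, not part of this notebook.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded only to make the sketch reproducible


def toy_poisson_model(parameters, sample_size=5):
    """Hypothetical model: dict of parameters in, dict of summary statistics out."""
    x = rng.poisson(lam=parameters["lam"], size=sample_size)
    # Summary statistics: sample mean and sample variance
    return {"y1": float(x.mean()), "y2": float(x.var())}


summary = toy_poisson_model({"lam": 3.0})
print(summary)
```

The returned keys are what a distance function later refers to by name.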

In [None]:
%matplotlib inline
import os
import tempfile
import numpy as np

import scipy.stats as st

from pyabc import (ABCSMC, RV, Distribution,
                   PercentileDistanceFunction)

In [None]:
import sys 
sys.path.append('../../')
from model_comparison.utils import *
from model_comparison.mdns import *
from model_comparison.models import PoissonModel, NegativeBinomialModel

In [None]:
sample_size = 5
ntest = 10

k2 = 2.
theta2 = 1.0

k3 = 2.
theta3 = 2. 

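# With the scale theta1 fixed, the shape k1 of the Gamma prior for the Poisson rate
# follows from matching the prior means: k1 * theta1 = k2 * theta2 * k3 * theta3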
theta1 = 2.0
k1 = (k2 * theta2 * k3 * theta3) / theta1
print(k1)
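
The formula for `k1` can be checked numerically. The sketch below assumes the hierarchy used by the models in this notebook (Poisson rate drawn from Gamma(`k1`, `theta1`) in one model, and from Gamma(`k`, `theta`) with Gamma-distributed `k` and `theta` in the other) and compares the prior means of the rate by Monte Carlo. The sample count `n` and the seed are arbitrary choices.

```python
import numpy as np

k2, theta2 = 2.0, 1.0
k3, theta3 = 2.0, 2.0
theta1 = 2.0
k1 = (k2 * theta2 * k3 * theta3) / theta1

rng = np.random.default_rng(seed=1)
n = 200_000  # arbitrary number of Monte Carlo draws

# Poisson model: rate drawn directly from Gamma(k1, theta1)
lam_poisson = rng.gamma(shape=k1, scale=theta1, size=n)

# Negative binomial model: k and theta drawn from their priors, then the rate
k = rng.gamma(shape=k2, scale=theta2, size=n)
theta = rng.gamma(shape=k3, scale=theta3, size=n)
lam_nb = rng.gamma(shape=k, scale=theta)

analytic_mean = k1 * theta1  # mean of a Gamma(shape, scale) is shape * scale
print(analytic_mean, lam_poisson.mean(), lam_nb.mean())
```

Both empirical means should lie close to the analytic value, so the expected Poisson rate is the same under either model's prior.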

In [None]:
%matplotlib inline
import os
import tempfile

import scipy.stats as st

from pyabc import (ABCSMC, RV, Distribution, 
                   PercentileDistanceFunction, sampler)

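# Define the two candidate models: Poisson and negative binomial


def Poisson_model(parameters):
    # Draw sample_size Poisson counts with rate lam
    x = st.poisson.rvs(parameters.lam, size=sample_size)
    # Summary statistics: sample mean and sample variance
    return {'y1': x.mean(), 'y2': x.var()}


def NB_model(parameters):
    # Negative binomial as a Gamma-Poisson mixture: one Gamma-distributed rate per observation
    lams = st.gamma.rvs(a=parameters.k, scale=parameters.theta, size=sample_size)
    x = st.poisson.rvs(lams)
    return {'y1': x.mean(), 'y2': x.var()}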

Model selection requires more than one model; the candidates are collected in a list.
We also need a Bayesian prior over the models.
By default this is a uniform prior over the model classes.
Each model additionally needs a prior over its own parameters, here Gamma distributions.
This concludes the model definition.
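
To make the role of the two priors concrete, here is a minimal sketch of one draw from the joint prior: first a model index from a uniform model prior, then parameters from that model's own prior. It uses plain NumPy rather than pyabc's `Distribution`, and `toy_priors` and `sample_from_joint_prior` are hypothetical names introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Hypothetical stand-ins for the parameter priors: (shape, scale) of each Gamma
toy_priors = [
    {"lam": (4.0, 2.0)},                     # Poisson model
    {"k": (2.0, 1.0), "theta": (2.0, 2.0)},  # negative binomial model
]


def sample_from_joint_prior():
    # Uniform prior over the model indices
    m = int(rng.integers(len(toy_priors)))
    # Parameters are drawn from the prior of the selected model only
    params = {name: float(rng.gamma(shape, scale))
              for name, (shape, scale) in toy_priors[m].items()}
    return m, params


draws = [sample_from_joint_prior() for _ in range(1000)]
frac_model0 = sum(m == 0 for m, _ in draws) / len(draws)
print(frac_model0)
```

With a uniform model prior, roughly half of the draws come from each model.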

In [None]:
# We define two models: a Poisson model and a negative binomial model
models = [Poisson_model, NB_model]


# Define priors 
prior1 = Distribution.from_dictionary_of_dictionaries(dict(lam={'type': 'gamma', 'kwargs': {'a':k1, 'scale': theta1}}))

prior2 = Distribution.from_dictionary_of_dictionaries(dict(k={'type': 'gamma', 'kwargs': {'a':k2, 'scale': theta2}}, 
                                                     theta={'type': 'gamma', 'kwargs': {'a':k3, 'scale': theta3}}))

parameter_priors = [prior1, prior2]

### Configuring the ABCSMC run

With the models defined, we can assemble the `ABCSMC` object.
It needs a distance function that measures how far the simulated
summary statistics are from the observed ones.

In [None]:
# We plug all the ABC options together
abc = ABCSMC(
    models, parameter_priors,
    PercentileDistanceFunction(measures_to_use=["y1", "y2"]))
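
As an illustration of what a distance over summary-statistic dictionaries can look like, the sketch below sums absolute differences, each divided by a per-statistic scale so that `y1` and `y2` contribute on comparable footing. This is a hand-written example under that assumption, not pyabc's `PercentileDistanceFunction`; the function name and the `scales` values are made up.

```python
def scaled_abs_distance(simulated, observed, scales):
    """Sum of absolute differences, each divided by a per-statistic scale."""
    return sum(abs(simulated[key] - observed[key]) / scales[key]
               for key in observed)


observed = {"y1": 4.0, "y2": 6.0}
simulated = {"y1": 5.0, "y2": 3.0}
scales = {"y1": 2.0, "y2": 6.0}  # hypothetical spreads of each statistic

d = scaled_abs_distance(simulated, observed, scales)
print(d)  # 1.0
```

Without the scaling, the statistic with the larger numerical range would dominate the distance.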

### Setting the observed data

The measured data can now be passed to the `ABCSMC` object.
This is done via the `new` method, which indicates that we start
a new run rather than resuming a stored one (see the "resume stored run" example).
We also have to specify the output database in which the ABC-SMC run
is logged.

## Load test data for comparison to DE

In [None]:
import pickle 

fn = '../data/201812141735__poisson_posterior_trained_N50000M5_k2.p'
# Use a context manager so the file handle is closed after loading
with open(fn, "rb") as infile:
    dd = pickle.load(infile)['d_model_post']

In [None]:
sx_t = dd['sx_test']

## Run a loop over all test data points

In [None]:
n_simulations = np.zeros(sx_t.shape[0])
phat = np.zeros((2, sx_t.shape[0]))

for idx, y_observed in enumerate(sx_t): 
    # y_observed is the important piece here: the observed summary statistics.
    # We also define where to store the results.
    db_path = ("sqlite:///" +
               os.path.join(tempfile.gettempdir(), "test.db"))
    abc_id = abc.new(db_path, {"y1": y_observed[0], "y2": y_observed[1]})

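    # Run ABC-SMC until epsilon reaches minimum_epsilon or max_nr_populations is hit
    history = abc.run(minimum_epsilon=0.00001, max_nr_populations=5)

    # Record the number of simulations used for this observation
    n_simulations[idx] = history.total_nr_simulations

    # Keep the model probabilities of the last population
    phat[:, idx] = history.get_model_probabilities().values[-1, :]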
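
To show what the estimated model probabilities represent, here is a minimal sketch of model selection with plain ABC rejection sampling, a simpler relative of the ABC-SMC scheme run above. It is not how pyabc computes its result: the tolerance `epsilon`, the unscaled absolute distance, the observation, and all function names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
sample_size = 5


def simulate(model_index):
    """Draw parameters from the prior, simulate data, return (mean, variance)."""
    if model_index == 0:  # Poisson model, Gamma prior on the rate
        lam = rng.gamma(4.0, 2.0)
        x = rng.poisson(lam, size=sample_size)
    else:  # negative binomial model as a Gamma-Poisson mixture
        k, theta = rng.gamma(2.0, 1.0), rng.gamma(2.0, 2.0)
        x = rng.poisson(rng.gamma(k, theta, size=sample_size))
    return np.array([x.mean(), x.var()])


def rejection_model_probabilities(observed, epsilon, n_draws=10_000):
    accepted = np.zeros(2)
    for _ in range(n_draws):
        m = int(rng.integers(2))  # uniform model prior
        if np.abs(simulate(m) - observed).sum() <= epsilon:
            accepted[m] += 1
    if accepted.sum() == 0:
        raise ValueError("no simulation was accepted; increase epsilon or n_draws")
    # Posterior model probability is estimated by the share of accepted draws
    return accepted / accepted.sum()


probs = rejection_model_probabilities(np.array([8.0, 10.0]), epsilon=3.0)
print(probs)
```

ABC-SMC refines this idea by shrinking the tolerance over several populations instead of using one fixed `epsilon`.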

## Calculate true posterior 

In [None]:
ptrue = np.array(dd['ppoi_exact'])
ptrue = np.vstack((ptrue, 1 - ptrue))

In [None]:
np.abs(phat - ptrue).mean()
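
The error measure above is the mean absolute difference over both models and all test points. A toy version with made-up numbers (none of them taken from the test data) shows the array layout, with one row per model and one column per test point:

```python
import numpy as np

# Made-up exact posterior probabilities of the Poisson model for two test points
ppoi_exact_toy = np.array([0.2, 0.7])
ptrue_toy = np.vstack((ppoi_exact_toy, 1 - ppoi_exact_toy))

# Made-up estimates in the same layout: rows are models, columns are test points
phat_toy = np.array([[0.25, 0.6],
                     [0.75, 0.4]])

mae = np.abs(phat_toy - ptrue_toy).mean()
print(mae)  # approximately 0.075
```

Because the two rows sum to one in each column, the errors of the two models are equal in magnitude for every test point.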

In [None]:
dd = dict(phat=phat, nsims=n_simulations, ptrue=ptrue)

In [None]:
import time
time_stamp = time.strftime('%Y%m%d%H%M_')
fn = os.path.join('../data/', time_stamp + '_SMCABC_results_PoissonNB_Ntest{}.p'.format(sx_t.shape[0]))

with open(fn, 'wb') as outfile: 
    pickle.dump(dd, outfile, protocol=pickle.HIGHEST_PROTOCOL)
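
A quick way to confirm that a results file can be read back is a round trip through `pickle`. The sketch below writes to a temporary directory instead of `../data/`, and the dictionary contents and file name are made up.

```python
import os
import pickle
import tempfile
import time

# Made-up results in the same spirit as the dictionary saved above
results = {"phat": [[0.25, 0.6], [0.75, 0.4]], "nsims": [1200, 1500]}

time_stamp = time.strftime("%Y%m%d%H%M_")
with tempfile.TemporaryDirectory() as tmp_dir:
    fn_toy = os.path.join(tmp_dir, time_stamp + "_toy_results.p")
    with open(fn_toy, "wb") as outfile:
        pickle.dump(results, outfile, protocol=pickle.HIGHEST_PROTOCOL)
    with open(fn_toy, "rb") as infile:
        restored = pickle.load(infile)

print(restored == results)
```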