Quickstart
======

Here is a small example of how to do Bayesian model selection.

Step-by-step explanation
------------------------

### Defining a model


To do model selection, we first need some models. In the simplest case, a model
is just a callable that takes a single `dict` as input and returns a single `dict` as output. The keys of the input dictionary are the parameters of the model; the keys of the output
dictionary are the summary statistics.
Here, the input is passed as `parameters` and has a single entry `x`, which is the mean of a Gaussian.
The model returns the simulated summary statistic `y`, which is simply the sampled value.

In [7]:
%matplotlib inline
import os
import tempfile
import numpy as np

import scipy.stats as st

from pyabc import (ABCSMC, RV, Distribution,
                   PercentileDistanceFunction)

# Define a gaussian model
sigma = 1.


def model(parameters):
    # sample from a gaussian
    y = st.norm(parameters.x, sigma).rvs()
    # return the sample as dictionary
    return {"y": y}
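
To get a feel for the model's input and output, it can be called directly, outside of any ABC run. The following is a minimal stand-alone sketch, not part of the original notebook: `AttrDict` is a hypothetical helper that mimics a dict-like parameter object with attribute access (which `parameters.x` relies on), and `random.gauss` stands in for `st.norm(...).rvs()` so that the example needs only the standard library.

```python
import random


class AttrDict(dict):
    """Hypothetical stand-in: a dict whose keys can also be read as attributes."""
    __getattr__ = dict.__getitem__


sigma = 1.0


def model(parameters):
    # draw one value from a Gaussian with mean parameters.x and std sigma
    y = random.gauss(parameters.x, sigma)
    # return the simulated summary statistic as a dictionary
    return {"y": y}


random.seed(0)
sample = model(AttrDict(x=0.0))
print(sample)  # a dict with the single key "y"
```

The returned dictionary has the same key, `y`, that is later used for the observed data, which is what allows simulated and observed summary statistics to be compared.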

Model selection usually involves more than one model.
The models are collected in a list. We
also require a Bayesian prior over the models.
By default, this prior is uniform over the models.
In addition, each model gets its own prior over its parameters.
This completes the model definition.

In [2]:
# We define three models, but they are identical so far
models = [model, model, model]


# However, the parameter priors of the models are not the same:
# their means differ.
mu_x_1, mu_x_2, mu_x_3 = 0, 1, -1
parameter_priors = [
    Distribution(x=RV("norm", mu_x_1, sigma)),
    Distribution(x=RV("norm", mu_x_2, sigma)),
    Distribution(x=RV("norm", mu_x_3, sigma))
]
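
Taken together, the model prior, the parameter priors, and the model describe a generative process: pick a model, draw its parameter, then simulate data. The sketch below spells this out with the standard library only. It is an illustration under stated assumptions, not pyABC code: `sample_from_prior` is a hypothetical name, the model prior is assumed to be uniform (the default mentioned above), and the prior means are those of `mu_x_1`, `mu_x_2`, `mu_x_3`.

```python
import random

sigma = 1.0
prior_means = [0.0, 1.0, -1.0]  # same values as mu_x_1, mu_x_2, mu_x_3


def sample_from_prior(rng):
    """Draw (model index, parameter x, simulated y) from the joint prior."""
    m = rng.randrange(len(prior_means))    # uniform prior over the models
    x = rng.gauss(prior_means[m], sigma)   # parameter prior of model m
    y = rng.gauss(x, sigma)                # simulated data given x
    return m, x, y


rng = random.Random(0)
draws = [sample_from_prior(rng) for _ in range(3000)]

# under a uniform model prior, each model is chosen about a third of the time
model_counts = [sum(1 for m, _, _ in draws if m == k) for k in range(3)]
print(model_counts)
```

Because the three models share the same simulator, they differ only through the location of their parameter prior, so the data can distinguish them only through where `y` tends to fall.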

### Configuring the ABCSMC run

With the models defined, we can assemble the `ABCSMC` object.
We also need a distance function
to measure how far the simulated summary statistics are from the observed ones.

In [3]:
# We plug all the ABC options together
abc = ABCSMC(
    models, parameter_priors,
    PercentileDistanceFunction(measures_to_use=["y"]))
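
The role of the distance function can be illustrated with a small sketch. The functions `abs_distance` and `accept` below are hypothetical and are not pyABC's implementation of `PercentileDistanceFunction`; a plain absolute difference is used as a stand-in to show how a simulated sample is compared with the observation and accepted when the distance falls below a threshold `eps`.

```python
def abs_distance(simulated, observed, keys=("y",)):
    """Sum of absolute differences over the selected summary statistics."""
    return sum(abs(simulated[k] - observed[k]) for k in keys)


def accept(simulated, observed, eps):
    """ABC acceptance rule: keep the sample if it is within eps of the data."""
    return abs_distance(simulated, observed) <= eps


observed = {"y": 1.0}
print(accept({"y": 1.5}, observed, eps=0.6))  # distance 0.5, accepted
print(accept({"y": 1.5}, observed, eps=0.4))  # distance 0.5, rejected
```

The `keys` argument plays the same part as `measures_to_use=["y"]` above: it selects which summary statistics enter the comparison.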

### Setting the observed data

The measured data can now be passed to the `ABCSMC` object.
This is done via the `new` method, which indicates that we start
a new run rather than resuming a stored one (see the "resume stored run" example).
We also have to specify the output database in which the ABC-SMC run
is logged.

## Generate test data for comparison to NDE

In [43]:
# set same seed as in the NDE notebook
rng = np.random.RandomState(seed=42)
sx_t = rng.normal(loc=0, scale=1, size=100)
sx_t

array([ 0.49671415, -0.1382643 ,  0.64768854,  1.52302986, -0.23415337,
       -0.23413696,  1.57921282,  0.76743473, -0.46947439,  0.54256004,
       -0.46341769, -0.46572975,  0.24196227, -1.91328024, -1.72491783,
       -0.56228753, -1.01283112,  0.31424733, -0.90802408, -1.4123037 ,
        1.46564877, -0.2257763 ,  0.0675282 , -1.42474819, -0.54438272,
        0.11092259, -1.15099358,  0.37569802, -0.60063869, -0.29169375,
       -0.60170661,  1.85227818, -0.01349722, -1.05771093,  0.82254491,
       -1.22084365,  0.2088636 , -1.95967012, -1.32818605,  0.19686124,
        0.73846658,  0.17136828, -0.11564828, -0.3011037 , -1.47852199,
       -0.71984421, -0.46063877,  1.05712223,  0.34361829, -1.76304016,
        0.32408397, -0.38508228, -0.676922  ,  0.61167629,  1.03099952,
        0.93128012, -0.83921752, -0.30921238,  0.33126343,  0.97554513,
       -0.47917424, -0.18565898, -1.10633497, -1.19620662,  0.81252582,
        1.35624003, -0.07201012,  1.0035329 ,  0.36163603, -0.64

## Run a loop over all test data points

In [44]:
n_simulations = np.zeros_like(sx_t)
phat = np.zeros((3, sx_t.size))

for idx, y_observed in enumerate(sx_t):
    # y_observed is the important piece here: our actual observation.
    # The database path defines where the results are stored.
    db_path = ("sqlite:///" +
               os.path.join(tempfile.gettempdir(), "test.db"))
    abc_id = abc.new(db_path, {"y": y_observed})

    # Run ABC-SMC until either stopping criterion is met
    history = abc.run(minimum_epsilon=0.0001, max_nr_populations=10)

    # Record the total number of simulations used for this observation
    n_simulations[idx] = history.total_nr_simulations

    # Keep the model probabilities of the last population
    phat[:, idx] = history.get_model_probabilities().values[-1, :]

INFO:History:Start <ABCSMC(id=6, start_time=2018-12-13 10:53:44.911687, end_time=None)>
INFO:Epsilon:initial epsilon is 0.4261196588509821
INFO:ABC:t:0 eps:0.4261196588509821
INFO:ABC:t:1 eps:0.18186821712639498
INFO:ABC:t:2 eps:0.08674800038941173
INFO:ABC:t:3 eps:0.04369223330930019
INFO:ABC:t:4 eps:0.01903570428781887
INFO:ABC:t:5 eps:0.00960534217009361
INFO:ABC:t:6 eps:0.004633065202253647
INFO:ABC:t:7 eps:0.0024198317974152067
INFO:ABC:t:8 eps:0.0013214918294173577
INFO:ABC:t:9 eps:0.0006750534835048037
INFO:History:Done <ABCSMC(id=6, start_time=2018-12-13 10:53:44.911687, end_time=2018-12-13 10:54:27.260494)>
INFO:History:Start <ABCSMC(id=7, start_time=2018-12-13 10:54:27.282686, end_time=None)>
INFO:Epsilon:initial epsilon is 0.4256653559487356
INFO:ABC:t:0 eps:0.4256653559487356
INFO:ABC:t:1 eps:0.1670614066856134
INFO:ABC:t:2 eps:0.08613725822665178
INFO:ABC:t:3 eps:0.037998616196034535
INFO:ABC:t:4 eps:0.022275932048967006
INFO:ABC:t:5 eps:0.010926191418412896
INFO:ABC:t:6 e

INFO:Epsilon:initial epsilon is 0.740235215596532
INFO:ABC:t:0 eps:0.740235215596532
INFO:ABC:t:1 eps:0.38943223354056383
INFO:ABC:t:2 eps:0.1954152752962589
INFO:ABC:t:3 eps:0.09402465409800542
INFO:ABC:t:4 eps:0.04998830284578086
INFO:ABC:t:5 eps:0.02512796276334482
INFO:ABC:t:6 eps:0.012525780322445522
INFO:ABC:t:7 eps:0.0071729959377733335
INFO:ABC:t:8 eps:0.0034554844283999743
INFO:ABC:t:9 eps:0.001928433508291081
INFO:History:Done <ABCSMC(id=19, start_time=2018-12-13 11:01:45.064704, end_time=2018-12-13 11:02:06.425398)>
INFO:History:Start <ABCSMC(id=20, start_time=2018-12-13 11:02:06.447565, end_time=None)>
INFO:Epsilon:initial epsilon is 0.6675869920703293
INFO:ABC:t:0 eps:0.6675869920703293
INFO:ABC:t:1 eps:0.3330318381773713
INFO:ABC:t:2 eps:0.1531531788477856
INFO:ABC:t:3 eps:0.07237195540135269
INFO:ABC:t:4 eps:0.04798991133753922
INFO:ABC:t:5 eps:0.023359452415921272
INFO:ABC:t:6 eps:0.01214728820010368
INFO:ABC:t:7 eps:0.005053716245287528
INFO:ABC:t:8 eps:0.0025401663545

INFO:ABC:t:1 eps:0.22295726871268964
INFO:ABC:t:2 eps:0.10312762857199119
INFO:ABC:t:3 eps:0.05212568775007733
INFO:ABC:t:4 eps:0.024141225872418633
INFO:ABC:t:5 eps:0.016591163467995596
INFO:ABC:t:6 eps:0.008059834282194377
INFO:ABC:t:7 eps:0.004007707005222067
INFO:ABC:t:8 eps:0.0021035274191591223
INFO:ABC:t:9 eps:0.0010883860917904873
INFO:History:Done <ABCSMC(id=32, start_time=2018-12-13 11:10:26.966443, end_time=2018-12-13 11:10:54.357894)>
INFO:History:Start <ABCSMC(id=33, start_time=2018-12-13 11:10:54.380923, end_time=None)>
INFO:Epsilon:initial epsilon is 0.4105436336945234
INFO:ABC:t:0 eps:0.4105436336945234
INFO:ABC:t:1 eps:0.18194550106070484
INFO:ABC:t:2 eps:0.09993641497886115
INFO:ABC:t:3 eps:0.04914616028891093
INFO:ABC:t:4 eps:0.030618452093335705
INFO:ABC:t:5 eps:0.012911930223374515
INFO:ABC:t:6 eps:0.005492334325364378
INFO:ABC:t:7 eps:0.0028191692642319023
INFO:ABC:t:8 eps:0.001369273816227413
INFO:ABC:t:9 eps:0.0007361524110509372
INFO:History:Done <ABCSMC(id=33,

INFO:ABC:t:3 eps:0.05690595051363307
INFO:ABC:t:4 eps:0.029048981498656506
INFO:ABC:t:5 eps:0.013188247282644486
INFO:ABC:t:6 eps:0.006900956987296254
INFO:ABC:t:7 eps:0.0037855665494962534
INFO:ABC:t:8 eps:0.002104859256507899
INFO:ABC:t:9 eps:0.0010567311043319803
INFO:History:Done <ABCSMC(id=45, start_time=2018-12-13 11:18:33.893218, end_time=2018-12-13 11:19:01.857509)>
INFO:History:Start <ABCSMC(id=46, start_time=2018-12-13 11:19:01.884346, end_time=None)>
INFO:Epsilon:initial epsilon is 0.4685604975376953
INFO:ABC:t:0 eps:0.4685604975376953
INFO:ABC:t:1 eps:0.22465305432276883
INFO:ABC:t:2 eps:0.11339055170002652
INFO:ABC:t:3 eps:0.05553195092911821
INFO:ABC:t:4 eps:0.024749613816067245
INFO:ABC:t:5 eps:0.0097371773126851
INFO:ABC:t:6 eps:0.00472265987653952
INFO:ABC:t:7 eps:0.0022705676754098485
INFO:ABC:t:8 eps:0.0011988236350723254
INFO:ABC:t:9 eps:0.0005982805326289636
INFO:History:Done <ABCSMC(id=46, start_time=2018-12-13 11:19:01.884346, end_time=2018-12-13 11:19:49.453206)

INFO:ABC:t:5 eps:0.009904672964969378
INFO:ABC:t:6 eps:0.0046271204882909235
INFO:ABC:t:7 eps:0.0020180432093390363
INFO:ABC:t:8 eps:0.000908705644602654
INFO:ABC:t:9 eps:0.00035362428733069545
INFO:History:Done <ABCSMC(id=58, start_time=2018-12-13 11:25:23.561432, end_time=2018-12-13 11:26:30.933561)>
INFO:History:Start <ABCSMC(id=59, start_time=2018-12-13 11:26:30.961772, end_time=None)>
INFO:Epsilon:initial epsilon is 0.45593017483610054
INFO:ABC:t:0 eps:0.45593017483610054
INFO:ABC:t:1 eps:0.17088447769489645
INFO:ABC:t:2 eps:0.08131386165419498
INFO:ABC:t:3 eps:0.039407544345890284
INFO:ABC:t:4 eps:0.02215398033615065
INFO:ABC:t:5 eps:0.008695905133818814
INFO:ABC:t:6 eps:0.004025117482511962
INFO:ABC:t:7 eps:0.0018445012054768139
INFO:ABC:t:8 eps:0.0008990788116620674
INFO:ABC:t:9 eps:0.00044418448701911056
INFO:History:Done <ABCSMC(id=59, start_time=2018-12-13 11:26:30.961772, end_time=2018-12-13 11:27:30.129511)>
INFO:History:Start <ABCSMC(id=60, start_time=2018-12-13 11:27:30.

INFO:ABC:t:7 eps:0.005915333271673641
INFO:ABC:t:8 eps:0.0032256673209272754
INFO:ABC:t:9 eps:0.0016657105190303096
INFO:History:Done <ABCSMC(id=71, start_time=2018-12-13 11:33:02.677693, end_time=2018-12-13 11:33:24.986902)>
INFO:History:Start <ABCSMC(id=72, start_time=2018-12-13 11:33:25.016397, end_time=None)>
INFO:Epsilon:initial epsilon is 0.40144601197544905
INFO:ABC:t:0 eps:0.40144601197544905
INFO:ABC:t:1 eps:0.19133757019665712
INFO:ABC:t:2 eps:0.11260798226110388
INFO:ABC:t:3 eps:0.05673890931751259
INFO:ABC:t:4 eps:0.02787142376383554
INFO:ABC:t:5 eps:0.0136013775560702
INFO:ABC:t:6 eps:0.005921946899606175
INFO:ABC:t:7 eps:0.003341989915863249
INFO:ABC:t:8 eps:0.0015667367346324848
INFO:ABC:t:9 eps:0.0009635315145134357
INFO:History:Done <ABCSMC(id=72, start_time=2018-12-13 11:33:25.016397, end_time=2018-12-13 11:34:02.201561)>
INFO:History:Start <ABCSMC(id=73, start_time=2018-12-13 11:34:02.223199, end_time=None)>
INFO:Epsilon:initial epsilon is 0.5111966303398946
INFO:ABC

INFO:ABC:t:9 eps:0.000802085380367334
INFO:History:Done <ABCSMC(id=84, start_time=2018-12-13 11:40:00.030914, end_time=2018-12-13 11:40:35.422658)>
INFO:History:Start <ABCSMC(id=85, start_time=2018-12-13 11:40:35.444188, end_time=None)>
INFO:Epsilon:initial epsilon is 0.770877129071267
INFO:ABC:t:0 eps:0.770877129071267
INFO:ABC:t:1 eps:0.3312165520098751
INFO:ABC:t:2 eps:0.1526359428662953
INFO:ABC:t:3 eps:0.07679165713776932
INFO:ABC:t:4 eps:0.045588339714952075
INFO:ABC:t:5 eps:0.020441093636835977
INFO:ABC:t:6 eps:0.01037769377003883
INFO:ABC:t:7 eps:0.004810419438044556
INFO:ABC:t:8 eps:0.0021885357782371003
INFO:ABC:t:9 eps:0.0010150696357512723
INFO:History:Done <ABCSMC(id=85, start_time=2018-12-13 11:40:35.444188, end_time=2018-12-13 11:41:10.082338)>
INFO:History:Start <ABCSMC(id=86, start_time=2018-12-13 11:41:10.106975, end_time=None)>
INFO:Epsilon:initial epsilon is 0.4138124826312378
INFO:ABC:t:0 eps:0.4138124826312378
INFO:ABC:t:1 eps:0.2234229846503236
INFO:ABC:t:2 eps:0

INFO:History:Start <ABCSMC(id=98, start_time=2018-12-13 11:48:43.426387, end_time=None)>
INFO:Epsilon:initial epsilon is 0.4477890603302149
INFO:ABC:t:0 eps:0.4477890603302149
INFO:ABC:t:1 eps:0.2376864626818172
INFO:ABC:t:2 eps:0.10778377969132709
INFO:ABC:t:3 eps:0.04222719775964892
INFO:ABC:t:4 eps:0.02211310082401029
INFO:ABC:t:5 eps:0.010404153617294647
INFO:ABC:t:6 eps:0.00551687364591448
INFO:ABC:t:7 eps:0.003014321556507482
INFO:ABC:t:8 eps:0.0014257761760090113
INFO:ABC:t:9 eps:0.000720162014310514
INFO:History:Done <ABCSMC(id=98, start_time=2018-12-13 11:48:43.426387, end_time=2018-12-13 11:49:23.446029)>
INFO:History:Start <ABCSMC(id=99, start_time=2018-12-13 11:49:23.474180, end_time=None)>
INFO:Epsilon:initial epsilon is 0.4283286973019255
INFO:ABC:t:0 eps:0.4283286973019255
INFO:ABC:t:1 eps:0.19951970371113004
INFO:ABC:t:2 eps:0.09343494153455925
INFO:ABC:t:3 eps:0.04766670927247679
INFO:ABC:t:4 eps:0.019559039685593752
INFO:ABC:t:5 eps:0.011311530778188169
INFO:ABC:t:6 e
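
In every complete run visible in the log, epsilon at `t:9` is still above `minimum_epsilon=0.0001`, so those runs ended because `max_nr_populations=10` was reached. The interplay of the two stopping criteria can be sketched as follows. This is a toy illustration only: `run_schedule` is a hypothetical function, and the fixed halving of epsilon is an assumption that roughly resembles the logged values, whereas the actual run derives each new epsilon from the accepted samples.

```python
def run_schedule(initial_eps, minimum_epsilon=1e-4,
                 max_nr_populations=10, shrink=0.5):
    """Return the (t, eps) pairs of a toy run with both stopping criteria."""
    eps = initial_eps
    schedule = []
    for t in range(max_nr_populations):   # criterion 1: population budget
        schedule.append((t, eps))
        if eps <= minimum_epsilon:        # criterion 2: epsilon small enough
            break
        eps *= shrink                     # assumed fixed shrinkage per step
    return schedule


# with 10 populations the budget runs out before eps reaches 1e-4
short_run = run_schedule(0.426)
# with a larger budget the epsilon criterion stops the run instead
long_run = run_schedule(0.426, max_nr_populations=50)
print(len(short_run), short_run[-1])
print(len(long_run), long_run[-1])
```

In practice this means that reaching a smaller final epsilon requires raising `max_nr_populations`, at the cost of more simulations per observation.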

## Calculate true posterior 

In [53]:
import sys

import scipy.stats

# make the local model_comparison package importable
sys.path.append('../../')
from model_comparison.models import BaseModel


# background model prior 
prior_m0 = scipy.stats.norm(0, 1)
# signal model prior 
prior_m1 = scipy.stats.norm(1, 1)
# third model 
prior_m2 = scipy.stats.norm(-1, 1)

class GaussianModel(BaseModel):
    def __init__(self, std, dim_param=1, sample_size=10, n_workers=1, seed=None):
        super().__init__(dim_param=dim_param, sample_size=sample_size, n_workers=n_workers, seed=seed)
        self.std = std
        self.posterior = None

    def gen_single(self, params):
        # in multiprocessing the parameter vector additionally contains a seed
        if self.run_parallel:
            mu, seed = params
            self.rng.seed(int(seed))
        else:
            mu = params
        return self.rng.normal(loc=mu, scale=self.std, size=self.sample_size)

# models 
sample_size = 1
m0 = GaussianModel(std=1, sample_size=sample_size)
m1 = GaussianModel(std=1, sample_size=sample_size)
m2 = GaussianModel(std=1, sample_size=sample_size)

# Marginal likelihood of each observation under each model. With a Gaussian
# prior on the mean and Gaussian noise, it is a Gaussian centered at the prior
# mean with variance std**2 + prior_std**2.
marli0 = np.array([scipy.stats.norm.pdf(x=xo, loc=prior_m0.mean(),
                                        scale=np.sqrt(m0.std**2 + prior_m0.std()**2)) for xo in sx_t])
marli1 = np.array([scipy.stats.norm.pdf(x=xo, loc=prior_m1.mean(),
                                        scale=np.sqrt(m1.std**2 + prior_m1.std()**2)) for xo in sx_t])
marli2 = np.array([scipy.stats.norm.pdf(x=xo, loc=prior_m2.mean(),
                                        scale=np.sqrt(m2.std**2 + prior_m2.std()**2)) for xo in sx_t])

# Posterior model probabilities, assuming a uniform prior over the models
evidence = marli0 + marli1 + marli2
p_m0_xtest = marli0 / evidence
p_m1_xtest = marli1 / evidence
p_m2_xtest = marli2 / evidence
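
The calculation above can be checked by hand for a single observation. For a Gaussian prior on the mean (standard deviation 1) and Gaussian noise (standard deviation 1), the marginal likelihood of `x` under each model is a Gaussian centered at that model's prior mean with variance 1 + 1 = 2, and the posterior model probabilities are the normalized marginal likelihoods when the model prior is uniform. The following standard-library sketch uses hypothetical helper names (`normal_pdf`, `model_posterior`) and assumes the prior means 0, 1 and -1 from above.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def model_posterior(x_obs, prior_means=(0.0, 1.0, -1.0),
                    prior_std=1.0, noise_std=1.0):
    """Posterior model probabilities under a uniform model prior."""
    # prior and noise variances add up in the marginal likelihood
    marginal_std = math.sqrt(noise_std**2 + prior_std**2)
    evidences = [normal_pdf(x_obs, mu, marginal_std) for mu in prior_means]
    total = sum(evidences)
    return [e / total for e in evidences]


probs = model_posterior(0.0)
print(probs)
```

For an observation at 0, the model with prior mean 0 is the most probable, and the two models with prior means 1 and -1 are equally probable by symmetry.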

In [57]:
ptrue = np.vstack((p_m0_xtest, p_m1_xtest, p_m2_xtest))

In [58]:
np.abs(phat - ptrue).mean()

0.04090708698630825