## Analysis problem
This project analyzes the number of deaths from viral hepatitis and sequelae of viral hepatitis in 25 European countries. The dataset is split into two files:
 - one contains the total number of deaths from all causes, from 2001 to 2010
 - the other contains the number of deaths from viral hepatitis and sequelae of viral hepatitis, from 2001 to 2010

The data in the first file are used as a normalization factor for the number of deaths from viral hepatitis. The quantity we analyze is therefore the ratio of viral hepatitis deaths to the total number of deaths in each country.
Our objective is to predict, for each country, the number of deaths from viral hepatitis in the following years. We decided not to make predictions for countries outside the list: even if the parameters of our models followed a common hyperdistribution, the differences among countries could be large enough to make such predictions unreliable.
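
As a minimal sketch of the normalization described above, the ratio can be computed element-wise on two arrays of the same shape. The counts below are made-up placeholders, not values from the dataset, and the names `total_deaths` and `hepatitis_deaths` are hypothetical.

```python
import numpy as np

# Made-up counts for 2 countries x 3 years (placeholders, not real data).
total_deaths = np.array([[50000.0, 51000.0, 49500.0],
                         [120000.0, 118000.0, 121000.0]])
hepatitis_deaths = np.array([[20.0, 25.0, 22.0],
                             [60.0, 55.0, 58.0]])

# Ratio of hepatitis deaths to all deaths, per country and year.
ratio = hepatitis_deaths / total_deaths

# Logarithm of the ratio, a convenient scale for a linear Gaussian model.
log_ratio = np.log(ratio)

# Converting a log-ratio back to a death count needs a total for that year;
# here the observed total of the last year is reused as an assumption.
predicted_count = np.exp(log_ratio[:, -1]) * total_deaths[:, -1]
```

Note that a prediction on the ratio scale only becomes a number of deaths once it is multiplied by an assumed total number of deaths for the target year.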

## Model description
As a first choice we assess a separate model: each country is treated as its own group, described by a distribution that is independent of the others.

![Separate Model](./separate.png)

We also evaluate a hierarchical model in order to verify our initial assumption that the distributions of the countries are independent. In other words, we expect the hierarchical model to perform worse than the separate model, which is our first choice.

![Hierarchical Model](./hierarchial.png)

## Prior choices
Our prior assumptions are the following:
 - the data for each country $j$ are normally distributed, $y_{ij}\mid\theta_j \sim \mathcal{N}(\mu_{ij}, \sigma_j)$
 - as prior distribution we use an uninformative flat prior, $\theta_j \sim \mathcal{U}([0,1])=\mathrm{Beta}(1,1)$

In the separate model we fit a linear Gaussian model for each group independently:
$$\mu_{ij} = \alpha_j + \beta_j x_{ij}$$
For the hierarchical model we add a layer of hyperdistributions for $\alpha_j$, $\beta_j$ and $\sigma_j$:
$$\alpha_j \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha)$$
$$\beta_j \sim \mathcal{N}(\mu_\beta, \sigma_\beta)$$
$$\sigma_j \sim \text{Inv-}\chi^2(\sigma^2_0, \nu_0)$$
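
To make the hierarchical structure concrete, the following sketch simulates data from it. Every hyperparameter value is an arbitrary assumption chosen for illustration, the years are centered, and a single fixed `sigma` is used instead of drawing it from its prior.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 25                              # number of countries
years = np.arange(2001, 2011)       # observation years
x = years - years.mean()            # centered predictor (assumption of this sketch)

# Hyperparameters: arbitrary illustrative values, not estimates.
mu_alpha, sigma_alpha = -7.8, 0.3
mu_beta, sigma_beta = 0.0, 0.05
sigma = 0.25                        # fixed here instead of drawn from its prior

# Country-level intercepts and slopes drawn from the common hyperdistributions.
alpha = rng.normal(mu_alpha, sigma_alpha, size=K)
beta = rng.normal(mu_beta, sigma_beta, size=K)

# Linear predictor alpha_j + beta_j * x for every country, shape (K, 10).
mu = alpha[:, None] + beta[:, None] * x[None, :]

# Simulated observations, normally distributed around the linear predictor.
y_sim = rng.normal(mu, sigma)
```

In the separate model, by contrast, each `alpha[j]` and `beta[j]` would have its own unrelated prior, so nothing learned about one country informs another.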


In [3]:
import pystan
import matplotlib.pyplot as plt
import pickle
import numpy as np
import pandas as pd
import stan_utility
import matplotlib as mpl

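# skiprows=1 drops the header line; column 0 holds the country name
d = pd.read_csv("../dataset/deads.txt", sep=" ", header=None, skiprows=1)
h = pd.read_csv("../dataset/hepatitis.txt", sep=" ", header=None, skiprows=1)
# .values replaces DataFrame.as_matrix(), which was removed in pandas 1.0
countries = d[0].values
d = d.iloc[:, 1:].values   # total deaths, shape (countries, years)
h = h.iloc[:, 1:].values   # viral hepatitis deaths, same shape

# ratio of viral hepatitis deaths to all deaths
data = h/d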

# Separate model

In [14]:
separate_model_code = '''
data {
    int<lower=0> N;
    vector[N] x; // year of each observation
    vector[N] y; // log of the death ratio
    real xpred;  // year to predict
}
parameters {
    real alpha;
    real beta;
    real<lower=0> sigma;
}

transformed parameters {
  vector[N] mu;
  mu = alpha + beta*x;
}

model {
    y ~ normal(mu, sigma);
}

generated quantities {
    real ypred;
    vector[N] log_lik;
    ypred = normal_rng(alpha + beta*xpred, sigma);
    for (i in 1:N)
        log_lik[i] = normal_lpdf(y[i] | mu[i], sigma);
}
'''

sm = stan_utility.compile_model_plus(separate_model_code)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_570544391b1e440e4aa07d2bfc218092 NOW.


In [15]:
N = 10                       # observations (years) per country
x = np.arange(2001, 2011)    # years 2001..2010
xpred = 2011                 # year to predict
K = 25                       # number of countries

samples = []
for i in range(K):
    y = np.log(data[i]).ravel()   # log death ratio for country i
    separate_model_data = dict(
        N = N,
        x = x,  # years
        y = y,  # observations
        xpred=xpred
    )

    samples.append(sm.sampling(n_jobs=4, data=separate_model_data))

    print('Completed: group ', i)
print('Completed: separate model')

Completed: group  0
Completed: group  1
Completed: group  2
Completed: group  3
Completed: group  4
Completed: group  5
Completed: group  6
Completed: group  7
Completed: group  8
Completed: group  9
Completed: group  10
Completed: group  11
Completed: group  12
Completed: group  13
Completed: group  14
Completed: group  15
Completed: group  16
Completed: group  17
Completed: group  18
Completed: group  19
Completed: group  20
Completed: group  21
Completed: group  22
Completed: group  23
Completed: group  24
Completed: separate model


# Hierarchical

In [5]:
hierarchical_model_code = '''
data {
  int<lower=0> N; // number of data points
  int<lower=0> K; // number of groups
  int<lower=1,upper=K> x[N]; // group indicator
  vector[N] y; // observations (log death ratio)
}
parameters {
  vector[K] alpha;        // group intercepts
  vector[K] beta;         // group slopes
  real<lower=0> sigma; // common std
}

transformed parameters {
    vector[N] mu;
    // x[i] is the group indicator: it selects the group and is also the regressor
    for (i in 1:N)
        mu[i] = alpha[x[i]] + beta[x[i]]*x[i];
}

model {
  y ~ normal(mu[x], sigma);
}
'''

sm_hierarchical = stan_utility.compile_model_plus(hierarchical_model_code)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_777298a12a6558ff9d2f3e08b729a76d NOW.


In [17]:
K = 2                    # number of countries used in this run (first K of the 25)
nj = 10                  # observations (years) per country
N = nj*K
x = np.array([i for i in range(1,K+1) for j in range(nj)])  # group indicator, 1-based
y = np.log(data[0:K]).ravel()  # observations
xpred=K+1

hierarchical_model_data = dict(
    N = N,
    K = K,  # number of countries
    x = x,  # group indicators
    y = y,  # observations
    xpred=xpred
)

samples_hierarchical = sm_hierarchical.sampling(n_jobs=1, data=hierarchical_model_data)

print('Completed - Hierarchical model')

Completed - Hierarchical model


## How the Stan model is run

### Separate
Given that we are using a separate model, we created a Stan model which is run once on the data of each country. This gives us K predictions, one for each analysed country.

### Hierarchical
In the hierarchical model we created a single Stan model that analyses the whole dataset at once, treating each country's `alpha` (respectively `beta`) as following a common hyperdistribution across all countries.
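
One way such a model could be laid out is sketched below. This is a hypothetical variant, not the program compiled in this notebook: it passes the country index `g` and the centered year `x` as separate inputs, and it gives `alpha` and `beta` explicit hyperdistributions whose hyperparameters keep Stan's default flat priors. The names `hierarchical_sketch_code`, `make_hierarchical_data`, `g`, `mu_alpha`, `sigma_alpha`, `mu_beta` and `sigma_beta` are assumptions introduced for this illustration.

```python
import numpy as np

# Hypothetical Stan program: country index and year are separate inputs,
# and alpha/beta share common hyperdistributions.
hierarchical_sketch_code = '''
data {
  int<lower=1> N;               // total number of observations
  int<lower=1> K;               // number of countries
  int<lower=1,upper=K> g[N];    // country index of each observation
  vector[N] x;                  // centered year
  vector[N] y;                  // log death ratio
}
parameters {
  real mu_alpha;
  real mu_beta;
  real<lower=0> sigma_alpha;
  real<lower=0> sigma_beta;
  vector[K] alpha;              // country intercepts
  vector[K] beta;               // country slopes
  real<lower=0> sigma;          // common std
}
model {
  alpha ~ normal(mu_alpha, sigma_alpha);
  beta ~ normal(mu_beta, sigma_beta);
  y ~ normal(alpha[g] + beta[g] .* x, sigma);
}
'''

def make_hierarchical_data(log_ratio, first_year=2001):
    """Flatten a (K, n_years) array of log-ratios into long format."""
    K, n_years = log_ratio.shape
    years = np.arange(first_year, first_year + n_years)
    x_centered = years - years.mean()
    return dict(
        N=K * n_years,
        K=K,
        g=np.repeat(np.arange(1, K + 1), n_years),  # 1-based country index
        x=np.tile(x_centered, K),                   # centered year, repeated per country
        y=log_ratio.ravel(),                        # row-major order matches g and x
    )

# Placeholder input with the same shape as the real data (25 countries x 10 years).
example = make_hierarchical_data(np.full((25, 10), -7.8))
```

Keeping the country index and the year in two separate arrays lets the slope act on time, while the index only selects which `alpha` and `beta` apply to each observation.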

In [None]:
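# samples is a list with one fit per country; look at the first one
draws = samples[0].extract(permuted=True)
plt.hist(np.exp(draws['ypred']), 50)
plt.xlabel('y-prediction for x={}'.format(2011))
plt.show()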

In [None]:
color_scatter = 'C0'  # 'C0' for default color #0
color_line = 'C1'     # 'C1' for default color #1

color_shade = (
    1 - 0.1*(1 - np.array(mpl.colors.to_rgb(color_line)))
)

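years = np.arange(2001, 2011)
mu_draws = samples[0].extract(permuted=True)['mu']  # shape (draws, 10), first country
y_obs = np.log(data[0])                             # observed log ratio, first country

plt.fill_between(
    years,
    np.percentile(mu_draws, 5, axis=0),
    np.percentile(mu_draws, 95, axis=0),
    color=color_shade
)

plt.plot(
    years,
    np.percentile(mu_draws, 50, axis=0),
    color=color_line,
    linewidth=1
)

plt.scatter(years, y_obs, 5, color=color_scatter)
plt.xlim((2001, 2010))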
plt.tight_layout()
plt.show()

## Convergence analysis
### Separate model

In [16]:
for convergence in samples:
    print(convergence)

Inference for Stan model: anon_model_570544391b1e440e4aa07d2bfc218092.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

             mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha       71.85    5.91   56.7 -40.11  38.11  72.28  107.0 188.67     92   1.06
beta        -0.04  2.9e-3   0.03   -0.1  -0.06  -0.04  -0.02   0.02     92   1.06
sigma        0.25  8.6e-3   0.08   0.14   0.19   0.23   0.28   0.47     92   1.04
mu[0]        -7.7    0.01   0.15  -8.01  -7.79   -7.7  -7.61  -7.41    121   1.04
mu[1]       -7.74    0.01   0.13   -8.0  -7.82  -7.74  -7.66  -7.49    140   1.04
mu[2]       -7.78  8.0e-3   0.11   -8.0  -7.84  -7.78  -7.72  -7.57    183   1.03
mu[3]       -7.82  4.9e-3   0.09   -8.0  -7.88  -7.82  -7.77  -7.63    350   1.01
mu[4]       -7.86  1.6e-3   0.08  -8.02  -7.91  -7.86  -7.81  -7.69   2513    1.0
mu[5]        -7.9  1.4e-3   0.08  -8.07  -7.95   -7.9  -7.85  -7.74   337

Inference for Stan model: anon_model_570544391b1e440e4aa07d2bfc218092.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

             mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha      -177.1   10.19  59.42 -280.5 -225.8 -180.9 -137.7 -50.42     34   1.09
beta         0.08  5.1e-3   0.03   0.02   0.06   0.09   0.11   0.14     34   1.09
sigma        0.28    0.01   0.08   0.17   0.23   0.25   0.32   0.46     35    1.1
mu[0]       -7.92    0.02   0.16  -8.22  -8.03  -7.93  -7.82  -7.58     45   1.06
mu[1]       -7.84    0.02   0.14  -8.09  -7.92  -7.85  -7.75  -7.55     55   1.05
mu[2]       -7.75    0.01   0.12  -7.97  -7.83  -7.76  -7.68  -7.51     95   1.03
mu[3]       -7.67  6.8e-3    0.1  -7.86  -7.73  -7.67  -7.61  -7.46    220   1.02
mu[4]       -7.58  1.7e-3   0.09  -7.77  -7.64  -7.58  -7.53   -7.4   3018    1.0
mu[5]        -7.5  1.7e-3   0.09  -7.68  -7.56   -7.5  -7.44  -7.32   291

Inference for Stan model: anon_model_570544391b1e440e4aa07d2bfc218092.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

             mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha      -197.5    8.15   95.7 -381.3 -260.0 -200.0 -135.6  -3.38    138   1.03
beta         0.09  4.1e-3   0.05-2.1e-3   0.06    0.1   0.13   0.19    138   1.03
sigma        0.41    0.01   0.14   0.24   0.32   0.38   0.46   0.79    163   1.01
mu[0]       -7.85    0.02   0.26  -8.36  -8.01  -7.85   -7.7  -7.33    195   1.02
mu[1]       -7.75    0.01   0.22  -8.18  -7.89  -7.76  -7.62  -7.31    231   1.02
mu[2]       -7.66    0.01   0.18  -8.01  -7.77  -7.66  -7.55  -7.28    317   1.01
mu[3]       -7.56  6.3e-3   0.16  -7.86  -7.66  -7.56  -7.47  -7.25    617    1.0
mu[4]       -7.47  2.2e-3   0.14  -7.74  -7.55  -7.47  -7.38  -7.19   4000    1.0
mu[5]       -7.37  2.2e-3   0.14  -7.65  -7.46  -7.37  -7.29   -7.1   400

Inference for Stan model: anon_model_570544391b1e440e4aa07d2bfc218092.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

             mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha       29.49    4.68  39.72 -54.41   4.99  30.01  55.99 106.46     72   1.04
beta        -0.02  2.3e-3   0.02  -0.06  -0.03  -0.02-6.4e-3   0.02     72   1.04
sigma        0.18  7.5e-3   0.06   0.11   0.14   0.17   0.21   0.34     68   1.09
mu[0]       -7.76  9.3e-3   0.11  -7.98  -7.83  -7.76   -7.7  -7.56    134   1.02
mu[1]       -7.78  7.1e-3   0.09  -7.97  -7.84  -7.78  -7.73  -7.61    167   1.02
mu[2]        -7.8  4.9e-3   0.08  -7.97  -7.85   -7.8  -7.75  -7.65    257   1.01
mu[3]       -7.82  2.6e-3   0.07  -7.96  -7.86  -7.82  -7.78  -7.68    717    1.0
mu[4]       -7.84  1.1e-3   0.06  -7.97  -7.88  -7.84   -7.8  -7.72   3617    1.0
mu[5]       -7.86  1.5e-3   0.07  -7.99   -7.9  -7.86  -7.82  -7.73   193

Inference for Stan model: anon_model_570544391b1e440e4aa07d2bfc218092.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

             mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha         0.6   10.09  81.99 -171.5 -50.53   2.32  54.87 158.79     66   1.08
beta      -4.4e-3  5.0e-3   0.04  -0.08  -0.03-5.3e-3   0.02   0.08     66   1.08
sigma        0.34  7.9e-3   0.11    0.2   0.27   0.32   0.39    0.6    184   1.01
mu[0]       -8.21    0.02   0.21  -8.65  -8.34   -8.2  -8.08  -7.81     94   1.05
mu[1]       -8.22    0.02   0.18  -8.59  -8.32  -8.21  -8.11  -7.87    112   1.04
mu[2]       -8.22    0.01   0.15  -8.53  -8.31  -8.22  -8.13  -7.92    157   1.03
mu[3]       -8.23  6.9e-3   0.13  -8.49   -8.3  -8.22  -8.15  -7.97    331   1.01
mu[4]       -8.23  2.0e-3   0.11  -8.46   -8.3  -8.23  -8.16   -8.0   3104    1.0
mu[5]       -8.23  3.1e-3   0.11  -8.46   -8.3  -8.24  -8.17   -8.0   134

Inference for Stan model: anon_model_570544391b1e440e4aa07d2bfc218092.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

             mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha      268.85   19.27 194.64 -103.8 153.03 264.14 384.13 670.64    102    1.0
beta        -0.14  9.6e-3    0.1  -0.34   -0.2  -0.14  -0.08   0.05    102    1.0
sigma        0.88    0.02   0.29   0.51   0.69   0.82    1.0   1.63    163   1.02
mu[0]       -7.55    0.04   0.53  -8.58  -7.88  -7.57  -7.24  -6.47    156    1.0
mu[1]       -7.69    0.03   0.45  -8.55  -7.97  -7.71  -7.42  -6.79    191    1.0
mu[2]       -7.83    0.02   0.38  -8.58  -8.06  -7.84   -7.6  -7.07    275    1.0
mu[3]       -7.97    0.01   0.33  -8.63  -8.17  -7.97  -7.77  -7.31    650    1.0
mu[4]       -8.11  5.5e-3    0.3  -8.73  -8.28   -8.1  -7.93  -7.52   3020    1.0
mu[5]       -8.25  6.3e-3    0.3  -8.85  -8.42  -8.24  -8.07  -7.66   232

### Hierarchical model

In [18]:
samples_hierarchical

Inference for Stan model: anon_model_777298a12a6558ff9d2f3e08b729a76d.
4 chains, each with iter=10000; warmup=5000; thin=1; 
post-warmup draws per chain=5000, total post-warmup draws=20000.

           mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha[0] 1359.3  2651.6 3750.0  -4431  -1993 1195.6 5339.5 6707.6      2   7.26
alpha[1] -1.9e4   2.4e4  4.9e4 -1.7e5 -4.1e4  -8214  1.2e4  5.5e4      4   1.94
beta[0]   -1367  2651.6 3750.0  -6715  -5347  -1203 1985.3 4422.8      2   7.26
beta[1]  -1.5e4   4.4e4  9.9e4 -3.9e5 -1.1e4 662.02  3.0e4  7.8e4      5   1.42
sigma      0.52    0.05   0.11   0.35   0.43   0.51   0.58   0.81      5    1.6
mu[0]     -8.18  8.7e-4   0.12  -8.42  -8.25  -8.18   -8.1  -7.94  19410    1.0
mu[1]     -8.18  8.7e-4   0.12  -8.42  -8.25  -8.18   -8.1  -7.94  19410    1.0
mu[2]     -8.18  8.7e-4   0.12  -8.42  -8.25  -8.18   -8.1  -7.94  19410    1.0
mu[3]     -8.18  8.7e-4   0.12  -8.42  -8.25  -8.18   -8.1  -7.94  19410    1.0
mu[4]    
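
The `Rhat` column in the summaries above compares the variance between chains with the variance within chains; values close to 1 suggest the chains agree. The sketch below implements the basic, non-split Gelman-Rubin statistic for a single scalar parameter on synthetic chains. It is an illustration only: Stan reports a refined variant, so its numbers may differ, and the function name `potential_scale_reduction` is hypothetical.

```python
import numpy as np

def potential_scale_reduction(chains):
    """Basic Gelman-Rubin R-hat for one scalar parameter.

    chains: array of shape (m, n) with m chains of n post-warmup draws each.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # average within-chain variance
    var_plus = (n - 1) / n * W + B / n       # pooled estimate of the posterior variance
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(1)
mixed = rng.normal(size=(4, 1000))                        # four chains that agree
stuck = mixed + np.array([0.0, 0.0, 3.0, 3.0])[:, None]   # two chains shifted away

rhat_mixed = potential_scale_reduction(mixed)
rhat_stuck = potential_scale_reduction(stuck)
```

In this synthetic example the agreeing chains give a value close to 1, while the shifted chains give a value well above it, which is the pattern seen for `alpha` and `beta` in the hierarchical summary.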

## Posterior predictive checking
To check how well the different models fit the data, we use leave-one-out (LOO) cross-validation. In particular, we use the PSIS-LOO (Pareto smoothed importance sampling LOO) code to compute an approximate LOO-CV.
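
To show what the approximation works with, the sketch below computes plain importance-sampling LOO from a matrix of pointwise log-likelihood draws. It deliberately omits the Pareto smoothing step, so it is not PSIS-LOO itself and can be unstable when a few draws dominate the weights. The function names `log_mean_exp`, `is_loo_elpd` and `lppd` are hypothetical, and the input is synthetic.

```python
import numpy as np

def log_mean_exp(a, axis=0):
    """Numerically stable log(mean(exp(a))) along an axis."""
    a_max = a.max(axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.mean(np.exp(a - a_max), axis=axis))

def is_loo_elpd(log_lik):
    """Pointwise elpd_loo by plain importance sampling (no Pareto smoothing).

    log_lik: array of shape (S, N), log-likelihood of each of N observations
    under each of S posterior draws.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    # With importance weights 1 / p(y_i | theta_s), the LOO predictive density
    # is the harmonic mean of the likelihood over the draws.
    return -log_mean_exp(-log_lik, axis=0)

def lppd(log_lik):
    """Pointwise log predictive density without leaving anything out."""
    return log_mean_exp(np.asarray(log_lik, dtype=float), axis=0)

# Synthetic log-likelihood draws: 4000 draws for 10 observations.
rng = np.random.default_rng(2)
log_lik_example = rng.normal(loc=-0.5, scale=0.3, size=(4000, 10))

elpd_loo = is_loo_elpd(log_lik_example).sum()
lppd_total = lppd(log_lik_example).sum()
```

The same functions could be applied to the `log_lik` draws extracted from each separate-model fit. The difference `lppd_total - elpd_loo` is never negative and is commonly read as an effective number of parameters.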

## Model comparison