In [3]:
import data
import models
import cache
import stanity
import seaborn as sns
%matplotlib inline

In [4]:
sns.set(context='talk')

In [5]:
import logging
cache_logger = logging.getLogger('cache')
cache_logger.setLevel(logging.INFO)

In [6]:
by = 'cell_type'
sample_n = 500

## load data

Load the data as we did in previous notebooks, so that we can compare predictive performance on the same observations.

In [7]:
sample_df = cache.cached(models.prep_sample_df, sample_n=sample_n)

INFO:cache:prep_sample_df: cache_filename set to prep_sample_df.cached.sample_n_31194724242.pkl
INFO:cache:prep_sample_df: Loading result from cache


In [8]:
stan_data = models.prep_stan_data(sample_df, by=by)

## model3 -- using Poisson distribution

Load the results of fitting model3 from the cache.

In [9]:
model3 = models.get_model_file(model_name='model3')
fit3 = models.cached_stan_fit(file=model3,
                              fit_cachefile='model3.model_code_73307426795.stanfit.chains_81232192355.data_35729880489.iter_31194724242.seed_57902074806.pkl')

INFO:cache:Step 1: Get compiled model code, possibly from cache
INFO:cache:StanModel: cache_filename set to model3.model_code_73307426795.stanmodel.pkl
INFO:cache:StanModel: Loading result from cache
INFO:cache:Step 2: Get posterior draws from model, possibly from cache
INFO:cache:sampling: cache_filename set to model3.model_code_73307426795.stanfit.chains_81232192355.data_35729880489.iter_31194724242.seed_57902074806.pkl
INFO:cache:sampling: Loading result from cache
INFO:cache:Fit cachefile: model3.model_code_73307426795.stanfit.chains_81232192355.data_35729880489.iter_31194724242.seed_57902074806.pkl


In [10]:
models.print_stan_summary(fit3, pars='lp__')

              mean   se_mean         sd          2.5%           50%         97.5%      Rhat
lp__  5.281721e+07  1.534292  30.609025  5.281715e+07  5.281721e+07  5.281727e+07  1.014739


## model4 -- using negative binomial distribution

Load the results of fitting model4 from the cache.

In [11]:
model4 = models.get_model_file(model_name='model4')
fit4 = models.cached_stan_fit(file=model4,
                              fit_cachefile='model4.model_code_35447780597.stanfit.chains_81232192355.data_35729880489.iter_31194724242.seed_57902074806.pkl'
                             )

INFO:cache:Step 1: Get compiled model code, possibly from cache
INFO:cache:StanModel: cache_filename set to model4.model_code_35447780597.stanmodel.pkl
INFO:cache:StanModel: Loading result from cache
INFO:cache:Step 2: Get posterior draws from model, possibly from cache
INFO:cache:sampling: cache_filename set to model4.model_code_35447780597.stanfit.chains_81232192355.data_35729880489.iter_31194724242.seed_57902074806.pkl
INFO:cache:sampling: Loading result from cache
INFO:cache:Fit cachefile: model4.model_code_35447780597.stanfit.chains_81232192355.data_35729880489.iter_31194724242.seed_57902074806.pkl


In [12]:
models.print_stan_summary(fit4, pars='lp__')

              mean   se_mean         sd          2.5%           50%         97.5%      Rhat
lp__  5.383965e+07  2.126517  36.954924  5.383958e+07  5.383966e+07  5.383972e+07  1.002845


## Summarize PSIS-LOO for each model

Summarize leave-one-out (LOO) predictive performance for model3 & model4, using the Pareto-smoothed importance sampling (PSIS) method to approximate true cross-validation (CV) performance without refitting the model once per observation.
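To make the idea concrete, here is a minimal sketch of plain importance-sampling LOO computed from a matrix of pointwise log-likelihoods. It is an illustration only: it deliberately omits the Pareto smoothing step that PSIS adds to stabilize the weights, it is not how `stanity.psisloo` is implemented, and the function name `naive_is_loo` and the toy data are assumptions made for this example.

```python
import math
import random

def naive_is_loo(log_lik):
    """Plain importance-sampling LOO (no Pareto smoothing).

    log_lik: list of rows, one row per posterior draw, one column per
    observation, holding log p(y_i | theta_s).
    Returns one elpd estimate per observation.
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    pointwise = []
    for i in range(n_obs):
        # Raw importance weights are 1 / p(y_i | theta_s), so the LOO
        # predictive density is the harmonic mean of the likelihoods.
        neg = [-row[i] for row in log_lik]
        m = max(neg)  # subtract the max for a stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(v - m) for v in neg))
        pointwise.append(math.log(n_draws) - log_sum)
    return pointwise

# Toy log-likelihoods (hypothetical): 1000 draws, 5 observations
rng = random.Random(0)
toy_log_lik = [[rng.gauss(-2.0, 0.3) for _ in range(5)] for _ in range(1000)]

pointwise_elpd = naive_is_loo(toy_log_lik)
elpd_loo = sum(pointwise_elpd)
```

The raw weights can have very heavy tails when a single observation is influential, which is the problem that the Pareto smoothing in PSIS is designed to diagnose and reduce.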

In [13]:
loo3 = cache.cached(stanity.psisloo, log_likelihood=fit3.extract('log_lik')['log_lik'],
                   cache_filename='model3_psisloo.by_{}.sample_n_{}.pkl'.format(by, sample_n),
                   force=False)

INFO:cache:psisloo: cache_filename set to model3_psisloo.by_cell_type.sample_n_500.pkl
INFO:cache:psisloo: Starting execution
INFO:cache:psisloo: Execution completed (0:00:42.462509 elapsed)
INFO:cache:psisloo: Saving results to cache


In [18]:
loo3.print_summary()

greater than 0.5    0.163746
greater than 1      0.088190
dtype: float64

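The LOO summary for fit3 also suggests a poor model fit. The summary reports the proportion of observations whose estimated Pareto shape parameter (k) exceeds 0.5 and 1. We expect (hope) to have no more than ~5% of observations with k > 0.5, and even fewer with k > 1; here about 16% exceed 0.5 and about 9% exceed 1. These aren't hard cutoffs; they are guidelines.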

The interpretation is along these lines:

1. Too many observations exert strong influence over our fit results, suggesting our model may be misspecified because it fails to account for these "extreme" values (extreme relative to model expectations).
2. This influence undermines the PSIS approximation to LOO performance, which we rely on in the comparison below. Essentially, performance would be worse for these observations than this approximation estimates.
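As a rough illustration of how proportions like those above can be derived, the sketch below summarizes a list of per-observation Pareto k estimates and flags the observations above the 0.5 guideline. The helper name `summarize_pareto_k` and the toy values are hypothetical; this is not the `print_summary` implementation.

```python
def summarize_pareto_k(ks):
    """Return the share of observations with k above each guideline."""
    n = len(ks)
    return {
        'greater than 0.5': sum(k > 0.5 for k in ks) / n,
        'greater than 1': sum(k > 1 for k in ks) / n,
    }

# Toy k estimates (hypothetical), one per observation
toy_ks = [0.1, 0.2, 0.6, 0.3, 1.2, 0.4, 0.05, 0.7, 0.45, 0.0]

summary = summarize_pareto_k(toy_ks)

# Indices of the observations worth inspecting individually
flagged = [i for i, k in enumerate(toy_ks) if k > 0.5]
```

Looking at the flagged observations themselves, rather than only the proportions, can show which parts of the data the model struggles with.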

In [15]:
loo4 = cache.cached(stanity.psisloo, log_likelihood=fit4.extract('log_lik')['log_lik'],
                   cache_filename='model4_psisloo.by_{}.sample_n_{}.pkl'.format(by, sample_n),
                   force=False)

INFO:cache:psisloo: cache_filename set to model4_psisloo.by_cell_type.sample_n_500.pkl
INFO:cache:psisloo: Starting execution
INFO:cache:psisloo: Execution completed (0:00:40.177454 elapsed)
INFO:cache:psisloo: Saving results to cache


In [16]:
loo4.print_summary()

greater than 0.5    0.010730
greater than 1      0.001048
dtype: float64

These proportions (about 1% greater than 0.5 and about 0.1% greater than 1) are more in line with our expectations.

## Compare fit using model3 & model4

In [17]:
stanity.loo_compare(loo3, loo4)

{'diff': 1061108.3575595515, 'se_diff': 47160.146922995482}

A positive difference suggests that model4 (the second argument) is a better fit than model3. How much better can be judged intuitively by comparing the magnitude of the difference to the standard error (SE) of the difference.

The short answer for this comparison is: *much* better. The difference is more than 20 times its SE.
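The arithmetic behind that comparison can be sketched as follows. The first part uses the two numbers reported above. The second part shows one common way such a difference and its SE are obtained from pointwise elpd values (sum of pointwise differences, with SE equal to the square root of n times their sample variance); whether `stanity.loo_compare` uses exactly this formula is an assumption, and the helper name and toy values are hypothetical.

```python
import math
import statistics

# Values reported by the comparison above
comparison = {'diff': 1061108.3575595515, 'se_diff': 47160.146922995482}

# How many standard errors the difference is away from zero
ratio = comparison['diff'] / comparison['se_diff']

def compare_pointwise(elpd_a, elpd_b):
    """Difference in total elpd (b minus a) and its standard error."""
    diffs = [b - a for a, b in zip(elpd_a, elpd_b)]
    n = len(diffs)
    return {
        'diff': sum(diffs),
        'se_diff': math.sqrt(n * statistics.variance(diffs)),
    }

# Toy pointwise elpd values (hypothetical), one per observation
toy_elpd_a = [-3.0, -2.5, -4.0, -3.5]
toy_elpd_b = [-2.0, -2.0, -3.0, -3.0]
toy_result = compare_pointwise(toy_elpd_a, toy_elpd_b)
```

A ratio of this size leaves little doubt about the direction of the difference, although the SE itself is only an approximation and the PSIS estimates for model3 are unreliable for the observations with large k.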