fix bug in loo and p_loo computation, also return waic and loo result as dataframes #1765

aloctavodia · 2017-02-09T13:02:30Z

While trying to address what we discussed in #1677, I found that a previous PR introduced an error in the computation of p_loo (a variable was undefined). This PR fixes it. I also found a bug in the computation of lppd_loo related to the unsorting of the importance ratios. With both fixes, the results of pm.loo are closer to those reported for the eight schools problem in the papers "Understanding predictive information criteria for Bayesian models" and "Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC".
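The diff for the unsorting fix is not shown in this thread, so the following is only a minimal sketch of the general pitfall, not the code from this PR. It assumes importance ratios are stored as a (draws, observations) array, sorted per observation for tail processing, and then returned to draw order so they line up with the log-likelihood draws again. The array `r` and its values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical importance ratios: 6 posterior draws (rows), 3 observations (columns).
r = rng.lognormal(size=(6, 3))

# Sort each column independently and remember the permutation that was applied.
order = np.argsort(r, axis=0)
r_sorted = np.take_along_axis(r, order, axis=0)

# ... any processing of the sorted tail would happen on r_sorted here ...

# Undo the sort: the argsort of a permutation is its inverse permutation.
inverse = np.argsort(order, axis=0)
r_restored = np.take_along_axis(r_sorted, inverse, axis=0)
```

Indexing `r_sorted` with `order` instead of `inverse` is the easy mistake here: it applies the permutation a second time rather than undoing it, and the ratios no longer match their draws.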


twiecki · 2017-02-09T13:14:03Z

pymc3/stats.py

-    elif pointwise:
-        return waic, waic_se, waic_i, p_waic
+    if pointwise:
+        return pd.DataFrame([[waic, waic_se, p_waic, waic_i]],


This should be a pd.Series I think (or maybe a namedtuple).

Yeah, I think namedtuple would be best here as we also use it for e.g. ADVI.

I thought the same, but Jupyter renders DataFrame in a nicer way. And I am also thinking this is more consistent with a compare function (that I need to write) that will display waic/loo results for several models as a DataFrame.

@twiecki still thinking namedtuple is the way to go? If so, let me know and I will change it.

Your argument is purely for display purposes? I'm not convinced that should guide API and data-structure choices.

It's possible to convert, but a bit clunky:

    from collections import namedtuple
    import pandas as pd

    Point = namedtuple('test', ['x', 'y'])
    p = Point(1, 2)
    pd.Series(p._asdict()).to_frame()

Yep, purely aesthetics. OK, I will use namedtuples. Thanks for the review.
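To make the trade-off discussed above concrete, here is a hedged sketch of a namedtuple return value that can still be turned into a DataFrame for display. The names `WAICResult` and `summarize` and the numbers are hypothetical placeholders, not the names or values used in pymc3.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical result type; the field names mirror the values in the diff above.
WAICResult = namedtuple("WAICResult", ["waic", "waic_se", "p_waic"])


def summarize(waic, waic_se, p_waic):
    """Bundle the three scalars into a lightweight, immutable record."""
    return WAICResult(waic, waic_se, p_waic)


res = summarize(61.2, 2.1, 0.9)  # placeholder numbers, for illustration only

# Callers can unpack positionally or access fields by name.
waic, waic_se, p_waic = res

# Conversion for notebook display stays a one-liner on the caller's side.
frame = pd.DataFrame([res._asdict()])
```

The function returns a plain data structure, and the display choice is left to the caller, which is the separation argued for in the review.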

twiecki · 2017-02-09T13:14:26Z

pymc3/stats.py

    w = np.minimum(r_new, r_new.mean(axis=0) * S**0.75)

-    loo_lppd_i = -2.0 * logsumexp(log_py, axis = 0, b = w / np.sum(w, axis = 0))
+    loo_lppd_i = - 2. * logsumexp(log_py, axis=0, b=w/np.sum(w, axis=0))


b=w / np.sum...

Spaces around math operators.

Oops! I will fix it.
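For readers unfamiliar with the `b` argument, the sketch below shows what the weighted `logsumexp` call in the diff computes. It is an illustration under stated assumptions: `log_py` is simulated, raw ratios stand in for the smoothed `r_new` from the PR, and `scipy.special.logsumexp` is used here regardless of which import the original module relied on.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(1)
S, n = 200, 4  # hypothetical sizes: posterior draws, observations

# Simulated pointwise log-likelihoods log p(y_i | theta_s), shape (S, n).
log_py = rng.normal(-1.0, 0.3, size=(S, n))

# Raw importance ratios 1 / p(y_i | theta_s), used here in place of r_new.
r = np.exp(-log_py)

# Truncate large weights, as in the diff line above.
w = np.minimum(r, r.mean(axis=0) * S**0.75)

# Normalize per observation and pass the weights through `b`.
b = w / np.sum(w, axis=0)
loo_lppd_i = -2.0 * logsumexp(log_py, axis=0, b=b)

# Same quantity written out directly: -2 * log(sum_s b_s * p(y_i | theta_s)).
direct = -2.0 * np.log(np.sum(b * np.exp(log_py), axis=0))
```

The two expressions agree numerically; `logsumexp` is preferred because it avoids underflow when the log-likelihoods are very negative.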

twiecki · 2017-02-09T20:42:56Z

Thanks @aloctavodia!

fix bug in loo and p_loo computation, also return waic and loo results as dataframes

363084f

twiecki reviewed Feb 9, 2017


aloctavodia added 2 commits February 9, 2017 10:33

autopep8

07402b1

use namedtuple instead of dataframe, update model comparison example

6c23f77

twiecki merged commit 39ee53b into pymc-devs:master Feb 9, 2017

aloctavodia deleted the waic_loo branch February 9, 2017 20:46
