
aloctavodia
Member

While trying to address what we discussed in #1677, I found that a previous PR introduced an error in the computation of p_loo (a variable was undefined). This PR fixes it. I also found a bug in the computation of lppd_loo related to the unsorting of the importance ratios. The results of pm.loo are now closer to those reported for the eight schools problem in the papers "Understanding predictive information criteria for Bayesian models" and "Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC".
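
For context, here is a minimal sketch of importance-sampling LOO on a log-likelihood matrix `log_py` (posterior draws × observations). It is not the pymc3 implementation: the function name `loo_sketch` is hypothetical, the Pareto-smoothing step is omitted (only the place where it would act on the sorted ratios is marked), and the truncation rule is copied from the `w = np.minimum(...)` line in the stats.py diff. It shows why the ratios must be restored to draw order after sorting, and computes p_loo as lppd minus elpd_loo.

```python
import numpy as np
from scipy.special import logsumexp


def loo_sketch(log_py):
    """Importance-sampling LOO from an (S draws, n observations) log-likelihood matrix.

    Returns (loo, p_loo), with loo on the deviance scale (-2 * elpd_loo).
    Pareto smoothing of the largest ratios is NOT implemented here.
    """
    S = log_py.shape[0]
    # Raw leave-one-out importance ratios r_s = 1 / p(y_i | theta_s), in log space
    log_r = -log_py
    # Smoothing works on each column's ratios in ascending order...
    order = np.argsort(log_r, axis=0)
    log_r_sorted = np.take_along_axis(log_r, order, axis=0)
    # (Pareto smoothing of the sorted tail would modify log_r_sorted here)
    # ...so the result must be put back in draw order, otherwise each ratio
    # is paired with the wrong row of log_py below
    inverse = np.argsort(order, axis=0)
    log_r_new = np.take_along_axis(log_r_sorted, inverse, axis=0)
    # Rescale by the column max before exponentiating to avoid overflow;
    # truncation and normalisation below are invariant to this scale
    r_new = np.exp(log_r_new - log_r_new.max(axis=0))
    # Truncate at mean * S**0.75, the rule used in the diff under review
    w = np.minimum(r_new, r_new.mean(axis=0) * S**0.75)
    # Log of the self-normalised weighted average of p(y_i | theta_s)
    elpd_loo_i = logsumexp(log_py, axis=0, b=w / np.sum(w, axis=0))
    # Log pointwise predictive density using all draws
    lppd_i = logsumexp(log_py, axis=0) - np.log(S)
    p_loo = np.sum(lppd_i - elpd_loo_i)
    return -2.0 * np.sum(elpd_loo_i), p_loo
```

Because smoothing is left out, the sort/unsort round trip above is a no-op; once smoothing is in place, skipping the inverse permutation would pair each smoothed ratio with the wrong draw of `log_py`.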

pymc3/stats.py (outdated)
-elif pointwise:
-    return waic, waic_se, waic_i, p_waic
+if pointwise:
+    return pd.DataFrame([[waic, waic_se, p_waic, waic_i]],
Member

This should be a pd.Series I think (or maybe a namedtuple).

Member

Yeah, I think a namedtuple would be best here, since we also use one for, e.g., ADVI.
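
A sketch of what a namedtuple return value could look like. `WAICResult`, its field names and `waic_sketch` are hypothetical, and the formulas follow the usual definitions in "Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC" rather than the code in this PR.

```python
from collections import namedtuple

import numpy as np
from scipy.special import logsumexp

# Hypothetical result type; name and fields are illustrative only
WAICResult = namedtuple('WAICResult', ['waic', 'waic_se', 'p_waic'])


def waic_sketch(log_py):
    """WAIC from an (S draws, n observations) log-likelihood matrix."""
    S = log_py.shape[0]
    # Log pointwise predictive density of each observation
    lppd_i = logsumexp(log_py, axis=0) - np.log(S)
    # Effective number of parameters: posterior variance of the log-likelihood
    p_waic_i = np.var(log_py, axis=0, ddof=1)
    # Pointwise WAIC on the deviance scale
    waic_i = -2.0 * (lppd_i - p_waic_i)
    # Standard error from the spread of the pointwise values
    waic_se = np.sqrt(len(waic_i) * np.var(waic_i))
    return WAICResult(np.sum(waic_i), waic_se, np.sum(p_waic_i))
```

A namedtuple still unpacks like the plain tuple the function returned before (`waic, waic_se, p_waic = res`) while also allowing access by name (`res.waic`).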

Member Author

I thought the same, but Jupyter renders a DataFrame more nicely. I also think this is more consistent with a compare function (which I still need to write) that will display WAIC/LOO results for several models as a DataFrame.
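
Regarding the planned compare function: pandas builds a DataFrame directly from a list of namedtuples, taking the column names from the tuple's fields, so a namedtuple return type need not get in the way of a tabular summary. A minimal sketch; `compare_sketch`, `WAICResult` and the lowest-WAIC-first ordering are assumptions for illustration.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical result type, as it might be returned by a waic() function
WAICResult = namedtuple('WAICResult', ['waic', 'waic_se', 'p_waic'])


def compare_sketch(results):
    """Tabulate a {model_name: WAICResult} dict, lowest WAIC first."""
    names = list(results)
    # pandas takes the column names from the namedtuple fields
    df = pd.DataFrame([results[name] for name in names], index=names)
    return df.sort_values('waic')
```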

Member Author

@twiecki, do you still think namedtuple is the way to go? If so, let me know and I will change it.

Member

Your argument is purely for display purposes? I'm not convinced that should guide API and data-structure choices.

Member

It's possible to convert, but a bit clunky:

import pandas as pd
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
pd.Series(p._asdict()).to_frame()

Member Author

Yep, purely aesthetics. OK, I will use namedtuples. Thanks for the review.

pymc3/stats.py (outdated)
w = np.minimum(r_new, r_new.mean(axis=0) * S**0.75)

-loo_lppd_i = -2.0 * logsumexp(log_py, axis = 0, b = w / np.sum(w, axis = 0))
+loo_lppd_i = - 2. * logsumexp(log_py, axis=0, b=w/np.sum(w, axis=0))
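
The `b` argument of `scipy.special.logsumexp` scales each term, returning log(sum(b * exp(a))); with the normalised truncated weights this is the log of a weighted average of p(y_i | theta_s), computed without overflow. A toy check on random data (sizes and values are arbitrary, and the raw ratios stand in for the smoothed `r_new`), written with the PEP 8 spacing the reviewers ask for:

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(1)
S = 1000  # number of posterior draws (arbitrary)
# Toy log-likelihood matrix: rows are draws, columns are observations
log_py = rng.normal(-2.0, 0.3, size=(S, 4))

# Raw ratios stand in for the smoothed r_new of the diff
r_new = np.exp(-log_py)
w = np.minimum(r_new, r_new.mean(axis=0) * S**0.75)

# logsumexp(a, b=c) returns log(sum(c * exp(a))), evaluated stably
loo_lppd_i = -2.0 * logsumexp(log_py, axis=0, b=w / np.sum(w, axis=0))

# The same quantity written out directly, for comparison
naive = -2.0 * np.log(np.sum(w * np.exp(log_py), axis=0) / np.sum(w, axis=0))
```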
Member

b=w / np.sum...

Member

Spaces around math operators.

Member Author

Oops! I will fix it.

twiecki merged commit 39ee53b into pymc-devs:master on Feb 9, 2017
Member

twiecki commented Feb 9, 2017

Thanks @aloctavodia!

aloctavodia deleted the waic_loo branch on February 9, 2017 20:46

2 participants