-
Notifications
You must be signed in to change notification settings - Fork 2.1k
fix bug in loo and p_loo computation, also return waic and loo result as dataframes #1765
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pymc3/stats.py
Outdated
elif pointwise: | ||
return waic, waic_se, waic_i, p_waic | ||
if pointwise: | ||
return pd.DataFrame([[waic, waic_se, p_waic, waic_i]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a pd.Series I think (or maybe a namedtuple).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think namedtuple would be best here as we also use it for e.g. ADVI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the same, but Jupyter renders DataFrame in a nicer way. And I am also thinking this is more consistent with a compare function (that I need to write) that will display waic/loo results for several models as a DataFrame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twiecki still thinking namedtuple is the way to go? If so let me now and I will change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your argument is purely for display purposes? I'm not convinced that should guide API and data-structure choices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible to convert, but a bit clunky:
from collections import namedtuple
Point = namedtuple('test', ['x', 'y'])
p = Point(1, 2)
pd.Series(p._asdict()).to_frame()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep purely aesthetics. OK. I will use namedtuples. Thanks for the review.
pymc3/stats.py
Outdated
w = np.minimum(r_new, r_new.mean(axis=0) * S**0.75) | ||
|
||
loo_lppd_i = -2.0 * logsumexp(log_py, axis = 0, b = w / np.sum(w, axis = 0)) | ||
loo_lppd_i = - 2. * logsumexp(log_py, axis=0, b=w/np.sum(w, axis=0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
b=w / np.sum...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spaces around math operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ups! I will fix it
Thanks @aloctavodia! |
While trying to address what we discuss in #1677 I found that a previous PR introduced an error in the computation of
p_loo
(a variable was undefined). This PR fix it. I also found a bug in the computation of lppd_loo related to the unsorting of the importance ratios. Now the results ofpm.loo
are closer to those reported for the 8 school problem in the papers "Understanding predictive information criteria for Bayesian models" and "Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC".