ENH refactor diagnostics to not depend on generic ValueErrors #1425
Conversation
# iterate over tuples of indices of the shape of var
inds = [y.ravel().tolist() for y in np.indices(x.shape[:-2])]
for tup in zip(*inds):  # iterate with zip
    _n_eff[tup] = get_neff(x[tup], Vhat[tup])
This section above needs some eyes. I wrote a test for the output shape, but that may not be sufficient to ensure this code is correct.
For variables with very large sizes, this way of building inds is not great, since they all have to be put into memory. However, I don't expect this to be very common, but what do I know...
Well who knew this was in the code base!
https://github.com/pymc-devs/pymc3/blob/master/pymc3/stats.py#L236
We might all benefit from using np.ndindex!
http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndindex.html
Also, the n_eff computation should be done with an FFT since it is slow for large chain lengths. This is another PR, however.
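For concreteness, a minimal sketch of what that loop could look like with np.ndindex. The fill_neff wrapper is hypothetical; only the shape convention (the trailing two axes are samples and chains, consumed by get_neff) is taken from the diff above.

import numpy as np

def fill_neff(x, Vhat, get_neff):
    # hypothetical wrapper, not from this PR: apply get_neff over the leading axes
    _n_eff = np.empty(x.shape[:-2])
    # np.ndindex yields index tuples lazily, so nothing has to be materialized in memory
    for tup in np.ndindex(*x.shape[:-2]):
        _n_eff[tup] = get_neff(x[tup], Vhat[tup])
    return _n_eff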
I am looking for a halfway review here. Is this contribution welcome? Any requested changes?
Looks nice!
I'll check the section you marked more carefully and approve or comment. My only suggestion would be to use descriptive variable names instead of comments to document the code. I marked one spot where that might happen.
pymc3/tests/test_diagnostics.py
Outdated
n_effective = effective_n(ptrace)['x']
assert_allclose(n_effective, n_jobs * n_samples, 2)

def test_effective_n_right_shape_tesnor(self):
nit: tensor
Done. I found one more too!
pymc3/diagnostics.py
Outdated
def get_neff(x, Vhat):
    # number of chains is last dim
    # chain samples are second to last dim
    m = x.shape[-1]
rename m to number_of_chains, and remove the comments?
I made a bunch of changes like this.
pymc3/diagnostics.py
Outdated
variogram = lambda t: (sum(sum((x[j][i] - x[j][i - t])**2
                               for i in range(t, n)) for j in range(m)) / (m * (n - t)))
def variogram(_t):
Can this be written as a matrix operation?
I moved this to use np.mean with the proper array offsets.
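A minimal sketch of that vectorized form, assuming x is a 2-D array of shape (n_chains, n_samples) as in the original lambda and t >= 1; the exact offsets in the PR may differ.

import numpy as np

def variogram(x, t):
    # equivalent to the double loop: average of (x[j, i] - x[j, i - t])**2
    # over all chains j and samples i >= t; np.mean divides by m * (n - t)
    return np.mean((x[:, t:] - x[:, :-t]) ** 2)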
I added a bunch more tests of the array handling as well. Hopefully there won't be any surprises!
One more thing. I went ahead and reduced the number of MCMC samples for the shape tests in order to keep the test run time about the same as before, even though we are doing more tests.
Whoops. A bug in the tests. :( This is why we write tests after all! I also updated the test strings to make them look nicer on Travis. py.test shows the full path, but nose apparently just shows the docstring.
Also, with the change to computing the variogram with np.mean, the n_eff computation is much more efficient now! (I am looking at 10k step chains.) I think passing on the FFTs would be fine for the time being.
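For reference only, the deferred FFT idea would look roughly like this for a single chain. This is a standard O(n log n) autocorrelation sketch, not code from this PR.

import numpy as np

def autocorr_fft(y):
    # y: 1-D array of samples from one chain
    n = len(y)
    y = y - y.mean()
    # zero-pad to 2 * n so the circular convolution does not wrap around
    f = np.fft.rfft(y, n=2 * n)
    acov = np.fft.irfft(f * np.conjugate(f))[:n] / n
    # normalized autocorrelation at each lag
    return acov / acov[0]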
@fonnesbeck Before this potentially gets merged, I wanted to ask. Is there a reason you didn't go with a more numpy-heavy approach from the start? Are these functions supposed to work on inputs that may not be numpy arrays?
From the Geweke score function, I think the answer is no, but it's always good to ask!
This looks great! Poked around, and it seems like this is identical in most cases, and correct in some cases that the old function failed on, like
How big are the differences from the old version? They should be at most floating-point things. Anything bigger is most likely a bug.
There were no differences (beyond returning a float instead of an int). I did what you did -- copied the old version and ran the new test suite.
@beckermr these were ported over from PyMC2, which accepted lists and tuples as well as arrays. So, these changes are appropriate for PyMC3.
I have eliminated the recursive function calls with the try/except statements in the diagnostics. I had a weird case where my debugger caught one (though I cannot reproduce that now). Furthermore, they are fragile to actual ValueErrors caused by other things in the code. Finally, the control flow of the code was very difficult to follow. I added tests for some of the fancier numpy indexing. I did some PEP8-ing as well.
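As an illustration of that pattern (the helper name and exact branches are hypothetical, not the PR's code): instead of attempting the multi-chain computation and recursing when a generic ValueError is raised, the input shape is inspected up front and the code branches explicitly.

import numpy as np

def _to_chain_array(x):
    # hypothetical helper illustrating the refactor: branch on shape up front
    # rather than catching a generic ValueError and recursing
    x = np.asarray(x)
    if x.ndim == 1:
        # a single chain: promote to shape (1, n_samples)
        return x[np.newaxis, :]
    if x.ndim >= 2:
        return x
    raise ValueError('expected at least a 1-D array of samples')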