ENH refactor diagnostics to not depend on generic ValueErrors #1425
Conversation
# iterate over tuples of indices of the shape of var
inds = [y.ravel().tolist() for y in np.indices(x.shape[:-2])]
for tup in zip(*inds):  # iterate with zip
    _n_eff[tup] = get_neff(x[tup], Vhat[tup])
This section above needs some eyes. I wrote a test for the output shape, but that may not be sufficient to ensure this code is correct.
For variables with very large sizes, this way of building inds is not great, since they all have to be put into memory. However, I don't expect this to be very common, but what do I know...
Well who knew this was in the code base!
https://github.com/pymc-devs/pymc3/blob/master/pymc3/stats.py#L236
We might all benefit from using np.ndindex!
http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndindex.html
Also, the n_eff computation should be done with an FFT since it is slow for large chain lengths. This is another PR, however.
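For concreteness, a minimal sketch of what that loop could look like with np.ndindex. The fill_neff wrapper is hypothetical; only the shape convention (the trailing two axes are samples and chains, consumed by get_neff) is taken from the diff above.

import numpy as np

def fill_neff(x, Vhat, get_neff):
    # hypothetical wrapper, not from this PR: apply get_neff over the leading axes
    _n_eff = np.empty(x.shape[:-2])
    # np.ndindex yields index tuples lazily, so nothing has to be materialized in memory
    for tup in np.ndindex(*x.shape[:-2]):
        _n_eff[tup] = get_neff(x[tup], Vhat[tup])
    return _n_eff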
I am looking for a halfway review here. Is this contribution welcome? Any requested changes?
Looks nice!
I'll check the section you marked more carefully and approve or comment. My only suggestion would be to use descriptive variable names instead of comments to document the code. I marked one spot where that might happen.
pymc3/tests/test_diagnostics.py
Outdated
n_effective = effective_n(ptrace)['x']
assert_allclose(n_effective, n_jobs * n_samples, 2)

def test_effective_n_right_shape_tesnor(self):
nit: tensor
Done. I found one more too!
pymc3/diagnostics.py
Outdated
def get_neff(x, Vhat):
    # number of chains is last dim
    # chain samples are second to last dim
    m = x.shape[-1]
rename m to number_of_chains, and remove the comments?
I made a bunch of changes like this.
pymc3/diagnostics.py
Outdated
variogram = lambda t: (sum(sum((x[j][i] - x[j][i - t])**2
                               for i in range(t, n)) for j in range(m)) / (m * (n - t)))
def variogram(_t):
Can this be written as a matrix operation?
I moved this to use np.mean with the proper array offsets.
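A minimal sketch of that vectorized form, assuming x is a 2-D array of shape (n_chains, n_samples) as in the original lambda and t >= 1; the exact offsets in the PR may differ.

import numpy as np

def variogram(x, t):
    # equivalent to the double loop: average of (x[j, i] - x[j, i - t])**2
    # over all chains j and samples i >= t; np.mean divides by m * (n - t)
    return np.mean((x[:, t:] - x[:, :-t]) ** 2)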
I added a bunch more tests of the array handling as well. Hopefully there won't be any surprises!
One more thing. I went ahead and reduced the number of MCMC samples for the shape tests in order to keep the test run time about the same as before, even though we are doing more tests.
Whoops. A bug in the tests. :( This is why we write tests after all! I also updated the test strings to make them look nicer on Travis. py.test shows the full path, but nose apparently just shows the docstring.
Also, with the change to computing the variogram with np.mean, the n_eff computation is much more efficient now! (I am looking at 10k step chains.) I think passing on the FFTs would be fine for the time being.
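For reference only, the deferred FFT idea would look roughly like this for a single chain. This is a standard O(n log n) autocorrelation sketch, not code from this PR.

import numpy as np

def autocorr_fft(y):
    # y: 1-D array of samples from one chain
    n = len(y)
    y = y - y.mean()
    # zero-pad to 2 * n so the circular convolution does not wrap around
    f = np.fft.rfft(y, n=2 * n)
    acov = np.fft.irfft(f * np.conjugate(f))[:n] / n
    # normalized autocorrelation at each lag
    return acov / acov[0]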
@fonnesbeck Before this potentially gets merged, I wanted to ask. Is there a reason you didn't go with a more numpy-heavy approach from the start? Are these functions supposed to work on inputs that may not be numpy arrays?
From the Geweke score function, I think the answer is no, but it's always good to ask!
This looks great! Poked around, and it seems like this is identical in most cases, and correct in some cases that the old function failed on, like
How big are the differences from the old version? They should be at most floating-point things. Anything bigger is most likely a bug.
There were no differences (beyond returning a float instead of an int). I did what you did -- copied the old version and ran the new test suite.
@beckermr these were ported over from PyMC2, which accepted lists and tuples as well as arrays. So, these changes are appropriate for PyMC3.
I have eliminated the recursive function calls with the try/except statements in the diagnostics. I had a weird case where my debugger caught one (though I cannot reproduce that now). Furthermore, they are fragile to actual ValueErrors caused by other things in the code. Finally, the control flow of the code was very difficult to follow. I added tests for some of the fancier numpy indexing. I did some PEP8-ing as well.
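As an illustration of that pattern (the helper name and exact branches are hypothetical, not the PR's code): instead of attempting the multi-chain computation and recursing when a generic ValueError is raised, the input shape is inspected up front and the code branches explicitly.

import numpy as np

def _to_chain_array(x):
    # hypothetical helper illustrating the refactor: branch on shape up front
    # rather than catching a generic ValueError and recursing
    x = np.asarray(x)
    if x.ndim == 1:
        # a single chain: promote to shape (1, n_samples)
        return x[np.newaxis, :]
    if x.ndim >= 2:
        return x
    raise ValueError('expected at least a 1-D array of samples')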