Skip to content

Conversation

beckermr
Copy link
Contributor

@beckermr beckermr commented Oct 3, 2016

I have eliminated the recursive function calls with the try except statements in the diagnostics. I had a weird thing where my debugger caught one (though I cannot reproduce that now). Furthermore, they are fragile to actual value errors cause by other things in the code. Finally, the control flow of the code was very difficult to follow. I added tests for some of the fancier numpy indexing. I did some PEP8-ing as well.

# iterate over tuples of indices of the shape of var
inds = [y.ravel().tolist() for y in np.indices(x.shape[:-2])]
for tup in zip(*inds): # iterate with zip
_n_eff[tup] = get_neff(x[tup], Vhat[tup])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section above needs some eyes. I wrote test for the output shape, but that may not be sufficient to ensure this code is correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For variables with very large sizes, this way of building the inds is not great since they all have to be put into memory. However, I don't expect this to be so common but what do I know...

Copy link
Contributor Author

@beckermr beckermr Oct 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beckermr
Copy link
Contributor Author

beckermr commented Oct 3, 2016

Also, they n_eff computation should be done with an FFT since it is slow for large chain lengths. This is another PR however.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 84.6% when pulling f565c6f on beckermr:remove-valuerrors-diagnostics into 02ef44f on pymc-devs:master.

@beckermr
Copy link
Contributor Author

beckermr commented Oct 3, 2016

I am looking for a halfway review here. Is this contribution welcome? Any requested changes?

Copy link
Member

@ColCarroll ColCarroll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice!

I'll check the section you marked more carefully and approve or comment. My only suggestion would be to use descriptive variable names instead of comments to document the code. I marked one spot where that might happen.

n_effective = effective_n(ptrace)['x']
assert_allclose(n_effective, n_jobs * n_samples, 2)

def test_effective_n_right_shape_tesnor(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: tensor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I found one more too!

def get_neff(x, Vhat):
# number of chains is last dim
# chain samples are second to last dim
m = x.shape[-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename m to number_of_chains, and remove comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a bunch of changes like this.


variogram = lambda t: (sum(sum((x[j][i] - x[j][i - t])**2
for i in range(t, n)) for j in range(m)) / (m * (n - t)))
def variogram(_t):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be written as a matrix operation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this to use np.mean with the proper array offsets.

@beckermr
Copy link
Contributor Author

beckermr commented Oct 4, 2016

I added a bunch more tests of the array handling as well. Hopefully there won't be any surprises!

@beckermr
Copy link
Contributor Author

beckermr commented Oct 4, 2016

One more thing. I went ahead reduced the number of MCMC samples for the shape tests in order to keep the test run time about the same as before even though we are doing more tests.

@beckermr
Copy link
Contributor Author

beckermr commented Oct 4, 2016

Whoops. A bug in the tests. :( This is why we write tests after all!

I also updated the test strings to make them look nicer on travis. py.test shows the full path but nose just shows the doc string apparently.

@beckermr
Copy link
Contributor Author

beckermr commented Oct 4, 2016

Also, with the change to computing the variogram with np.mean, the neff comp is way more efficient now! (I am looking at 10k step chains). I think passing on the FFTs would be ok for the time being.

@beckermr
Copy link
Contributor Author

beckermr commented Oct 4, 2016

@fonnesbeck Before this potentially gets merged, I wanted to ask. Is there a reason you didn't go with a more numpy-heavy approach from the start? Are these functions supposed to work on inputs that may not be numpy arrays?

@beckermr
Copy link
Contributor Author

beckermr commented Oct 4, 2016

From the Geweke score function, I think the answer is no, but always good to ask!

@ColCarroll
Copy link
Member

This looks great! Poked around, and seems like this is identical on most cases, and correct on some cases that the old function failed on, like test_effective_n_right_shape_scalar_tuple. Also, this new version returns a float for effective_n, which was advertised in the doc string anyways.

@ColCarroll ColCarroll merged commit a70187c into pymc-devs:master Oct 5, 2016
@beckermr
Copy link
Contributor Author

beckermr commented Oct 5, 2016

How big are the differences from the old version? They should be at most floating point things. Anything bigger is a bug most likely.

@ColCarroll
Copy link
Member

There were no differences (beyond returning a float instead of an int).

I did what you did -- copied the old version and ran the new test suite,
adding a naive equality test in each of the cases. "Naive" meaning
asserting exact equality of the return values (or assertTrue((old == new).all()) in case of arrays). In every case where both completed, the
equality check passed.

On Wed, Oct 5, 2016 at 9:18 AM Matthew R. Becker notifications@github.com
wrote:

How big are the differences from the old version? They should be at most
floating point things. Anything bigger is a bug most likely.


You are receiving this because you modified the open/close state.

Reply to this email directly, view it on GitHub
#1425 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ACMHEHQB7C3Ht4nL2h6JcBqdH-5x1hoqks5qw6OkgaJpZM4KNG78
.

@beckermr beckermr deleted the remove-valuerrors-diagnostics branch October 5, 2016 13:26
@fonnesbeck
Copy link
Member

@beckermr these were ported over from PyMC2, which accepted lists and tuples as well as arrays. So, these changes are appropriate for PyMC3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants