Sampling Output checks and Normality tests to check for bugs #1618

Merged (3 commits into master, Dec 31, 2016)

Conversation

springcoil (Contributor):

Based on #1424; for some reason I couldn't rebase that one correctly, so I've added some KS tests etc.

Needs a review.

springcoil changed the title from "Added TestSampleEstimates to check for obvious poor sampling by NUTS" to "Sampling Output checks and KS-tests to check for bugs" on Dec 21, 2016
@@ -39,26 +41,26 @@ class TestStepMethods(object): # yield test doesn't work subclassing unittest.T
7.04959179e-01, 8.37863464e-01, -5.24200836e-01, 1.28261340e+00, 9.08774240e-01,
8.80566763e-01, 7.82911967e-01, 8.01843432e-01, 7.09251098e-01, 5.73803618e-01]),
HamiltonianMC: np.array([
-0.74925631, -0.2566773 , -2.12480977, 1.64328926, -1.39315913,
Member:

Can you remove these from the diff?

assert np.isclose(np.median(trace.beta, 0), beta_true, rtol=0.1).all()
assert np.isclose(np.median(trace.alpha), alpha_true, rtol=0.1)
assert np.isclose(np.median(trace.sigma), sigma_true, rtol=0.1)
np.random.seed(987654321)
Member:

Since we inherit from SeededTest, we shouldn't need to set the seed again here. Sorry for the confusion.
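For context, a hypothetical sketch of what such a seeding base class might look like (the real SeededTest in the test suite may differ):

import numpy as np

class SeededTest(object):
    # Hypothetical sketch, not the actual PyMC3 helper: seed once per
    # test so subclasses never need to call np.random.seed themselves.
    def setUp(self):
        np.random.seed(20160911)  # arbitrary fixed seed for reproducibility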

test_normal = stats.kstest(trace.alpha, 'norm', alternative='greater')
test_normal_beta = stats.kstest(trace.beta[0], 'norm', alternative='greater')
test_uniform = stats.kstest(trace.sigma, 'uniform', alternative='greater')
assert np.less(np.median(test_normal[1]), 0.05)
Member:

What are the p-values? Could we make the test more stringent, perhaps by increasing the number of samples?

Also, I think https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.testing.assert_array_less.html is better here.
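For illustration, assert_array_less prints both arrays on failure, unlike a bare assert (synthetic p-values, not from this PR):

import numpy as np

p_values = np.array([1e-6, 3e-4])  # made-up p-values
# Elementwise check that every p-value is below 0.05; on failure numpy
# reports the offending values instead of a bare AssertionError.
np.testing.assert_array_less(p_values, 0.05)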

assert np.isclose(np.median(trace.alpha), alpha_true, rtol=0.1)
assert np.isclose(np.median(trace.sigma), sigma_true, rtol=0.1)
np.random.seed(987654321)
test_normal = stats.kstest(trace.alpha, 'norm', alternative='greater')
Member:

How about _, p_normal = ...? Then you can drop the index below.

test_normal_beta = stats.kstest(trace.beta[0], 'norm', alternative='greater')
test_uniform = stats.kstest(trace.sigma, 'uniform', alternative='greater')
assert np.less(np.median(test_normal[1]), 0.05)
assert np.less(np.median(test_normal_beta[1]) / 2, 0.5)
twiecki (Member), Dec 21, 2016:

Why divide by 2? And why 0.5?

springcoil (Author):

That was just a hack to get things working. I'm fixing it now. Thanks for the feedback.

springcoil (Contributor):

@twiecki Thanks for the feedback; I changed the tests slightly based on your suggestions. They should now be a lot more robust. We may have to relax the conditions slightly, but they passed on my machine.

[Slice([alpha, sigma]), Metropolis([beta])]):
trace = sample(1000, step=step_method, progressbar=False)

assert np.isclose(np.median(trace.beta, 0), beta_true, rtol=0.1).all()
Member:

np.testing.assert_array_almost_equal()

_, test_normal = stats.kstest(trace.alpha, 'norm', alternative='greater')
_, test_normal_beta = stats.kstest(trace.beta[0], 'norm', alternative='greater')
_, test_uniform = stats.kstest(trace.sigma, 'uniform', alternative='greater')
np.testing.assert_array_almost_equal(np.median(test_normal), 4.9960036108132044e-15, decimal=2)
Member:

What is this testing? Shouldn't those just be p-values?

springcoil (Author):

Hmm maybe something went wrong. Let me find out...

springcoil (Author):

Actually, I'm using the wrong variant of the test (https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.kstest.html); I must have gotten mixed up in my experiments.

assert np.isclose(np.median(trace.sigma), sigma_true, rtol=0.1)
_, test_normal = stats.kstest(trace.alpha, 'norm', alternative='greater')
_, test_normal_beta = stats.kstest(trace.beta[0], 'norm', alternative='greater')
_, test_uniform = stats.kstest(trace.sigma, 'uniform', alternative='greater')
Member:

Shouldn't this also test for a normal distribution?

springcoil (Author):

Since sigma = Uniform('sigma', lower=0.0, upper=1.0), surely I need to test for the uniform distribution? Or am I mistaken?

Member:

But you're testing the posterior, which should be normal-ish (log-normal, I guess). Maybe we'll leave that one out here.

springcoil (Author):

OK, fair point :) I'll add in something normal-ish.

springcoil (Contributor):

My rebase introduced some errors. Fixing now...

springcoil (Contributor):

Having trouble getting the p-values to work as we intended... seems to be quite tricky.

springcoil (Contributor):

I'm having trouble diagnosing this error. I'll try to write something up, even though it's over the holidays and I'm sleep deprived.

springcoil (Contributor), Dec 22, 2016:

Ahh I think I get it. From the scipy.stats documentation

*Testing t distributed random variables against normal distribution*

With 100 degrees of freedom the t distribution looks close to the normal
distribution, and the K-S test does not reject the hypothesis that the
sample came from the normal distribution:

>>> np.random.seed(987654321)
>>> stats.kstest(stats.t.rvs(100,size=100),'norm')
(0.072018929165471257, 0.67630062862479168)

With 3 degrees of freedom the t distribution looks sufficiently different
from the normal distribution, that we can reject the hypothesis that the
sample came from the normal distribution at the 10% level:

>>> np.random.seed(987654321)
>>> stats.kstest(stats.t.rvs(3,size=100),'norm')
(0.131016895759829, 0.058826222555312224)

So this tests whether the two distributions are different: a p-value below, say, 10% lets us reject the hypothesis that the sample came from the given distribution. A high p-value does not show the distributions are the same, only that we failed to reject. Does anyone know another test, or am I misunderstanding this?
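To make that asymmetry concrete, a small illustrative sketch (synthetic draws, unrelated to the PR's traces):

import numpy as np
from scipy import stats

np.random.seed(987654321)

# Truly normal draws: kstest fails to reject, but that is only a
# failure to reject the null, not proof of normality.
_, p_same = stats.kstest(np.random.normal(size=1000), 'norm')

# Clearly non-normal draws: the test rejects the null.
_, p_diff = stats.kstest(np.random.uniform(-1, 1, size=1000), 'norm')

print(p_same, p_diff)  # p_same is typically large, p_diff essentially zero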

twiecki (Member), Dec 22, 2016:

Ohh, right, I forgot about that. How high are the p-values?

springcoil (Contributor):

I'll include the diagnostics later - boarding a plane atm :)

springcoil (Contributor), Dec 22, 2016:

stats.normaltest(trace.beta[:,1]) gives NormaltestResult(statistic=288.8868099649228, pvalue=1.8579168299255905e-63), and stats.normaltest(trace.sigma) gives NormaltestResult(statistic=31.217980013036858, pvalue=1.6638024983177679e-07).

Submitting an updated PR now.

springcoil force-pushed the sampling_output_check2 branch 2 times, most recently from d59e535 to f7e3a7d, December 22, 2016 15:55
springcoil (Contributor):

@twiecki Ready to merge?

# See `https://github.com/scipy/scipy/blob/v0.18.1/scipy/stats/stats.py#L1352`
_, test_normal_beta = stats.normaltest(trace.beta[:,1])
_, test_uniform = stats.normaltest(trace.sigma)
np.testing.assert_array_less(test_normal_beta, 0.005)
Member:

But I thought we realized that these should not be significant, so this means the posteriors are not normal, indicating a problem with our sampling. CC @fonnesbeck @jsalvatier

Member:

From https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.normaltest.html: "This function tests the null hypothesis that a sample comes from a normal distribution."
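So a large p-value is the desired outcome here. A minimal illustration with synthetic draws:

import numpy as np
from scipy import stats

np.random.seed(0)
_, p = stats.normaltest(np.random.normal(size=3000))
# For a genuinely normal sample we expect to fail to reject, i.e.
# p above 0.05 (though a truly normal sample still lands below the
# threshold about 5% of the time by chance).
print(p)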

springcoil (Contributor) commented twice via email, Dec 26, 2016.


for step_method in (NUTS(), Metropolis(),
[Slice([alpha, sigma]), Metropolis([beta])]):
trace = sample(1000, step=step_method, progressbar=False)
Member:

I would draw more samples and add generous burn-in.

springcoil (Author):

I'll add more generous burn-in.

# Using the Normal test from SciPy `scipy.stats.normaltest`
# See `https://github.com/scipy/scipy/blob/v0.18.1/scipy/stats/stats.py#L1352`
_, test_normal_beta = stats.normaltest(trace.beta[:,1])
_, test_uniform = stats.normaltest(trace.sigma)
Member:

Again, I would test for alpha, not sigma.

springcoil (Author):

Thanks, I'll do that.

springcoil (Contributor), Dec 29, 2016:

I refactored this into two tests (one for NUTS, one for Metropolis), added burn-in and a longer sampling run, and changed the statistical test. I'll push in a minute or two.

These tests pass when the sampler runs longer. Let me know if you need anything else here.

Update: tests pass.

# We define very small as 0.05, for the sake of this argument.
# We test this by evaluating if the p-value is above 0.05.
_, test_normal = stats.normaltest(trace[-3000:].alpha)
np.testing.assert_array_less(0.05, test_normal, verbose=True)
Member:

So this tests that .05 < p_normal which is what we want, so we get a nice, normal posterior. I would add this test for alpha, beta_0 and beta_1.
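A self-contained sketch of those three checks, with synthetic draws standing in for trace[-3000:] (variable names assumed from the diff above):

import numpy as np
from scipy import stats

np.random.seed(0)
alpha = np.random.normal(size=3000)      # stand-in for trace[-3000:].alpha
beta = np.random.normal(size=(3000, 2))  # stand-in for trace[-3000:].beta

for draws in (alpha, beta[:, 0], beta[:, 1]):
    _, p = stats.normaltest(draws)
    # Assert 0.05 < p: we want to fail to reject normality. Note that a
    # truly normal sample still fails each check ~5% of the time.
    np.testing.assert_array_less(0.05, p)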

springcoil (Author):

I'll add that.

Member:

I'd also rename test_normal to p_normal.

springcoil (Author):

Good suggestion :)

springcoil (Contributor):

I made some commits; adding nose_parameterized stopped me from being able to test locally in my current conda environment, but it should work on Travis.

fonnesbeck (Member):

This is the last PR with a 3.0 milestone. Once this is done, I will do one more RC and (hopefully) release if all goes well.

springcoil (Contributor):

Do we need another rc or can we go straight to a release?

twiecki (Member), Dec 30, 2016:

We did do some fixes so probably best to do another RC.

[Slice([alpha, sigma]), Metropolis([beta])]):
trace = sample(100000, step=step_method, progressbar=False)

np.testing.assert_array_almost_equal(np.median(trace.beta, 0), beta_true, decimal=1)
Member:

Why not move these tests into the one below and remove this whole test?

Member:

Once we remove this test I think this is ready.

# We define very small as 0.05, for the sake of this argument.
# We test this by evaluating if the p-value is above 0.05.
_, test_p = stats.normaltest(trace[-3000:].alpha)
_, test_p_beta_0 = stats.normaltest(trace[-3000:].beta[0])
Member:

Does that return an array or just a single sample? Thought it would have to be something like .beta[0, :] or some such.

springcoil (Author):

Good point.


from numpy.testing import assert_array_almost_equal
from nose_parameterized import parameterized
Member:

Need to add nose_parameterized to travis and dependencies. Or just revert to the list...

springcoil (Author):

I added it to Travis and the dependencies. Getting context errors though.

springcoil (Author):

I added it to the dependencies for future use, even though I'm probably not going to use it here. Perhaps we should remove it, though.

springcoil (Author):

I chose to use a list instead.

@@ -11,3 +11,4 @@ recommonmark
sphinx
nbsphinx
numpydoc
nose_parameterized
Member:

We don't need this anymore.

springcoil (Contributor) commented via email, Dec 31, 2016.

twiecki (Member), Dec 31, 2016:

It is, but you can just move the asserts inside the other one.

springcoil (Contributor):

OK, I added those changes: moved the asserts, removed an argument that was no longer needed, and removed the references to nose_parameterized. If all the tests pass, we should be ready to go.

# We test this by evaluating if the p-value is above 0.05.
_, test_p = stats.normaltest(trace[-3000:].alpha)
_, test_p_beta_0 = stats.normaltest(trace[-3000:].beta[0, :])
_, test_p_beta_1 = stats.normaltest(trace[-3000:].beta[:, 1])
Member:

They should both be sliced along the same dimension. What's the shape of beta?

springcoil (Author):

Yeah, oops, I meant to change that.
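For reference, a toy illustration of the slicing issue, assuming beta is stored as (n_samples, 2) as the final diff suggests:

import numpy as np

beta = np.random.normal(size=(3000, 2))  # stand-in for trace.beta

beta_0 = beta[:, 0]  # all samples of the first coefficient  -> shape (3000,)
beta_1 = beta[:, 1]  # all samples of the second coefficient -> shape (3000,)
single = beta[0, :]  # one draw of both coefficients         -> shape (2,)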

Commits:
- Updating tests
- Updating tests again
- Added in one test for each sampler
- Updating tests
- Sampling errors
- Trying this commit
- Removing parameterized nose tests as they didn't work with the model context
- Moving asserts and removing arguments which are unnecessary; removing requirements.txt reference to nose_parameterized
- Changing shape
- Removing unnecessary test, and checking they run locally
springcoil (Contributor):

Here you go. This should be mergeable now.

springcoil (Contributor):

OK, the tests pass; I'm going to close the related PR.

twiecki (Member), Dec 31, 2016:

It was quicker for me to do it. I think this is what it should look like, but it doesn't pass yet for Slice.

springcoil (Contributor) commented via email, Dec 31, 2016.

springcoil (Contributor):

Ahh, I never thought of thinning.
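For the record, thinning just keeps every k-th draw after burn-in, which reduces autocorrelation before running a normality test. A toy sketch with an array standing in for one variable's trace:

import numpy as np

draws = np.random.normal(size=100000)  # stand-in for one variable's samples
burn, thin = 50000, 10
thinned = draws[burn::thin]  # drop burn-in, then keep every 10th draw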

springcoil (Contributor):

I'll merge this later if it passes

@springcoil springcoil merged commit ca7f68d into master Dec 31, 2016
@springcoil springcoil deleted the sampling_output_check2 branch December 31, 2016 19:23