Sampling Output checks and Normality tests to check for bugs #1618

Merged (3 commits into master, Dec 31, 2016)

Conversation

springcoil (Contributor):

Based on #1424; for some reason I couldn't rebase that one correctly, so I've added some KS tests etc.

Needs a review.

springcoil changed the title from "Added TestSampleEstimates to check for obvious poor sampling by NUTS" to "Sampling Output checks and KS-tests to check for bugs" on Dec 21, 2016
@@ -39,26 +41,26 @@ class TestStepMethods(object): # yield test doesn't work subclassing unittest.T
7.04959179e-01, 8.37863464e-01, -5.24200836e-01, 1.28261340e+00, 9.08774240e-01,
8.80566763e-01, 7.82911967e-01, 8.01843432e-01, 7.09251098e-01, 5.73803618e-01]),
HamiltonianMC: np.array([
-0.74925631, -0.2566773 , -2.12480977, 1.64328926, -1.39315913,
Member:

Can you remove these from the diff?

assert np.isclose(np.median(trace.beta, 0), beta_true, rtol=0.1).all()
assert np.isclose(np.median(trace.alpha), alpha_true, rtol=0.1)
assert np.isclose(np.median(trace.sigma), sigma_true, rtol=0.1)
np.random.seed(987654321)
Member:

Since we inherit from SeededTest, we shouldn't need to set the seed again here. Sorry for the confusion.
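For context, a hypothetical sketch of what such a seeding base class might look like (the real SeededTest in the test suite may differ):

import numpy as np

class SeededTest(object):
    # Hypothetical sketch, not the actual PyMC3 helper: seed once per
    # test so subclasses never need to call np.random.seed themselves.
    def setUp(self):
        np.random.seed(20160911)  # arbitrary fixed seed for reproducibility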

test_normal = stats.kstest(trace.alpha, 'norm', alternative='greater')
test_normal_beta = stats.kstest(trace.beta[0], 'norm', alternative='greater')
test_uniform = stats.kstest(trace.sigma, 'uniform', alternative='greater')
assert np.less(np.median(test_normal[1]), 0.05)
Member:

What are the p-values? Could we make the test more stringent, perhaps by increasing the number of samples?

Also, I think https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.testing.assert_array_less.html is better here.
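For illustration, assert_array_less prints both arrays on failure, unlike a bare assert (synthetic p-values, not from this PR):

import numpy as np

p_values = np.array([1e-6, 3e-4])  # made-up p-values
# Elementwise check that every p-value is below 0.05; on failure numpy
# reports the offending values instead of a bare AssertionError.
np.testing.assert_array_less(p_values, 0.05)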

assert np.isclose(np.median(trace.alpha), alpha_true, rtol=0.1)
assert np.isclose(np.median(trace.sigma), sigma_true, rtol=0.1)
np.random.seed(987654321)
test_normal = stats.kstest(trace.alpha, 'norm', alternative='greater')
Member:

How about _, p_normal = ...? Then you can drop the index below.

test_normal_beta = stats.kstest(trace.beta[0], 'norm', alternative='greater')
test_uniform = stats.kstest(trace.sigma, 'uniform', alternative='greater')
assert np.less(np.median(test_normal[1]), 0.05)
assert np.less(np.median(test_normal_beta[1]) / 2, 0.5)
twiecki (Member), Dec 21, 2016:

Why divide by 2? And why 0.5?

springcoil (Author):

That was just a hack to get things working. I'm fixing it now. Thanks for the feedback.

springcoil (Contributor):

@twiecki Thanks for the feedback; I changed the tests slightly based on your suggestions. They should now be a lot more robust. We may have to relax the conditions slightly, but they passed on my machine.

[Slice([alpha, sigma]), Metropolis([beta])]):
trace = sample(1000, step=step_method, progressbar=False)

assert np.isclose(np.median(trace.beta, 0), beta_true, rtol=0.1).all()
Member:

np.testing.assert_array_almost_equal()

_, test_normal = stats.kstest(trace.alpha, 'norm', alternative='greater')
_, test_normal_beta = stats.kstest(trace.beta[0], 'norm', alternative='greater')
_, test_uniform = stats.kstest(trace.sigma, 'uniform', alternative='greater')
np.testing.assert_array_almost_equal(np.median(test_normal), 4.9960036108132044e-15, decimal=2)
Member:

What is this testing? Shouldn't those just be p-values?

springcoil (Author):

Hmm maybe something went wrong. Let me find out...

springcoil (Author):

Actually, I'm using the wrong variant of the test (https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.kstest.html); I must have gotten mixed up in my experiments.

assert np.isclose(np.median(trace.sigma), sigma_true, rtol=0.1)
_, test_normal = stats.kstest(trace.alpha, 'norm', alternative='greater')
_, test_normal_beta = stats.kstest(trace.beta[0], 'norm', alternative='greater')
_, test_uniform = stats.kstest(trace.sigma, 'uniform', alternative='greater')
Member:

Shouldn't this also test for a normal distribution?

springcoil (Author):

Since sigma = Uniform('sigma', lower=0.0, upper=1.0), surely I need to test for the uniform distribution? Or am I mistaken?

Member:

But you're testing the posterior, which should be normal-ish (log-normal, I guess). Maybe we'll leave that one out here.

springcoil (Author):

OK, fair point :) I'll add in something normal-ish.

springcoil (Contributor):

My rebase introduced some errors. Fixing now...

springcoil (Contributor):

Having trouble getting the p-values to work as we intended... seems to be quite tricky.

springcoil (Contributor):

I'm having trouble diagnosing this error. I'll try to write something up, even though it's over the holidays and I'm sleep deprived.

springcoil (Contributor), Dec 22, 2016:

Ahh I think I get it. From the scipy.stats documentation

*Testing t distributed random variables against normal distribution*

With 100 degrees of freedom the t distribution looks close to the normal
distribution, and the K-S test does not reject the hypothesis that the
sample came from the normal distribution:

>>> np.random.seed(987654321)
>>> stats.kstest(stats.t.rvs(100,size=100),'norm')
(0.072018929165471257, 0.67630062862479168)

With 3 degrees of freedom the t distribution looks sufficiently different
from the normal distribution, that we can reject the hypothesis that the
sample came from the normal distribution at the 10% level:

>>> np.random.seed(987654321)
>>> stats.kstest(stats.t.rvs(3,size=100),'norm')
(0.131016895759829, 0.058826222555312224)

So this tests whether the two distributions are different: a p-value below, say, 10% lets us reject the hypothesis that the sample came from the given distribution. A high p-value does not show the distributions are the same, only that we failed to reject. Does anyone know another test, or am I misunderstanding this?
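To make that asymmetry concrete, a small illustrative sketch (synthetic draws, unrelated to the PR's traces):

import numpy as np
from scipy import stats

np.random.seed(987654321)

# Truly normal draws: kstest fails to reject, but that is only a
# failure to reject the null, not proof of normality.
_, p_same = stats.kstest(np.random.normal(size=1000), 'norm')

# Clearly non-normal draws: the test rejects the null.
_, p_diff = stats.kstest(np.random.uniform(-1, 1, size=1000), 'norm')

print(p_same, p_diff)  # p_same is typically large, p_diff essentially zero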

twiecki (Member), Dec 22, 2016:

Ohh, right, I forgot about that. How high are the p-values?

springcoil (Contributor):

I'll include the diagnostics later - boarding a plane atm :)

springcoil (Contributor), Dec 22, 2016:

stats.normaltest(trace.beta[:,1]) gives NormaltestResult(statistic=288.8868099649228, pvalue=1.8579168299255905e-63), and stats.normaltest(trace.sigma) gives NormaltestResult(statistic=31.217980013036858, pvalue=1.6638024983177679e-07).

Submitting an updated PR now.

springcoil force-pushed the sampling_output_check2 branch 2 times, most recently from d59e535 to f7e3a7d, December 22, 2016 15:55
springcoil (Contributor):

@twiecki Ready to merge?

# See `https://github.com/scipy/scipy/blob/v0.18.1/scipy/stats/stats.py#L1352`
_, test_normal_beta = stats.normaltest(trace.beta[:,1])
_, test_uniform = stats.normaltest(trace.sigma)
np.testing.assert_array_less(test_normal_beta, 0.005)
Member:

But I thought we realized that these should not be significant, so this means the posteriors are not normal, indicating a problem with our sampling. CC @fonnesbeck @jsalvatier

Member:

From https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.normaltest.html: "This function tests the null hypothesis that a sample comes from a normal distribution."
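So a large p-value is the desired outcome here. A minimal illustration with synthetic draws:

import numpy as np
from scipy import stats

np.random.seed(0)
_, p = stats.normaltest(np.random.normal(size=3000))
# For a genuinely normal sample we expect to fail to reject, i.e.
# p above 0.05 (though a truly normal sample still lands below the
# threshold about 5% of the time by chance).
print(p)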

springcoil (Contributor) commented twice via email, Dec 26, 2016.


for step_method in (NUTS(), Metropolis(),
[Slice([alpha, sigma]), Metropolis([beta])]):
trace = sample(1000, step=step_method, progressbar=False)
Member:

I would draw more samples and add generous burn-in.

springcoil (Author):

I'll add more generous burn-in.

# Using the Normal test from SciPy `scipy.stats.normaltest`
# See `https://github.com/scipy/scipy/blob/v0.18.1/scipy/stats/stats.py#L1352`
_, test_normal_beta = stats.normaltest(trace.beta[:,1])
_, test_uniform = stats.normaltest(trace.sigma)
Member:

Again, I would test for alpha, not sigma.

springcoil (Author):

Thanks, I'll do that.

springcoil (Contributor), Dec 29, 2016:

I refactored this into two tests (one for NUTS, one for Metropolis), added burn-in and a longer sampling run, and changed the statistical test. I'll push in a minute or two.

These tests pass when the sampler runs longer. Let me know if you need anything else here.

Update: tests pass.

# We define very small as 0.05, for the sake of this argument.
# We test this by evaluating if the p-value is above 0.05.
_, test_normal = stats.normaltest(trace[-3000:].alpha)
np.testing.assert_array_less(0.05, test_normal, verbose=True)
Member:

So this tests that .05 < p_normal which is what we want, so we get a nice, normal posterior. I would add this test for alpha, beta_0 and beta_1.
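A self-contained sketch of those three checks, with synthetic draws standing in for trace[-3000:] (variable names assumed from the diff above):

import numpy as np
from scipy import stats

np.random.seed(0)
alpha = np.random.normal(size=3000)      # stand-in for trace[-3000:].alpha
beta = np.random.normal(size=(3000, 2))  # stand-in for trace[-3000:].beta

for draws in (alpha, beta[:, 0], beta[:, 1]):
    _, p = stats.normaltest(draws)
    # Assert 0.05 < p: we want to fail to reject normality. Note that a
    # truly normal sample still fails each check ~5% of the time.
    np.testing.assert_array_less(0.05, p)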

springcoil (Author):

I'll add that.

Member:

I'd also rename test_normal to p_normal.

springcoil (Author):

Good suggestion :)

springcoil (Contributor):

I made some commits; adding nose_parameterized stopped me from being able to test locally in my current conda environment, but it should work on Travis.

fonnesbeck (Member):

This is the last PR with a 3.0 milestone. Once this is done, I will do one more RC and (hopefully) release if all goes well.

springcoil (Contributor):

Do we need another rc or can we go straight to a release?

twiecki (Member), Dec 30, 2016:

We did do some fixes so probably best to do another RC.

[Slice([alpha, sigma]), Metropolis([beta])]):
trace = sample(100000, step=step_method, progressbar=False)

np.testing.assert_array_almost_equal(np.median(trace.beta, 0), beta_true, decimal=1)
Member:

Why not move these tests into the one below and remove this whole test?

Member:

Once we remove this test I think this is ready.

# We define very small as 0.05, for the sake of this argument.
# We test this by evaluating if the p-value is above 0.05.
_, test_p = stats.normaltest(trace[-3000:].alpha)
_, test_p_beta_0 = stats.normaltest(trace[-3000:].beta[0])
Member:

Does that return an array or just a single sample? Thought it would have to be something like .beta[0, :] or some such.

springcoil (Author):

Good point.


from numpy.testing import assert_array_almost_equal
from nose_parameterized import parameterized
Member:

Need to add nose_parameterized to travis and dependencies. Or just revert to the list...

springcoil (Author):

I added it to Travis and the dependencies. Getting context errors though.

springcoil (Author):

I added it to the dependencies for future use, even though I'm probably not going to use it here. Perhaps we should remove it, though.

springcoil (Author):

I chose to use a list instead.

@@ -11,3 +11,4 @@ recommonmark
sphinx
nbsphinx
numpydoc
nose_parameterized
Member:

We don't need this anymore.

springcoil (Contributor) commented via email, Dec 31, 2016.

twiecki (Member), Dec 31, 2016:

It is, but you can just move the asserts inside the other one.

springcoil (Contributor):

OK, I added those changes: moved the asserts, removed an argument that was no longer needed, and removed the references to nose_parameterized. If all the tests pass, we should be ready to go.

# We test this by evaluating if the p-value is above 0.05.
_, test_p = stats.normaltest(trace[-3000:].alpha)
_, test_p_beta_0 = stats.normaltest(trace[-3000:].beta[0, :])
_, test_p_beta_1 = stats.normaltest(trace[-3000:].beta[:, 1])
Member:

They should both be sliced along the same dimension. What's the shape of beta?

springcoil (Author):

Yeah, oops, I meant to change that.
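For reference, a toy illustration of the slicing issue, assuming beta is stored as (n_samples, 2) as the final diff suggests:

import numpy as np

beta = np.random.normal(size=(3000, 2))  # stand-in for trace.beta

beta_0 = beta[:, 0]  # all samples of the first coefficient  -> shape (3000,)
beta_1 = beta[:, 1]  # all samples of the second coefficient -> shape (3000,)
single = beta[0, :]  # one draw of both coefficients         -> shape (2,)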

Commits:
- Updating tests
- Updating tests again
- Added in one test for each sampler
- Updating tests
- Sampling errors
- Trying this commit
- Removing parameterized nose tests as they didn't work with the model context
- Moving asserts and removing arguments which are unnecessary; removing requirements.txt reference to nose_parameterized
- Changing shape
- Removing unnecessary test, and checking they run locally
springcoil (Contributor):

Here you go. This should be mergeable now.

springcoil (Contributor):

OK, the tests pass; I'm going to close the related PR.

twiecki (Member), Dec 31, 2016:

It was quicker for me to do it. I think this is what it should look like, but it doesn't pass yet for Slice.

springcoil (Contributor) commented via email, Dec 31, 2016.

springcoil (Contributor):

Ahh, I never thought of thinning.
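For the record, thinning just keeps every k-th draw after burn-in, which reduces autocorrelation before running a normality test. A toy sketch with an array standing in for one variable's trace:

import numpy as np

draws = np.random.normal(size=100000)  # stand-in for one variable's samples
burn, thin = 50000, 10
thinned = draws[burn::thin]  # drop burn-in, then keep every 10th draw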

springcoil (Contributor):

I'll merge this later if it passes

@springcoil springcoil merged commit ca7f68d into master Dec 31, 2016
@springcoil springcoil deleted the sampling_output_check2 branch December 31, 2016 19:23