
Seed HSGP bootstrap tests #6679

Draft · wants to merge 2 commits into main

Conversation

@ricardoV94 (Member) commented Apr 18, 2023

Failed in #6678


📚 Documentation preview 📚: https://pymc--6679.org.readthedocs.build/en/6679/

@ricardoV94 added the labels tests, needs info (Additional information required), and GP (Gaussian Process) on Apr 18, 2023
@codecov bot commented Apr 18, 2023

Codecov Report

Merging #6679 (71e663e) into main (1ed4475) will decrease coverage by 2.47%.
The diff coverage is n/a.

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #6679      +/-   ##
==========================================
- Coverage   91.99%   89.52%   -2.47%     
==========================================
  Files          94       94              
  Lines       15944    15944              
==========================================
- Hits        14667    14274     -393     
- Misses       1277     1670     +393     

see 5 files with indirect coverage changes

@bwengals (Contributor) commented:

Weird that it still fails with a different seed. Maybe very unlucky...? I couldn't get it to fail in the original PR. Does it fail for, like, any seed?

@ricardoV94 (Member, Author) commented:

My last commit tries 5 seeds, and it fails on one of them. The failure rate seems higher than 0.01: even with 5 tests, it should only fail about 5% of the time.
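As a sanity check on that figure, assuming each seeded run is an independent test with a 0.01 false-positive rate (an assumption, not something verified here):

# If each seeded test fails spuriously with probability 0.01, the chance
# that at least one of 5 independent runs fails is 1 - 0.99**5.
alpha = 0.01
n_seeds = 5
p_at_least_one_failure = 1 - (1 - alpha) ** n_seeds
print(round(p_at_least_one_failure, 3))  # 0.049, i.e. roughly 5%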

@bwengals (Contributor) commented:

Yup, something's wrong, will pull and give it a try.

Comment on lines +200 to +202
        seeds = np.arange(5) + 197  # 5 possible seeds
        rng = np.random.default_rng(np.random.choice(seeds))

Contributor:

It seems cleaner to use @parametrize, like for test_prior?

Member (Author):

The @parametrize was just to show which seed caused the test to fail. The idea was to revert to choosing one of 5 possible seeds once the root cause was found.

I imagine we don't want to run each test 5 times, but instead allow some variation across tests.
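For illustration, a minimal sketch of the two approaches (pytest's decorator is spelled pytest.mark.parametrize; the test bodies and fixtures are elided):

import numpy as np
import pytest

# Option 1: parametrize over the seeds. Each seed becomes its own test,
# so a failing seed shows up directly in the test ID.
@pytest.mark.parametrize("seed", [197, 198, 199, 200, 201])
def test_prior_parametrized(seed):
    rng = np.random.default_rng(seed)
    ...

# Option 2: draw one of the 5 seeds at random per run. A single test,
# with some variation across CI runs.
def test_prior_random_seed():
    seeds = np.arange(5) + 197  # 5 possible seeds
    rng = np.random.default_rng(np.random.choice(seeds))
    ...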

"""Compare HSGP prior to unapproximated GP prior, pm.gp.Latent. Draw samples from the
prior and compare them using MMD two sample test. Tests both centered and non-centered
parameterizations.
"""
rng = np.random.default_rng(seed)

with model:
    hsgp = pm.gp.HSGP(m=[200], c=2.0, parameterization=parameterization, cov_func=cov_func)
Contributor:

Try increasing m here to 500.

with model:
    hsgp = pm.gp.HSGP(m=[200], c=2.0, parameterization=parameterization, cov_func=cov_func)
    f1 = hsgp.prior("f1", X=X1)

    gp = pm.gp.Latent(cov_func=cov_func)
    f2 = gp.prior("f2", X=X1)

-   idata = pm.sample_prior_predictive(samples=1000)
+   idata = pm.sample_prior_predictive(samples=1000, random_seed=rng)
Contributor:

I think it's OK to decrease samples here to 500; it will run faster, which will be nice for trying the different random seeds.
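For context, the "MMD two sample test" in the docstring compares these prior draws from the HSGP against draws from the exact pm.gp.Latent prior. Below is a minimal sketch of an unbiased RBF-kernel squared-MMD estimator, assuming x and y are (n_samples, n_points) arrays of draws; it is illustrative only and may differ from the helper the test suite actually uses:

import numpy as np

def rbf_mmd2(x, y, lengthscale=1.0):
    # Unbiased estimate of squared MMD between sample sets x and y
    # under an RBF kernel. Illustrative sketch, not the test's helper.
    def k(a, b):
        sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dists / (2 * lengthscale**2))

    kxx, kyy, kxy = k(x, x), k(y, y), k(x, y)
    n, m = len(x), len(y)
    # The unbiased estimator drops the diagonal (self-similarity) terms.
    return (
        (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
        + (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
        - 2 * kxy.mean()
    )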

@@ -194,17 +197,20 @@ def test_conditional(self, model, cov_func, X1, parameterization):
prior and compare them using MMD two sample test. Tests both centered and non-centered
parameterizations. The conditional should match the prior when no data is observed.
"""
+   seeds = np.arange(5) + 197  # 5 possible seeds
+   rng = np.random.default_rng(np.random.choice(seeds))

with model:
    hsgp = pm.gp.HSGP(m=[100], c=2.0, parameterization=parameterization, cov_func=cov_func)
Contributor:

Try increasing m here to 500 too. This got rid of the random failures for me locally.

@bwengals (Contributor) left a comment:

Increasing m seems to have fixed things for me locally. Then, decreasing the number of samples in pm.sample_prior_predictive helps it go a bit faster. I think that should fix it.
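For context on why this helps: in pm.gp.HSGP, m sets the number of basis functions per input dimension, so a larger m gives a closer approximation to the exact pm.gp.Latent prior and leaves the MMD test less approximation error to flag. A sketch of the suggested settings, reusing the variable names from the diff above:

with model:
    # More basis functions -> tighter HSGP approximation to the exact GP.
    hsgp = pm.gp.HSGP(m=[500], c=2.0, parameterization=parameterization, cov_func=cov_func)
    f1 = hsgp.prior("f1", X=X1)

    gp = pm.gp.Latent(cov_func=cov_func)
    f2 = gp.prior("f2", X=X1)

    # Fewer prior draws -> faster runs when trying multiple seeds.
    idata = pm.sample_prior_predictive(samples=500, random_seed=rng)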

Labels: GP (Gaussian Process), needs info (Additional information required), tests
2 participants