Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored continuous distributions to v4 (AsymmetricLaplace, HalfStudentT, ExGaussian, Interpolated) #4746

Merged
merged 5 commits into from
Jun 14, 2021

Conversation

kc611
Copy link
Contributor

@kc611 kc611 commented Jun 6, 2021

Linked Issue: #4686

This PR refactors the following distributions:

  • AsymmetricLaplace
  • HalfStudentT
  • ExGaussian
  • Interpolated

(Random Note: It's strange how git is showing that many file changes in the test file pymc3/tests/test_distributions.py when only difference I made is of few lines ?)

Copy link
Member

@ricardoV94 ricardoV94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening the PR! I marked a couple of things than need to be changed. Let me know if you have any questions :)

Also, we will need the test in test_distributions_random.py to be refactored / added, and any tests that may be marked as xfail because of these distributions anywhere else on the test module.

@michaelosthege michaelosthege added this to the vNext (4.0.0) milestone Jun 6, 2021
@ricardoV94 ricardoV94 mentioned this pull request Jun 6, 2021
26 tasks

@pytest.mark.xfail(condition=(aesara.config.floatX == "float32"), reason="Fails on float32")
@pytest.mark.xfail(reason="Distribution not refactored yet")
@pytest.mark.xfail(reason="Test not yet refactored for new version")
Copy link
Member

@ricardoV94 ricardoV94 Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need to be refactored for this PR, otherwise we have no test that covers the logp method of the Interpolated, or do we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I plan on doing exactly that in this PR. It seems some problem with the arguments not being passed correctly.

@ricardoV94
Copy link
Member

ricardoV94 commented Jun 7, 2021

(Random Note: It's strange how git is showing that many file changes in the test file pymc3/tests/test_distributions.py when only difference I made is of few lines ?)

Could it be some auto-formatting on your end? I can try to check locally to see if it is just a GitHub confusion or if you actually indented those lines.

@kc611
Copy link
Contributor Author

kc611 commented Jun 7, 2021

Could it be some auto-formatting on your end? I can try to check locally to see if it is just a GitHub confusion or if you actually indented those lines.

Please do. The only changes it shows on my side are the few changed x_fail lines.

Edit: Never mind, Looks like VS Code doesn't take into account the indentation changes as "real" changes in git. Fixed it now.

@kc611 kc611 marked this pull request as draft June 7, 2021 10:06
@kc611 kc611 force-pushed the add_dist branch 2 times, most recently from 293900a to cf99a91 Compare June 7, 2021 10:51
def rng_fn(cls, rng, x, pdf, size=None):
interp = InterpolatedUnivariateSpline(x, pdf, k=1, ext="zeros")

Z = interp.integral(x[0], x[-1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside rng_fn there shouldn't be any symbolic operations. I didn't realize you need Z here as well.

You can either use scipy equivalents or, if easier, have everything that's needed for both logp and the rng_fn defined once during the dist classmethod and passed to super as additional inputs (and expected in both the logp and rng_fn. This way, the rng_fn will receive the non-symbolic arguments as inputs during normal sampling, while the logp method will receive the symbolic ones.

Apologies if this sounds very confusing :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I should probably refactor the test first, just so that I'll have an intuition of how this entire Interpolated thing should work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you can also check it out in V3, and see how it works there (and in the test). That may help, since this is a very non-standard one

@kc611 kc611 requested a review from ricardoV94 June 9, 2021 15:20
@kc611 kc611 marked this pull request as ready for review June 10, 2021 11:12
@kc611 kc611 changed the title Refactored remaining continuous distributions to v4 (AsymmetricLaplace, HalfStudentT, ExGaussian, Interpolated) Refactored continuous distributions to v4 (AsymmetricLaplace, HalfStudentT, ExGaussian, Interpolated) Jun 10, 2021
Copy link
Member

@michaelosthege michaelosthege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
@ricardoV94 you make the call

@@ -270,7 +270,6 @@ class TestWald(BaseTestCases.BaseTestCase):
params = {"mu": 1.0, "lam": 1.0, "alpha": 0.0}


@pytest.mark.xfail(reason="This distribution has not been refactored for v4")
class TestAsymmetricLaplace(BaseTestCases.BaseTestCase):
Copy link
Member

@ricardoV94 ricardoV94 Jun 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests that rely on BaseTestCases.BaseTestCase should be removed and replaced with tests using the new BaseTestDistribution

Copy link
Member

@ricardoV94 ricardoV94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great so far. The random tests still need to be refactored to use the BaseTestDistribution

@@ -1497,7 +1494,6 @@ def ref_rand(size, mu, sigma):

pymc3_random(pm.Moyal, {"mu": R, "sigma": Rplus}, ref_rand=ref_rand)

@pytest.mark.xfail(reason="This distribution has not been refactored for v4")
@pytest.mark.xfail(condition=(aesara.config.floatX == "float32"), reason="Fails on float32")
def test_interpolated(self):
Copy link
Member

@ricardoV94 ricardoV94 Jun 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should be moved to it's own class. The reason for this is that after refactoring all distributions, the TestScalarParameterSamples where this one is will disappear, but this distribution is special enough that the test should be kept (i.e., we are not simply drawing from scipy / numpy distributions).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just moved it to it's own class and named it TestInterpolated

@@ -1257,7 +1256,6 @@ def ref_rand(size, mu, lam, alpha):
ref_rand=ref_rand,
)

@pytest.mark.xfail(reason="This distribution has not been refactored for v4")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests that are using a ref_rand should be removed as well, unless there is a good reason to test them against a reference. If we are just copying the same scipy/numpy expression we use in our random Ops the default tests in BaseTestDistribution should be used instead.

Copy link
Member

@ricardoV94 ricardoV94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The last thing I would suggest is to add a BaseTestDistribution for the Interpolated just with the check_rv_size test enabled. Just to make sure size is working properly.

Otherwise, I think this PR is pretty much spot on. Specially wth the thorny Interpolated.

@kc611
Copy link
Contributor Author

kc611 commented Jun 11, 2021

Thanks @ricardoV94 for all your suggestions, (and especially for being patient with me in the start of this PR with all those silly mistakes. I guess my coffee didn't work quite right that particular day.)

Edit: Also (just a heads up) I'll be working on refactoring KroneckerNormal from the multivariate distributions next.

@ricardoV94
Copy link
Member

ricardoV94 commented Jun 12, 2021

I am afraid the Interpolated is still problematic. Right now it works fine if you pass non-symbolic arguments, but it fails if the inputs are symbolic. This snippet raises an enigmatic ValueError from within the scipy.interpolate method:

x_points = np.linspace(0, 10)
pdf_points = st.norm.pdf(x_points, loc=5, scale=0.1)
ae_x_points = at.as_tensor_variable(x_points)
ae_pdf_points = at.as_tensor_variable(pdf_points)
interp = pm.Interpolated.dist(ae_x_points, ae_pdf_points, size=(2, 4))
# ValueError: diff requires input that is at least one dimensional

V3 also had the same limitation.

It seems a proper fix would require integrating the non-symbolic InterpolatedUnivariateSpline inside the symbolic SplineWrapper (or as a separate symbolic Op). The logp method making use of x_points.data is also quite hackish. For instance, it wouldn't work with a generic at.arange input.

If we can't be bothered at the moment, perhaps we should at least mention this limitation explicitly in the docstring and add an informative TypeError in dist if the input is symbolic. What do you think?

Edit: I see this line is in the docstrings:

Both parameters ``x_points`` and values ``pdf_points`` are not variables, but
plain array-like objects, so they are constant and cannot be sampled.

@ricardoV94
Copy link
Member

ricardoV94 commented Jun 12, 2021

I pushed some changes that hopefully clarify a bit how the Interpolated is expected to work internally, and also placed the contents of the _argcdf method directly inside the rng_fn.

I am okay with merging as is, and perhaps open a GH issue to discuss whether we want to make this distribution accept symbolic elements in the future.

Copy link
Member

@ricardoV94 ricardoV94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @kc611 and thanks for your patience. Let me know if you also agree with merging it, given my last comments.

@kc611
Copy link
Contributor Author

kc611 commented Jun 13, 2021

Sure, you can go ahead and merge this, and yes, we should probably open an issue for making a proper fix for integrating the non-symbolic InterpolatedUnivariateSpline inside the symbolic SplineWrapper (or as a separate symbolic Op). Otherwise we'll be stuck with this hackish approach for Interpolated which I'm sure we don't want for v4.

@ricardoV94
Copy link
Member

Sure, you can go ahead and merge this, and yes, we should probably open an issue for making a proper fix for integrating the non-symbolic InterpolatedUnivariateSpline inside the symbolic SplineWrapper (or as a separate symbolic Op). Otherwise we'll be stuck with this hackish approach for Interpolated which I'm sure we don't want for v4.

Do you mind creating that issue?

I'll wait until Monday to merge the PR in case anyone else wants to have a look.

@ricardoV94 ricardoV94 merged commit 02a973b into pymc-devs:main Jun 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants