Add Interpolated distribution class #2163

Merged: 4 commits merged into pymc-devs:master on May 11, 2017

Conversation

@ghost commented May 9, 2017

This PR, as a follow-up to #2146, adds an Interpolated distribution class that takes an arbitrary one-dimensional probability density function, given as an array of values evaluated at some set of points, and makes it possible to use this distribution as a prior.

The PDF values between the points present in the input array are calculated by linear interpolation; the gradient and random sampling are defined accordingly.

The time complexity of evaluation is O(log n), where n is the number of points in the input array. The step size doesn't have to be the same at all points, so it can be made smaller in regions where the probability mass is concentrated and larger elsewhere.
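
A minimal usage sketch of such a non-uniform grid (the model name, grid, and density below are purely illustrative, not from this PR):

import numpy as np
import pymc3 as pm

# grid that is dense near the peak of the density and sparse in the tails
x_points = np.concatenate([np.linspace(-10, -1, 20, endpoint=False),
                           np.linspace(-1, 1, 200, endpoint=False),
                           np.linspace(1, 10, 20)])
pdf_points = np.exp(-0.5 * x_points ** 2 / 0.1)  # narrow peak around zero

with pm.Model():
    prior = pm.Interpolated('prior', x_points=x_points, pdf_points=pdf_points)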

I implemented some tests for the random method, but unfortunately I was not able to do the same for the logp method (I added a test template to pymc3/tests/test_distributions.py instead). The reason is that this distribution takes plain NumPy arrays as inputs rather than variables, because the input vectors are expected to be high-dimensional and it would not be feasible to sample them, while the test suite assumes distributions that take variables as inputs.

@twiecki (Member) commented May 10, 2017

I love the API, but I'm not crazy about the name: from_posterior('beta1', trace['beta1']). Brainstorming: Empirical() (not wise because of the overlap with the VI stuff), Interpolated(), FromSamples(), Histogram(), other ideas?

@ghost (Author) commented May 10, 2017

@twiecki from_posterior is not part of the API; it is a user-defined function in the notebook.

The API is

dist = Interpolated('interpolated_dist', x_points=x_points, pdf_points=pdf_points)
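
For context, a helper like the notebook's from_posterior could be built on top of this API; the sketch below is illustrative (the KDE smoothing and the zero-padded grid are assumptions for the example, not necessarily what the notebook does):

import numpy as np
from scipy import stats
import pymc3 as pm

def from_posterior(param, samples):
    # estimate the posterior density of `samples` on a grid
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)
    # pad the grid so the resulting prior has zero density
    # far outside the range of the observed samples
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return pm.Interpolated(param, x_points=x, pdf_points=y)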

@twiecki (Member) commented May 10, 2017

Oh, I see. Could we make the Interpolated API similarly convenient by providing good defaults?

@ghost (Author) commented May 10, 2017

I doubt that it is possible to come up with good defaults for x_points and pdf_points here. We could put in something like a uniform distribution on [0, 1], but I don't find that choice obvious. It might be useful for illustrative purposes, though.

One thing I am considering is whether it would be better to name the parameters simply x and y instead of x_points and pdf_points. I chose the first form over the second to avoid confusing users into thinking that the distribution is two-dimensional, but I'm not sure whether it is consistent with the rest of the API; maybe x and y would be better.

By the way, it is not necessary to write the parameter names explicitly; the distribution can just as well be used as

dist = Interpolated('interpolated_dist', x_points, pdf_points)

with, for example,

import numpy as np

x_points = np.linspace(-10, 10, 100)
pdf_points = np.exp(-x_points * x_points / 2)

@junpenglao (Member) commented:

Looks good. Could you run the whole notebook? The output of the last cell is not shown.

@ghost (Author) commented May 10, 2017

@junpenglao Sure! I've updated the notebook.

@ghost (Author) commented May 10, 2017

I managed to solve the testing problem by creating temporary subclasses of the Interpolated class that pass fixed x_points and pdf_points parameters to the parent constructor, which makes it possible to use the same pymc3_random and pymc3_matches_scipy test helpers as for the other distributions.
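
A sketch of this pattern (the normal reference density and the grid are illustrative, not the exact test code):

import numpy as np
import scipy.stats as st
import pymc3 as pm

class TestedInterpolated(pm.Interpolated):
    # temporary subclass with fixed x_points/pdf_points, so generic helpers
    # like pymc3_random and pymc3_matches_scipy can construct it the same
    # way as any other distribution
    def __init__(self, **kwargs):
        x_points = np.linspace(-5, 5, 1000)
        pdf_points = st.norm.pdf(x_points)
        super(TestedInterpolated, self).__init__(x_points=x_points,
                                                 pdf_points=pdf_points,
                                                 **kwargs)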

@twiecki (Member) commented May 11, 2017

This essentially uses the marginal of the posterior as the prior, right? I.e. we're not taking any possible correlations into account.

@ghost (Author) commented May 11, 2017

Yes, it is one-dimensional, so it's impossible to take any correlations into account with it.

However, it is useful when the prior comes not from samples but, for example, from some numerical integration, so that the prior is one-dimensional yet too complex to be expressed in terms of standard distributions.
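
For instance, a one-dimensional prior could be obtained by numerically marginalizing a two-dimensional density. This is an illustrative sketch (the joint density is made up); the result does not need to be pre-normalized, since logp divides by the normalization constant self.Z internally (see the logp definition later in this thread):

import numpy as np
import pymc3 as pm

# hypothetical joint density p(x, z), known only on a grid
x = np.linspace(-5, 5, 200)
z = np.linspace(-5, 5, 200)
xx, zz = np.meshgrid(x, z, indexing='ij')
joint = np.exp(-0.5 * (xx ** 2 + xx * zz + zz ** 2))

# marginalize over z by numerical integration
pdf_points = np.trapz(joint, z, axis=1)

with pm.Model():
    prior = pm.Interpolated('marginal_prior', x_points=x, pdf_points=pdf_points)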

super(Interpolated, self).__init__(transform=transform,
                                   *args, **kwargs)

interp = InterpolatedUnivariateSpline(x_points, pdf_points, k=1, ext='zeros')
A member commented:

Should we allow the user to define k here?

@ghost (Author) commented May 11, 2017

There are two major problems with non-linear interpolation:

  1. The PDF can become negative at some points for higher-order spline interpolation, which is nonsensical and would break the NUTS sampler. It is also not possible to simply replace the negative values with zeros, because that would make it impossible to use the integral() method of the SciPy spline classes.
  2. The random() method can be implemented efficiently only for linear interpolation, because the inverse CDF can be expressed in closed form only for a piecewise-quadratic CDF (one could try to do the same for a piecewise-cubic CDF using Cardano's formula, but I'm not signing up for that :). For higher-order polynomial interpolation the inverses would have to be found numerically, with an iterative process like Newton's method. (See the sketch of the linear case below.)
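
To make the closed-form inversion concrete, here is a sketch of sampling from a piecewise-linear PDF (the function name is hypothetical; it assumes strictly increasing x and no flat zero-density segments):

import numpy as np

def rvs_piecewise_linear(x, y, size):
    # sample from the density given by linear interpolation of (x, y)
    # by inverting the piecewise-quadratic CDF in closed form
    areas = 0.5 * (y[1:] + y[:-1]) * np.diff(x)     # trapezoidal mass per segment
    cdf = np.concatenate([[0.0], np.cumsum(areas)])
    u = np.random.uniform(0.0, cdf[-1], size=size)  # target (unnormalized) mass
    i = np.clip(np.searchsorted(cdf, u, side='right') - 1, 0, len(areas) - 1)
    r = u - cdf[i]                                  # residual mass within segment i
    s = (y[i + 1] - y[i]) / (x[i + 1] - x[i])       # slope of the PDF on the segment
    # solve 0.5*s*t**2 + y[i]*t = r for t; use the linear solution
    # where the slope is (numerically) zero
    flat = np.abs(s) < 1e-12
    safe_s = np.where(flat, 1.0, s)
    t = np.where(flat, r / y[i],
                 (np.sqrt(y[i] ** 2 + 2.0 * s * r) - y[i]) / safe_s)
    return x[i] + t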

A member commented:

Yep, the first point is a valid concern. I think we can merge this now and add higher-order polynomial support in the future.

size=size)

def logp(self, value):
    return tt.log(self.interp_op(value) / self.Z)
A member commented:

When I ran your example from #2146 locally, I actually found your first implementation faster.

@ghost (Author) commented May 11, 2017

That was because I had initially defined the gradient incorrectly in #2146, so HMC didn't work well for the second version (see this comment). In this implementation the calculation of the gradient is fixed, so it is even faster to put the division by the normalization constant here.
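
(Note: since log(f(x)/Z) = log f(x) - log Z and Z is a constant, the normalization does not change the gradient d/dx log p(x) = f'(x)/f(x) that HMC uses; with linear interpolation, f' is piecewise constant and cheap to evaluate.)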

A member commented:

You are right, it should be faster then. I will double-check on my side.

@junpenglao merged commit cce9dfe into pymc-devs:master on May 11, 2017
@junpenglao (Member) commented:

Great work @a-rodin! Thank you for your contribution!
