
pm.sample_prior_predictive fails when model contains a mixture-based distribution #3101

Closed
ahmadsalim opened this issue Jul 17, 2018 · 14 comments

Comments

@ahmadsalim

I am trying to use the new pm.sample_prior_predictive method to get a sense of my priors.
I get an internal error when my model contains a mixture-based distribution.

Self-contained example:

import math
import numpy as np
import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=[0, math.pi/2], tau=8/math.pi, shape=2)
    tau = pm.Gamma('tau', alpha=1, beta=1, shape=2)
    w = pm.Dirichlet('w', a=np.ones(2), shape=2)
    ys = pm.NormalMixture('ys', w=w, mu=mu, tau=tau, comp_shape=2, shape=1)
    prior = pm.sample_prior_predictive()

Traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-7e5dc188c0c0> in <module>()
      4     w = pm.Dirichlet('w', a=np.ones(2), shape=2)
      5     ys = pm.NormalMixture('ys', w=w, mu=mu, tau=tau, shape=1)
----> 6     prior = pm.sample_prior_predictive()

~/Documents/SourceControl/probprog-sandbox/venv/lib/python3.6/site-packages/pymc3/sampling.py in sample_prior_predictive(samples, model, vars, random_seed)
   1314     names = get_default_varnames(model.named_vars, include_transformed=False)
   1315     # draw_values fails with auto-transformed variables. transform them later!
-> 1316     values = draw_values([model[name] for name in names], size=samples)
   1317 
   1318     data = {k: v for k, v in zip(names, values)}

~/Documents/SourceControl/probprog-sandbox/venv/lib/python3.6/site-packages/pymc3/distributions/distribution.py in draw_values(params, point, size)
    291                                                          point=point,
    292                                                          givens=temp_givens,
--> 293                                                          size=size))
    294                 stored.add(next_.name)
    295             except theano.gof.fg.MissingInputError:

~/Documents/SourceControl/probprog-sandbox/venv/lib/python3.6/site-packages/pymc3/distributions/distribution.py in _draw_value(param, point, givens, size)
    382             return point[param.name]
    383         elif hasattr(param, 'random') and param.random is not None:
--> 384             return param.random(point=point, size=size)
    385         elif (hasattr(param, 'distribution') and
    386                 hasattr(param.distribution, 'random') and

~/Documents/SourceControl/probprog-sandbox/venv/lib/python3.6/site-packages/pymc3/model.py in __call__(self, *args, **kwargs)
     40 
     41     def __call__(self, *args, **kwargs):
---> 42         return getattr(self.obj, self.method_name)(*args, **kwargs)
     43 
     44 

~/Documents/SourceControl/probprog-sandbox/venv/lib/python3.6/site-packages/pymc3/distributions/mixture.py in random(self, point, size)
    179                     samples[i, :] = np.squeeze(comp_tmp[np.arange(w_tmp.size), ..., w_tmp])
    180                 else:
--> 181                     samples[i, :] = np.squeeze(comp_tmp[w_tmp])
    182 
    183         return samples

ValueError: could not broadcast input array from shape (500) into shape (1)
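The ValueError itself is ordinary numpy broadcasting: a preallocated length-1 slot (implied by shape=1) cannot receive all 500 draws at once. A minimal numpy reproduction (my simplification, not pymc3's actual internals):

```python
import numpy as np

# mixture.py preallocates `samples` using the declared shape (here shape=1),
# then line 181 tries to write all the prior draws into that length-1 slot.
samples = np.zeros((1, 1))
draws = np.random.normal(size=500)

try:
    samples[0, :] = np.squeeze(draws)  # (500,) into (1,): fails
except ValueError as err:
    print(err)
```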

Additional information:
It seems to work when I comment out the line with NormalMixture and I do not get any model error if I don't sample.

Any idea why this happens?
Thanks!

Versions and main components

  • PyMC3 Version: 3.5 RC1 (master)
  • Theano Version: 1.0.2
  • Python Version: 3.6.5
  • Operating system: macOS 10.13.5
  • How did you install PyMC3: pip
@junpenglao
Member

Thanks for reporting; you can remove the shape=1 to make it work.
Oddly, it works when shape=5, etc.
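For intuition, the per-draw logic of Mixture.random can be sketched in plain numpy (my simplification with assumed component parameters, not pymc3's actual implementation): pick one component index per draw according to the weights, then sample from that component's normal.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, n_comp = 500, 2

w = rng.dirichlet(np.ones(n_comp))   # mixture weights
mu = np.array([0.0, np.pi / 2])      # component means
sd = np.ones(n_comp)                 # component standard deviations

comp = rng.choice(n_comp, size=n_draws, p=w)  # one component index per draw
ys = rng.normal(mu[comp], sd[comp])           # sample from the chosen component

print(ys.shape)  # (500,)
```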

@ahmadsalim
Author

Ah, great, thanks!
Yeah, it does seem odd that it fails only when shape=1.

@ColCarroll
Member

Yes, the shape handling does some funny things around scalars, since univariate distributions continue to be the "common case".

Consider sampling from

with pm.Model():
    pm.Normal('mu', x, 1)
    prior = pm.sample_prior_predictive()
print(prior['mu'].shape)

In case x = 0 or x = np.array(0), this prints (500,). If x = np.array([0]), you must specify shape=1, and it still prints (500,). If x = np.array([[0]]), you must specify shape=(1, 1), and it prints (500, 1, 1).

This was on purpose to avoid making too many backward-incompatible changes.
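The scalar-versus-length-1 distinction above mirrors plain numpy, where () and (1,) are different shapes and the requested size must be compatible with the parameter's shape (a numpy illustration of the convention, not pymc3 code):

```python
import numpy as np

print(np.shape(0))                # ()
print(np.shape(np.array(0)))      # ()
print(np.shape(np.array([0])))    # (1,)
print(np.shape(np.array([[0]])))  # (1, 1)

# With a scalar mean, 500 draws come out flat:
print(np.random.normal(0, 1, size=500).shape)  # (500,)

# With a (1, 1) mean, the requested size must include the parameter's dims:
print(np.random.normal(np.zeros((1, 1)), 1, size=(500, 1, 1)).shape)  # (500, 1, 1)
```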

@ColCarroll
Member

ColCarroll commented Jul 17, 2018

Er... rereading this it sounds like I am suggesting this behavior is obvious. I want to emphasize that shape handling is hard (not just in PyMC3, but in life), and especially idiosyncratic around 1-d distributions. We welcome suggestions for making shape handling more intuitive!


@junpenglao
Member

It's a touchy subject and I have nightmares about it from time to time.

@ahmadsalim
Author

I have only recently started seriously using probabilistic programming frameworks like PyMC3, but I will offer my two cents of suggestions below.

I think one particular improvement could be less overloading of the shape parameter.
E.g., if I want to draw from an F-component multivariate normal I would write pm.MvNormal(mu=np.zeros(F), cov=np.ones((F,F)), shape=F) and get an array of shape (F,), whereas if I want T draws from the F-component multivariate normal, I would change the shape to (T,F).

This variability in shapes seems to make it conceptually harder to reason about what the target shape is. It would be nice if there were a separation between the number of draws D and the number of components C, so that a specification like
pm.MvNormal(mu=np.zeros(F), cov=np.ones((F,F)), draws=D, components=F) always returns a (D,F) matrix. Even a single draw (I guess the default) would then return (1,F) instead of (F,).
This gets more complicated with things like mixture distributions and covariance matrices (I am having trouble specifying the number of draws for LKJCholeskyCov and am explicitly resorting to a for-loop in one of my models).
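For what it's worth, numpy's own size argument already behaves like the separate draws dimension suggested above, except that a single draw still collapses to (F,) rather than (1,F) (a numpy sketch of the convention, not existing pymc3 API):

```python
import numpy as np

rng = np.random.default_rng(1)
F, T = 3, 10
mu, cov = np.zeros(F), np.eye(F)

one = rng.multivariate_normal(mu, cov)           # single draw: shape (F,)
many = rng.multivariate_normal(mu, cov, size=T)  # T draws: shape (T, F)

print(one.shape, many.shape)  # (3,) (10, 3)
```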

As someone from a dependent-types/formal-verification background, I think it would be cool to at least have an informal specification of the computed output shape given the shapes of the inputs (I am willing to help with this if you are interested). Maybe this is more relevant for PyMC4, where slightly breaking backwards compatibility is more feasible.

@junpenglao
Member

That's a great suggestion, and we actually have some effort to refactor towards that (e.g. #2833). The difficulty is that... it is really difficult... The current shape handling leaves many corner cases that we fixed by hard-coding some of the shape-related components, which is basically a mess in some distributions.
Hopefully PyMC4 will indeed be much better, as TensorFlow Distributions has these different dimension issues considered in its design.

@ahmadsalim
Author

Great to hear that there is an active refactoring effort; I will take a look at it.

While I understand the difficulty, I think having at least a semi-formal specification of the relation between input and output shapes and types would make the difficulties apparent. We could treat the input and output shapes and types symbolically and check that, for the target use cases, everything fits together nicely like a puzzle.

@ColCarroll
Member

Another moving target I would toss in there is that numpy has a convention that we try to respect as much as possible. This leads to situations like

np.broadcast(np.ones((2, 1)), np.ones((2, 500))).shape  # (2, 500)
np.broadcast(np.ones(2), np.ones((2, 500))).shape  # ValueError

The second case comes up a lot, though usually not because the user intended it, so we will often break with convention there. For example:

with pm.Model():
    mu = pm.Normal('mu', mu=0, sd=1, shape=2)
    theta = pm.Normal('theta', mu=mu, sd=np.ones(2))
    prior = pm.sample_prior_predictive(500)

To calculate the "right" size for theta, we try that exact broadcasting.
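That calculation can be sketched as follows (prior_predictive_shape is a hypothetical helper for illustration, not pymc3's actual function): broadcast the parameter shapes together to get the base shape of one draw, then prepend the sample dimension.

```python
import numpy as np

def prior_predictive_shape(samples, *param_shapes):
    # Hypothetical helper: the broadcast of all parameter shapes gives the
    # shape of a single draw; the sample count goes in front.
    base = np.broadcast_shapes(*param_shapes)  # raises ValueError if incompatible
    return (samples,) + tuple(base)

# mu has shape (2,), sd has shape (2,), so theta's draws are (500, 2):
print(prior_predictive_shape(500, (2,), (2,)))  # (500, 2)
```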

@ahmadsalim
Author

@ColCarroll I think for your example, my suggestion of separating draws from components could potentially be useful.

For example, one could try the following interpretation:

with pm.Model():
    # Implicitly convert constants to shape (1,1) in arguments
    mu = pm.Normal('mu', mu=0, sd=1, draws=2)  # We get that shape(mu) = (2,1)
    theta = pm.Normal('theta', mu=mu, sd=np.ones((2,1)), draws=2) # Use each draw from `mu` to make a draw for theta, so shape(theta) = (2,1). In particular we must ensure draws(theta) = k * draws(mu) and draws(mu) = draws(sd).
    prior = pm.sample_prior_predictive(500) # Do broadcasting as expected using numpy

In any case, I am starting to get a sense of why it becomes difficult for more complex distributions. I will take a closer look at #2833 .

@ahmadsalim
Author

I see that there are similar ideas about using atom_shape and param_shape, so it is already moving in the same direction.

@ahmadsalim
Author

I think that, for convenience, the input shapes to distributions could be made more flexible (as I also stated in the comment of the annotated example).

E.g., constants like 2 could be implicitly converted to arrays of shape (1,1) (so [[2]]), and simple vectors such as np.ones(2) with shape (2,) to arrays of shape (2,1) (so [[1], [1]]).
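That implicit conversion could look like this (as_column is a hypothetical helper sketching the idea, not proposed API verbatim):

```python
import numpy as np

def as_column(x):
    # Hypothetical helper: scalars become (1, 1); 1-d vectors (n,) become
    # (n, 1); anything already 2-d or higher passes through unchanged.
    arr = np.atleast_1d(np.asarray(x))
    return arr[:, None] if arr.ndim == 1 else arr

print(as_column(2).shape)           # (1, 1)
print(as_column(np.ones(2)).shape)  # (2, 1)
```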

@ahmadsalim
Author

So, I think that the tensor shapes supported by Pyro seem somewhat intuitive: http://pyro.ai/examples/tensor_shapes.html . Maybe this is something to be inspired by? 😄
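For context, the convention in that Pyro tutorial splits a draw's shape into three parts composed by concatenation (a numpy illustration of the convention, not Pyro code):

```python
import numpy as np

sample_shape = (500,)  # independent draws requested by the caller
batch_shape = (3,)     # independent parameterizations of the distribution
event_shape = (2,)     # dimensionality of a single event, e.g. a 2-d MvNormal

# A draw always has shape sample_shape + batch_shape + event_shape:
draw = np.random.normal(size=sample_shape + batch_shape + event_shape)
print(draw.shape)  # (500, 3, 2)
```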

@junpenglao
Member

Yes, I think they follow the design of TensorFlow Distributions, which is pretty clean.
