
Shape broadcast error in sample_prior_predictive #3481

Closed
rpgoldman opened this issue May 16, 2019 · 9 comments · Fixed by #3509

Comments

@rpgoldman (Contributor)

Description of your problem

When trying to sample the prior predictive -- for a model that can be successfully sampled using NUTS, and whose resulting trace can be successfully used with sample_posterior_predictive() -- I get an error about broadcasting operands with different shapes (see below).

Please provide a minimal, self-contained, and reproducible example.

import pickle
import pymc3 as pm

with open('model.pickle', 'rb') as file:
    model = pickle.load(file)

with model:
    pre_trace = pm.sample_prior_predictive()

Pickled model may be found here.

Please provide the full traceback.

ValueError                                Traceback (most recent call last)
~/projects/xplan/xplan-experiment-analysis/sample_prior_predictive_error.py in <module>
      8 
      9 with model:
---> 10     pre_trace = pm.sample_prior_predictive()

/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pymc3/sampling.py in sample_prior_predictive(samples, model, vars, var_names, random_seed)
   1320     names = get_default_varnames(model.named_vars, include_transformed=False)
   1321     # draw_values fails with auto-transformed variables. transform them later!
-> 1322     values = draw_values([model[name] for name in names], size=samples)
   1323 
   1324     data = {k: v for k, v in zip(names, values)}

/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pymc3/distributions/distribution.py in draw_values(params, point, size)
    393                                         point=point,
    394                                         givens=temp_givens,
--> 395                                         size=size)
    396                     givens[next_.name] = (next_, value)
    397                     drawn[(next_, size)] = value

/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pymc3/distributions/distribution.py in _draw_value(param, point, givens, size)
    579                         else:
    580                             dist_tmp.shape = val.shape
--> 581                 return dist_tmp.random(point=point, size=size)
    582             else:
    583                 return param.distribution.random(point=point, size=size)

/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pymc3/distributions/continuous.py in random(self, point, size)
    668             [self.mu, self.sigma, self.lower, self.upper], point=point, size=size)
    669         return generate_samples(stats.truncnorm.rvs,
--> 670                                 a=(a_v - mu_v)/std_v,
    671                                 b=(b_v - mu_v) / std_v,
    672                                 loc=mu_v,

ValueError: operands could not be broadcast together with shapes (500,1031) (500,) 

Additional Information

If it helps, the observed variable is a TruncatedNormal, which has failed this way before, but I'm running master as of today, where this is supposed to be fixed, AFAICT.

Versions and main components

  • PyMC3 Version: 3.6 from source
  • Theano Version: 1.0.4
  • Python Version: 3.7.3
  • Operating system: MacOS Mojave
  • How did you install PyMC3: pip from Github source
@rpgoldman (Contributor, Author)

A little more information. This is the random variable in question:

pm.TruncatedNormal("obs", mu=pred + means_vector, sd=err_sd, 
                   observed=self.df['mean_log_gfp_live'].to_numpy(),
                   lower=0.0, upper=12.0)

The error is

ValueError: operands could not be broadcast together with shapes (500,1031) (500,) 

and the (default) number of samples is 500. There are 1031 observations. So perhaps it's the observations that are causing the issue?
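The mismatch is easy to reproduce with plain NumPy (a minimal sketch; the array names mirror the variables in the traceback, and the values are made up):

```python
import numpy as np

# Shapes from the traceback: 500 prior-predictive draws of a
# length-1031 bound/mean vs. 500 draws of a scalar standard deviation.
a_v = np.zeros((500, 1031))
mu_v = np.zeros((500, 1031))
std_v = np.ones(500)

failed = False
try:
    (a_v - mu_v) / std_v  # (500, 1031) vs (500,) -> ValueError
except ValueError:
    failed = True
print(failed)  # True: NumPy right-aligns axes, so 1031 must match 500
```

NumPy broadcasting aligns trailing dimensions, so the length-500 sample axis of std_v is matched against the length-1031 observation axis, which fails.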

@lucianopaz (Contributor)

Thanks for reporting! The problem happens before entering generate_samples, which has the generic fix implemented. The error comes from the subtractions and divisions used to compute a and b. I'll fix this soon.

@rpgoldman (Contributor, Author)

So do we need to take the transpose of the (a_v - mu_v) and (b_v - mu_v) terms to make this work?

Or is it: ((a_v - mu_v).T / std_v).T ?
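For these particular shapes the transpose trick does line the axes up (a quick NumPy check with made-up values; the names mirror the traceback), though as the reply below explains, a general fix lives elsewhere:

```python
import numpy as np

# (a_v - mu_v) has shape (500, 1031); std_v has shape (500,).
a_v = np.ones((500, 1031))
mu_v = np.zeros((500, 1031))
std_v = np.full(500, 2.0)

# Transposing moves the sample (size) axis last, where NumPy's
# trailing-axis broadcasting can match it against std_v's single axis.
result = ((a_v - mu_v).T / std_v).T
print(result.shape)  # (500, 1031)
```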

@lucianopaz (Contributor)

No, it's a bit more complicated, but all the required functionality is in shape_utils.py and distribution.py. There are two ways to solve this problem:

  1. Use shape_utils.broadcast_distribution_samples (or one of the other similar functions in the module) on the output of draw_values.
  2. Instead of passing self.lower and self.upper to draw_values, we could pass (self.lower - self.mu)/self.sigma and (self.upper - self.mu)/self.sigma. That would give us a and b with the right shape directly.

@rpgoldman (Contributor, Author)

2. Instead of passing self.lower and self.upper to draw_values, we could pass (self.lower - self.mu)/self.sigma and (self.upper - self.mu)/self.sigma. That would give us a and b with the right shape directly.

So:

        mu_v, std_v, a_v, b_v = draw_values(
            [self.mu, self.sigma, self.lower, self.upper], point=point, size=size)

Becomes

        mu_v, std_v, a_v, b_v = draw_values(
            [self.mu, self.sigma, (self.lower - self.mu)/self.sigma, (self.upper - self.mu)/self.sigma], point=point, size=size)

and then we do:

        return generate_samples(stats.truncnorm.rvs,
                                a=a_v,  # change here
                                b=b_v,  # and here...
                                loc=mu_v,
                                scale=std_v,
                                dist_shape=self.shape,
                                size=size,
                                )

If that's right, I can try to test it out.

@lucianopaz (Contributor)

Yeah, both options would work. Upon further thought, I think option 1 is best. Option 2 would recompile the Theano function every time random is called, because it would use different tensors each time. Doing that would make things slower.

If you want to try to solve this, take a look at the documentation in shape_utils.py and also at the new shape_utils.rst in the docs. Those should help you understand what caused this error and how it is addressed in other places in distribution.py.

@lucianopaz (Contributor) commented May 17, 2019

@rpgoldman, I realize that I might have discouraged you by saying that option 1 should be used, and that it involved reading through new documentation and docstrings in the source. That wasn't my intention; the fix using option 1 should be as simple as this:

mu_v, std_v, a_v, b_v = broadcast_distribution_samples(
    draw_values(
        [self.mu, self.sigma, self.lower, self.upper],
        point=point,
        size=size),
    size=size,
)

The only extra work would be to write some new tests, track down other distributions in continuous.py and discrete.py that could suffer from the same problem, and finally add all of the fixes to the release notes. I won't have time to do it this weekend, so if you are willing to give it a shot, you're more than welcome!

@rpgoldman (Contributor, Author)

I will give that patch a try (once I have my other PR ready to merge).
I will probably need help identifying new tests, and I definitely don't know which other distributions would suffer from this problem. But I will see what I can do, especially if you don't mind advising me.

For some reason, I didn't get an email notification of your comments -- sorry to take so long to respond.

rpgoldman added a commit to rpgoldman/pymc3 that referenced this issue May 20, 2019
@lucianopaz (Contributor)

I'll try to explain the cause of the error you saw, to help you understand why other distributions might be prone to it. Consider the following model, inspired by your example that produced the error:

with pm.Model():
    means_vector = pm.Normal('means', 0, 1, shape=len(self.df))
    err_sd = pm.HalfNormal('err_sd', 1)
    obs = pm.TruncatedNormal("obs", mu=pred + means_vector, sd=err_sd,
                             observed=self.df['mean_log_gfp_live'].to_numpy(),
                             lower=0.0, upper=12.0)

The RVs means_vector and err_sd have shapes (len(self.df),) and tuple() respectively. obs has its shape determined by the observed parameter, and thus has shape (len(self.df),). All is well for pm.sample because all of the RV shapes broadcast properly. We use shape all over the place, so it can get confusing what is intended in each context. When you define the model, the shapes represent the random variables' core dimension shapes (TensorFlow Probability has a clearer notion of shapes and names them sample_shape, batch_shape and event_shape).

The problem comes when you want to draw a sample (really more than 1 sample) from a given distribution. If you call draw_values([means_vector], size=2) you will get an array of results that has the following shape: (2, len(self.df)). The 2 comes from the size parameter, and the rest are the distribution's core dimensions. Now, if you do the same for err_sd you get an array of shape (2,), because err_sd is a scalar RV and has empty core dimensions. What happens now? If you try to add or divide the results, you'll see that an array of shape (2, len(self.df)) does not broadcast with an array of shape (2,)!

So what can we do? The lucky thing is that the size part of the shape is always prepended to the core dimensions' shape. So we wrote the module shape_utils.py to do broadcasting across the core dimensions, while ignoring the size prepend, and then adding it back to the final result. All of this is done under the hood in generate_samples and in _draw_value. If you operate on values drawn from RVs outside of these, then you have to handle the broadcasting yourself with the machinery provided in shape_utils.py. I know that we have to write this up as a clean tutorial notebook, because it's one of the main points of confusion we have, but I haven't had the time to do it yet.
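The size-aware broadcasting described above can be sketched in a few lines of NumPy (a hypothetical simplification of what shape_utils.broadcast_distribution_samples does, assuming every input already carries the leading size axis; this is not the actual PyMC3 implementation):

```python
import numpy as np

def broadcast_with_size(samples, size):
    """Broadcast core dimensions while treating the leading `size`
    axis as a fixed prepend (simplified sketch)."""
    # Core shapes: everything after the leading size axis.
    core = np.broadcast_shapes(*(s.shape[1:] for s in samples))
    out = []
    for s in samples:
        # Insert singleton axes right after the size axis so the core
        # dimensions right-align, then broadcast to the common shape.
        pad = (1,) * (len(core) - (s.ndim - 1))
        out.append(np.broadcast_to(s.reshape((size,) + pad + s.shape[1:]),
                                   (size,) + core))
    return out

# 2 draws of a length-3 vector RV and 2 draws of a scalar RV now broadcast.
vec, scal = broadcast_with_size([np.zeros((2, 3)), np.ones(2)], size=2)
print((vec - scal).shape)  # (2, 3)
```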

rpgoldman added a commit to rpgoldman/pymc3 that referenced this issue May 21, 2019
rpgoldman added a commit to rpgoldman/pymc3 that referenced this issue May 23, 2019
rpgoldman added a commit to rpgoldman/pymc3 that referenced this issue May 23, 2019
rpgoldman added a commit to rpgoldman/pymc3 that referenced this issue May 24, 2019
rpgoldman added a commit to rpgoldman/pymc3 that referenced this issue May 28, 2019
rpgoldman added a commit to rpgoldman/pymc3 that referenced this issue May 30, 2019
rpgoldman added a commit to rpgoldman/pymc3 that referenced this issue Jun 3, 2019
rpgoldman added a commit to rpgoldman/pymc3 that referenced this issue Jun 7, 2019