
predictive sampling fails with multivariate posterior #5002

Closed
falkmielke opened this issue Sep 17, 2021 · 7 comments

falkmielke commented Sep 17, 2021

description

  • goal: pm.sample_posterior_predictive() from a model + trace
  • but: draw a different number of samples than were originally observed
  • this fails when the likelihood/observed is multivariate with an LKJ/Cholesky prior.

Thanks in advance for looking into this!

additional information
cf. https://discourse.pymc.io/t/prediction-setting-data-fails-with-multivariate-observed/8011

I suspect some data-shape-dependent theano object is initialized with the LKJ prior.
This is unexpected; I don't see how it should depend on the data (but I may be uninformed).
A quick solution would be to convert that internal object to a shared pm.Data, which could then be adjusted ex post.

example
(note the "KILLSWITCH")

import numpy as np
import pandas as pd
import pymc3 as pm
import scipy.stats as stats
import theano as th
import theano.tensor as tt
import matplotlib.pyplot as plt


### simulated data
print ('#### Simulation ####')
n_observations = 99
n_observables = 2

mean = [-0.5, 0.2]
cov = [[0.8, -0.6], [-0.6, 1.4]]
obs0, obs1 = np.random.multivariate_normal(mean, cov, n_observations).T

print ('obs corr:', stats.pearsonr(obs0, obs1))


# combining the data
data = pd.DataFrame.from_dict({
    'index': np.arange(n_observations),
    'obs0': obs0,
    'obs1': obs1,
})

### Model Construction and sampling
print ('#### Modeling ####')
# shared theano objects (pm.Data)
shared = {}

with pm.Model() as model:

    # using pm.Data; yet the error also occurs when using th.shared directly
    shared['intercept'] = pm.Data('intercept_data', np.ones((n_observations, 1)))

    ## intercept
    intercept = pm.Normal('intercept',
                          mu=0., sd=1.,
                          shape=(1, n_observables))
    estimator = tt.dot(shared['intercept'], intercept)


    ## observable
    if False: # <- this is the "KILLSWITCH"
        # individual observables
        # with this, prediction WORKS

        # residual
        residual = pm.HalfCauchy('residual', 1., shape=(1, n_observables))

        posterior = pm.Normal('posterior',
                              mu=estimator,
                              sd=residual,
                              observed=data.loc[:, ['obs0', 'obs1']].values)
    else:
        # multivariate observables
        # with this, prediction FAILS

        # LKJ Prior
        sd_dist = pm.HalfNormal.dist(1.)
        packed_cholesky = pm.LKJCholeskyCov('packed_cholesky',
                                            n=n_observables,
                                            eta=1.,
                                            sd_dist=sd_dist)

        # unpack the Cholesky factor of the observables' covariance matrix
        cholesky_matrix = pm.expand_packed_triangular(n_observables, packed_cholesky, lower=True)

        # posterior | likelihood | observed
        posterior = pm.MvNormal('posterior',
                                mu=estimator,
                                chol=cholesky_matrix,
                                observed=data.loc[:, ['obs0', 'obs1']].values)

    # inference
    trace = pm.sample(draws=2**10, tune=2**10,
                      # cores=4, chains=4,
                      return_inferencedata=True)

# show outcome
print (pm.summary(trace))
pm.plot_trace(trace)
plt.show()

### predictive sampling
print ('#### Prediction ####')

# CRITICAL: the desired number of predictions is different from the number of observations
n_predictions = 57

# adjust intercept shape
probe_data = {}
probe_data['intercept_data'] = np.ones((n_predictions, 1))

# how often to repeat the n_predictions draws
n_repeats = 10

with model:
    pm.set_data(probe_data)
    prediction = pm.sample_posterior_predictive(trace, samples = n_repeats)

print (prediction)

traceback:

Complete error traceback
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/pymc3/distributions/shape_utils.py", line 174, in broadcast_dist_samples_shape
    broadcast_shape = shapes_broadcasting(*sp_shapes, raise_exception=True)
  File "/usr/lib/python3.9/site-packages/pymc3/distributions/shape_utils.py", line 101, in shapes_broadcasting
    raise ValueError(
ValueError: Supplied shapes (57, 2), (99, 2) do not broadcast together

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/pymc3/distributions/distribution.py", line 971, in _draw_value
    return dist_tmp.random(point=point, size=size)
  File "/usr/lib/python3.9/site-packages/pymc3/distributions/multivariate.py", line 275, in random
    mu = broadcast_dist_samples_to(to_shape=output_shape, samples=[mu], size=size)[0]
  File "/usr/lib/python3.9/site-packages/pymc3/distributions/shape_utils.py", line 396, in broadcast_dist_samples_to
    samples, to_shape = get_broadcastable_dist_samples(
  File "/usr/lib/python3.9/site-packages/pymc3/distributions/shape_utils.py", line 271, in get_broadcastable_dist_samples
    out_shape = broadcast_dist_samples_shape(p_shapes, size=size)
  File "/usr/lib/python3.9/site-packages/pymc3/distributions/shape_utils.py", line 176, in broadcast_dist_samples_shape
    raise ValueError(
ValueError: Cannot broadcast provided shapes (57, 2), (99, 2) given size: ()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/01_kinematics/14_papio_fcas/archive/MinimalExample_Prediction.py", line 127, in <module>
    prediction = pm.sample_posterior_predictive(trace, samples = n_repeats)
  File "/usr/lib/python3.9/site-packages/pymc3/sampling.py", line 1733, in sample_posterior_predictive
    values = draw_values(vars_, point=param, size=size)
  File "/usr/lib/python3.9/site-packages/pymc3/distributions/distribution.py", line 791, in draw_values
    value = _draw_value(next_, point=point, givens=temp_givens, size=size)
  File "/usr/lib/python3.9/site-packages/pymc3/distributions/distribution.py", line 979, in _draw_value
    val = np.atleast_1d(dist_tmp.random(point=point, size=None))
  File "/usr/lib/python3.9/site-packages/pymc3/distributions/multivariate.py", line 276, in random
    param = np.broadcast_to(param, shape=output_shape + dist_shape[-1:])
  File "<__array_function__ internals>", line 5, in broadcast_to
  File "/usr/lib/python3.9/site-packages/numpy/lib/stride_tricks.py", line 411, in broadcast_to
    return _broadcast_to(array, shape, subok=subok, readonly=True)
  File "/usr/lib/python3.9/site-packages/numpy/lib/stride_tricks.py", line 343, in _broadcast_to
    raise ValueError('cannot broadcast a non-scalar to a scalar array')
ValueError: cannot broadcast a non-scalar to a scalar array
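
For what it's worth, the root of the final error is plain NumPy broadcasting: (57, 2) and (99, 2) cannot be broadcast together because 57 and 99 are distinct non-unit dimensions. A minimal reproduction outside of PyMC3, with the shapes taken from the traceback above:

```python
import numpy as np

# shapes from the traceback: mu built from the updated pm.Data
# vs. the distribution's shape frozen at model-build time
try:
    np.broadcast_shapes((57, 2), (99, 2))
except ValueError as err:
    print('broadcast fails:', err)

# broadcasting only succeeds when one of the mismatched dimensions is 1
print(np.broadcast_shapes((1, 2), (99, 2)))  # (99, 2)
```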

Versions and main components

  • PyMC3 Version: 3.11.4
  • Theano Version: 1.1.2
  • Python Version: 3.9.7 (default, Aug 31 2021, 13:28:12)
  • Operating system: arch linux
  • How did you install PyMC3: pip

falkmielke commented Sep 18, 2021

I have attempted to solve this myself, but I fail because the code gets too complex for me to grasp;
I'm too unfamiliar with the pymc3 internals, in particular the sampling/drawing procedure.
Still, I'd like to share some observations. I hope this helps, and thanks again for looking into it!

Following the traceback, I see that in distributions/multivariate.py, MvNormal.random(...) performs a shape broadcast
which derives an output shape from dist_shape, size, and mu.
The dist_shape is the problem: it comes from self.shape of the distribution, which stays fixed in my example.
When I temporarily adjust the code to
dist_shape = to_tuple(mu.shape)
then sample_posterior_predictive works and I get output of the desired shape.

However, this is "cheating", because mu and self might have different shapes for good reasons.
edit: indeed, this does not solve the issue at all; instead it leads to shape inconsistencies in MvStudentT sampling

I also followed the drawing procedure and attempted to find out which model component is causing the trouble.
However, this might be a parallelized procedure, and I would not be surprised if I am on the wrong track here.
In a minimal model, I have three params which are drawn.
The one I didn't create myself is the LKJ covariance.
I observed that _LKJCholeskyCov ends up producing an AdvancedIncSubtensor
(created via tt.advanced_set_subtensor1 in distributions/transforms.py, CholeskyCovPacked).
It might be coincidence, but the drawing seems to always crash on that one.
In distributions/distribution.py, _draw_value ends up in the branch for a tt.TensorVariable instance which, according to param_ancestors, is filled with TensorConstants.

So should it not rather fall into a category like tt.TensorConstant, and not be drawn at all?

Or, from a conceptual standpoint, shouldn't Covariance/Cholesky stay unchanged for posterior predictive sampling?
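
For reference, pm.expand_packed_triangular(n, packed, lower=True) just scatters the n*(n+1)/2 packed entries into an (n, n) lower-triangular matrix. A NumPy sketch of the equivalent operation (the helper name is made up here, and row-major packing of the lower triangle is assumed):

```python
import numpy as np

def expand_packed_lower(n, packed):
    # scatter n*(n+1)/2 packed entries into an (n, n) lower-triangular
    # matrix, mirroring pm.expand_packed_triangular(n, packed, lower=True)
    out = np.zeros((n, n))
    out[np.tril_indices(n)] = packed
    return out

# n = 2 observables -> 3 packed entries: L[0,0], L[1,0], L[1,1]
L = expand_packed_lower(2, np.array([0.9, -0.6, 1.1]))
print(L)        # [[ 0.9  0. ]
                #  [-0.6  1.1]]
print(L @ L.T)  # the implied covariance matrix
```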


falkmielke commented Sep 20, 2021

This is developing into a little wasps' nest for me.
I attempted to get this to work with MvStudentT instead of MvNormal.

vals = pm.MvStudentT('vals', mu=intercept_shared, cov=cov, nu = dof, observed = data)

If I understood it correctly (cf. https://stackoverflow.com/a/41602739 ), an MvStudentT can be generated by coupling an MvNormal with nu and a chi2 or gamma draw; this seems to be what happens in distributions/multivariate.py, MvStudentT.random (though I feel too illiterate to follow all the exact maths). Hence, the MvNormal quickfix mentioned above is required to get any posterior samples out of an MvStudentT.
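
That scale-mixture construction can be sketched in plain NumPy (a hand-rolled sampler for illustration only, not PyMC3's actual code; the function name is made up):

```python
import numpy as np

def mv_student_t_draws(mu, chol, nu, size, seed=0):
    # x = mu + z * sqrt(nu / chi2(nu)), with z ~ MvNormal(0, chol @ chol.T):
    # one chi2 scalar per draw rescales a multivariate normal sample
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((size, len(mu))) @ chol.T
    u = rng.chisquare(nu, size=(size, 1))
    return mu + z * np.sqrt(nu / u)

mu = np.array([-0.5, 0.2])
chol = np.array([[0.9, 0.0], [-0.6, 1.1]])
draws = mv_student_t_draws(mu, chol, nu=5.0, size=1000)
print(draws.shape)  # (1000, 2)
```

Note how nu enters as a single chi2 scalar per draw here; a nu with a shape other than (1,) would change the shape of the chisquare draw, which is consistent with the failure described below for non-scalar nu.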

However, when the prior distribution entered for nu above has a shape other than (1,), posterior sampling will fail in distributions/multivariate.py, MvStudentT.random, because the np.random.chisquare draw does not yield the right shape.

But is this a relevant use case? I found that, on my real data, the results for nu are always very similar for each of the variates. So maybe it is unwise to give multiple nus, because there is only one logical nu for the whole parameter block? On the other hand, I understand nu is the degrees of freedom, which also tells me how similar to a Normal the parameter distribution is; and different variates can have a different degree of normality. So I think this is relevant. Otherwise, I'd appreciate an explanation.

tl;dr: once at it, it would be great to make sure that anything which works with MvNormal can also work with a similar MvStudentT model.

twiecki (Member) commented Sep 20, 2021

Which PyMC3 version is this? It could very well be fixed on main/v4 where we reworked all of this.

@falkmielke (Author)

Dear Thomas,
I'm glad to hear there is a rework!

I'm currently on the latest version installed from pip; see above:
PyMC3 Version: 3.11.4

I'll try to find and install the cutting-edge version.

@falkmielke (Author)

Okay, the update indeed worked! Kudos for the rework.

Here's what I did:

  • install pymc via git clone and python setup.py develop
  • replace "theano" with "aesara"

... and I can draw the samples!

Thanks a lot; I'll keep using the dev version.

Note that some of the docs still speak of "theano-pymc", which was confusing: is that "aesara" now?
I'm glad things are advancing with the new version!

Cheers!

@falkmielke (Author)

Sorry for a follow-up:

LKJCholeskyCov seems to be dysfunctional in the latest version:
https://discourse.pymc.io/t/lkjcholeskycov-requires-dist-params/8057

I'd appreciate further help!

@falkmielke falkmielke reopened this Sep 21, 2021
@falkmielke (Author)

Thanks to a hint on Discourse, I now know that there is a PR to solve this:
#4784
