Wishart fails with valid arguments #538

Closed
fonnesbeck opened this issue May 2, 2014 · 23 comments · Fixed by #701
@fonnesbeck
Member

Models that use the Wishart as a prior currently fail with valid parameters. For example,

s = pm.Wishart('s', 4, 3, np.eye(3))

Fails with the following:

TypeError: Wrong number of dimensions: expected 0, got 2 with shape (3, 3)

I've tried a variety of 3x3 matrices, but all yield similar errors.

PS - I have a pull request that removed the redundant second parameter.

@jsalvatier
Member

I think the issue is that you didn't specify 'shape'. That's definitely
something we should be able to infer, though.


@fonnesbeck
Member Author

Yeah, I tried that, and got:

---------------------------------------------------------------------------
AsTensorError                             Traceback (most recent call last)
<ipython-input-73-6f02cca3962c> in <module>()
      1 with pm.Model():
----> 2     s = pm.Wishart('s', 4, 3, np.eye(3), shape=(3,3))

/Users/fonnescj/Code/pymc/pymc/distributions/distribution.pyc in __new__(cls, name, *args, **kwargs)
     17             data = kwargs.pop('observed', None)
     18             dist = cls.dist(*args, **kwargs)
---> 19             return model.Var(name, dist, data)
     20         elif name is None:
     21             return object.__new__(cls) #for pickle

/Users/fonnescj/Code/pymc/pymc/model.pyc in Var(self, name, dist, data)
    142         """
    143         if data is None:
--> 144             var = FreeRV(name=name, distribution=dist, model=self)
    145             self.free_RVs.append(var)
    146         else:

/Users/fonnescj/Code/pymc/pymc/model.pyc in __init__(self, type, owner, index, name, distribution, model)
    321             self.tag.test_value = np.ones(
    322                 distribution.shape, distribution.dtype) * distribution.default()
--> 323             self.logp_elemwiset = distribution.logp(self)
    324             self.model = model
    325 

/Users/fonnescj/Code/pymc/pymc/distributions/multivariate.pyc in logp(self, X)
    179              2) - n * log(IVI) - 2 * multigammaln(p, n / 2)) / 2,
    180 
--> 181             all(n > p - 1))

/Library/Python/2.7/site-packages/theano/tensor/basic.pyc in all(x, axis, keepdims)
   5078 
   5079 def all(x, axis=None, keepdims=False):
-> 5080     out = elemwise.All(axis)(x)
   5081 
   5082     if keepdims:

/Library/Python/2.7/site-packages/theano/gof/op.pyc in __call__(self, *inputs, **kwargs)
    409         """
    410         return_list = kwargs.pop('return_list', False)
--> 411         node = self.make_node(*inputs, **kwargs)
    412         if self.add_stack_trace_on_call:
    413             self.add_tag_trace(node)

/Library/Python/2.7/site-packages/theano/tensor/elemwise.pyc in make_node(self, input)
   1639 
   1640     def make_node(self, input):
-> 1641         input = as_tensor_variable(input)
   1642         if input.dtype not in ["int8", "uint8"]:
   1643             input = theano.tensor.neq(input, 0)

/Library/Python/2.7/site-packages/theano/tensor/basic.pyc in as_tensor_variable(x, name, ndim)
    181     if isinstance(x, bool):
    182         raise AsTensorError(
--> 183             "Cannot cast True or False as a tensor variable. Please use 1 or "
    184             "0. This error might be caused by using the == operator on "
    185             "Variables. v == w does not do what you think it does, "

AsTensorError: Cannot cast True or False as a tensor variable. Please use 1 or 0. This error might be caused by using the == operator on Variables. v == w does not do what you think it does, use theano.tensor.eq(v, w) instead.

It does not seem to like the boundary check.
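
For context, the check that trips here is all(n > p - 1) in the Wishart logp: when n and p arrive as plain Python numbers, the comparison evaluates to a Python bool, and Theano's all() refuses to cast a bool to a tensor. Below is a minimal sketch of a fully symbolic version of that constraint (illustrative only, not necessarily the fix that eventually landed; the variable names are made up):

import theano.tensor as tt

n = tt.dscalar('n')                    # degrees of freedom as a symbolic scalar
p = 3                                  # dimension of the scale matrix
constraint = tt.all(tt.gt(n, p - 1))   # stays a Theano variable, never a raw bool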

@jsalvatier
Member

Hmm that seems bad. Do we have tests for it?

@fonnesbeck
Member Author

We have one, which it passes. Does my example fail for you?

@jsalvatier
Member

Yup, it fails for me too.


@twiecki
Member

twiecki commented Aug 17, 2014

#547 has a bunch of changes to Wishart. Might that fix it? @kadeng?

@kadeng

kadeng commented Aug 18, 2014

@twiecki: You should definitely take a look at the changes I made there. I think that the current Wishart implementation does not really work. I found discrepancies between the likelihood formula in the code and the formulas I found in the literature (for example, in Gelman's Bayesian Data Analysis and in Murphy's Machine Learning: A Probabilistic Perspective). What's problematic is that different authors seem to use different parametrizations and to name the parameters inconsistently, so in the end I was pretty confused.

That said, I haven't spent a lot of time validating that my fix is in turn correct and works, so please don't rely on it; use it as a suggestion :) The following paper is also an interesting read when it comes to actually using the Wishart or inverse Wishart as a prior for the precision or covariance matrix of MVNs: http://ba.stat.cmu.edu/journal/2013/vol08/issue02/huang.pdf
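
For reference, the parametrization used by most of those sources (and the one the logp in the code appears to target) is, for a p x p positive-definite X ~ W_p(V, n) with scale matrix V and n > p - 1 degrees of freedom,

p(X \mid V, n) = \frac{|X|^{(n-p-1)/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}(V^{-1}X)\right)}{2^{np/2}\,|V|^{n/2}\,\Gamma_p(n/2)},

where \Gamma_p is the multivariate gamma function. Much of the confusion comes from whether the second matrix argument is taken to be the scale V or its inverse, and whether the distribution is placed on a covariance or a precision matrix.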

@wgreenr

wgreenr commented Mar 29, 2015

I recently tried to sample from a Wishart prior using:

import pymc3 as pm
import numpy as np

prec_prior = np.array([[ 25.3968254,   -1.58730159],
                       [ -1.58730159,   6.34920635]])
with pm.Model() as model:
    prec = pm.Wishart('prec', 100.0, prec_prior / 100.0, shape=(2, 2))
    step = pm.Metropolis()
    trace = pm.sample(10000, step)

Instead, I obtained a clearly diverging sampling process, with some entries of the sampled matrices drifting towards negative infinity and in no way resembling the prior precision matrix:

[figure: diverging Wishart trace]

Then I decided to check whether my notions about the Wishart distribution are fundamentally flawed. Using the code from http://www.mit.edu/~mattjj/released-code/hsmm/stats_util.py,
I obtained samples that look as one might expect, given a quite narrow prior centered at the precision matrix prec_prior:

    sample_wishart(prec_prior / 100.0, 100.0)
    array([[ 26.37928925,  -3.88792305],
           [ -3.88792305,   7.6372514 ]])

The question is: is this behavior of the MCMC sampler expected, or is this a bug? And how does one properly sample from the Wishart?

@kiudee
Contributor

kiudee commented Mar 29, 2015

Yes, this is a bug and the same behavior I observed with the Wishart distribution (see issue #672).
If you can find the source of the error in the current code, we would greatly appreciate it.

I recommend taking a look at the LKJ distribution, which is a prior distribution for the correlation matrix instead of the covariance/precision matrix.
This has the advantage that you can specify separate priors for the standard deviations and the correlations.
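
As a small numerical illustration of that split (fixed values only, no pymc3 API): with an LKJ prior on the correlation matrix C and a separate prior on the standard deviations sd, the covariance is assembled as diag(sd) * C * diag(sd).

import numpy as np

C = np.array([[1.0, 0.3],
              [0.3, 1.0]])      # correlation matrix (what the LKJ prior is placed on)
sd = np.array([2.0, 0.5])       # per-dimension standard deviations

Sigma = np.diag(sd).dot(C).dot(np.diag(sd))
print Sigma
# [[ 4.    0.3 ]
#  [ 0.3   0.25]]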

@wgreenr

wgreenr commented Mar 29, 2015

I will run some tests; the Wishart logp seems to look good.

@twiecki
Member

twiecki commented Mar 29, 2015

Seems like we're not enforcing the tau > 0 constraint.

@kiudee
Contributor

kiudee commented Mar 29, 2015

Indeed, this is the case. (See below)

I added the constraint and tried it with the given example:

import numpy as np
import pymc3 as pm
import scipy.optimize as opt
prec_prior = np.array([[ 25.3968254,   -1.58730159],
                       [ -1.58730159,   6.34920635]])
with pm.Model() as model:
    prec = pm.Wishart('prec', 100.0, prec_prior / 100.0, shape=(2, 2))

    start = pm.find_MAP(fmin=opt.fmin_powell)
    step = pm.Metropolis()
    trace = pm.sample(10000, step, start)

Here is the resulting trace (without any thinning):
[figure: precision trace]

PS: I’m sorry, it is too late here. I just realized that this is not true. The matrices need to be positive definite.

@twiecki
Member

twiecki commented Mar 30, 2015

Oh, right. I experimented a bit with all(gt(eig(X), 0)) as a condition but didn't get it to work yet.

@twiecki
Member

twiecki commented Apr 18, 2015

It should actually read all(gt(eigh(X)[0], 0)) which at least does not raise an exception. However, I'm still seeing the divergent sampling... maybe @karlnapf can advise on sampling from the Wishart distribution.

@twiecki
Member

twiecki commented Apr 18, 2015

The other thing we have to enforce is the symmetry:

        return bound(
            ((n - p - 1) * log(IXI) - trace(matrix_inverse(V).dot(X)) -
                n * p * log(2) - n * log(IVI) - 2 * multigammaln(n / 2., p)) / 2,
            gt(n, (p - 1)),
            all(gt(eigh(X)[0], 0)),
            eq(X, X.T)
        )

Unfortunately, it now becomes quite obvious that trying to sample this matrix naively with these constraints is impossible, so we should probably try to sample only the upper triangle and use the Cholesky decomposition.
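
A rough sketch of that idea (illustrative only, not merged code; names are made up): sample just the p*(p+1)/2 free entries of a triangular factor and build the matrix as L * L.T, which is symmetric and positive semi-definite by construction, so none of the hard constraints above need to be checked during sampling.

import numpy as np
import theano.tensor as T

p = 3
tril_idx = np.tril_indices(p)      # indices of the p*(p+1)/2 free entries
n_free = len(tril_idx[0])

def packed_to_matrix(packed):
    """Map a length-n_free vector to a p x p symmetric PSD matrix."""
    L = T.zeros((p, p))
    L = T.set_subtensor(L[tril_idx], packed)
    return L.dot(L.T)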

@kadeng

kadeng commented Apr 18, 2015

Sampling from the Wishart and inverse Wishart seems to be something that
has received some rather sophisticated analysis to date. Make
sure you take at least a quick look at these:

http://www.math.wustl.edu/~sawyer/hmhandouts/Wishart.pdf
http://www.gwu.edu/~forcpgm/YuChengKu-030510final-WishartYu-ChengKu.pdf

best,

Kai


@twiecki
Member

twiecki commented Apr 18, 2015

Thanks @kadeng. I'm starting to wonder if we shouldn't just ditch the Wishart. @betanalpha suggests here: https://www.youtube.com/watch?v=xWQpEAyI5s8 that it's a terrible choice as a prior to begin with and suggests using the LKJ which we already have working.

@betanalpha

Referencing a youtube video -- is that a first? There are two problems going on with the Wishart.

Firstly, because of the positive-definite and symmetric constraints the probability of generating a valid sample by varying every element is zero (technically nonzero on floating point, but small enough to make it unfeasible). Really the only way to sample from these objects is to transform the target distribution to the N * (N + 1) / 2 dimensional unconstrained space, sample there, and then map the samples back to the constrained space. For details see the "Transformations of Constrained Variables" chapter in the Stan manual.
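
A minimal sketch of such a transform in plain NumPy (a log-Cholesky map along the lines of the Stan manual chapter; the function name and layout are illustrative): an unconstrained vector of length N(N+1)/2 is packed into a Cholesky factor whose diagonal is kept positive by exponentiation, and L * L.T is then symmetric positive-definite by construction.

import numpy as np

def unconstrained_to_spd(y, p):
    """Map an unconstrained vector of length p*(p+1)/2 to a symmetric
    positive-definite p x p matrix."""
    L = np.zeros((p, p))
    idx = 0
    for i in range(p):
        for j in range(i + 1):
            L[i, j] = np.exp(y[idx]) if i == j else y[idx]   # positive diagonal
            idx += 1
    return L.dot(L.T)

# Example: 6 unconstrained numbers give a valid 3 x 3 matrix.
print unconstrained_to_spd(np.random.randn(6), 3)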

Secondly, the Wishart distribution is very heavy-tailed, which causes all kinds of problems for naive samplers. Formally, heavy tails are obstructions to geometric ergodicity, which is essentially the minimal condition needed to trust MCMC expectations in practice. More intuitively, when the tails of the target distribution are heavy the sampler drifts around in the tails and can't find its way back into the bulk of the distribution. NUTS will technically work okay, but only if you let the depth of the tree grow very, very large and incur the resulting computational cost. LKJ has much nicer tails and consequently pairs much better with most MCMC samplers. We'd also argue that the shape of LKJ priors is more in line with modelers' intuitions than the Wishart (in particular, there is a really nice interpretation in terms of the Cholesky factor), which is why we recommend them as defaults.

twiecki added a commit that referenced this issue Apr 18, 2015
The Wishart has never worked correctly and produced
divergent samples. One of the reasons was that we did
not enforce the matrix to be symmetric or positive
semi-definite.

Now, however, it is impossible to randomly sample a
valid matrix, so we raise a warning that the Wishart
is currently basically unusable and recommend
the LKJ prior instead, which has many desirable properties.

For more discussions, see #538.
@twiecki
Member

twiecki commented Apr 18, 2015

Many thanks for chiming in @betanalpha; I actually meant to connect with you to get some feedback on pymc3, if you don't mind. One thing that Stan does very well is the automatic transformations. I think it wouldn't be too hard to implement something similar here.

I now added a warning not to use the Wishart unless we fix it using the transform method.

Here's an example of how one can use the LKJ prior in pymc3:
https://github.com/pymc-devs/pymc3/blob/master/pymc3/examples/LKJ_correlation.py

@wgreenr

wgreenr commented Apr 18, 2015

Thanks everyone, especially @betanalpha and @twiecki, for the insightful comments! It seems that LKJ is the way to go.

@twiecki
Member

twiecki commented Apr 21, 2015

I tried the transform approach as suggested by @betanalpha and tried to follow the Stan docs. This is the model I came up with. @betanalpha, if you have a sec it'd be great to get your input on whether you think this is correctly set up; the results make sense, but this is the first time I've worked with multivariate transformations.

import pymc3 as pm
import numpy as np
import theano
import theano.tensor as T
import scipy.stats
import scipy.linalg
import matplotlib.pyplot as plt

covariance = np.matrix([[2, .5],
                        [.5, 1]])

prec = np.linalg.inv(covariance)

mean = [.5, 1]
data = scipy.stats.multivariate_normal(mean, covariance).rvs(5000)

plt.scatter(data[:, 0], data[:, 1])

S = np.matrix([[1, .5],
               [.5, 1]])
L = scipy.linalg.cholesky(S)
nu = 5
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=1, shape=2)
    c = T.sqrt(pm.ChiSquared('c', nu - np.arange(2, 4), shape=2))
    z = pm.Normal('z', 0, 1)
    A = T.stacklists([[c[0], 0], 
                      [z, c[1]]])

    # L * A * A.T * L.T ~ Wishart(L*L.T, nu)
    wishart = pm.Deterministic('wishart', T.dot(T.dot(T.dot(L, A), A.T), L.T))
    cov = pm.Deterministic('cov', T.nlinalg.matrix_inverse(wishart))
    lp = pm.MvNormal('likelihood', mu=mu, tau=wishart, observed=data)

    start = pm.find_MAP()
    step = pm.NUTS(scaling=start)
    trace = pm.sample(500, step)

pm.traceplot(trace[100:]);

print np.mean(trace['wishart'], axis=0)
print prec

print np.mean(trace['cov'], axis=0)
print covariance

Output:

[[ 0.57725294 -0.29173294]
 [-0.29173294  1.16447192]]
[[ 0.57142857 -0.28571429]
 [-0.28571429  1.14285714]]

[[ 1.9955087   0.50381087]
 [ 0.50381087  0.9889754 ]]
[[ 2.   0.5]
 [ 0.5  1. ]]


@twiecki twiecki mentioned this issue Apr 22, 2015
@twiecki
Member

twiecki commented Apr 26, 2015

I refactored this now into a function with the same syntax as a distribution. However, it's a bit odd as it adds RVs itself to the model. I think it's pretty convenient though. @jsalvatier @fonnesbeck thoughts on adding this?

import pymc3 as pm
import numpy as np
import theano
import theano.tensor as T
import scipy.stats
import scipy.linalg
import matplotlib.pyplot as plt

covariance = np.matrix([[2, .5, -.5],
                        [.5, 1.,  0.],
                        [-.5, 0., 0.5]])

prec = np.linalg.inv(covariance)

mean = [.5, 1, .2]
data = scipy.stats.multivariate_normal(mean, covariance).rvs(5000)

plt.scatter(data[:, 0], data[:, 1])

S = np.eye(3)
L = scipy.linalg.cholesky(S)
nu = 5

def WishartBartlett(name, S, nu, is_cholesky=False, return_cholesky=False):
    """
    Bartlett decomposition of the Wishart distribution. As the Wishart
    distribution requires the matrix to be symmetric positive semi-definite,
    it is impossible for MCMC to ever propose acceptable matrices.

    Instead, we can use the Bartlett decomposition, which samples a lower
    triangular matrix. Specifically:

    If A ~ [[sqrt(c_1), 0, ...],
            [z_21, sqrt(c_2), 0, ...],
            [z_31, z_32, sqrt(c_3), ...]]
    with c_i ~ Chi²(nu - i + 1) and z_ij ~ N(0, 1), then, for L the Cholesky
    factor of S,
    L * A * A.T * L.T ~ Wishart(L * L.T, nu)

    See http://en.wikipedia.org/wiki/Wishart_distribution#Bartlett_decomposition
    for more information.

    :Parameters:
      S : ndarray
        p x p positive definite matrix
      nu : int
        Degrees of freedom, > dim(S).
      is_cholesky : bool (default=False)
        Input matrix S is already Cholesky decomposed as S.T * S
      return_cholesky : bool (default=False)
        Only return the Cholesky decomposed matrix.

    :Note:
      This is not a standard Distribution class but follows a similar
      interface. Besides the Wishart distribution, it will add the RVs
      c and z to your model, which make up the matrix.
    """

    L = S if is_cholesky else scipy.linalg.cholesky(S)

    diag_idx = np.diag_indices_from(S)
    tril_idx = np.tril_indices_from(S, k=-1)
    n_diag = len(diag_idx[0])
    n_tril = len(tril_idx[0])
    c = T.sqrt(pm.ChiSquared('c', nu - np.arange(2, 2+n_diag), shape=n_diag))
    z = pm.Normal('z', 0, 1, shape=n_tril)
    # Construct A matrix
    A = T.zeros(S.shape, dtype=np.float32)
    A = T.set_subtensor(A[diag_idx], c)
    A = T.set_subtensor(A[tril_idx], z)

    # L * A * A.T * L.T ~ Wishart(L*L.T, nu)
    if return_cholesky:
        return pm.Deterministic(name, T.dot(L, A))
    else:
        return pm.Deterministic(name, T.dot(T.dot(T.dot(L, A), A.T), L.T))

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=1, shape=3)

    prec = WishartBartlett('prec', S, nu)
    cov = pm.Deterministic('cov', T.nlinalg.matrix_inverse(prec))

    lp = pm.MvNormal('likelihood', mu=mu, tau=prec, observed=data)

    start = pm.find_MAP()
    step = pm.NUTS(scaling=start)
    trace = pm.sample(500, step)

pm.traceplot(trace[100:]);

@betanalpha

Sorry for the late reply -- seems reasonable although I can't comment on the integration of the transformation into your sampler.
