ENH: speed up multivariate_normal.logpdf for nonsingular matrices #9973

dniku · 2019-03-22T21:00:11Z

NOTE: this is a draft PR. There is a potential critical issue with precision, no tests, no documentation, and no benchmarks.

This PR changes the behavior of the following functions when allow_singular=False:

multivariate_normal.logpdf
multivariate_normal.pdf
multivariate_normal_frozen.logpdf
multivariate_normal_frozen.pdf

With these changes, I observed a 35x speedup on average. A Jupyter notebook with benchmarks can be found here (executed on Dell XPS 15 9560). The central idea here is to replace eigh with the Cholesky decomposition. The code (as submitted in this PR) is equivalent to the following:

_LOG_2PI = np.log(2 * np.pi)

def multivariate_normal_logpdf(x, mean, cov, allow_singular=False):
    if allow_singular:
        # ... do the old thing ...
    else:
        rank = cov.shape[0]

        dev = x - mean

        cho = scipy.linalg.cho_factor(cov)
        prec_times_dev_t = scipy.linalg.cho_solve(cho, dev.T)

        log_det_cov = 2 * np.log(cho[0].diagonal()).sum()
        maha = (dev * prec_times_dev_t.T).sum(axis=-1)

        return -0.5 * (rank * _LOG_2PI + log_det_cov + maha)

It is very likely that the current implementation has an issue with precision. In the notebook, it can be seen that at D = 385 I generated a vector mean and a matrix cov such that the result computed by my implementation was significantly different from the one computed by the current version in Scipy (detected by np.allclose). A dump of the corresponding arrays can be found here (with m for mean and S for cov); I have not investigated yet what causes this issue.

TODO:

Resolve issue with precision
Tests
Documentation
Benchmarks

I am submitting this PR in raw state to get feedback on whether this approach is worth pursuing at all, or it is something that has known problems and thus cannot be included in Scipy.

Relevant:

ilayn · 2019-03-25T12:58:18Z

The typical response to such determinant and rank computations is that Cholesky factorization is not a rank-revealing factorization hence cannot be used reliably for near-singular matrices. Often cholesky is a pre-step for positive definiteness "if it passes nice but otherwise do the expensive way". Because the expensive way is way more expensive than computing the Cholesky factorization it doesn't harm too much but if exists saves quite a bit.

Hence instead of "if, else" block we need a "try, except" (I mean conceptually, it doesn't have to live in try, except necessarily).

dniku · 2019-06-29T15:54:08Z

Apologies for not going back to this work lately.

@ilayn can you suggest a good way to check whether a matrix is not near-singular given its Cholesky decomposition? I have found some relevant information in this presentation starting from slide 22, but I'm not sure if I want to go all the way into implementing the algorithm they suggest.

In other words, what should trigger the except you mention?

dniku · 2020-11-01T15:56:17Z

@ilayn could you elaborate on how to detect near-singular matrices using Cholesky decomposition?

ilayn · 2020-11-01T19:48:25Z

Ouch, my apologies for the ridiculous delay. What I meant (I hope) is that Cholesky decomposition test is used as a canary for positive definiteness, if it succeeds then nice we carry on. If it fails either the matrix is not positive definite (which is on paper not possible since it is covariance) or precision-related failures. If you do this test inside a try except block you can catch it and fall back to eigh or something else. allow_singular then controls the "fallback or error" switch.

josef-pkt · 2020-11-01T20:10:53Z

just a side comment

the matrix is not positive definite (which is on paper not possible since it is covariance)

I have several cases in statsmodels where I work with singular, reduced rank covariance matrices, that are not directly estimated from data.
simplest case is when the data/model has to satisfy a linear constraint.

dniku · 2020-11-01T20:18:08Z

I'm not sure that it cho_factor does not raise an exception, that's a sufficient condition for non-singularity (or even non-near-singularity) of the covariance matrix.

@josef-pkt also rightfully notes that the covariance matrix might be singular in practice (e.g. a 1D Gaussian on a 2D plane).

So is there a good way to detect if the matrix is non-singular given the output of cho_factor, provided that it does not raise an exception?

ilayn · 2020-11-01T20:37:43Z

Yes, so if that is such a problem then you can fallback to whichever branch you want.

We switch to a more expensive method like eigh or svd and bite the bullet or otherwise Cholesky should go through without any performance penalty.

But shouldn't get stuck at Cholesky step. You can check the info of cho_factor and raise. But regular linalg.cholesky followed by solve_triangular is also fine. I don't think it would matter as much in terms of performance.

mdhaber · 2022-07-18T05:37:33Z

Hi @dniku just thought I'd check in to see if you planned to finish this up? What do you need from us?

mdhaber · 2022-08-26T06:32:33Z

This had been inactive for a while, and it's out of date, so I think it's safe to close. This is a good idea, though, add we'll take care of this in a follow-up to gh-16002. See mdhaber#88 for how this might work now.

ENH: speed up multivariate_normal.logpdf for nonsingular matrices

87cbb5f

ilayn added scipy.stats enhancement A new feature or improvement labels Mar 25, 2019

mdhaber mentioned this pull request Mar 15, 2022

A Solid Foundation for Statistics in Python with SciPy mdhaber/scipy#26

Closed

mdhaber closed this Aug 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: speed up multivariate_normal.logpdf for nonsingular matrices #9973

ENH: speed up multivariate_normal.logpdf for nonsingular matrices #9973

dniku commented Mar 22, 2019 •

edited

ilayn commented Mar 25, 2019

dniku commented Jun 29, 2019

dniku commented Nov 1, 2020

ilayn commented Nov 1, 2020

josef-pkt commented Nov 1, 2020

dniku commented Nov 1, 2020 •

edited

ilayn commented Nov 1, 2020

mdhaber commented Jul 18, 2022

mdhaber commented Aug 26, 2022 •

edited

ENH: speed up multivariate_normal.logpdf for nonsingular matrices #9973

ENH: speed up multivariate_normal.logpdf for nonsingular matrices #9973

Conversation

dniku commented Mar 22, 2019 • edited

ilayn commented Mar 25, 2019

dniku commented Jun 29, 2019

dniku commented Nov 1, 2020

ilayn commented Nov 1, 2020

josef-pkt commented Nov 1, 2020

dniku commented Nov 1, 2020 • edited

ilayn commented Nov 1, 2020

mdhaber commented Jul 18, 2022

mdhaber commented Aug 26, 2022 • edited

dniku commented Mar 22, 2019 •

edited

dniku commented Nov 1, 2020 •

edited

mdhaber commented Aug 26, 2022 •

edited