Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Allow specyfing inverse covariance of a multivariate normal distribution #16002

Merged
merged 33 commits into from
Sep 27, 2022

Conversation

siddhantwahal
Copy link
Contributor

Reference issue

Closes #11053, #11772

What does this implement/fix?

This PR allows specifying the inverse covariance of a multivariate normal in addition or instead of the covariance by adding the inverse_cov keyword argument.

To provide functionality with the covariance or its inverse, essential information from the supplied matrix is extracted into a _CovInfo object. This object is then passed to private helper methods such as multivariate_normal_gen._logpdf.

This PR also introduces the private multivariate_normal_gen._rvs method. This introduction is only needed to solve #11772 and is a little orthogonal to the changes needed to solve #11053. However, the introduction of the _CovInfo allows solving #11772 in just a few lines, so I've gone ahead and included them here.

If desired, the _CovInfo class could be included in the public API so that advanced users can instantiate it and pass it as a keyword argument instead. For example, instead of only being restricted to supplying the inverse

multivariate_normal(mean=0, inverse_cov=1)

users would get complete control over the description of the covariance matrix through its square root factors:

multivariante_normal(mean=0, cov_info=CovInfo(...)

Thus, this PR also makes progress towards #15675.

Additional information

This PR supersedes #12184 which should be closed if this is merged.

@tylerjereddy tylerjereddy added scipy.stats enhancement A new feature or improvement labels Apr 18, 2022
@mdhaber
Copy link
Contributor

mdhaber commented May 16, 2022

@siddhantwahal thanks for your patience here. Personally, I've been focusing on fixing bugs... When that's a little more under control, I know there are a lot of enhancements waiting!
Could you take a look at gh-15675? I wouldn't ask you to include that in this PR, but can you envision what that might look like as a follow-up to this one?

@siddhantwahal
Copy link
Contributor Author

siddhantwahal commented May 17, 2022

@mdhaber no worries!

I think gh-15675 can be broken into two tasks:

  1. allow specifying the eigendecomposition of the covariance if known (so that the covariance isn't factorized again)
  2. allow specifying the covariance as a composition of rotation angles and variances

To enable (1) after this PR:

  • make the CovInfo class public
  • swap the inverse_cov=None keyword argument in multivariate_normal with cov_info=None
    • this will require some care when setting defaults in _process_parameters
    • don't create the CovInfo object if one is passed as a keyword argument, maybe this could just be done in _process_parameters?

I think that's it.

One possible UX is (assuming the eigen decomposition is u @ d @ u.T:

[nav] In [3]: cov_info = CovInfo(
         ...:     sqrt_cov=u @ np.sqrt(d),
         ...:     sqrt_inv_cov=u @ np.sqrt(1 / d),
         ...:     log_det_cov=np.sum(np.log(d)),
         ...:     rank=len(d)
         ...:   )
[nav] In [4]: mvn = multivariate_normal(mean=np.zeros(10_000), cov_info=cov_info)

The UX could be enhanced by adding factory methods to create CovInfo from the eigendecomposition or a combination of rotations and variances. For example, CovInfo.from_eigendecomp(eigvals=u, eigvecs=d) or CovInfo.from_rotation(angles=angles, variances=variances). That would accomplish (2).

@mdhaber
Copy link
Contributor

mdhaber commented May 23, 2022

Thanks @siddhantwahal for your patience. I'll take a look in June if nobody gets to this first.

self._cov = cov
self._inverse_cov = inverse_cov

@cached_property
Copy link
Contributor

@mdhaber mdhaber Jun 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of this class. I wonder why it doesn't go a little further. Why not accept all of the information the user can provide in the initializer and store all of it as private attributes in the initializer (e.g. self._sqrt_cov = _sqrt_cov), then have public cached_propertys for all of these attributes that computes them (lazily, in effect) in the most efficient way, given whatever information is available?

Copy link
Contributor

@mdhaber mdhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I go much further with this review, I want to check what you think of the idea of refactoring this so that all the logic about the covariance matrix (including the stuff in process_parameters, _process_cov_shaped) goes in the _CovInfo class, and _process_parameters would instantiate the _CovInfo object rather than doing that separately in every method.

Or really, what is the distinction between _CovInfo and _PSD / what should it be? Maybe I haven't looked deeply enough, but it seems like it would be nice to have one class that accepts all sorts of different representations of a covariance matrix and has methods that do all the things you want to do with a covariance matrix. If we combined the things we like from this PR and the existing _PSD class to make a nice class representing a covariance matrix, maybe we could also endow other multivariate distributions (e.g. multivariate_t) with these same features, but with minimal changes to existing code?

@@ -437,7 +537,31 @@ def _process_quantiles(self, x, dim):

return x

def _logpdf(self, x, mean, prec_U, log_det_cov, rank):
def _create_cov_info(self, cov, inverse_cov, allow_singular=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along the lines of https://github.com/scipy/scipy/pull/16002/files#r906859132, I think this logic could go in _CovInfo. Similar logic could be added for eigendecomposition or variances and rotation angles.

@mdhaber
Copy link
Contributor

mdhaber commented Jul 15, 2022

What do you think about this comment #16002 (review) @siddhantwahal? (Sorry it took me a while to get to this!)

@siddhantwahal
Copy link
Contributor Author

What do you think about this comment #16002 (review) @siddhantwahal? (Sorry it took me a while to get to this!)

Thanks a lot for the thoughtful review @mdhaber. I was swamped with preparing for our Scipy 2022 tutorial so haven't had a chance to consider or respond to your suggestion. I plan on resuming work on this in the next few weeks!

@mdhaber
Copy link
Contributor

mdhaber commented Jul 15, 2022

@siddhantwahal oh, are you here? Hope to see you at the sprint tomorrow!

@siddhantwahal
Copy link
Contributor Author

siddhantwahal commented Aug 14, 2022

Apologies for the delay here @mdhaber.

Or really, what is the distinction between _CovInfo and _PSD / what should it be?

I was envisioning _CovInfo to be a simple container to hold properties (sqrt_cov, log_det_cov etc.) of the covariance matrix. How this container was created could be left unspecified. In effect,_PSD was an algorithm or policy for creating a _CovInfo instance. This separation made sense because the various multivariate distributions only care about what the various properties of the covariance are, not how they're created (strictly mathematically speaking; from the software perspective we need to maintain backwards compatibility and numerical stability).

However, this separation forces us to specify the _CovInfo object in its entirety and prevents lazy computation of properties. A compromise here is to introduce different implementations for different algorithms or representations. 7e75eeb is an attempt towards this. After this commit, _PSD could be entirely replaced with EigSvdCovInfo in other distributions, but that's best done in a separate PR.

Maybe I haven't looked deeply enough, but it seems like it would be nice to have one class that accepts all sorts of different representations of a covariance matrix and has methods that do all the things you want to do with a covariance matrix.

It's not yet clear to me whether we could or even should have one class to do this. For example, multivariate_normal and multivariate_t both use the Eigenvalue and SVD based algorithm (implemented in _PSD and the implementation of multivariate_normal.rvs in master) to compute covariance properties, but the Wishart distribution uses the Cholesky factorization. There could be several other ways to represent the covariance matrix. We could end up with a very long (and therefore difficult to maintain) class if we try to account for all of these possibilities in one class.

If we combined the things we like from this PR and the existing _PSD class to make a nice class representing a covariance matrix, maybe we could also endow other multivariate distributions (e.g. multivariate_t) with these same features, but with minimal changes to existing code?

That should be possible with the latest implementation too, EigSvdCovInfo should plug in directly into multivariate_t.

Before I go much further with this review, I want to check what you think of the idea of refactoring this so that all the logic about the covariance matrix (including the stuff in process_parameters, _process_cov_shaped) goes in the _CovInfo class, and _process_parameters would instantiate the _CovInfo object rather than doing that separately in every method.

_process_parameters is parsing mean, cov, andinverse_cov, inferring dimensionality, assigning suitable defaults (like mean=0, cov=1 if either is None), ensuring that they have a suitable data type. I find the current implementation appropriate -- the distributions remain responsible for parsing user input for mean and covariances, while CovInfo is concerned with further breaking down the parsed covariance input into properties commonly accessed by the distribution. Same for _process_cov_shaped, it's just checking whether the dimensions of the covariance match up with the mean.

Thanks for the suggestion of computing CovInfo in process_parameters instead of doing it in every method, that is now implemented in 39cbaed. Since all distributions need now is a CovInfo object, it's really easy to expose a cov_info=None optional argument to all methods so that users can supply their own CovInfo object.

[skip azp] [skip actions]
@mdhaber
Copy link
Contributor

mdhaber commented Sep 13, 2022

I was hoping Sphinx would automatically document the attributes based on the docstrings of the properties, but that's not happening. Direct link to documentation:
https://output.circle-artifacts.com/output/job/70f59c8e-3b9b-47b1-aacc-dfdbfa2b71d7/artifacts/0/html/reference/stats.html

Thoughts for how the properties should be documented (so that they are rendered)?

@mdhaber
Copy link
Contributor

mdhaber commented Sep 14, 2022

@tirthasheshpatel I thought it would be good to get your eyes on this. IIRC, you were interested in vectorization over many covariance matrices for the multivariate_t distribution, and this is setting us down that road.
We're not taking the straightest path. In addition, users will get several options for computing the tricky part of several multivariate distributions (inverse matrix vector product) and much greater control over what to do with singular covariance matrices. But I think that's important, too. If you want to see a little further than what we have here, please take a look at mdhaber#88, which has four of these Covariance classes (and notes about a fifth one using LDL), all vectorized to support broadcasting over ND shape parameters for the MVN pdf and entropy methods (for starters). I think it would take just a few lines to go from there to multivariate_t.

@siddhantwahal
Copy link
Contributor Author

Thoughts for how the properties should be documented (so that they are rendered)?

I started to look into this but didn't get very far. Sphinx is picking up the properties, (see the log_pdet docs, for example).

My working hypothesis is that inherited-members needs to be explicitly activated to render properties from the parent class, but haven't gotten round to testing it (dev env broke again).

Copy link
Member

@tirthasheshpatel tirthasheshpatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review! Just a few nitpicky comments otherwise the implementation looks good.

scipy/stats/_covariance.py Outdated Show resolved Hide resolved
scipy/stats/_covariance.py Outdated Show resolved Hide resolved
scipy/stats/_covariance.py Outdated Show resolved Hide resolved
@tirthasheshpatel
Copy link
Member

About cache_propertys not rendering in the docs: since only covariance needs to be cached, we can make others normal properties, and instead of making covariance a cache_property, we can instead make it a normal property and initialize _covariance to None in the __init__ method and lazily compute it (only once) when covariance is accessed. What do you think?

@mdhaber
Copy link
Contributor

mdhaber commented Sep 24, 2022

Thanks @tirthasheshpatel! I implemented these suggestions. As it turns out, we don't need to do anything special with covariance because the _covariance of the subclass is a cached_property, so it will only ever be calculated once.

@mdhaber
Copy link
Contributor

mdhaber commented Sep 24, 2022

Failures appear to be unrelated.

Copy link
Member

@tirthasheshpatel tirthasheshpatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests look very strong! Just one nitpick and I think this PR is good to go. Thanks @mdhaber, @siddhantwahal!!

@mdhaber
Copy link
Contributor

mdhaber commented Sep 27, 2022

Thanks @tirthasheshpatel! Made the change. Please feel free to squash-merge with me as a co-author.

What would you be most interested in reviewing next? We can specify covariance matrices using:

  • diagonals (diagonal matrices only, of course)
  • cholesky decomposition
  • eigenvalue decomposition

And I can add LDL after we add these. Presumably, I should add these in short, separate PRs to keep things simple?

@tirthasheshpatel
Copy link
Member

What would you be most interested in reviewing next?

I think the diagonal one should be a bit easier. We can start with that!

Presumably, I should add these in short, separate PRs to keep things simple?

That sounds good!

@tirthasheshpatel tirthasheshpatel merged commit e13d4f3 into scipy:main Sep 27, 2022
@tirthasheshpatel
Copy link
Member

Merged! Thanks @mdhaber @siddhantwahal!

@mdhaber
Copy link
Contributor

mdhaber commented Sep 27, 2022

Thanks, Tirth!
And @siddhantwahal - we can also change the rvs method shortly.
This merged sooner than expected, so to anyone coming by - please feel free to comment here. This is still a work in progress and not yet released, so we can still make changes in a separate PR.

@mdhaber mdhaber added this to the 1.10.0 milestone Sep 27, 2022

class Covariance:
"""
Representation of a covariance matrix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really missing some documentation IMHO.

Also, from the naming of the subclasses, it seems that classmethods would be more appropriate. Then you would have Covariance.from_precision instead of CovViaPrecision.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't public before. The subclasses had more information about how they were working. When we go to the classmethod idea, I can add some more information to the Covariance class itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

scipy.stats: Allow specifying inverse-variance matrix to multivariate_normal
5 participants