-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Multivariate t-distribution #12582
Conversation
My tests pass locally, but the CI tests fail for a reason I don't understand, e.g.:
|
@gwgundersen Thanks for submitting this. Other opinions may differ, but I suspect it's OK to start without I'm not sure if this will help, but try It looks like it's not showing up in the documentation. I think you need to add your distribution to
I'm not sure about the failing test or how to test the RVS; perhaps @WarrenWeckesser has thoughts? |
@mdhaber, thanks for that suggestion. I tried initializing and updating the git submodule, but it did not seem to work. Is it possible I need to rebuild Scipy from scratch to update the version number based on the commit? I've tried to do that and, oddly, am completely failing to do so—I was able to build Scipy previously, and I don't know if it's a change in the build procedure or something local to me. I wrote more details here. |
@gwgundersen yes, it sounds to me like you need to build SciPy again, too. I'm not sure how to fix your build error, though. Sometimes my approach has been to nuke my environment and start from scratch, but I see that you've written up your issue, so hopefully you'll get a more helpful reply. I've also seen this issue for macs lately, but it looks different from yours. |
@mdhaber and @gwgundersen I did some testing of this PR locally to see if I could get the docstrings to build. I couldn't set my local changes to track this pr for some reason (not sure why) so I posted the changes here: it is mostly spacing around sections of docstrings (e.g. before example line, etc). Let me know if I can help further to get this PR to merging, seems close. |
@rlucas7 Do you want me to add those to this PR? |
@mdhaber wrote:
Assuming @gwgundersen is ok with that then yes, the edits I made were all only newline characters so nothing beyond cosmetic changes to make sphinx happy. Would be nice to get the doc build to green so we can focus on the scientific/probabilistic aspects of the PR and move this one forward. |
No worries @gwgundersen please let us know if you run into issues w/dev setup. |
OK I went ahead and pushed @rlucas7's changes to @gwgundersen's branch so we can see how they affect the doc rendering. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdhaber I left a couple suggested changes to get the rvs
reproducibility test to pass for me locally. One had some weird issue trying to overwrite the suggested change you made, perhaps you can take a look, and if they look ok to you, commit and see if gets to green?
I also left several comments, mostly minor, 2 less minor ones though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gwgundersen hopefully my review/clarifying comments here help. Ping me in comments if still unclear. I think is mostly a matter of adding some branching for np.inf
df case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gwgundersen apologies for my confusing review yesterday. For the df
parameter the way things are currently written in this PR, everything should work correctly if df
is e.g. 1.1
except for the check that you have for int
. So replace that check with the suggested change and I think the PR is good from that side. I think my comments for when df==np.inf
is/are still valid though, let me know if unclear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gwgundersen left a couple more comments. I noticed something weird with the shapes, not sure the cause but behavior is inconsistent:
>>> import numpy as np
>>> import scipy.stats as ss
>>> loc = np.zeros(4)
>>> shape = np.eye(4)
>>> df = 1
>>> shape[2, 2] = 0
>>> ss.multivariate_t(loc, shape, df, allow_singular=True).pdf(np.zeros(4))
array([0.07599089])
>>> ss.multivariate_t(loc, shape, np.inf, allow_singular=True).pdf(np.zeros(4))
0.06349363593424098
Could you take a look and add a test for that one?
Left a couple suggestions for the last unit test that was previously puzzling me-now I think I understand that one better.
Most other comments are cosmetic or style changes.
Ah, good catch. When using def test_shape_correctness(self):
# pdf and logpdf should return scalar when the
# number of samples in x is one.
dim = 4
loc = np.zeros(dim)
shape = np.eye(dim)
df = 1
x = np.zeros(dim)
res = multivariate_t(loc, shape, df).pdf(x)
assert np.isscalar(res)
res = multivariate_t(loc, shape, df).logpdf(x)
assert np.isscalar(res)
# pdf() and logpdf() should return probabilities of shape
# (n_samples,) when x has n_samples.
n_samples = 7
x = np.random.random((n_samples, dim))
res = multivariate_t(loc, shape, df).pdf(x)
assert (res.shape == (n_samples,))
res = multivariate_t(loc, shape, df).logpdf(x)
assert (res.shape == (n_samples,))
# rvs() should return scalar unless a size argument is applied.
res = multivariate_t(np.zeros(1), np.eye(1), 1).rvs()
assert np.isscalar(res)
# rvs() should return vector of shape (size,) if size argument
# is applied.
size = 7
res = multivariate_t(1, 1, 1).rvs(size=size)
assert (res.shape == (size,)) |
try: | ||
multivariate_t(loc, shape, df, allow_singular=True) | ||
except np.linalg.LinAlgError: | ||
pytest.fail("Test failed by unexpectedly raising `LinAlgError`.") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment above-writing unit tests that happy code paths do not raise exceptions is unnecessary.
@rlucas, what if I just do a
Alternatively, I could ask the developers at #11524 for help, since that PR introduced the unit tests that are failing for me. |
Also, the failing tests "smell" like they may be caused by differing mathematical or algorithmic conventions, e.g.:
|
Try
python runtests.py -v -s stats
That will run only the stats tests. If that passes for you locally then yeah I think a force push is fine.
Also, can you leave a comment on the other PR with the failing test stack trace?
If you’re seeing that fail others are or will too.
I run a Mac box so if this is specific to Ubuntu (Not sure if it is) I can’t help much.
…-Lucas Roberts
On Oct 5, 2020, at 5:21 PM, Gregory Gundersen ***@***.***> wrote:
Also, the failing tests "smell" like they may be caused by differing mathematical or algorithmic conventions, e.g.:
>>> np.isclose(u1_2, u1)
array([[False, True, False, ..., False, False, False],
[False, True, False, ..., False, False, False],
[False, True, False, ..., False, False, False],
...,
[False, False, False, ..., False, False, False],
[False, True, False, ..., False, False, False],
[False, True, False, ..., False, False, False]])
>>> u1_2[0,0]
0.058479194
>>> u1[0,0]
-0.058478586
>>> u1_2[0,2]
0.040936727
>>> u1[0,2]
-0.040936757
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
Fixed doc building. Removing unneeded newline formatting Added multivariate T to documentation. Removing thanks update test changes to multivariate-t docs
Co-authored-by: Lucas Roberts <rlucas7@users.noreply.github.com>
Fix rendering in docs. Co-authored-by: Matt Haberland <mhaberla@calpoly.edu> Added reference Co-authored-by: Lucas Roberts <rlucas7@users.noreply.github.com> Changed positive definite to positive semidefinite.
Fixes and tests for handling default arguments. Removed unneeded comment. Code and tests for defaulting to multivariate_normal when df=np.inf. Amended code to allow `df` to be non-integer, but must be greater than zero. Added unit tests. Amended documentation for df Call _squeeze_output on pdf result; unit test pdf shapes. Update scipy/stats/tests/test_multivariate.py Co-authored-by: Lucas Roberts <rlucas7@users.noreply.github.com> Update scipy/stats/_multivariate.py Co-authored-by: Lucas Roberts <rlucas7@users.noreply.github.com> Update scipy/stats/_multivariate.py Co-authored-by: Lucas Roberts <rlucas7@users.noreply.github.com> Added squeeze_output and tests for rvs.
Simplified tests using parameterize decorator; fixed names in pdf_correctness test. Use parameterize and assert_equal Use assert_raises alias rather than context manager pattern. Added newline to break up long line Removed test coverage simplified unit test Added numpy to imports Removed unneeded numpy import; made axes eual. Removed explicitly test not raising ValueError with df float; instead, use df float in another passing test.
c622659
to
4af54d7
Compare
Just to make sure that this is not a sign problem (since the results are unique up to columns signs), could you please check if this turns out all true >>> np.allclose(np.abs(u1_2), np.abs(u1), rtol=0., atol=np.finfo(np.float32).eps*100) |
@rlucas7, I rebased, confirmed the stats tests pass, and force pushed. The CI checks pass, and I didn't bring in any additional commits. |
@ilayn, that command returns
Also, despite my |
That can happen also due to an outdated blas/lapack distro (mkl or openblas etc.) Because apparently not only signs are wrong but the results are also different so it is an actual miscalculation. |
Thanks for commenting on this @ilayn I hadn't think about BLAS version issues. So that we know in case of future questions, it would be great @gwgundersen if you could let us know which BLAS and version you have running. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
making 2 minor (cosmetic) changes to docs and marking approved. I'll leave open until the upcoming weekend to give others some time to comment. If no further comments by Sunday October 11th, at 9am EST, I'll merge to master (sometime after 9am).
@ilayn, I'm not sure which BLAS and LAPACK versions are being used. This is what I tried when trying to find out: >>> import numpy.distutils.system_info as sysinfo
>>> sysinfo.get_info('lapack')
{'libraries': ['lapack', 'lapack'],
'library_dirs': ['/usr/lib/x86_64-linux-gnu'],
'language': 'f77'}
>>> sysinfo.get_info('blas')
{'libraries': ['blas', 'blas'],
'library_dirs': ['/usr/lib/x86_64-linux-gnu'],
'include_dirs': ['/usr/local/include',
'/usr/include',
'/home/gwg/miniconda3/envs/scipy/include'],
'language': 'c',
'define_macros': [('HAVE_CBLAS', None)]} When I
I also tried to re-build Scipy from scratch pointing to the latest object files:
But the tests still fail. Happy to try something else if you have advice, but I can't find any info online about how to find the actual BLAS or LAPACK versions being used. |
Squashed and merged, nice work @gwgundersen |
Thanks @gwgundersen, @rlucas! |
Hello @gwgundersen , scipy/scipy/stats/_multivariate.py Line 4083 in 882849f
it produces the following error
|
@hp5588 I'm seeing that now, I've got a PR to fix it. ref: |
Reference issue
Closes #10042. Duplicates #11413 because the old branch was out-of-date.
What does this implement/fix?
Adds support for the multivariate t-distribution, particularly
pdf
,logpdf
, andrvs
. See my blog post for a high-level discussion of my approach.Additional information
Completed:
logpdf
andpdf
to MATLAB'smvtpdf
function; unit tests for arguments, reproducibility (withrandom_state
), and some testable mathematical properties._multivariate.py
.THANKS.txt
Would appreciate feedback on:
rvs
?cdf
andlogcdf
given that the multivariate t-distribution does not have a closed-form solution for the CDF? While I have seen some implementations (see Consider support for multivariate student-t distribution? #10042 (comment) in particular), I don't yet fully understand how they work and would prefer adding these methods to be a separate PR.entropy
? I don't know anything about the differential entropy of the multivariate t-distribution, and would also prefer punting that to a future PR.