add parameter allow_singular for gaussian copula #8504

Merged

Tobi0
Contributor

@Tobi0 Tobi0 commented Nov 4, 2022

Notes:

  • It is essential that you add a test when making code changes. Tests are not
    needed for doc changes.
  • When adding a new function, test values should usually be verified in another package (e.g., R/SAS/Stata).
  • When fixing a bug, you must add a test that would produce the bug in main and
    then show that it is fixed with the new code.
  • New code additions must be well formatted. Changes should pass flake8. If on Linux or OSX, you can
    verify your changes are well formatted by running
    git diff upstream/main -u -- "*.py" | flake8 --diff --isolated
    
    assuming flake8 is installed. This command is also available on Windows
    using the Windows Subsystem for Linux once flake8 is installed in the
    local Linux environment. While passing this test is not required, it is good practice and it helps
    improve code quality in statsmodels.
  • Docstring additions must render correctly, including escapes and LaTeX.

@josef-pkt
Member

josef-pkt commented Nov 7, 2022

Do you have a use case for this?

I'm not sure what the implications are for the different methods.
I don't really want to support things like checking whether the values in pdf are in the support of the space reduced by perfect collinearity or dependence.

We can add it essentially as in this PR, but with a warning that singular correlation matrices are not supported by all methods, and that the behavior is determined by the scipy.stats distribution class.

Aside:
AFAICS/AFAIU, perfect correlation would imply perfect dependence between marginal distributions, which, I think, does not imply perfect correlation between marginal random variables.
Spearman's rank correlation should still be perfect, though, because the transformations are monotonic.

@josef-pkt
Member

josef-pkt commented Nov 7, 2022

However, copula cdf, pdf and rvs delegate to scipy.stats.
So we would just inherit whatever behavior scipy supports.
This assumes we don't replace scipy by our own functions or distribution class.

And the keyword is opt-in. So we would not have an effect on the current case without singular correlation.
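
For illustration, here is a minimal sketch of the scipy behavior the copula would inherit. It uses only `scipy.stats.multivariate_normal` directly, not the statsmodels copula class, and the variable names are made up for this example.

```python
import numpy as np
from scipy import stats

# Perfectly correlated 2x2 correlation matrix: positive semi-definite but singular.
corr = np.array([[1.0, 1.0], [1.0, 1.0]])

# Default (allow_singular=False): scipy rejects the singular matrix.
try:
    stats.multivariate_normal(mean=np.zeros(2), cov=corr)
    default_raises = False
except np.linalg.LinAlgError:
    default_raises = True

# Opt-in: the frozen distribution can be created and sampled.
mvn = stats.multivariate_normal(mean=np.zeros(2), cov=corr, allow_singular=True)
z = mvn.rvs(size=4, random_state=0)

# Map to uniforms the way a Gaussian copula would. The two columns coincide
# (up to floating-point noise) because the components are perfectly dependent.
u = stats.norm.cdf(z)
columns_match = np.allclose(u[:, 0], u[:, 1], atol=1e-6)

print(default_raises, columns_match)
```

Since the default stays `allow_singular=False`, existing code that passes a non-singular correlation matrix follows the same path as before.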

For checking:
We could check continuity at singular correlation, e.g. the results of methods when the correlation is close to 1 should be close to the results for perfect correlation (except for parts that return nan or raise on violation of the subspace restrictions in the support of the distribution).
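
A rough sketch of such a continuity check, assuming the bivariate case and going through scipy directly. The helper `gaussian_copula_cdf` is hypothetical and not part of statsmodels. The reference value is the analytic limit of the Gaussian copula as the correlation goes to 1, min(u1, u2), so nothing is evaluated at the singular matrix itself.

```python
import numpy as np
from scipy import stats


def gaussian_copula_cdf(u, rho):
    """Bivariate Gaussian copula cdf at u for |rho| < 1 (hypothetical helper)."""
    z = stats.norm.ppf(u)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return float(stats.multivariate_normal.cdf(z, mean=np.zeros(2), cov=cov))


u = np.array([0.5, 0.5])
limit = float(u.min())  # comonotonic limit min(u1, u2) for rho -> 1

# Distance to the limiting value for correlations approaching 1.
gaps = {
    rho: abs(gaussian_copula_cdf(u, rho) - limit)
    for rho in (0.9, 0.99, 0.9999, 0.999999)
}
for rho, gap in gaps.items():
    print(rho, gap)
```

The gap should shrink as rho approaches 1. A unit test could assert that and then compare against whatever the copula method returns at exactly rho = 1.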

@Tobi0
Contributor Author

Tobi0 commented Nov 8, 2022

It is a border case for me: I vary a parameter, and at one end the correlation matrix tends to a matrix full of ones. Always special-casing this border case and capping the correlation at 0.99 is getting ugly.

It's just about passing a parameter from scipy to the copula api here.

> I don't really want to support things like checking whether the values in pdf are in the support of the space reduced by perfect collinearity or dependence.

You often have to expect such problems at extreme values. To give another example, the plot_pdf function raises an error for the independence case (corr=np.eye).
Also, the pdf method returns nan for 0 and 1.

IMO it is up to the user to deal with a singular matrix. The argument defaults to allow_singular=False.

> AFAICS/AFAIU, perfect correlation would imply perfect dependence between marginal distributions, which, I think, does not
> imply perfect correlation between marginal random variables.

Correlation 1 means u_1 = u_2 = ... = u_n and, at least for identical marginals, x_i = cdf^-1(u_i) then gives x_1 = x_2 = ... = x_n.

from statsmodels.distributions.copula.api import GaussianCopula
c = GaussianCopula(corr=1, allow_singular=True)
c.rvs(4)
> array([[0.33369264, 0.33369264],
       [0.39963804, 0.39963804],
       [0.98170826, 0.98170826],
       [0.72875669, 0.72875669]])
c.cdf((1,1))
> 1.0
c.cdf((0,0))
> 0.0
c.plot_scatter()

@josef-pkt
Member

> at least for identical marginals

but it will in general not be true for copula distributions with unequal marginal distributions, where, I guess, marginal distributions could be in the same family but with different parameters.
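
A small standard-library sketch of this point. The two marginals (standard normal and unit exponential) are chosen only for illustration, and `pearson` and `ranks` are ad hoc helpers. Both variables are driven by the same uniform u, as under a perfectly correlated copula.

```python
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank of each element (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


rng = random.Random(0)
# Shared uniforms (u_1 = u_2), kept strictly inside (0, 1) for the inverse cdfs.
u = [rng.uniform(1e-9, 1 - 1e-9) for _ in range(2000)]

std_normal = NormalDist()
x_normal = [std_normal.inv_cdf(ui) for ui in u]  # normal marginal
x_expon = [-math.log1p(-ui) for ui in u]         # exponential marginal

pearson_r = pearson(x_normal, x_expon)
spearman_r = pearson(ranks(x_normal), ranks(x_expon))
print(pearson_r, spearman_r)
```

Because both inverse cdfs are increasing, the ranks agree and the rank correlation is 1, while the Pearson correlation of the two variables stays below 1.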

@josef-pkt
Member

The multivariate t distribution in scipy also has the allow_singular option. However, the t copula does not have a cdf yet.
So, I guess we can skip allow_singular for the t copula for now.

Can you add the caveat about the behavior of singular cases to the docstring for allow_singular?

Then I can merge

@josef-pkt
Member

Aside: we have helper functions that make a matrix positive definite or semi-definite.

@Tobi0
Contributor Author

Tobi0 commented Nov 9, 2022

A caveat about what?
No method of the copula is causing a problem.

@josef-pkt
Member

We have no unit tests for the singular case. Behavior is inherited from scipy, which might not be stable (and which might not be what we want).
The impact on copula distributions is not clear in detail.

I would not make any guarantees on specific behavior in methods for the singular case until it has been investigated and has full unit tests.
Some methods are clear: rvs, for example, should be well defined. However, the behavior of pdf, cdf and other methods might not be stable in edge cases.

@josef-pkt josef-pkt merged commit 0720940 into statsmodels:main Nov 30, 2022
@josef-pkt josef-pkt mentioned this pull request Nov 30, 2022
20 tasks
@Tobi0 Tobi0 deleted the Gaussian-Copula-Add-parameter-allow_singular branch January 27, 2023 09:25
@josef-pkt josef-pkt added this to the 0.14 milestone Feb 3, 2023
Development

Successfully merging this pull request may close these issues.

Gaussian Copula: Add parameter allow_singular