When Copulas univariate fit fails, produce a log instead of a warning #359

Closed
npatki opened this issue Aug 15, 2023 · 0 comments · Fixed by #365
Labels: feature request


Problem Description

When using the multivariate Gaussian Copula, scipy occasionally fails to fit a univariate distribution of a given shape. In this case, the default behavior is to catch the error and fall back to the Gaussian distribution, which is the most stable and can be fit in almost all cases.

When we do this fallback, we produce a warning to let the user know this is happening.

UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.

Expected behavior

This information would be better off as a logged item rather than a warning.
(In other parts of the SDV ecosystem, we use a logger for info and debug messages.)
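
A minimal sketch of the requested behavior, using only the standard `logging` module. The function `fit_with_fallback` and its parameters are hypothetical names chosen for illustration; they are not the Copulas API, and the actual fix may be structured differently.

```python
import logging

# Module-level logger, the usual convention for library code.
LOGGER = logging.getLogger(__name__)


def fit_with_fallback(data, primary_fit, fallback_fit, column_name, primary_name):
    """Try primary_fit(data); on failure, log at INFO level and use fallback_fit.

    All names here are hypothetical and only illustrate the pattern.
    """
    try:
        return primary_fit(data)
    except Exception:
        # This branch logs the fallback instead of calling warnings.warn(...),
        # so nothing is written to stderr unless the user configures logging.
        LOGGER.info(
            'Unable to fit to a %s distribution for column %s. '
            'Using a Gaussian distribution instead.',
            primary_name,
            column_name,
        )
        return fallback_fit(data)
```

With this pattern the message only appears when the user opts in, for example by calling `logging.basicConfig(level=logging.INFO)`.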

Additional context

We should be careful when producing warnings to the end user. There are several reasons why it may be better to log info rather than produce a warning.

  1. In this case, there's nothing explicitly wrong -- certain scipy distributions just aren't great at fitting specific marginals.
  2. A warning captures the user's attention and indicates that something should be done differently. In this case, the user can choose a different marginal distribution, but it is very data-dependent and not always needed.
  3. A warning will disrupt any progress bars. If the information isn't actionable, this becomes a nuisance. For example, the SDV's GaussianCopulaSynthesizer produces a progress bar to show the fit progress, but the bar is interrupted every time the univariate falls back from Beta to Gaussian. There is nothing I can do as a user about this.
Learning relationships:
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'):   0%|          | 0/1565 [00:00<?, ?it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
  warnings.warn(warning_message)
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'):   1%|▏         | 22/1565 [00:01<02:15, 11.36it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
  warnings.warn(warning_message)
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'):   3%|▎         | 45/1565 [00:04<03:17,  7.71it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
  warnings.warn(warning_message)
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'):   8%|▊         | 125/1565 [00:11<02:24,  9.98it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
  warnings.warn(warning_message)
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'):  15%|█▍        | 228/1565 [00:19<01:45, 12.71it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
  warnings.warn(warning_message)
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'):  16%|█▌        | 254/1565 [00:21<01:04, 20.36it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
  warnings.warn(warning_message)
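
Until the message is moved to a logger, a caller could in principle silence just this warning with the standard `warnings` filters. This is an interim workaround, not part of the proposal above. The helper name `fit_quietly` is hypothetical, and the message pattern is an assumption based on the output shown above.

```python
import warnings


def fit_quietly(fit_callable, *args, **kwargs):
    """Run fit_callable while ignoring only the fallback UserWarning.

    Hypothetical helper; the pattern is matched against the start of the
    warning message, so other warnings still get through.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings(
            'ignore',
            message=r'Unable to fit to a .* distribution',
            category=UserWarning,
        )
        return fit_callable(*args, **kwargs)
```

The filter is scoped to the `with` block, so the caller's global warning configuration is restored afterwards.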