Problem Description
When using the multivariate Gaussian Copula, scipy occasionally fails to fit a univariate distribution for a given shape. In this case, the default behavior is to catch the error and fall back to the Gaussian distribution, which is the most stable option and can be fit in almost all cases.
When this fallback happens, we emit a warning to let the user know:
UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
Expected behavior
This information would be better off as a logged item rather than a warning.
(In other parts of the SDV ecosystem, we are using logger to dump any info or debug messages.)
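As a rough illustration of the requested change, the fallback could log the message instead of calling `warnings.warn`. This is only a sketch and not the actual Copulas code: `fit_with_fallback`, `fit_primary`, and `fit_fallback` are hypothetical names, and catching a broad `Exception` is an assumption about how the fit error is handled.

```python
import logging

# Module-level logger, following the usual logging convention.
LOGGER = logging.getLogger(__name__)


def fit_with_fallback(fit_primary, fit_fallback, data, column_name):
    """Try the requested fit; on failure, log the fallback and use the stable fit.

    All names here are hypothetical and used for illustration only.
    """
    try:
        return fit_primary(data)
    except Exception:  # assumption: any fitting error triggers the fallback
        # Logged at INFO level rather than raised as a UserWarning, so the
        # message does not interrupt progress bars on stderr by default.
        LOGGER.info(
            'Unable to fit the requested distribution for column %s. '
            'Using a Gaussian distribution instead.',
            column_name,
        )
        return fit_fallback(data)
```

With this approach, users who want to see the message can enable it through the standard `logging` configuration, and everyone else sees nothing.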
Additional context
We should be careful when producing warnings to the end user. There are several reasons why it may be better to log info rather than produce a warning.
In this case, there's nothing explicitly wrong -- certain scipy distributions just aren't great at fitting specific marginals.
A warning captures the user's attention and indicates that something should be done differently. In this case, the user can choose a different marginal distribution, but it is very data-dependent and not always needed.
A warning will disrupt any progress bars. If the information isn't actionable, this becomes a nuisance. For example, the SDV's GaussianCopulaSynthesizer produces a progress bar to show the fit progress, but it is interrupted every time the univariate falls back from Beta to Gaussian. There is nothing I can do as a user about this.
Learning relationships:
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'): 0%| | 0/1565 [00:00<?, ?it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
warnings.warn(warning_message)
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'): 1%|▏ | 22/1565 [00:01<02:15, 11.36it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
warnings.warn(warning_message)
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'): 3%|▎ | 45/1565 [00:04<03:17, 7.71it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
warnings.warn(warning_message)
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'): 8%|▊ | 125/1565 [00:11<02:24, 9.98it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
warnings.warn(warning_message)
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'): 15%|█▍ | 228/1565 [00:19<01:45, 12.71it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
warnings.warn(warning_message)
(1/3) Tables 'paper' and 'cites' ('cited_paper_id'): 16%|█▌ | 254/1565 [00:21<01:04, 20.36it/s]/usr/local/lib/python3.10/dist-packages/copulas/multivariate/gaussian.py:119: UserWarning: Unable to fit to a <class 'copulas.univariate.beta.BetaUnivariate'> distribution for column add_numerical. Using a Gaussian distribution instead.
warnings.warn(warning_message)
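Until the message is moved to a logger, one possible user-side stopgap is to filter this specific warning with Python's standard `warnings` module. The sketch below is not part of Copulas or SDV: `suppress_fallback_warnings` is a hypothetical helper, and the message pattern is taken from the output above, so it would stop matching if the wording changes.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def suppress_fallback_warnings():
    """Temporarily ignore the 'Unable to fit to a ... distribution' UserWarning.

    Hypothetical helper for illustration; the pattern is a regular expression
    matched against the start of the warning message.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings(
            'ignore',
            message=r'Unable to fit to a .* distribution',
            category=UserWarning,
        )
        yield
```

The fit call would then go inside a `with suppress_fallback_warnings():` block. Other warnings still pass through, and the previous filters are restored when the block exits.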