
Check and supply a more descriptive error when trying to use 'gaussian_kde' with HMA #1604

Closed
npatki opened this issue Sep 22, 2023 · 0 comments · Fixed by #1612
Labels: feature request (Request for a new feature)
npatki commented Sep 22, 2023

Problem Description

The HMASynthesizer is not compatible with the 'gaussian_kde' distribution. The HMA algorithm is only designed to work with parametric distributions, meaning distributions with a predefined set of parameters whose number is fixed in advance.

Even though we know this incompatibility exists, the HMASynthesizer still lets you apply the 'gaussian_kde' distribution in the set_table_parameters function without raising an error. The problem only surfaces later, during fit, as a KeyError that makes it very hard to understand what's happening (see #1602).
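To illustrate why a non-parametric distribution does not fit the HMA approach, here is a simplified sketch. It assumes, loosely, that HMA summarizes each parent's child rows with distribution parameters that are then stored as extra columns on the parent row, so every parent row needs the same, fixed set of parameter columns. The data and the helpers `parametric_summary`, `kde_summary` and `extend_parent_rows` are hypothetical and are not SDV's implementation.

```python
import statistics

# Hypothetical child-table values, grouped by parent key (e.g. guests per hotel).
child_groups = {
    'hotel_1': [120.0, 135.5, 150.0],
    'hotel_2': [80.0, 95.0, 110.0, 99.0, 101.5],
}


def parametric_summary(values):
    """Normal-distribution summary: always exactly two parameters."""
    return {'mean': statistics.mean(values), 'std': statistics.stdev(values)}


def kde_summary(values):
    """A KDE is defined by the data points themselves, so its 'parameters'
    grow with the number of observations (bandwidth omitted for brevity)."""
    return {f'point_{i}': v for i, v in enumerate(values)}


def extend_parent_rows(summarize):
    """Attach each group's parameters to its parent row, then report whether
    every parent row ended up with the same set of columns."""
    rows = {parent: summarize(vals) for parent, vals in child_groups.items()}
    column_sets = {tuple(sorted(row)) for row in rows.values()}
    return rows, len(column_sets) == 1


_, parametric_ok = extend_parent_rows(parametric_summary)
_, kde_ok = extend_parent_rows(kde_summary)
print(parametric_ok)  # True: every parent row gets 'mean' and 'std'
print(kde_ok)         # False: 3 vs 5 'point_*' columns
```

With a parametric distribution the parent table gets a consistent schema; with a KDE-style summary the column count depends on how many child rows each parent has, which is the kind of mismatch a fixed-schema model cannot handle.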

Expected behavior

If I try to apply the 'gaussian_kde' distribution to a table in an HMASynthesizer, the set_table_parameters method should raise a clear error with guidance about what to do next.

from sdv.multi_table import HMASynthesizer
from sdv.datasets.demo import download_demo

data, metadata = download_demo(
    modality='multi_table',
    dataset_name='fake_hotels'
)

synthesizer = HMASynthesizer(metadata)
synthesizer.set_table_parameters(
    table_name='guests',
    table_parameters={
        'default_distribution': 'gaussian_kde',
        'numerical_distributions': {
            'checkin_date': 'uniform',
            'checkout_date': 'gaussian_kde'
        }
    }
)

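Expected result: the set_table_parameters call above raises the error immediately, so synthesizer.fit(data) is never reached:

SynthesizerInputError: The 'gaussian_kde' distribution is not compatible with the HMA algorithm. Please choose a different distribution such as 'beta' or 'truncnorm', or try a different algorithm such as HSA.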

Additional context

The error should be raised in either of the following cases:

  • I set the default distribution to 'gaussian_kde'
  • I set any individual column to 'gaussian_kde' using the numerical_distributions parameter
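A minimal sketch of the check that would cover both cases, assuming the validation runs on the `table_parameters` dict before it is stored. `validate_hma_table_parameters` and `HMA_INCOMPATIBLE_DISTRIBUTIONS` are hypothetical names, and `SynthesizerInputError` is a local stand-in for the error named in this issue, so the block does not depend on SDV internals.

```python
class SynthesizerInputError(Exception):
    """Local stand-in for the error class named in this issue."""


# Hypothetical: the distributions HMA cannot use (only 'gaussian_kde', per this issue).
HMA_INCOMPATIBLE_DISTRIBUTIONS = {'gaussian_kde'}


def validate_hma_table_parameters(table_parameters):
    """Raise if the default distribution or any per-column distribution is incompatible."""
    requested = [table_parameters.get('default_distribution')]
    # Case 2: per-column overrides in numerical_distributions (may be absent or None).
    requested.extend((table_parameters.get('numerical_distributions') or {}).values())
    for distribution in requested:
        if distribution in HMA_INCOMPATIBLE_DISTRIBUTIONS:
            raise SynthesizerInputError(
                f"The '{distribution}' distribution is not compatible with the HMA "
                "algorithm. Please choose a different distribution such as 'beta' "
                "or 'truncnorm', or try a different algorithm such as HSA."
            )


def raises(table_parameters):
    """Helper for the usage examples: True if validation rejects the parameters."""
    try:
        validate_hma_table_parameters(table_parameters)
    except SynthesizerInputError:
        return True
    return False


print(raises({'default_distribution': 'gaussian_kde'}))  # True (case 1)
print(raises({
    'default_distribution': 'beta',
    'numerical_distributions': {'checkout_date': 'gaussian_kde'},
}))  # True (case 2)
print(raises({
    'default_distribution': 'beta',
    'numerical_distributions': {'checkin_date': 'uniform'},
}))  # False
```

Checking the default and the per-column overrides in one pass keeps both cases on the same code path, so the error message stays identical regardless of where 'gaussian_kde' was requested.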