When adding constraints, auto_assign_transformers is showing columns that should no longer exist #1260

Closed
npatki opened this issue Feb 12, 2023 · 0 comments
Labels
bug Something isn't working

npatki commented Feb 12, 2023

Environment Details

  • SDV version: 1.0.0 (in progress)
  • Python version: 3.8
  • Operating System: Linux (Colab Notebook)

Error Description

When adding constraints, it's expected that some of the columns will be added/removed/modified as a result.

The transformer assignment should happen after the constraints are applied, so I should be able to see the modified column set when inspecting the transformers. However, I see the original column set instead, including column names and transformers that should no longer exist.

Steps to reproduce

from sdv.multi_table import HMASynthesizer
from sdv.datasets.demo import download_demo

real_data, metadata = download_demo(
    modality='multi_table',
    dataset_name='fake_hotels'
)

synthesizer = HMASynthesizer(metadata)

fixed_location_combinations = {
    'constraint_class': 'FixedCombinations',
    'table_name': 'hotels',
    'constraint_parameters': {
        'column_names': ['city', 'state']
    } 
}

synthesizer.add_constraints([fixed_location_combinations])
synthesizer.auto_assign_transformers(real_data)
synthesizer.get_transformers(table_name='hotels')

The output is:

{'hotel_id': RegexGenerator(regex_format='HID_[0-9]{3}', enforce_uniqueness=True),
 'city': LabelEncoder(add_noise=True),
 'state': LabelEncoder(add_noise=True),
 'rating': FloatFormatter(learn_rounding_scheme=True, enforce_min_max_values=True),
 'classification': LabelEncoder(add_noise=True)}

It's strange that city and state appear here. These column names should not exist after the constraint processing is done.

Expected

I expect this to be the output of get_transformers:

{'hotel_id': RegexGenerator(regex_format='HID_[0-9]{3}', enforce_uniqueness=True),
 'rating': FloatFormatter(learn_rounding_scheme=True, enforce_min_max_values=True),
 'classification': LabelEncoder(add_noise=True),
 'city#state': LabelEncoder(add_noise=True)}
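
To make the expected key set concrete, here is a minimal sketch in plain Python of how it follows from the original columns and the constraint. It is not SDV's implementation: the helper name expected_columns is hypothetical, and the '#' separator is an assumption taken from the 'city#state' name in the expected output above.

```python
def expected_columns(original_columns, fixed_combination_columns, separator='#'):
    """Return the column names expected after a FixedCombinations-style merge.

    Hypothetical helper for illustration only; it mimics the renaming
    described in this issue and is not part of SDV.
    """
    # The constrained columns are replaced by a single combined column.
    combined_name = separator.join(fixed_combination_columns)
    merged = set(fixed_combination_columns)
    # Keep every other column in its original order.
    remaining = [name for name in original_columns if name not in merged]
    return remaining + [combined_name]


original = ['hotel_id', 'city', 'state', 'rating', 'classification']
expected = expected_columns(original, ['city', 'state'])
print(expected)
```

Under these assumptions, the keys returned by get_transformers should match this list rather than the original column names.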

We can verify that these are the column names by calling preprocess:

synthesizer.preprocess(real_data)['hotels']

Output:
[image: the preprocessed hotels table, not reproduced here]
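
The mismatch can also be checked programmatically by comparing the transformer keys against the preprocessed column names. The sketch below uses plain Python lists so it runs without SDV. The helper name compare_columns is hypothetical, and the processed column list is an assumption based on the expected output in this issue, not a copy of the actual preprocess result.

```python
def compare_columns(transformer_columns, processed_columns):
    """Report columns that appear on only one side.

    Hypothetical helper for illustration; both arguments can be any
    iterable of column names (for example, dict keys or a list).
    """
    transformer_set = set(transformer_columns)
    processed_set = set(processed_columns)
    return {
        # Have a transformer but no longer exist after preprocessing.
        'stale': sorted(transformer_set - processed_set),
        # Exist after preprocessing but have no transformer assigned.
        'missing': sorted(processed_set - transformer_set),
    }


# Keys reported by get_transformers in this issue.
reported = ['hotel_id', 'city', 'state', 'rating', 'classification']
# Assumed preprocessed columns, taken from the expected output.
processed = ['hotel_id', 'rating', 'classification', 'city#state']

report = compare_columns(reported, processed)
print(report)
```

With the objects from the reproduction steps, the same check would take synthesizer.get_transformers(table_name='hotels') and the columns of synthesizer.preprocess(real_data)['hotels'] as its two arguments. Both lists in the report should be empty once the bug is fixed.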
