Order the output of get_transformers() based on the metadata #1222

Closed
npatki opened this issue Feb 1, 2023 · 1 comment

npatki commented Feb 1, 2023

Problem Description

In the new SDV version 1.0, I am able to inspect the transformers that will be used if I run:

synthesizer.auto_assign_transformers(real_data)
synthesizer.get_transformers()

This returns a dictionary mapping each column name to the transformer object.

{
  'age': FloatFormatter(),
  'weight': FloatFormatter(),
  ...
}

Expected behavior

The keys of the dictionary returned by get_transformers() are not in the same order as the columns in the metadata.

It would be nice if the keys were in the same order as the metadata, as the metadata is considered the ground truth. This will make it easier for me to inspect the transformers.

@npatki npatki added the feature request Request for a new feature label Feb 1, 2023
@npatki npatki added this to the 1.0.0 milestone Feb 1, 2023

npatki commented Feb 1, 2023

Replication

from sdv.datasets.demo import download_demo
from sdv.single_table import GaussianCopulaSynthesizer

# metadata columns are in the order of: start_date, end_date, address
data, metadata = download_demo(
    modality='single_table',
    dataset_name='student_placements_pii'
)

synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.auto_assign_transformers(data)
synthesizer.get_transformers()

This prints the dictionary in a different order than the metadata:

{
  'experience_years': FloatFormatter(learn_rounding_scheme=True, enforce_min_max_values=True),
  'second_perc': FloatFormatter(learn_rounding_scheme=True, enforce_min_max_values=True),
  'duration': LabelEncoder(add_noise=True),
  ...
}
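
Until the order follows the metadata, the returned dictionary could be reordered on the caller's side. The following is a minimal sketch, not SDV's implementation: `order_like_metadata` is a hypothetical helper, the column order is assumed to be available as a plain list (reading it from the metadata object, for example via `list(metadata.columns)`, is an assumption), and the string values are placeholders standing in for transformer objects.

```python
def order_like_metadata(transformers, column_order):
    """Return a new dict whose keys follow column_order.

    Hypothetical helper, not part of SDV. Keys that column_order does not
    mention are kept at the end, in their original relative order.
    """
    ordered = {
        name: transformers[name]
        for name in column_order
        if name in transformers
    }
    # Append any remaining keys so that no transformer is dropped.
    for name, transformer in transformers.items():
        ordered.setdefault(name, transformer)
    return ordered


# Placeholder strings stand in for the real transformer objects.
unordered = {
    'experience_years': 'transformer_a',
    'start_date': 'transformer_b',
    'address': 'transformer_c',
}
# Assumed to be the column order defined in the metadata.
column_order = ['start_date', 'address', 'experience_years']

ordered = order_like_metadata(unordered, column_order)
print(list(ordered))
```

This relies on Python dictionaries preserving insertion order (guaranteed since Python 3.7), so rebuilding the dictionary in the desired key order is enough.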
