Add update_transformers to synthesizers #1021

Closed
amontanez24 opened this issue Sep 21, 2022 · 2 comments · Fixed by #1058

Comments

amontanez24 commented Sep 21, 2022

Problem Description

As a user, it would be helpful to have ways to manually set custom transformers to use on my data before modeling.

Expected behavior

  • Add update_transformers method to BaseSynthesizer
  • Parameters:
    • column_name_to_transformer (dict): A dictionary mapping the name of the column to the transformer instance.
  • Method should update the HyperTransformer based on the provided dict
  • Validation:
    • Errors: (Raise if one or more columns hit any of these cases. Run all the checks first so that nothing is partially updated.)
      • Updating a transformer that is incompatible with the sdtype provided in the metadata
        Error: Column 'age' is a numerical column, which is incompatible with the 'LabelEncoder' preprocessing.
      • Adding a transformer other than AnonymizedFaker or RegexGenerator for a key column (primary, alternate, sequence key)
        Error: Column 'user_id' is a key. It cannot be preprocessed using the 'FloatFormatter' transformer.
      • The user is assigning a transformer object that has already been fit
        Error: Transformer for column 'age' has already been fit on data.
    • Warnings: Raise all that arise
      • (CTGAN, CopulaGAN, TVAE, PAR only): Whenever the user tries to add a transformer for a column that is auto-assigned to None (boolean/categorical)
        Warning: Replacing the default transformer for column 'degree_type' might impact the quality of your synthetic data
      • (GaussianCopula): Whenever the user is adding a OneHotEncoder to a categorical column
        Warning: Using the OneHotEncoder for column 'degree_type' may slow down the preprocessing and modeling time
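
The "check everything first, then update" requirement could be sketched roughly as below. This is only an illustration under stated assumptions, not the SDV or RDT implementation: `FakeTransformer`, `InvalidTransformerError`, `collect_errors`, and the plain dict standing in for the HyperTransformer config are all hypothetical names. As a simplification, key columns only get the key check here.

```python
class InvalidTransformerError(ValueError):
    """Hypothetical error type raised when any validation check fails."""


class FakeTransformer:
    """Hypothetical stand-in for a transformer; not an RDT class."""

    def __init__(self, name, supported_sdtypes, fitted=False):
        self.name = name
        self.supported_sdtypes = set(supported_sdtypes)
        self.fitted = fitted


# Transformers the issue allows on key columns.
KEY_TRANSFORMERS = {"AnonymizedFaker", "RegexGenerator"}


def collect_errors(column_name_to_transformer, sdtypes, key_columns):
    """Run every check for every column and return all error messages."""
    errors = []
    for column, transformer in column_name_to_transformer.items():
        if column in key_columns:
            if transformer.name not in KEY_TRANSFORMERS:
                errors.append(
                    f"Column '{column}' is a key. It cannot be preprocessed "
                    f"using the '{transformer.name}' transformer."
                )
        elif sdtypes[column] not in transformer.supported_sdtypes:
            errors.append(
                f"Column '{column}' is a {sdtypes[column]} column, which is "
                f"incompatible with the '{transformer.name}' preprocessing."
            )
        if transformer.fitted:
            errors.append(
                f"Transformer for column '{column}' has already been fit on data."
            )
    return errors


def update_transformers(config, column_name_to_transformer, sdtypes, key_columns):
    """Update `config` only if no column fails validation (all or nothing)."""
    errors = collect_errors(column_name_to_transformer, sdtypes, key_columns)
    if errors:
        # Raise before touching `config`, so nothing is partially updated.
        raise InvalidTransformerError("\n".join(errors))
    config.update(column_name_to_transformer)
```

With this shape, a dict that mixes one valid and one invalid assignment leaves the existing configuration untouched, and the raised error lists every offending column at once.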
amontanez24 added the "feature request" label Sep 21, 2022
amontanez24 added this to the 1.0.0 milestone Sep 21, 2022
amontanez24 changed the title from "Add update_transformers and set_hyper_transformer to synthesizers" to "Add update_transformers to synthesizers" Sep 29, 2022

fealho commented Oct 5, 2022

@amontanez24 Could you clarify what you meant by "Whenever the user tries to add a transformer for a column that is auto-assigned to None (boolean/categorical)"?

@amontanez24

> @amontanez24 Could you clarify what you meant by "Whenever the user tries to add a transformer for a column that is auto-assigned to None (boolean/categorical)"?

For CTGAN, CopulaGAN and TVAE, the categorical and boolean transformations are skipped. Instead of assigning the default categorical transformer to those columns, we should assign None. If a user tries to change that, we raise the warning but still let them do it, since it won't technically break anything.
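
A rough sketch of that warn-but-allow behavior, assuming the synthesizer can look up each column's default transformer. `warn_on_replaced_defaults` and the `default_transformers` mapping are hypothetical names for illustration, not part of SDV.

```python
import warnings

# Synthesizers whose boolean/categorical columns default to None.
# The issue body also lists PAR alongside the three named in this comment.
NONE_DEFAULT_SYNTHESIZERS = {"CTGAN", "CopulaGAN", "TVAE", "PAR"}


def warn_on_replaced_defaults(synthesizer_name, columns, default_transformers):
    """Warn for each updated column whose default transformer is None.

    The update itself is not blocked; this only emits warnings.
    """
    if synthesizer_name not in NONE_DEFAULT_SYNTHESIZERS:
        return
    for column in columns:
        # Only columns explicitly auto-assigned to None trigger the warning.
        if column in default_transformers and default_transformers[column] is None:
            warnings.warn(
                f"Replacing the default transformer for column '{column}' "
                "might impact the quality of your synthetic data",
                UserWarning,
            )
```

Because the function only warns, the caller would still go on to apply the update afterwards.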
