Problem Description

As a user, it would be helpful to have a way to manually set custom transformers to use on my data before modeling.
Expected behavior
Add an `update_transformers` method to `BaseSynthesizer`.

Parameters:
- `column_name_to_transformer` (dict): A dictionary mapping each column name to a transformer instance.

The method should update the `HyperTransformer` based on the provided dict.
Validation:

Errors (raise if one or more columns hits a case; run all the checks first, since we shouldn't partially update anything):
- Updating a transformer that is incompatible with the sdtype provided in the metadata.
  Error: `Column 'age' is a numerical column, which is incompatible with the 'LabelEncoder' preprocessing.`
- Adding a transformer other than `AnonymizedFaker` or `RegexGenerator` for a key column (primary, alternate, or sequence key).
  Error: `Column 'user_id' is a key. It cannot be preprocessed using the 'FloatFormatter' transformer.`
- Assigning a transformer object that has already been fit.
  Error: `Transformer for column 'age' has already been fit on data.`

Warnings (raise all that arise):
- (CTGAN, CopulaGAN, TVAE, PAR only) The user tries to add a transformer for a column that is auto-assigned to `None` (boolean/categorical).
  Warning: `Replacing the default transformer for column 'degree_type' might impact the quality of your synthetic data`
- (GaussianCopula) The user adds a `OneHotEncoder` to a categorical column.
  Warning: `Using the OneHotEncoder for column 'degree_type' may slow down the preprocessing and modeling time`
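The validate-first requirement above (collect every error before touching any state) could be sketched as follows. This is only an illustration of the proposed flow; the `Synthesizer`, `Transformer`, `COMPATIBLE`, and `KEY_TRANSFORMERS` names are hypothetical stand-ins, not SDV internals.

```python
# Hypothetical sketch of the proposed update_transformers validation flow.
COMPATIBLE = {
    'numerical': {'FloatFormatter'},
    'categorical': {'LabelEncoder', 'OneHotEncoder'},
}
KEY_TRANSFORMERS = {'AnonymizedFaker', 'RegexGenerator'}


class Transformer:
    def __init__(self, name, fitted=False):
        self.name = name
        self.fitted = fitted


class Synthesizer:
    def __init__(self, sdtypes, keys):
        self.sdtypes = sdtypes  # column name -> sdtype from the metadata
        self.keys = keys        # set of key columns (primary/alternate/sequence)
        self.transformers = {}

    def update_transformers(self, column_name_to_transformer):
        errors = []
        # Run every check first so we never partially update anything.
        for column, transformer in column_name_to_transformer.items():
            sdtype = self.sdtypes[column]
            if column in self.keys:
                if transformer.name not in KEY_TRANSFORMERS:
                    errors.append(
                        f"Column '{column}' is a key. It cannot be preprocessed "
                        f"using the '{transformer.name}' transformer."
                    )
            elif transformer.name not in COMPATIBLE.get(sdtype, set()):
                errors.append(
                    f"Column '{column}' is a {sdtype} column, which is "
                    f"incompatible with the '{transformer.name}' preprocessing."
                )
            if transformer.fitted:
                errors.append(
                    f"Transformer for column '{column}' has already been fit on data."
                )

        if errors:
            raise ValueError('\n'.join(errors))

        # Only now is it safe to apply the whole update at once.
        self.transformers.update(column_name_to_transformer)
```

Because the error list is built before any assignment, a single bad column leaves `self.transformers` untouched, matching the "no partial updates" requirement.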
@amontanez24 Could you clarify what you meant by "Whenever the user tries to add a transformer for a column that is auto-assigned to None (boolean/categorical)"?
For CTGAN, CopulaGAN, and TVAE, the categorical and boolean transformations are skipped: instead of using the default categorical transformer for those columns, we assign them None. If a user tries to change that, we raise the warning but let them do it, since it won't technically break anything.
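The warn-but-allow behavior described above could look something like the sketch below. The function name, the `GAN_SYNTHESIZERS` set, and the string-based synthesizer type are all illustrative assumptions, not SDV's actual API.

```python
import warnings

# Columns of these sdtypes are auto-assigned a None transformer by the
# GAN-style synthesizers, so overriding them only triggers a warning.
SKIP_SDTYPES = {'categorical', 'boolean'}
GAN_SYNTHESIZERS = {'CTGAN', 'CopulaGAN', 'TVAE', 'PAR'}


def warn_on_default_override(synthesizer_type, column, sdtype):
    """Warn (but do not block) replacing a transformer auto-assigned to None."""
    if synthesizer_type in GAN_SYNTHESIZERS and sdtype in SKIP_SDTYPES:
        warnings.warn(
            f"Replacing the default transformer for column '{column}' "
            "might impact the quality of your synthetic data"
        )
```

Since this is a warning rather than an error, the update still goes through; the user just gets notified that quality may suffer.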