Release v1.3.0 - 2023-1-18 · sdv-dev/RDT

This release makes changes to the way that individual transformers are stored in the HyperTransformer. When accessing the config via HyperTransformer.get_config(), the transformers listed in the config are now the actual transformer instances used during fitting and transforming. These instances can now be accessed and used to examine their properties post fitting. For example, you can now view the mapping for a PseudoAnonymizedFaker instance using PseudoAnonymizedFaker.get_mapping() on the instance retrieved from the config.

Additionally, the output of reverse_transform no longer appends the .value suffix to every unnamed output column. Only output columns that are created from context extracted from the input columns will have suffixes (e.g. .normalized in the ClusterBasedNormalizer).

The AnonymizedFaker and RegexGenerator now have an enforce_uniqueness parameter, which controls whether the data returned by reverse_transform should be unique. The HyperTransformer now has a method called create_anonymized_columns that can be used to generate columns that are matched with anonymizing transformers like AnonymizedFaker and RegexGenerator. The method can be used as follows:
HyperTransformer.create_anonymized_columns(num_rows=5, column_names=['email_optin', 'credit_card'])
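To make the enforce_uniqueness idea concrete, here is a minimal standard-library sketch of what such a flag can mean for a value generator. It does not use RDT and is not RDT's implementation: the function generate_values, its pool argument, and the choice to raise ValueError when the pool is too small are all assumptions made for illustration.

```python
import random


def generate_values(pool, num_rows, enforce_uniqueness=False, seed=0):
    """Hypothetical helper (not part of RDT): draw num_rows values from pool."""
    rng = random.Random(seed)
    if enforce_uniqueness:
        # Assumption of this sketch: fail loudly when uniqueness is impossible.
        if num_rows > len(pool):
            raise ValueError('Not enough distinct values to satisfy uniqueness.')
        # Sampling without replacement never repeats a pool entry.
        return rng.sample(pool, num_rows)
    # Sampling with replacement may repeat values.
    return [rng.choice(pool) for _ in range(num_rows)]


pool = ['a', 'b', 'c', 'd', 'e']
unique_values = generate_values(pool, 5, enforce_uniqueness=True)
repeatable_values = generate_values(pool, 8)  # allowed: more rows than pool entries
```

With uniqueness enforced, the number of rows that can be produced is bounded by the number of distinct values available, which is the trade-off to keep in mind when enabling such a parameter.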

Another major change in this release is the ability to control randomization. Every time a HyperTransformer is initialized, its randomness will be reset to the same seed, and it will yield the same results for reverse_transform if given the same input. Every subsequent call to reverse_transform yields a different result. If a user desires to reset the seed, they can call HyperTransformer.reset_randomization.
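The randomization behavior described above can be illustrated with a small standard-library sketch. It is a toy stand-in, not RDT code: the class SeededSampler, its sample method, and the seed value 42 are hypothetical names chosen for illustration. Only the method name reset_randomization mirrors the one mentioned in these notes.

```python
import random


class SeededSampler:
    """Toy object with resettable randomness (hypothetical, not RDT code)."""

    def __init__(self, seed=42):
        # Every new instance starts from the same fixed seed.
        self._seed = seed
        self.reset_randomization()

    def reset_randomization(self):
        # Rebuild the generator so the random sequence starts over.
        self._rng = random.Random(self._seed)

    def sample(self, n):
        # Each call advances the generator state, so repeated calls differ.
        return [self._rng.randint(0, 10**9) for _ in range(n)]


first = SeededSampler()
call_1 = first.sample(3)
call_2 = first.sample(3)        # second call on the same instance

second = SeededSampler()
fresh_call = second.sample(3)   # first call on a fresh instance

first.reset_randomization()
after_reset = first.sample(3)   # first call after an explicit reset
```

In this sketch, fresh_call and after_reset both match call_1, while call_2 differs from it, which is the pattern the notes describe for reverse_transform.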

Finally, this release adds support for Python 3.10 and drops support for 3.6.

Bugs

The reset_randomization should also apply to fit and transform - Issue #608 by @amontanez24
Cannot print CustomLabelEncoder: ValueError - Issue #607 by @amontanez24
Float formatter learn_rounding_scheme doesn't work on all digits - Issue #556 by @fealho
Warnings not showing on update_transformers_by_sdtype - Issue #582 by @amontanez24
OneHotEncoder doesn't work with boolean sdtype - Issue #583 by @pvk-developer
Setting config on HyperTransformer does not read supported_sdtypes - Issue #560 by @pvk-developer
#545 - Issue #545 by @pvk-developer
Add error to NullTransformer when data only contains nans - PR #567 by @fealho
Update update_transformers validation - PR #563 by @fealho

Maintenance

Support Python 3.10 - Issue #593 by @pvk-developer
RDT 1.3 Package Maintenance Updates - Issue #594 by @pvk-developer

New Features

Update errors - Issue #599 by @amontanez24
Add ability to control randomness - Issue #584 by @amontanez24
Printing and error improvements - Issue #581 by @amontanez24
Make RegexGenerator not to reset itself - Issue #558 by @pvk-developer
Add a reset_anonymization method - Issue #559 by @pvk-developer
Don't copy instances of transformer - Issue #541 by @fealho
Remove '.value' suffix - Issue #533 by @fealho
Change the NEXT_TRANSFORMERS logic - Issue #557 by @fealho
Add utility functions to AnonymizedFaker - Issue #561 by @pvk-developer
Update API for update_transformers_by_sdtype to be more explicit about instances vs. copies - Issue #540 by @fealho
Add create_anonymized_columns method to anonymize data from scratch - Issue #546 by @pvk-developer
Add parameter to AnonymizedFaker() and RegexGenerator() to generate only unique values - Issue #542 by @pvk-developer
