Problem Description
With all of the metadata management moved to `SingleTableMetadata` and `MultiTableMetadata`, the class in `table.py` no longer needs to manage metadata. As engineers, we would find it useful to have a class that takes in a `SingleTableMetadata` instance and uses it to manage the `HyperTransformer`, constraints, and anonymization of data.
Expected behavior
- Add a `data_processing` module.
- Create a `DataProcessor` class and put it in that module.
- The `__init__` params should be:
  - `metadata` (`SingleTableMetadata`): The single table metadata instance that will be used to apply constraints and transformations to the data.
  - `learn_rounding_scheme` (bool): Whether to learn a rounding scheme for `FloatFormatter`. If `True`, the data returned by `reverse_transform` will be rounded to the learned decimal place. Defaults to `True`.
  - `enforce_min_max_values` (bool): Whether to clip the data returned by `reverse_transform` of the numerical transformer, `FloatFormatter`, to the min and max values seen during `fit`. Defaults to `True`.
  - `model_kwargs` (dict): Dictionary specifying the kwargs to use in each tabular model when working on this table. The keys are the names of the `TabularModel` classes and the values are dictionaries of keyword arguments. This argument exists mostly to ensure that models are fitted with the same arguments when the same `DataProcessor` is used to fit different model instances on different slices of the same table.
- During the init we should create a `constraints` attribute that loads all of the constraint instances from the `SingleTableMetadata`.
  - It can be done similarly to what's done here, except the constraints should always be loaded from the JSON.
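
A rough sketch of the requested `__init__`, to make the shape of the proposal concrete. It is not the actual implementation: `StubSingleTableMetadata` and `StubConstraint` are hypothetical stand-ins so the snippet runs on its own, the `constraint_name` / `parameters` keys are an assumed JSON layout, and the `None` default for `model_kwargs` is an assumption since the description above does not state one.

```python
from dataclasses import dataclass, field


@dataclass
class StubSingleTableMetadata:
    """Hypothetical stand-in for SingleTableMetadata (not the real class)."""
    # Constraints stored in their JSON (dict) form; the layout is assumed.
    constraints: list = field(default_factory=list)


class StubConstraint:
    """Hypothetical constraint that can be rebuilt from its JSON form."""

    def __init__(self, name, parameters):
        self.name = name
        self.parameters = parameters

    @classmethod
    def from_dict(cls, constraint_dict):
        # Assumed keys: 'constraint_name' and an optional 'parameters' dict.
        return cls(
            constraint_dict['constraint_name'],
            dict(constraint_dict.get('parameters', {})),
        )


class DataProcessor:
    """Sketch of the proposed class; only the __init__ described above."""

    def __init__(self, metadata, learn_rounding_scheme=True,
                 enforce_min_max_values=True, model_kwargs=None):
        self.metadata = metadata
        self.learn_rounding_scheme = learn_rounding_scheme
        self.enforce_min_max_values = enforce_min_max_values
        # Assumed default: an empty dict when no kwargs are given.
        self.model_kwargs = model_kwargs or {}
        # Always build the constraint instances from the metadata's JSON form.
        self.constraints = [
            StubConstraint.from_dict(constraint_dict)
            for constraint_dict in metadata.constraints
        ]


# Illustrative usage; the constraint and model names are placeholders.
metadata = StubSingleTableMetadata(constraints=[
    {'constraint_name': 'Positive', 'parameters': {'column_name': 'age'}},
])
processor = DataProcessor(
    metadata,
    model_kwargs={'SomeTabularModel': {'epochs': 10}},
)
```

Keeping `model_kwargs` on the processor rather than on each model is what lets several model instances fitted on different slices of the same table share identical arguments.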