Create DataProcessor class #946

Closed
amontanez24 opened this issue Aug 9, 2022 · 0 comments
Labels
feature request Request for a new feature

@amontanez24

Problem Description

With all of the metadata management moved to SingleTableMetadata and MultiTableMetadata, the Table class in table.py no longer needs to manage metadata. It would be useful for engineers to have a class that takes in a SingleTableMetadata instance and uses it to manage the HyperTransformer, the constraints, and the anonymization of the data.

Expected behavior

  • Add a data_processing module
  • Create a DataProcessor class and put it in that module
  • The __init__ params should be:
    • metadata (SingleTableMetadata): The single table metadata instance that will be used to apply constraints and transformations to the data.
    • learn_rounding_scheme (bool): Whether FloatFormatter should learn which decimal place to round to, based on the data seen during fit. If True, the data returned by reverse_transform will be rounded to that place. Defaults to True.
    • enforce_min_max_values (bool): Whether to clip the data returned by reverse_transform of the numerical transformer, FloatFormatter, to the min and max values seen during fit. Defaults to True.
    • model_kwargs (dict): Keyword arguments to use for each tabular model when working on this table. The keys are TabularModel class names and the values are dictionaries of the keyword arguments for that model. This argument exists mostly to ensure that models are fitted with the same arguments when the same DataProcessor is used to fit different model instances on different slices of the same table.
  • During __init__, create a constraints attribute that loads all of the constraint instances from the SingleTableMetadata.
    • This can be done similarly to what's done here, except that the constraints should always be loaded from the JSON.
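A minimal sketch of the proposed class follows. It assumes a duck-typed metadata object whose to_dict() returns a JSON-serializable dict with an optional 'constraints' list, where each entry holds a 'constraint_name' key plus that constraint's keyword arguments. Positive, CONSTRAINT_CLASSES, FakeMetadata, get_model_kwargs and set_model_kwargs are hypothetical names for illustration, not SDV's actual API. The rounding and clipping flags are only stored here; wiring them into FloatFormatter is left out.

```python
import copy
import json
from typing import Any, Dict, List, Optional


class Positive:
    """Hypothetical stand-in for a real constraint class (illustration only)."""

    def __init__(self, column_name: str, strict: bool = False):
        self.column_name = column_name
        self.strict = strict


# Hypothetical registry mapping the 'constraint_name' stored in the metadata
# JSON to a constraint class. The real lookup mechanism may differ.
CONSTRAINT_CLASSES: Dict[str, type] = {'Positive': Positive}


class DataProcessor:
    """Sketch of a single-table processor driven by the table's metadata.

    Assumes metadata.to_dict() returns a JSON-serializable dict whose optional
    'constraints' entry is a list of dicts with a 'constraint_name' key.
    """

    def __init__(
        self,
        metadata: Any,
        learn_rounding_scheme: bool = True,
        enforce_min_max_values: bool = True,
        model_kwargs: Optional[Dict[str, Dict[str, Any]]] = None,
    ):
        self.metadata = metadata
        # Stored only; a real implementation would pass these to FloatFormatter.
        self._learn_rounding_scheme = learn_rounding_scheme
        self._enforce_min_max_values = enforce_min_max_values
        # Keyed by TabularModel class name, e.g. {'GaussianCopula': {...}}.
        self._model_kwargs = model_kwargs or {}
        self._constraints = self._load_constraints()

    def _load_constraints(self) -> List[Any]:
        # Round-trip the metadata through JSON so constraints are always
        # rebuilt from their serialized form, never reused as live objects.
        metadata_json = json.loads(json.dumps(self.metadata.to_dict()))
        constraints = []
        for constraint_dict in metadata_json.get('constraints', []):
            kwargs = dict(constraint_dict)
            constraint_class = CONSTRAINT_CLASSES[kwargs.pop('constraint_name')]
            constraints.append(constraint_class(**kwargs))
        return constraints

    def get_model_kwargs(self, model_name: str) -> Optional[Dict[str, Any]]:
        # Return a deep copy so callers cannot mutate the stored kwargs.
        return copy.deepcopy(self._model_kwargs.get(model_name))

    def set_model_kwargs(self, model_name: str, model_kwargs: Dict[str, Any]) -> None:
        self._model_kwargs[model_name] = model_kwargs


class FakeMetadata:
    """Hypothetical metadata stand-in used only to exercise the sketch."""

    def to_dict(self) -> Dict[str, Any]:
        return {'constraints': [{'constraint_name': 'Positive', 'column_name': 'age'}]}


processor = DataProcessor(
    FakeMetadata(),
    model_kwargs={'GaussianCopula': {'default_distribution': 'beta'}},  # illustrative
)
```

Returning a deep copy from get_model_kwargs is one way to keep the kwargs identical across model instances fitted on different slices of the same table, which is the stated purpose of model_kwargs.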

amontanez24 added the feature request label Aug 9, 2022
amontanez24 added this to the 1.0.0 milestone Aug 16, 2022
amontanez24 self-assigned this Aug 16, 2022