Add preprocess and fit methods to multi-table synthesizers #1074

amontanez24 · 2022-10-19T23:49:54Z

Problem Description

As a user, it would be useful to be able to preprocess my data in a separate step from modeling. It would also be helpful to do this from the multi-table level.

Acceptance criteria

Add the following methods:

preprocess(data)
- data is a dictionary mapping each table name to a pandas.DataFrame
- This method should essentially loop through the single table synthesizers for each table and call preprocess on them with the proper data
- It should return a dictionary mapping each table name to the transformed data
- This method can be added to the BaseMultiTableSynthesizer
- It should raise a single warning if any of the synthesizers has already been fit. The warning should read:
  Warning: This synthesizer has already been fit. To use the new preprocessed data, please refit the synthesizer using 'fit' or 'fit_processed_data'
fit_processed_data(processed_data)
- processed_data is a dictionary mapping each table name to a pandas.DataFrame. This data should already have been run through the data processor.
- This method will be specific to each MultiTableSynthesizer, so for now it only needs to be implemented in the HMASynthesizer.
fit(data)
- data is a dictionary mapping each table name to a pandas.DataFrame
- It should call preprocess and then fit_processed_data
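
A minimal sketch of how the three methods above could fit together. This is not the SDV implementation: StubTableSynthesizer, SketchMultiTableSynthesizer, the fitted flag and the _table_synthesizers attribute are hypothetical names, and plain lists stand in for the pandas.DataFrame tables so the example runs with the standard library only. The warning text is taken from the criteria above without the leading "Warning:", on the assumption that the warnings module supplies the category prefix.

```python
import warnings

# Message text from the acceptance criteria.
FIT_WARNING = (
    "This synthesizer has already been fit. To use the new preprocessed data, "
    "please refit the synthesizer using 'fit' or 'fit_processed_data'"
)


class StubTableSynthesizer:
    """Hypothetical stand-in for a single-table synthesizer."""

    def __init__(self):
        self.fitted = False

    def preprocess(self, table):
        # A real synthesizer would transform the table; the stub returns a copy.
        return list(table)


class SketchMultiTableSynthesizer:
    """Hypothetical multi-table synthesizer holding one synthesizer per table."""

    def __init__(self, table_names):
        self._table_synthesizers = {
            name: StubTableSynthesizer() for name in table_names
        }

    def preprocess(self, data):
        # Warn once, no matter how many table synthesizers were already fit.
        if any(synth.fitted for synth in self._table_synthesizers.values()):
            warnings.warn(FIT_WARNING)
        return {
            name: self._table_synthesizers[name].preprocess(table)
            for name, table in data.items()
        }

    def fit_processed_data(self, processed_data):
        # Algorithm-specific in practice; the stub only marks tables as fit.
        for name in processed_data:
            self._table_synthesizers[name].fitted = True

    def fit(self, data):
        processed_data = self.preprocess(data)
        self.fit_processed_data(processed_data)
```

Calling fit on a fresh instance emits no warning, because preprocess runs before any table synthesizer is marked as fit. A later call to preprocess on the same instance emits the warning exactly once.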

Expected behavior

preprocess
- This method should essentially loop through each table and call SingleTableSynthesizer.preprocess with the correct data
fit_processed_data(processed_data)
- This is where the current HMA algorithm should take place. Each child table should be modeled, and the parameters of that model should then be used to extend the parent's table, until eventually the parent itself is modeled. The existing code in hma should be reviewed for reference.
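
One possible reading of the "extend the parent with child model parameters" step, sketched with plain dictionaries instead of DataFrames. Everything here is an assumption made for illustration: the function names, the `__<child>__<param>` column naming, the choice to fit one model per parent row over its child rows, and the stub "model" whose only parameter is a column mean. The actual HMA code may differ on all of these.

```python
from collections import defaultdict
from statistics import mean


def fit_stub_model(rows, column):
    """Hypothetical 'model': its only parameter is the mean of one column."""
    return {"mean": mean(row[column] for row in rows)}


def extend_parent(parent_rows, child_rows, child_name, parent_key, foreign_key, column):
    """Fit a stub model on each parent's child rows and attach the resulting
    parameters to that parent row as extra columns."""
    # Group child rows by the parent they reference.
    children_by_parent = defaultdict(list)
    for row in child_rows:
        children_by_parent[row[foreign_key]].append(row)

    extended = []
    for parent in parent_rows:
        new_row = dict(parent)  # leave the input rows untouched
        children = children_by_parent.get(parent[parent_key], [])
        if children:
            for name, value in fit_stub_model(children, column).items():
                new_row[f"__{child_name}__{name}"] = value
        extended.append(new_row)
    return extended
```

With a deeper hierarchy, the same step would be applied from the leaves upward, so each parent is extended with the parameters of all its children before it is modeled itself.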

Additional context

The primary keys must be available to the MultiTableSynthesizer before it fits the models. This should already be satisfied, since the DataProcessor now makes the primary key the index during transform.
The workflow changes slightly from what happens in hma: each table is now transformed first, and only then is the fit method called for each model and the parent tables extended with the model parameters of their child tables.

amontanez24 added the feature request label Oct 19, 2022

amontanez24 added this to the 1.0.0 milestone Oct 19, 2022

pvk-developer mentioned this issue Nov 7, 2022

Add preprocess and fit to MultiTable synthesizers #1093

Merged

amontanez24 closed this as completed Dec 1, 2022

amontanez24 assigned pvk-developer Mar 24, 2023
