Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Multi Table Modeling #1387

Closed
amontanez24 opened this issue Apr 22, 2023 · 0 comments · Fixed by #1403
Closed

Refactor Multi Table Modeling #1387

amontanez24 opened this issue Apr 22, 2023 · 0 comments · Fixed by #1403
Assignees
Labels
internal The issue doesn't change the API or functionality
Milestone

Comments

@amontanez24
Copy link
Contributor

Problem Description

As an engineer, it would be nice if there were clear abstractions in multi table modeling that could be used to add new multi table synthesizers.

Screen Shot 2023-04-22 at 4 47 21 PM

The image above lays out the generalized common modeling steps all multi table synthesizers should use. Currently, the HMASynthesizer breaks this abstraction as it models each table in the middle box while doing its extending. This can be seen on the following lines of HMASynthesizer._model_table.

SDV/sdv/multi_table/hma.py

Lines 198 to 199 in 904aa2d

if not table.empty:
self._table_synthesizers[table_name].fit_processed_data(table)

Expected behavior

To match the desired abstraction we should take the following steps:

  1. Add an abstract method to the BaseMultiTableSynthesizer class called _augment_tables. This method should be implemented in any child class and be responsible for augmenting tables with the necessary information to model them.
  2. Add another abstract method called _model_tables to the base. This method should just be used to loop through and model tables.
  3. Refactor the HSASynthesizer to break up modeling into these two methods.
  4. BaseMultiTableSynthesizer.fit will now call preprocess -> _augment_tables -> _model_tables

Additional context

  • In HMA, we only need to model the root table. Currently we model every table. This change can be made in the new _model_tables implementation.
@amontanez24 amontanez24 added the internal The issue doesn't change the API or functionality label Apr 22, 2023
@amontanez24 amontanez24 added this to the 1.1.0 milestone May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal The issue doesn't change the API or functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants