Problem Description
SDGym is designed to benchmark synthesizers. SDV synthesizers are natively supported, while external, third-party synthesizers can be integrated as custom synthesizers.
Currently, the main publicly available SDV synthesizers are natively supported in SDGym. However, this does not apply to synthesizers in SDV Enterprise or bundles (e.g. the SegmentSynthesizer from the XSynthesizers bundle). Users have to integrate these manually as custom synthesizers.
Expected behavior
I expect that SDGym should be able to automatically discover any single- or multi-table SDV synthesizer that I have access to in my Python environment.
Single-table synthesizers: SDGym should be able to search for the synthesizer name in the sdv.single_table namespace. If that synthesizer name exists, SDGym should be able to load it into the appropriate format (e.g. see this base class). As a result, a user should be able to benchmark any SDV single-table synthesizer that they have access to by providing its name as a string.
# Assuming SDV Enterprise and the XSynthesizers bundle are already installed in my environment,
# I should be able to benchmark the synthesizers by passing their names.
import sdgym

sdgym.benchmark_single_table(
    synthesizers=['SegmentSynthesizer', 'XGCSynthesizer']
)

Additionally, I should be able to create variants of these types of synthesizers following this guide.
from sdgym import create_sdv_synthesizer_variant

BiSegmentSynthesizer = create_sdv_synthesizer_variant(
    synthesizer_class='SegmentSynthesizer',
    synthesizer_parameters={'n_segments': 2},
    display_name='BiSegmentSynthesizer'
)

Multi-table synthesizers: SDGym should also be able to search for synthesizer names in the sdv.multi_table namespace. In this case, it should be able to find and load the synthesizer in a format similar to single-table, but with some modifications:
- _get_trained_synthesizer function: The input parameter data should be a dictionary of DataFrames instead of a single pandas DataFrame.
- _sample_from_synthesizer function: Instead of n_samples, the parameter it accepts should be scale.
Note that multi-table benchmarking is not currently supported in SDGym but we hope to add support for it in the future.
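To make the two modifications above concrete, here is a rough sketch of what a multi-table wrapper might look like. The class name MultiTableBaselineSketch, the constructor argument, and the exact method signatures are assumptions for illustration only; they are not taken from the SDGym codebase. The sketch only assumes that the wrapped synthesizer is built from metadata, is fitted on a dictionary of tables, and samples with a scale argument.

```python
class MultiTableBaselineSketch:
    """Hypothetical wrapper; the method names follow the issue text above."""

    def __init__(self, synthesizer_class):
        # synthesizer_class is assumed to expose __init__(metadata),
        # fit(data) and sample(scale=...).
        self._synthesizer_class = synthesizer_class

    def _get_trained_synthesizer(self, data, metadata):
        # Multi-table input is a dict of tables, not a single DataFrame.
        if not isinstance(data, dict):
            raise TypeError(
                'Multi-table data must be a dict mapping table names to DataFrames.'
            )
        synthesizer = self._synthesizer_class(metadata)
        synthesizer.fit(data)
        return synthesizer

    def _sample_from_synthesizer(self, synthesizer, scale=1.0):
        # scale replaces n_samples: it sizes the whole dataset relative to
        # the training data instead of fixing one table's row count.
        return synthesizer.sample(scale=scale)
```

Passing the synthesizer class in as an argument keeps the wrapper independent of any specific synthesizer, which fits the goal of not hard-coding them.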
Additional context
We should also generally clean up this file when making the changes.
- The nomenclature is "single-table" and "multi-table". Get rid of any references to "tabular" or "relational" (these were the old names)
- None of the SDV synthesizers should be hard-coded (e.g. GaussianCopulaSynthesizer, CTGANSynthesizer, etc.). All of them should be dynamically discoverable from the sdv.single_table and sdv.multi_table modules.
- We can remove the sequential code for now.
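For reference, the dynamic discovery described in this issue could be sketched roughly as follows. This is only an illustration: the helper names discover_synthesizers and find_synthesizer are hypothetical, and filtering on the "Synthesizer" name suffix is an assumption about how synthesizer classes could be recognized, not existing SDGym behavior. It relies only on the standard importlib and inspect modules, and it returns nothing for a module that is not installed.

```python
import importlib
import inspect


def discover_synthesizers(module_name):
    """Return {class_name: class} for the *Synthesizer classes in a module."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package (or an add-on bundle) is not installed here.
        return {}

    return {
        name: obj
        for name, obj in inspect.getmembers(module, inspect.isclass)
        if name.endswith('Synthesizer') and not name.startswith('_')
    }


def find_synthesizer(name, module_names=('sdv.single_table', 'sdv.multi_table')):
    """Look up a synthesizer by name; return (module_name, class)."""
    for module_name in module_names:
        found = discover_synthesizers(module_name)
        if name in found:
            return module_name, found[name]

    raise ValueError(f"No synthesizer named '{name}' found in {list(module_names)}.")
```

Because the lookup happens at call time, any synthesizer that an installed bundle exposes in those namespaces would be picked up without changes to SDGym, and the returned module name tells the caller whether to treat it as single-table or multi-table.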