Add benchmark_multi_table function #486

@amontanez24

Description

Problem Description

As a user, I'd like a reliable method and set of metrics to compare multi-table synthesizers.

We want to add a multi-table version of benchmark_single_table.

Expected behavior

Add a new function to the sdgym.benchmark module:

def benchmark_multi_table(
    synthesizers=['HMASynthesizer', 'MultiTableUniformSynthesizer'],
    custom_synthesizers=None,
    sdv_datasets=['NBA', 'financial', 'Student_loan', 'Biodegradability', 'fake_hotels', 'restbase', 'airbnb-simplified'],
    additional_datasets_folder=None,
    limit_dataset_size=False,
    compute_quality_score=True,
    compute_diagnostic_score=True,
    timeout=None,
    output_destination=None,
    show_progress=False
):
    """
    Args:
        synthesizers (list[str] or sdgym.synthesizer.BaselineSynthesizer): List of synthesizers to use.
        custom_synthesizers (list[class] or ``None``): Same as single table.
        sdv_datasets (list[str] or ``None``): Names of the SDV demo datasets to use for the benchmark.
        additional_datasets_folder (str or ``None``): The path to a local folder. Datasets found in this folder are
            run in addition to the SDV datasets. If ``None``, no additional datasets are used.
        limit_dataset_size (bool):
            We should still limit the dataset to 10 columns per table (not including primary/foreign keys).
            As for the number of rows: the overall dataset needs to be subsampled with referential integrity.
            We should use the [get_random_subset](https://docs.sdv.dev/sdv/multi-table-data/data-preparation/cleaning-your-data#get_random_subset) function to perform the subsample.
            For the main table, select the table with the largest number of rows, and set the number of rows to 1000.
        compute_quality_score (bool):
            Whether or not to evaluate an overall quality score. In this case we should use the MultiTableQualityReport.
        compute_diagnostic_score (bool):
            Whether or not to evaluate an overall diagnostic score. In this case we should use the MultiTableDiagnosticReport.
        timeout (int or ``None``):
            The maximum number of seconds to wait for synthetic data creation. If ``None``, no
            timeout is enforced.
        output_destination (str or ``None``):
            The path to the output directory where results will be saved. If ``None``, no
            output is saved.
        show_progress (bool):
            Whether to use tqdm to keep track of the progress. Defaults to ``False``.

    """
    

Changes to storage and artifacts

We should store the artifacts in this new folder structure

output_destination/
|-- single_table/
    |-- SDGym_results_06_24_2025/
        |-- census_06_24_2025/
            |-- CTGANSynthesizer/
                |-- CTGANSynthesizer.pkl
                |-- CTGANSynthesizer_synthetic_data.csv
                |-- CTGANSynthesizer_benchmark_result.csv
            |-- TVAESynthesizer/
                |-- <artifacts>
        |-- expedia_hotel_logs_06_24_2025/
            |-- ...
        |-- meta.yaml
        |-- results.csv
    |-- SDGym_results_07_24_2025/
        |-- ...
|-- multi_table/
    |-- SDGym_results_06_24_2025/
        |-- berka_06_24_2025/
            |-- HMASynthesizer/
                |-- HMASynthesizer.pkl
                |-- HMASynthesizer_synthetic_data.zip
                |-- HMASynthesizer_benchmark_result.csv
            |-- HSASynthesizer/
                |-- <artifacts>
        |-- synthea_06_24_2025/
            |-- ...
        |-- meta.yaml
        |-- results.csv
    |-- SDGym_results_07_24_2025/
        |-- ...

The main difference is that everything will now be nested inside a per-modality folder.
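The nesting above can be composed mechanically from the modality, run date, dataset, and synthesizer. A small sketch, where the helper name and argument order are assumptions rather than SDGym API:

```python
from pathlib import Path

def artifact_dir(output_destination, modality, run_date, dataset, synthesizer):
    # Compose: <output>/<modality>/SDGym_results_<date>/<dataset>_<date>/<synthesizer>/
    return (
        Path(output_destination)
        / modality
        / f'SDGym_results_{run_date}'
        / f'{dataset}_{run_date}'
        / synthesizer
    )
```

For example, artifact_dir('out', 'multi_table', '06_24_2025', 'berka', 'HMASynthesizer') yields out/multi_table/SDGym_results_06_24_2025/berka_06_24_2025/HMASynthesizer, matching the tree above.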

Changes to results

  • The result columns should be the same as in the single table case.
  • We should still add the adjusted total time and quality score, except they should use the MultiTableUniformSynthesizer results instead.
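The adjustment semantics should mirror whatever the single-table benchmark already does, just with the MultiTableUniformSynthesizer run as the baseline. Purely as an illustration of baseline-relative adjustment (the function name and formula below are invented for this sketch, not the actual single-table formula):

```python
def adjusted_score(score, uniform_baseline_score):
    # Hypothetical normalization: 0.0 at the uniform baseline's score and
    # 1.0 at a perfect score. The real code should reuse the single-table
    # adjustment logic verbatim, swapping in the multi-table baseline.
    if uniform_baseline_score >= 1.0:
        return 0.0
    return (score - uniform_baseline_score) / (1.0 - uniform_baseline_score)
```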

Additional context

  • Don't worry about AWS yet. That will be in Add benchmark_multi_table_aws #487
  • A lot of code will need to be adapted to support the multi-table case.
  • Most functions in this file can be generalized to work for single or multi table synthesizers.
    • We should re-use as much as possible and not just copy it all over and replace single table with multi table everywhere.
  • Code like the following snippet needs to be restructured or abstracted out so that we can easily replace the modality.

    SDGym/sdgym/benchmark.py

    Lines 1073 to 1079 in 95f770e

    if synthesizer not in SDV_SINGLE_TABLE_SYNTHESIZERS:
        ext_lib = EXTERNAL_SYNTHESIZER_TO_LIBRARY.get(synthesizer)
        if ext_lib:
            library_version = version(ext_lib)
            metadata[f'{ext_lib}_version'] = library_version
        elif 'sdv' not in metadata.keys():
            metadata['sdv_version'] = version('sdv')
  • We may end up making classes in the future to benchmark and view results. Imagine someone initializing a benchmark class with options like the modality (single vs. multi table), whether it runs on the cloud, etc.
  • Note that there will be no support for metrics outside of the Quality and Diagnostic Reports.
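One lightweight way to avoid copy-pasting per-modality branches like the snippet above is a modality registry that the shared code paths consult. All names below are hypothetical, not existing SDGym symbols:

```python
# Hypothetical registry mapping each modality to its specific pieces; shared
# benchmark code would look up values here instead of branching inline.
MODALITY_CONFIG = {
    'single_table': {
        'baseline_synthesizer': 'UniformSynthesizer',
        'quality_report': 'QualityReport',
        'diagnostic_report': 'DiagnosticReport',
        'results_subfolder': 'single_table',
    },
    'multi_table': {
        'baseline_synthesizer': 'MultiTableUniformSynthesizer',
        'quality_report': 'MultiTableQualityReport',
        'diagnostic_report': 'MultiTableDiagnosticReport',
        'results_subfolder': 'multi_table',
    },
}

def get_modality_config(modality):
    try:
        return MODALITY_CONFIG[modality]
    except KeyError:
        raise ValueError(f'Unknown modality: {modality!r}') from None
```

A future benchmark class could simply hold one of these config dicts, which also keeps the door open for the AWS variant in #487.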
