The `IndependentSynthesizer` should follow the sdtypes in the metadata (not the data's dtypes) #249

npatki · 2023-06-05T15:49:49Z

Environment Details

SDGym version: 0.6.0 (latest)

What is expected

The IndependentSynthesizer is expected to independently model each column.

For numerical or datetime sdtypes, it should learn a univariate GMM during fit. Then during sample, it can create data from it.
For categorical or boolean sdtypes, it should learn the frequencies of each category. Then during sample, it can create data using those frequencies as weights.
For other sdtypes (such as id or pii), it can simply use the RegexGenerator or AnonymizedFaker to generate values from scratch (no learning is expected).
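
The three cases above can be sketched roughly as follows. This is only an illustration of the expected behavior, not SDGym's implementation. The metadata is simplified to a plain column-to-sdtype dict, a single Gaussian stands in for the univariate GMM, a trivial string generator stands in for RegexGenerator / AnonymizedFaker, and all function names are hypothetical.

```python
import random
import statistics
from collections import Counter


def fit_column(values, sdtype):
    """Learn a small per-column model chosen by the metadata sdtype."""
    if sdtype in ("numerical", "datetime"):
        # Stand-in for a univariate GMM: one Gaussian (mean, std).
        # Assumes datetimes were already converted to numbers (e.g. timestamps).
        return ("gaussian", statistics.fmean(values), statistics.pstdev(values))
    if sdtype in ("categorical", "boolean"):
        # Learn how often each category occurs.
        counts = Counter(values)
        categories = list(counts)
        return ("frequencies", categories, [counts[c] for c in categories])
    # Other sdtypes (id, pii, ...): nothing is learned from the data.
    return ("generated",)


def sample_column(model, num_rows, rng):
    """Create values for one column from its fitted model."""
    kind = model[0]
    if kind == "gaussian":
        _, mean, std = model
        return [rng.gauss(mean, std) for _ in range(num_rows)]
    if kind == "frequencies":
        _, categories, weights = model
        # Use the learned frequencies as sampling weights.
        return rng.choices(categories, weights=weights, k=num_rows)
    # Placeholder for RegexGenerator / AnonymizedFaker.
    return [f"id_{i}" for i in range(num_rows)]


def fit(data, metadata):
    """Fit every column independently, keyed by the metadata (not the data)."""
    return {col: fit_column(data[col], sdtype) for col, sdtype in metadata.items()}


def sample(models, num_rows, seed=0):
    rng = random.Random(seed)
    return {col: sample_column(m, num_rows, rng) for col, m in models.items()}


# Hypothetical example: user_id holds integers but is declared as an id.
metadata = {"age": "numerical", "plan": "categorical", "user_id": "id"}
data = {
    "age": [20, 30, 40, 50],
    "plan": ["a", "a", "b", "a"],
    "user_id": [101, 102, 103, 104],
}
models = fit(data, metadata)
synthetic = sample(models, num_rows=5)
```

Note that `user_id` is never modeled numerically here, even though its values are integers, because the choice of model is made from the metadata alone.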

How does this synthesizer know which type is which? It should use the provided metadata as the source of truth.

What is actually observed

Similar to the UniformSynthesizer (see #248), this synthesizer just lets the RDT HyperTransformer decide which column is which sdtype (based on the data).

It should be referencing the metadata, since the metadata is the source of truth.
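
To illustrate why guessing from the data can disagree with the metadata, here is a minimal sketch. The `guess_sdtype_from_values` helper is a deliberately simplified, hypothetical stand-in and does not reproduce the RDT HyperTransformer's actual detection logic. The column names and the column-to-sdtype dict are also assumptions.

```python
def guess_sdtype_from_values(values):
    """Hypothetical data-based guess; NOT RDT's actual detection logic."""
    # bool is a subclass of int in Python, so check it first.
    if all(isinstance(v, bool) for v in values):
        return "boolean"
    if all(isinstance(v, (int, float)) for v in values):
        return "numerical"
    return "categorical"


def find_mismatches(data, metadata):
    """Return columns whose data-based guess differs from the declared sdtype."""
    mismatches = {}
    for column, declared in metadata.items():
        guessed = guess_sdtype_from_values(data[column])
        if guessed != declared:
            mismatches[column] = (declared, guessed)
    return mismatches


# zip_code is stored as integers but declared categorical in the metadata.
data = {
    "zip_code": [10001, 94105, 60601],
    "age": [31, 45, 27],
}
metadata = {"zip_code": "categorical", "age": "numerical"}
mismatches = find_mismatches(data, metadata)
```

In this example the only mismatch is `zip_code`: a data-based guess would treat it as numerical, while the metadata declares it categorical.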

npatki added the bug label on Jun 5, 2023
