HMASynthesizer diagnostic score is not 1.0 when using 'truncnorm' distribution
#1831
Environment Details
Error Description
If I update the default distribution to 'truncnorm', then the HMASynthesizer creates synthetic data that is not completely valid. When running the diagnostic report, the Data Validity score is not 100% because extra NaN/NaT values appear in the synthetic data.
Steps to reproduce
Replicate this using the attached metadata and data.
test_data.zip
test_metadata.json
OUTPUT:
At first, you'll see many warnings originating from the truncated Gaussian during modeling:
Then, during sampling, there are more warnings that the transformed data (coming directly from the ML models) contains null values, so the overall synthetic data (after reverse transforming) will also have null values.
Finally, the diagnostic score is not 100%:
Additional Context
This was first observed in #1755.
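To narrow down which tables and columns pick up the extra NaN/NaT values, one option is to compare per-column null rates between the real and synthetic data. The sketch below is a hypothetical helper (`extra_null_columns` and `extra_nulls_by_table` are not part of SDV); it assumes both inputs are dicts mapping table names to pandas DataFrames, and it only covers the null symptom described here, not everything the diagnostic report checks.

```python
import pandas as pd


def extra_null_columns(real: pd.DataFrame, synthetic: pd.DataFrame) -> dict:
    """Hypothetical helper: columns whose synthetic null rate exceeds the real one.

    Returns {column: (real_null_rate, synthetic_null_rate)} for columns
    present in both frames. isna() treats both NaN and NaT as null.
    """
    result = {}
    for column in real.columns.intersection(synthetic.columns):
        real_rate = float(real[column].isna().mean())
        synth_rate = float(synthetic[column].isna().mean())
        if synth_rate > real_rate:
            result[column] = (real_rate, synth_rate)
    return result


def extra_nulls_by_table(real_data: dict, synthetic_data: dict) -> dict:
    """Hypothetical helper: apply extra_null_columns to every shared table.

    Tables with no extra nulls are omitted, so an empty dict means the
    synthetic data has no more nulls than the real data in any column.
    """
    report = {}
    for table_name, real_table in real_data.items():
        if table_name in synthetic_data:
            columns = extra_null_columns(real_table, synthetic_data[table_name])
            if columns:
                report[table_name] = columns
    return report
```

For example, `synthetic_data` could be the dict of DataFrames returned by `synthesizer.sample()` and `real_data` the tables loaded from the attached `test_data.zip`; a non-empty result points at the columns whose truncnorm-modeled values came back as null.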