Environment Details
Please indicate the following details about the environment in which you found the bug:
- SDV version:
- Python version:
- Operating System:
Error Description
Sampling can sometimes crash with a FixedCombinations constraint due to a pandas IndexingError. This is triggered when sampled rows that are invalid for the constraint are filtered out after reverse transforming (link).
The index of the is_valid Series is meant to match the index of the sampled data. For FixedCombinations, however, we reset the index of this Series. If rows were already dropped earlier, the sampled data index has gaps, so the reset index no longer matches it. Filtering the sampled data with that Series then raises the IndexingError in sample.
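The alignment problem can be reproduced with plain pandas, without SDV. This is a minimal sketch of the mechanism only, not SDV's actual constraint code: the DataFrame values, the `valid_combinations` set and the variable names are hypothetical. It shows that a boolean Series with a reset index (0..n-1) cannot filter data whose index has gaps, while a Series built on the data's own index can.

```python
import pandas as pd
from pandas.errors import IndexingError

# Hypothetical sampled rows after an earlier constraint has dropped
# invalid rows: row 2 is gone, so the index has a gap.
sampled = pd.DataFrame(
    {'hotel_continent': [2, 2, 3, 3], 'hotel_country': [50, 50, 70, 8]},
    index=[0, 1, 3, 4],
)

# Hypothetical set of combinations seen during fit (illustration only).
valid_combinations = {(2, 50), (3, 70)}
checks = [
    (continent, country) in valid_combinations
    for continent, country in zip(sampled['hotel_continent'], sampled['hotel_country'])
]

# Buggy pattern: the is_valid Series gets a fresh 0..n-1 index,
# which no longer matches sampled.index ([0, 1, 3, 4]).
is_valid_reset = pd.Series(checks)
try:
    sampled.loc[is_valid_reset]
    raised = False
except IndexingError:
    raised = True

# Fixed pattern: build is_valid on the sampled data's own index.
is_valid_aligned = pd.Series(checks, index=sampled.index)
filtered = sampled.loc[is_valid_aligned]

print(raised)                   # True
print(filtered.index.tolist())  # [0, 1, 3]
```

Keeping the index of the incoming data on is_valid means the filter works no matter which rows earlier constraints removed.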
Steps to reproduce
In the example below, FixedNullCombinations drops a few invalid rows after reverse transforming. FixedCombinations then triggers the error with its is_valid Series, because the indices no longer align.
from sdv.datasets.demo import download_demo
from sdv.single_table import GaussianCopulaSynthesizer
# FixedCombinations and FixedNullCombinations must also be imported; their module path depends on the SDV version.

data, metadata = download_demo('single_table', 'expedia_hotel_logs')
constraints = [
    FixedCombinations(column_names=['hotel_continent', 'hotel_country'], table_name='expedia_hotel_logs'),
    FixedNullCombinations(column_names=['user_location_city', 'orig_destination_distance'], table_name='expedia_hotel_logs'),
]
synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.add_constraints(constraints)
synthesizer.fit(data)
sample = synthesizer.sample(num_rows=len(data))