Cannot use custom constraint transforms for certain columns (inconsistent ordering in forward vs. reverse) #1476

Closed
npatki opened this issue Jun 22, 2023 · 0 comments · Fixed by #1511
Labels: bug, feature:constraints

npatki commented Jun 22, 2023

Environment Details

  • SDV version: 1.2.0 (latest)
  • Python version: Any
  • Operating System: Any

Error Description

Sometimes a primary key or PII column is involved in a constraint. If I apply a custom constraint to such a column, it always falls back to the reject-sampling strategy (using is_valid). It never uses the transform strategy (using transform and reverse_transform).
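To make the two strategies concrete, here is a minimal sketch in plain Python. It does not use SDV; the `LowBelowHigh` class, the two sampling helpers, and the column names are hypothetical and only mimic the three method names mentioned above (`is_valid`, `transform`, `reverse_transform`).

```python
import random


class LowBelowHigh:
    """Hypothetical toy constraint: require row["low"] <= row["high"]."""

    def is_valid(self, row):
        return row["low"] <= row["high"]

    def transform(self, row):
        # Replace "high" with its gap above "low" before modeling.
        out = dict(row)
        out["gap"] = out.pop("high") - out["low"]
        return out

    def reverse_transform(self, row):
        # Rebuild "high" from "low" and the gap; abs() keeps the row valid
        # even if the model sampled a negative gap.
        out = dict(row)
        out["high"] = out["low"] + abs(out.pop("gap"))
        return out


def sample_with_transform(constraint, rng, n):
    # Transform strategy: sample in the transformed space, then reverse.
    rows = [{"low": rng.randint(0, 10), "gap": rng.randint(-5, 5)} for _ in range(n)]
    return [constraint.reverse_transform(r) for r in rows]


def sample_with_rejection(constraint, rng, n):
    # Reject sampling: sample freely, discard invalid rows, redraw until full.
    kept = []
    while len(kept) < n:
        row = {"low": rng.randint(0, 10), "high": rng.randint(0, 10)}
        if constraint.is_valid(row):
            kept.append(row)
    return kept


rng = random.Random(0)
constraint = LowBelowHigh()
transformed = sample_with_transform(constraint, rng, 5)
rejected = sample_with_rejection(constraint, rng, 5)
```

Both helpers return only valid rows. The transform strategy never discards a sample, which is why falling back to reject sampling is the less desirable outcome.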

Root Cause

During the Data Processor's reverse transform, constraints are handled first, before the primary keys and anonymized columns are created. As a result, a constraint's reverse_transform cannot access primary key or PII columns.

for constraint in reversed(self._constraints_to_reverse):
    reversed_data = constraint.reverse_transform(reversed_data)

num_rows = len(reversed_data)
sampled_columns = list(reversed_data.columns)
missing_columns = [
    column
    for column in self.metadata.columns.keys() - set(sampled_columns + self._keys)
    if self._hyper_transformer.field_transformers.get(column)
]
if missing_columns and num_rows:
    anonymized_data = self._hyper_transformer.create_anonymized_columns(
        num_rows=num_rows,
        column_names=missing_columns
    )
    sampled_columns.extend(missing_columns)

if self._keys and num_rows:
    generated_keys = self.generate_keys(num_rows, reset_keys)
    sampled_columns.extend(self._keys)

The expected order is to reverse the constraints last, after the primary keys and anonymized (PII) columns have been created.

Note that the reverse transform should always run in the opposite order of the forward transform. The forward transform happens in this order: (constraints -> primary key -> PII columns), so the reverse should be (PII columns -> primary key -> constraints). See code
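The ordering problem can be reproduced with a small self-contained sketch. Nothing here is SDV code: `PrefixConstraint`, the row layout, and the two `reverse_*` functions are hypothetical. The helpers `generate_keys` and `create_anonymized_columns` borrow their names from the excerpt above but are toy stand-ins with made-up signatures. The change actually merged in #1511 may look different.

```python
class PrefixConstraint:
    """Hypothetical constraint: "code" must equal f"{user_id}-{suffix}"."""

    def reverse_transform(self, rows):
        # Needs the primary key; raises KeyError when "user_id" is absent.
        out = []
        for row in rows:
            row = dict(row)
            row["code"] = f"{row['user_id']}-{row.pop('suffix')}"
            out.append(row)
        return out


def generate_keys(rows):
    # Toy stand-in for primary key generation: sequential integers.
    return [dict(row, user_id=i) for i, row in enumerate(rows)]


def create_anonymized_columns(rows):
    # Toy stand-in for PII generation: a placeholder email per row.
    return [dict(row, email=f"user{i}@example.com") for i, row in enumerate(rows)]


def reverse_current(rows, constraints):
    # Order described in the issue: constraints first, then PII and keys.
    for constraint in reversed(constraints):
        rows = constraint.reverse_transform(rows)
    rows = create_anonymized_columns(rows)
    return generate_keys(rows)


def reverse_expected(rows, constraints):
    # Mirror of the forward order: PII columns, then keys, then constraints.
    rows = create_anonymized_columns(rows)
    rows = generate_keys(rows)
    for constraint in reversed(constraints):
        rows = constraint.reverse_transform(rows)
    return rows


sampled = [{"suffix": "a"}, {"suffix": "b"}]
constraints = [PrefixConstraint()]

try:
    reverse_current(sampled, constraints)
    current_ok = True
except KeyError:
    # The constraint ran before "user_id" existed.
    current_ok = False

restored = reverse_expected(sampled, constraints)
```

In this toy setup the constraint-first order fails with a `KeyError` because the key column does not exist yet, while the mirrored order lets the constraint read `user_id` and rebuild `code`.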

amontanez24 added this to the 1.2.2 milestone on Aug 1, 2023