Update transformer assignment logic for handling pii #1775

Closed
R-Palazzo opened this issue Feb 6, 2024 · 0 comments · Fixed by #1782
Problem Description

In the metadata, a user can set a boolean pii field indicating whether a column contains PII. The current logic assigns AnonymizedFaker whenever this field is True.

Expected behavior

Now that some pii sdtypes (email, phone number, ...) have premium transformers, those transformers should be assigned regardless of the pii field.
The new logic for pii sdtypes:

  • If a user has access to the premium transformer for this sdtype, assign it.
  • If a user doesn't have access to a premium transformer, then
    • If pii=True, assign AnonymizedFaker
    • If pii=False, assign the default categorical transformer, currently UniformEncoder.
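
The decision order above can be sketched as a small standalone function. This is only an illustration of the proposed rules, not SDV's actual implementation: `choose_transformer`, the `premium` mapping, and the name `PremiumEmailTransformer` are hypothetical, and transformers are represented by plain strings rather than real transformer objects.

```python
from typing import Mapping

# Names taken from the issue text; used here only as string labels.
FALLBACK_PII = "AnonymizedFaker"
DEFAULT_CATEGORICAL = "UniformEncoder"


def choose_transformer(sdtype: str, pii: bool, premium: Mapping[str, str]) -> str:
    """Pick a transformer name for a column with a pii sdtype.

    `premium` is a hypothetical mapping from sdtype to the premium
    transformer the current user has access to (empty if none).
    """
    premium_name = premium.get(sdtype)
    if premium_name is not None:
        # A premium transformer wins regardless of the pii flag.
        return premium_name
    # No premium transformer available: fall back on the pii flag.
    return FALLBACK_PII if pii else DEFAULT_CATEGORICAL


# Hypothetical setups: one user without premium access, one with it.
no_premium = {}
with_premium = {"email": "PremiumEmailTransformer"}

print(choose_transformer("email", True, no_premium))
print(choose_transformer("email", False, no_premium))
print(choose_transformer("email", False, with_premium))
```

Checking the premium mapping first keeps the pii flag from overriding a premium transformer, which is the behavior change this issue asks for.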

Additional context

After the changes, the code below should assign the premium transformer for premium users and AnonymizedFaker otherwise.

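import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer

# Sample data with one pii column (email) and one numerical column
data = pd.DataFrame({
    'email': ['sdv@sdv.dev', 'info@datacebo.com', 'info@gmail.co.uk', None],
    'numerical': [0, 1, 2, 3],
})

# Mark the email column as pii in the metadata
metadata = SingleTableMetadata.load_from_dict({
    'columns': {
        'email': {'sdtype': 'email', 'pii': True},
        'numerical': {'sdtype': 'numerical'},
    }
})

# Run
synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.fit(data)

# Inspect which transformer was assigned to each column
print(synthesizer.get_transformers())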
@R-Palazzo added the "feature request" label and removed the "new" label on Feb 6, 2024
@amontanez24 added this to the 1.10.0 milestone on Feb 14, 2024