Contextual Anonymization transformers shouldn't be used for primary keys #1807

Closed
npatki opened this issue Feb 21, 2024 · 0 comments · Fixed by #1841


npatki commented Feb 21, 2024

Environment Details

  • SDV version: 1.10.0 (+ SDV Enterprise 0.10.0)
  • Python version: Any
  • Operating System: Any

Error Description

My metadata might list a semantic PII type as a primary key, for example an email.

{
  "primary_key": "user_email",
  "columns": {
    "user_email": { "sdtype": "email" },
    ...
  }
}

If I am an SDV Enterprise user and my primary key has a premium sdtype (email, phone_number, ...), then SDV assigns the primary key column to a contextual anonymization transformer such as DomainBasedAnonymizer or AnonymizedGeoExtractor. These transformers are not designed with uniqueness in mind, so some primary key values may repeat.
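One way to confirm the symptom is to count repeated values in the primary key column of the synthetic output. The snippet below is a minimal sketch in plain Python, not SDV code: the helper name `find_repeated_keys` and the sample rows are made up for illustration, and it assumes the synthetic data is available as a list of dictionaries.

```python
from collections import Counter


def find_repeated_keys(rows, primary_key):
    """Return a dict of primary key values that occur more than once.

    Hypothetical helper for illustration; not part of SDV.
    """
    counts = Counter(row[primary_key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


# Made-up synthetic rows where one email was generated twice.
synthetic_rows = [
    {"user_email": "a@example.com"},
    {"user_email": "b@example.com"},
    {"user_email": "a@example.com"},
]

repeated = find_repeated_keys(synthetic_rows, "user_email")
print(repeated)  # {'a@example.com': 2}
```

An empty result means every primary key value is unique; any entry points at a value the transformer produced more than once.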

Solution

As a quick solution, we should ensure that we do not assign Contextual Anonymization transformers to primary keys. Instead, we should fall back on AnonymizedFaker like we do for public SDV.
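The fallback rule could look roughly like the sketch below. This is not the SDV implementation: the function `choose_transformer`, the `CONTEXTUAL_TRANSFORMERS` table, and the pairing of each sdtype with a transformer are assumptions made for illustration, and transformers are represented by their names as strings so the example runs without SDV installed.

```python
# Assumed pairing of premium sdtypes to contextual transformers.
# The real assignment in SDV Enterprise may differ.
CONTEXTUAL_TRANSFORMERS = {
    "email": "DomainBasedAnonymizer",
    "phone_number": "AnonymizedGeoExtractor",
}


def choose_transformer(column_name, metadata):
    """Pick a transformer name for a premium-sdtype column.

    Hypothetical helper: returns None for sdtypes outside this sketch.
    """
    sdtype = metadata["columns"][column_name]["sdtype"]
    if sdtype not in CONTEXTUAL_TRANSFORMERS:
        return None  # other sdtypes are out of scope here

    # Primary keys must stay unique, so skip the contextual transformer
    # and fall back on AnonymizedFaker, as public SDV does.
    if metadata.get("primary_key") == column_name:
        return "AnonymizedFaker"

    return CONTEXTUAL_TRANSFORMERS[sdtype]


# Example metadata; "backup_email" is an invented non-key column.
metadata = {
    "primary_key": "user_email",
    "columns": {
        "user_email": {"sdtype": "email"},
        "backup_email": {"sdtype": "email"},
    },
}

print(choose_transformer("user_email", metadata))
print(choose_transformer("backup_email", metadata))
```

With this rule, only the primary key column loses the contextual behavior; other columns of the same sdtype keep their contextual transformer.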

(In the future, we can think about how Contextual Anonymization transformers can support uniqueness. We'd likely need to think through a different algorithm for each one.)

@npatki npatki added the bug Something isn't working label Feb 21, 2024
@amontanez24 amontanez24 added this to the 1.12.0 milestone Apr 11, 2024