Add option to anonymize data #71

Closed
ManuelAlvarezC opened this issue Nov 16, 2018 · 0 comments

Labels
feature request Request for a new feature

Sometimes transformers are used with sensitive data. In those cases we don't want reverse_transform to return the original values, nor do we want the transformer to keep them after extracting their distribution.

To achieve this, we should anonymize the data before creating CatTransformer.probability_map, so that each original value is mapped to a new value generated by faker while the original distribution is preserved.

To enable this option, rdt.transformers.CatTransformer.fit would accept two new arguments:

  • anonimize: bool
  • category: str

If the anonimize flag is set, the values should be mapped to values generated by faker before the probability_map is generated. The value of category should be the name of one of the attributes of the faker object. That list includes, but is not limited to:

  • first_name
  • last_name
  • name
  • ssn
  • phone_number
  • email
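
The mapping step described above could look roughly like the sketch below. It is only an illustration, not the actual RDT implementation: the helper name anonymize_values and its fake parameter are hypothetical, and it assumes the Faker package is installed when no generator is passed in. Each distinct original value is replaced by one distinct fake value, so the relative frequencies that probability_map would later be built from stay the same.

```python
def anonymize_values(values, category, fake=None):
    """Replace each distinct value with a distinct fake value.

    Hypothetical helper, not part of RDT. `category` is the name of a
    faker attribute such as "first_name" or "email".
    """
    if fake is None:
        # Third-party dependency, assumed to be installed (pip install Faker).
        from faker import Faker
        fake = Faker()

    generate = getattr(fake, category)  # e.g. fake.first_name

    mapping = {}
    used = set()
    # dict.fromkeys keeps the distinct values in first-seen order.
    for value in dict.fromkeys(values):
        candidate = generate()
        attempts = 0
        # Two originals must never share a fake value, otherwise their
        # frequencies would be merged and the distribution would change.
        while candidate in used:
            attempts += 1
            if attempts > 1000:
                raise ValueError(
                    "could not generate enough unique values for " + category
                )
            candidate = generate()
        used.add(candidate)
        mapping[value] = candidate

    # Only the anonymized column is returned; the mapping back to the
    # original values is discarded on purpose.
    return [mapping[value] for value in values]
```

For example, anonymize_values(column, "email") would return a list of the same length as column in which repeated originals share the same fake email. The transformer could then build its probability_map from that list instead of from the sensitive values.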