[Enterprise Usage] Unable to assign generic PII transformers (eg. AnonymizedFaker) #674

Closed
npatki opened this issue Jul 27, 2023 · 0 comments · Fixed by #676

npatki commented Jul 27, 2023

Enterprise users have access to additional RDTs that offer them features such as contextual anonymization.

Environment Details

  • SDV-Enterprise version: 0.4.0 (dev release)
    • SDV version: 1.2.1
    • RDT version: 1.6.0

Error Description

As an SDV Enterprise user, I may have access to enterprise sdtypes such as phone_number and email. A synthesizer will automatically assign the enterprise transformers to these columns.

However, for the strictest privacy protection, I may want the option to use pure anonymization instead. I should be able to do this by assigning one of the generic, publicly available PII transformers, such as AnonymizedFaker or PseudoAnonymizedFaker. But when I try to do this with update_transformers, an InvalidConfigError is raised.

Steps to reproduce

import pandas as pd
import numpy as np
from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer
from rdt.transformers.pii import AnonymizedFaker

data = pd.DataFrame(data={
    'id': [0, 1, 2],
    'age': [29, 45, 31],
    'phone_number': ['+1(617) 253-3400', '+1(617) 495-1000', np.nan]
})

metadata = SingleTableMetadata.load_from_dict({
    'primary_key': 'id',
    'columns': {
        'id': { 'sdtype': 'id' },
        'age': { 'sdtype': 'numerical' },
        'phone_number': { 'sdtype': 'phone_number' }
    }
})

synth = GaussianCopulaSynthesizer(metadata)
synth.auto_assign_transformers(data)
synth.update_transformers({
    'phone_number': AnonymizedFaker(
        provider_name='phone_number',
        function_name='phone_number')
})

This raises the following error:

InvalidConfigError: Column 'phone_number' is a phone_number column, which is incompatible with the 'AnonymizedFaker' transformer.

Fix

Change the transformer update logic in the HyperTransformer: if a column's sdtype is NOT one of 'numerical', 'datetime', 'categorical', 'boolean' or 'text', then allow the user to pass any transformer from the pii module.
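
A rough sketch of the proposed rule, for illustration only. It does not mirror RDT's actual internals: the helper `is_pii_override_allowed`, the constants, and the two stand-in classes are hypothetical names invented for this example, and the module path `rdt.transformers.pii.anonymizer` is an assumption about where the pii transformers are defined. Stand-in classes are used so the snippet runs without RDT installed.

```python
# Hypothetical sketch of the proposed compatibility rule; not RDT's real code.
GENERAL_SDTYPES = ('numerical', 'datetime', 'categorical', 'boolean', 'text')
PII_MODULE = 'rdt.transformers.pii'


def is_pii_override_allowed(column_sdtype, transformer):
    """Return True if `transformer` may be assigned under the proposed rule.

    Rule from the issue: when the column's sdtype is not one of the general
    sdtypes, any transformer that lives in the pii module is acceptable.
    """
    if column_sdtype in GENERAL_SDTYPES:
        return False
    module = type(transformer).__module__
    return module == PII_MODULE or module.startswith(PII_MODULE + '.')


class StandInAnonymizedFaker:
    """Stand-in for a pii transformer (assumed module path)."""
    __module__ = 'rdt.transformers.pii.anonymizer'


class StandInFloatFormatter:
    """Stand-in for a transformer that is not in the pii module."""
    __module__ = 'rdt.transformers.numerical'


# A pii transformer on an enterprise sdtype such as phone_number is accepted.
print(is_pii_override_allowed('phone_number', StandInAnonymizedFaker()))
# The rule does not apply to general sdtypes or to non-pii transformers.
print(is_pii_override_allowed('numerical', StandInAnonymizedFaker()))
print(is_pii_override_allowed('phone_number', StandInFloatFormatter()))
```

Matching on the module prefix rather than on a fixed list of class names means the check would also cover PseudoAnonymizedFaker and any pii transformer added later, which is what "any transformer in the pii module" asks for.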

npatki added the bug and new labels, then removed the new label, on Jul 27, 2023
amontanez24 added this to the 1.6.1 milestone on Jul 28, 2023
amontanez24 self-assigned this on Aug 2, 2023