Extra anonymized columns are created during reverse_transform_subset #545

Closed
npatki opened this issue Aug 30, 2022 · 1 comment · Fixed by #549

npatki commented Aug 30, 2022

Environment Details

  • RDT version: 1.2.0

Error Description

The reverse_transform_subset method is meant to reverse transform only the columns that are passed to it. However, it also creates additional columns for the AnonymizedFaker and RegexGenerator transformers in the config.

This is likely happening because these two transformers drop columns on the forward transform.
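
To illustrate one way this kind of behavior could arise, here is a toy sketch. It is not RDT's actual implementation: ToyTransformer, reverse_subset_buggy and reverse_subset_expected are hypothetical names made up for this example. It assumes the subset logic runs a transformer whenever all of its output columns are present in the input, which is trivially true for a transformer that outputs no columns.

```python
# Toy model only: hypothetical stand-ins, not RDT's real classes or logic.

class ToyTransformer:
    """Maps one original column to zero or more transformed columns."""

    def __init__(self, column, output_columns):
        self.column = column
        # Empty list models a transformer that drops its column on transform.
        self.output_columns = output_columns

    def reverse(self):
        # Pretend to rebuild the original column with a placeholder value.
        return {self.column: f"<{self.column}>"}


def reverse_subset_buggy(transformers, subset_columns):
    """Run every transformer whose output columns are all in the subset."""
    result = {}
    for transformer in transformers:
        # all() over an empty list is True, so transformers with no
        # output columns are always selected.
        if all(col in subset_columns for col in transformer.output_columns):
            result.update(transformer.reverse())
    return result


def reverse_subset_expected(transformers, subset_columns):
    """Same check, but skip transformers that have no output columns."""
    result = {}
    for transformer in transformers:
        outputs = transformer.output_columns
        if outputs and all(col in subset_columns for col in outputs):
            result.update(transformer.reverse())
    return result


transformers = [
    ToyTransformer('last_login', ['last_login.value']),
    ToyTransformer('credit_card', []),  # dropped on forward transform
    ToyTransformer('id', []),           # dropped on forward transform
]
subset = ['last_login.value']

buggy_columns = sorted(reverse_subset_buggy(transformers, subset))
expected_columns = sorted(reverse_subset_expected(transformers, subset))
print(buggy_columns)
print(expected_columns)
```

In this toy model, the first function recreates credit_card and id alongside last_login, while the second returns only last_login, which matches the behavior I expect from reverse_transform_subset.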

Steps to reproduce

```python
from rdt import get_demo
from rdt import HyperTransformer

from rdt.transformers.pii import AnonymizedFaker
from rdt.transformers.text import RegexGenerator

# create some data
customers = get_demo()
customers['id'] = ['ID_a', 'ID_b', 'ID_c', 'ID_d', 'ID_e']

# create a config
ht = HyperTransformer()
ht.detect_initial_config(customers)

# credit_card and id are pii and text columns
ht.update_sdtypes({
    'credit_card': 'pii',
    'id': 'text'
})

ht.update_transformers({
    'credit_card': AnonymizedFaker(),
    'id': RegexGenerator(regex_format='id_[a-z]')
})

# transform the data
# this will drop the credit card and id columns
ht.fit(customers)
transformed = ht.transform(customers)

# try to reverse transform only the login column
ht.reverse_transform_subset(transformed[['last_login.value']])
```

Output

The credit_card and id columns are recreated during reverse_transform_subset even though I only asked to reverse transform the last_login column.

I expect that only the last_login column will be created.

@npatki npatki added the bug Something isn't working label Aug 30, 2022
@npatki npatki added this to the 1.3.0 milestone Aug 30, 2022

npatki commented Aug 30, 2022

For more info about how to create only anonymized columns, see issue #546
