Anonymization mappings leak #187

Closed
csala opened this issue Sep 4, 2020 · 0 comments · Fixed by #188
Comments

csala (Contributor) commented Sep 4, 2020

  • SDV version: 0.4.0

Description

In version 0.4.0, the dictionaries of value equivalences used for tabular data anonymization are stored within the Table metadata. This can lead to disclosure of the original values if the model is saved as a pickle file and shipped to the synthetic data recipients.
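The leak mechanism can be illustrated with a minimal, self-contained sketch (this is not SDV code; the `Table` class and `_anonymization_mappings` attribute here are stand-ins): pickle serializes every instance attribute, so a mapping kept on the object travels inside the saved file.

```python
import pickle

class Table:
    """Stand-in for a metadata object that keeps mappings on the instance."""
    def __init__(self):
        # Hypothetical real -> fake value mapping stored as an attribute
        self._anonymization_mappings = {
            'name': {'Dr. Tammy White': 'Philip Gould'}
        }

# Pickling the instance serializes the attribute along with it
saved = pickle.dumps(Table())
restored = pickle.loads(saved)

# The original (sensitive) values are recoverable from the shipped bytes
print(restored._anonymization_mappings['name'])
```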

To fix this, the anonymization mappings should be kept in a dictionary stored outside the Table instance, so that they are erased and lost once the Python process in which the training took place ends.
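One way the fix could look is sketched below (an assumed design, not the actual patch from #188): the mappings live in a module-level, process-local dictionary keyed by the instance's `id()`, so they are never part of the object's pickled state.

```python
import pickle

# Process-local store: lives only in the training process and is lost
# when that process ends; it is never serialized with the instance.
_MAPPINGS = {}

class Table:
    """Stand-in metadata class; mappings are held outside the instance."""

    def set_anonymization_mappings(self, mappings):
        _MAPPINGS[id(self)] = mappings

    def get_anonymization_mappings(self):
        return _MAPPINGS.get(id(self))

table = Table()
table.set_anonymization_mappings({'name': {'Dr. Tammy White': 'Philip Gould'}})

# Round-trip through pickle: the new object has a different id (and a
# recipient's process would have an empty _MAPPINGS anyway), so the
# sensitive mapping cannot be recovered from the saved file.
restored = pickle.loads(pickle.dumps(table))
print(restored.get_anonymization_mappings())  # None
```

A real implementation would likely use something more robust than `id()` as the key (e.g. a `weakref.WeakKeyDictionary`), since `id()` values can be reused after an object is garbage-collected.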

Steps to reproduce

In [1]: from sdv.demo import load_tabular_demo
In [2]: demo = load_tabular_demo()
In [3]: from sdv.tabular import GaussianCopula
In [4]: model = GaussianCopula(anonymize_fields={'name': 'name'})
In [5]: model.fit(demo)
In [6]: model.save('model.pkl')
In [7]: loaded_model = GaussianCopula.load('model.pkl')
In [8]: metadata = loaded_model.get_metadata()
In [9]: metadata._anonymization_mappings
Out[9]: 
{'name': {'Dr. Tammy White': 'Philip Gould',
  'Susan Brock DDS': 'Alexandra Long',
  'Dr. Mary Warren': 'Ronald Cox',
  'Kristine Garner': 'Wendy Sharp',
  'Eric Clark': 'Sharon Smith',
  'Ariel Peterson': 'Dr. Jerry Anderson',
  'Terry Vargas': 'Mary Payne',
  'Ethan Palmer': 'Ashley Carter',
  'Steven Evans': 'Kelsey Jimenez',
  'Jesse Freeman MD': 'Melanie Meyers',
  'Judith Garcia': 'Jessica Olsen',
  'Cindy Hendricks': 'William Martinez'}}
@csala csala self-assigned this Sep 4, 2020
@csala csala added this to the 0.4.1 milestone Sep 4, 2020
@csala csala closed this as completed in #188 Sep 4, 2020
JonathanDZiegler pushed a commit to JonathanDZiegler/SDV that referenced this issue Feb 7, 2022