# User Story A
**Privacy-Enhancing Data Generation**

This demo shows how to generate privacy-enhanced data with `PETsARD`.

In this scenario, you already have a data file on your local machine. `PETsARD` loads that file and generates a privacy-enhanced version of it.

Privacy-enhancing algorithms often restrict the input format and need specific pre-processing and post-processing steps to work correctly. `PETsARD` handles this for you: it provides both default and customizable pre-processing and post-processing workflows so that you can get started quickly.
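
To make the pre-processing and post-processing round trip concrete, here is a minimal sketch in plain Python. It does not use `PETsARD`'s processors; the helper names `fit_label_encoder`, `encode`, and `decode` are hypothetical and only illustrate the general idea that categorical values are mapped to numbers before synthesis and mapped back afterwards.

```python
# Illustrative only: a hand-rolled label encoder, not PETsARD's implementation.

def fit_label_encoder(values):
    """Map each distinct category to an integer code (sorted for determinism)."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def encode(values, mapping):
    """Pre-processing: turn categories into numeric codes a model can consume."""
    return [mapping[value] for value in values]


def decode(codes, mapping):
    """Post-processing: restore the original categories from numeric codes."""
    inverse = {code: category for category, code in mapping.items()}
    return [inverse[code] for code in codes]


workclass = ["Private", "State-gov", "Private", "Local-gov"]
mapping = fit_label_encoder(workclass)

encoded = encode(workclass, mapping)   # numeric form handed to a synthesizer
restored = decode(encoded, mapping)    # readable form returned to the user

print(encoded)
print(restored)
```

A real pipeline would typically also handle missing values, outliers, and scaling, and the synthesizer would emit new codes rather than echo the input, but the encode and decode steps bracket it in the same way.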

---

## Environment

In [1]:
import os
import pprint
import sys

import yaml


# Setting up the path to the PETsARD package
path_petsard = os.path.dirname(os.getcwd())
sys.path.append(path_petsard)
# Settings for pretty-printing the YAML configuration
# (depth=3 elides deeper nesting levels as {...} in the printed output)
pp = pprint.PrettyPrinter(depth=3, sort_dicts=False)


from petsard import Executor

---

## User Story A-1
**Default Synthesizing**

Given an original dataset and no specified algorithm, the pipeline generates a set of privacy-enhanced datasets using the default algorithms.

In [2]:
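# Path to the YAML configuration for this user story
config_file = '../yaml/User Story A-1.yaml'

# Load the configuration and print it for inspection
with open(config_file, 'r') as yaml_file:
    yaml_raw: dict = yaml.safe_load(yaml_file)
pp.pprint(yaml_raw)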

{'Loader': {'adult': {'filepath': '../benchmark/adult-income.csv',
                      'na_values': {...}}},
 'Preprocessor': {'demo': {'method': 'default'}},
 'Synthesizer': {'demo': {'method': 'default'}},
 'Postprocessor': {'demo': {'method': 'default'}},
 'Reporter': {'save_data': {'method': 'save_data',
                            'output': 'User Story A-1',
                            'source': 'Postprocessor'}}}


In [3]:
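# Build the pipeline from the configuration file and run it.
# Note: the name `exec` shadows Python's built-in function of the same name;
# it is kept here because the following cells refer to it.
exec = Executor(config=config_file)
exec.run()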

Now is Loader with adult...
Now is Preprocessor with demo...
[I 20241225 10:16:12] MediatorMissing is created.
[I 20241225 10:16:12] MediatorOutlier is created.
[I 20241225 10:16:12] MediatorEncoder is created.
[I 20241225 10:16:12] missing fitting done.
[I 20241225 10:16:12] <petsard.processor.mediator.MediatorMissing object at 0x328b88bb0> fitting done.
[I 20241225 10:16:12] outlier fitting done.
[I 20241225 10:16:12] <petsard.processor.mediator.MediatorOutlier object at 0x328b8ab30> fitting done.
[I 20241225 10:16:12] encoder fitting done.
[I 20241225 10:16:12] <petsard.processor.mediator.MediatorEncoder object at 0x328b8ab60> fitting done.
[I 20241225 10:16:12] scaler fitting done.
[I 20241225 10:16:12] missing transformation done.
[I 20241225 10:16:12] <petsard.processor.mediator.MediatorMissing object at 0x328b88bb0> transformation done.
[I 20241225 10:16:12] outlier transformation done.
[I 20241225 10:16:12] <petsard.processor.mediator.MediatorOutlier object at 0x328b8ab30> tran



[I 20241225 10:16:12] {'EVENT': 'Fit processed data', 'TIMESTAMP': datetime.datetime(2024, 12, 25, 10, 16, 12, 802322), 'SYNTHESIZER CLASS NAME': 'GaussianCopulaSynthesizer', 'SYNTHESIZER ID': 'GaussianCopulaSynthesizer_1.17.1_8fbf2fb0a0114021a3fade00955475dc', 'TOTAL NUMBER OF TABLES': 1, 'TOTAL NUMBER OF ROWS': 26933, 'TOTAL NUMBER OF COLUMNS': 15}
[I 20241225 10:16:12] Fitting GaussianMultivariate(distribution="{'age': <class 'copulas.univariate.beta.BetaUnivariate'>, 'workclass': <class 'copulas.univariate.beta.BetaUnivariate'>, 'fnlwgt': <class 'copulas.univariate.beta.BetaUnivariate'>, 'education': <class 'copulas.univariate.beta.BetaUnivariate'>, 'educational-num': <class 'copulas.univariate.beta.BetaUnivariate'>, 'marital-status': <class 'copulas.univariate.beta.BetaUnivariate'>, 'occupation': <class 'copulas.univariate.beta.BetaUnivariate'>, 'relationship': <class 'copulas.univariate.beta.BetaUnivariate'>, 'race': <class 'copulas.univariate.beta.BetaUnivariate'>, 'gender': <cl

In [4]:
pp.pprint(exec.get_result())

{'Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]_Reporter[save_data]': {'Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]':        age         workclass  fnlwgt     education  educational-num  \
0       57           Private  229368       HS-grad               15   
1       32           Private  174050       HS-grad               11   
2       45         State-gov  219287     Bachelors                6   
3       41           Private  194702  Some-college               10   
4       36         Local-gov   63387     Assoc-voc               12   
...    ...               ...     ...           ...              ...   
48837   37           Private   58509     Bachelors                8   
48838   42               nan  116218       HS-grad               11   
48839   22           Private  229743       HS-grad               10   
48840   54  Self-emp-not-inc   45656  Some-college               12   
48841   38               nan   81947    Assoc-acd
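
The keys in the printed result appear to be built by joining `Module[experiment]` for each module in the order they occur in the configuration. The following is a minimal sketch under that assumption; `result_key` is a hypothetical helper written for illustration, not a `PETsARD` function, and it only handles one experiment per module.

```python
# Assumption: result keys follow "Module[experiment]" joined by "_",
# as suggested by the printed output. This is not PETsARD's own code.

config = {
    "Loader": {"adult": {}},
    "Preprocessor": {"demo": {}},
    "Synthesizer": {"demo": {}},
    "Postprocessor": {"demo": {}},
    "Reporter": {"save_data": {}},
}


def result_key(cfg, stop_before=None):
    """Join "Module[experiment]" parts, optionally stopping before a module."""
    parts = []
    for module, experiments in cfg.items():
        if module == stop_before:
            break
        experiment_name = next(iter(experiments))  # first (only) experiment
        parts.append(f"{module}[{experiment_name}]")
    return "_".join(parts)


outer_key = result_key(config)                          # includes the Reporter
inner_key = result_key(config, stop_before="Reporter")  # names the dataset itself

print(outer_key)
print(inner_key)
```

Under this reading, the outer key identifies the full experiment including the `Reporter` step, and the inner key identifies the dataset produced by the steps up to `Postprocessor`.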

---

## User Story A-2
**Customized Synthesizing**

Given an original dataset together with a specified privacy-enhancing data generation algorithm and its parameters, the pipeline generates a privacy-enhanced dataset accordingly.

In [5]:
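# Path to the YAML configuration for this user story
config_file = '../yaml/User Story A-2.yaml'

# Load the configuration and print it for inspection
with open(config_file, 'r') as yaml_file:
    yaml_raw: dict = yaml.safe_load(yaml_file)
pp.pprint(yaml_raw)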

{'Loader': {'adult': {'filepath': '../benchmark/adult-income.csv',
                      'na_values': {...}}},
 'Preprocessor': {'demo': {'method': 'default'}},
 'Synthesizer': {'sdv-gaussian': {'method': 'sdv-single_table-gaussiancopula'}},
 'Postprocessor': {'demo': {'method': 'default'}},
 'Reporter': {'save_data': {'method': 'save_data',
                            'output': 'User Story A-1',
                            'source': 'Postprocessor'}}}
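
Comparing the printed configurations of A-1 and A-2 shows which module is actually customized. The sketch below copies the two printed dictionaries (leaving out the elided `na_values` entry, whose contents are not shown) and lists the modules that differ; `changed_modules` is a hypothetical helper for illustration.

```python
# Values are copied from the printed configurations of A-1 and A-2.
# The elided 'na_values' entry is deliberately left out.
common_loader = {"adult": {"filepath": "../benchmark/adult-income.csv"}}
common_reporter = {
    "save_data": {
        "method": "save_data",
        "output": "User Story A-1",
        "source": "Postprocessor",
    }
}

config_a1 = {
    "Loader": common_loader,
    "Preprocessor": {"demo": {"method": "default"}},
    "Synthesizer": {"demo": {"method": "default"}},
    "Postprocessor": {"demo": {"method": "default"}},
    "Reporter": common_reporter,
}

config_a2 = {
    "Loader": common_loader,
    "Preprocessor": {"demo": {"method": "default"}},
    "Synthesizer": {"sdv-gaussian": {"method": "sdv-single_table-gaussiancopula"}},
    "Postprocessor": {"demo": {"method": "default"}},
    "Reporter": common_reporter,
}


def changed_modules(left, right):
    """Return module names whose settings differ between two configs."""
    modules = list(left) + [name for name in right if name not in left]
    return [name for name in modules if left.get(name) != right.get(name)]


differences = changed_modules(config_a1, config_a2)
print(differences)
```

Only the `Synthesizer` block differs: A-2 names the experiment `sdv-gaussian` and sets the method explicitly, while every other module keeps the same settings as A-1.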


In [6]:
# Build the pipeline from the A-2 configuration file and run it
exec = Executor(config=config_file)
exec.run()

Now is Loader with adult...
Now is Preprocessor with demo...
[I 20241225 10:16:15] MediatorMissing is created.
[I 20241225 10:16:15] MediatorOutlier is created.
[I 20241225 10:16:15] MediatorEncoder is created.
[I 20241225 10:16:15] missing fitting done.
[I 20241225 10:16:15] <petsard.processor.mediator.MediatorMissing object at 0x328bc8670> fitting done.
[I 20241225 10:16:15] outlier fitting done.
[I 20241225 10:16:15] <petsard.processor.mediator.MediatorOutlier object at 0x328b89810> fitting done.
[I 20241225 10:16:15] encoder fitting done.
[I 20241225 10:16:15] <petsard.processor.mediator.MediatorEncoder object at 0x328b89420> fitting done.
[I 20241225 10:16:15] scaler fitting done.
[I 20241225 10:16:15] missing transformation done.
[I 20241225 10:16:15] <petsard.processor.mediator.MediatorMissing object at 0x328bc8670> transformation done.
[I 20241225 10:16:15] outlier transformation done.
[I 20241225 10:16:15] <petsard.processor.mediator.MediatorOutlier object at 0x328b89810> tran



[I 20241225 10:16:15] No rounding scheme detected for column 'income'. Data will not be rounded.
[I 20241225 10:16:15] {'EVENT': 'Fit processed data', 'TIMESTAMP': datetime.datetime(2024, 12, 25, 10, 16, 15, 934240), 'SYNTHESIZER CLASS NAME': 'GaussianCopulaSynthesizer', 'SYNTHESIZER ID': 'GaussianCopulaSynthesizer_1.17.1_bf7766ecdab34fc18306a96e8ff6bd84', 'TOTAL NUMBER OF TABLES': 1, 'TOTAL NUMBER OF ROWS': 26933, 'TOTAL NUMBER OF COLUMNS': 15}
[I 20241225 10:16:15] Fitting GaussianMultivariate(distribution="{'age': <class 'copulas.univariate.beta.BetaUnivariate'>, 'workclass': <class 'copulas.univariate.beta.BetaUnivariate'>, 'fnlwgt': <class 'copulas.univariate.beta.BetaUnivariate'>, 'education': <class 'copulas.univariate.beta.BetaUnivariate'>, 'educational-num': <class 'copulas.univariate.beta.BetaUnivariate'>, 'marital-status': <class 'copulas.univariate.beta.BetaUnivariate'>, 'occupation': <class 'copulas.univariate.beta.BetaUnivariate'>, 'relationship': <class 'copulas.univaria

In [7]:
pp.pprint(exec.get_result())

{'Loader[adult]_Preprocessor[demo]_Synthesizer[sdv-gaussian]_Postprocessor[demo]_Reporter[save_data]': {'Loader[adult]_Preprocessor[demo]_Synthesizer[sdv-gaussian]_Postprocessor[demo]':        age         workclass  fnlwgt     education  educational-num  \
0       49           Private   93877       HS-grad               15   
1       26           Private  120365       HS-grad               11   
2       34           Private   75685     Bachelors                7   
3       45           Private  162432       HS-grad               10   
4       27           Private  329521    Assoc-acdm               13   
...    ...               ...     ...           ...              ...   
48837   46  Self-emp-not-inc  260922  Some-college                7   
48838   52           Private  281446       HS-grad               10   
48839   32         State-gov  275351       HS-grad               10   
48840   49           Private  202259  Some-college               12   
48841   47           Private  288
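
The printed result is a dictionary nested two levels deep: the outer key names the full experiment, and the inner key names the dataset. A minimal sketch of flattening that shape follows. It uses a placeholder string where the real result holds a pandas DataFrame, and `flatten_results` is a hypothetical helper, not part of `PETsARD`.

```python
# Stand-in for the structure printed above; in the notebook the inner value
# is a pandas DataFrame rather than a placeholder string.
result = {
    "Loader[adult]_Preprocessor[demo]_Synthesizer[sdv-gaussian]"
    "_Postprocessor[demo]_Reporter[save_data]": {
        "Loader[adult]_Preprocessor[demo]_Synthesizer[sdv-gaussian]"
        "_Postprocessor[demo]": "<synthetic DataFrame placeholder>",
    },
}


def flatten_results(nested):
    """Collapse {experiment: {dataset_name: data}} into {dataset_name: data}."""
    flat = {}
    for datasets in nested.values():
        flat.update(datasets)
    return flat


datasets = flatten_results(result)
for name in datasets:
    print(name)
```

With the real result, each value in `datasets` could then be inspected directly, for example with `head()` on the DataFrame.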