# User Story B
**Privacy Enhancing Data Generation and Evaluation**

This demo shows how to generate and evaluate privacy-enhanced data using `PETsARD`.

In this demonstration, you, as the user, already have a data file stored locally. `PETsARD` will load that file, generate a privacy-enhanced version of it, and then evaluate the result.

---

## Environment

In [1]:
import os
import pprint
import sys

import yaml


# Setting up the path to the PETsARD package
path_petsard = os.path.dirname(os.getcwd())
print(path_petsard)
sys.path.append(path_petsard)
# Settings for pretty-printing the YAML configuration
pp = pprint.PrettyPrinter(depth=3, sort_dicts=False)


from PETsARD import Executor

/Users/justyn.chen/Dropbox/310_Career_工作/20231016_NICS_資安院/41_PETsARD/PETsARD


---

## User Story B-1
**Default Evaluation**

Following User Story A, if users enable the "evaluate" step, the evaluation module creates a report covering the default privacy risk and utility metrics.

In [2]:
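# Path to the YAML configuration for this user story
config_file = '../yaml/User Story B-1.yaml'

# Load the raw configuration and pretty-print it for inspection
with open(config_file, 'r') as yaml_file:
    yaml_raw: dict = yaml.safe_load(yaml_file)
pp.pprint(yaml_raw)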

{'Loader': {'adult': {'filepath': 'benchmark/adult-income.csv',
                      'na_values': {...}}},
 'Preprocessor': {'demo': {'method': 'default'}},
 'Synthesizer': {'demo': {'method': 'default'}},
 'Postprocessor': {'demo': {'method': 'default'}},
 'Evaluator': {'demo': {'method': 'default'}},
 'Reporter': {'save_data': {'method': 'save_data',
                            'output': 'User Story B-1',
                            'source': 'Postprocessor'},
              'save_report_global': {'method': 'save_report',
                                     'output': 'User Story B-1',
                                     'eval': 'demo',
                                     'granularity': 'global'}}}
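
The printed configuration is a nested dictionary: each top-level key names a module, each second-level key appears to name an experiment within that module, and the innermost dictionary holds its settings. The sketch below walks that structure to list what will run. The dictionary is copied from the output above, with the elided `na_values` entry left out, and `list_experiments` is a hypothetical helper for illustration, not part of `PETsARD`.

```python
# Configuration copied from the printed output above.
# The elided 'na_values' entry is intentionally omitted.
config = {
    'Loader': {'adult': {'filepath': 'benchmark/adult-income.csv'}},
    'Preprocessor': {'demo': {'method': 'default'}},
    'Synthesizer': {'demo': {'method': 'default'}},
    'Postprocessor': {'demo': {'method': 'default'}},
    'Evaluator': {'demo': {'method': 'default'}},
    'Reporter': {
        'save_data': {'method': 'save_data',
                      'output': 'User Story B-1',
                      'source': 'Postprocessor'},
        'save_report_global': {'method': 'save_report',
                               'output': 'User Story B-1',
                               'eval': 'demo',
                               'granularity': 'global'},
    },
}


def list_experiments(config: dict) -> list:
    """Hypothetical helper: flatten a config into (module, experiment, method) rows."""
    rows = []
    for module, experiments in config.items():
        for name, settings in experiments.items():
            # 'method' is absent for the Loader entry, so fall back to None.
            rows.append((module, name, settings.get('method')))
    return rows


for module, name, method in list_experiments(config):
    print(f'{module:<14}{name:<20}{method}')
```

Listing the rows this way makes it easy to see that the `Reporter` module holds two experiments here, one saving the synthetic data and one saving the global evaluation report.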


In [3]:
exec = Executor(config=config_file)
exec.run()

Now is Loader with adult...
Now is Preprocessor with demo...
[I 20241108 10:30:38] MediatorMissing is created.
[I 20241108 10:30:38] MediatorOutlier is created.
[I 20241108 10:30:38] MediatorEncoder is created.
[I 20241108 10:30:38] missing fitting done.
[I 20241108 10:30:38] <PETsARD.processor.mediator.MediatorMissing object at 0x323c7ea10> fitting done.
[I 20241108 10:30:38] outlier fitting done.
[I 20241108 10:30:38] <PETsARD.processor.mediator.MediatorOutlier object at 0x3255047f0> fitting done.
[I 20241108 10:30:38] encoder fitting done.
[I 20241108 10:30:38] <PETsARD.processor.mediator.MediatorEncoder object at 0x325504820> fitting done.
[I 20241108 10:30:38] scaler fitting done.
[I 20241108 10:30:38] missing transformation done.
[I 20241108 10:30:38] <PETsARD.processor.mediator.MediatorMissing object at 0x323c7ea10> transformation done.
[I 20241108 10:30:38] outlier transformation done.
[I 20241108 10:30:38] <PETsARD.processor.mediator.MediatorOutlier object at 0x3255047f0> tran



[I 20241108 10:30:39] {'EVENT': 'Fit processed data', 'TIMESTAMP': datetime.datetime(2024, 11, 8, 10, 30, 39, 287299), 'SYNTHESIZER CLASS NAME': 'GaussianCopulaSynthesizer', 'SYNTHESIZER ID': 'GaussianCopulaSynthesizer_1.17.1_426eaa16f120441093dd06129c8247d5', 'TOTAL NUMBER OF TABLES': 1, 'TOTAL NUMBER OF ROWS': 26933, 'TOTAL NUMBER OF COLUMNS': 15}
[I 20241108 10:30:39] Fitting GaussianMultivariate(distribution="{'age': <class 'copulas.univariate.beta.BetaUnivariate'>, 'workclass': <class 'copulas.univariate.beta.BetaUnivariate'>, 'fnlwgt': <class 'copulas.univariate.beta.BetaUnivariate'>, 'education': <class 'copulas.univariate.beta.BetaUnivariate'>, 'educational-num': <class 'copulas.univariate.beta.BetaUnivariate'>, 'marital-status': <class 'copulas.univariate.beta.BetaUnivariate'>, 'occupation': <class 'copulas.univariate.beta.BetaUnivariate'>, 'relationship': <class 'copulas.univariate.beta.BetaUnivariate'>, 'race': <class 'copulas.univariate.beta.BetaUnivariate'>, 'gender': <cla

  contingency_real = real.groupby(list(columns), dropna=False).size() / len(real)
  contingency_synthetic = synthetic.groupby(list(columns), dropna=False).size() / len(

(2/2) Evaluating Column Pair Trends: |██████████| 105/105 [00:00<00:00, 283.07it/s]|
Column Pair Trends Score: 60.43%

Overall Score (Average): 77.46%

Now is Reporter with save_data...
Now is User Story B-1_Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo] save to csv...


Now is Reporter with save_report_global...
Now is User Story B-1[Report]_demo_[global] save to csv...
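
The progress bar above reports 105 column pairs, which is consistent with the 15 columns logged by the synthesizer: choosing 2 of 15 columns gives 15 × 14 / 2 = 105 unordered pairs. A quick check, using placeholder column labels rather than the real column names:

```python
from itertools import combinations
from math import comb

n_columns = 15  # 'TOTAL NUMBER OF COLUMNS' in the synthesizer log
columns = [f'col_{i}' for i in range(n_columns)]  # placeholder labels, not real names

# Enumerate every unordered pair of distinct columns.
pairs = list(combinations(columns, 2))
print(len(pairs))          # number of unordered column pairs
print(comb(n_columns, 2))  # same count from the binomial coefficient
```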


In [4]:
pp.pprint(exec.get_result())

{'Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]_Evaluator[demo]_Reporter[save_data]': {'Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]':        age         workclass  fnlwgt     education  educational-num  \
0       57           Private  228611  Some-college               15   
1       37         Local-gov  186314       HS-grad               10   
2       37         State-gov  225306       HS-grad                7   
3       38           Private  186673       HS-grad               10   
4       42           Private   76036          10th               12   
...    ...               ...     ...           ...              ...   
48837   28           Private   52763       HS-grad                8   
48838   33           Private  149830       HS-grad               11   
48839   47           Private  206575          10th                9   
48840   49  Self-emp-not-inc   48361       HS-grad               13   
48841   22           Private   83
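
The result is a nested dictionary whose outer keys are full experiment names built from `Module[experiment]` segments joined by underscores. The sketch below splits such a key back into its parts. `parse_experiment_name` is a hypothetical helper, and the pattern assumes experiment names contain no square brackets.

```python
import re

# Matches 'Module[experiment]' segments; assumes no ']' inside experiment names.
SEGMENT = re.compile(r'([A-Za-z]+)\[([^\]]+)\]')


def parse_experiment_name(full_name: str) -> dict:
    """Hypothetical helper: map each module name to its experiment name."""
    return dict(SEGMENT.findall(full_name))


# Outer key copied from the result printed above.
key = ('Loader[adult]_Preprocessor[demo]_Synthesizer[demo]'
       '_Postprocessor[demo]_Evaluator[demo]_Reporter[save_data]')

steps = parse_experiment_name(key)
print(steps['Loader'])
print(steps['Reporter'])
```

Parsing the key this way can help when filtering a large result dictionary for, say, every entry produced by one particular `Reporter` experiment.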

---

## User Story B-2
**Customized Evaluation**

Following User Story B-1, if specific types of metrics are set or a customized evaluation script is provided, the module creates a customized evaluation report.
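
The configuration loaded below requests reports at global, columnwise, and pairwise granularity from a custom evaluator. The interface that `PETsARD` expects from a custom evaluation script is not shown in this notebook, so the class below is only a hypothetical sketch of the idea: one object that can return a score for the whole dataset, for each column, and for each column pair. All names (`ToyEvaluator`, `global_score`, and so on), the scoring rule, and the toy data are assumptions for illustration.

```python
from itertools import combinations


class ToyEvaluator:
    """Hypothetical evaluator; NOT the interface PETsARD requires."""

    def __init__(self, original: dict, synthetic: dict):
        # Each dataset is a mapping of column name -> list of values.
        self.original = original
        self.synthetic = synthetic

    def _column_score(self, column: str) -> float:
        # Percentage of synthetic values that also occur in the original column.
        seen = set(self.original[column])
        values = self.synthetic[column]
        return 100 * sum(v in seen for v in values) / len(values)

    def columnwise_scores(self) -> dict:
        # One score per column.
        return {c: self._column_score(c) for c in self.original}

    def pairwise_scores(self) -> dict:
        # Toy rule: a pair's score is the lower of its two column scores.
        per_column = self.columnwise_scores()
        return {(a, b): min(per_column[a], per_column[b])
                for a, b in combinations(per_column, 2)}

    def global_score(self) -> float:
        # Single score for the whole dataset: mean of the column scores.
        scores = self.columnwise_scores()
        return sum(scores.values()) / len(scores)


# Toy data for illustration only; not taken from the benchmark file.
original = {'age': [25, 37, 42], 'gender': ['Male', 'Female', 'Male']}
synthetic = {'age': [37, 42, 60], 'gender': ['Female', 'Male', 'Male']}

evaluator = ToyEvaluator(original, synthetic)
print(evaluator.columnwise_scores())
print(evaluator.pairwise_scores())
print(evaluator.global_score())
```

Keeping the three granularities as separate methods mirrors how the reports in this story are requested separately, one `Reporter` entry per granularity.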

In [5]:
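# Path to the YAML configuration for this user story
config_file = '../yaml/User Story B-2.yaml'

# Load the raw configuration and pretty-print it for inspection
with open(config_file, 'r') as yaml_file:
    yaml_raw: dict = yaml.safe_load(yaml_file)
pp.pprint(yaml_raw)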

{'Loader': {'adult': {'filepath': 'benchmark/adult-income.csv',
                      'na_values': {...}}},
 'Preprocessor': {'demo': {'method': 'default'}},
 'Synthesizer': {'demo': {'method': 'default'}},
 'Postprocessor': {'demo': {'method': 'default'}},
 'Evaluator': {'custom': {'method': 'custom_method', 'custom_method': {...}}},
 'Reporter': {'save_report_global': {'method': 'save_report',
                                     'output': 'User Story B-2',
                                     'eval': 'custom',
                                     'granularity': 'global'},
              'save_report_columnwise': {'method': 'save_report',
                                         'output': 'User Story B-2',
                                         'eval': 'custom',
                                         'granularity': 'columnwise'},
              'save_report_pairwise': {'method': 'save_report',
                                       'output': 'User Story B-2',
                      

In [6]:
exec = Executor(config=config_file)
exec.run()

Now is Loader with adult...
Now is Preprocessor with demo...
[I 20241108 10:30:42] MediatorMissing is created.
[I 20241108 10:30:42] MediatorOutlier is created.
[I 20241108 10:30:42] MediatorEncoder is created.
[I 20241108 10:30:42] missing fitting done.
[I 20241108 10:30:42] <PETsARD.processor.mediator.MediatorMissing object at 0x32553f8b0> fitting done.
[I 20241108 10:30:42] outlier fitting done.
[I 20241108 10:30:42] <PETsARD.processor.mediator.MediatorOutlier object at 0x32553ee60> fitting done.
[I 20241108 10:30:42] encoder fitting done.
[I 20241108 10:30:42] <PETsARD.processor.mediator.MediatorEncoder object at 0x32553ef50> fitting done.
[I 20241108 10:30:42] scaler fitting done.
[I 20241108 10:30:42] missing transformation done.
[I 20241108 10:30:42] <PETsARD.processor.mediator.MediatorMissing object at 0x32553f8b0> transformation done.
[I 20241108 10:30:42] outlier transformation done.
[I 20241108 10:30:42] <PETsARD.processor.mediator.MediatorOutlier object at 0x32553ee60> tran



[I 20241108 10:30:42] No rounding scheme detected for column 'income'. Data will not be rounded.
[I 20241108 10:30:42] {'EVENT': 'Fit processed data', 'TIMESTAMP': datetime.datetime(2024, 11, 8, 10, 30, 42, 825036), 'SYNTHESIZER CLASS NAME': 'GaussianCopulaSynthesizer', 'SYNTHESIZER ID': 'GaussianCopulaSynthesizer_1.17.1_6e56dd12425b4f42983c7b24dd8e2b99', 'TOTAL NUMBER OF TABLES': 1, 'TOTAL NUMBER OF ROWS': 26933, 'TOTAL NUMBER OF COLUMNS': 15}
[I 20241108 10:30:42] Fitting GaussianMultivariate(distribution="{'age': <class 'copulas.univariate.beta.BetaUnivariate'>, 'workclass': <class 'copulas.univariate.beta.BetaUnivariate'>, 'fnlwgt': <class 'copulas.univariate.beta.BetaUnivariate'>, 'education': <class 'copulas.univariate.beta.BetaUnivariate'>, 'educational-num': <class 'copulas.univariate.beta.BetaUnivariate'>, 'marital-status': <class 'copulas.univariate.beta.BetaUnivariate'>, 'occupation': <class 'copulas.univariate.beta.BetaUnivariate'>, 'relationship': <class 'copulas.univariat

In [7]:
pp.pprint(exec.get_result())

{'Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]_Evaluator[custom]_Reporter[save_report_global]': {'custom_[global]':                                            full_expt_name Loader Preprocessor  \
result  Loader[adult]_Preprocessor[demo]_Synthesizer[d...  adult         demo   

       Synthesizer Postprocessor        Evaluator  custom_score  
result        demo          demo  custom_[global]           100  },
 'Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]_Evaluator[custom]_Reporter[save_report_columnwise]': {'custom_[columnwise]':                                        full_expt_name Loader Preprocessor  \
0   Loader[adult]_Preprocessor[demo]_Synthesizer[d...  adult         demo   
1   Loader[adult]_Preprocessor[demo]_Synthesizer[d...  adult         demo   
2   Loader[adult]_Preprocessor[demo]_Synthesizer[d...  adult         demo   
3   Loader[adult]_Preprocessor[demo]_Synthesizer[d...  adult         demo   
4   Loader[adult]_Preproce