# User Story B
**Privacy Enhancing Data Generation and Evaluation**

This demo shows how to generate and evaluate privacy-enhanced data with `PETsARD`.

In this demonstration you already have a data file on your local machine. `PETsARD` loads that file, generates a privacy-enhanced version of it, and then evaluates the result.

---

## Environment

In [1]:
import os
import pprint
import sys

import yaml


# Setting up the path to the PETsARD package
path_petsard = os.path.dirname(os.getcwd())
print(path_petsard)
sys.path.append(path_petsard)
# Settings for pretty-printing the YAML
pp = pprint.PrettyPrinter(depth=3, sort_dicts=False)


from PETsARD import Executor

d:\Dropbox\89_other_application\GitHub\PETsARD


---

## User Story B-1
**Default Evaluation**

Following User Story A, if the user enables the "evaluate" step, the evaluation module produces a report covering the default privacy risk and utility metrics.

In [2]:
config_file = '../yaml/User Story B-1.yaml'

with open(config_file, 'r') as yaml_file:
    yaml_raw: dict = yaml.safe_load(yaml_file)
pp.pprint(yaml_raw)

{'Loader': {'adult': {'filepath': 'benchmark/adult-income.csv',
                      'na_values': {...}}},
 'Preprocessor': {'demo': {'method': 'default'}},
 'Synthesizer': {'demo': {'method': 'default'}},
 'Postprocessor': {'demo': {'method': 'default'}},
 'Evaluator': {'demo': {'method': 'default'}},
 'Reporter': {'save_data': {'method': 'save_data',
                            'output': 'User Story B-1',
                            'source': 'Postprocessor'},
              'save_report_global': {'method': 'save_report',
                                     'output': 'User Story B-1',
                                     'eval': 'demo',
                                     'granularity': 'global'}}}
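
The printed configuration is a nested mapping: each top-level key names a module, each second-level key names an experiment for that module, and the innermost mapping holds that experiment's parameters. As a rough illustration, the sketch below rebuilds the visible part of the mapping by hand and lists the experiments in order. The `na_values` entry is elided in the output above and is left out here; `list_experiments` is a hypothetical helper, not part of `PETsARD`.

```python
# Visible part of the B-1 configuration, copied from the printed output.
# The elided `na_values` mapping is intentionally omitted.
config = {
    "Loader": {"adult": {"filepath": "benchmark/adult-income.csv"}},
    "Preprocessor": {"demo": {"method": "default"}},
    "Synthesizer": {"demo": {"method": "default"}},
    "Postprocessor": {"demo": {"method": "default"}},
    "Evaluator": {"demo": {"method": "default"}},
    "Reporter": {
        "save_data": {
            "method": "save_data",
            "output": "User Story B-1",
            "source": "Postprocessor",
        },
        "save_report_global": {
            "method": "save_report",
            "output": "User Story B-1",
            "eval": "demo",
            "granularity": "global",
        },
    },
}


def list_experiments(cfg: dict) -> list[tuple[str, str]]:
    """Hypothetical helper: flatten the mapping into (module, experiment) pairs."""
    return [(module, name) for module, experiments in cfg.items() for name in experiments]


for module, name in list_experiments(config):
    print(f"{module}: {name}")
```

Each pair printed here corresponds to one `Now is <module> with <experiment>...` line in the execution log.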


In [3]:
exec = Executor(config=config_file)
exec.run()

Now is Loader with adult...
Now is Preprocessor with demo...
Now is Synthesizer with demo...
Synthesizer (SDV - SingleTable): Metafile loading time: 0.0342 sec.
Synthesizer (SDV - SingleTable): Fitting GaussianCopula.
Synthesizer (SDV - SingleTable): Fitting  GaussianCopula spent 8.9661 sec.
Synthesizer (SDV - SingleTable): Sampling GaussianCopula # 26933 rows (same as raw) in 2.3417 sec.
Now is Postprocessor with demo...
Now is Evaluator with demo...
Generating report ...
(1/2) Evaluating Column Shapes: : 100%|██████████| 15/15 [00:00<00:00, 75.45it/s]
(2/2) Evaluating Column Pair Trends: : 100%|██████████| 105/105 [00:05<00:00, 18.90it/s]

Overall Score: 77.91%

Properties:
- Column Shapes: 94.17%
- Column Pair Trends: 61.65%
Now is Reporter with save_data...
Now is User Story B-1_Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo] save to csv...
Now is Reporter with save_report_global...
Now is User Story B-1[Report]_demo_[global] save to csv...
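
The scores in this log are consistent with the overall score being the plain average of the two property scores, which is how the SDMetrics quality report defines it. That the default evaluator wraps that report is an inference from the log format, not something this notebook states. The progress counters line up as well: 15 columns give 15 column shapes and C(15, 2) = 105 column pairs. A quick check under those assumptions:

```python
from math import comb

# Property scores as printed in the log above (percent).
column_shapes = 94.17
column_pair_trends = 61.65

# Assumption: the overall score is the unweighted mean of the properties.
overall = (column_shapes + column_pair_trends) / 2
print(f"Overall Score: {overall:.2f}%")

# 15 columns were evaluated, so the number of unordered column pairs is C(15, 2).
n_columns = 15
n_pairs = comb(n_columns, 2)
print(f"Column pairs: {n_pairs}")
```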


In [4]:
pp.pprint(exec.get_result())

{'Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]_Evaluator[demo]_Reporter[save_data]': {'Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]':              age     workclass         fnlwgt   education  educational-num  \
0      41.492932  Self-emp-inc  178964.002280        10th         8.948864   
1      46.614884           NaN  229992.412898   Bachelors         7.279040   
2      38.168388       Private  236256.520834     HS-grad        11.865700   
3      31.367659       Private  151499.780972     HS-grad         8.067272   
4      31.586065           NaN  122882.387591  Assoc-acdm        13.751945   
...          ...           ...            ...         ...              ...   
26928  31.265317       Private  167305.279996   Assoc-voc        12.833674   
26929  36.116559       Private  111753.192300     HS-grad        11.743427   
26930  57.438484       Private   15476.459230     HS-grad         8.800826   
26931  44.738893           NaN   8
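
The keys of the result dictionary concatenate `Module[experiment]` segments with underscores. When picking results out programmatically, it can help to split a key back into its parts. The sketch below does that with a regular expression; `parse_expt_name` is a hypothetical helper, and it assumes module names are purely alphabetic, as they are in this output.

```python
import re

# Matches one "Module[experiment]" segment; module names are assumed alphabetic.
SEGMENT = re.compile(r"([A-Za-z]+)\[([^\]]+)\]")


def parse_expt_name(key: str) -> list[tuple[str, str]]:
    """Hypothetical helper: split a result key into (module, experiment) pairs."""
    return SEGMENT.findall(key)


key = (
    "Loader[adult]_Preprocessor[demo]_Synthesizer[demo]"
    "_Postprocessor[demo]_Evaluator[demo]_Reporter[save_data]"
)
for module, experiment in parse_expt_name(key):
    print(f"{module:<14}{experiment}")
```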

---

## User Story B-2
**Customized Evaluation**

Following User Story B-1, if specific metrics are configured or a customized evaluation script is provided, the module produces a customized evaluation report.
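
To make the idea of a customized evaluation script concrete, here is a minimal sketch of an evaluator that produces scores at the three granularities this story reports on (global, columnwise, pairwise). The interface `PETsARD` actually expects from a custom script is not shown in this notebook, so the class name, the method names, and the dict-of-lists data layout are all hypothetical. The constant score of 100 simply mirrors the placeholder values in this story's output.

```python
from itertools import combinations


class ConstantScoreEvaluator:
    """Hypothetical custom evaluator; not PETsARD's actual interface."""

    def __init__(self, score: int = 100):
        self.score = score
        self.columns: list[str] = []

    def create(self, data: dict[str, list]) -> None:
        # A real evaluator would compare original and synthetic data here;
        # this sketch only records the column names.
        self.columns = list(data)

    def get_global(self) -> dict:
        return {"score": self.score}

    def get_columnwise(self) -> list[dict]:
        return [{"column": c, "score": self.score} for c in self.columns]

    def get_pairwise(self) -> list[dict]:
        return [
            {"column1": a, "column2": b, "score": self.score}
            for a, b in combinations(self.columns, 2)
        ]


# Toy data (made-up values) using a few column names from the adult-income file.
synthetic = {"age": [30, 40], "workclass": ["Private", None], "fnlwgt": [1.0, 2.0]}

evaluator = ConstantScoreEvaluator()
evaluator.create(synthetic)
print(evaluator.get_global())
print(evaluator.get_columnwise())
print(evaluator.get_pairwise())
```

Returning one row per column and one row per column pair keeps each granularity in a shape that can be written out as its own table.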

In [5]:
config_file = '../yaml/User Story B-2.yaml'

with open(config_file, 'r') as yaml_file:
    yaml_raw: dict = yaml.safe_load(yaml_file)
pp.pprint(yaml_raw)

{'Loader': {'adult': {'filepath': 'benchmark/adult-income.csv',
                      'na_values': {...}}},
 'Preprocessor': {'demo': {'method': 'default'}},
 'Synthesizer': {'demo': {'method': 'default'}},
 'Postprocessor': {'demo': {'method': 'default'}},
 'Evaluator': {'custom': {'method': 'custom_method', 'custom_method': {...}}},
 'Reporter': {'save_report_global': {'method': 'save_report',
                                     'output': 'User Story B-2',
                                     'eval': 'custom',
                                     'granularity': 'global'},
              'save_report_columnwise': {'method': 'save_report',
                                         'output': 'User Story B-2',
                                         'eval': 'custom',
                                         'granularity': 'columnwise'},
              'save_report_pairwise': {'method': 'save_report',
                                       'output': 'User Story B-2',
                      

In [6]:
exec = Executor(config=config_file)
exec.run()

Now is Loader with adult...
Now is Preprocessor with demo...
Now is Synthesizer with demo...
Synthesizer (SDV - SingleTable): Metafile loading time: 0.0312 sec.
Synthesizer (SDV - SingleTable): Fitting GaussianCopula.
Synthesizer (SDV - SingleTable): Fitting  GaussianCopula spent 9.7033 sec.
Synthesizer (SDV - SingleTable): Sampling GaussianCopula # 26933 rows (same as raw) in 1.8968 sec.
Now is Postprocessor with demo...
Now is Evaluator with custom...
Now is Reporter with save_report_global...
Now is User Story B-2[Report]_custom_[global] save to csv...
Now is Reporter with save_report_columnwise...
Now is User Story B-2[Report]_custom_[columnwise] save to csv...
Now is Reporter with save_report_pairwise...
Now is User Story B-2[Report]_custom_[pairwise] save to csv...
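
The three `save to csv` lines follow one pattern: the `output` prefix, the literal `[Report]`, the `eval` experiment name, and the granularity in brackets. As a sketch, the names can be rebuilt from the Reporter settings. The `.csv` extension is an assumption based on the log wording, and `report_name` is a hypothetical helper rather than a `PETsARD` function.

```python
def report_name(output: str, eval_name: str, granularity: str) -> str:
    """Hypothetical helper: rebuild the report name pattern seen in the log."""
    return f"{output}[Report]_{eval_name}_[{granularity}]"


for granularity in ("global", "columnwise", "pairwise"):
    # Assumption: the file on disk carries a .csv extension.
    print(report_name("User Story B-2", "custom", granularity) + ".csv")
```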


In [7]:
pp.pprint(exec.get_result())

{'Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]_Evaluator[custom]_Reporter[save_report_global]': {'custom_[global]':                                            full_expt_name  score
result  Loader[adult]_Preprocessor[demo]_Synthesizer[d...    100},
 'Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]_Evaluator[custom]_Reporter[save_report_columnwise]': {'custom_[columnwise]':                                        full_expt_name           column  score
0   Loader[adult]_Preprocessor[demo]_Synthesizer[d...              age    100
1   Loader[adult]_Preprocessor[demo]_Synthesizer[d...        workclass    100
2   Loader[adult]_Preprocessor[demo]_Synthesizer[d...           fnlwgt    100
3   Loader[adult]_Preprocessor[demo]_Synthesizer[d...        education    100
4   Loader[adult]_Preprocessor[demo]_Synthesizer[d...  educational-num    100
5   Loader[adult]_Preprocessor[demo]_Synthesizer[d...   marital-status    100
6   Loader[adult]_Preprocessor