# User Story C
**Privacy Enhancing Data Evaluation**


This demo shows how to evaluate privacy-enhanced data using `PETsARD`.

In this demonstration, you already have a data file on your local machine, together with its corresponding synthetic data, most likely produced by your existing privacy protection service. `PETsARD` reads these files and evaluates the results, helping you compare your current solution with other technologies.

---

## Environment

In [7]:
import os
import pprint
import sys

import yaml


# Setting up the path to the PETsARD package
path_petsard = os.path.dirname(os.getcwd())
print(path_petsard)
sys.path.append(path_petsard)


from PETsARD import Executor

d:\Dropbox\89_other_application\GitHub\PETsARD


> `Demo_UserStory` is a helper function written only to display the details of each demo; you can ignore it.

In [8]:
def Demo_UserStory(config_file: str):
    """Print the YAML config, run the PETsARD Executor on it, and print the results."""
    print(f"YAML file {config_file} is ...")
    print()
    # depth=3 keeps the printout short: deeper levels are shown as {...}
    pp = pprint.PrettyPrinter(depth=3, sort_dicts=False)
    with open(config_file, 'r') as yaml_file:
        yaml_raw: dict = yaml.safe_load(yaml_file)
    pp.pprint(yaml_raw)

    print()
    print()
    print("Here's the execution...")
    print()
    # Named `executor` to avoid shadowing the built-in `exec`
    executor = Executor(config=config_file)
    executor.run()

    print()
    print()
    print("Here's the result...")
    print()
    pp.pprint(executor.get_result())
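
The `{...}` placeholders that appear in the printed configs throughout this demo come from the `depth=3` argument of `pprint.PrettyPrinter`, not from the YAML files themselves. The following minimal sketch reproduces the effect on a made-up config; the `example_column` entry is purely illustrative and is not taken from the real YAML.

```python
import pprint

# Hypothetical config, shaped like the ones printed in this notebook.
# The content of `na_values` is invented for illustration only.
config = {
    "Loader": {
        "adult": {
            "filepath": "benchmark/adult-income.csv",
            "na_values": {"example_column": "?"},
        }
    }
}

# Same printer settings as Demo_UserStory: anything nested deeper
# than three levels is abbreviated as {...}.
pp = pprint.PrettyPrinter(depth=3, sort_dicts=False)
formatted = pp.pformat(config)
print(formatted)
```

To see the full nested content, the same printer can be created without the `depth` argument.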

---

## User Story C-1
**Default Describing**

Given a dataset as input, the pipeline runs the "describe" module to produce a summary of that dataset.
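
To give a feel for what a global summary contains, here is a minimal standard-library sketch that counts rows, columns, and missing cells in a CSV. It is not how `PETsARD` implements its describing step: the function name `describe_global`, the choice of missing-value markers, and the sample rows are all assumptions made for illustration.

```python
import csv
import io


def describe_global(csv_text, na_values=("", "?")):
    """Return row, column, and missing-cell counts for CSV text with a header.

    `na_values` lists the strings treated as missing (an assumption here).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    na_count = sum(cell.strip() in na_values for row in rows for cell in row)
    return {
        "row_count": len(rows),
        "col_count": len(header),
        "na_count": na_count,
    }


# Made-up sample data: one "?" marker and one empty cell.
sample = "age,workclass,income\n39,State-gov,<=50K\n50,?,<=50K\n38,Private,\n"
summary = describe_global(sample)
print(summary)
```

The three keys mirror the `row_count`, `col_count`, and `na_count` columns that appear in the global report of this user story.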

In [9]:
config_file = '../yaml/User Story C-1.yaml'

Demo_UserStory(config_file = config_file)

YAML file user_story/User Story C-1.yaml is ...

{'Loader': {'adult': {'filepath': 'benchmark/adult-income.csv',
                      'na_values': {...}}},
 'Describer': {'demo': {'method': 'default'}},
 'Reporter': {'save_report_global': {'method': 'save_report',
                                     'output': 'User Story C-1',
                                     'eval': 'demo',
                                     'granularity': 'global'},
              'save_report_columnwise': {'method': 'save_report',
                                         'output': 'User Story C-1',
                                         'eval': 'demo',
                                         'granularity': 'columnwise'},
              'save_report_pairwise': {'method': 'save_report',
                                       'output': 'User Story C-1',
                                       'eval': 'demo',
                                       'granularity': 'pairwise'}}}


Here's the execution...

Now is Lo

Now is Describer with demo...
Now is Reporter with save_report_global...
Now is User Story C-1[Report]_demo_[global] save to csv...
Now is Reporter with save_report_columnwise...
Now is User Story C-1[Report]_demo_[columnwise] save to csv...
Now is Reporter with save_report_pairwise...
Now is User Story C-1[Report]_demo_[pairwise] save to csv...


Here's the result...

{'Loader[adult]_Describer[demo]_Reporter[save_report_global]': {'demo_[global]':                            full_expt_name  row_count  col_count  na_count
0  Loader[adult]_Describer[demo_[global]]      48842         15      3620},
 'Loader[adult]_Describer[demo]_Reporter[save_report_columnwise]': {'demo_[columnwise]':                                 full_expt_name           column  \
0   Loader[adult]_Describer[demo_[columnwise]]              age   
1   Loader[adult]_Describer[demo_[columnwise]]           fnlwgt   
2   Loader[adult]_Describer[demo_[columnwise]]  educational-num   
3   Loader[adult]_Describer[demo_[column

---

## User Story C-2
**Given data evaluating**

Given an original dataset and its privacy-enhanced counterpart, the evaluation module produces a report covering default/general metrics of privacy risk and utility.

### User Story C-2a

C-2a demonstrates Evaluator methods that compare "original data" with "synthetic data", for instance `method = 'default'` or the SDMetrics tools whose method names start with `'sdmetrics-'`.
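
As an illustration of what comparing original with synthetic data can mean for a single categorical column, the sketch below computes the complement of the total variation distance between the two category distributions, a metric SDMetrics calls TVComplement. Whether `method = 'default'` applies exactly this metric to every column is an assumption; the function name and sample values are made up.

```python
from collections import Counter


def tv_complement(real_values, synthetic_values):
    """Return 1 minus the total variation distance of two categorical samples.

    1.0 means identical category frequencies; 0.0 means no overlap at all.
    """
    real_freq = Counter(real_values)
    syn_freq = Counter(synthetic_values)
    n_real, n_syn = len(real_values), len(synthetic_values)
    categories = set(real_freq) | set(syn_freq)
    distance = 0.5 * sum(
        abs(real_freq[c] / n_real - syn_freq[c] / n_syn) for c in categories
    )
    return 1.0 - distance


# Made-up column values: the synthetic data over-represents category "a".
real = ["a", "a", "b", "b"]
synthetic = ["a", "a", "a", "b"]
score = tv_complement(real, synthetic)
print(score)
```

A score close to 1 indicates that the synthetic column preserves the category proportions of the original column.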

In [10]:
config_file = '../yaml/User Story C-2a.yaml'

Demo_UserStory(config_file = config_file)

YAML file user_story/User Story C-2a.yaml is ...

{'Loader': {'adult_ori': {'filepath': 'benchmark/adult-income_ori.csv'}},
 'Synthesizer': {'custom': {'method': 'custom_data',
                            'filepath': 'benchmark/adult-income_syn.csv'}},
 'Evaluator': {'demo': {'method': 'default'}},
 'Reporter': {'save_report_global': {'method': 'save_report',
                                     'output': 'User Story C-2a',
                                     'eval': 'demo',
                                     'granularity': 'global'}}}


Here's the execution...

Now is Loader with adult_ori...
Now is Synthesizer with custom...
Now is Evaluator with demo...
Generating report ...
(1/2) Evaluating Column Shapes: : 100%|██████████| 15/15 [00:00<00:00, 60.45it/s]
(2/2) Evaluating Column Pair Trends: : 100%|██████████| 105/105 [00:06<00:00, 17.11it/s]

Overall Score: 78.22%

Properties:
- Column Shapes: 95.42%
- Column Pair Trends: 61.03%
Now is Reporter with save_report_global...
Now is 
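
The overall score printed above is consistent with a plain average of the two property scores. The check below assumes that this is how the report aggregates them; because the displayed percentages are already rounded, the recomputed value may differ from the printed one in the last digit.

```python
# Property scores copied from the report output above (in percent).
property_scores = {"Column Shapes": 95.42, "Column Pair Trends": 61.03}

# Assumed aggregation: unweighted mean of the property scores.
overall = sum(property_scores.values()) / len(property_scores)
print(f"Overall score (recomputed): {overall:.3f}%")
```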

### User Story C-2b


C-2b demonstrates Evaluator methods that need three datasets: the original data used in synthesis (abbreviated as ori), the original data not used in synthesis (abbreviated as control), and the synthesized data (abbreviated as syn). Examples are the Anonymeter tools whose method names start with `'anonymeter-'`.
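
The control data matters because it provides a baseline: an attack that succeeds equally often against records the synthesizer never saw reveals nothing specific about the training data. The sketch below shows a simplified version of this normalization, assuming the risk is the attack success rate on ori corrected by the success rate on control. It omits the confidence intervals that Anonymeter reports, and the function name and example rates are made up.

```python
def privacy_risk(main_rate, control_rate):
    """Return the attack success on ori in excess of the control baseline.

    main_rate:    attack success rate against data used in synthesis (ori)
    control_rate: attack success rate against held-out data (control)
    The result is rescaled to [0, 1]; negative values are clipped to 0.
    """
    if not 0.0 <= main_rate <= 1.0:
        raise ValueError("main_rate must be in [0, 1]")
    if not 0.0 <= control_rate < 1.0:
        raise ValueError("control_rate must be in [0, 1)")
    return max(0.0, (main_rate - control_rate) / (1.0 - control_rate))


# Made-up rates: 30% success against ori, 10% against control.
risk = privacy_risk(0.30, 0.10)
print(f"{risk:.4f}")
```

When the two rates are equal the risk is zero, which reflects that the synthetic data leaks nothing beyond what general population patterns already reveal.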

In [11]:
config_file = '../yaml/User Story C-2b.yaml'

Demo_UserStory(config_file = config_file)

YAML file user_story/User Story C-2b.yaml is ...



{'Splitter': {'custom': {'method': 'custom_data', 'filepath': {...}}},
 'Synthesizer': {'custom': {'method': 'custom_data',
                            'filepath': 'benchmark/adult-income_syn.csv'}},
 'Evaluator': {'demo': {'method': 'default'},
               'anony-singling': {'method': 'anonymeter-singlingout_univariate'}},
 'Reporter': {'save_report_global_demo': {'method': 'save_report',
                                          'output': 'User Story C-2b',
                                          'eval': 'demo',
                                          'granularity': 'global'},
              'save_report_global_anony': {'method': 'save_report',
                                           'output': 'User Story C-2b',
                                           'eval': 'anony-singling',
                                           'granularity': 'global'}}}


Here's the execution...

Now is Splitter with custom_[1-1]...
Now is Synthesizer with custom...
Now is Evaluator with demo...


---

## User Story C-3
**Given data customized evaluating**

> Following User Story C-2, if specific metrics are selected or a custom evaluation script is provided, the module produces a customized evaluation report.

> Aligning with User Story 7 of the Spec.
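
The printed config below elides the `custom_method` settings as `{...}`, so the interface that `PETsARD` expects from a custom script is not shown in this demo. The following is therefore only a hypothetical sketch of what a custom evaluation could compute: the class name, the method names `get_global` and `get_columnwise` (chosen to mirror the Reporter granularities), the missing-value markers, and the sample rows are all assumptions.

```python
NA_VALUES = ("", "?")  # assumed missing-value markers


def na_rate(values):
    """Return the fraction of values that count as missing."""
    values = list(values)
    return sum(v in NA_VALUES for v in values) / len(values) if values else 0.0


class MissingRateEvaluator:
    """Hypothetical custom evaluator comparing missing-value rates.

    Tables are lists of dicts sharing the same keys (column names).
    """

    def __init__(self, original, synthetic):
        self.original = original
        self.synthetic = synthetic
        self.columns = list(original[0].keys())

    def get_columnwise(self):
        """Missing rate of each column in both tables."""
        return {
            col: {
                "ori_na_rate": na_rate(row[col] for row in self.original),
                "syn_na_rate": na_rate(row[col] for row in self.synthetic),
            }
            for col in self.columns
        }

    def get_global(self):
        """Mean absolute difference of the per-column missing rates."""
        rates = self.get_columnwise().values()
        diffs = [abs(r["ori_na_rate"] - r["syn_na_rate"]) for r in rates]
        return {"mean_abs_na_rate_diff": sum(diffs) / len(diffs)}


# Made-up rows for illustration.
original = [{"age": "39", "workclass": "?"}, {"age": "50", "workclass": "Private"}]
synthetic = [{"age": "41", "workclass": "Private"}, {"age": "", "workclass": "Private"}]
evaluator = MissingRateEvaluator(original, synthetic)
global_result = evaluator.get_global()
print(global_result)
```

Splitting the result by granularity keeps a single summary number available for quick comparison while still allowing a per-column breakdown.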

In [12]:
config_file = '../yaml/User Story C-3.yaml'

Demo_UserStory(config_file = config_file)

YAML file user_story/User Story C-3.yaml is ...

{'Loader': {'adult_ori': {'filepath': 'benchmark/adult-income_ori.csv'}},
 'Synthesizer': {'custom': {'method': 'custom_data',
                            'filepath': 'benchmark/adult-income_syn.csv'}},
 'Evaluator': {'custom': {'method': 'custom_method', 'custom_method': {...}}},
 'Reporter': {'save_report_global': {'method': 'save_report',
                                     'output': 'User Story C-3',
                                     'eval': 'custom',
                                     'granularity': 'global'},
              'save_report_columnwise': {'method': 'save_report',
                                         'output': 'User Story C-3',
                                         'eval': 'custom',
                                         'granularity': 'columnwise'},
              'save_report_pairwise': {'method': 'save_report',
                                       'output': 'User Story C-3',
                            