# User Story

This demo presents the "User Stories" from the PETsARD Spec (the Spec for the Data Generation Algorithm Analysis and Evaluation System). It can also help users set up their own config files. Enjoy : )

---

## User Story B - Privacy Enhancing Data Generation and Evaluation


> This demo shows how to generate and evaluate privacy-enhanced data using `PETsARD`.
>
> In this demonstration, you, as the user, already have a data file on your local machine. `PETsARD` will help you load that file, generate a privacy-enhanced version of it, and finally evaluate the result.

---

## YAML

> The basic format of the YAML config is as follows:

```
{module name}:
    {experiment name}:
        {config of module}: ...
```

> - module name: A module that performs a specific task. The modules required for this user story are:
>     - `Loader`: Data loading
>     - `Preprocessor`: Data pre-processing
>     - `Synthesizer`: Data synthesis
>     - `Postprocessor`: Data post-processing
>     - `Evaluator`: Data evaluation
>     - `Reporter`: Data/report output
> - experiment name: A custom name for a single set of experimental parameters for that module. Mandatory.
> - config of module: For the complete list of parameters, refer to the description of each module in the manual.
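The nesting described above can be illustrated with a minimal Python sketch. It uses a plain dictionary that mirrors the module / experiment / config layout, with values taken from the User Story B-1 config printed later in this demo (the elided `na_values` entry is left out). The helper `list_experiments` is hypothetical and is not part of `PETsARD`.

```python
# A config expressed as a nested dict: module name -> experiment name -> settings.
# The values mirror the User Story B-1 YAML; `list_experiments` is a
# hypothetical helper for illustration, not a PETsARD function.
config = {
    "Loader": {"adult": {"filepath": "benchmark/adult_uci.csv"}},
    "Preprocessor": {"demo": {"method": "default"}},
    "Synthesizer": {"demo": {"method": "default"}},
    "Postprocessor": {"demo": {"method": "default"}},
    "Evaluator": {"demo": {"method": "default"}},
}


def list_experiments(config: dict) -> list:
    """Return (module name, experiment name) pairs in the order they appear."""
    pairs = []
    for module_name, experiments in config.items():
        for experiment_name in experiments:
            pairs.append((module_name, experiment_name))
    return pairs


for module_name, experiment_name in list_experiments(config):
    print(f"{module_name}[{experiment_name}]")
```

The printed `Module[experiment]` labels follow the same naming pattern that appears in the execution results further down.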

### YAML - Evaluator

> `method` specifies the desired evaluation method (see the manual for the complete list of options). Mandatory.
>
> `method = 'default'` uses the default evaluation method (currently SDMetrics' QualityReport).

### YAML - Evaluator (custom_method)

> With `method = 'custom_method'`, the evaluation is performed by user-provided Python code, identified by its file path (`filepath`) and its class (`method` specifies the class name).

### Python - Evaluator (custom_method)

> Custom evaluations require users to define a Python class that conforms to a specific format.
>
> This class should include an `__init__` method that accepts the settings (`config`), a `.create()` method that takes a dictionary named `data` as the evaluation input, and `.get_global()`, `.get_columnwise()`, and `.get_pairwiser()` methods that output results at different levels of granularity: the entire dataset, individual fields, and pairs of fields, respectively.
>
> We recommend inheriting directly from the `EvaluatorBase` class to meet these requirements. It can be imported as follows:


```Python
from PETsARD.evaluator.evaluator_base import EvaluatorBase
```
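As a rough illustration of the class shape described above, here is a minimal stand-alone sketch. It deliberately does not inherit from `EvaluatorBase` so that it runs without `PETsARD` installed. The class name `OverlapEvaluator`, the `data` keys `'ori'` and `'syn'`, the overlap-based scoring, and the plain-dict return values are all assumptions made for this example; the real base class may require different keys, additional methods, or other return types.

```python
from itertools import combinations


class OverlapEvaluator:
    """Stand-alone sketch of the interface described above (hypothetical)."""

    def __init__(self, config: dict):
        # Keep the settings passed in from the config.
        self.config = config
        self.data = {}

    def create(self, data: dict) -> None:
        # The keys 'ori' and 'syn' are hypothetical names for this sketch.
        # Each maps column names to lists of values.
        self.data = data

    def get_columnwise(self) -> dict:
        # Per column: percentage of distinct original values that also
        # appear in the synthetic column (0.0 for an empty original column).
        scores = {}
        for column, ori_values in self.data["ori"].items():
            ori_set = set(ori_values)
            syn_set = set(self.data["syn"].get(column, []))
            shared = len(ori_set & syn_set)
            scores[column] = 100.0 * shared / len(ori_set) if ori_set else 0.0
        return scores

    def get_pairwiser(self) -> dict:
        # Method name as given in the description above.
        # Per pair of columns: mean of the two column scores.
        col = self.get_columnwise()
        return {(a, b): (col[a] + col[b]) / 2 for a, b in combinations(col, 2)}

    def get_global(self) -> dict:
        # Whole dataset: unweighted mean of the column scores.
        col = self.get_columnwise()
        return {"score": sum(col.values()) / len(col)}


evaluator = OverlapEvaluator(config={"method": "custom_method"})
evaluator.create({
    "ori": {"age": [25, 40], "sex": ["M", "F"]},
    "syn": {"age": [25, 31], "sex": ["F", "M"]},
})
print(evaluator.get_columnwise())
print(evaluator.get_pairwiser())
print(evaluator.get_global())
```

Deriving the pairwise and global results from the column scores keeps the three granularities consistent with each other, which is one simple design choice among many.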

### YAML - Reporter (save_report)

> `method` specifies the desired reporting method. When `method = 'save_report'`, the Reporter captures and outputs the result data from the `Evaluator`/`Describer` module.
>
> `eval` is a parameter unique to `method = 'save_report'`. It selects, by experiment name, which experiment's results to output. Setting it to `'demo'` requests the results of the Evaluator named `'demo'`.


> `granularity` is a parameter unique to `method = 'save_report'` that specifies the level of detail, or granularity, of the result data. Setting it to `'global'` returns a single overall score for the entire dataset.
>
> Depending on the evaluation method of each `Evaluator`/`Describer`, scores may be computed for the dataset as a whole, for each field individually, or between pairs of fields.
>
> Whichever evaluation method is used, the overall score of the entire dataset is usually the most practical figure for users. We therefore studied the different `Evaluator`/`Describer` methods in advance and average or weight their scores as appropriate, so that each can report at the `'global'` granularity, covering the entire dataset.
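To request more than one granularity, the Reporter section can hold one `save_report` entry per level. The sketch below builds such entries as a Python dictionary. The values `'columnwise'` and `'pairwise'` and the `save_report_*` entry names are taken from the User Story B-2 config and log shown later in this demo; `make_report_entries` is a hypothetical helper, not a `PETsARD` function.

```python
# Build one `save_report` Reporter entry per granularity level.
# `make_report_entries` is a hypothetical helper, not a PETsARD function.
GRANULARITIES = ("global", "columnwise", "pairwise")


def make_report_entries(output: str, eval_name: str) -> dict:
    """Return Reporter experiments keyed as save_report_<granularity>."""
    return {
        f"save_report_{granularity}": {
            "method": "save_report",
            "output": output,
            "eval": eval_name,
            "granularity": granularity,
        }
        for granularity in GRANULARITIES
    }


reporter = make_report_entries(output="User Story B-2", eval_name="custom")
for name, settings in reporter.items():
    print(name, "->", settings["granularity"])
```

The resulting dictionary has the same shape as the `Reporter` block of a config, so it could be dumped to YAML with whatever YAML library the project already uses.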

---

## Environment


> In `PETsARD`, all you need to do is prepare a YAML file following the example and run the `Executor`.
>
> Assuming your YAML file is named `config.yaml`, your Python code would be:


> ```Python
> from PETsARD import Executor
>
> executor = Executor(config='config.yaml')  # named `executor` to avoid shadowing the built-in `exec`
> executor.run()
> ```

In [1]:
import os
import pprint
import sys

import yaml


# Setting up the path to the PETsARD package
path_petsard = os.path.dirname(os.getcwd())
print(path_petsard)
sys.path.append(path_petsard)


from PETsARD import Executor

d:\Dropbox\89_other_application\GitHub\PETsARD


> `Demo_UserStory` is a helper function written to show the details of this demo; it can be ignored.

In [2]:
def Demo_UserStory(config_file: str):
    """Print the YAML config, run it with Executor, then print the results."""
    print(f"YAML file {config_file} is ...")
    print()
    pp = pprint.PrettyPrinter(depth=3, sort_dicts=False)
    with open(config_file, 'r') as yaml_file:
        yaml_raw: dict = yaml.safe_load(yaml_file)
    pp.pprint(yaml_raw)

    print()
    print()
    print("Here's the execution...")
    print()
    # Named `executor` rather than `exec` to avoid shadowing the built-in
    executor = Executor(config=config_file)
    executor.run()

    print()
    print()
    print("Here's the result...")
    print()
    pp.pprint(executor.get_result())

---

## User Story B-1 - Default Evaluating

> Following User Story A, if users enable the "evaluate" step, the evaluation module creates a report covering the default privacy risk and utility metrics.

> This corresponds to User Story 4 of the Spec.

In [3]:
config_file = 'user_story/User Story B-1.yaml'

Demo_UserStory(config_file = config_file)

YAML file user_story/User Story B-1.yaml is ...

{'Loader': {'adult': {'filepath': 'benchmark/adult_uci.csv',
                      'na_values': {...}}},
 'Preprocessor': {'demo': {'method': 'default'}},
 'Synthesizer': {'demo': {'method': 'default'}},
 'Postprocessor': {'demo': {'method': 'default'}},
 'Evaluator': {'demo': {'method': 'default'}},
 'Reporter': {'save_data': {'method': 'save_data',
                            'output': 'User Story B-1',
                            'source': 'Postprocessor'},
              'save_report_global': {'method': 'save_report',
                                     'output': 'User Story B-1',
                                     'eval': 'demo',
                                     'granularity': 'global'}}}


Here's the execution...

Now is Loader with adult...
Now is Preprocessor with demo...
Now is Synthesizer with demo...
Synthesizer (SDV - SingleTable): Metafile loading time: 0.0313 sec.
Synthesizer (SDV - SingleTable): Fitting  GaussianCopula spent 6.5947 sec.
Synthesizer (SDV - SingleTable): Sampling GaussianCopula # 18997 rows (same as raw) in 1.1426 sec.
Now is Postprocessor with demo...
Now is Evaluator with demo...
Generating report ...
(1/2) Evaluating Column Shapes: : 100%|██████████| 15/15 [00:00<00:00, 112.60it/s]
(2/2) Evaluating Column Pair Trends: : 100%|██████████| 105/105 [00:04<00:00, 24.96it/s]

Overall Score: 77.75%

Properties:
- Column Shapes: 94.08%
- Column Pair Trends: 61.43%
Now is Reporter with save_data...
Now is User Story B-1_Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo] save to csv...
Now is Reporter with save_report_global...
Now is User Story B-1[Report]_demo_[global] save to csv...


Here's the result...

{'Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]_Evaluator[demo]_Reporter[save_data]': {'Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]':              age         
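The overall score in the log above is consistent with an unweighted mean of the two property scores. The following quick check is a sketch that uses the rounded percentages copied from the log, so its result can differ from the printed 77.75% in the last digit.

```python
from statistics import mean

# Property scores copied from the log above (already rounded to two decimals).
properties = {"Column Shapes": 94.08, "Column Pair Trends": 61.43}

# Unweighted mean of the property scores.
overall = mean(list(properties.values()))
print(f"Overall score from rounded properties: {overall:.3f}%")
```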

---

## User Story B-2 - Customized Evaluating

> Following User Story B-1, if specific metrics are selected or a custom evaluation script is provided, the module creates a customized evaluation report.

> This corresponds to User Story 5 of the Spec.

In [4]:
config_file = 'user_story/User Story B-2.yaml'

Demo_UserStory(config_file = config_file)

YAML file user_story/User Story B-2.yaml is ...

{'Loader': {'adult': {'filepath': 'benchmark/adult_uci.csv',
                      'na_values': {...}}},
 'Preprocessor': {'demo': {'method': 'default'}},
 'Synthesizer': {'demo': {'method': 'default'}},
 'Postprocessor': {'demo': {'method': 'default'}},
 'Evaluator': {'custom': {'method': 'custom_method', 'custom_method': {...}}},
 'Reporter': {'save_report_global': {'method': 'save_report',
                                     'output': 'User Story B-2',
                                     'eval': 'custom',
                                     'granularity': 'global'},
              'save_report_columnwise': {'method': 'save_report',
                                         'output': 'User Story B-2',
                                         'eval': 'custom',
                                         'granularity': 'columnwise'},
              'save_report_pairwise': {'method': 'save_report',
                                       'out



Synthesizer (SDV - SingleTable): Fitting  GaussianCopula spent 6.8178 sec.
Synthesizer (SDV - SingleTable): Sampling GaussianCopula # 18997 rows (same as raw) in 1.2315 sec.
Now is Postprocessor with demo...
Now is Evaluator with custom...
Now is Reporter with save_report_global...
Now is User Story B-2[Report]_custom_[global] save to csv...
Now is Reporter with save_report_columnwise...
Now is User Story B-2[Report]_custom_[columnwise] save to csv...
Now is Reporter with save_report_pairwise...
Now is User Story B-2[Report]_custom_[pairwise] save to csv...


Here's the result...

{'Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]_Evaluator[custom]_Reporter[save_report_global]': {'custom_[global]':                                            full_expt_name  score
result  Loader[adult]_Preprocessor[demo]_Synthesizer[d...    100},
 'Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]_Evaluator[custom]_Reporter[save_report_columnwise]': {'custom_[co