# User Story

This demo presents the "User Stories" from the PETsARD Spec. It can also help you set up your own config file. Enjoy : )

---

## User Story D - Research on Benchmark datasets


> This demo shows how to evaluate privacy-enhanced data with `PETsARD`.
>
> In this demo, you as the user already have a data file on your local machine, along with its corresponding synthetic data, most likely produced by your existing privacy protection service. `PETsARD` helps you read these files and evaluate the results, so you can compare your current solution with other technologies.

---

## YAML

> The basic format of the YAML config is as follows:

```
{module name}:
    {experiment name}:
        {config of module}: ...
```

> - module name: A module that performs a specific task. The modules required for this user story include:
>     - `Loader`: Data loading
>     - `Splitter`: Data splitting
>     - `Preprocessor`: Data pre-processing
>     - `Synthesizer`: Data synthesizing
>     - `Postprocessor`: Data post-processing
>     - `Evaluator`: Data evaluating
>     - `Describer`: Data describing
>     - `Reporter`: Data/report output
> - experiment name: A custom name for a single set of experiment parameters for that module. Mandatory.
> - config of module: For the complete parameters, please refer to each module's description in the manual.

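Because every config follows the same three-level nesting, a loaded config is just a nested dictionary. The sketch below is a hypothetical checker (`check_config_layout` is not a PETsARD API) that walks such a dictionary and reports layout problems. The set of module names is assumed from the modules used in this notebook, including `Splitter`, which User Story C-2b uses. It checks structure only, not the per-module parameters described in the manual.

```python
from __future__ import annotations

# Hypothetical helper, not part of PETsARD.
# Module names are assumed from the modules used in this notebook.
KNOWN_MODULES = {
    "Loader", "Splitter", "Preprocessor", "Synthesizer",
    "Postprocessor", "Evaluator", "Describer", "Reporter",
}


def check_config_layout(config: dict) -> list[str]:
    """Check the {module name: {experiment name: {config of module}}} layout.

    Returns a list of problems; an empty list means the layout looks fine.
    """
    problems = []
    for module, experiments in config.items():
        if module not in KNOWN_MODULES:
            problems.append(f"unknown module name: {module!r}")
        if not isinstance(experiments, dict) or not experiments:
            # Every module needs at least one named experiment (mandatory).
            problems.append(f"{module}: expected at least one named experiment")
            continue
        for name, settings in experiments.items():
            if not isinstance(settings, dict):
                problems.append(f"{module}.{name}: module config must be a mapping")
    return problems
```

You could run it on the dictionary returned by `yaml.safe_load(...)` before handing the file to `Executor`.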
### YAML - Describer

> `method` specifies the desired describing method (see the manual for the complete options). Mandatory.
>
> `method = 'default'` uses the default describing method.

### YAML - Loader, Splitter, Synthesizer (custom_data)

> With `method = 'custom_data'`, you decide where in the analysis pipeline your pre-prepared dataset enters, based on the Evaluator you are using.
>
> This is explained in User Stories C-2a and C-2b, where reading the explanation alongside the config files makes it clearer.

---

## Environment


> In `PETsARD`, all you need to do is prepare a YAML file based on the examples and run the `Executor`.
>
> Assuming your YAML file is named `config.yaml`, your Python code would be:


> ```Python
> from PETsARD import Executor
>
> executor = Executor(config='config.yaml')
> executor.run()
> ```

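If you prefer to generate `config.yaml` from Python, here is a minimal sketch. The YAML text mirrors the User Story C-2a config printed later in this notebook. `write_config` and `run_config` are hypothetical helper names. `run_config` assumes `PETsARD` is importable, for example after the path setup in the next cell.

```python
from pathlib import Path

# YAML text mirroring the User Story C-2a config shown later in this notebook.
CONFIG_TEXT = """\
Loader:
    adult_ori:
        filepath: benchmark/adult_uci_ori.csv
Synthesizer:
    custom:
        method: custom_data
        filepath: benchmark/adult_uci_syn.csv
Evaluator:
    demo:
        method: default
Reporter:
    save_report_global:
        method: save_report
        output: User Story C-2a
        eval: demo
        granularity: global
"""


def write_config(path: str = "config.yaml") -> Path:
    """Write the YAML text to disk and return its path (hypothetical helper)."""
    config_path = Path(path)
    config_path.write_text(CONFIG_TEXT, encoding="utf-8")
    return config_path


def run_config(path: str = "config.yaml"):
    """Run PETsARD on a config file; assumes PETsARD is importable."""
    # Imported lazily so write_config() works even without PETsARD installed.
    from PETsARD import Executor

    executor = Executor(config=path)
    executor.run()
    return executor.get_result()
```

Calling `run_config(str(write_config()))` would then write the file and run it.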
```Python
import os
import pprint
import sys

import yaml


# Setting up the path to the PETsARD package
path_petsard = os.path.dirname(os.getcwd())
print(path_petsard)
sys.path.append(path_petsard)


from PETsARD import Executor
```

```
d:\Dropbox\89_other_application\GitHub\PETsARD
```


> `Demo_UserStory` is a helper function created to show the details of each demo; you can ignore it.

```Python
def Demo_UserStory(config_file: str):
    """Print a YAML config, run it with PETsARD's Executor, and print the result."""
    pp = pprint.PrettyPrinter(depth=3, sort_dicts=False)

    # Show the parsed YAML config
    print(f"YAML file {config_file} is ...")
    print()
    with open(config_file, 'r', encoding='utf-8') as yaml_file:
        yaml_raw: dict = yaml.safe_load(yaml_file)
    pp.pprint(yaml_raw)

    # Run every module defined in the config
    print("\n")
    print("Here's the execution...")
    print()
    executor = Executor(config=config_file)
    executor.run()

    # Show the collected results
    print("\n")
    print("Here's the result...")
    print()
    pp.pprint(executor.get_result())
```

---

## User Story C-1 - Default Describing

> Given a dataset as input, the pipeline can pass it through the `Describer` module to obtain a summary of the dataset.

> This aligns with User Story 1 of the Spec.

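The three Reporter experiments in the C-1 config differ only in `granularity`, so they can be generated in a loop. This is a sketch under a few assumptions. `build_c1_reporter` is a hypothetical helper. The `na_values` setting of the original Loader is omitted because its contents are not shown. `eval` is assumed to name the Describer experiment to report on, as `'demo'` does here.

```python
def build_c1_reporter(output: str = "User Story C-1", eval_name: str = "demo") -> dict:
    """Build one save_report experiment per granularity, named like the C-1 config."""
    reporter = {}
    for granularity in ("global", "columnwise", "pairwise"):
        reporter[f"save_report_{granularity}"] = {
            "method": "save_report",
            "output": output,
            "eval": eval_name,  # assumed to name the Describer experiment
            "granularity": granularity,
        }
    return reporter


# C-1 layout as a dictionary; the Loader's na_values (elided in the printout) is omitted.
config_c1 = {
    "Loader": {"adult": {"filepath": "benchmark/adult_uci.csv"}},
    "Describer": {"demo": {"method": "default"}},
    "Reporter": build_c1_reporter(),
}
```

You could write the resulting dictionary to a YAML file with PyYAML, e.g. `yaml.safe_dump(config_c1, f, sort_keys=False)`, which keeps the module order.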
```Python
config_file = 'user_story/User Story C-1.yaml'

Demo_UserStory(config_file=config_file)
```

```
YAML file user_story/User Story C-1.yaml is ...

{'Loader': {'adult': {'filepath': 'benchmark/adult_uci.csv',
                      'na_values': {...}}},
 'Describer': {'demo': {'method': 'default'}},
 'Reporter': {'save_report_global': {'method': 'save_report',
                                     'output': 'User Story C-1',
                                     'eval': 'demo',
                                     'granularity': 'global'},
              'save_report_columnwise': {'method': 'save_report',
                                         'output': 'User Story C-1',
                                         'eval': 'demo',
                                         'granularity': 'columnwise'},
              'save_report_pairwise': {'method': 'save_report',
                                       'output': 'User Story C-1',
                                       'eval': 'demo',
                                       'granularity': 'pairwise'}}}


Here's the execution...

Now is Loade
```

---

## User Story C-2 - Given data evaluating

> Given an original dataset and its privacy-enhanced counterpart as input to the evaluation module, the pipeline creates a report covering default/general metrics of privacy risk and utility.

> This aligns with User Story 6 of the Spec.

> The idea behind custom data: in the complete User Story B workflow ("generating privacy-enhanced data" + "evaluating data"), find the module where the synthetic data or the split data would be produced, and apply `method = 'custom_data'` to that module.

> C-2a demonstrates an Evaluator that compares "original data" with "synthetic data", for example `method = 'default'` or the SDMetrics tools whose method names start with `'sdmetrics-'`.
>
> The "original data" can be loaded directly with the Loader. The "synthetic data" must then be placed in the Synthesizer, using `method = 'custom_data'` to mark it as custom data.
>
> With `method = 'custom_data'`, you specify the file location with `filepath`, just as with the Loader.

```Python
config_file = 'user_story/User Story C-2a.yaml'

Demo_UserStory(config_file=config_file)
```

```
YAML file user_story/User Story C-2a.yaml is ...

{'Loader': {'adult_ori': {'filepath': 'benchmark/adult_uci_ori.csv'}},
 'Synthesizer': {'custom': {'method': 'custom_data',
                            'filepath': 'benchmark/adult_uci_syn.csv'}},
 'Evaluator': {'demo': {'method': 'default'}},
 'Reporter': {'save_report_global': {'method': 'save_report',
                                     'output': 'User Story C-2a',
                                     'eval': 'demo',
                                     'granularity': 'global'}}}


Here's the execution...

Now is Loader with adult_ori...
Now is Synthesizer with custom...
Now is Evaluator with demo...
Generating report ...
(1/2) Evaluating Column Shapes: : 100%|██████████| 15/15 [00:00<00:00, 73.09it/s]
(2/2) Evaluating Column Pair Trends: : 100%|██████████| 105/105 [00:04<00:00, 23.94it/s]

Overall Score: 78.22%

Properties:
- Column Shapes: 95.42%
- Column Pair Trends: 61.03%
Now is Reporter with save_report_global...
Now is User S
```

> C-2b demonstrates an Evaluator that uses three datasets: the "original data used in synthesis" (abbreviated as ori), the "original data not used in synthesis" (abbreviated as control), and the "synthesized data" (abbreviated as syn). Examples are the Anonymeter tools whose method names start with `'anonymeter-'`.
>
> The split of the original data into "used in synthesis" and "not used in synthesis" is normally produced by the Splitter module. Apply `method = 'custom_data'` to the Splitter. Its `filepath` then takes two inputs: `'ori'` for the original data used in synthesis, and `'control'` for the original data not used in synthesis. The "synthesized data" is set in the Synthesizer the same way as in C-2a.
>
> Here we also deliberately include an evaluation with `method = 'default'`. For evaluators that directly compare "original data" with "synthesized data" (the C-2a scenario), C-2b automatically treats the Splitter's `'ori'` as the "original data". As a result, you get results from both SDMetrics and Anonymeter. You should judge for yourself whether your data split is sufficiently representative of the original data.

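The placement rules of C-2a and C-2b can be summarized in code. This is a hypothetical helper, not PETsARD's own logic, and it rests on three assumptions. First, an Evaluator method starting with `'anonymeter-'` is what requires control data. Second, the Splitter `filepath` takes a mapping with `'ori'` and `'control'` keys; the printed C-2b config elides it as `{...}`. Third, the experiment names `data_ori` and `custom` are free choices.

```python
def custom_data_sections(evaluator_methods, ori_path, syn_path, control_path=None) -> dict:
    """Hypothetical helper: decide where pre-prepared data goes (C-2a vs. C-2b).

    - Synthetic data always goes to Synthesizer with method 'custom_data'.
    - If any Evaluator method starts with 'anonymeter-', ori/control go to
      Splitter with method 'custom_data' (C-2b); no Loader is needed.
    - Otherwise the original data is read by Loader (C-2a).
    """
    sections = {}
    needs_control = any(m.startswith("anonymeter-") for m in evaluator_methods)
    if needs_control:
        if control_path is None:
            raise ValueError("anonymeter- evaluators need control data (not used in synthesis)")
        sections["Splitter"] = {
            "custom": {
                "method": "custom_data",
                # Assumed shape: one file per role
                "filepath": {"ori": ori_path, "control": control_path},
            }
        }
    else:
        sections["Loader"] = {"data_ori": {"filepath": ori_path}}
    sections["Synthesizer"] = {
        "custom": {"method": "custom_data", "filepath": syn_path}
    }
    return sections
```

The Evaluator and Reporter sections would then be added to the returned dictionary as in the printed configs.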
```Python
config_file = 'user_story/User Story C-2b.yaml'

Demo_UserStory(config_file=config_file)
```

```
YAML file user_story/User Story C-2b.yaml is ...

{'Splitter': {'custom': {'method': 'custom_data', 'filepath': {...}}},
 'Synthesizer': {'custom': {'method': 'custom_data',
                            'filepath': 'benchmark/adult_uci_syn.csv'}},
 'Evaluator': {'demo': {'method': 'default'},
               'anony-singling': {'method': 'anonymeter-singlingout_univariate'}},
 'Reporter': {'save_report_global_demo': {'method': 'save_report',
                                          'output': 'User Story C-2b',
                                          'eval': 'demo',
                                          'granularity': 'global'},
              'save_report_global_anony': {'method': 'save_report',
                                           'output': 'User Story C-2b',
                                           'eval': 'anony-singling',
                                           'granularity': 'global'}}}


Here's the execution...

Now is Splitter with custom_[1-1]...
Now is Synthesize
```

---

## User Story C-3 - Given data customized evaluating

> Building on User Story C-2, if you specify particular types of metrics or provide your own evaluation script, the module creates a customized evaluation report.

> This aligns with User Story 7 of the Spec.

```Python
config_file = 'user_story/User Story C-3.yaml'

Demo_UserStory(config_file=config_file)
```