# User Story

This demo presents the "User Stories" from the PETsARD Spec (the specification for the data generation algorithm analysis and evaluation system). It can also help you write your own config file. Enjoy : )

---

## User Story A - Privacy Enhancing Data Generation


> This demo shows how to generate privacy-enhanced data using `PETsARD`.
>
> In this demonstration, you already have a data file on your local machine. `PETsARD` loads that file and then generates a privacy-enhanced version of it.
>
> Privacy-enhancing algorithms often have format restrictions and require specific pre-processing and post-processing to work correctly. `PETsARD` takes care of this by providing both default and customizable pre-processing and post-processing workflows, so you can get started quickly.

---

## YAML

> The basic format of the YAML file is as follows:

```
{module name}:
    {experiment name}:
        {config of module}: ...
```

> - module name: A module that performs a specific task. The modules required for this user story are:
>     - `Loader`: Data loading
>     - `Preprocessor`: Data pre-processing
>     - `Synthesizer`: Data synthesizing
>     - `Postprocessor`: Data post-processing
>     - `Reporter`: Data/Report output
> - experiment name: A custom name for a single set of experimental parameters for that module. Mandatory.
> - config of module: For the complete list of parameters, refer to each module's description in the manual.
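
The three-level layout described above can also be pictured as a nested Python dictionary. The sketch below mirrors the User Story A-1 config printed later in this demo; the Loader's `na_values` entry is elided in that printout, so it is left out here. The helper name `list_experiments` is hypothetical and is not part of `PETsARD`.

```python
# Nested-dict view of the YAML layout:
# {module name: {experiment name: {config of module}}}.
# Values follow the User Story A-1 config shown later in this demo;
# the elided `na_values` entry of the Loader is intentionally omitted.
config = {
    "Loader": {"adult": {"filepath": "benchmark/adult-income.csv"}},
    "Preprocessor": {"demo": {"method": "default"}},
    "Synthesizer": {"demo": {"method": "default"}},
    "Postprocessor": {"demo": {"method": "default"}},
    "Reporter": {
        "save_data": {
            "method": "save_data",
            "output": "User Story A-1",
            "source": "Postprocessor",
        }
    },
}


def list_experiments(cfg: dict) -> list[tuple[str, str]]:
    """Return (module name, experiment name) pairs in declaration order."""
    return [
        (module, experiment)
        for module, experiments in cfg.items()
        for experiment in experiments
    ]


# Print each module together with its experiment name.
for module, experiment in list_experiments(config):
    print(f"{module}[{experiment}]")
```

Each top-level key is a module, each second-level key is an experiment name chosen by the user, and the innermost dictionary holds that module's parameters.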

### YAML - Synthesizer (default)

> `method` specifies the synthesis method to use (see the manual for the complete list of options). Mandatory.
>
> `method = 'default'` uses the default synthesis method (currently SDV's Gaussian Copula).

### YAML - Reporter (save_data)

> `method` specifies the reporting method to use. With `method = 'save_data'`, the Reporter captures a module's result data and writes it out.
>
> `source` is a parameter specific to `method = 'save_data'`. It specifies which module or modules' results to output. Setting it to `'Postprocessor'` requests the Postprocessor's results, that is, data that has gone through preprocessing, synthesis, and postprocessing. This data keeps its privacy-enhanced characteristics, and its format matches the original data.
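
The run logs in the user stories below report a saved name such as `User Story A-1_Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]`, and the result dictionary uses similar keys. The sketch below reproduces that naming pattern as inferred from those logs only. It is not taken from the `PETsARD` source, and the function names `experiment_label` and `output_basename` are hypothetical.

```python
def experiment_label(steps: list[tuple[str, str]]) -> str:
    """Join (module, experiment) pairs as Module[experiment], separated by '_'."""
    return "_".join(f"{module}[{experiment}]" for module, experiment in steps)


def output_basename(output: str, steps: list[tuple[str, str]]) -> str:
    """Prefix the label with the Reporter's `output` value (pattern inferred from the logs)."""
    return f"{output}_{experiment_label(steps)}"


# Steps up to and including the module named in `source` ('Postprocessor').
steps = [
    ("Loader", "adult"),
    ("Preprocessor", "demo"),
    ("Synthesizer", "demo"),
    ("Postprocessor", "demo"),
]

print(output_basename("User Story A-1", steps))
# The outer key of the result dictionary additionally ends with the Reporter step.
print(experiment_label(steps + [("Reporter", "save_data")]))
```

Reading the names this way makes it easier to tell which experiment combination produced a given output when several experiments are configured.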

---

## Environment


> In `PETsARD`, all you need to do is prepare a YAML file following the example and run the `Executor`.
>
> Assuming your YAML file is named `config.yaml`, your Python code would be:


> ```Python
> from PETsARD import Executor
>
> executor = Executor(config='config.yaml')
> executor.run()
> ```
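
As a minimal sketch of the "prepare a YAML file, then run the `Executor`" workflow, the code below writes a config in the layout described earlier and wraps the `Executor` call in a function. The YAML values follow the User Story A-1 config shown later in this demo, with the elided `na_values` entry left out. The helper names `write_config` and `run_config` and the use of a temporary directory are assumptions made for illustration. `run_config` is defined but not called here, because it needs `PETsARD` and the benchmark data to be available.

```python
import tempfile
from pathlib import Path

# YAML text in the {module name: {experiment name: {config}}} layout.
# Values follow the User Story A-1 config; `na_values` is omitted because
# the printed config elides it.
CONFIG_TEXT = """\
Loader:
    adult:
        filepath: 'benchmark/adult-income.csv'
Preprocessor:
    demo:
        method: 'default'
Synthesizer:
    demo:
        method: 'default'
Postprocessor:
    demo:
        method: 'default'
Reporter:
    save_data:
        method: 'save_data'
        output: 'User Story A-1'
        source: 'Postprocessor'
"""


def write_config(path: Path) -> Path:
    """Write the YAML text to `path` and return the path."""
    path.write_text(CONFIG_TEXT, encoding="utf-8")
    return path


def run_config(path: Path):
    """Run the pipeline for a config file; requires PETsARD to be importable."""
    from PETsARD import Executor  # imported lazily so the sketch loads without it

    executor = Executor(config=str(path))
    executor.run()
    return executor.get_result()


# Write the config into a temporary directory (an assumption for this sketch).
config_path = write_config(Path(tempfile.mkdtemp()) / "config.yaml")
print(config_path.read_text(encoding="utf-8"))
```

With `PETsARD` installed and the benchmark file in place, `run_config(config_path)` would run the pipeline and return the same result object that `get_result()` returns in the demonstrations below.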

In [1]:
import os
import pprint
import sys

import yaml


# Setting up the path to the PETsARD package
path_petsard = os.path.dirname(os.getcwd())
print(path_petsard)
sys.path.append(path_petsard)


from PETsARD import Executor

d:\Dropbox\89_other_application\GitHub\PETsARD


> `Demo_UserStory` is a helper function that prints the details of each demonstration. You can ignore it.

In [2]:
def Demo_UserStory(config_file: str):
    # Show the parsed YAML config
    print(f"YAML file {config_file} is ...")
    print()
    pp = pprint.PrettyPrinter(depth=3, sort_dicts=False)
    with open(config_file, 'r') as yaml_file:
        yaml_raw: dict = yaml.safe_load(yaml_file)
    pp.pprint(yaml_raw)

    # Run the pipeline defined by the config
    print()
    print()
    print("Here's the execution...")
    print()
    executor = Executor(config=config_file)
    executor.run()

    # Show the collected results
    print()
    print()
    print("Here's the result...")
    print()
    pp.pprint(executor.get_result())

---

## User Story A-1 - Default Synthesizing

> Given an original dataset and no specified algorithm, the pipeline generates a list of privacy-enhanced datasets using the default algorithms.


> Corresponds to User Story 3 of the Spec.

In [3]:
config_file = 'user_story/User Story A-1.yaml'

Demo_UserStory(config_file = config_file)

YAML file user_story/User Story A-1.yaml is ...

{'Loader': {'adult': {'filepath': 'benchmark/adult-income.csv',
                      'na_values': {...}}},
 'Preprocessor': {'demo': {'method': 'default'}},
 'Synthesizer': {'demo': {'method': 'default'}},
 'Postprocessor': {'demo': {'method': 'default'}},
 'Reporter': {'save_data': {'method': 'save_data',
                            'output': 'User Story A-1',
                            'source': 'Postprocessor'}}}


Here's the execution...

Now is Loader with adult...
Now is Preprocessor with demo...
Now is Synthesizer with demo...
Synthesizer (SDV - SingleTable): Metafile loading time: 0.0319 sec.
Synthesizer (SDV - SingleTable): Fitting GaussianCopula.




Synthesizer (SDV - SingleTable): Fitting  GaussianCopula spent 9.8262 sec.
Synthesizer (SDV - SingleTable): Sampling GaussianCopula # 26933 rows (same as raw) in 1.9807 sec.
Now is Postprocessor with demo...
Now is Reporter with save_data...
Now is User Story A-1_Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo] save to csv...


Here's the result...

{'Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]_Reporter[save_data]': {'Loader[adult]_Preprocessor[demo]_Synthesizer[demo]_Postprocessor[demo]':              age         workclass         fnlwgt     education  \
0      49.484372           Private   86179.188495     Bachelors   
1      56.010534           Private  169010.152257  Some-college   
2      26.624428           Private   77573.431448       HS-grad   
3      31.929920           Private  142551.891799       HS-grad   
4      26.376426           Private  352129.794194       7th-8th   
...          ...               ...            ...     

---

## User Story A-2 - Customized Synthesizing

> Given an original dataset and specified privacy-enhancing data generation algorithms and parameters, the pipeline generates a privacy-enhanced dataset.


> Corresponds to User Story 2 of the Spec.

In [4]:
config_file = 'user_story/User Story A-2.yaml'

Demo_UserStory(config_file = config_file)

YAML file user_story/User Story A-2.yaml is ...

{'Loader': {'adult': {'filepath': 'benchmark/adult-income.csv',
                      'na_values': {...}}},
 'Preprocessor': {'demo': {'method': 'default'}},
 'Synthesizer': {'sdv-gaussian': {'method': 'sdv-single_table-gaussiancopula'}},
 'Postprocessor': {'demo': {'method': 'default'}},
 'Reporter': {'save_data': {'method': 'save_data',
                            'output': 'User Story A-1',
                            'source': 'Postprocessor'}}}


Here's the execution...

Now is Loader with adult...
Now is Preprocessor with demo...
Now is Synthesizer with sdv-gaussian...
Synthesizer (SDV - SingleTable): Metafile loading time: 0.0332 sec.
Synthesizer (SDV - SingleTable): Fitting GaussianCopula.




Synthesizer (SDV - SingleTable): Fitting  GaussianCopula spent 8.6138 sec.
Synthesizer (SDV - SingleTable): Sampling GaussianCopula # 26933 rows (same as raw) in 1.7461 sec.
Now is Postprocessor with demo...
Now is Reporter with save_data...
Now is User Story A-1_Loader[adult]_Preprocessor[demo]_Synthesizer[sdv-gaussian]_Postprocessor[demo] save to csv...


Here's the result...

{'Loader[adult]_Preprocessor[demo]_Synthesizer[sdv-gaussian]_Postprocessor[demo]_Reporter[save_data]': {'Loader[adult]_Preprocessor[demo]_Synthesizer[sdv-gaussian]_Postprocessor[demo]':              age     workclass         fnlwgt     education  educational-num  \
0      47.051976       Private   81974.996782     Bachelors         9.206849   
1      23.493460     Local-gov   75678.112916       HS-grad         8.146377   
2      44.317820       Private   60386.273558  Some-college        11.128939   
3      41.473428       Private  186845.113325  Some-college         7.974582   
4      40.922568  Self-emp-inc  