# Environment setup

In [1]:
import os  # noqa: I001
import sys
from pathlib import Path


# On Colab, download demo_utils.py from GitHub and execute it
if "COLAB_GPU" in os.environ:
    import urllib.request

    demo_utils_url = (
        "https://raw.githubusercontent.com/nics-tw/petsard/main/demo/demo_utils.py"
    )
    exec(urllib.request.urlopen(demo_utils_url).read().decode("utf-8"))
else:
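    # Locally, search the working directory and up to 10 parents for demo_utils.py
    for p in [Path.cwd()] + list(Path.cwd().parents)[:10]:
        utils_path = p / "demo_utils.py"
        if utils_path.exists() and "demo" in str(utils_path):
            # Make `from demo_utils import ...` work in later cells
            sys.path.insert(0, str(p))
            # read_text() closes the file handle, unlike a bare open().read()
            exec(utils_path.read_text(encoding="utf-8"))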
            break

📂 Current working directory: demo/petsard-yaml/evaluator-yaml
✅ PETsARD demo_utils loaded. Use quick_setup() to initialize.
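
The cell above finds `demo_utils.py` by walking up the directory tree and then runs it with `exec`. As an alternative, here is a minimal sketch that loads a file as a regular module through `importlib`. The helper names `find_upwards` and `load_module_from_path` are hypothetical and not part of PETsARD, and the sketch omits the `"demo" in str(utils_path)` check used above.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType
from typing import Optional


def find_upwards(filename: str, start: Path, max_levels: int = 10) -> Optional[Path]:
    """Return the first `filename` found in `start` or its nearest parents."""
    for directory in [start, *list(start.parents)[:max_levels]]:
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


def load_module_from_path(name: str, path: Path) -> ModuleType:
    """Import the file at `path` as a module registered under `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # later `import name` statements reuse this module
    spec.loader.exec_module(module)  # runs the file's top-level code once
    return module
```

A local call would look like `load_module_from_path("demo_utils", find_upwards("demo_utils.py", Path.cwd()))`, after checking that the search did not return `None`.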


## Quick setup: Evaluation YAML

In [2]:
from demo_utils import display_results, display_yaml_info, quick_setup  # noqa: I001
from petsard import Executor  # noqa: I001


is_colab, branch, yaml_path = quick_setup(
    config_file=[
        "evaluator-data-release.yaml",
        "evaluator-specific-task-application.yaml",
    ],
    benchmark_data=None,
    petsard_branch="main",
)

✅ Changed working directory to demo: petsard/demo
   📁 Notebook location: demo/petsard-yaml/evaluator-yaml/
   🔍 YAML search priority: 
      1. demo/petsard-yaml/evaluator-yaml/
      2. demo/
   💾 Output files will be saved in: demo/
🚀 PETsARD v1.7.0
📅 2025-10-14 10:13:01 UTC+8
🔧 Added to Python path: petsard/demo/petsard-yaml/evaluator-yaml
📁 Processing configuration files from subfolder: petsard-yaml/evaluator-yaml
✅ Found configuration (1/2): petsard/demo/petsard-yaml/evaluator-yaml/evaluator-data-release.yaml
✅ Found configuration (2/2): petsard/demo/petsard-yaml/evaluator-yaml/evaluator-specific-task-application.yaml


# Execution and Results

## Recommended Evaluation Workflow: Data Release

In [3]:
display_yaml_info(yaml_path[0])
# Use a name other than `exec` so the builtin used in the first cell is not shadowed
executor = Executor(yaml_path[0])
executor.run()
display_results(executor.get_result())

📋 YAML Configuration Files / YAML 設定檔案

📄 File: evaluator-data-release.yaml
📁 Path: petsard/demo/petsard-yaml/evaluator-yaml/evaluator-data-release.yaml

⚙️ Configuration content / 設定內容:
----------------------------------------
---
Splitter:
  external_split:
    method: custom_data
    filepath:
      ori: benchmark://adult-income_ori
      control: benchmark://adult-income_control
    schema:
      ori: benchmark://adult-income_schema
      control: benchmark://adult-income_schema
Synthesizer:
  external_data:
    method: custom_data
    filepath: benchmark://adult-income
    schema: benchmark://adult-income_schema
Evaluator:
  # Step 1: Data validity diagnosis (should be close to 1.0)
  validity_check:
    method: sdmetrics-diagnosticreport
  # Step 2: Privacy protection assessment (risk should be < 0.09)
  singling_out_risk:
    method: anonymeter-singlingout
    n_attacks: 400
    max_attempts: 4000
  linkability_risk:
    method: anonymeter-linkability
    aux_cols:
      -
     

Found 342 failed queries out of 400. Check DEBUG messages for more details.
Reached maximum number of attempts 4000 when generating singling out queries. Returning 1 instead of the requested 400.To avoid this, increase the number of attempts or set it to ``None`` to disable The limitation entirely.
Attack `multivariate` could generate only 1 singling out queries out of the requested 400. This can probably lead to an underestimate of the singling out risk.
  self._sanity_check()


Generating report ...

(1/2) Evaluating Column Shapes: |██████████| 15/15 [00:00<00:00, 64.30it/s]|
Column Shapes Score: 99.85%

(2/2) Evaluating Column Pair Trends: |██████████| 105/105 [00:00<00:00, 152.39it/s]|
Column Pair Trends Score: 99.64%

Overall Score (Average): 99.75%

📊 Execution Results / 執行結果

[1] Splitter[external_split_[1-1]]_Synthesizer[external_data]_Evaluator[validity_check]
------------------------------------------------------------
📦 Dictionary with 3 keys / 包含 3 個鍵的字典

  • global: DataFrame (1 rows × 3 columns)
    📋 Showing first 1 rows / 顯示前 1 行:
            Score  Data Validity  Data Structure
    result    1.0            1.0             1.0
    📝 Columns / 欄位: Score, Data Validity, Data Structure

  • columnwise: DataFrame (15 rows × 3 columns)
    📋 Showing first 3 rows / 顯示前 3 行:
                    Property             Metric  Score
    age        Data Validity  BoundaryAdherence    1.0
    workclass  Data Validity  CategoryAdherence    1.0
    fnlwgt     
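
The YAML comments above give two acceptance criteria: the validity score should be close to 1.0, and each privacy risk should be below 0.09. The sketch below turns those criteria into a pass/fail check. The function `check_release_thresholds` is hypothetical, the `min_validity=0.99` cutoff is only one assumed reading of "close to 1.0", and the risk values in the usage lines are made-up placeholders rather than results from this run.

```python
def check_release_thresholds(
    validity_score: float,
    privacy_risks: dict[str, float],
    min_validity: float = 0.99,  # assumption: one reading of "close to 1.0"
    max_risk: float = 0.09,  # from the YAML comment: risk should be < 0.09
) -> dict[str, bool]:
    """Return a pass/fail flag for the validity check and each privacy risk."""
    checks = {"validity_check": validity_score >= min_validity}
    for name, risk in privacy_risks.items():
        checks[name] = risk < max_risk
    return checks


# 1.0 is the global Score shown above; the two risk values are placeholders.
checks = check_release_thresholds(
    validity_score=1.0,
    privacy_risks={"singling_out_risk": 0.05, "linkability_risk": 0.12},
)
for name, passed in checks.items():
    print(f"{name}: {'pass' if passed else 'FAIL'}")
```

A passing singling-out check should be read with caution in this run: the warning above reports that only 1 of the 400 requested queries was generated, which can lead to an underestimate of the risk.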

## Recommended Evaluation Workflow: Specific Task Application

In [4]:
display_yaml_info(yaml_path[1])
# Use a name other than `exec` so the builtin used in the first cell is not shadowed
executor = Executor(yaml_path[1])
executor.run()
display_results(executor.get_result())

📋 YAML Configuration Files / YAML 設定檔案

📄 File: evaluator-specific-task-application.yaml
📁 Path: petsard/demo/petsard-yaml/evaluator-yaml/evaluator-specific-task-application.yaml

⚙️ Configuration content / 設定內容:
----------------------------------------
---
Splitter:
  external_split:
    method: custom_data
    filepath:
      ori: benchmark://adult-income_ori
      control: benchmark://adult-income_control
    schema:
      ori: benchmark://adult-income_schema
      control: benchmark://adult-income_schema
Synthesizer:
  external_data:
    method: custom_data
    filepath: benchmark://adult-income
    schema: benchmark://adult-income_schema
Evaluator:
  # Step 1: Data validity diagnosis (should be close to 1.0)
  validity_check:
    method: sdmetrics-diagnosticreport
  # Step 2: Privacy protection assessment (risk should be < 0.09)
  singling_out_risk:
    method: anonymeter-singlingout
    n_attacks: 400
    max_attempts: 4000
  linkability_risk:
    method: anonymeter-linkability
 

Found 334 failed queries out of 400. Check DEBUG messages for more details.
Reached maximum number of attempts 4000 when generating singling out queries. Returning 0 instead of the requested 400.To avoid this, increase the number of attempts or set it to ``None`` to disable The limitation entirely.
Attack `multivariate` could generate only 0 singling out queries out of the requested 400. This can probably lead to an underestimate of the singling out risk.
  self._sanity_check()


Generating report ...

(1/2) Evaluating Column Shapes: |██████████| 15/15 [00:00<00:00, 70.93it/s]|
Column Shapes Score: 99.85%

(2/2) Evaluating Column Pair Trends: |██████████| 105/105 [00:00<00:00, 174.43it/s]|
Column Pair Trends Score: 99.64%

Overall Score (Average): 99.75%

📊 Execution Results / 執行結果

[1] Splitter[external_split_[1-1]]_Synthesizer[external_data]_Evaluator[validity_check]
------------------------------------------------------------
📦 Dictionary with 3 keys / 包含 3 個鍵的字典

  • global: DataFrame (1 rows × 3 columns)
    📋 Showing first 1 rows / 顯示前 1 行:
            Score  Data Validity  Data Structure
    result    1.0            1.0             1.0
    📝 Columns / 欄位: Score, Data Validity, Data Structure

  • columnwise: DataFrame (15 rows × 3 columns)
    📋 Showing first 3 rows / 顯示前 3 行:
                    Property             Metric  Score
    age        Data Validity  BoundaryAdherence    1.0
    workclass  Data Validity  CategoryAdherence    1.0
    fnlwgt     
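
Each result above is labelled with a composite name such as `Splitter[external_split_[1-1]]_Synthesizer[external_data]_Evaluator[validity_check]`. Assuming `get_result()` returns a dictionary keyed by these names (the display suggests this, but it is not confirmed here), the sketch below splits a name into its module parts and filters results by evaluator. Both helpers are hypothetical and are not part of PETsARD.

```python
def parse_experiment_key(key: str) -> dict[str, str]:
    """Split 'Module[name]_Module[name]...' into {module: name}.

    Bracket depth is tracked so nested names like 'external_split_[1-1]' stay whole.
    """
    parsed: dict[str, str] = {}
    depth = 0
    module_start = 0
    name_start = 0
    module = ""
    for i, ch in enumerate(key):
        if ch == "[":
            if depth == 0:
                module = key[module_start:i].lstrip("_")
                name_start = i + 1
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError(f"Unbalanced brackets in {key!r}")
            if depth == 0:
                parsed[module] = key[name_start:i]
                module_start = i + 1
    if depth != 0:
        raise ValueError(f"Unbalanced brackets in {key!r}")
    return parsed


def select_by_evaluator(results: dict, evaluator_name: str) -> dict:
    """Keep only the entries whose Evaluator part equals `evaluator_name`."""
    return {
        key: value
        for key, value in results.items()
        if parse_experiment_key(key).get("Evaluator") == evaluator_name
    }


key = "Splitter[external_split_[1-1]]_Synthesizer[external_data]_Evaluator[validity_check]"
parts = parse_experiment_key(key)
print(parts["Splitter"], parts["Synthesizer"], parts["Evaluator"])
```

With a real run, `select_by_evaluator(executor.get_result(), "validity_check")` would keep only the validity report, provided the assumption about the result keys holds.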