# Environment setup

In [1]:
import os  # noqa: I001
import sys
from pathlib import Path


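# Load demo_utils.py: download it on Colab, otherwise search locally
if "COLAB_GPU" in os.environ:
    import urllib.request

    demo_utils_url = (
        "https://raw.githubusercontent.com/nics-tw/petsard/main/demo/demo_utils.py"
    )
    # Run the downloaded helper in this namespace; the response is closed afterwards
    with urllib.request.urlopen(demo_utils_url) as response:
        exec(response.read().decode("utf-8"))
else:
    # Search the current directory and up to 10 parent directories for demo_utils.py
    for p in [Path.cwd()] + list(Path.cwd().parents)[:10]:
        utils_path = p / "demo_utils.py"
        if utils_path.exists() and "demo" in str(utils_path):
            sys.path.insert(0, str(p))
            # Read via pathlib so the file handle is closed after reading
            exec(utils_path.read_text(encoding="utf-8"))
            break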

📂 Current working directory: demo/petsard-yaml/describer-yaml
✅ PETsARD demo_utils loaded. Use quick_setup() to initialize.


## Quick setup: Describer YAML

In [2]:
from demo_utils import display_results, display_yaml_info, quick_setup  # noqa: I001
from petsard import Executor  # noqa: I001


is_colab, branch, yaml_path = quick_setup(
    config_file=[
        "describer-describe.yaml",
        "describer-compare.yaml",
        "describer-custom-compare.yaml",
    ],
    benchmark_data=None,
    petsard_branch="main",
)

✅ Changed working directory to demo: petsard/demo
   📁 Notebook location: demo/petsard-yaml/describer-yaml/
   🔍 YAML search priority: 
      1. demo/petsard-yaml/describer-yaml/
      2. demo/
   💾 Output files will be saved in: demo/
🚀 PETsARD v1.7.0
📅 2025-10-16 15:31:22 UTC+8
🔧 Added to Python path: petsard/demo/petsard-yaml/describer-yaml
📁 Processing configuration files from subfolder: petsard-yaml/describer-yaml
✅ Found configuration (1/3): petsard/demo/petsard-yaml/describer-yaml/describer-describe.yaml
✅ Found configuration (2/3): petsard/demo/petsard-yaml/describer-yaml/describer-compare.yaml
✅ Found configuration (3/3): petsard/demo/petsard-yaml/describer-yaml/describer-custom-compare.yaml
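
The search priority reported above (the notebook folder first, then `demo/`) amounts to a first-match lookup over an ordered list of directories. The following is a minimal sketch of that idea only; it makes no claim about how `quick_setup` is actually implemented, and `find_config` with its parameters is a hypothetical name introduced for illustration.

```python
from pathlib import Path


def find_config(filename, search_dirs):
    """Return the first existing path for filename, trying search_dirs in order.

    Hypothetical helper for illustration; it is not part of demo_utils.
    """
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    # No directory contained the file
    return None
```

Under these assumptions, `find_config("describer-describe.yaml", ["demo/petsard-yaml/describer-yaml", "demo"])` would return the copy in the notebook folder when both directories contain the file.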


# Execution and results

## Single Dataset Description (describe mode)

In [3]:
display_yaml_info(yaml_path[0])
executor = Executor(yaml_path[0])  # avoid shadowing the built-in exec()
executor.run()
display_results(executor.get_result())

📋 YAML Configuration Files / YAML 設定檔案

📄 File: describer-describe.yaml
📁 Path: petsard/demo/petsard-yaml/describer-yaml/describer-describe.yaml

⚙️ Configuration content / 設定內容:
----------------------------------------
---
Synthesizer:
  external_data:
    method: custom_data
    filepath: benchmark://adult-income_syn
    schema: benchmark://adult-income_schema
Describer:
  describer-describe:
    method: default    # Auto-detects as describe (single source)
    source: Synthesizer
...
📊 Execution Results / 執行結果

[1] Synthesizer[external_data]_Describer[describer-describe]
------------------------------------------------------------
📦 Dictionary with 3 keys / 包含 3 個鍵的字典

  • global: DataFrame (1 rows × 3 columns)
    📋 Showing first 1 rows / 顯示前 1 行:
       row_count  col_count  na_count
    0      39073         15      4078
    📝 Columns / 欄位: row_count, col_count, na_count

  • columnwise: DataFrame (15 rows × 11 columns)
    📋 Showing first 3 rows / 顯示前 3 行:
                       

## Dataset Comparison (compare mode)

In [4]:
display_yaml_info(yaml_path[1])
executor = Executor(yaml_path[1])  # avoid shadowing the built-in exec()
executor.run()
display_results(executor.get_result())

📋 YAML Configuration Files / YAML 設定檔案

📄 File: describer-compare.yaml
📁 Path: petsard/demo/petsard-yaml/describer-yaml/describer-compare.yaml

⚙️ Configuration content / 設定內容:
----------------------------------------
---
Splitter:
  external_split:
    method: custom_data
    filepath:
      ori: benchmark://adult-income_ori
      control: benchmark://adult-income_control
    schema:
      ori: benchmark://adult-income_schema
      control: benchmark://adult-income_schema
Synthesizer:
  external_data:
    method: custom_data
    filepath: benchmark://adult-income_syn
    schema: benchmark://adult-income_schema
Describer:
  describer-compare:
    method: default           # Auto-detects as compare (two sources)
    source:
      base: Splitter.train    # Use Splitter's train output as base
      target: Synthesizer     # Compare with Synthesizer's output
...
📊 Execution Results / 執行結果

[1] Splitter[external_split_[1-1]]_Synthesizer[external_data]_Describer[describer-compare]
----------
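
The YAML comments in the two configurations above say that `method: default` resolves to describe mode for a single source and to compare mode for two sources. A rough sketch of such a rule is shown below. It assumes that a single source is written as a plain module name and that two sources are written as a mapping with `base` and `target` keys, as in the configurations shown; `detect_mode` is a hypothetical helper, not a PETsARD function.

```python
def detect_mode(source):
    """Guess the Describer mode from the shape of the `source` setting.

    Hypothetical helper: a single module name is treated as describe mode,
    a mapping with `base` and `target` keys as compare mode.
    """
    if isinstance(source, str):
        return "describe"
    if isinstance(source, dict) and {"base", "target"}.issubset(source):
        return "compare"
    raise ValueError(f"Unsupported source specification: {source!r}")


print(detect_mode("Synthesizer"))  # describe
print(detect_mode({"base": "Splitter.train", "target": "Synthesizer"}))  # compare
```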

## Custom Comparison Method

In [5]:
display_yaml_info(yaml_path[2])
executor = Executor(yaml_path[2])  # avoid shadowing the built-in exec()
executor.run()
display_results(executor.get_result())

📋 YAML Configuration Files / YAML 設定檔案

📄 File: describer-custom-compare.yaml
📁 Path: petsard/demo/petsard-yaml/describer-yaml/describer-custom-compare.yaml

⚙️ Configuration content / 設定內容:
----------------------------------------
---
Loader:
  load_benchmark_with_schema:
    filepath: benchmark://adult-income
    schema: benchmark://adult-income_schema
Synthesizer:
  external_data:
    method: custom_data
    filepath: benchmark://adult-income_syn
    schema: benchmark://adult-income_schema
Describer:
  custom_comparison:
    method: compare
    source:
      base: Loader
      target: Synthesizer
    stats_method:
      - mean
      - std
      - nunique
      - jsdivergence
    compare_method: diff  # Use difference instead of percentage change
    aggregated_method: mean
    summary_method: mean
...
📊 Execution Results / 執行結果

[1] Loader[load_benchmark_with_schema]_Synthesizer[external_data]_Describer[custom_comparison]
------------------------------------------------------------
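
The configuration above selects `compare_method: diff` instead of percentage change and includes `jsdivergence` among the statistics. The sketch below illustrates what these terms usually mean, using only the standard library. Several points are assumptions rather than documented PETsARD behavior: both comparisons are computed as target relative to base, the Jensen-Shannon divergence uses a base-2 logarithm (so it lies between 0 and 1), and the value is the divergence itself rather than its square root. The names `compare_stat` and `js_divergence` are hypothetical.

```python
import math


def compare_stat(base, target, method="pct_change"):
    """Compare one statistic between base and target.

    Assumption: both methods are computed as target relative to base.
    """
    if method == "diff":
        return target - base
    if method == "pct_change":
        # Undefined when base is 0; this raises ZeroDivisionError in that case
        return (target - base) / abs(base)
    raise ValueError(f"Unknown compare method: {method!r}")


def js_divergence(p, q):
    """Jensen-Shannon divergence (base-2 log) of two discrete distributions.

    p and q are probability lists over the same categories, each summing to 1.
    """

    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # Mixture distribution halfway between p and q
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

With a base mean of 10.0 and a target mean of 12.0, `diff` gives 2.0 while `pct_change` gives 0.2. The difference keeps the original units of the statistic, which can be easier to read when base values are close to zero and a ratio would blow up.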
