# Environment setting / 環境設定

In [1]:
import os  # noqa: I001
import sys
from pathlib import Path


# Load demo_utils.py: download it on Colab, otherwise search for it locally
if "COLAB_GPU" in os.environ:
    import urllib.request

    demo_utils_url = (
        "https://raw.githubusercontent.com/nics-tw/petsard/main/demo/demo_utils.py"
    )
    exec(urllib.request.urlopen(demo_utils_url).read().decode("utf-8"))
else:
    # Search the working directory and up to 10 of its parents for demo_utils.py
    for p in [Path.cwd()] + list(Path.cwd().parents)[:10]:
        utils_path = p / "demo_utils.py"
        if utils_path.exists() and "demo" in str(utils_path):
            sys.path.insert(0, str(p))
            exec(utils_path.read_text(encoding="utf-8"))
            break

📂 Current working directory: demo/petsard-yaml/splitter-yaml
✅ PETsARD demo_utils loaded. Use quick_setup() to initialize.


## Quick setup / 快速設定: Splitter YAML

In [2]:
from demo_utils import display_results, display_yaml_info, quick_setup  # noqa: I001
from petsard import Executor  # noqa: I001


is_colab, branch, yaml_path = quick_setup(
    yaml_file=[
        "basic-splitting.yaml",
        "controlled-overlap.yaml",
        "no-overlap-splitting.yaml",
    ],
    benchmark_data=[
        "adult-income",
        "adult-income_schema",
    ],
    petsard_branch="main",
)

✅ Changed working directory to demo: petsard/demo
   📁 Notebook location: demo/petsard-yaml/splitter-yaml/
   🔍 YAML search priority: 
      1. demo/petsard-yaml/splitter-yaml/
      2. demo/
   💾 Output files will be saved in: demo/
🚀 PETsARD v1.7.0
📅 2025-10-09 00:46:42 UTC+8
✅ Loaded benchmark dataset: adult-income
✅ Downloaded benchmark schema: adult-income_schema
📁 Processing YAML files from subfolder: petsard-yaml/splitter-yaml
✅ Found YAML (1/3): petsard/demo/petsard-yaml/splitter-yaml/basic-splitting.yaml
✅ Found YAML (2/3): petsard/demo/petsard-yaml/splitter-yaml/controlled-overlap.yaml
✅ Found YAML (3/3): petsard/demo/petsard-yaml/splitter-yaml/no-overlap-splitting.yaml


# Execution and Results / 執行與結果

## Basic Splitting / 基本分割

In [3]:
display_yaml_info(yaml_path[0])
executor = Executor(yaml_path[0])  # avoid shadowing the built-in exec() used above
executor.run()
display_results(executor.get_result())

📋 YAML Configuration Files / YAML 設定檔案

📄 File: basic-splitting.yaml
📁 Path: petsard/demo/petsard-yaml/splitter-yaml/basic-splitting.yaml

⚙️ Configuration content / 設定內容:
----------------------------------------
Loader:
  load_benchmark_with_schema:
    filepath: benchmark://adult-income
    schema: benchmark://adult-income_schema
Splitter:
  basic_split:
    num_samples: 3
    train_split_ratio: 0.8
📊 Execution Results / 執行結果

[1] Loader[load_benchmark_with_schema]_Splitter[basic_split_[3-1]]
------------------------------------------------------------
📦 Dictionary with 2 keys / 包含 2 個鍵的字典
  • train: DataFrame
  • validation: DataFrame

[2] Loader[load_benchmark_with_schema]_Splitter[basic_split_[3-2]]
------------------------------------------------------------
📦 Dictionary with 2 keys / 包含 2 個鍵的字典
  • train: DataFrame
  • validation: DataFrame

[3] Loader[load_benchmark_with_schema]_Splitter[basic_split_[3-3]]
------------------------------------------------------------
📦 Dictionar
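With `num_samples: 3` and `train_split_ratio: 0.8`, the splitter produces three separate train/validation partitions of the same data, which is why the output lists results [1] to [3] (the `[3-1]` to `[3-3]` suffixes appear to number those draws). The sketch below illustrates that idea on row indices using only the standard library. It is an assumption about the behaviour, not PETsARD's implementation: the function name `random_train_validation_splits` is hypothetical, and truncating the training size with `int()` is a guess at how PETsARD rounds.

```python
import random


def random_train_validation_splits(n_rows, num_samples, train_split_ratio, random_state=None):
    """Hypothetical sketch (not PETsARD's code): draw `num_samples` independent
    train/validation partitions of the row indices 0..n_rows-1."""
    rng = random.Random(random_state)
    n_train = int(n_rows * train_split_ratio)  # rounding rule is an assumption
    splits = []
    for _ in range(num_samples):
        train = set(rng.sample(range(n_rows), n_train))
        # Every row not drawn for training goes to validation
        validation = set(range(n_rows)) - train
        splits.append({"train": sorted(train), "validation": sorted(validation)})
    return splits


splits = random_train_validation_splits(
    n_rows=100, num_samples=3, train_split_ratio=0.8, random_state=0
)
for i, s in enumerate(splits, start=1):
    print(f"split {i}: train={len(s['train'])}, validation={len(s['validation'])}")
```

Because each draw is independent, two training sets can share many rows; the remaining configurations add limits on that overlap.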

## Controlled Overlap / 控制重疊比例

In [7]:
display_yaml_info(yaml_path[1])
executor = Executor(yaml_path[1])
executor.run()
display_results(executor.get_result())

📋 YAML Configuration Files / YAML 設定檔案

📄 File: controlled-overlap.yaml
📁 Path: petsard/demo/petsard-yaml/splitter-yaml/controlled-overlap.yaml

⚙️ Configuration content / 設定內容:
----------------------------------------
Loader:
  load_benchmark_with_schema:
    filepath: benchmark://adult-income
    schema: benchmark://adult-income_schema
Splitter:
  controlled_overlap:
    num_samples: 10
    train_split_ratio: 0.8
    max_overlap_ratio: 0.8  # Allow 80% overlap
    max_attempts: 500
    random_state: 42
📊 Execution Results / 執行結果

[1] Loader[load_benchmark_with_schema]_Splitter[controlled_overlap_[10-01]]
------------------------------------------------------------
📦 Dictionary with 2 keys / 包含 2 個鍵的字典
  • train: DataFrame
  • validation: DataFrame

[2] Loader[load_benchmark_with_schema]_Splitter[controlled_overlap_[10-02]]
------------------------------------------------------------
📦 Dictionary with 2 keys / 包含 2 個鍵的字典
  • train: DataFrame
  • validation: DataFrame

[3] Loader[load_
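The `max_overlap_ratio`, `max_attempts`, and `random_state` keys suggest a rejection-sampling loop: draw a candidate training set, keep it only if its overlap with every previously accepted training set stays within the limit, and stop after `max_attempts` draws. The sketch below assumes that reading and also assumes overlap is measured as shared rows divided by the training-set size; PETsARD's actual definition and attempt accounting may differ, and `controlled_overlap_splits` is a hypothetical name.

```python
import random


def controlled_overlap_splits(
    n_rows,
    num_samples,
    train_split_ratio,
    max_overlap_ratio,
    max_attempts,
    random_state=None,
):
    """Hypothetical sketch (not PETsARD's code): rejection sampling where the
    overlap of two training sets is assumed to be |A & B| / |A|."""
    rng = random.Random(random_state)
    n_train = int(n_rows * train_split_ratio)
    if n_train == 0:
        raise ValueError("train_split_ratio is too small for n_rows")
    accepted = []
    for _ in range(max_attempts):
        if len(accepted) == num_samples:
            break
        candidate = set(rng.sample(range(n_rows), n_train))
        # Keep the candidate only if it stays within the overlap limit
        # against every training set accepted so far
        if all(len(candidate & prev) / n_train <= max_overlap_ratio for prev in accepted):
            accepted.append(candidate)
    if len(accepted) < num_samples:
        raise RuntimeError(
            f"only {len(accepted)}/{num_samples} samples after {max_attempts} attempts"
        )
    all_rows = set(range(n_rows))
    return [{"train": sorted(t), "validation": sorted(all_rows - t)} for t in accepted]


splits = controlled_overlap_splits(
    n_rows=1_000,
    num_samples=10,
    train_split_ratio=0.8,
    max_overlap_ratio=0.9,
    max_attempts=500,
    random_state=42,
)
print(len(splits), len(splits[0]["train"]), len(splits[0]["validation"]))
```

The example uses a limit of 0.9 rather than the YAML's 0.8 on purpose: under this sketch's definition, two independent draws with ratio r share about m·m/n = r·m rows on average, so their overlap ratio is already about r, and a limit equal to r would reject roughly half of all candidates. How strict a given `max_overlap_ratio` is therefore depends on how overlap is defined.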

## No Overlap Splitting / 無重疊分割

In [None]:
display_yaml_info(yaml_path[2])
executor = Executor(yaml_path[2])
executor.run()
display_results(executor.get_result())

📋 YAML Configuration Files / YAML 設定檔案

📄 File: no-overlap-splitting.yaml
📁 Path: petsard/demo/petsard-yaml/splitter-yaml/no-overlap-splitting.yaml

⚙️ Configuration content / 設定內容:
----------------------------------------
Loader:
  load_benchmark_with_schema:
    filepath: benchmark://adult-income
    schema: benchmark://adult-income_schema
Splitter:
  no_overlap:
    num_samples: 2
    train_split_ratio: 0.01
    max_overlap_ratio: 0.0  # Completely non-overlapping
    max_attempts: 100000
    random_state: 42
📊 Execution Results / 執行結果

[1] Loader[load_benchmark_with_schema]_Splitter[no_overlap_[2-1]]
------------------------------------------------------------
📦 Dictionary with 2 keys / 包含 2 個鍵的字典
  • train: DataFrame
  • validation: DataFrame

[2] Loader[load_benchmark_with_schema]_Splitter[no_overlap_[2-2]]
------------------------------------------------------------
📦 Dictionary with 2 keys / 包含 2 個鍵的字典
  • train: DataFrame
  • validation: DataFrame

✅ Total results / 總結果數: 2
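This configuration pairs `max_overlap_ratio: 0.0` with a very small `train_split_ratio` (0.01) and a large `max_attempts` (100000). If the splitter draws each training set uniformly at random and rejects any candidate that shares a row with an earlier one (an assumption about its internals), the chance that two training sets of m rows drawn from n rows are disjoint is the hypergeometric term C(n − m, m) / C(n, m), which falls off roughly like exp(−m²/n). The short calculation below uses `math.comb` (Python 3.8+) to make that trade-off concrete; the 10,000-row table and the function name `prob_disjoint` are illustrative and hypothetical, not the adult-income dataset's real size.

```python
import math


def prob_disjoint(n_rows, train_split_ratio):
    """Probability that two independent, uniformly random training sets of
    int(n_rows * train_split_ratio) rows share no row (hypothetical helper)."""
    m = int(n_rows * train_split_ratio)
    # The second set must be chosen entirely from the n - m rows the first
    # set did not use; math.comb returns 0 when that is impossible (2m > n).
    return math.comb(n_rows - m, m) / math.comb(n_rows, m)


for ratio in (0.001, 0.01, 0.05):
    p = prob_disjoint(10_000, ratio)
    print(f"ratio={ratio}: P(disjoint)={p:.3g}, expected attempts ~ {1 / p:,.0f}")
```

Under these assumptions, a 1% training set needs only a few attempts per extra sample, while a 5% set would need on the order of 10^10, which is why a completely non-overlapping split is only practical with small training fractions.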
