# Environment setting / 環境設定

In [5]:
import os  # noqa: I001
import sys
from pathlib import Path


# On Colab, download demo_utils.py from GitHub and execute it
if "COLAB_GPU" in os.environ:
    import urllib.request

    demo_utils_url = (
        "https://raw.githubusercontent.com/nics-tw/petsard/main/demo/demo_utils.py"
    )
    # Use a context manager so the HTTP response is closed after reading
    with urllib.request.urlopen(demo_utils_url) as response:
        exec(response.read().decode("utf-8"))
else:
    # Locally, search the current directory and up to 10 parents for demo_utils.py
    for p in [Path.cwd()] + list(Path.cwd().parents)[:10]:
        utils_path = p / "demo_utils.py"
        if utils_path.exists() and "demo" in str(utils_path):
            sys.path.insert(0, str(p))
            # read_text() closes the file and decodes as UTF-8 on every platform
            exec(utils_path.read_text(encoding="utf-8"))
            break

📂 Current working directory: demo
✅ PETsARD demo_utils loaded. Use quick_setup() to initialize.


## Quick setup / 快速設定: Preprocessor YAML - Outlier Handling

In [6]:
from demo_utils import display_results, display_yaml_info, quick_setup  # noqa: I001
from petsard import Executor  # noqa: I001


is_colab, branch, yaml_path = quick_setup(
    config_file=[
        "preprocessor_outlier-field-specific.yaml",
        "preprocessor_outlier-global.yaml",
    ],
    benchmark_data=None,
    petsard_branch="main",
)

✅ Changed working directory to demo: petsard/demo
   📁 Notebook location: demo/petsard-yaml/preprocessor-yaml/
   🔍 YAML search priority: 
      1. demo/petsard-yaml/preprocessor-yaml/
      2. demo/
   💾 Output files will be saved in: demo/
🚀 PETsARD v1.8.0
📅 2025-10-20 02:16:37 UTC+8
📁 Processing configuration files from subfolder: petsard-yaml/preprocessor-yaml
✅ Found configuration (1/2): petsard/demo/petsard-yaml/preprocessor-yaml/preprocessor_outlier-field-specific.yaml
✅ Found configuration (2/2): petsard/demo/petsard-yaml/preprocessor-yaml/preprocessor_outlier-global.yaml


# Execution and Results / 執行與結果

## Field-Specific Outlier Methods / 欄位特定離群值方法

In [7]:
display_yaml_info(yaml_path[0])
exec_now = Executor(yaml_path[0])
exec_now.run()
display_results(exec_now.get_result())

📋 YAML Configuration Files / YAML 設定檔案

📄 File: preprocessor_outlier-field-specific.yaml
📁 Path: petsard/demo/petsard-yaml/preprocessor-yaml/preprocessor_outlier-field-specific.yaml

⚙️ Configuration content / 設定內容:
----------------------------------------
---
Loader:
  load_benchmark_with_schema:
    filepath: benchmark://adult-income
    schema: benchmark://adult-income_schema

Preprocessor:
  outlier-field-specific:
    sequence:
      - outlier
    config:
      outlier:
        age: 'outlier_zscore'       # Uses Z-Score method for age field
        fnlwgt: 'outlier_iqr'       # Uses IQR method for fnlwgt field
        education-num: None         # Skip outlier handling for this field

        # Note:
        # - Fields set to None: No outlier handling will be applied
        # - Fields not listed (e.g., capital-gain, capital-loss, hours-per-week):
        #   Will use the default method (outlier_iqr)

Reporter:
  save_data:
    method: save_data
    source:
      - Preprocessor
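
The notes in the configuration above describe three cases: a field mapped to a method, a field set to None (skipped), and an unlisted field that falls back to the default `outlier_iqr`. The following is a minimal sketch of that resolution logic. `resolve_outlier_methods` is a hypothetical helper written for illustration and is not part of the PETsARD API. Because a YAML parser reads a bare `None` as the string `"None"` rather than as null, the sketch treats both forms as "skip"; how PETsARD handles this internally is not shown here.

```python
# Hypothetical helper that mirrors the rules stated in the YAML comments.
# It is an illustration only, not PETsARD's internal implementation.
DEFAULT_METHOD = "outlier_iqr"  # default method named in the YAML note


def resolve_outlier_methods(numeric_fields, field_config, default=DEFAULT_METHOD):
    """Map each numeric field to its outlier method, or None when skipped."""
    resolved = {}
    for field in numeric_fields:
        if field in field_config:
            method = field_config[field]
            # Accept both a real null and the string "None" as "skip this field".
            resolved[field] = None if method in (None, "None") else method
        else:
            # Fields that are not listed fall back to the default method.
            resolved[field] = default
    return resolved


# Numeric fields of the adult-income benchmark as named in the configuration.
numeric_fields = [
    "age",
    "fnlwgt",
    "education-num",
    "capital-gain",
    "capital-loss",
    "hours-per-week",
]
field_config = {
    "age": "outlier_zscore",
    "fnlwgt": "outlier_iqr",
    "education-num": "None",
}

methods = resolve_outlier_methods(numeric_fields, field_config)
for field, method in methods.items():
    print(f"{field}: {method or 'skipped'}")
```

Under these assumptions, `age` and `fnlwgt` keep their listed methods, `education-num` is skipped, and the three unlisted fields resolve to `outlier_iqr`.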
  

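For readers unfamiliar with the two rules named in the field-specific configuration, here is a rough sketch of Z-score and IQR outlier detection using only the standard library. The cutoffs (an absolute z-score above 3, and 1.5 times the IQR beyond the quartiles) are conventional textbook values assumed for illustration. The thresholds and standard-deviation convention that `outlier_zscore` and `outlier_iqr` actually use in PETsARD may differ, and the function names below are hypothetical.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    if std == 0:
        return []  # all values are identical, so nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Toy data with one extreme value; not taken from the adult-income benchmark.
values = [22, 25, 27, 30, 31, 33, 35, 38, 40, 41, 44, 47, 52, 120]

print("z-score outliers:", zscore_outliers(values))
print("IQR outliers:", iqr_outliers(values))
```

Both rules flag only the extreme value in this toy sample. The Z-score rule depends on the mean and standard deviation, which the outlier itself inflates, while the IQR rule relies on quartiles and is less sensitive to a single extreme point.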
## Global Outlier Method / 全域離群值方法

In [8]:
display_yaml_info(yaml_path[1])
exec_now = Executor(yaml_path[1])
exec_now.run()
display_results(exec_now.get_result())

📋 YAML Configuration Files / YAML 設定檔案

📄 File: preprocessor_outlier-global.yaml
📁 Path: petsard/demo/petsard-yaml/preprocessor-yaml/preprocessor_outlier-global.yaml

⚙️ Configuration content / 設定內容:
----------------------------------------
---
Loader:
  load_benchmark_with_schema:
    filepath: 'benchmark://adult-income'
    schema: 'benchmark://adult-income_schema'

Preprocessor:
  outlier-global-simple:
    sequence:
      - outlier
    config:
      # Simplified syntax: apply one global method to ALL numerical fields
      outlier: 'outlier_isolationforest'

      # Available global methods:
      # - 'outlier_isolationforest': Isolation Forest algorithm
      # - 'outlier_lof': Local Outlier Factor algorithm

      # This is equivalent to:
      # outlier:
      #   age: 'outlier_isolationforest'
      #   fnlwgt: 'outlier_isolationforest'
      #   education-num: 'outlier_isolationforest'
      #   capital-gain: 'outlier_isolationforest'
      #   capital-loss: 'outlier_isolation