# Environment setup

In [1]:
import os
import requests
from pathlib import Path


# determine branch, default is main
branch = 'main'

# Check if running in Google Colab
is_colab = 'COLAB_GPU' in os.environ

if is_colab:
    # Download the utils.py file from GitHub
    utils_url = f"https://raw.githubusercontent.com/nics-tw/petsard/{branch}/demo/utils.py"
    # Use a timeout so a stalled connection cannot hang the notebook
    response = requests.get(utils_url, timeout=30)

    if response.status_code == 200:
        # Save the utils.py file
        with open('utils.py', 'w', encoding='utf-8') as f:
            f.write(response.text)

        # Create an empty __init__.py
        Path('__init__.py').touch()
    else:
        raise RuntimeError(f"Failed to download utils.py. Status code: {response.status_code}")

In [2]:
# Now import and run the setup
from utils import (
    get_yaml_path,
    setup_environment,
)


setup_environment(
    is_colab,
    branch,
    benchmark_data=[
        'adult-income_ori',
        'adult-income_control',
        'adult-income_syn',
    ]
)

Obtaining file:///Users/justyn.chen/Dropbox/310_Career_%E5%B7%A5%E4%BD%9C/20231016_NICS_%E8%B3%87%E5%AE%89%E9%99%A2/41_PETsARD/petsard
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Checking if build backend supports build_editable: started
  Checking if build backend supports build_editable: finished with status 'done'
  Getting requirements to build editable: started
  Getting requirements to build editable: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: finished with status 'done'
  Preparing editable metadata (pyproject.toml): started
  Preparing editable metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: petsard
  Building editable for petsard (pyproject.toml): started
  Building editable for petsard (pyproject.toml): finished with status 'done'
  Created wheel for petsard: filename=petsard-1.0.0-py3-none-any.whl size=6548 

In [3]:
from petsard import Executor

# YAML Configuration for PETsARD

## External Synthesis with Default Evaluation

In [4]:
yaml_file_case: str = 'external-synthesis-default-evaluation.yaml'

yaml_path_case: str = get_yaml_path(
    is_colab=is_colab,
    yaml_file=yaml_file_case,
    branch=branch,
)

Configuration content:
---
Splitter:
  custom:
    method: 'custom_data'
    filepath:
      ori: 'benchmark/adult-income_ori.csv'
      control: 'benchmark/adult-income_control.csv'
Synthesizer:
  custom:
    method: 'custom_data'
    filepath: 'benchmark/adult-income_syn.csv'
Evaluator:
  demo-diagnostic:
    method: 'sdmetrics-diagnosticreport'
  demo-quality:
    method: 'sdmetrics-qualityreport'
  demo-singlingout:
    method: 'anonymeter-singlingout'
  demo-linkability:
    method: 'anonymeter-linkability'
    aux_cols:
      -
        - 'age'
        - 'marital-status'
        - 'relationship'
        - 'gender'
      -
        - 'workclass'
        - 'educational-num'
        - 'occupation'
        - 'income'
  demo-inference:
    method: 'anonymeter-inference'
    secret: 'income'
  demo-classification:
    method: 'mlutility-classification'
    target: 'income'
Reporter:
  save_report_global:
    method: 'save_report'
    granularity: 'global'
...
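
In the `demo-linkability` entry above, `aux_cols` is a pair of column lists: each list names the attributes assumed to be visible in one of the two datasets that the linkability attack tries to join. The following is a minimal sketch of a sanity check one might run before starting a long evaluation. `check_aux_cols` is a hypothetical helper, not part of PETsARD or Anonymeter, and the requirement that the two groups be disjoint is an assumption based on how they are written in this configuration.

```python
# The two column groups copied from the `demo-linkability` entry above.
aux_cols = (
    ["age", "marital-status", "relationship", "gender"],
    ["workclass", "educational-num", "occupation", "income"],
)


def check_aux_cols(groups, available_columns=None):
    """Hypothetical helper: validate a pair of auxiliary column groups.

    Raises ValueError if there are not exactly two groups, if the groups
    share a column (assumed undesirable here), or if a column is missing
    from `available_columns` when that argument is given.
    """
    if len(groups) != 2:
        raise ValueError(f"expected 2 column groups, got {len(groups)}")

    first, second = (set(group) for group in groups)

    overlap = first & second
    if overlap:
        raise ValueError(f"columns appear in both groups: {sorted(overlap)}")

    if available_columns is not None:
        missing = (first | second) - set(available_columns)
        if missing:
            raise ValueError(f"columns not found in data: {sorted(missing)}")

    return sorted(first), sorted(second)


first_group, second_group = check_aux_cols(aux_cols)
print(first_group)
print(second_group)
```

Passing the column names of the loaded CSV as `available_columns` would also catch a misspelled column before the evaluator runs.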


### Execution and Results

In [5]:
exec_case = Executor(config=yaml_path_case)
exec_case.run()

INFO:root:workclass changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:education changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:educational-num changes data dtype from int8 to float32 for metadata alignment.
INFO:root:marital-status changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:occupation changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:relationship changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:race changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:gender changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:capital-gain changes data dtype from int8 to float32 for metadata alignment.
INFO:root:capital-loss changes data dtype from int8 to float32 for metadata alignment.


Generating report ...

(1/2) Evaluating Data Validity: |██████████| 15/15 [00:00<00:00, 1374.73it/s]|
Data Validity Score: 100.0%

(2/2) Evaluating Data Structure: |██████████| 1/1 [00:00<00:00, 570.73it/s]|
Data Structure Score: 100.0%



INFO:PETsARD.Evaluator:Completed Evaluator execution (elapsed: 0:00:00)
INFO:PETsARD.Executor:Executing Reporter with save_report_global
INFO:PETsARD.Reporter:Starting Reporter execution
INFO:PETsARD.Reporter:Completed Reporter execution (elapsed: 0:00:00)
INFO:PETsARD.Executor:Executing Evaluator with demo-quality
INFO:PETsARD.Evaluator:Starting Evaluator execution
INFO:root:workclass changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:education changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:educational-num changes data dtype from int8 to float32 for metadata alignment.
INFO:root:marital-status changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:occupation changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:relationship changes data dtype from category[object] to category[object] for metadata alignment.
INFO:ro

Overall Score (Average): 100.0%

Now is petsard[Report]_[global] save to csv...
Generating report ...

(1/2) Evaluating Column Shapes: |██████████| 15/15 [00:00<00:00, 178.36it/s]|
Column Shapes Score: 95.27%

(2/2) Evaluating Column Pair Trends: |          | 0/105 [00:00<?, ?it/s]|

  contingency_real = real.groupby(list(columns), dropna=False).size() / len(real)
  contingency_synthetic = synthetic.groupby(list(columns), dropna=False).size() / len(

(2/2) Evaluating Column Pair Trends: |██████████| 105/105 [00:00<00:00, 313.16it/s]|
Column Pair Trends Score: 61.55%



INFO:PETsARD.Evaluator:Completed Evaluator execution (elapsed: 0:00:00)
INFO:PETsARD.Executor:Executing Reporter with save_report_global
INFO:PETsARD.Reporter:Starting Reporter execution
INFO:PETsARD.Reporter:Completed Reporter execution (elapsed: 0:00:00)
INFO:PETsARD.Executor:Executing Evaluator with demo-singlingout
INF

Overall Score (Average): 78.41%

Now is petsard[Report]_[global] save to csv...


INFO:PETsARD.Evaluator:Completed Evaluator execution (elapsed: 0:12:04)
INFO:PETsARD.Executor:Executing Reporter with save_report_global
INFO:PETsARD.Reporter:Starting Reporter execution
INFO:PETsARD.Reporter:Completed Reporter execution (elapsed: 0:00:00)
INFO:PETsARD.Executor:Executing Evaluator with demo-linkability
INFO:PETsARD.Evaluator:Starting Evaluator execution
INFO:root:workclass changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:education changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:educational-num changes data dtype from int8 to float32 for metadata alignment.
INFO:root:marital-status changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:occupation changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:relationship changes data dtype from category[object] to category[object] for metadata alignment.
INF

Now is petsard[Report]_[global] save to csv...


  self._sanity_check()
INFO:PETsARD.Evaluator:Completed Evaluator execution (elapsed: 0:00:04)
INFO:PETsARD.Executor:Executing Reporter with save_report_global
INFO:PETsARD.Reporter:Starting Reporter execution
INFO:PETsARD.Reporter:Completed Reporter execution (elapsed: 0:00:00)
INFO:PETsARD.Executor:Executing Evaluator with demo-inference
INFO:PETsARD.Evaluator:Starting Evaluator execution
INFO:root:workclass changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:education changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:educational-num changes data dtype from int8 to float32 for metadata alignment.
INFO:root:marital-status changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:occupation changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:relationship changes data dtype from category[object] to category[object] for me

Now is petsard[Report]_[global] save to csv...


INFO:PETsARD.Evaluator:Completed Evaluator execution (elapsed: 0:00:02)
INFO:PETsARD.Executor:Executing Reporter with save_report_global
INFO:PETsARD.Reporter:Starting Reporter execution
INFO:PETsARD.Reporter:Completed Reporter execution (elapsed: 0:00:00)
INFO:PETsARD.Executor:Executing Evaluator with demo-classification
INFO:PETsARD.Evaluator:Starting Evaluator execution
INFO:root:workclass changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:education changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:educational-num changes data dtype from int8 to float32 for metadata alignment.
INFO:root:marital-status changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:occupation changes data dtype from category[object] to category[object] for metadata alignment.
INFO:root:relationship changes data dtype from category[object] to category[object] for metadata alignment.


Now is petsard[Report]_[global] save to csv...


INFO:PETsARD.Evaluator:Completed Evaluator execution (elapsed: 0:03:08)
INFO:PETsARD.Executor:Executing Reporter with save_report_global
INFO:PETsARD.Reporter:Starting Reporter execution
INFO:PETsARD.Reporter:Completed Reporter execution (elapsed: 0:00:00)
INFO:PETsARD.Executor:Completed PETsARD execution workflow (elapsed: 0:15:18)


Now is petsard[Report]_[global] save to csv...


In [12]:
exec_case.get_result()[
    'Splitter[custom_[1-1]]_Synthesizer[custom]_Evaluator[demo-classification]_Reporter[save_report_global]'
]['[global]']

Unnamed: 0,full_expt_name,Splitter,Synthesizer,Evaluator,demo-diagnostic_Score,demo-diagnostic_Data Validity,demo-diagnostic_Data Structure,demo-quality_Score,demo-quality_Column Shapes,demo-quality_Column Pair Trends,...,demo-inference_attack_rate_err,demo-inference_baseline_rate,demo-inference_baseline_rate_err,demo-inference_control_rate,demo-inference_control_rate_err,demo-classification_ori_mean,demo-classification_ori_std,demo-classification_syn_mean,demo-classification_syn_std,demo-classification_diff
0,Splitter[custom_[1-1]]_Synthesizer[custom]_Eva...,custom_[1-1],custom,[global],1.0,1.0,1.0,0.784126,0.952705,0.615548,...,0.019882,0.647716,0.020913,0.69263,0.020199,0.851418,0.007994,0.777024,0.007439,-0.074394
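
The figures in the log and in the result row above can be cross-checked with a little arithmetic. The sketch below assumes that the quality report's "Overall Score (Average)" is the plain mean of the two property scores, and that `demo-classification_diff` is the synthetic-data mean minus the original-data mean. Both assumptions are inferred from the labels and the numbers shown here rather than from the library source.

```python
# Property scores printed by the quality report, in percent.
column_shapes = 95.27
column_pair_trends = 61.55

# Assumed: the overall score is the unweighted mean of the property scores.
overall_quality = (column_shapes + column_pair_trends) / 2
print(f"Overall quality score: {overall_quality:.2f}%")  # 78.41%

# Classification utility values taken from the result row.
ori_mean = 0.851418  # demo-classification_ori_mean
syn_mean = 0.777024  # demo-classification_syn_mean

# Assumed: diff = synthetic mean - original mean.
classification_diff = syn_mean - ori_mean
print(f"Classification diff: {classification_diff:.6f}")  # -0.074394
```

Both values agree with the reported `Overall Score (Average): 78.41%` and with `demo-classification_diff` of `-0.074394`. The negative difference means that models trained on the synthetic data scored about 7.4 percentage points lower than models trained on the original data.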
