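 BT-TWD feasibility experiment on the Adult dataset

This notebook runs step by step: load config → read data → preprocess → bucket-tree partitioning → baseline and BTTWD k-fold experiments → bucket-level analysis.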

In [1]:
# Step 0: environment and path setup
import os, sys
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']  # font with CJK glyphs for non-ASCII labels
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly with this font

# Add the project root to sys.path so that bttwdlib can be imported
root_path = os.path.abspath(os.path.join(os.getcwd(), '..'))
if root_path not in sys.path:
    sys.path.append(root_path)

from bttwdlib import (
    load_yaml_cfg,
    show_cfg,
    load_dataset,
    prepare_features_and_labels,
    BucketTree,
    run_kfold_experiments,
    log_info,
    set_global_seed,
)

cfg_path = os.path.join(root_path, 'configs', 'adult_bttwd.yaml')
cfg = load_yaml_cfg(cfg_path)
# Fall back to seed 42 when the config has no SEED.global_seed entry
set_global_seed(cfg.get('SEED', {}).get('global_seed', 42))
log_info('【Step 0 summary】Environment ready; paths and random seed set.')


【INFO】【2025-12-20 21:50:14】【Config load】Read e:\yan\组\三支决策\机器学习\BT_TWD\configs\adult_bttwd.yaml
【INFO】【2025-12-20 21:50:22】【Step 0 summary】Environment ready; paths and random seed set.
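
What `set_global_seed` does internally is not shown in this notebook. As a rough sketch of a typical seeding helper, assuming it seeds Python's `random` module and NumPy when it is installed (the name `seed_everything` is hypothetical and is not part of `bttwdlib`):

```python
import os
import random


def seed_everything(seed: int = 42) -> None:
    """Hypothetical helper: seed the common sources of randomness."""
    # Recorded for child processes; it does not change hashing in the running process.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np  # optional dependency
        np.random.seed(seed)
    except ImportError:
        pass


# Re-seeding with the same value reproduces the same random sequence.
seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)
```

Libraries with their own generators (for example the `random_state` of an estimator) usually still need the seed passed explicitly.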


In [2]:
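# Step 1: display the loaded configuration
show_cfg(cfg)
log_info('【Step 1 summary】Configuration loaded; key parameters checked.')

【INFO】【2025-12-20 21:50:22】【Config-data】dataset=adult, k_folds=5, target column=income, positive class=">50K"
【INFO】【2025-12-20 21:50:22】【Config-BTTWD】threshold mode=None, global model=xgb, in-bucket model=none, posterior estimator (compatibility field)=logreg
【INFO】【2025-12-20 21:50:22】【Config-baselines】LogReg enabled=True, RandomForest enabled=True, KNN enabled=True, XGBoost enabled=True
【INFO】【2025-12-20 21:50:22】【Step 1 summary】Configuration loaded; key parameters checked.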


In [3]:
# Step 2: load the raw data
df_raw, target_col_model = load_dataset(cfg)  # returns the label column used for modeling (here "income")

display(df_raw.head())
print("Label column used for modeling:", target_col_model)

# 1) Plot the proportions of the 0/1 label (income <=50K vs. >50K)
class_counts = df_raw[target_col_model].value_counts(normalize=True)
ax = class_counts.plot(kind='bar', title='Proportion of income <=50K (0) vs. >50K (1)')
plt.ylabel('Proportion')

fig_path = os.path.join(root_path, cfg['OUTPUT']['figs_dir'], 'class_distribution.png')
os.makedirs(os.path.dirname(fig_path), exist_ok=True)
plt.savefig(fig_path, bbox_inches='tight')
plt.close()

# 2) Summary statistics of the target column named in the config
raw_target_col = cfg['DATA']['target_col']  # here "income"
print("Raw target column:", raw_target_col)
print(df_raw[raw_target_col].describe())

log_info('【Step 2 summary】Adult raw data loaded and basic statistics computed.')


【INFO】【2025-12-20 21:50:23】【Data loaded】samples=32561, features=14, positive rate=0.24
【INFO】【2025-12-20 21:50:23】【Data load】24720 labels could not be mapped; no negative class is specified and dropna_target is off, so they were treated as 0
【INFO】【2025-12-20 21:50:23】【Data load】Label column income processed: dropna_target=False, dropped samples=0, final samples=32561, positive rate=24.08%
【INFO】【2025-12-20 21:50:23】【Dataset info】name=adult, samples=32561, target column=income, positive rate=24.08%


Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,income
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,0
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,0
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,0
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,0
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,0


Label column used for modeling: income
Raw target column: income
count    32561.000000
mean         0.240810
std          0.427581
min          0.000000
25%          0.000000
50%          0.000000
75%          0.000000
max          1.000000
Name: income, dtype: float64
【INFO】【2025-12-20 21:50:23】【Step 2 summary】Adult raw data loaded and basic statistics computed.
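
The load log reports that 24720 labels could not be mapped and were treated as 0. A minimal sketch of that behavior, assuming the loader maps the configured positive label to 1 and, when no negative label is configured and `dropna_target` is off, falls back to 0 for everything else. The function `binarize_labels` and its parameters are hypothetical and only mirror the names in the log:

```python
from typing import Hashable, List, Optional, Sequence, Tuple


def binarize_labels(
    values: Sequence[Hashable],
    positive_label: Hashable,
    negative_label: Optional[Hashable] = None,
    dropna_target: bool = False,
) -> Tuple[List[int], int]:
    """Return (binary labels, number of values that matched no configured label)."""
    labels: List[int] = []
    unmapped = 0
    for value in values:
        if value == positive_label:
            labels.append(1)
        elif negative_label is not None and value == negative_label:
            labels.append(0)
        else:
            unmapped += 1
            if not dropna_target:
                labels.append(0)  # fall back to the negative class
            # with dropna_target=True the sample is dropped instead
    return labels, unmapped


# Toy values; only the positive label is configured, as in this run.
raw = [">50K", "<=50K", "<=50K", ">50K", "<=50K"]
labels, unmapped = binarize_labels(raw, positive_label=">50K")
print(labels, unmapped)  # [1, 0, 0, 1, 0] 3
```

Under this reading, configuring the negative label explicitly would make the unmapped count reflect only genuinely unexpected values.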


In [4]:
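# Step 3: preprocessing and feature engineering
X, y, meta = prepare_features_and_labels(df_raw, cfg)
log_info(f'【Preprocessing】encoded feature dimension={X.shape[1]}, samples={X.shape[0]}')
log_info(f"【Step 3 summary】Feature preprocessing complete: continuous={len(meta['continuous_cols'])}, categorical={len(meta['categorical_cols'])}, encoded dimension={X.shape[1]}.")

【INFO】【2025-12-20 21:50:23】【Preprocessing】continuous features=6, categorical features=8
【INFO】【2025-12-20 21:50:23】【Preprocessing】dimension after encoding=100
【INFO】【2025-12-20 21:50:23】【Preprocessing】encoded feature dimension=100, samples=32561
【INFO】【2025-12-20 21:50:23】【Step 3 summary】Feature preprocessing complete: continuous=6, categorical=8, encoded dimension=100.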


In [5]:
# Step 4: build the bucket tree and inspect the partition
target_col = cfg['DATA']['target_col']
features_df = df_raw.drop(columns=[target_col])
bucket_tree = BucketTree(cfg['BTTWD']['bucket_levels'], feature_names=features_df.columns.tolist())
bucket_ids_full = bucket_tree.assign_buckets(features_df)
bucket_df = bucket_ids_full.value_counts().reset_index()
bucket_df.columns = ['bucket_id', 'count']
# Caution: income is already encoded as 0/1 here, so comparing it with the raw positive
# label from the config matches nothing and pos_rate comes out as 0.0 for every bucket.
# The groupby result is also ordered by bucket_id while bucket_df is ordered by count,
# so assigning .values does not align the rows.
bucket_df['pos_rate'] = df_raw.groupby(bucket_ids_full)[target_col].apply(lambda s: (s == cfg['DATA']['positive_label']).mean()).values
display(bucket_df.head())
bucket_df.set_index('bucket_id')['count'].plot(kind='bar', figsize=(12,4), title='Sample count per bucket')
fig_bucket = os.path.join(root_path, cfg['OUTPUT']['figs_dir'], 'bucket_metrics_bar.png')
plt.savefig(fig_bucket, bbox_inches='tight')
plt.close()
log_info(f'【Step 4 summary】Bucket-tree partitioning complete, {bucket_ids_full.nunique()} leaf buckets in total.')

【INFO】【2025-12-20 21:50:23】【Bucket tree】Bucket IDs generated for samples, 48 combinations in total


Unnamed: 0,bucket_id,count,pos_rate
0,L1_age=old|L2_education=high|L3_hours-per-week...,3211,0.0
1,L1_age=old|L2_education=mid|L3_hours-per-week=...,2713,0.0
2,L1_age=mid|L2_education=high|L3_hours-per-week...,2584,0.0
3,L1_age=mid|L2_education=mid|L3_hours-per-week=...,2321,0.0
4,L1_age=very_old|L2_education=mid|L3_hours-per-...,1654,0.0


【INFO】【2025-12-20 21:50:24】【Step 4 summary】Bucket-tree partitioning complete, 48 leaf buckets in total.
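
Every `pos_rate` in the table above is 0.0 because the `income` column is already 0/1 when it is compared with the raw positive label `">50K"`. A minimal sketch of how composite bucket IDs and aligned per-bucket statistics could be computed in one pass. The bin edges and the education grouping below are illustrative assumptions, not the values in `adult_bttwd.yaml`; only the ID format follows the log, and the rows are toy data:

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical bin edges and groupings, chosen only for illustration.
AGE_EDGES, AGE_NAMES = [25, 35, 50], ["young", "mid", "old", "very_old"]
HOURS_EDGES, HOURS_NAMES = [35, 45], ["low_hours", "normal_hours", "high_hours"]
EDU_GROUPS = {"11th": "low", "HS-grad": "mid", "Bachelors": "high", "Doctorate": "top"}


def bucket_id(row: dict) -> str:
    """Build an ID such as L1_age=old|L2_education=high|L3_hours-per-week=normal_hours."""
    age = AGE_NAMES[bisect_right(AGE_EDGES, row["age"])]
    edu = EDU_GROUPS.get(row["education"], "mid")
    hours = HOURS_NAMES[bisect_right(HOURS_EDGES, row["hours-per-week"])]
    return f"L1_age={age}|L2_education={edu}|L3_hours-per-week={hours}"


def bucket_stats(rows: list, labels: list) -> dict:
    """Count samples and average the 0/1 label per bucket in a single pass."""
    counts, positives = defaultdict(int), defaultdict(int)
    for row, label in zip(rows, labels):
        key = bucket_id(row)
        counts[key] += 1
        positives[key] += label  # labels are already 0/1, so the sum counts positives
    return {k: {"count": counts[k], "pos_rate": positives[k] / counts[k]} for k in counts}


rows = [
    {"age": 39, "education": "Bachelors", "hours-per-week": 40},
    {"age": 38, "education": "Bachelors", "hours-per-week": 40},
    {"age": 28, "education": "HS-grad", "hours-per-week": 50},
]
labels = [1, 0, 0]  # toy labels
stats = bucket_stats(rows, labels)
for key, value in stats.items():
    print(key, value)
```

Because count and positive rate are keyed by the same bucket ID, they cannot drift out of order the way two separately sorted arrays can. With pandas, averaging the 0/1 column per group and mapping the result back by `bucket_id` would achieve the same alignment.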


In [6]:
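# Step 5: k-fold experiments for the baseline models
# The baselines are dispatched inside run_kfold_experiments
log_info('【Step 5】Baseline models will run as part of the overall cross-validation.')
log_info('【Step 5 summary】Baseline performance will serve as the reference for later comparison.')

【INFO】【2025-12-20 21:50:24】【Step 5】Baseline models will run as part of the overall cross-validation.
【INFO】【2025-12-20 21:50:24】【Step 5 summary】Baseline performance will serve as the reference for later comparison.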
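
How `run_kfold_experiments` splits the data is not shown here. As an illustration only, a stratified k-fold split, which keeps the class proportions roughly equal across folds, can be sketched as follows. Whether `bttwdlib` stratifies is an assumption, and `stratified_kfold` is a hypothetical name:

```python
import random
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def stratified_kfold(
    labels: Sequence[int], k: int = 5, seed: int = 42
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs whose test folds preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class round-robin across the folds
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train_idx, test_idx


toy_labels = [1] * 24 + [0] * 76  # toy data with about a 24% positive rate
splits = list(stratified_kfold(toy_labels, k=5))
for train_idx, test_idx in splits:
    pos_rate = sum(toy_labels[i] for i in test_idx) / len(test_idx)
    print(len(train_idx), len(test_idx), round(pos_rate, 2))
```

Each sample lands in exactly one test fold, so every model is evaluated on data it did not see during training in that fold.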


In [7]:
import numpy as np

print("Global label distribution of y:", np.unique(y, return_counts=True))

print("Distribution of the raw income column:")
print(df_raw[cfg['DATA']['target_col']].value_counts())


Global label distribution of y: (array([0, 1]), array([24720,  7841], dtype=int64))
Distribution of the raw income column:
income
0    24720
1     7841
Name: count, dtype: int64


In [8]:
# Step 6: run the BTTWD k-fold experiments (baselines included)
results = run_kfold_experiments(X, y, df_raw.drop(columns=[cfg['DATA']['target_col']]), cfg)
# The summary is read back from the CSV written by run_kfold_experiments
summary_df = pd.read_csv(os.path.join(root_path, cfg['OUTPUT']['results_dir'], 'metrics_kfold_summary.csv'))
display(summary_df)
summary_df.plot(x='model', kind='bar', figsize=(8,4), title='Model metric comparison')
fig_compare = os.path.join(root_path, cfg['OUTPUT']['figs_dir'], 'metrics_compare.png')
plt.savefig(fig_compare, bbox_inches='tight')
plt.close()
log_info('【Step 6 summary】K-fold results for BTTWD and the baselines generated and saved.')

【INFO】【2025-12-20 21:50:25】【Baseline-LogReg】Using decision threshold=0.400 (fixed mode)

  summary[f"{col}_mean"] = float(np.nanmean(arr))
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,

【INFO】【2025-12-20 21:50:37】【Baseline-LogReg】Overall metrics: AUC_mean=0.907, AUC_std=0.002, BAC_mean=0.795, BAC_std=0.002, BND_ratio_mean=0.000, BND_ratio_std=0.000, F1_mean=0.687, F1_std=0.004, Kappa_mean=0.586, Kappa_std=0.006, MCC_mean=0.586, MCC_std=0.005, POS_Coverage_mean=nan, POS_Coverage_std=nan, Precision_mean=0.679, Precision_std=0.008, Recall_mean=0.695, Recall_std=0.007, Regret_mean=0.170, Regret_std=0.002
【INFO】【2025-12-20 21:50:37】【Baseline-RF】Using decision threshold=0.400 (fixed mode)
【INFO】【2025-12-20 21:50:50】【Baseline-RF】Overall metrics: AUC_mean=0.906, AUC_std=0.002, BAC_mean=0.804, BAC_std=0.006, BND_ratio_mean=0.000, BND_ratio_std=0.000, F1_mean=0.696, F1_std=0.008, Kappa_mean=0.596, Kappa_std=0.010, MCC_mean=0.596, MCC_std=0.011, POS_Coverage_mean=nan, POS_Coverage_std=nan, Precision_mean=0.676, Precision_std=0.007, Recall_mean=0.717, Recall_std=0.011, Regret_mean=0.164, Regret_std=0.005
【INFO】【2025-12-20 21:50:50】【Baseline-KNN】Using decision threshold=0.400 (fixed mode)
【INFO】【2025-12-20 21:50:57】【Baseline-KNN】Overall metrics: AUC_mean=0.869, AUC_std=0.006, BAC_mean=0.780, BAC_std=0.008, BND_ratio_mean=0.000, BND_ratio_std=0.000, F1_mean=0.657, F1_std=0.011, Kappa_mean=0.543, Kappa_std=0.015, MCC_mean=0.544, MCC_std=0.015, POS_Coverage_mean=nan, POS_Coverage_std=nan, Precision_mean=0.631, Precision_std=0.011, Recall_mean=0.687, Recall_std=0.014, Regret_mean=0.186, Regret_std=0.007
【INFO】【2025-12-20 21:50:57】【Baseline-XGB】Using decision threshold=0.400 (fixed mode)

Parameters: { "use_label_encoder" } are not used.

【INFO】【2025-12-20 21:51:07】【Baseline-XGB】Overall metrics: AUC_mean=0.929, AUC_std=0.002, BAC_mean=0.825, BAC_std=0.004, BND_ratio_mean=0.000, BND_ratio_std=0.000, F1_mean=0.730, F1_std=0.005, Kappa_mean=0.643, Kappa_std=0.007, MCC_mean=0.643, MCC_std=0.007, POS_Coverage_mean=nan, POS_Coverage_std=nan, Precision_mean=0.717, Precision_std=0.008, Recall_mean=0.744, Recall_std=0.006, Regret_mean=0.145, Regret_std=0.003
【INFO】【2025-12-20 21:51:07】【K-fold experiment】Running fold 1/5...
【INFO】【2025-12-20 21:51:07】[BT] Using bucket scoring config: mode=f1_regret_bnd, f1_weight=1.0, regret_weight=1.0, bnd_weight=0.5
【INFO】【2025-12-20 21:51:09】【BTTWD】Global model trained; used for fallback prediction
[INFO][BT][2025-12-20 21:51:09] Created bucket bucket_id=ROOT, level=0, parent_id=ROOT, split_name=ROOT, split_type=ROOT, split_rule="all", n_samples=26048
[INFO][BT][2025-12-20 21:51:09] Created bucket bucket_id=L1_age=mid, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="mid", n_samples=6816
[INFO][BT][2025-12-20 21:51:09] Created bucket bucket_id=L1_age=old, level=1, parent_id=ROOT, split_name=L1_age, sp

【INFO】【2025-12-20 21:51:12】【Threshold】Bucket ROOT (n_val=1561, source=val) uses local thresholds α=0.4000, β=0.3000
[INFO][BT][2025-12-20 21:51:12] Bucket bucket_id=ROOT level=0:
    n_train=3645, n_val=1561,
    BAC=0.814, F1=0.723, AUC=0.929,
    Regret=0.131, BND_ratio=0.047, POS_coverage=0.240,
    Score(f1_regret_bnd)=0.569, threshold_source=val
【INFO】【2025-12-20 21:51:13】【Threshold】Bucket L1_age=mid marked as weak; thresholds fall back to those of ROOT
[INFO][BT][2025-12-20 21:51:13] Bucket bucket_id=L1_age=mid level=1:
    n_train=953, n_val=448,
    BAC=0.721, F1=0.612, AUC=0.899,
    Regret=0.118, BND_ratio=0.109, POS_coverage=0.129,
    Score(f1_regret_bnd)=0.439, threshold_source=val
[INFO][BT][2025-12-20 21:51:13] Bucket bucket_id=L1_age=mid:
    parent_id=ROOT, parent_Score=0.569, bucket_Score=0.439,
    Gain=-0.131, is_weak=True
【INFO】【2025-12-20 21:51:13】【Threshold】Bucket L1_age=old marked as weak; thresholds fall back to those of ROOT
[INFO][BT][2025-12-20 21:51:13] Bucket bucket_id=L1_age=old level=1:
    n_train=1256, n_val=591,
    BAC=0.747, F1=0.748, AUC=0.902,
    Regret=0.176, BND_ratio=0.132, PO

【INFO】【2025-12-20 21:51:43】[BASELINE] Global XGB model trained
【INFO】【2025-12-20 21:51:43】[BASELINE] Threshold search started
【INFO】【2025-12-20 21:51:44】[BASELINE] Best thresholds found: alpha=0.4000, beta=0.2000, regret=0.1385
【INFO】【2025-12-20 21:51:44】【Bucket tree】Bucket IDs generated for samples, 48 combinations in total
【INFO】【2025-12-20 21:51:44】[BASELINE] Test-set bucket mapping complete, 48 buckets in total

  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)

【INFO】【2025-12-20 21:51:47】[BASELINE] Bucket L1_age=old|L2_education=high|L3_hours-per-week=normal_hours: BAC=0.8065, Regret=0.1964, Precision=0.6955, Recall=0.8410
【INFO】【2025-12-20 21:51:47】[BASELINE] Bucket L1_age=old|L2_education=mid|L3_hours-per-week=normal_hours: BAC=0.6734, Regret=0.1985, Precision=0.5897, Recall=0.4220
【INFO】【2025-12-20 21:51:47】[BASELINE] Bucket L1_age=very_old|L2_education=mid|L3_hours-per-week=normal_hours: BAC=0.7280, Regret=0.2172, Precision=0.5974, Recall=0.5679
【INFO】【2025-12-20 21:51:47】[BASELINE] Bucket L1_age=mid|L2_education=high|L3_hours-per-week=normal_hours: BAC=0.7966, Regret=0.1373, Precision=0.6813, Recall=0.6667
【INFO】【2025-12-20 21:51:47】[BASELINE] Bucket L1_age=mid|L2_education=high|L3_hours-per-week=high_hours: BAC=0.7421, Regret=0.2535, Precision=0.6500, Recall=0.6842
【INFO】【2025-12-20 21:51:47】[BASELINE] Bucket L1_age=young|L2_education=mid|L3_hours-per-week=normal_hours: BAC=0.8000, Regret=0.0102, Precision=1.0000, Recall=0.6000
【INFO】【2025-12-20 21:51:47】[BASELINE] 


【INFO】【2025-12-20 21:51:48】【K-fold experiment】Running fold 2/5...
【INFO】【2025-12-20 21:51:48】[BT] Using bucket scoring config: mode=f1_regret_bnd, f1_weight=1.0, regret_weight=1.0, bnd_weight=0.5

【INFO】【2025-12-20 21:51:56】【BTTWD】Global model trained; used for fallback prediction
[INFO][BT][2025-12-20 21:51:57] Created bucket bucket_id=ROOT, level=0, parent_id=ROOT, split_name=ROOT, split_type=ROOT, split_rule="all", n_samples=26049
[INFO][BT][2025-12-20 21:51:57] Created bucket bucket_id=L1_age=mid, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="mid", n_samples=6797
[INFO][BT][2025-12-20 21:51:57] Created bucket bucket_id=L1_age=old, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="old", n_samples=8929
[INFO][BT][2025-12-20 21:51:57] Created bucket bucket_id=L1_age=very_old, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="very_old", n_samples=5198
[INFO][BT][2025-12-20 21:51:57] Created bucket bucket_id=L1_age=young, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="young", n_samples=5125
[INFO][BT][2025-12-20 21:51:57] Created bucket bucket_id=L1_age=mid|L2_education=high, level=2, parent_id=L1_age=mid, split_name=L2_education, split_type=categorical_group, split_rule="high", n_samples=

【INFO】【2025-12-20 21:52:01】【Threshold】Bucket ROOT (n_val=1561, source=val) uses local thresholds α=0.5000, β=0.3000
[INFO][BT][2025-12-20 21:52:01] Bucket bucket_id=ROOT level=0:
    n_train=3645, n_val=1561,
    BAC=0.774, F1=0.745, AUC=0.932,
    Regret=0.129, BND_ratio=0.109, POS_coverage=0.208,
    Score(f1_regret_bnd)=0.561, threshold_source=val
【INFO】【2025-12-20 21:52:02】【Threshold】Bucket L1_age=mid marked as weak; thresholds fall back to those of ROOT
[INFO][BT][2025-12-20 21:52:02] Bucket bucket_id=L1_age=mid level=1:
    n_train=954, n_val=448,
    BAC=0.680, F1=0.661, AUC=0.888,
    Regret=0.132, BND_ratio=0.172, POS_coverage=0.105,
    Score(f1_regret_bnd)=0.444, threshold_source=val
[INFO][BT][2025-12-20 21:52:02] Bucket bucket_id=L1_age=mid:
    parent_id=ROOT, parent_Score=0.561, bucket_Score=0.444,
    Gain=-0.118, is_weak=True
【INFO】【2025-12-20 21:52:02】【Threshold】Bucket L1_age=old (n_val=599, source=val) uses local thresholds α=0.4000, β=0.3000
[INFO][BT][2025-12-20 21:52:02] Bucket bucket_id=L1_age=old level=1:
    n_train=1241, n_val=599,
    BAC=0.804, F1=0.761, AUC=0.925,
    Regret=0

【INFO】【2025-12-20 21:52:39】[BASELINE] Global XGB model trained
【INFO】【2025-12-20 21:52:39】[BASELINE] Threshold search started
【INFO】【2025-12-20 21:52:40】[BASELINE] Best thresholds found: alpha=0.4000, beta=0.3000, regret=0.1401
【INFO】【2025-12-20 21:52:40】【Bucket tree】Bucket IDs generated for samples, 48 combinations in total
【INFO】【2025-12-20 21:52:40】[BASELINE] Test-set bucket mapping complete, 48 buckets in total

【INFO】【2025-12-20 21:52:43】[BASELINE] Bucket L1_age=mid|L2_education=high|L3_hours-per-week=normal_hours: BAC=0.7813, Regret=0.1520, Precision=0.6000, Recall=0.6632
【INFO】【2025-12-20 21:52:43】[BASELINE] Bucket L1_age=old|L2_education=top|L3_hours-per-week=normal_hours: BAC=0.7816, Regret=0.1733, Precision=0.8095, Recall=0.9140
【INFO】【2025-12-20 21:52:43】[BASELINE] Bucket L1_age=mid|L2_education=low|L3_hours-per-week=normal_hours: BAC=1.0000, Regret=0.0000, Precision=0.0000, Recall=0.0000
【INFO】【2025-12-20 21:52:43】[BASELINE] Bucket L1_age=old|L2_education=high|L3_hours-per-week=normal_hours: BAC=0.8054, Regret=0.1844, Precision=0.6918, Recall=0.8543
【INFO】【2025-12-20 21:52:43】[BASELINE] Bucket L1_age=mid|L2_education=high|L3_hours-per-week=high_hours: BAC=0.7643, Regret=0.2338, Precision=0.7089, Recall=0.6829
【INFO】【2025-12-20 21:52:43】[BASELINE] Bucket L1_age=old|L2_education=high|L3_hours-per-week=high_hours: BAC=0.7238, Regret=0.2282, Precision=0.7302, Recall=0.8870
【INFO】【2025-12-20 21:52:43】[BASELINE] Bucket L1_age


【INFO】【2025-12-20 21:52:44】【K-fold experiment】Running fold 3/5...
【INFO】【2025-12-20 21:52:44】[BT] Using bucket scoring config: mode=f1_regret_bnd, f1_weight=1.0, regret_weight=1.0, bnd_weight=0.5

【INFO】【2025-12-20 21:52:47】【BTTWD】Global model trained; used for fallback prediction
[INFO][BT][2025-12-20 21:52:48] Created bucket bucket_id=ROOT, level=0, parent_id=ROOT, split_name=ROOT, split_type=ROOT, split_rule="all", n_samples=26049
[INFO][BT][2025-12-20 21:52:48] Created bucket bucket_id=L1_age=mid, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="mid", n_samples=6803
[INFO][BT][2025-12-20 21:52:48] Created bucket bucket_id=L1_age=old, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="old", n_samples=8961
[INFO][BT][2025-12-20 21:52:48] Created bucket bucket_id=L1_age=very_old, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="very_old", n_samples=5160
[INFO][BT][2025-12-20 21:52:48] Created bucket bucket_id=L1_age=young, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="young", n_samples=5125
[INFO][BT][2025-12-20 21:52:48] Created bucket bucket_id=L1_age=mid|L2_education=high, level=2, parent_id=L1_age=mid, split_name=L2_education, split_type=categorical_group, split_rule="high", n_samples=

【INFO】【2025-12-20 21:52:52】【Threshold】Bucket ROOT (n_val=1562, source=val) uses local thresholds α=0.5000, β=0.3000
[INFO][BT][2025-12-20 21:52:52] Bucket bucket_id=ROOT level=0:
    n_train=3645, n_val=1562,
    BAC=0.761, F1=0.716, AUC=0.923,
    Regret=0.147, BND_ratio=0.099, POS_coverage=0.213,
    Score(f1_regret_bnd)=0.520, threshold_source=val
【INFO】【2025-12-20 21:52:52】【Threshold】Bucket L1_age=mid marked as weak; thresholds fall back to those of ROOT
[INFO][BT][2025-12-20 21:52:52] Bucket bucket_id=L1_age=mid level=1:
    n_train=949, n_val=447,
    BAC=0.709, F1=0.653, AUC=0.924,
    Regret=0.131, BND_ratio=0.152, POS_coverage=0.150,
    Score(f1_regret_bnd)=0.446, threshold_source=val
[INFO][BT][2025-12-20 21:52:52] Bucket bucket_id=L1_age=mid:
    parent_id=ROOT, parent_Score=0.520, bucket_Score=0.446,
    Gain=-0.074, is_weak=True
【INFO】【2025-12-20 21:52:53】【Threshold】Bucket L1_age=old (n_val=595, source=val) uses local thresholds α=0.4000, β=0.3000
[INFO][BT][2025-12-20 21:52:53] Bucket bucket_id=L1_age=old level=1:
    n_train=1259, n_val=595,
    BAC=0.786, F1=0.738, AUC=0.915,
    Regret=0

【INFO】【2025-12-20 21:53:22】[BASELINE] Global XGB model trained
【INFO】【2025-12-20 21:53:22】[BASELINE] Threshold search started
【INFO】【2025-12-20 21:53:22】[BASELINE] Best thresholds found: alpha=0.4000, beta=0.3000, regret=0.1448
【INFO】【2025-12-20 21:53:22】【Bucket tree】Bucket IDs generated for samples, 47 combinations in total
【INFO】【2025-12-20 21:53:22】[BASELINE] Test-set bucket mapping complete, 47 buckets in total

【INFO】【2025-12-20 21:53:25】[BASELINE] Bucket L1_age=very_old|L2_education=mid|L3_hours-per-week=normal_hours: BAC=0.7130, Regret=0.1830, Precision=0.6939, Recall=0.4857
【INFO】【2025-12-20 21:53:25】[BASELINE] Bucket L1_age=mid|L2_education=top|L3_hours-per-week=high_hours: BAC=0.7771, Regret=0.2120, Precision=0.7778, Recall=0.8400
【INFO】【2025-12-20 21:53:25】[BASELINE] Bucket L1_age=young|L2_education=high|L3_hours-per-week=low_hours: BAC=0.6667, Regret=0.0106, Precision=1.0000, Recall=0.3333
【INFO】【2025-12-20 21:53:25】[BASELINE] Bucket L1_age=mid|L2_education=mid|L3_hours-per-week=normal_hours: BAC=0.6129, Regret=0.0837, Precision=0.6667, Recall=0.2353
【INFO】【2025-12-20 21:53:25】[BASELINE] Bucket L1_age=young|L2_education=high|L3_hours-per-week=high_hours: BAC=0.8242, Regret=0.0388, Precision=0.6667, Recall=0.6667
【INFO】【2025-12-20 21:53:25】[BASELINE] Bucket L1_age=old|L2_education=mid|L3_hours-per-week=normal_hours: BAC=0.6782, Regret=0.2047, Precision=0.6842, Recall=0.4127
【INFO】【2025-12-20 21:53:25】[BASELINE] Bucket L1

【INFO】【2025-12-20 21:53:26】【BTTWD】Global model trained; used for fallback prediction
[INFO][BT][2025-12-20 21:53:27] Created bucket bucket_id=ROOT, level=0, parent_id=ROOT, split_name=ROOT, split_type=ROOT, split_rule="all", n_samples=26049
[INFO][BT][2025-12-20 21:53:27] Created bucket bucket_id=L1_age=mid, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="mid", n_samples=6841
[INFO][BT][2025-12-20 21:53:27] Created bucket bucket_id=L1_age=old, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="old", n_samples=8893
[INFO][BT][2025-12-20 21:53:27] Created bucket bucket_id=L1_age=very_old, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="very_old", n_samples=5181
[INFO][BT][2025-12-20 21:53:27] Created bucket bucket_id=L1_age=young, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="young", n_samples=5134
[INFO][BT][2025-12-20 21:53:27] Created bucket bucket_id=L1_age=mid|L2_education=high, level=2, parent_id=L1_age=mid, split_name=L2_education, split_type=categorical_group, split_rule="high", n_samples=

【INFO】【2025-12-20 21:53:30】【Threshold】Bucket ROOT (n_val=1562, source=val) uses local thresholds α=0.4000, β=0.2000
[INFO][BT][2025-12-20 21:53:30] Bucket bucket_id=ROOT level=0:
    n_train=3645, n_val=1562,
    BAC=0.762, F1=0.724, AUC=0.922,
    Regret=0.146, BND_ratio=0.123, POS_coverage=0.260,
    Score(f1_regret_bnd)=0.516, threshold_source=val
【INFO】【2025-12-20 21:53:30】【Threshold】Bucket L1_age=mid marked as weak; thresholds fall back to those of ROOT
[INFO][BT][2025-12-20 21:53:30] Bucket bucket_id=L1_age=mid level=1:
    n_train=946, n_val=468,
    BAC=0.713, F1=0.643, AUC=0.906,
    Regret=0.131, BND_ratio=0.115, POS_coverage=0.128,
    Score(f1_regret_bnd)=0.454, threshold_source=val
[INFO][BT][2025-12-20 21:53:30] Bucket bucket_id=L1_age=mid:
    parent_id=ROOT, parent_Score=0.516, bucket_Score=0.454,
    Gain=-0.062, is_weak=True
【INFO】【2025-12-20 21:53:30】【Threshold】Bucket L1_age=old marked as weak; thresholds fall back to those of ROOT
[INFO][BT][2025-12-20 21:53:30] Bucket bucket_id=L1_age=old level=1:
    n_train=1242, n_val=598,
    BAC=0.737, F1=0.741, AUC=0.903,
    Regret=0.177, BND_ratio=0.145, PO

【INFO】【2025-12-20 21:53:56】[BASELINE] Global XGB model trained
【INFO】【2025-12-20 21:53:56】[BASELINE] Threshold search started
【INFO】【2025-12-20 21:53:57】[BASELINE] Best thresholds found: alpha=0.4000, beta=0.2000, regret=0.1405
【INFO】【2025-12-20 21:53:57】【Bucket tree】Bucket IDs generated for samples, 48 combinations in total
【INFO】【2025-12-20 21:53:57】[BASELINE] Test-set bucket mapping complete, 48 buckets in total

【INFO】【2025-12-20 21:53:59】[BASELINE] Bucket L1_age=old|L2_education=high|L3_hours-per-week=normal_hours: BAC=0.7925, Regret=0.2050, Precision=0.6844, Recall=0.7759
【INFO】【2025-12-20 21:53:59】[BASELINE] Bucket L1_age=old|L2_education=high|L3_hours-per-week=high_hours: BAC=0.7356, Regret=0.2374, Precision=0.7103, Recall=0.8588
【INFO】【2025-12-20 21:53:59】[BASELINE] Bucket L1_age=very_old|L2_education=mid|L3_hours-per-week=low_hours: BAC=0.6331, Regret=0.0656, Precision=0.4000, Recall=0.2857
【INFO】【2025-12-20 21:53:59】[BASELINE] Bucket L1_age=mid|L2_education=mid|L3_hours-per-week=normal_hours: BAC=0.5741, Regret=0.1041, Precision=0.6000, Recall=0.1579
【INFO】【2025-12-20 21:53:59】[BASELINE] Bucket L1_age=old|L2_education=mid|L3_hours-per-week=normal_hours: BAC=0.7381, Regret=0.1584, Precision=0.6800, Recall=0.5312
【INFO】【2025-12-20 21:53:59】[BASELINE] Bucket L1_age=very_old|L2_education=high|L3_hours-per-week=high_hours: BAC=0.7475, Regret=0.2257, Precision=0.7536, Recall=0.8814
【INFO】【2025-12-20 21:53:59】[BASELINE] Bucket 


【INFO】【2025-12-20 21:54:00】【K-fold experiment】Running fold 5/5...
【INFO】【2025-12-20 21:54:00】[BT] Using bucket scoring config: mode=f1_regret_bnd, f1_weight=1.0, regret_weight=1.0, bnd_weight=0.5

【INFO】【2025-12-20 21:54:03】【BTTWD】Global model trained; used for fallback prediction
[INFO][BT][2025-12-20 21:54:03] Created bucket bucket_id=ROOT, level=0, parent_id=ROOT, split_name=ROOT, split_type=ROOT, split_rule="all", n_samples=26049
[INFO][BT][2025-12-20 21:54:03] Created bucket bucket_id=L1_age=mid, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="mid", n_samples=6799
[INFO][BT][2025-12-20 21:54:03] Created bucket bucket_id=L1_age=old, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="old", n_samples=8958
[INFO][BT][2025-12-20 21:54:03] Created bucket bucket_id=L1_age=very_old, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="very_old", n_samples=5159
[INFO][BT][2025-12-20 21:54:03] Created bucket bucket_id=L1_age=young, level=1, parent_id=ROOT, split_name=L1_age, split_type=numeric_bin, split_rule="young", n_samples=5133
[INFO][BT][2025-12-20 21:54:03] Created bucket bucket_id=L1_age=mid|L2_education=high, level=2, parent_id=L1_age=mid, split_name=L2_education, split_type=categorical_group, split_rule="high", n_samples=

【INFO】【2025-12-20 21:54:07】【Threshold】Bucket ROOT (n_val=1561, source=val) uses local thresholds α=0.4000, β=0.2000
[INFO][BT][2025-12-20 21:54:07] Bucket bucket_id=ROOT level=0:
    n_train=3645, n_val=1561,
    BAC=0.770, F1=0.732, AUC=0.928,
    Regret=0.144, BND_ratio=0.114, POS_coverage=0.257,
    Score(f1_regret_bnd)=0.531, threshold_source=val
【INFO】【2025-12-20 21:54:07】【Threshold】Bucket L1_age=mid (n_val=450, source=val) uses local thresholds α=0.5000, β=0.3000
[INFO][BT][2025-12-20 21:54:07] Bucket bucket_id=L1_age=mid level=1:
    n_train=952, n_val=450,
    BAC=0.756, F1=0.699, AUC=0.917,
    Regret=0.118, BND_ratio=0.076, POS_coverage=0.129,
    Score(f1_regret_bnd)=0.544, threshold_source=val
[INFO][BT][2025-12-20 21:54:07] Bucket bucket_id=L1_age=mid:
    parent_id=ROOT, parent_Score=0.531, bucket_Score=0.544,
    Gain=+0.013, is_weak=False
【INFO】【2025-12-20 21:54:08】【Threshold】Bucket L1_age=old marked as weak; thresholds fall back to those of ROOT
[INFO][BT][2025-12-20 21:54:08] Bucket bucket_id=L1_age=old level=1:
    n_train=1246, n_val=606,
    BAC=0.734, F1=0.740, AUC=0.911,
    Regret=

【INFO】【2025-12-20 21:54:41】[BASELINE] Global XGB model trained
【INFO】【2025-12-20 21:54:41】[BASELINE] Threshold search started
【INFO】【2025-12-20 21:54:43】[BASELINE] Best thresholds found: alpha=0.4000, beta=0.2000, regret=0.1410
【INFO】【2025-12-20 21:54:43】【Bucket tree】Bucket IDs generated for samples, 48 combinations in total
【INFO】【2025-12-20 21:54:43】[BASELINE] Test-set bucket mapping complete, 48 buckets in total

【INFO】【2025-12-20 21:54:49】[BASELINE] Bucket L1_age=old|L2_education=high|L3_hours-per-week=low_hours: BAC=0.7753, Regret=0.2024, Precision=0.6875, Recall=0.6875
【INFO】【2025-12-20 21:54:49】[BASELINE] Bucket L1_age=old|L2_education=mid|L3_hours-per-week=low_hours: BAC=0.6380, Regret=0.0563, Precision=0.6667, Recall=0.2857
【INFO】【2025-12-20 21:54:49】[BASELINE] Bucket L1_age=young|L2_education=mid|L3_hours-per-week=low_hours: BAC=1.0000, Regret=0.0000, Precision=0.0000, Recall=0.0000
【INFO】【2025-12-20 21:54:49】[BASELINE] Bucket L1_age=old|L2_education=mid|L3_hours-per-week=high_hours: BAC=0.7097, Regret=0.2549, Precision=0.6066, Recall=0.5873
【INFO】【2025-12-20 21:54:49】[BASELINE] Bucket L1_age=old|L2_education=top|L3_hours-per-week=high_hours: BAC=0.6660, Regret=0.1055, Precision=0.8990, Recall=0.9570
【INFO】【2025-12-20 21:54:49】[BASELINE] Bucket L1_age=very_old|L2_education=mid|L3_hours-per-week=normal_hours: BAC=0.7381, Regret=0.1955, Precision=0.5857, Recall=0.5857
【INFO】【2025-12-20 21:54:49】[BASELINE] Bucket L1_age=very


【INFO】【2025-12-20 21:54:51】【K-fold experiment】All results written to the results directory


Unnamed: 0,model,Precision_mean,Precision_std,Recall_mean,Recall_std,F1_mean,F1_std,BAC_mean,BAC_std,AUC_mean,...,MCC_mean,MCC_std,Kappa_mean,Kappa_std,BND_ratio_mean,BND_ratio_std,POS_Coverage_mean,POS_Coverage_std,Regret_mean,Regret_std
0,BTTWD,0.668038,0.005068,0.797093,0.006025,0.726873,0.004709,0.835723,0.003455,0.927143,...,0.63444,0.00645,0.629897,0.006443,0.092657,0.01022,0.239489,0.010299,0.143845,0.002028
1,LogReg,0.679284,0.008348,0.694556,0.006725,0.686771,0.003809,0.795235,0.002476,0.906752,...,0.586058,0.005472,0.585947,0.005562,0.0,0.0,,,0.169597,0.002065
2,RandomForest,0.676177,0.006853,0.716619,0.011458,0.695783,0.0081,0.80388,0.006046,0.905703,...,0.596043,0.010509,0.595576,0.010402,0.0,0.0,,,0.164345,0.004596
3,KNN,0.630564,0.011008,0.686903,0.014062,0.657493,0.011481,0.779616,0.008167,0.868945,...,0.543573,0.015257,0.542658,0.015174,0.0,0.0,,,0.18579,0.006517
4,XGBoost,0.716983,0.007632,0.744166,0.006066,0.730296,0.005429,0.825481,0.003503,0.929265,...,0.642849,0.007367,0.642634,0.007398,0.0,0.0,,,0.145481,0.002938


【INFO】【2025-12-20 21:54:53】【Step 6 summary】K-fold results for BTTWD and the baselines generated and saved.
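
The per-bucket `Score(f1_regret_bnd)` values in the log are consistent with a linear combination, Score = f1_weight·F1 − regret_weight·Regret − bnd_weight·BND_ratio, using the logged weights 1.0, 1.0 and 0.5, and with Gain = bucket_Score − parent_Score. This is inferred from the logged numbers, not taken from the `bttwdlib` source. The weak-bucket rule below (weak when the gain is negative) is likewise an assumption that merely agrees with the visible log lines; `min_gain` is a hypothetical parameter.

```python
def bucket_score(f1, regret, bnd_ratio, f1_weight=1.0, regret_weight=1.0, bnd_weight=0.5):
    """Linear score inferred from the logs: reward F1, penalize regret and deferrals."""
    return f1_weight * f1 - regret_weight * regret - bnd_weight * bnd_ratio


def is_weak(bucket_score_value, parent_score_value, min_gain=0.0):
    """Assumed rule: a bucket is weak when it does not improve on its parent."""
    return (bucket_score_value - parent_score_value) < min_gain


# Fold 5 values copied from the log (the log rounds them to three decimals).
root = bucket_score(f1=0.732, regret=0.144, bnd_ratio=0.114)   # logged Score: 0.531
child = bucket_score(f1=0.699, regret=0.118, bnd_ratio=0.076)  # logged Score: 0.544
print(round(root, 3), round(child, 3), is_weak(child, root))

# Fold 1: L1_age=mid scored below ROOT and was flagged as weak in the log.
root_f1 = bucket_score(f1=0.723, regret=0.131, bnd_ratio=0.047)  # logged Score: 0.569
mid_f1 = bucket_score(f1=0.612, regret=0.118, bnd_ratio=0.109)   # logged Score: 0.439
print(is_weak(mid_f1, root_f1))
```

Small differences in the last digit are expected, since the inputs here are already rounded.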


In [9]:
# Step 7: bucket-level analysis
bucket_metrics_path = os.path.join(root_path, cfg['OUTPUT']['results_dir'], 'bucket_metrics.csv')
if os.path.exists(bucket_metrics_path):
    bucket_metrics_df = pd.read_csv(bucket_metrics_path)
    display(bucket_metrics_df.head())
    bucket_metrics_df.plot(x='bucket_id', y='pos_rate_all', kind='bar', figsize=(12,4), title='Positive rate per bucket')
    plt.ylabel('Positive rate')
    plt.xticks(rotation=90)
    plt.tight_layout()
    # fig_bucket is the path defined in step 4, so this overwrites bucket_metrics_bar.png
    plt.savefig(fig_bucket, bbox_inches='tight')
    plt.close()
log_info('【Step 7 summary】Bucket-level metrics collected; ready for localized analysis.')

Unnamed: 0,bucket_id,layer,parent_bucket_id,n_train,n_val,pos_rate_train,pos_rate_val,alpha,beta,regret_val,...,use_gain_weak_backoff,threshold_data_source,parent_with_threshold,n_test,pos_rate_test,BND_ratio_test,POS_Coverage_test,regret_test,fold,pos_rate
0,ROOT,L1,,3645,1561,0.238409,0.213325,0.4,0.3,0.130685,...,True,val,,,,,,,1,0.240786
1,L1_age=old,L1,ROOT,1256,591,0.352707,0.333333,0.4,0.3,0.175973,...,True,val,,,,,,,1,0.361486
2,L1_age=mid,L1,ROOT,953,448,0.197272,0.140625,0.4,0.3,0.118304,...,True,val,,,,,,,1,0.186033
3,L1_age=very_old,L1,ROOT,717,308,0.338912,0.324675,0.4,0.3,0.191558,...,True,val,,,,,,,1,0.325554
4,L1_age=young,L1,ROOT,795,355,0.022642,0.011268,0.5,0.4,0.004225,...,True,val,,,,,,,1,0.017554


  plt.tight_layout()


【INFO】【2025-12-20 21:55:09】【Step 7 summary】Bucket-level metrics collected; ready for localized analysis.
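
The `alpha` and `beta` columns are the per-bucket three-way decision thresholds. A minimal sketch, assuming the standard three-way rule: accept (POS) when the positive-class probability is at least α, reject (NEG) when it is at most β, and defer (BND) otherwise. Whether `bttwdlib` uses strict or non-strict inequalities is not visible from the notebook, and the function names are hypothetical.

```python
from typing import List, Sequence


def three_way_decide(probs: Sequence[float], alpha: float, beta: float) -> List[str]:
    """Map positive-class probabilities to the POS / NEG / BND regions."""
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    decisions = []
    for p in probs:
        if p >= alpha:
            decisions.append("POS")
        elif p <= beta:
            decisions.append("NEG")
        else:
            decisions.append("BND")  # deferred: neither accepted nor rejected
    return decisions


def bnd_ratio(decisions: Sequence[str]) -> float:
    """Share of samples that fall into the boundary (deferred) region."""
    return decisions.count("BND") / len(decisions)


probs = [0.05, 0.25, 0.35, 0.45, 0.90]  # toy probabilities
decisions = three_way_decide(probs, alpha=0.4, beta=0.3)
print(decisions, bnd_ratio(decisions))  # ['NEG', 'NEG', 'BND', 'POS', 'POS'] 0.2
```

A wider gap between α and β sends more samples to BND, which is why the baselines, which use a single fixed threshold, report a `BND_ratio` of 0.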


In [10]:
# Step 8: results summary
log_info('【Step 8】Checking result files and figures.')
print(os.listdir(os.path.join(root_path, cfg['OUTPUT']['results_dir'])))
print(os.listdir(os.path.join(root_path, cfg['OUTPUT']['figs_dir'])))
log_info('【All steps complete】BT-TWD feasibility experiment on the Adult dataset finished.')

【INFO】【2025-12-20 21:55:09】【Step 8】Checking result files and figures.
['baseline_bucket_metrics.csv', 'bucket_fallback_stats.csv', 'bucket_metrics.csv', 'bucket_metrics_gain.csv', 'bucket_metrics_gain_test_per_fold.csv', 'bucket_thresholds.csv', 'bucket_thresholds_per_fold.csv', 'bucket_tree_structure.csv', 'metrics_kfold_per_fold.csv', 'metrics_kfold_summary.csv', 'metrics_overview.csv']
['bank_class_distribution.png', 'bucket_metrics_bar.png', 'class_distribution.png', 'metrics_compare.png']
【INFO】【2025-12-20 21:55:09】【All steps complete】BT-TWD feasibility experiment on the Adult dataset finished.


In [None]:
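# Step 9: t-SNE visualization comparing weak buckets
from bttwdlib import visualize_fallback_with_tsne
from IPython.display import Image, display

# Call the t-SNE visualization function; its parameters are read from the YAML config
tsne_results = visualize_fallback_with_tsne(config_path=cfg_path)

# Show the saved figure; display() is needed because this is not the last expression in the cell
display(Image(filename=tsne_results["figure_path"]))

# Print the other output paths
print(f"t-SNE embedding saved to: {tsne_results['embedding_path']}")
print(f"Weak-bucket comparison summary saved to: {tsne_results['summary_path']}")
print(f"Figure saved to: {tsne_results['figure_path']}")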
