weatherAUS 数据集 BT-TWD 可行性实验

本 notebook 按步骤运行：加载配置 → 读取数据 → 预处理 → 桶树划分 → 基线与 BTTWD k 折实验 → 桶级分析。

In [1]:
# 步骤0：环境与路径设置
import os, sys
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
plt.rcParams['axes.unicode_minus'] = False

# 将项目根目录加入路径，便于导入 bttwdlib
root_path = os.path.abspath(os.path.join(os.getcwd(), '..'))
if root_path not in sys.path:
    sys.path.append(root_path)

from bttwdlib import (
    load_yaml_cfg,
    show_cfg,
    load_dataset,
    prepare_features_and_labels,
    BucketTree,
    run_kfold_experiments,
    log_info,
    set_global_seed,
)

cfg_path = Path(root_path) / "configs" / "credit_default_bttwd.yaml"
cfg = load_yaml_cfg(cfg_path)
set_global_seed(cfg.get('SEED', {}).get('global_seed', 42))
log_info('【步骤0摘要】环境准备完毕，路径与随机种子已设置。')


【INFO】【2025-12-25 20:12:18】【配置加载】已读取 e:\yan\组\三支决策\机器学习\BT_TWD\configs\credit_default_bttwd.yaml
【INFO】【2025-12-25 20:12:21】【步骤0摘要】环境准备完毕，路径与随机种子已设置。


In [2]:
# 步骤1：加载配置
show_cfg(cfg)
log_info('【步骤1摘要】配置文件加载完成，关键参数检查通过。')

【INFO】【2025-12-25 20:12:21】【配置-数据】数据集=credit_default, k折=None, 目标列=default payment next month, 正类="1"
【INFO】【2025-12-25 20:12:21】【配置-BTTWD】阈值模式=bucket_wise, 全局模型=xgb, 桶内模型=none, 后验估计器(兼容字段)=logreg
【INFO】【2025-12-25 20:12:21】【配置-基线】LogReg启用=True, RandomForest启用=True, KNN启用=True, XGBoost启用=True
【INFO】【2025-12-25 20:12:21】【步骤1摘要】配置文件加载完成，关键参数检查通过。


In [3]:
# 步骤2：加载原始数据
df_raw, target_col_model = load_dataset(cfg)  # 这里返回的是用于建模的标签列，例如 "label"

display(df_raw.head())
print("用于建模的标签列:", target_col_model)


# 1）画 0/1 标签（流失/未流失）的比例
class_counts = df_raw[target_col_model].value_counts(normalize=True)
ax = class_counts.plot(kind='bar', title='流失 vs 未流失比例')
plt.ylabel('比例')

fig_path = os.path.join(root_path, cfg['OUTPUT']['figs_dir'], 'class_distribution.png')
os.makedirs(os.path.dirname(fig_path), exist_ok=True)
plt.savefig(fig_path, bbox_inches='tight')
plt.close()

# 2）如果想看原始标签列的分布，可以另外单独分析：
raw_target_col = cfg['DATA']['target_col']  # 这里是原始标签列
print("原始目标列:", raw_target_col)
print(df_raw[raw_target_col].describe())

log_info('【步骤2摘要】Telco Churn 原始数据加载与基本统计完成。')


【INFO】【2025-12-25 20:12:23】【数据加载】目标列 default payment next month 已检测为 0/1 标签，跳过映射逻辑
【INFO】【2025-12-25 20:12:23】【数据集信息】名称=credit_default，样本数=30000，目标列=default payment next month，正类比例=22.12%


Unnamed: 0,ID,LIMIT_BAL,SEX,EDUCATION,MARRIAGE,AGE,PAY_0,PAY_2,PAY_3,PAY_4,...,BILL_AMT4,BILL_AMT5,BILL_AMT6,PAY_AMT1,PAY_AMT2,PAY_AMT3,PAY_AMT4,PAY_AMT5,PAY_AMT6,default payment next month
0,1,20000,2,2,1,24,2,2,-1,-1,...,0,0,0,0,689,0,0,0,0,1
1,2,120000,2,2,2,26,-1,2,0,0,...,3272,3455,3261,0,1000,1000,1000,0,2000,1
2,3,90000,2,2,2,34,0,0,0,0,...,14331,14948,15549,1518,1500,1000,1000,1000,5000,0
3,4,50000,2,2,1,37,0,0,0,0,...,28314,28959,29547,2000,2019,1200,1100,1069,1000,0
4,5,50000,1,2,1,57,-1,0,-1,0,...,20940,19146,19131,2000,36681,10000,9000,689,679,0


用于建模的标签列: default payment next month
原始目标列: default payment next month
count    30000.000000
mean         0.221200
std          0.415062
min          0.000000
25%          0.000000
50%          0.000000
75%          0.000000
max          1.000000
Name: default payment next month, dtype: float64
【INFO】【2025-12-25 20:12:23】【步骤2摘要】Telco Churn 原始数据加载与基本统计完成。


In [4]:
# 步骤3：预处理与特征工程
X, y, meta = prepare_features_and_labels(df_raw, cfg)
log_info(f'【预处理】编码特征维度={X.shape[1]}，样本数={X.shape[0]}')
log_info(f"【步骤3摘要】特征预处理完成：连续={len(meta['continuous_cols'])}，类别={len(meta['categorical_cols'])}，编码维度={X.shape[1]}。")

【INFO】【2025-12-25 20:12:23】【预处理】缺失值填充策略=median


【INFO】【2025-12-25 20:12:23】已生成 credit_default 派生特征：ever_delay / max_delay / max_delay_bin
【INFO】【2025-12-25 20:12:23】ever_delay 分布：
ever_delay
0    19931
1    10069
【INFO】【2025-12-25 20:12:23】max_delay_bin 分布：
max_delay_bin
0      19931
1-2     8876
3-4     1007
5+       186
【INFO】【2025-12-25 20:12:23】max_delay_bins=[-0.1, 0, 2, 4, 9], labels=['0', '1-2', '3-4', '5+']
【INFO】【2025-12-25 20:12:23】【预处理】连续特征=14个，类别特征=5个
【INFO】【2025-12-25 20:12:23】【预处理】编码后维度=33
【INFO】【2025-12-25 20:12:23】【预处理】编码特征维度=33，样本数=30000
【INFO】【2025-12-25 20:12:23】【步骤3摘要】特征预处理完成：连续=14，类别=5，编码维度=33。


In [5]:
# 步骤4：构建桶树并检查划分（✅与主流程一致：使用步骤3生成的 df_processed）
df_processed = meta.get("df_processed", df_raw)

# 只取桶层真正用到的列，避免把一堆无关列塞进分桶检查
bucket_cols = []
for lvl in cfg["BTTWD"]["bucket_levels"]:
    col_name = lvl.get("col") or lvl.get("feature")
    if col_name and col_name not in bucket_cols:
        bucket_cols.append(col_name)

bucket_tree = BucketTree(
    cfg["BTTWD"]["bucket_levels"],
    feature_names=bucket_cols
)

# ✅关键：用 df_processed（已包含 ever_delay / max_delay_bin）
bucket_ids_full = bucket_tree.assign_buckets(df_processed[bucket_cols])

bucket_df = bucket_ids_full.value_counts().reset_index()
bucket_df.columns = ["bucket_id", "count"]

# 正类比例也用 df_processed 对齐（目标列仍在 df_processed 里）
bucket_df["pos_rate"] = (
    df_processed.groupby(bucket_ids_full)[target_col_model]
    .apply(lambda s: (s == 1).mean())
    .values
)

display(bucket_df.head())

bucket_df.set_index("bucket_id")["count"].plot(
    kind="bar", figsize=(12, 4), title="桶样本数分布"
)
fig_bucket = os.path.join(root_path, cfg["OUTPUT"]["figs_dir"], "bucket_metrics_bar.png")
plt.savefig(fig_bucket, bbox_inches="tight")
plt.close()

log_info(f"【步骤4摘要】桶树划分完成，共有 {bucket_ids_full.nunique()} 个叶子桶。")


【INFO】【2025-12-25 20:12:23】【桶树】已为样本生成桶ID，共 12 个组合


Unnamed: 0,bucket_id,count,pos_rate
0,L1_ever_delay=no_delay|L2_max_delay_bin=none|L...,7565,0.697674
1,L1_ever_delay=no_delay|L2_max_delay_bin=none|L...,6468,0.6125
2,L1_ever_delay=no_delay|L2_max_delay_bin=none|L...,5898,0.646766
3,L1_ever_delay=has_delay|L2_max_delay_bin=mild|...,4044,0.317972
4,L1_ever_delay=has_delay|L2_max_delay_bin=mild|...,2879,0.447577


【INFO】【2025-12-25 20:12:24】【步骤4摘要】桶树划分完成，共有 12 个叶子桶。


In [6]:
# 步骤5：运行基线模型 k 折实验
# 基线部分在 run_kfold_experiments 内统一调度
log_info('【步骤5】基线模型将在整体交叉验证中一并运行。')
log_info('【步骤5摘要】基线模型性能将作为后续对比基准。')

【INFO】【2025-12-25 20:12:24】【步骤5】基线模型将在整体交叉验证中一并运行。
【INFO】【2025-12-25 20:12:24】【步骤5摘要】基线模型性能将作为后续对比基准。


In [7]:
import numpy as np
import pandas as pd

print("y 全局标签分布：", np.unique(y, return_counts=True))

print("原始标签列分布：")
print(df_raw[cfg['DATA']['target_col']].value_counts())


y 全局标签分布： (array([0, 1]), array([23364,  6636], dtype=int64))
原始标签列分布：
default payment next month
0    23364
1     6636
Name: count, dtype: int64


In [8]:
# 步骤6：运行 BTTWD k 折实验（含基线）——✅使用 df_processed 作为分桶输入
df_processed = meta.get("df_processed", df_raw)

# 训练/分桶用的 df 不能包含标签列
target_col = cfg["DATA"]["target_col"]
X_df_for_bucket = df_processed.drop(columns=[target_col], errors="ignore")

results = run_kfold_experiments(X, y, X_df_for_bucket, cfg)

summary_path = os.path.join(root_path, cfg["OUTPUT"]["results_dir"], "metrics_kfold_summary.csv")
summary_df = pd.read_csv(summary_path)
display(summary_df)


【INFO】【2025-12-25 20:12:24】【基线-LogReg】使用决策阈值=0.300（fixed 模式）


  summary[f"{col}_mean"] = float(np.nanmean(arr))
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,


【INFO】【2025-12-25 20:12:26】【基线-LogReg】整体指标：AUC_mean=0.745, AUC_std=0.006, BAC_mean=0.702, BAC_std=0.005, BND_ratio_mean=0.000, BND_ratio_std=0.000, F1_mean=0.520, F1_std=0.006, Kappa_mean=0.353, Kappa_std=0.009, MCC_mean=0.363, MCC_std=0.009, POS_Coverage_mean=nan, POS_Coverage_std=nan, Precision_mean=0.445, Precision_std=0.007, Recall_mean=0.627, Recall_std=0.014, Regret_mean=0.421, Regret_std=0.007
【INFO】【2025-12-25 20:12:26】【基线-RF】使用决策阈值=0.300（fixed 模式）


  summary[f"{col}_mean"] = float(np.nanmean(arr))
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,


【INFO】【2025-12-25 20:12:37】【基线-RF】整体指标：AUC_mean=0.755, AUC_std=0.004, BAC_mean=0.698, BAC_std=0.007, BND_ratio_mean=0.000, BND_ratio_std=0.000, F1_mean=0.517, F1_std=0.009, Kappa_mean=0.355, Kappa_std=0.011, MCC_mean=0.361, MCC_std=0.012, POS_Coverage_mean=nan, POS_Coverage_std=nan, Precision_mean=0.455, Precision_std=0.008, Recall_mean=0.599, Recall_std=0.014, Regret_mean=0.425, Regret_std=0.009
【INFO】【2025-12-25 20:12:37】【基线-KNN】使用决策阈值=0.300（fixed 模式）


  summary[f"{col}_mean"] = float(np.nanmean(arr))
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-25 20:12:39】【基线-KNN】整体指标：AUC_mean=0.690, AUC_std=0.007, BAC_mean=0.653, BAC_std=0.008, BND_ratio_mean=0.000, BND_ratio_std=0.000, F1_mean=0.456, F1_std=0.009, Kappa_mean=0.246, Kappa_std=0.012, MCC_mean=0.263, MCC_std=0.013, POS_Coverage_mean=nan, POS_Coverage_std=nan, Precision_mean=0.363, Precision_std=0.007, Recall_mean=0.613, Recall_std=0.016, Regret_mean=0.495, Regret_std=0.011
【INFO】【2025-12-25 20:12:39】【基线-XGB】使用决策阈值=0.300（fixed 模式）


Parameters: { "use_label_encoder" } are not used.

Parameters: { "use_label_encoder" } are not used.

Parameters: { "use_label_encoder" } are not used.

Parameters: { "use_label_encoder" } are not used.

  summary[f"{col}_mean"] = float(np.nanmean(arr))
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-25 20:12:41】【基线-XGB】整体指标：AUC_mean=0.762, AUC_std=0.004, BAC_mean=0.701, BAC_std=0.007, BND_ratio_mean=0.000, BND_ratio_std=0.000, F1_mean=0.523, F1_std=0.010, Kappa_mean=0.368, Kappa_std=0.013, MCC_mean=0.372, MCC_std=0.013, POS_Coverage_mean=nan, POS_Coverage_std=nan, Precision_mean=0.471, Precision_std=0.010, Recall_mean=0.589, Recall_std=0.013, Regret_mean=0.419, Regret_std=0.010
【INFO】【2025-12-25 20:12:41】【K折实验】正在执行第 1/5 折...
【INFO】【2025-12-25 20:12:41】[BT] 使用桶评分配置：mode=f1_regret_bnd, f1_weight=1.0, regret_weight=1.0, bnd_weight=0.5
【INFO】【2025-12-25 20:12:41】【BTTWD】全局模型训练完成，用于兜底预测
[INFO][BT][2025-12-25 20:12:41] 创建桶 bucket_id=ROOT，level=0，parent_id=ROOT，split_name=ROOT，split_type=ROOT，split_rule="all"，n_samples=24000
[INFO][BT][2025-12-25 20:12:41] 创建桶 bucket_id=L1_ever_delay=has_delay，level=1，parent_id=ROOT，split_name=L1_ever_delay，split_type=numeric_bin，split_rule="has_delay"，n_samples=7994
[INFO][BT][2025-12-25 20:12:41] 创建桶 bucket_id=L1_ever_delay=no_delay，level



【INFO】【2025-12-25 20:12:43】【阈值】桶 ROOT（n_val=959，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-25 20:12:43] 桶 bucket_id=ROOT level=0：
    n_train=3839, n_val=959,
    BAC=0.591, F1=0.523, AUC=0.793,
    Regret=0.392, BND_ratio=0.201, POS_coverage=0.192,
    Score(f1_regret_bnd )=0.030，threshold_source=val
【INFO】【2025-12-25 20:12:43】【阈值】桶 L1_ever_delay=has_delay 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-25 20:12:43] 桶 bucket_id=L1_ever_delay=has_delay level=1：
    n_train=1369, n_val=472,
    BAC=0.329, F1=0.655, AUC=0.702,
    Regret=0.468, BND_ratio=0.519, POS_coverage=0.481,
    Score(f1_regret_bnd )=-0.072，threshold_source=val
[INFO][BT][2025-12-25 20:12:43] 桶 bucket_id=L1_ever_delay=has_delay：
    parent_id=ROOT，parent_Score=0.030, bucket_Score=-0.072,
    Gain=-0.103, is_weak=True
【INFO】【2025-12-25 20:12:43】【阈值】桶 L1_ever_delay=no_delay 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-25 20:12:43] 桶 bucket_id=L1_ever_delay=no_delay level=1：
    n_train=2557, n_val=643,
    BAC=0

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-25 20:12:47】[BASELINE] 全局 XGB 模型训练完成
【INFO】【2025-12-25 20:12:47】[BASELINE] 阈值搜索开始
【INFO】【2025-12-25 20:12:48】[BASELINE] 最佳阈值找到: alpha=0.4000, beta=0.2000, regret=0.4002
【INFO】【2025-12-25 20:12:48】【桶树】已为样本生成桶ID，共 12 个组合
【INFO】【2025-12-25 20:12:48】[BASELINE] 测试集桶映射完成，共 12 个桶
【INFO】【2025-12-25 20:12:48】[BASELINE] 桶 L1_ever_delay=no_delay|L2_max_delay_bin=none|L3_LIMIT_BAL=high_limit: BAC=0.5054, Regret=0.2733, Precision=0.3333, Recall=0.0137
【INFO】【2025-12-25 20:12:48】[BASELINE] 桶 L1_ever_delay=has_delay|L2_max_delay_bin=mild|L3_LIMIT_BAL=mid_limit: BAC=0.5986, Regret=0.5411, Precision=0.4658, Recall=0.6044
【INFO】【2025-12-25 20:12:48】[BASELINE] 桶 L1_ever_delay=has_delay|L2_max_delay_bin=mild|L3_LIMIT_BAL=low_limit: BAC=0.5661, Regret=0.5170, Precision=0.5383, Recall=0.7584
【INFO】【2025-12-25 20:12:48】[BASELINE] 桶 L1_ever_delay=has_delay|L2_max_delay_bin=mild|L3_LIMIT_BAL=high_limit: BAC=0.5900, Regret=0.5662, Precision=0.4248, Recall=0.3871
【INFO】【2025-12-25 20:12:48】[BASELI

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-25 20:12:48】【BTTWD】全局模型训练完成，用于兜底预测
[INFO][BT][2025-12-25 20:12:49] 创建桶 bucket_id=ROOT，level=0，parent_id=ROOT，split_name=ROOT，split_type=ROOT，split_rule="all"，n_samples=24000
[INFO][BT][2025-12-25 20:12:49] 创建桶 bucket_id=L1_ever_delay=has_delay，level=1，parent_id=ROOT，split_name=L1_ever_delay，split_type=numeric_bin，split_rule="has_delay"，n_samples=8089
[INFO][BT][2025-12-25 20:12:49] 创建桶 bucket_id=L1_ever_delay=no_delay，level=1，parent_id=ROOT，split_name=L1_ever_delay，split_type=numeric_bin，split_rule="no_delay"，n_samples=15911
[INFO][BT][2025-12-25 20:12:49] 创建桶 bucket_id=L1_ever_delay=has_delay|L2_max_delay_bin=medium，level=2，parent_id=L1_ever_delay=has_delay，split_name=L2_max_delay_bin，split_type=categorical_group，split_rule="medium"，n_samples=780
[INFO][BT][2025-12-25 20:12:49] 创建桶 bucket_id=L1_ever_delay=has_delay|L2_max_delay_bin=mild，level=2，parent_id=L1_ever_delay=has_delay，split_name=L2_max_delay_bin，split_type=categorical_group，split_rule="mild"，n_samples=7158
[IN



【INFO】【2025-12-25 20:12:50】【阈值】桶 ROOT（n_val=959，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-25 20:12:50] 桶 bucket_id=ROOT level=0：
    n_train=3839, n_val=959,
    BAC=0.611, F1=0.539, AUC=0.788,
    Regret=0.355, BND_ratio=0.197, POS_coverage=0.199,
    Score(f1_regret_bnd )=0.085，threshold_source=val
【INFO】【2025-12-25 20:12:51】【阈值】桶 L1_ever_delay=has_delay 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-25 20:12:51] 桶 bucket_id=L1_ever_delay=has_delay level=1：
    n_train=1389, n_val=467,
    BAC=0.432, F1=0.677, AUC=0.646,
    Regret=0.466, BND_ratio=0.266, POS_coverage=0.715,
    Score(f1_regret_bnd )=0.079，threshold_source=val
[INFO][BT][2025-12-25 20:12:51] 桶 bucket_id=L1_ever_delay=has_delay：
    parent_id=ROOT，parent_Score=0.085, bucket_Score=0.079,
    Gain=-0.006, is_weak=True
【INFO】【2025-12-25 20:12:51】【阈值】桶 L1_ever_delay=no_delay 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-25 20:12:51] 桶 bucket_id=L1_ever_delay=no_delay level=1：
    n_train=2548, n_val=633,
    BAC=0.5

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-25 20:12:55】[BASELINE] 全局 XGB 模型训练完成
【INFO】【2025-12-25 20:12:55】[BASELINE] 阈值搜索开始
【INFO】【2025-12-25 20:12:55】[BASELINE] 最佳阈值找到: alpha=0.4000, beta=0.2000, regret=0.4074
【INFO】【2025-12-25 20:12:55】【桶树】已为样本生成桶ID，共 11 个组合
【INFO】【2025-12-25 20:12:55】[BASELINE] 测试集桶映射完成，共 11 个桶
【INFO】【2025-12-25 20:12:56】[BASELINE] 桶 L1_ever_delay=has_delay|L2_max_delay_bin=mild|L3_LIMIT_BAL=low_limit: BAC=0.5763, Regret=0.5408, Precision=0.4663, Recall=0.8343
【INFO】【2025-12-25 20:12:56】[BASELINE] 桶 L1_ever_delay=no_delay|L2_max_delay_bin=none|L3_LIMIT_BAL=low_limit: BAC=0.5040, Regret=0.4349, Precision=0.4000, Recall=0.0109
【INFO】【2025-12-25 20:12:56】[BASELINE] 桶 L1_ever_delay=has_delay|L2_max_delay_bin=mild|L3_LIMIT_BAL=high_limit: BAC=0.5512, Regret=0.5787, Precision=0.4432, Recall=0.3000
【INFO】【2025-12-25 20:12:56】[BASELINE] 桶 L1_ever_delay=no_delay|L2_max_delay_bin=none|L3_LIMIT_BAL=high_limit: BAC=0.5022, Regret=0.2709, Precision=0.2000, Recall=0.0073
【INFO】【2025-12-25 20:12:56】[BASELIN

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-25 20:12:56】【BTTWD】全局模型训练完成，用于兜底预测
[INFO][BT][2025-12-25 20:12:56] 创建桶 bucket_id=ROOT，level=0，parent_id=ROOT，split_name=ROOT，split_type=ROOT，split_rule="all"，n_samples=24000
[INFO][BT][2025-12-25 20:12:56] 创建桶 bucket_id=L1_ever_delay=has_delay，level=1，parent_id=ROOT，split_name=L1_ever_delay，split_type=numeric_bin，split_rule="has_delay"，n_samples=8031
[INFO][BT][2025-12-25 20:12:56] 创建桶 bucket_id=L1_ever_delay=no_delay，level=1，parent_id=ROOT，split_name=L1_ever_delay，split_type=numeric_bin，split_rule="no_delay"，n_samples=15969
[INFO][BT][2025-12-25 20:12:56] 创建桶 bucket_id=L1_ever_delay=has_delay|L2_max_delay_bin=medium，level=2，parent_id=L1_ever_delay=has_delay，split_name=L2_max_delay_bin，split_type=categorical_group，split_rule="medium"，n_samples=812
[INFO][BT][2025-12-25 20:12:56] 创建桶 bucket_id=L1_ever_delay=has_delay|L2_max_delay_bin=mild，level=2，parent_id=L1_ever_delay=has_delay，split_name=L2_max_delay_bin，split_type=categorical_group，split_rule="mild"，n_samples=7073
[IN



【INFO】【2025-12-25 20:12:58】【阈值】桶 ROOT（n_val=960，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-25 20:12:58] 桶 bucket_id=ROOT level=0：
    n_train=3839, n_val=960,
    BAC=0.600, F1=0.517, AUC=0.759,
    Regret=0.387, BND_ratio=0.186, POS_coverage=0.204,
    Score(f1_regret_bnd )=0.037，threshold_source=val
【INFO】【2025-12-25 20:12:58】【阈值】桶 L1_ever_delay=has_delay（n_val=485，source=val) 使用本地阈值 α=0.4000, β=0.0000
[INFO][BT][2025-12-25 20:12:58] 桶 bucket_id=L1_ever_delay=has_delay level=1：
    n_train=1370, n_val=485,
    BAC=0.420, F1=0.672, AUC=0.665,
    Regret=0.476, BND_ratio=0.287, POS_coverage=0.713,
    Score(f1_regret_bnd )=0.053，threshold_source=val
[INFO][BT][2025-12-25 20:12:58] 桶 bucket_id=L1_ever_delay=has_delay：
    parent_id=ROOT，parent_Score=0.037, bucket_Score=0.053,
    Gain=+0.017, is_weak=False
【INFO】【2025-12-25 20:12:59】【阈值】桶 L1_ever_delay=no_delay 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-25 20:12:59] 桶 bucket_id=L1_ever_delay=no_delay level=1：
    n_train=25

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-25 20:13:03】[BASELINE] 全局 XGB 模型训练完成
【INFO】【2025-12-25 20:13:03】[BASELINE] 阈值搜索开始
【INFO】【2025-12-25 20:13:03】[BASELINE] 最佳阈值找到: alpha=0.4000, beta=0.2000, regret=0.4002
【INFO】【2025-12-25 20:13:03】【桶树】已为样本生成桶ID，共 12 个组合
【INFO】【2025-12-25 20:13:03】[BASELINE] 测试集桶映射完成，共 12 个桶
【INFO】【2025-12-25 20:13:04】[BASELINE] 桶 L1_ever_delay=no_delay|L2_max_delay_bin=none|L3_LIMIT_BAL=mid_limit: BAC=0.5032, Regret=0.3230, Precision=0.5000, Recall=0.0074
【INFO】【2025-12-25 20:13:04】[BASELINE] 桶 L1_ever_delay=no_delay|L2_max_delay_bin=none|L3_LIMIT_BAL=low_limit: BAC=0.4988, Regret=0.4412, Precision=0.1111, Recall=0.0054
【INFO】【2025-12-25 20:13:04】[BASELINE] 桶 L1_ever_delay=no_delay|L2_max_delay_bin=none|L3_LIMIT_BAL=high_limit: BAC=0.5045, Regret=0.2731, Precision=0.2222, Recall=0.0142
【INFO】【2025-12-25 20:13:04】[BASELINE] 桶 L1_ever_delay=has_delay|L2_max_delay_bin=mild|L3_LIMIT_BAL=low_limit: BAC=0.5805, Regret=0.5269, Precision=0.4750, Recall=0.7886
【INFO】【2025-12-25 20:13:04】[BASELINE]

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-25 20:13:04】【BTTWD】全局模型训练完成，用于兜底预测
[INFO][BT][2025-12-25 20:13:04] 创建桶 bucket_id=ROOT，level=0，parent_id=ROOT，split_name=ROOT，split_type=ROOT，split_rule="all"，n_samples=24000
[INFO][BT][2025-12-25 20:13:04] 创建桶 bucket_id=L1_ever_delay=has_delay，level=1，parent_id=ROOT，split_name=L1_ever_delay，split_type=numeric_bin，split_rule="has_delay"，n_samples=8122
[INFO][BT][2025-12-25 20:13:04] 创建桶 bucket_id=L1_ever_delay=no_delay，level=1，parent_id=ROOT，split_name=L1_ever_delay，split_type=numeric_bin，split_rule="no_delay"，n_samples=15878
[INFO][BT][2025-12-25 20:13:04] 创建桶 bucket_id=L1_ever_delay=has_delay|L2_max_delay_bin=medium，level=2，parent_id=L1_ever_delay=has_delay，split_name=L2_max_delay_bin，split_type=categorical_group，split_rule="medium"，n_samples=829
[INFO][BT][2025-12-25 20:13:04] 创建桶 bucket_id=L1_ever_delay=has_delay|L2_max_delay_bin=mild，level=2，parent_id=L1_ever_delay=has_delay，split_name=L2_max_delay_bin，split_type=categorical_group，split_rule="mild"，n_samples=7145
[IN



【INFO】【2025-12-25 20:13:06】【阈值】桶 ROOT（n_val=960，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-25 20:13:06] 桶 bucket_id=ROOT level=0：
    n_train=3839, n_val=960,
    BAC=0.567, F1=0.475, AUC=0.765,
    Regret=0.415, BND_ratio=0.201, POS_coverage=0.198,
    Score(f1_regret_bnd )=-0.040，threshold_source=val
【INFO】【2025-12-25 20:13:06】【阈值】桶 L1_ever_delay=has_delay（n_val=474，source=val) 使用本地阈值 α=0.4000, β=0.1000
[INFO][BT][2025-12-25 20:13:06] 桶 bucket_id=L1_ever_delay=has_delay level=1：
    n_train=1386, n_val=474,
    BAC=0.419, F1=0.658, AUC=0.642,
    Regret=0.485, BND_ratio=0.283, POS_coverage=0.705,
    Score(f1_regret_bnd )=0.032，threshold_source=val
[INFO][BT][2025-12-25 20:13:06] 桶 bucket_id=L1_ever_delay=has_delay：
    parent_id=ROOT，parent_Score=-0.040, bucket_Score=0.032,
    Gain=+0.072, is_weak=False
【INFO】【2025-12-25 20:13:06】【阈值】桶 L1_ever_delay=no_delay 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-25 20:13:06] 桶 bucket_id=L1_ever_delay=no_delay level=1：
    n_train=

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-25 20:13:10】[BASELINE] 全局 XGB 模型训练完成
【INFO】【2025-12-25 20:13:10】[BASELINE] 阈值搜索开始
【INFO】【2025-12-25 20:13:11】[BASELINE] 最佳阈值找到: alpha=0.4000, beta=0.2000, regret=0.4096
【INFO】【2025-12-25 20:13:11】【桶树】已为样本生成桶ID，共 12 个组合
【INFO】【2025-12-25 20:13:11】[BASELINE] 测试集桶映射完成，共 12 个桶
【INFO】【2025-12-25 20:13:11】[BASELINE] 桶 L1_ever_delay=has_delay|L2_max_delay_bin=mild|L3_LIMIT_BAL=mid_limit: BAC=0.6145, Regret=0.5514, Precision=0.5022, Recall=0.5743
【INFO】【2025-12-25 20:13:11】[BASELINE] 桶 L1_ever_delay=no_delay|L2_max_delay_bin=none|L3_LIMIT_BAL=low_limit: BAC=0.5038, Regret=0.4249, Precision=0.3333, Recall=0.0116
【INFO】【2025-12-25 20:13:11】[BASELINE] 桶 L1_ever_delay=no_delay|L2_max_delay_bin=none|L3_LIMIT_BAL=mid_limit: BAC=0.4987, Regret=0.3254, Precision=0.0000, Recall=0.0000
【INFO】【2025-12-25 20:13:11】[BASELINE] 桶 L1_ever_delay=has_delay|L2_max_delay_bin=mild|L3_LIMIT_BAL=low_limit: BAC=0.5687, Regret=0.5127, Precision=0.5193, Recall=0.7903
【INFO】【2025-12-25 20:13:11】[BASELINE]

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-25 20:13:11】【BTTWD】全局模型训练完成，用于兜底预测
[INFO][BT][2025-12-25 20:13:12] 创建桶 bucket_id=ROOT，level=0，parent_id=ROOT，split_name=ROOT，split_type=ROOT，split_rule="all"，n_samples=24000
[INFO][BT][2025-12-25 20:13:12] 创建桶 bucket_id=L1_ever_delay=has_delay，level=1，parent_id=ROOT，split_name=L1_ever_delay，split_type=numeric_bin，split_rule="has_delay"，n_samples=8040
[INFO][BT][2025-12-25 20:13:12] 创建桶 bucket_id=L1_ever_delay=no_delay，level=1，parent_id=ROOT，split_name=L1_ever_delay，split_type=numeric_bin，split_rule="no_delay"，n_samples=15960
[INFO][BT][2025-12-25 20:13:12] 创建桶 bucket_id=L1_ever_delay=has_delay|L2_max_delay_bin=medium，level=2，parent_id=L1_ever_delay=has_delay，split_name=L2_max_delay_bin，split_type=categorical_group，split_rule="medium"，n_samples=794
[INFO][BT][2025-12-25 20:13:12] 创建桶 bucket_id=L1_ever_delay=has_delay|L2_max_delay_bin=mild，level=2，parent_id=L1_ever_delay=has_delay，split_name=L2_max_delay_bin，split_type=categorical_group，split_rule="mild"，n_samples=7091
[IN



【INFO】【2025-12-25 20:13:13】【阈值】桶 ROOT（n_val=960，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-25 20:13:13] 桶 bucket_id=ROOT level=0：
    n_train=3840, n_val=960,
    BAC=0.616, F1=0.549, AUC=0.791,
    Regret=0.346, BND_ratio=0.198, POS_coverage=0.193,
    Score(f1_regret_bnd )=0.104，threshold_source=val
【INFO】【2025-12-25 20:13:13】【阈值】桶 L1_ever_delay=has_delay（n_val=479，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-25 20:13:13] 桶 bucket_id=L1_ever_delay=has_delay level=1：
    n_train=1383, n_val=479,
    BAC=0.473, F1=0.688, AUC=0.706,
    Regret=0.439, BND_ratio=0.251, POS_coverage=0.676,
    Score(f1_regret_bnd )=0.124，threshold_source=val
[INFO][BT][2025-12-25 20:13:13] 桶 bucket_id=L1_ever_delay=has_delay：
    parent_id=ROOT，parent_Score=0.104, bucket_Score=0.124,
    Gain=+0.020, is_weak=False
【INFO】【2025-12-25 20:13:14】【阈值】桶 L1_ever_delay=no_delay 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-25 20:13:14] 桶 bucket_id=L1_ever_delay=no_delay level=1：
    n_train=25

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-25 20:13:18】[BASELINE] 全局 XGB 模型训练完成
【INFO】【2025-12-25 20:13:18】[BASELINE] 阈值搜索开始
【INFO】【2025-12-25 20:13:18】[BASELINE] 最佳阈值找到: alpha=0.4000, beta=0.2000, regret=0.3852
【INFO】【2025-12-25 20:13:18】【桶树】已为样本生成桶ID，共 12 个组合
【INFO】【2025-12-25 20:13:18】[BASELINE] 测试集桶映射完成，共 12 个桶




【INFO】【2025-12-25 20:13:18】[BASELINE] 桶 L1_ever_delay=no_delay|L2_max_delay_bin=none|L3_LIMIT_BAL=high_limit: BAC=0.5282, Regret=0.2760, Precision=0.5000, Recall=0.0629
【INFO】【2025-12-25 20:13:18】[BASELINE] 桶 L1_ever_delay=has_delay|L2_max_delay_bin=mild|L3_LIMIT_BAL=mid_limit: BAC=0.5945, Regret=0.5537, Precision=0.4814, Recall=0.5992
【INFO】【2025-12-25 20:13:18】[BASELINE] 桶 L1_ever_delay=no_delay|L2_max_delay_bin=none|L3_LIMIT_BAL=low_limit: BAC=0.4995, Regret=0.4592, Precision=0.0000, Recall=0.0000
【INFO】【2025-12-25 20:13:18】[BASELINE] 桶 L1_ever_delay=no_delay|L2_max_delay_bin=none|L3_LIMIT_BAL=mid_limit: BAC=0.5022, Regret=0.3301, Precision=0.2500, Recall=0.0070
【INFO】【2025-12-25 20:13:18】[BASELINE] 桶 L1_ever_delay=has_delay|L2_max_delay_bin=mild|L3_LIMIT_BAL=high_limit: BAC=0.5637, Regret=0.6082, Precision=0.4059, Recall=0.3504
【INFO】【2025-12-25 20:13:18】[BASELINE] 桶 L1_ever_delay=has_delay|L2_max_delay_bin=mild|L3_LIMIT_BAL=low_limit: BAC=0.5708, Regret=0.5333, Precision=0.4909, R

Unnamed: 0,model,Precision_mean,Precision_std,Recall_mean,Recall_std,F1_mean,F1_std,BAC_mean,BAC_std,AUC_mean,...,MCC_mean,MCC_std,Kappa_mean,Kappa_std,BND_ratio_mean,BND_ratio_std,POS_Coverage_mean,POS_Coverage_std,Regret_mean,Regret_std
0,BTTWD,0.440813,0.010147,0.636826,0.013936,0.520967,0.01103,0.703664,0.008571,0.759398,...,0.362578,0.015455,0.351395,0.015195,0.157333,0.020253,0.210833,0.017541,0.403617,0.007816
1,LogReg,0.445227,0.007344,0.626579,0.013601,0.520454,0.006285,0.70235,0.005043,0.74527,...,0.362823,0.008697,0.353172,0.008633,0.0,0.0,,,0.4206,0.006671
2,RandomForest,0.455229,0.007578,0.598552,0.014403,0.517083,0.008805,0.697538,0.00673,0.755216,...,0.361081,0.011573,0.355029,0.011198,0.0,0.0,,,0.424867,0.008967
3,KNN,0.362512,0.00688,0.613016,0.01588,0.455562,0.009012,0.653409,0.007814,0.689636,...,0.263198,0.01313,0.245955,0.012017,0.0,0.0,,,0.495267,0.010505
4,XGBoost,0.470649,0.009512,0.589358,0.012818,0.523309,0.009693,0.700517,0.007042,0.762483,...,0.371966,0.012989,0.367809,0.012868,0.0,0.0,,,0.419167,0.009592


In [9]:
# 步骤7：桶级别分析
bucket_metrics_path = os.path.join(root_path, cfg['OUTPUT']['results_dir'], 'bucket_metrics.csv')
if os.path.exists(bucket_metrics_path):
    bucket_metrics_df = pd.read_csv(bucket_metrics_path)
    display(bucket_metrics_df.head())
    bucket_metrics_df.plot(x='bucket_id', y='pos_rate_all', kind='bar', figsize=(12,4), title='桶正类比例')
    plt.ylabel('正类比例')
    plt.xticks(rotation=90)
    plt.tight_layout()
    plt.savefig(fig_bucket, bbox_inches='tight')
    plt.close()
log_info('【步骤7摘要】桶级指标已整理，可用于局部化分析。')

Unnamed: 0,bucket_id,layer,parent_bucket_id,n_train,n_val,pos_rate_train,pos_rate_val,alpha,beta,regret_val,...,use_gain_weak_backoff,threshold_data_source,parent_with_threshold,n_test,pos_rate_test,BND_ratio_test,POS_Coverage_test,regret_test,fold,pos_rate
0,ROOT,L1,,3839,959,0.222454,0.23879,0.4,0.2,0.392075,...,True,val,,,,,,,1,0.221167
1,L1_ever_delay=no_delay,L1,ROOT,2557,643,0.116543,0.107309,0.4,0.2,0.298212,...,True,val,,,,,,,1,0.118268
2,L1_ever_delay=no_delay|L2_max_delay_bin=none,L2,L1_ever_delay=no_delay,2557,643,0.118107,0.116641,0.4,0.2,0.342535,...,True,val,,,,,,,1,0.118268
3,L1_ever_delay=has_delay,L1,ROOT,1369,472,0.447772,0.476695,0.4,0.2,0.467691,...,True,val,,,,,,,1,0.427195
4,L1_ever_delay=has_delay|L2_max_delay_bin=mild,L2,L1_ever_delay=has_delay,1130,277,0.39469,0.429603,0.4,0.2,0.518051,...,True,val,,,,,,,1,0.399886


  plt.tight_layout()


【INFO】【2025-12-25 20:13:20】【步骤7摘要】桶级指标已整理，可用于局部化分析。


In [10]:
# 步骤8：结果汇总
log_info('【步骤8】检查结果文件与图表。')
print(os.listdir(os.path.join(root_path, cfg['OUTPUT']['results_dir'])))
print(os.listdir(os.path.join(root_path, cfg['OUTPUT']['figs_dir'])))
log_info('【全部步骤完成】Telco Churn 数据集上的 BT-TWD 实验结束。')

【INFO】【2025-12-25 20:13:20】【步骤8】检查结果文件与图表。
['baseline_bucket_metrics.csv', 'bucket_fallback_stats.csv', 'bucket_metrics.csv', 'bucket_metrics_gain.csv', 'bucket_metrics_gain_test_per_fold.csv', 'bucket_thresholds.csv', 'bucket_thresholds_per_fold.csv', 'bucket_tree_structure.csv', 'fallback_off_tsne', 'fallback_on_tsne', 'metrics_kfold_per_fold.csv', 'metrics_kfold_summary.csv', 'metrics_overview.csv', 'per_sample_test_predictions.csv', 'tsne_fallback']
['bank_class_distribution.png', 'bucket_metrics_bar.png', 'class_distribution.png', 'metrics_compare.png']
【INFO】【2025-12-25 20:13:20】【全部步骤完成】Telco Churn 数据集上的 BT-TWD 实验结束。


In [11]:
# 步骤9：t-SNE 可视化弱桶对比
from bttwdlib import visualize_fallback_with_tsne
from IPython.display import Image

# 调用 t-SNE 可视化函数，参数从 YAML 配置中读取
tsne_output_dir = os.path.join(root_path, cfg["OUTPUT"]["results_dir"], "tsne_fallback")
os.makedirs(tsne_output_dir, exist_ok=True)
results = visualize_fallback_with_tsne(config_path=cfg_path, output_dir=tsne_output_dir)

# 显示保存的图片
Image(filename=results["figure_path"])

# 输出其他结果路径
print(f"t-SNE 嵌入结果保存路径：{results['embedding_path']}")
print(f"弱桶对比摘要保存路径：{results['summary_path']}")
print(f"可视化图片保存路径：{results['figure_path']}")


【INFO】【2025-12-25 20:13:20】【配置加载】已读取 e:\yan\组\三支决策\机器学习\BT_TWD\configs\credit_default_bttwd.yaml
【INFO】【2025-12-25 20:13:21】【数据加载】目标列 default payment next month 已检测为 0/1 标签，跳过映射逻辑
【INFO】【2025-12-25 20:13:21】【数据集信息】名称=credit_default，样本数=30000，目标列=default payment next month，正类比例=22.12%
【INFO】【2025-12-25 20:13:21】【预处理】缺失值填充策略=median
【INFO】【2025-12-25 20:13:21】已生成 credit_default 派生特征：ever_delay / max_delay / max_delay_bin
【INFO】【2025-12-25 20:13:21】ever_delay 分布：
ever_delay
0    19931
1    10069
【INFO】【2025-12-25 20:13:21】max_delay_bin 分布：
max_delay_bin
0      19931
1-2     8876
3-4     1007
5+       186
【INFO】【2025-12-25 20:13:21】max_delay_bins=[-0.1, 0, 2, 4, 9], labels=['0', '1-2', '3-4', '5+']
【INFO】【2025-12-25 20:13:21】【预处理】连续特征=14个，类别特征=5个
【INFO】【2025-12-25 20:13:21】【预处理】编码后维度=33
[t-SNE] Computing 91 nearest neighbors...
[t-SNE] Indexed 30000 samples in 0.002s...
[t-SNE] Computed neighbors for 30000 samples in 0.971s...
[t-SNE] Computed conditional probabilities for sample 1000 / 300

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-25 20:14:50】【BTTWD】全局模型训练完成，用于兜底预测
[INFO][BT][2025-12-25 20:14:51] 创建桶 bucket_id=ROOT，level=0，parent_id=ROOT，split_name=ROOT，split_type=ROOT，split_rule="all"，n_samples=30000
[INFO][BT][2025-12-25 20:14:51] 创建桶 bucket_id=L1_ever_delay=has_delay，level=1，parent_id=ROOT，split_name=L1_ever_delay，split_type=numeric_bin，split_rule="has_delay"，n_samples=10069
[INFO][BT][2025-12-25 20:14:51] 创建桶 bucket_id=L1_ever_delay=no_delay，level=1，parent_id=ROOT，split_name=L1_ever_delay，split_type=numeric_bin，split_rule="no_delay"，n_samples=19931
[INFO][BT][2025-12-25 20:14:51] 创建桶 bucket_id=L1_ever_delay=has_delay|L2_max_delay_bin=medium，level=2，parent_id=L1_ever_delay=has_delay，split_name=L2_max_delay_bin，split_type=categorical_group，split_rule="medium"，n_samples=1007
[INFO][BT][2025-12-25 20:14:51] 创建桶 bucket_id=L1_ever_delay=has_delay|L2_max_delay_bin=mild，level=2，parent_id=L1_ever_delay=has_delay，split_name=L2_max_delay_bin，split_type=categorical_group，split_rule="mild"，n_samples=8876
[



【INFO】【2025-12-25 20:14:53】【阈值】桶 ROOT（n_val=1200，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-25 20:14:53] 桶 bucket_id=ROOT level=0：
    n_train=4799, n_val=1200,
    BAC=0.610, F1=0.512, AUC=0.773,
    Regret=0.370, BND_ratio=0.174, POS_coverage=0.205,
    Score(f1_regret_bnd )=0.055，threshold_source=val
【INFO】【2025-12-25 20:14:53】【阈值】桶 L1_ever_delay=has_delay 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-25 20:14:53] 桶 bucket_id=L1_ever_delay=has_delay level=1：
    n_train=1730, n_val=432,
    BAC=0.398, F1=0.631, AUC=0.673,
    Regret=0.507, BND_ratio=0.345, POS_coverage=0.655,
    Score(f1_regret_bnd )=-0.048，threshold_source=val
[INFO][BT][2025-12-25 20:14:53] 桶 bucket_id=L1_ever_delay=has_delay：
    parent_id=ROOT，parent_Score=0.055, bucket_Score=-0.048,
    Gain=-0.103, is_weak=True
【INFO】【2025-12-25 20:14:53】【阈值】桶 L1_ever_delay=no_delay 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-25 20:14:53] 桶 bucket_id=L1_ever_delay=no_delay level=1：
    n_train=3179, n_val=807,
    BAC

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-25 20:15:02】【BTTWD】全局模型训练完成，用于兜底预测
[INFO][BT][2025-12-25 20:15:02] 创建桶 bucket_id=ROOT，level=0，parent_id=ROOT，split_name=ROOT，split_type=ROOT，split_rule="all"，n_samples=30000
[INFO][BT][2025-12-25 20:15:02] 创建桶 bucket_id=L1_ever_delay=has_delay，level=1，parent_id=ROOT，split_name=L1_ever_delay，split_type=numeric_bin，split_rule="has_delay"，n_samples=10069
[INFO][BT][2025-12-25 20:15:02] 创建桶 bucket_id=L1_ever_delay=no_delay，level=1，parent_id=ROOT，split_name=L1_ever_delay，split_type=numeric_bin，split_rule="no_delay"，n_samples=19931
[INFO][BT][2025-12-25 20:15:02] 创建桶 bucket_id=L1_ever_delay=has_delay|L2_max_delay_bin=medium，level=2，parent_id=L1_ever_delay=has_delay，split_name=L2_max_delay_bin，split_type=categorical_group，split_rule="medium"，n_samples=1007
[INFO][BT][2025-12-25 20:15:02] 创建桶 bucket_id=L1_ever_delay=has_delay|L2_max_delay_bin=mild，level=2，parent_id=L1_ever_delay=has_delay，split_name=L2_max_delay_bin，split_type=categorical_group，split_rule="mild"，n_samples=8876
[



【INFO】【2025-12-25 20:15:04】【阈值】桶 ROOT（n_val=1200，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-25 20:15:04] 桶 bucket_id=ROOT level=0：
    n_train=4799, n_val=1200,
    BAC=0.610, F1=0.512, AUC=0.773,
    Regret=0.370, BND_ratio=0.174, POS_coverage=0.205,
    Score(f1_regret_bnd )=nan，threshold_source=val
【INFO】【2025-12-25 20:15:04】【阈值】桶 L1_ever_delay=has_delay（n_val=432，source=val) 使用本地阈值 α=0.4000, β=0.0000
[INFO][BT][2025-12-25 20:15:04] 桶 bucket_id=L1_ever_delay=has_delay level=1：
    n_train=1730, n_val=432,
    BAC=0.398, F1=0.631, AUC=0.673,
    Regret=0.507, BND_ratio=0.345, POS_coverage=0.655,
    Score(f1_regret_bnd )=nan，threshold_source=val
[INFO][BT][2025-12-25 20:15:04] 桶 bucket_id=L1_ever_delay=has_delay：
    parent_id=ROOT，parent_Score=nan, bucket_Score=nan,
    Gain=+nan, is_weak=False
【INFO】【2025-12-25 20:15:04】【阈值】桶 L1_ever_delay=no_delay（n_val=807，source=val) 使用本地阈值 α=0.5000, β=0.3000
[INFO][BT][2025-12-25 20:15:04] 桶 bucket_id=L1_ever_delay=no_delay level=

  fig.tight_layout()


【INFO】【2025-12-25 20:15:14】[t-SNE] Comparison figure saved to: e:\yan\组\三支决策\机器学习\BT_TWD\results\credit_default_bttwd\tsne_fallback\tsne_fallback_compare.png
t-SNE 嵌入结果保存路径：e:\yan\组\三支决策\机器学习\BT_TWD\results\credit_default_bttwd\tsne_fallback\tsne_fallback_embedding.csv
弱桶对比摘要保存路径：e:\yan\组\三支决策\机器学习\BT_TWD\results\credit_default_bttwd\tsne_fallback\tsne_fallback_summary.csv
可视化图片保存路径：e:\yan\组\三支决策\机器学习\BT_TWD\results\credit_default_bttwd\tsne_fallback\tsne_fallback_compare.png
