weatherAUS 数据集 BT-TWD 可行性实验

本 notebook 按步骤运行：加载配置 → 读取数据 → 预处理 → 桶树划分 → 基线与 BTTWD k 折实验 → 桶级分析。

In [1]:
# 步骤0：环境与路径设置
import os, sys
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
plt.rcParams['axes.unicode_minus'] = False

# 将项目根目录加入路径，便于导入 bttwdlib
root_path = os.path.abspath(os.path.join(os.getcwd(), '..'))
if root_path not in sys.path:
    sys.path.append(root_path)

from bttwdlib import (
    load_yaml_cfg,
    show_cfg,
    load_dataset,
    prepare_features_and_labels,
    BucketTree,
    run_kfold_experiments,
    log_info,
    set_global_seed,
)

cfg_path = Path(root_path) / "configs" / "weatherAUS_bttwd.yaml"
cfg = load_yaml_cfg(cfg_path)
set_global_seed(cfg.get('SEED', {}).get('global_seed', 42))
log_info('【步骤0摘要】环境准备完毕，路径与随机种子已设置。')

【INFO】【2025-12-26 15:31:06】【配置加载】已读取 e:\yan\组\三支决策\机器学习\BT_TWD\configs\weatherAUS_bttwd.yaml
【INFO】【2025-12-26 15:31:09】【步骤0摘要】环境准备完毕，路径与随机种子已设置。


In [2]:
# 步骤1：加载配置
show_cfg(cfg)
log_info('【步骤1摘要】配置文件加载完成，关键参数检查通过。')

【INFO】【2025-12-26 15:31:11】【配置-数据】数据集=weatherAUS, k折=5, 目标列=RainTomorrow, 正类="Yes"
【INFO】【2025-12-26 15:31:11】【配置-BTTWD】阈值模式=bucket_wise, 全局模型=xgb, 桶内模型=none, 后验估计器(兼容字段)=logreg
【INFO】【2025-12-26 15:31:11】【配置-基线】LogReg启用=True, RandomForest启用=True, KNN启用=True, XGBoost启用=True
【INFO】【2025-12-26 15:31:11】【步骤1摘要】配置文件加载完成，关键参数检查通过。


In [3]:
# 步骤2：加载原始数据
df_raw, target_col_model = load_dataset(cfg)  # 这里返回的是用于建模的标签列，例如 "label"

display(df_raw.head())
print("用于建模的标签列:", target_col_model)

# 1）画 0/1 标签（流失/未流失）的比例
class_counts = df_raw[target_col_model].value_counts(normalize=True)
ax = class_counts.plot(kind='bar', title='流失 vs 未流失比例')
plt.ylabel('比例')

fig_path = os.path.join(root_path, cfg['OUTPUT']['figs_dir'], 'class_distribution.png')
os.makedirs(os.path.dirname(fig_path), exist_ok=True)
plt.savefig(fig_path, bbox_inches='tight')
plt.close()

# 2）如果想看原始标签列的分布，可以另外单独分析：
raw_target_col = cfg['DATA']['target_col']  # 这里是原始标签列
print("原始目标列:", raw_target_col)
print(df_raw[raw_target_col].describe())

log_info('【步骤2摘要】Telco Churn 原始数据加载与基本统计完成。')


【INFO】【2025-12-26 15:31:15】【数据加载】文本表格 ..\data\weather\weatherAUS.csv 已读取，样本数=145460，列数=23
【INFO】【2025-12-26 15:31:15】【数据加载】3267 条标签无法映射，占比=2.25%，正负类已指定且未开启 dropna_target，已自动删除这些样本
【INFO】【2025-12-26 15:31:15】【数据加载】标签列 RainTomorrow 已处理完成：dropna_target=False, 丢弃样本=3267, 最终样本数=142193, 正类比例=22.42%
【INFO】【2025-12-26 15:31:15】【数据集信息】名称=weatherAUS，样本数=142193，目标列=RainTomorrow，正类比例=22.42%


Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,WindDir9am,...,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainToday,RainTomorrow
0,2008-12-01,Albury,13.4,22.9,0.6,,,W,44.0,W,...,71.0,22.0,1007.7,1007.1,8.0,,16.9,21.8,No,0
1,2008-12-02,Albury,7.4,25.1,0.0,,,WNW,44.0,NNW,...,44.0,25.0,1010.6,1007.8,,,17.2,24.3,No,0
2,2008-12-03,Albury,12.9,25.7,0.0,,,WSW,46.0,W,...,38.0,30.0,1007.6,1008.7,,2.0,21.0,23.2,No,0
3,2008-12-04,Albury,9.2,28.0,0.0,,,NE,24.0,SE,...,45.0,16.0,1017.6,1012.8,,,18.1,26.5,No,0
4,2008-12-05,Albury,17.5,32.3,1.0,,,W,41.0,ENE,...,82.0,33.0,1010.8,1006.0,7.0,8.0,17.8,29.7,No,0


用于建模的标签列: RainTomorrow
原始目标列: RainTomorrow
count    142193.000000
mean          0.224181
std           0.417043
min           0.000000
25%           0.000000
50%           0.000000
75%           0.000000
max           1.000000
Name: RainTomorrow, dtype: float64
【INFO】【2025-12-26 15:31:15】【步骤2摘要】Telco Churn 原始数据加载与基本统计完成。


In [4]:
# 步骤3：预处理与特征工程
X, y, meta = prepare_features_and_labels(df_raw, cfg)
log_info(f'【预处理】编码特征维度={X.shape[1]}，样本数={X.shape[0]}')
log_info(f"【步骤3摘要】特征预处理完成：连续={len(meta['continuous_cols'])}，类别={len(meta['categorical_cols'])}，编码维度={X.shape[1]}。")

【INFO】【2025-12-26 15:31:19】【预处理】连续特征=16个，类别特征=5个
【INFO】【2025-12-26 15:31:21】【预处理】编码后维度=110
【INFO】【2025-12-26 15:31:21】【预处理】编码特征维度=110，样本数=142193
【INFO】【2025-12-26 15:31:21】【步骤3摘要】特征预处理完成：连续=16，类别=5，编码维度=110。


In [5]:
# 步骤4：构建桶树并检查划分
feature_cols_for_bucket = [c for c in df_raw.columns if c != target_col_model]

bucket_tree = BucketTree(
    cfg['BTTWD']['bucket_levels'],
    feature_names=feature_cols_for_bucket
)

bucket_ids_full = bucket_tree.assign_buckets(df_raw[feature_cols_for_bucket])

bucket_df = bucket_ids_full.value_counts().reset_index()
bucket_df.columns = ['bucket_id', 'count']

bucket_df['pos_rate'] = (
    df_raw.groupby(bucket_ids_full)[target_col_model]
    .apply(lambda s: (s == 1).mean())
    .values
)
display(bucket_df.head())
bucket_df.set_index('bucket_id')['count'].plot(kind='bar', figsize=(12,4), title='桶样本数分布')
fig_bucket = os.path.join(root_path, cfg['OUTPUT']['figs_dir'], 'bucket_metrics_bar.png')
plt.savefig(fig_bucket, bbox_inches='tight')
plt.close()
log_info(f'【步骤4摘要】桶树划分完成，共有 {bucket_ids_full.nunique()} 个叶子桶。')


【INFO】【2025-12-26 15:31:30】【桶树】列 Humidity3pm 出现未知取值，3610 条记录记为 unknown
【INFO】【2025-12-26 15:31:30】【桶树】已为样本生成桶ID，共 499 个组合


Unnamed: 0,bucket_id,count,pos_rate
0,L1_Location=AliceSprings|L2_RainToday=rain_tod...,2587,0.875
1,L1_Location=Woomera|L2_RainToday=rain_today_no...,2280,0.4
2,L1_Location=Townsville|L2_RainToday=rain_today...,2086,0.428571
3,L1_Location=Cobar|L2_RainToday=rain_today_no|L...,1905,0.727273
4,L1_Location=Mildura|L2_RainToday=rain_today_no...,1896,0.080165


【INFO】【2025-12-26 15:31:46】【步骤4摘要】桶树划分完成，共有 499 个叶子桶。


In [6]:
# 步骤5：运行基线模型 k 折实验
# 基线部分在 run_kfold_experiments 内统一调度
log_info('【步骤5】基线模型将在整体交叉验证中一并运行。')
log_info('【步骤5摘要】基线模型性能将作为后续对比基准。')

【INFO】【2025-12-26 15:31:51】【步骤5】基线模型将在整体交叉验证中一并运行。
【INFO】【2025-12-26 15:31:51】【步骤5摘要】基线模型性能将作为后续对比基准。


In [7]:
import numpy as np
import pandas as pd

print("y 全局标签分布：", np.unique(y, return_counts=True))

print("原始标签列分布：")
print(df_raw[cfg['DATA']['target_col']].value_counts())


y 全局标签分布： (array([0, 1]), array([110316,  31877], dtype=int64))
原始标签列分布：
RainTomorrow
0    110316
1     31877
Name: count, dtype: int64


In [8]:
# 步骤6：运行 BTTWD k 折实验（含基线）
results = run_kfold_experiments(X, y, df_raw.drop(columns=[cfg['DATA']['target_col']]), cfg)
summary_df = pd.read_csv(os.path.join(root_path, cfg['OUTPUT']['results_dir'], 'metrics_kfold_summary.csv'))
display(summary_df)
summary_df.plot(x='model', kind='bar', figsize=(8,4), title='模型指标对比')
fig_compare = os.path.join(root_path, cfg['OUTPUT']['figs_dir'], 'metrics_compare.png')
plt.savefig(fig_compare, bbox_inches='tight')
plt.close()
log_info('【步骤6摘要】BTTWD 与基线的 k 折结果已生成并保存。')

【INFO】【2025-12-26 15:31:51】【基线-LogReg】使用决策阈值=0.400（fixed 模式）


  summary[f"{col}_mean"] = float(np.nanmean(arr))
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,


【INFO】【2025-12-26 15:32:43】【基线-LogReg】整体指标：AUC_mean=0.870, AUC_std=0.002, BAC_mean=0.757, BAC_std=0.002, BND_ratio_mean=0.000, BND_ratio_std=0.000, F1_mean=0.632, F1_std=0.002, Kappa_mean=0.532, Kappa_std=0.003, MCC_mean=0.533, MCC_std=0.003, POS_Coverage_mean=nan, POS_Coverage_std=nan, Precision_mean=0.666, Precision_std=0.003, Recall_mean=0.601, Recall_std=0.003, Regret_mean=0.336, Regret_std=0.002
【INFO】【2025-12-26 15:32:43】【基线-RF】使用决策阈值=0.400（fixed 模式）


  summary[f"{col}_mean"] = float(np.nanmean(arr))
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,


【INFO】【2025-12-26 15:34:54】【基线-RF】整体指标：AUC_mean=0.887, AUC_std=0.002, BAC_mean=0.771, BAC_std=0.002, BND_ratio_mean=0.000, BND_ratio_std=0.000, F1_mean=0.655, F1_std=0.004, Kappa_mean=0.562, Kappa_std=0.005, MCC_mean=0.564, MCC_std=0.004, POS_Coverage_mean=nan, POS_Coverage_std=nan, Precision_mean=0.691, Precision_std=0.003, Recall_mean=0.623, Recall_std=0.004, Regret_mean=0.316, Regret_std=0.003
【INFO】【2025-12-26 15:34:54】【基线-KNN】使用决策阈值=0.400（fixed 模式）


  summary[f"{col}_mean"] = float(np.nanmean(arr))
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,


【INFO】【2025-12-26 15:35:28】【基线-KNN】整体指标：AUC_mean=0.854, AUC_std=0.002, BAC_mean=0.762, BAC_std=0.004, BND_ratio_mean=0.000, BND_ratio_std=0.000, F1_mean=0.632, F1_std=0.006, Kappa_mean=0.526, Kappa_std=0.007, MCC_mean=0.526, MCC_std=0.007, POS_Coverage_mean=nan, POS_Coverage_std=nan, Precision_mean=0.634, Precision_std=0.006, Recall_mean=0.630, Recall_std=0.007, Regret_mean=0.330, Regret_std=0.005
【INFO】【2025-12-26 15:35:28】【基线-XGB】使用决策阈值=0.400（fixed 模式）


Parameters: { "use_label_encoder" } are not used.

Parameters: { "use_label_encoder" } are not used.

Parameters: { "use_label_encoder" } are not used.

Parameters: { "use_label_encoder" } are not used.

Parameters: { "use_label_encoder" } are not used.

  summary[f"{col}_mean"] = float(np.nanmean(arr))
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,


【INFO】【2025-12-26 15:35:42】【基线-XGB】整体指标：AUC_mean=0.889, AUC_std=0.003, BAC_mean=0.774, BAC_std=0.003, BND_ratio_mean=0.000, BND_ratio_std=0.000, F1_mean=0.659, F1_std=0.005, Kappa_mean=0.567, Kappa_std=0.007, MCC_mean=0.568, MCC_std=0.007, POS_Coverage_mean=nan, POS_Coverage_std=nan, Precision_mean=0.694, Precision_std=0.005, Recall_mean=0.627, Recall_std=0.006, Regret_mean=0.313, Regret_std=0.005
【INFO】【2025-12-26 15:35:42】【K折实验】正在执行第 1/5 折...
【INFO】【2025-12-26 15:35:42】[BT] 使用桶评分配置：mode=f1_regret_bnd, f1_weight=1.0, regret_weight=1.0, bnd_weight=0.5
【INFO】【2025-12-26 15:35:42】【桶树】列 Humidity3pm 出现未知取值，2904 条记录记为 unknown


Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-26 15:35:45】【BTTWD】全局模型训练完成，用于兜底预测
[INFO][BT][2025-12-26 15:35:46] 创建桶 bucket_id=ROOT，level=0，parent_id=ROOT，split_name=ROOT，split_type=ROOT，split_rule="all"，n_samples=113754
[INFO][BT][2025-12-26 15:35:46] 创建桶 bucket_id=L1_Location=Adelaide，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Adelaide"，n_samples=2443
[INFO][BT][2025-12-26 15:35:46] 创建桶 bucket_id=L1_Location=Albany，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Albany"，n_samples=2415
[INFO][BT][2025-12-26 15:35:46] 创建桶 bucket_id=L1_Location=Albury，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Albury"，n_samples=2437
[INFO][BT][2025-12-26 15:35:46] 创建桶 bucket_id=L1_Location=AliceSprings，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="AliceSprings"，n_samples=2407
[INFO][BT][2025-12-26 15:35:46] 创建桶 bucket_id=L1_Location=BadgerysCreek，level=1，parent_id=ROOT，split_name=



【INFO】【2025-12-26 15:35:57】【阈值】桶 ROOT（n_val=6805，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-26 15:35:57] 桶 bucket_id=ROOT level=0：
    n_train=15905, n_val=6805,
    BAC=0.706, F1=0.662, AUC=0.889,
    Regret=0.285, BND_ratio=0.149, POS_coverage=0.208,
    Score(f1_regret_bnd )=0.303，threshold_source=val
【INFO】【2025-12-26 15:35:57】【阈值】桶 L1_Location=Adelaide（n_val=290，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-26 15:35:57] 桶 bucket_id=L1_Location=Adelaide level=1：
    n_train=376, n_val=290,
    BAC=0.742, F1=0.730, AUC=0.895,
    Regret=0.303, BND_ratio=0.128, POS_coverage=0.279,
    Score(f1_regret_bnd )=0.363，threshold_source=val
[INFO][BT][2025-12-26 15:35:57] 桶 bucket_id=L1_Location=Adelaide：
    parent_id=ROOT，parent_Score=0.303, bucket_Score=0.363,
    Gain=+0.060, is_weak=False
【INFO】【2025-12-26 15:35:57】【阈值】桶 L1_Location=Albany 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-26 15:35:57] 桶 bucket_id=L1_Location=Albany level=1：
    n_train=351, n_val=307,
 

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-26 15:36:58】[BASELINE] 全局 XGB 模型训练完成
【INFO】【2025-12-26 15:36:58】[BASELINE] 阈值搜索开始
【INFO】【2025-12-26 15:37:00】[BASELINE] 最佳阈值找到: alpha=0.4000, beta=0.2000, regret=0.2794
【INFO】【2025-12-26 15:37:00】【桶树】列 Humidity3pm 出现未知取值，706 条记录记为 unknown
【INFO】【2025-12-26 15:37:00】【桶树】已为样本生成桶ID，共 392 个组合
【INFO】【2025-12-26 15:37:00】[BASELINE] 测试集桶映射完成，共 392 个桶


  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expecte

【INFO】【2025-12-26 15:37:18】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_no|L3_Humidity3pm=hum_low: BAC=0.5364, Regret=0.1823, Precision=0.3333, Recall=0.0833
【INFO】【2025-12-26 15:37:18】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_yes|L3_Humidity3pm=hum_mid: BAC=0.7333, Regret=0.4155, Precision=0.5833, Recall=0.6667
【INFO】【2025-12-26 15:37:18】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_yes|L3_Humidity3pm=hum_low: BAC=0.5000, Regret=0.2857, Precision=0.0000, Recall=0.0000
【INFO】【2025-12-26 15:37:18】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_no|L3_Humidity3pm=hum_high: BAC=0.7247, Regret=0.3621, Precision=0.8125, Recall=0.7222
【INFO】【2025-12-26 15:37:18】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_no|L3_Humidity3pm=hum_mid: BAC=0.6790, Regret=0.3643, Precision=0.9412, Recall=0.3636
【INFO】【2025-12-26 15:37:18】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_yes|L3_Humidity3pm=hum_high: BAC=0.7023, Regret=0.3065, Precision=0.7

  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)


【INFO】【2025-12-26 15:37:19】【K折实验】正在执行第 2/5 折...
【INFO】【2025-12-26 15:37:19】[BT] 使用桶评分配置：mode=f1_regret_bnd, f1_weight=1.0, regret_weight=1.0, bnd_weight=0.5
【INFO】【2025-12-26 15:37:19】【桶树】列 Humidity3pm 出现未知取值，2880 条记录记为 unknown


Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-26 15:37:22】【BTTWD】全局模型训练完成，用于兜底预测
[INFO][BT][2025-12-26 15:37:23] 创建桶 bucket_id=ROOT，level=0，parent_id=ROOT，split_name=ROOT，split_type=ROOT，split_rule="all"，n_samples=113754
[INFO][BT][2025-12-26 15:37:23] 创建桶 bucket_id=L1_Location=Adelaide，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Adelaide"，n_samples=2468
[INFO][BT][2025-12-26 15:37:23] 创建桶 bucket_id=L1_Location=Albany，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Albany"，n_samples=2392
[INFO][BT][2025-12-26 15:37:23] 创建桶 bucket_id=L1_Location=Albury，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Albury"，n_samples=2386
[INFO][BT][2025-12-26 15:37:23] 创建桶 bucket_id=L1_Location=AliceSprings，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="AliceSprings"，n_samples=2422
[INFO][BT][2025-12-26 15:37:23] 创建桶 bucket_id=L1_Location=BadgerysCreek，level=1，parent_id=ROOT，split_name=



【INFO】【2025-12-26 15:37:34】【阈值】桶 ROOT（n_val=6806，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-26 15:37:34] 桶 bucket_id=ROOT level=0：
    n_train=15906, n_val=6806,
    BAC=0.701, F1=0.651, AUC=0.889,
    Regret=0.284, BND_ratio=0.149, POS_coverage=0.207,
    Score(f1_regret_bnd )=0.292，threshold_source=val
【INFO】【2025-12-26 15:37:34】【阈值】桶 L1_Location=Adelaide（n_val=305，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-26 15:37:34] 桶 bucket_id=L1_Location=Adelaide level=1：
    n_train=374, n_val=305,
    BAC=0.735, F1=0.731, AUC=0.890,
    Regret=0.282, BND_ratio=0.148, POS_coverage=0.275,
    Score(f1_regret_bnd )=0.376，threshold_source=val
[INFO][BT][2025-12-26 15:37:34] 桶 bucket_id=L1_Location=Adelaide：
    parent_id=ROOT，parent_Score=0.292, bucket_Score=0.376,
    Gain=+0.083, is_weak=False
【INFO】【2025-12-26 15:37:34】【阈值】桶 L1_Location=Albany 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-26 15:37:34] 桶 bucket_id=L1_Location=Albany level=1：
    n_train=350, n_val=313,
 

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-26 15:38:48】[BASELINE] 全局 XGB 模型训练完成
【INFO】【2025-12-26 15:38:48】[BASELINE] 阈值搜索开始
【INFO】【2025-12-26 15:38:49】[BASELINE] 最佳阈值找到: alpha=0.4000, beta=0.2000, regret=0.2781
【INFO】【2025-12-26 15:38:49】【桶树】列 Humidity3pm 出现未知取值，730 条记录记为 unknown
【INFO】【2025-12-26 15:38:49】【桶树】已为样本生成桶ID，共 401 个组合
【INFO】【2025-12-26 15:38:49】[BASELINE] 测试集桶映射完成，共 401 个桶


  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expecte

【INFO】【2025-12-26 15:39:08】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_no|L3_Humidity3pm=hum_low: BAC=0.5883, Regret=0.1189, Precision=0.6667, Recall=0.1818
【INFO】【2025-12-26 15:39:08】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_yes|L3_Humidity3pm=hum_high: BAC=0.5855, Regret=0.3043, Precision=0.8537, Recall=0.9211
【INFO】【2025-12-26 15:39:08】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_yes|L3_Humidity3pm=hum_mid: BAC=0.8605, Regret=0.2836, Precision=0.8077, Recall=0.8400
【INFO】【2025-12-26 15:39:08】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_yes|L3_Humidity3pm=hum_low: BAC=0.5000, Regret=0.1071, Precision=0.0000, Recall=0.0000
【INFO】【2025-12-26 15:39:08】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_no|L3_Humidity3pm=hum_mid: BAC=0.6450, Regret=0.2974, Precision=0.7059, Recall=0.3158
【INFO】【2025-12-26 15:39:08】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_no|L3_Humidity3pm=hum_high: BAC=0.7252, Regret=0.4914, Precision=0.7



【INFO】【2025-12-26 15:39:10】【K折实验】正在执行第 3/5 折...
【INFO】【2025-12-26 15:39:10】[BT] 使用桶评分配置：mode=f1_regret_bnd, f1_weight=1.0, regret_weight=1.0, bnd_weight=0.5
【INFO】【2025-12-26 15:39:10】【桶树】列 Humidity3pm 出现未知取值，2904 条记录记为 unknown


Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-26 15:39:12】【BTTWD】全局模型训练完成，用于兜底预测
[INFO][BT][2025-12-26 15:39:13] 创建桶 bucket_id=ROOT，level=0，parent_id=ROOT，split_name=ROOT，split_type=ROOT，split_rule="all"，n_samples=113754
[INFO][BT][2025-12-26 15:39:13] 创建桶 bucket_id=L1_Location=Adelaide，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Adelaide"，n_samples=2493
[INFO][BT][2025-12-26 15:39:13] 创建桶 bucket_id=L1_Location=Albany，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Albany"，n_samples=2422
[INFO][BT][2025-12-26 15:39:13] 创建桶 bucket_id=L1_Location=Albury，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Albury"，n_samples=2386
[INFO][BT][2025-12-26 15:39:13] 创建桶 bucket_id=L1_Location=AliceSprings，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="AliceSprings"，n_samples=2451
[INFO][BT][2025-12-26 15:39:13] 创建桶 bucket_id=L1_Location=BadgerysCreek，level=1，parent_id=ROOT，split_name=



【INFO】【2025-12-26 15:39:23】【阈值】桶 ROOT（n_val=6804，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-26 15:39:23] 桶 bucket_id=ROOT level=0：
    n_train=15907, n_val=6804,
    BAC=0.711, F1=0.665, AUC=0.891,
    Regret=0.271, BND_ratio=0.149, POS_coverage=0.208,
    Score(f1_regret_bnd )=0.319，threshold_source=val
【INFO】【2025-12-26 15:39:23】【阈值】桶 L1_Location=Adelaide 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-26 15:39:23] 桶 bucket_id=L1_Location=Adelaide level=1：
    n_train=382, n_val=300,
    BAC=0.606, F1=0.720, AUC=0.886,
    Regret=0.328, BND_ratio=0.340, POS_coverage=0.243,
    Score(f1_regret_bnd )=0.221，threshold_source=val
[INFO][BT][2025-12-26 15:39:23] 桶 bucket_id=L1_Location=Adelaide：
    parent_id=ROOT，parent_Score=0.319, bucket_Score=0.221,
    Gain=-0.098, is_weak=True
【INFO】【2025-12-26 15:39:23】【阈值】桶 L1_Location=Albany 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-26 15:39:23] 桶 bucket_id=L1_Location=Albany level=1：
    n_train=351, n_val=307,
    BAC=0.587, F1=0.721, AU

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-26 15:40:18】[BASELINE] 全局 XGB 模型训练完成
【INFO】【2025-12-26 15:40:18】[BASELINE] 阈值搜索开始
【INFO】【2025-12-26 15:40:19】[BASELINE] 最佳阈值找到: alpha=0.4000, beta=0.2000, regret=0.2754
【INFO】【2025-12-26 15:40:19】【桶树】列 Humidity3pm 出现未知取值，706 条记录记为 unknown
【INFO】【2025-12-26 15:40:19】【桶树】已为样本生成桶ID，共 398 个组合
【INFO】【2025-12-26 15:40:19】[BASELINE] 测试集桶映射完成，共 398 个桶


  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expecte

【INFO】【2025-12-26 15:40:37】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_no|L3_Humidity3pm=hum_low: BAC=0.6474, Regret=0.1189, Precision=0.5714, Recall=0.3077
【INFO】【2025-12-26 15:40:37】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_yes|L3_Humidity3pm=hum_low: BAC=1.0000, Regret=0.0000, Precision=0.0000, Recall=0.0000
【INFO】【2025-12-26 15:40:37】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_yes|L3_Humidity3pm=hum_high: BAC=0.6538, Regret=0.2386, Precision=0.7750, Recall=1.0000
【INFO】【2025-12-26 15:40:37】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_no|L3_Humidity3pm=hum_high: BAC=0.6738, Regret=0.3974, Precision=0.6923, Recall=0.8182
【INFO】【2025-12-26 15:40:37】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_yes|L3_Humidity3pm=hum_mid: BAC=0.7435, Regret=0.5462, Precision=0.6000, Recall=0.7143
【INFO】【2025-12-26 15:40:37】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_no|L3_Humidity3pm=hum_mid: BAC=0.6718, Regret=0.2220, Precision=0.6

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-26 15:40:42】【BTTWD】全局模型训练完成，用于兜底预测
[INFO][BT][2025-12-26 15:40:43] 创建桶 bucket_id=ROOT，level=0，parent_id=ROOT，split_name=ROOT，split_type=ROOT，split_rule="all"，n_samples=113755
[INFO][BT][2025-12-26 15:40:43] 创建桶 bucket_id=L1_Location=Adelaide，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Adelaide"，n_samples=2464
[INFO][BT][2025-12-26 15:40:43] 创建桶 bucket_id=L1_Location=Albany，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Albany"，n_samples=2407
[INFO][BT][2025-12-26 15:40:43] 创建桶 bucket_id=L1_Location=Albury，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Albury"，n_samples=2426
[INFO][BT][2025-12-26 15:40:43] 创建桶 bucket_id=L1_Location=AliceSprings，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="AliceSprings"，n_samples=2429
[INFO][BT][2025-12-26 15:40:43] 创建桶 bucket_id=L1_Location=BadgerysCreek，level=1，parent_id=ROOT，split_name=



【INFO】【2025-12-26 15:40:51】【阈值】桶 ROOT（n_val=6804，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-26 15:40:51] 桶 bucket_id=ROOT level=0：
    n_train=15905, n_val=6804,
    BAC=0.707, F1=0.659, AUC=0.885,
    Regret=0.280, BND_ratio=0.146, POS_coverage=0.206,
    Score(f1_regret_bnd )=0.305，threshold_source=val
【INFO】【2025-12-26 15:40:51】【阈值】桶 L1_Location=Adelaide 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-26 15:40:51] 桶 bucket_id=L1_Location=Adelaide level=1：
    n_train=379, n_val=271,
    BAC=0.640, F1=0.706, AUC=0.878,
    Regret=0.358, BND_ratio=0.251, POS_coverage=0.321,
    Score(f1_regret_bnd )=0.222，threshold_source=val
[INFO][BT][2025-12-26 15:40:51] 桶 bucket_id=L1_Location=Adelaide：
    parent_id=ROOT，parent_Score=0.305, bucket_Score=0.222,
    Gain=-0.083, is_weak=True
【INFO】【2025-12-26 15:40:51】【阈值】桶 L1_Location=Albany 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-26 15:40:51] 桶 bucket_id=L1_Location=Albany level=1：
    n_train=352, n_val=304,
    BAC=0.525, F1=0.741, AU

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-26 15:41:44】[BASELINE] 全局 XGB 模型训练完成
【INFO】【2025-12-26 15:41:44】[BASELINE] 阈值搜索开始
【INFO】【2025-12-26 15:41:44】[BASELINE] 最佳阈值找到: alpha=0.4000, beta=0.2000, regret=0.2814
【INFO】【2025-12-26 15:41:44】【桶树】列 Humidity3pm 出现未知取值，753 条记录记为 unknown
【INFO】【2025-12-26 15:41:44】【桶树】已为样本生成桶ID，共 396 个组合
【INFO】【2025-12-26 15:41:44】[BASELINE] 测试集桶映射完成，共 396 个桶


  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expecte

【INFO】【2025-12-26 15:41:59】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_no|L3_Humidity3pm=hum_low: BAC=0.5859, Regret=0.1226, Precision=0.5000, Recall=0.1818
【INFO】【2025-12-26 15:41:59】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=OTHER|L3_Humidity3pm=hum_low: BAC=1.0000, Regret=0.0000, Precision=0.0000, Recall=0.0000
【INFO】【2025-12-26 15:41:59】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_yes|L3_Humidity3pm=hum_mid: BAC=0.6935, Regret=0.4375, Precision=0.5417, Recall=0.6842
【INFO】【2025-12-26 15:41:59】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_no|L3_Humidity3pm=hum_mid: BAC=0.6077, Regret=0.2692, Precision=0.5833, Recall=0.2414
【INFO】【2025-12-26 15:41:59】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_yes|L3_Humidity3pm=hum_high: BAC=0.6420, Regret=0.2843, Precision=0.7556, Recall=0.9714
【INFO】【2025-12-26 15:41:59】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_no|L3_Humidity3pm=hum_high: BAC=0.7981, Regret=0.2414, Precision=0.8571, Reca

  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)


【INFO】【2025-12-26 15:42:01】【K折实验】正在执行第 5/5 折...
【INFO】【2025-12-26 15:42:01】[BT] 使用桶评分配置：mode=f1_regret_bnd, f1_weight=1.0, regret_weight=1.0, bnd_weight=0.5
【INFO】【2025-12-26 15:42:01】【桶树】列 Humidity3pm 出现未知取值，2895 条记录记为 unknown


Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-26 15:42:03】【BTTWD】全局模型训练完成，用于兜底预测
[INFO][BT][2025-12-26 15:42:04] 创建桶 bucket_id=ROOT，level=0，parent_id=ROOT，split_name=ROOT，split_type=ROOT，split_rule="all"，n_samples=113755
[INFO][BT][2025-12-26 15:42:04] 创建桶 bucket_id=L1_Location=Adelaide，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Adelaide"，n_samples=2492
[INFO][BT][2025-12-26 15:42:04] 创建桶 bucket_id=L1_Location=Albany，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Albany"，n_samples=2428
[INFO][BT][2025-12-26 15:42:04] 创建桶 bucket_id=L1_Location=Albury，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Albury"，n_samples=2409
[INFO][BT][2025-12-26 15:42:04] 创建桶 bucket_id=L1_Location=AliceSprings，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="AliceSprings"，n_samples=2415
[INFO][BT][2025-12-26 15:42:04] 创建桶 bucket_id=L1_Location=BadgerysCreek，level=1，parent_id=ROOT，split_name=



【INFO】【2025-12-26 15:42:13】【阈值】桶 ROOT（n_val=6808，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-26 15:42:13] 桶 bucket_id=ROOT level=0：
    n_train=15908, n_val=6808,
    BAC=0.706, F1=0.659, AUC=0.886,
    Regret=0.279, BND_ratio=0.148, POS_coverage=0.202,
    Score(f1_regret_bnd )=0.306，threshold_source=val
【INFO】【2025-12-26 15:42:13】【阈值】桶 L1_Location=Adelaide 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-26 15:42:13] 桶 bucket_id=L1_Location=Adelaide level=1：
    n_train=380, n_val=284,
    BAC=0.650, F1=0.726, AUC=0.879,
    Regret=0.308, BND_ratio=0.278, POS_coverage=0.246,
    Score(f1_regret_bnd )=0.279，threshold_source=val
[INFO][BT][2025-12-26 15:42:13] 桶 bucket_id=L1_Location=Adelaide：
    parent_id=ROOT，parent_Score=0.306, bucket_Score=0.279,
    Gain=-0.027, is_weak=True
【INFO】【2025-12-26 15:42:13】【阈值】桶 L1_Location=Albany（n_val=338，source=val) 使用本地阈值 α=0.4000, β=0.3000
[INFO][BT][2025-12-26 15:42:13] 桶 bucket_id=L1_Location=Albany level=1：
    n_train=350, n_val=338,
  

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-26 15:43:03】[BASELINE] 全局 XGB 模型训练完成
【INFO】【2025-12-26 15:43:03】[BASELINE] 阈值搜索开始
【INFO】【2025-12-26 15:43:04】[BASELINE] 最佳阈值找到: alpha=0.4000, beta=0.2000, regret=0.2815
【INFO】【2025-12-26 15:43:04】【桶树】列 Humidity3pm 出现未知取值，715 条记录记为 unknown
【INFO】【2025-12-26 15:43:04】【桶树】已为样本生成桶ID，共 405 个组合
【INFO】【2025-12-26 15:43:04】[BASELINE] 测试集桶映射完成，共 405 个桶


  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expecte

【INFO】【2025-12-26 15:43:16】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_no|L3_Humidity3pm=hum_low: BAC=0.5909, Regret=0.2085, Precision=1.0000, Recall=0.1818
【INFO】【2025-12-26 15:43:16】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_no|L3_Humidity3pm=hum_high: BAC=0.8373, Regret=0.2439, Precision=0.8889, Recall=0.8889
【INFO】【2025-12-26 15:43:16】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_yes|L3_Humidity3pm=hum_low: BAC=0.7500, Regret=0.1667, Precision=1.0000, Recall=0.5000
【INFO】【2025-12-26 15:43:16】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_yes|L3_Humidity3pm=hum_mid: BAC=0.7484, Regret=0.4058, Precision=0.5385, Recall=0.7368
【INFO】【2025-12-26 15:43:16】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_no|L3_Humidity3pm=hum_mid: BAC=0.6685, Regret=0.2372, Precision=0.5263, Recall=0.3846
【INFO】【2025-12-26 15:43:16】[BASELINE] 桶 L1_Location=Albury|L2_RainToday=rain_today_yes|L3_Humidity3pm=hum_high: BAC=0.6250, Regret=0.2386, Precision=0.7

  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)
  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)


【INFO】【2025-12-26 15:43:17】【K折实验】所有结果已写入 results 目录


Unnamed: 0,model,Precision_mean,Precision_std,Recall_mean,Recall_std,F1_mean,F1_std,BAC_mean,BAC_std,AUC_mean,...,MCC_mean,MCC_std,Kappa_mean,Kappa_std,BND_ratio_mean,BND_ratio_std,POS_Coverage_mean,POS_Coverage_std,Regret_mean,Regret_std
0,BTTWD,0.574069,0.002479,0.768517,0.004115,0.657205,0.002187,0.801872,0.001805,0.889015,...,0.549386,0.003011,0.538852,0.002955,0.135239,0.004946,0.200551,0.004158,0.284072,0.003489
1,LogReg,0.665849,0.002818,0.600527,0.002834,0.6315,0.002434,0.756721,0.00155,0.870079,...,0.533141,0.003066,0.531994,0.003063,0.0,0.0,,,0.336226,0.00214
2,RandomForest,0.690969,0.003333,0.623428,0.004313,0.65546,0.003665,0.77143,0.002397,0.886615,...,0.563605,0.004485,0.562401,0.004523,0.0,0.0,,,0.315768,0.003289
3,KNN,0.634113,0.005525,0.629765,0.006692,0.63192,0.005507,0.762379,0.003693,0.85397,...,0.526046,0.006964,0.526033,0.006969,0.0,0.0,,,0.330466,0.00508
4,XGBoost,0.694424,0.005095,0.627286,0.006072,0.659145,0.005363,0.773762,0.00346,0.889144,...,0.568197,0.006625,0.56701,0.006664,0.0,0.0,,,0.312547,0.004757


【INFO】【2025-12-26 15:43:19】【步骤6摘要】BTTWD 与基线的 k 折结果已生成并保存。


In [9]:
# 步骤7：桶级别分析
bucket_metrics_path = os.path.join(root_path, cfg['OUTPUT']['results_dir'], 'bucket_metrics.csv')
if os.path.exists(bucket_metrics_path):
    bucket_metrics_df = pd.read_csv(bucket_metrics_path)
    display(bucket_metrics_df.head())
    bucket_metrics_df.plot(x='bucket_id', y='pos_rate_all', kind='bar', figsize=(12,4), title='桶正类比例')
    plt.ylabel('正类比例')
    plt.xticks(rotation=90)
    plt.tight_layout()
    plt.savefig(fig_bucket, bbox_inches='tight')
    plt.close()
log_info('【步骤7摘要】桶级指标已整理，可用于局部化分析。')

Unnamed: 0,bucket_id,layer,parent_bucket_id,n_train,n_val,pos_rate_train,pos_rate_val,alpha,beta,regret_val,...,use_gain_weak_backoff,threshold_data_source,parent_with_threshold,n_test,pos_rate_test,BND_ratio_test,POS_Coverage_test,regret_test,fold,pos_rate
0,ROOT,L1,,15905,6805,0.223515,0.230419,0.4,0.2,0.285231,...,True,val,,,,,,,1,0.224186
1,L1_Location=Canberra,L1,ROOT,395,281,0.174684,0.24911,0.4,0.2,0.316726,...,True,val,,,,,,,1,0.190182
2,L1_Location=Sydney,L1,ROOT,377,342,0.265252,0.426901,0.4,0.2,0.326023,...,True,val,,,,,,,1,0.257356
3,L1_Location=Brisbane,L1,ROOT,367,310,0.220708,0.290323,0.4,0.2,0.343548,...,True,val,,,,,,,1,0.222743
4,L1_Location=Darwin,L1,ROOT,357,297,0.257703,0.444444,0.4,0.2,0.271044,...,True,val,,,,,,,1,0.26785


  plt.tight_layout()


【INFO】【2025-12-26 15:43:36】【步骤7摘要】桶级指标已整理，可用于局部化分析。


In [10]:
# 步骤8：结果汇总
log_info('【步骤8】检查结果文件与图表。')
print(os.listdir(os.path.join(root_path, cfg['OUTPUT']['results_dir'])))
print(os.listdir(os.path.join(root_path, cfg['OUTPUT']['figs_dir'])))
log_info('【全部步骤完成】Telco Churn 数据集上的 BT-TWD 实验结束。')

【INFO】【2025-12-26 15:43:36】【步骤8】检查结果文件与图表。
['baseline_bucket_metrics.csv', 'bucket_fallback_stats.csv', 'bucket_metrics.csv', 'bucket_metrics_gain.csv', 'bucket_metrics_gain_test_per_fold.csv', 'bucket_thresholds.csv', 'bucket_thresholds_per_fold.csv', 'bucket_tree_structure.csv', 'fallback_off_tsne', 'fallback_on_tsne', 'metrics_kfold_per_fold.csv', 'metrics_kfold_summary.csv', 'metrics_overview.csv', 'per_sample_test_predictions.csv', 'tsne_fallback']
['bank_class_distribution.png', 'bucket_metrics_bar.png', 'class_distribution.png', 'metrics_compare.png']
【INFO】【2025-12-26 15:43:37】【全部步骤完成】Telco Churn 数据集上的 BT-TWD 实验结束。


In [11]:
# 步骤9：t-SNE 可视化弱桶对比
from bttwdlib import visualize_fallback_with_tsne
from IPython.display import Image

# 调用 t-SNE 可视化函数，参数从 YAML 配置中读取
tsne_output_dir = os.path.join(root_path, cfg["OUTPUT"]["results_dir"], "tsne_fallback")
os.makedirs(tsne_output_dir, exist_ok=True)
results = visualize_fallback_with_tsne(config_path=cfg_path, output_dir=tsne_output_dir)

# 显示保存的图片
Image(filename=results["figure_path"])

# 输出其他结果路径
print(f"t-SNE 嵌入结果保存路径：{results['embedding_path']}")
print(f"弱桶对比摘要保存路径：{results['summary_path']}")
print(f"可视化图片保存路径：{results['figure_path']}")


【INFO】【2025-12-26 15:43:37】【配置加载】已读取 e:\yan\组\三支决策\机器学习\BT_TWD\configs\weatherAUS_bttwd.yaml
【INFO】【2025-12-26 15:43:37】【数据加载】文本表格 ..\data\weather\weatherAUS.csv 已读取，样本数=145460，列数=23
【INFO】【2025-12-26 15:43:37】【数据加载】3267 条标签无法映射，占比=2.25%，正负类已指定且未开启 dropna_target，已自动删除这些样本
【INFO】【2025-12-26 15:43:37】【数据加载】标签列 RainTomorrow 已处理完成：dropna_target=False, 丢弃样本=3267, 最终样本数=142193, 正类比例=22.42%
【INFO】【2025-12-26 15:43:37】【数据集信息】名称=weatherAUS，样本数=142193，目标列=RainTomorrow，正类比例=22.42%
【INFO】【2025-12-26 15:43:37】【预处理】连续特征=16个，类别特征=5个
【INFO】【2025-12-26 15:43:38】【预处理】编码后维度=110
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 142193 samples in 0.036s...
[t-SNE] Computed neighbors for 142193 samples in 33.020s...
[t-SNE] Computed conditional probabilities for sample 1000 / 142193
[t-SNE] Computed conditional probabilities for sample 2000 / 142193
[t-SNE] Computed conditional probabilities for sample 3000 / 142193
[t-SNE] Computed conditional probabilities for sample 4000 / 142193
[t-SNE] Compute

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-26 15:53:17】【BTTWD】全局模型训练完成，用于兜底预测
[INFO][BT][2025-12-26 15:53:17] 创建桶 bucket_id=ROOT，level=0，parent_id=ROOT，split_name=ROOT，split_type=ROOT，split_rule="all"，n_samples=142193
[INFO][BT][2025-12-26 15:53:17] 创建桶 bucket_id=L1_Location=Adelaide，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Adelaide"，n_samples=3090
[INFO][BT][2025-12-26 15:53:17] 创建桶 bucket_id=L1_Location=Albany，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Albany"，n_samples=3016
[INFO][BT][2025-12-26 15:53:17] 创建桶 bucket_id=L1_Location=Albury，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Albury"，n_samples=3011
[INFO][BT][2025-12-26 15:53:17] 创建桶 bucket_id=L1_Location=AliceSprings，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="AliceSprings"，n_samples=3031
[INFO][BT][2025-12-26 15:53:17] 创建桶 bucket_id=L1_Location=BadgerysCreek，level=1，parent_id=ROOT，split_name=



【INFO】【2025-12-26 15:53:25】【阈值】桶 ROOT（n_val=8509，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-26 15:53:25] 桶 bucket_id=ROOT level=0：
    n_train=19886, n_val=8509,
    BAC=0.705, F1=0.659, AUC=0.891,
    Regret=0.281, BND_ratio=0.148, POS_coverage=0.206,
    Score(f1_regret_bnd )=0.304，threshold_source=val
【INFO】【2025-12-26 15:53:25】【阈值】桶 L1_Location=Adelaide 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-26 15:53:25] 桶 bucket_id=L1_Location=Adelaide level=1：
    n_train=475, n_val=352,
    BAC=0.629, F1=0.744, AUC=0.901,
    Regret=0.310, BND_ratio=0.321, POS_coverage=0.293,
    Score(f1_regret_bnd )=0.274，threshold_source=val
[INFO][BT][2025-12-26 15:53:25] 桶 bucket_id=L1_Location=Adelaide：
    parent_id=ROOT，parent_Score=0.304, bucket_Score=0.274,
    Gain=-0.030, is_weak=True
【INFO】【2025-12-26 15:53:26】【阈值】桶 L1_Location=Albany 标记为弱桶，阈值将回退使用 ROOT 的阈值
[INFO][BT][2025-12-26 15:53:26] 桶 bucket_id=L1_Location=Albany level=1：
    n_train=425, n_val=422,
    BAC=0.613, F1=0.703, AU

Parameters: { "use_label_encoder" } are not used.



【INFO】【2025-12-26 15:54:36】【BTTWD】全局模型训练完成，用于兜底预测
[INFO][BT][2025-12-26 15:54:37] 创建桶 bucket_id=ROOT，level=0，parent_id=ROOT，split_name=ROOT，split_type=ROOT，split_rule="all"，n_samples=142193
[INFO][BT][2025-12-26 15:54:37] 创建桶 bucket_id=L1_Location=Adelaide，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Adelaide"，n_samples=3090
[INFO][BT][2025-12-26 15:54:37] 创建桶 bucket_id=L1_Location=Albany，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Albany"，n_samples=3016
[INFO][BT][2025-12-26 15:54:37] 创建桶 bucket_id=L1_Location=Albury，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="Albury"，n_samples=3011
[INFO][BT][2025-12-26 15:54:37] 创建桶 bucket_id=L1_Location=AliceSprings，level=1，parent_id=ROOT，split_name=L1_Location，split_type=category_group，split_rule="AliceSprings"，n_samples=3031
[INFO][BT][2025-12-26 15:54:37] 创建桶 bucket_id=L1_Location=BadgerysCreek，level=1，parent_id=ROOT，split_name=



【INFO】【2025-12-26 15:54:45】【阈值】桶 ROOT（n_val=8509，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-26 15:54:45] 桶 bucket_id=ROOT level=0：
    n_train=19886, n_val=8509,
    BAC=0.705, F1=0.659, AUC=0.891,
    Regret=0.281, BND_ratio=0.148, POS_coverage=0.206,
    Score(f1_regret_bnd )=nan，threshold_source=val
【INFO】【2025-12-26 15:54:45】【阈值】桶 L1_Location=Adelaide（n_val=352，source=val) 使用本地阈值 α=0.4000, β=0.1000
[INFO][BT][2025-12-26 15:54:45] 桶 bucket_id=L1_Location=Adelaide level=1：
    n_train=475, n_val=352,
    BAC=0.629, F1=0.744, AUC=0.901,
    Regret=0.310, BND_ratio=0.321, POS_coverage=0.293,
    Score(f1_regret_bnd )=nan，threshold_source=val
[INFO][BT][2025-12-26 15:54:45] 桶 bucket_id=L1_Location=Adelaide：
    parent_id=ROOT，parent_Score=nan, bucket_Score=nan,
    Gain=+nan, is_weak=False
【INFO】【2025-12-26 15:54:45】【阈值】桶 L1_Location=Albany（n_val=422，source=val) 使用本地阈值 α=0.4000, β=0.2000
[INFO][BT][2025-12-26 15:54:45] 桶 bucket_id=L1_Location=Albany level=1：
    n_train=42

  fig.tight_layout()


【INFO】【2025-12-26 15:55:59】[t-SNE] Comparison figure saved to: e:\yan\组\三支决策\机器学习\BT_TWD\results\weatherAUS_bttwd\tsne_fallback\tsne_fallback_compare.png
t-SNE 嵌入结果保存路径：e:\yan\组\三支决策\机器学习\BT_TWD\results\weatherAUS_bttwd\tsne_fallback\tsne_fallback_embedding.csv
弱桶对比摘要保存路径：e:\yan\组\三支决策\机器学习\BT_TWD\results\weatherAUS_bttwd\tsne_fallback\tsne_fallback_summary.csv
可视化图片保存路径：e:\yan\组\三支决策\机器学习\BT_TWD\results\weatherAUS_bttwd\tsne_fallback\tsne_fallback_compare.png
