# exp19: Hyperparameter Re-optimization with Optuna

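**Baseline**: exp16 (MAE=48.98, best score so far)

**Background**:
- exp16 achieved the best score so far (WeightedEnsemble_A MAE=48.98). It is described as using 50 features, although the feature list in this notebook contains 45.
- However, the Optuna parameters carried over from exp05 were tuned on the older exp05 feature set.
- They need to be re-optimized for the exp16 feature set (regime-change features + acc_get features).

**Optimization targets**:
1. **Ridge** - re-optimize alpha
2. **ExtraTrees** - n_estimators, max_depth, min_samples_split, min_samples_leaf, etc.
3. **HistGradientBoosting** - learning_rate, max_depth, l2_regularization, etc.
4. **CatBoost** - iterations, learning_rate, depth, l2_leaf_reg, etc.
5. **WeightedEnsemble_A** - optimal weights for the four models (optimized on the validation set)

**Optimization strategy**:
- Use a three-way Train/Validation/Test split
- Optimize hyperparameters on the validation set
- Run the final evaluation on the test set
- Run 100 trials per model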
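
The chronological three-way split in the strategy above can be sketched as follows. This is a minimal illustration on toy data, not the notebook's exact code: the helper name `chronological_split` and the 60-day windows are assumptions chosen to mirror a two-month validation period and a two-month test period.

```python
import pandas as pd

def chronological_split(df, date_col, val_days=60, test_days=60):
    """Split a frame into train/val/test by date, oldest rows first."""
    max_date = df[date_col].max()
    test_start = max_date - pd.Timedelta(days=test_days)
    val_start = test_start - pd.Timedelta(days=val_days)
    train = df[df[date_col] < val_start]
    val = df[(df[date_col] >= val_start) & (df[date_col] < test_start)]
    test = df[df[date_col] >= test_start]
    return train, val, test

# Toy daily series (hypothetical data, for illustration only)
toy = pd.DataFrame({"cdr_date": pd.date_range("2019-01-01", periods=365, freq="D")})
toy["call_num"] = range(len(toy))

train, val, test = chronological_split(toy, "cdr_date")
print(len(train), len(val), len(test))
```

Because every validation row is later than every training row, and every test row is later than every validation row, hyperparameters are never tuned on data from the evaluation period.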

**Expected effect**:
- A further 2-5% improvement over exp16 (MAE=48.98)

In [1]:
import pandas as pd
import numpy as np
from datetime import timedelta, datetime
import warnings
warnings.filterwarnings('ignore')

import optuna
from sklearn.linear_model import Ridge
from sklearn.ensemble import ExtraTreesRegressor, HistGradientBoostingRegressor
from catboost import CatBoostRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from scipy.optimize import minimize

In [2]:
# ==================================================================================
# Data loading and feature engineering (same as exp16)
# ==================================================================================

def load_and_preprocess_data():
    print("=" * 80)
    print("Step 1: Loading data")
    print("=" * 80)
    
    calender = pd.read_csv('../input/calender_data.csv')
    cm_data = pd.read_csv('../input/cm_data.csv')
    gt_service = pd.read_csv('../input/gt_service_name.csv')
    acc_get = pd.read_csv('../input/regi_acc_get_data_transform.csv')
    call_data = pd.read_csv('../input/regi_call_data_transform.csv')
    
    calender['cdr_date'] = pd.to_datetime(calender['cdr_date'])
    cm_data['cdr_date'] = pd.to_datetime(cm_data['cdr_date'])
    acc_get['cdr_date'] = pd.to_datetime(acc_get['cdr_date'])
    call_data['cdr_date'] = pd.to_datetime(call_data['cdr_date'])
    gt_service['week'] = pd.to_datetime(gt_service['week'])
    
    print(f"\nData period: {call_data['cdr_date'].min()} ~ {call_data['cdr_date'].max()}")
    
    return calender, cm_data, gt_service, acc_get, call_data

def merge_datasets(calender, cm_data, gt_service, acc_get, call_data):
    df = call_data.copy()
    df = df.merge(calender, on='cdr_date', how='left')
    df = df.merge(cm_data, on='cdr_date', how='left')
    df = df.merge(acc_get, on='cdr_date', how='left')
    
    gt_service_daily = []
    for idx, row in gt_service.iterrows():
        week_start = row['week']
        for i in range(7):
            date = week_start + timedelta(days=i)
            gt_service_daily.append({'cdr_date': date, 'search_cnt': row['search_cnt']})
    
    gt_daily = pd.DataFrame(gt_service_daily)
    df = df.merge(gt_daily, on='cdr_date', how='left')
    
    return df

def create_all_features(df):
    """Create the same features as exp16."""
    df = df.copy()
    
    # Basic calendar and time-series features
    df['year'] = df['cdr_date'].dt.year
    df['month'] = df['cdr_date'].dt.month
    df['day_of_month'] = df['cdr_date'].dt.day
    df['quarter'] = df['cdr_date'].dt.quarter
    df['day_of_year'] = df['cdr_date'].dt.dayofyear
    df['week_of_year'] = df['cdr_date'].dt.isocalendar().week
    df['days_from_start'] = (df['cdr_date'] - df['cdr_date'].min()).dt.days
    df['is_month_start'] = (df['day_of_month'] <= 5).astype(int)
    df['is_month_end'] = (df['day_of_month'] >= 25).astype(int)
    
    # Lag features
    for lag in [1, 2, 3, 5, 7, 14, 30]:
        df[f'lag_{lag}'] = df['call_num'].shift(lag)
    
    # Moving-average features (shift(1) keeps the current day's value out of each window)
    for window in [3, 7, 14, 30]:
        df[f'ma_{window}'] = df['call_num'].shift(1).rolling(window=window, min_periods=1).mean()
        df[f'ma_std_{window}'] = df['call_num'].shift(1).rolling(window=window, min_periods=1).std()
    
    # Aggregate features
    df['cm_7d'] = df['cm_flg'].shift(1).rolling(window=7, min_periods=1).sum()
    df['gt_ma_7'] = df['search_cnt'].shift(1).rolling(window=7, min_periods=1).mean()
    df['acc_ma_7'] = df['acc_get_cnt'].shift(1).rolling(window=7, min_periods=1).mean()
    
    df['dow_avg'] = np.nan
    for dow in df['dow'].unique():
        mask = df['dow'] == dow
        df.loc[mask, 'dow_avg'] = df.loc[mask, 'call_num'].shift(1).expanding().mean()
    
    # acc_get features (exp16)
    df['acc_get_lag7'] = df['acc_get_cnt'].shift(7)
    df['acc_get_sum_14d'] = df['acc_get_cnt'].shift(1).rolling(window=14, min_periods=1).sum()
    
    # Regime-change features (exp15)
    tax_date = pd.Timestamp('2019-10-01')
    rush_date = pd.Timestamp('2019-09-30')
    
    df['days_to_2019_10_01'] = (tax_date - df['cdr_date']).dt.days
    df['is_pre_2019_10_01'] = (df['cdr_date'] < tax_date).astype(int)
    df['is_post_2019_10_01'] = (df['cdr_date'] >= tax_date).astype(int)
    df['days_to_2019_09_30'] = (rush_date - df['cdr_date']).dt.days
    df['is_pre_2019_09_30'] = (df['cdr_date'] < rush_date).astype(int)
    df['is_post_2019_09_30'] = (df['cdr_date'] >= rush_date).astype(int)
    
    rush_start = rush_date - pd.Timedelta(days=90)
    df['is_rush_period'] = ((df['cdr_date'] >= rush_start) & (df['cdr_date'] <= rush_date)).astype(int)
    
    adaptation_end = tax_date + pd.Timedelta(days=30)
    df['is_adaptation_period'] = ((df['cdr_date'] >= tax_date) & (df['cdr_date'] <= adaptation_end)).astype(int)
    
    return df

print("Defined the data loading and feature engineering functions")

Defined the data loading and feature engineering functions
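
The weekly-to-daily expansion that `merge_datasets` performs with `iterrows` can also be written without a Python loop. The sketch below is an alternative formulation, not the notebook's code: the helper name `expand_weekly_to_daily` and the toy values are assumptions, and it presumes each `week` value marks the first of seven consecutive days, as the loop above does.

```python
import numpy as np
import pandas as pd

def expand_weekly_to_daily(weekly, week_col="week", value_col="search_cnt"):
    """Repeat each weekly row for the 7 days starting at its week date."""
    weekly = weekly.reset_index(drop=True)
    daily = weekly.loc[weekly.index.repeat(7), [week_col, value_col]].reset_index(drop=True)
    # Offsets 0..6 days, tiled once per week, line up with the repeated rows
    offsets = pd.to_timedelta(np.tile(np.arange(7), len(weekly)), unit="D")
    daily["cdr_date"] = daily[week_col] + offsets
    return daily[["cdr_date", value_col]]

# Toy weekly search counts (hypothetical values)
toy_weekly = pd.DataFrame({
    "week": pd.to_datetime(["2019-09-29", "2019-10-06"]),
    "search_cnt": [10, 20],
})
gt_daily = expand_weekly_to_daily(toy_weekly)
print(gt_daily)
```

The result has the same `cdr_date` and `search_cnt` columns as the loop-built frame, so it could be merged on `cdr_date` in the same way.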


In [3]:
# ==================================================================================
# Data preparation
# ==================================================================================

print("\n" + "*" * 80)
print("exp19: Hyperparameter re-optimization with Optuna")
print("*" * 80)

calender, cm_data, gt_service, acc_get, call_data = load_and_preprocess_data()
df = merge_datasets(calender, cm_data, gt_service, acc_get, call_data)
df = create_all_features(df)

# Use the next day's call volume as the target
df['target_next_day'] = df['call_num'].shift(-1)
df = df.dropna(subset=['target_next_day']).reset_index(drop=True)

# Weekdays only
df_model = df[df['dow'].isin([1, 2, 3, 4, 5])].copy().reset_index(drop=True)

print(f"\nWeekday rows: {len(df_model)}")
print(f"Period: {df_model['cdr_date'].min()} ~ {df_model['cdr_date'].max()}")


********************************************************************************
exp19: Hyperparameter re-optimization with Optuna
********************************************************************************
Step 1: Loading data

Data period: 2018-06-01 00:00:00 ~ 2020-03-31 00:00:00

Weekday rows: 477
Period: 2018-06-01 00:00:00 ~ 2020-03-30 00:00:00
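
The order of the shift and the weekday filter determines what "next day" means for a Friday row. The sketch below uses toy data to show the difference. It assumes that `dow` encodes Monday as 1 through Sunday as 7 and that the daily table includes weekend rows; neither is confirmed by the notebook.

```python
import pandas as pd

# Two weeks of toy daily data (hypothetical values); 2019-10-07 is a Monday
toy = pd.DataFrame({"cdr_date": pd.date_range("2019-10-07", periods=14, freq="D")})
toy["dow"] = toy["cdr_date"].dt.dayofweek + 1  # assumed encoding: 1=Mon ... 7=Sun
toy["call_num"] = range(100, 114)
weekdays = [1, 2, 3, 4, 5]

# (a) Shift on the full calendar, then keep weekdays:
#     a Friday row's target is Saturday's value
a = toy.copy()
a["target_next_day"] = a["call_num"].shift(-1)
a = a.dropna(subset=["target_next_day"])
a = a[a["dow"].isin(weekdays)]

# (b) Keep weekdays first, then shift:
#     a Friday row's target is the following Monday's value
b = toy[toy["dow"].isin(weekdays)].copy()
b["target_next_biz_day"] = b["call_num"].shift(-1)
b = b.dropna(subset=["target_next_biz_day"])

friday = pd.Timestamp("2019-10-11")
print(a.loc[a["cdr_date"] == friday, "target_next_day"].iloc[0])
print(b.loc[b["cdr_date"] == friday, "target_next_biz_day"].iloc[0])
```

The cell above follows pattern (a). Pattern (b) would be the choice if the goal were to forecast the next business day rather than the next calendar day.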


In [4]:
# ==================================================================================
# Train/Validation/Test split
# ==================================================================================

# Same feature list as exp16 (45 features are listed here)
feature_cols = [
    'dow', 'day_of_month', 'month', 'quarter', 'year', 
    'days_from_start', 'day_of_year', 'week_of_year',
    'is_month_start', 'is_month_end',
    'woy', 'wom', 'day_before_holiday_flag',
    'cm_flg', 'acc_get_cnt', 'search_cnt',
    'cm_7d', 'gt_ma_7', 'acc_ma_7', 'dow_avg',
    'lag_1', 'lag_2', 'lag_3', 'lag_5', 'lag_7', 'lag_14', 'lag_30',
    'ma_3', 'ma_7', 'ma_14', 'ma_30',
    'ma_std_3', 'ma_std_7', 'ma_std_14', 'ma_std_30',
    'days_to_2019_10_01', 'is_pre_2019_10_01', 'is_post_2019_10_01',
    'days_to_2019_09_30', 'is_pre_2019_09_30', 'is_post_2019_09_30',
    'is_rush_period', 'is_adaptation_period',
    'acc_get_lag7', 'acc_get_sum_14d'
]

# Drop rows with missing values
df_clean = df_model.dropna(subset=feature_cols + ['target_next_day']).copy()

# Chronological Train/Val/Test split
max_date = df_clean['cdr_date'].max()
test_months = 2
val_months = 2

test_start = max_date - pd.Timedelta(days=30*test_months)
val_start = test_start - pd.Timedelta(days=30*val_months)

train_df = df_clean[df_clean['cdr_date'] < val_start].copy()
val_df = df_clean[(df_clean['cdr_date'] >= val_start) & (df_clean['cdr_date'] < test_start)].copy()
test_df = df_clean[df_clean['cdr_date'] >= test_start].copy()

X_train = train_df[feature_cols]
y_train = train_df['target_next_day']
X_val = val_df[feature_cols]
y_val = val_df['target_next_day']
X_test = test_df[feature_cols]
y_test = test_df['target_next_day']

print("\n" + "=" * 80)
print("Data split")
print("=" * 80)
print(f"Train: {len(X_train)} rows ({train_df['cdr_date'].min()} ~ {train_df['cdr_date'].max()})")
print(f"Val  : {len(X_val)} rows ({val_df['cdr_date'].min()} ~ {val_df['cdr_date'].max()})")
print(f"Test : {len(X_test)} rows ({test_df['cdr_date'].min()} ~ {test_df['cdr_date'].max()})")
print(f"\nNumber of features: {len(feature_cols)}")


Data split
Train: 370 rows (2018-07-02 00:00:00 ~ 2019-11-29 00:00:00)
Val  : 43 rows (2019-12-02 00:00:00 ~ 2020-01-29 00:00:00)
Test : 43 rows (2020-01-30 00:00:00 ~ 2020-03-30 00:00:00)

Number of features: 45


---

# Optuna Optimization

In [5]:
# ==================================================================================
# Evaluation functions
# ==================================================================================

def calculate_wape(y_true, y_pred):
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true)) * 100

def evaluate_model(y_true, y_pred):
    return {
        'MAE': mean_absolute_error(y_true, y_pred),
        'RMSE': np.sqrt(mean_squared_error(y_true, y_pred)),
        'R2': r2_score(y_true, y_pred),
        'WAPE': calculate_wape(y_true, y_pred)
    }

print('Defined the evaluation functions')

Defined the evaluation functions
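
As a quick check of how MAE and WAPE relate, here is a small worked example. The numbers are made up for illustration; the WAPE formula is the same one used in `calculate_wape`.

```python
import numpy as np

def calculate_wape(y_true, y_pred):
    """Weighted absolute percentage error: sum|error| / sum|actual| * 100."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true)) * 100

# Hypothetical actuals and forecasts
y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 330.0])

mae = np.mean(np.abs(y_true - y_pred))   # (10 + 10 + 30) / 3
wape = calculate_wape(y_true, y_pred)    # 50 / 600 * 100

# WAPE equals MAE divided by the mean absolute actual, times 100
wape_from_mae = mae / np.mean(np.abs(y_true)) * 100
print(mae, wape, wape_from_mae)
```

Because WAPE = 100 × MAE / mean(|y|), dividing a reported MAE by its WAPE recovers the mean call volume of that split. Applied to the MAE and WAPE pairs in this notebook's final results table, this gives roughly 134.6 calls per day on the validation set and roughly 95.9 on the test set, so the lower test MAE partly reflects lower call volume in the test period.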


In [6]:
# ==================================================================================
# 1. Ridge optimization
# ==================================================================================

def optimize_ridge(trial):
    alpha = trial.suggest_float('alpha', 0.1, 200.0, log=True)
    
    model = Ridge(alpha=alpha, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_val)
    
    mae = mean_absolute_error(y_val, y_pred)
    return mae

print("\n" + "=" * 80)
print("Starting Ridge optimization (100 trials)")
print("=" * 80)

study_ridge = optuna.create_study(direction='minimize', study_name='ridge')
study_ridge.optimize(optimize_ridge, n_trials=100, show_progress_bar=True)

best_ridge_params = study_ridge.best_params
print(f"\nBest Ridge params: {best_ridge_params}")
print(f"Best validation MAE: {study_ridge.best_value:.2f}")

[I 2026-01-12 13:43:15,790] A new study created in memory with name: ridge



Starting Ridge optimization (100 trials)


  0%|          | 0/100 [00:00<?, ?it/s]

[I 2026-01-12 13:43:15,810] Trial 0 finished with value: 60.50829450305242 and parameters: {'alpha': 121.48909500521218}. Best is trial 0 with value: 60.50829450305242.
[I 2026-01-12 13:43:15,816] Trial 1 finished with value: 60.47935896757671 and parameters: {'alpha': 11.166681043637508}. Best is trial 1 with value: 60.47935896757671.
[I 2026-01-12 13:43:15,825] Trial 2 finished with value: 61.742454402406786 and parameters: {'alpha': 165.12584548938946}. Best is trial 1 with value: 60.47935896757671.
[I 2026-01-12 13:43:15,833] Trial 3 finished with value: 56.89379252431868 and parameters: {'alpha': 28.137856323873063}. Best is trial 3 with value: 56.89379252431868.
[I 2026-01-12 13:43:15,841] Trial 4 finished with value: 56.89439017877407 and parameters: {'alpha': 33.52518852694889}. Best is trial 3 with value: 56.89379252431868.
[I 2026-01-12 13:43:15,848] Trial 5 finished with value: 68.01158635096817 and parameters: {'alpha': 5.001109828013515}. Best is trial 3 with value: 56.893

In [7]:
# ==================================================================================
# 2. ExtraTrees optimization
# ==================================================================================

def optimize_extratrees(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 500),
        'max_depth': trial.suggest_int('max_depth', 10, 50),
        'min_samples_split': trial.suggest_int('min_samples_split', 2, 20),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
        'max_features': trial.suggest_categorical('max_features', [None, 'sqrt', 'log2'])
    }
    
    model = ExtraTreesRegressor(**params, random_state=42, n_jobs=-1)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_val)
    
    mae = mean_absolute_error(y_val, y_pred)
    return mae

print("\n" + "=" * 80)
print("Starting ExtraTrees optimization (100 trials)")
print("=" * 80)

study_extra = optuna.create_study(direction='minimize', study_name='extratrees')
study_extra.optimize(optimize_extratrees, n_trials=100, show_progress_bar=True)

best_extra_params = study_extra.best_params
print(f"\nBest ExtraTrees params: {best_extra_params}")
print(f"Best validation MAE: {study_extra.best_value:.2f}")

[I 2026-01-12 13:43:16,822] A new study created in memory with name: extratrees



Starting ExtraTrees optimization (100 trials)


  0%|          | 0/100 [00:00<?, ?it/s]

[I 2026-01-12 13:43:17,376] Trial 0 finished with value: 56.55135177584589 and parameters: {'n_estimators': 397, 'max_depth': 13, 'min_samples_split': 15, 'min_samples_leaf': 8, 'max_features': 'sqrt'}. Best is trial 0 with value: 56.55135177584589.
[I 2026-01-12 13:43:17,783] Trial 1 finished with value: 42.86188013240543 and parameters: {'n_estimators': 257, 'max_depth': 38, 'min_samples_split': 15, 'min_samples_leaf': 4, 'max_features': None}. Best is trial 1 with value: 42.86188013240543.
[I 2026-01-12 13:43:18,162] Trial 2 finished with value: 55.318079651409896 and parameters: {'n_estimators': 231, 'max_depth': 24, 'min_samples_split': 16, 'min_samples_leaf': 4, 'max_features': 'log2'}. Best is trial 1 with value: 42.86188013240543.
[I 2026-01-12 13:43:18,549] Trial 3 finished with value: 56.73974965031366 and parameters: {'n_estimators': 275, 'max_depth': 36, 'min_samples_split': 7, 'min_samples_leaf': 9, 'max_features': 'log2'}. Best is trial 1 with value: 42.86188013240543.
[I

In [8]:
# ==================================================================================
# 3. HistGradientBoosting optimization
# ==================================================================================

def optimize_histgb(trial):
    params = {
        'max_iter': trial.suggest_int('max_iter', 100, 500),
        'learning_rate': trial.suggest_float('learning_rate', 0.001, 0.3, log=True),
        'max_depth': trial.suggest_int('max_depth', 5, 30),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 10, 50),
        'l2_regularization': trial.suggest_float('l2_regularization', 0.0, 20.0)
    }
    
    model = HistGradientBoostingRegressor(**params, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_val)
    
    mae = mean_absolute_error(y_val, y_pred)
    return mae

print("\n" + "=" * 80)
print("Starting HistGradientBoosting optimization (100 trials)")
print("=" * 80)

study_hist = optuna.create_study(direction='minimize', study_name='histgb')
study_hist.optimize(optimize_histgb, n_trials=100, show_progress_bar=True)

best_hist_params = study_hist.best_params
print(f"\nBest HistGradientBoosting params: {best_hist_params}")
print(f"Best validation MAE: {study_hist.best_value:.2f}")

[I 2026-01-12 13:44:03,856] A new study created in memory with name: histgb



Starting HistGradientBoosting optimization (100 trials)


  0%|          | 0/100 [00:00<?, ?it/s]

[I 2026-01-12 13:44:03,901] Trial 0 finished with value: 43.52282340214249 and parameters: {'max_iter': 281, 'learning_rate': 0.010520524139900289, 'max_depth': 22, 'min_samples_leaf': 34, 'l2_regularization': 17.656558017691037}. Best is trial 0 with value: 43.52282340214249.
[I 2026-01-12 13:44:14,376] Trial 1 finished with value: 44.95236062942431 and parameters: {'max_iter': 216, 'learning_rate': 0.024237363021037493, 'max_depth': 11, 'min_samples_leaf': 14, 'l2_regularization': 15.478216147772255}. Best is trial 0 with value: 43.52282340214249.
[I 2026-01-12 13:44:18,819] Trial 2 finished with value: 44.247129649409224 and parameters: {'max_iter': 249, 'learning_rate': 0.01616256533300224, 'max_depth': 19, 'min_samples_leaf': 37, 'l2_regularization': 4.772639419987419}. Best is trial 0 with value: 43.52282340214249.
[I 2026-01-12 13:44:23,090] Trial 3 finished with value: 58.151248762249935 and parameters: {'max_iter': 163, 'learning_rate': 0.004957258402403189, 'max_depth': 14, '

In [9]:
# ==================================================================================
# 4. CatBoost optimization
# ==================================================================================

def optimize_catboost(trial):
    params = {
        'iterations': trial.suggest_int('iterations', 500, 3000),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'depth': trial.suggest_int('depth', 3, 10),
        'l2_leaf_reg': trial.suggest_float('l2_leaf_reg', 1.0, 20.0),
        'subsample': trial.suggest_float('subsample', 0.5, 1.0)
    }
    
    model = CatBoostRegressor(**params, random_state=42, verbose=0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_val)
    
    mae = mean_absolute_error(y_val, y_pred)
    return mae

print("\n" + "=" * 80)
print("Starting CatBoost optimization (100 trials)")
print("=" * 80)

study_catboost = optuna.create_study(direction='minimize', study_name='catboost')
study_catboost.optimize(optimize_catboost, n_trials=100, show_progress_bar=True)

best_catboost_params = study_catboost.best_params
print(f"\nBest CatBoost params: {best_catboost_params}")
print(f"Best validation MAE: {study_catboost.best_value:.2f}")

[I 2026-01-12 13:57:38,841] A new study created in memory with name: catboost



Starting CatBoost optimization (100 trials)


  0%|          | 0/100 [00:00<?, ?it/s]

[I 2026-01-12 13:58:06,760] Trial 0 finished with value: 61.95902126876508 and parameters: {'iterations': 1305, 'learning_rate': 0.059848110928914205, 'depth': 10, 'l2_leaf_reg': 4.216709306895815, 'subsample': 0.846519617655342}. Best is trial 0 with value: 61.95902126876508.
[I 2026-01-12 13:58:09,504] Trial 1 finished with value: 85.1724215621481 and parameters: {'iterations': 1005, 'learning_rate': 0.22276354544355406, 'depth': 3, 'l2_leaf_reg': 9.746378782486286, 'subsample': 0.7765512308909497}. Best is trial 0 with value: 61.95902126876508.
[I 2026-01-12 13:58:20,934] Trial 2 finished with value: 53.70892072642515 and parameters: {'iterations': 2578, 'learning_rate': 0.014759433584176966, 'depth': 7, 'l2_leaf_reg': 13.411762500909424, 'subsample': 0.6170746955096704}. Best is trial 2 with value: 53.70892072642515.
[I 2026-01-12 13:58:30,305] Trial 3 finished with value: 54.36309280275595 and parameters: {'iterations': 2371, 'learning_rate': 0.14698819167500465, 'depth': 6, 'l2_l

---

# Evaluation with the Optimized Models

In [10]:
# ==================================================================================
# Train each model with its best parameters
# ==================================================================================

print("\n" + "=" * 80)
print("Training models with the best parameters")
print("=" * 80)

# 1. Ridge
print("\n[1/4] Ridge...")
ridge_final = Ridge(**best_ridge_params, random_state=42)
ridge_final.fit(X_train, y_train)
ridge_val_pred = ridge_final.predict(X_val)
ridge_test_pred = ridge_final.predict(X_test)
ridge_val_metrics = evaluate_model(y_val, ridge_val_pred)
ridge_test_metrics = evaluate_model(y_test, ridge_test_pred)
print(f"  Val MAE: {ridge_val_metrics['MAE']:.2f}")
print(f"  Test MAE: {ridge_test_metrics['MAE']:.2f}")

# 2. ExtraTrees
print("\n[2/4] ExtraTrees...")
extra_final = ExtraTreesRegressor(**best_extra_params, random_state=42, n_jobs=-1)
extra_final.fit(X_train, y_train)
extra_val_pred = extra_final.predict(X_val)
extra_test_pred = extra_final.predict(X_test)
extra_val_metrics = evaluate_model(y_val, extra_val_pred)
extra_test_metrics = evaluate_model(y_test, extra_test_pred)
print(f"  Val MAE: {extra_val_metrics['MAE']:.2f}")
print(f"  Test MAE: {extra_test_metrics['MAE']:.2f}")

# 3. HistGradientBoosting
print("\n[3/4] HistGradientBoosting...")
hist_final = HistGradientBoostingRegressor(**best_hist_params, random_state=42)
hist_final.fit(X_train, y_train)
hist_val_pred = hist_final.predict(X_val)
hist_test_pred = hist_final.predict(X_test)
hist_val_metrics = evaluate_model(y_val, hist_val_pred)
hist_test_metrics = evaluate_model(y_test, hist_test_pred)
print(f"  Val MAE: {hist_val_metrics['MAE']:.2f}")
print(f"  Test MAE: {hist_test_metrics['MAE']:.2f}")

# 4. CatBoost
print("\n[4/4] CatBoost...")
catboost_final = CatBoostRegressor(**best_catboost_params, random_state=42, verbose=0)
catboost_final.fit(X_train, y_train)
catboost_val_pred = catboost_final.predict(X_val)
catboost_test_pred = catboost_final.predict(X_test)
catboost_val_metrics = evaluate_model(y_val, catboost_val_pred)
catboost_test_metrics = evaluate_model(y_test, catboost_test_pred)
print(f"  Val MAE: {catboost_val_metrics['MAE']:.2f}")
print(f"  Test MAE: {catboost_test_metrics['MAE']:.2f}")


Training models with the best parameters

[1/4] Ridge...
  Val MAE: 56.86
  Test MAE: 27.41

[2/4] ExtraTrees...
  Val MAE: 40.36
  Test MAE: 30.94

[3/4] HistGradientBoosting...
  Val MAE: 40.06
  Test MAE: 27.60

[4/4] CatBoost...
  Val MAE: 46.02
  Test MAE: 30.39


In [11]:
# ==================================================================================
# WeightedEnsemble_A optimization (weights optimized on the validation set)
# ==================================================================================

print("\n" + "=" * 80)
print("WeightedEnsemble_A: weight optimization")
print("=" * 80)

# Optimize the weights on the validation set
val_predictions = {
    'Ridge': ridge_val_pred,
    'CatBoost': catboost_val_pred,
    'ExtraTrees': extra_val_pred,
    'HistGradientBoosting': hist_val_pred
}

def optimize_ensemble_weights(predictions_dict, y_true, model_names):
    preds_matrix = np.column_stack([predictions_dict[name] for name in model_names])
    
    def objective(weights):
        ensemble_pred = preds_matrix @ weights
        return mean_absolute_error(y_true, ensemble_pred)
    
    constraints = {'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0}
    bounds = [(0, 1) for _ in range(len(model_names))]
    initial_weights = np.ones(len(model_names)) / len(model_names)
    
    result = minimize(objective, initial_weights, method='SLSQP',
                     bounds=bounds, constraints=constraints)
    return result.x

model_names = ['Ridge', 'CatBoost', 'ExtraTrees', 'HistGradientBoosting']
optimal_weights = optimize_ensemble_weights(val_predictions, y_val, model_names)

print("\nOptimal weights:")
for name, weight in zip(model_names, optimal_weights):
    print(f"  {name}: {weight:.4f}")

# Evaluate on the validation set
ensemble_val_pred = np.column_stack([val_predictions[name] for name in model_names]) @ optimal_weights
ensemble_val_metrics = evaluate_model(y_val, ensemble_val_pred)
print(f"\nVal MAE: {ensemble_val_metrics['MAE']:.2f}")

# Evaluate on the test set
test_predictions = {
    'Ridge': ridge_test_pred,
    'CatBoost': catboost_test_pred,
    'ExtraTrees': extra_test_pred,
    'HistGradientBoosting': hist_test_pred
}
ensemble_test_pred = np.column_stack([test_predictions[name] for name in model_names]) @ optimal_weights
ensemble_test_metrics = evaluate_model(y_test, ensemble_test_pred)
print(f"Test MAE: {ensemble_test_metrics['MAE']:.2f}")


WeightedEnsemble_A: weight optimization

Optimal weights:
  Ridge: 0.0000
  CatBoost: 0.0537
  ExtraTrees: 0.2400
  HistGradientBoosting: 0.7063

Val MAE: 39.09
Test MAE: 28.11
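
MAE is piecewise linear in the weights, so a gradient-based solver such as SLSQP can stall at a kink or depend on its starting point. The sketch below is a hedged variation on the cell above, not the method used in this notebook. It tries several starting points, skips runs that did not converge, and keeps the starting points as candidates so the result is never worse than equal weighting on the data it was fitted to. The function name `fit_simplex_weights` and the synthetic data are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def fit_simplex_weights(preds, y_true, n_starts=5, seed=0):
    """Minimize ensemble MAE over non-negative weights that sum to 1."""
    n_models = preds.shape[1]
    rng = np.random.default_rng(seed)

    def mae(w):
        return np.mean(np.abs(y_true - preds @ w))

    # Equal weights plus a few random points on the simplex
    starts = [np.full(n_models, 1.0 / n_models)]
    starts += [rng.dirichlet(np.ones(n_models)) for _ in range(n_starts)]

    candidates = list(starts)  # keep the starts so the result is never worse than them
    for w0 in starts:
        res = minimize(mae, w0, method="SLSQP",
                       bounds=[(0.0, 1.0)] * n_models,
                       constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0})
        if res.success:
            w = np.clip(res.x, 0.0, 1.0)   # remove tiny bound violations
            candidates.append(w / w.sum())  # renormalize to sum exactly to 1
    return min(candidates, key=mae)

# Synthetic target and three noisy "model" predictions (hypothetical data)
rng = np.random.default_rng(1)
y = rng.normal(100.0, 20.0, size=50)
preds = np.column_stack([y + rng.normal(0.0, s, size=50) for s in (5.0, 10.0, 30.0)])

weights = fit_simplex_weights(preds, y)
print(np.round(weights, 4))
```

With only 43 validation rows in this experiment, weights fitted this way can still overfit the validation period, so the test-set comparison remains the deciding check.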


---

# Aggregating and Saving Results

In [12]:
# ==================================================================================
# Aggregate results
# ==================================================================================

import os

results_data = [
    {'model': 'Ridge', 'set': 'Validation', **ridge_val_metrics},
    {'model': 'Ridge', 'set': 'Test', **ridge_test_metrics},
    {'model': 'ExtraTrees', 'set': 'Validation', **extra_val_metrics},
    {'model': 'ExtraTrees', 'set': 'Test', **extra_test_metrics},
    {'model': 'HistGradientBoosting', 'set': 'Validation', **hist_val_metrics},
    {'model': 'HistGradientBoosting', 'set': 'Test', **hist_test_metrics},
    {'model': 'CatBoost', 'set': 'Validation', **catboost_val_metrics},
    {'model': 'CatBoost', 'set': 'Test', **catboost_test_metrics},
    {'model': 'WeightedEnsemble_A', 'set': 'Validation', **ensemble_val_metrics},
    {'model': 'WeightedEnsemble_A', 'set': 'Test', **ensemble_test_metrics}
]

results_df = pd.DataFrame(results_data)

print("\n" + "=" * 80)
print("exp19 final results")
print("=" * 80)
print(results_df.to_string(index=False))

# Collect the best parameters for saving
optimized_params = {
    'Ridge': best_ridge_params,
    'ExtraTrees': best_extra_params,
    'HistGradientBoosting': best_hist_params,
    'CatBoost': best_catboost_params,
    'WeightedEnsemble_A_weights': {name: float(weight) for name, weight in zip(model_names, optimal_weights)}
}

# Save as CSV
output_dir = '../output/exp19'
os.makedirs(output_dir, exist_ok=True)

results_df.to_csv(f'{output_dir}/optuna_results.csv', index=False)

import json
with open(f'{output_dir}/optimized_params.json', 'w') as f:
    json.dump(optimized_params, f, indent=2)

print(f"\nSaved results to: {output_dir}/")
print("  - optuna_results.csv")
print("  - optimized_params.json")


exp19 final results
               model        set       MAE      RMSE       R2      WAPE
               Ridge Validation 56.862463 71.969760 0.481373 42.258658
               Ridge       Test 27.405402 33.815064 0.671878 28.568056
          ExtraTrees Validation 40.360101 58.782623 0.654018 29.994544
          ExtraTrees       Test 30.941713 38.506975 0.574505 32.254391
HistGradientBoosting Validation 40.062854 56.423729 0.681229 29.773639
HistGradientBoosting       Test 27.604013 33.577072 0.676480 28.775092
            CatBoost Validation 46.021132 59.865888 0.641149 34.201671
            CatBoost       Test 30.388853 36.049538 0.627081 31.678077
  WeightedEnsemble_A Validation 39.089975 55.583984 0.690647 29.050621
  WeightedEnsemble_A       Test 28.111850 34.329586 0.661817 29.304474

Saved results to: ../output/exp19/
  - optuna_results.csv
  - optimized_params.json


In [13]:
# ==================================================================================
# Comparison with exp16
# ==================================================================================

print("\n" + "=" * 80)
print("exp16 vs exp19 comparison")
print("=" * 80)

# Best exp16 score (WeightedEnsemble_A)
# NOTE: the comparison is only like-for-like if exp16 was scored on the same test period
exp16_best_mae = 48.98  # taken from the exp16 results
exp19_test_mae = ensemble_test_metrics['MAE']

improvement = exp16_best_mae - exp19_test_mae
improvement_pct = (improvement / exp16_best_mae) * 100

print(f"\nWeightedEnsemble_A:")
print(f"  exp16 (old parameters): {exp16_best_mae:.2f}")
print(f"  exp19 (Optuna-optimized): {exp19_test_mae:.2f}")
print(f"  Difference: {improvement:+.2f} ({improvement_pct:+.1f}%)")

if improvement > 0:
    print(f"\n✅ Improved: MAE reduced by {improvement:.2f}")
else:
    print(f"\n❌ Worse: MAE increased by {abs(improvement):.2f}")

print("\n" + "=" * 80)
print("Optimization complete")
print("=" * 80)


exp16 vs exp19 comparison

WeightedEnsemble_A:
  exp16 (old parameters): 48.98
  exp19 (Optuna-optimized): 28.11
  Difference: +20.87 (+42.6%)

✅ Improved: MAE reduced by 20.87

Optimization complete
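
To put the ensemble's improvement next to the individual models, the same arithmetic can be applied to every test-set MAE in the final results table. The sketch below copies those values by hand. The helper name `relative_improvement` is an assumption, and the baseline is the exp16 figure quoted in this notebook.

```python
import pandas as pd

# Test-set MAE values copied from the exp19 final results table
test_mae = pd.Series({
    "Ridge": 27.405402,
    "ExtraTrees": 30.941713,
    "HistGradientBoosting": 27.604013,
    "CatBoost": 30.388853,
    "WeightedEnsemble_A": 28.111850,
})
baseline_mae = 48.98  # exp16 WeightedEnsemble_A, as quoted in this notebook

def relative_improvement(baseline, new):
    """Percentage reduction in MAE relative to the baseline."""
    return (baseline - new) / baseline * 100

summary = pd.DataFrame({
    "test_mae": test_mae,
    "improvement_pct": relative_improvement(baseline_mae, test_mae),
}).sort_values("test_mae")
print(summary.round(2))
```

Sorted this way, Ridge and HistGradientBoosting rank ahead of WeightedEnsemble_A on the test set, even though the ensemble had the lowest validation MAE.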


---

## Summary

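**exp19: Hyperparameter re-optimization with Optuna**

### Baseline:
- exp16 (WeightedEnsemble_A MAE=48.98)

### What was optimized:
1. **Ridge** - re-optimized alpha
2. **ExtraTrees** - n_estimators, max_depth, etc.
3. **HistGradientBoosting** - learning_rate, max_depth, etc.
4. **CatBoost** - iterations, learning_rate, depth, etc.
5. **WeightedEnsemble_A** - weights optimized on the validation set

### Optimization strategy:
- Three-way Train/Validation/Test split
- Hyperparameters optimized on the validation set
- 100 trials per model
- Final evaluation on the test set

### Results:
- WeightedEnsemble_A reached a test MAE of 28.11 versus 48.98 for exp16, a reduction of 20.87 (42.6%). This is far larger than the expected 2-5%, so it is worth confirming that both scores refer to the same evaluation period.
- On the test set, Ridge (27.41) and HistGradientBoosting (27.60) each scored a lower MAE than the ensemble (28.11). The ensemble was best only on the validation set (39.09).
- The optimized parameters were saved in JSON format.

### Output files:
1. `optuna_results.csv` - Val/Test results for each model
2. `optimized_params.json` - optimized parameters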