# NAR (Regional Horse Racing) Hyperparameter Optimization (Model 09)

We use Optuna to search for the best LightGBM parameters for the feature set used in Model 08.

In [1]:
import sys
import os
import pandas as pd
import numpy as np
import lightgbm as lgb
import optuna
from scipy.stats import spearmanr
from sklearn.metrics import mean_squared_error

# Add the project's src directory to the import path
src_path = os.path.abspath(os.path.join(os.getcwd(), '../../src'))
if src_path not in sys.path:
    sys.path.append(src_path)

from nar.loader import NarDataLoader
from nar.features import NarFeatureGenerator



In [2]:
loader = NarDataLoader()
raw_df = loader.load(limit=150000, region='south_kanto')

# Same feature generation as Model 08 (previous 1-5 runs, including the improved logic)
generator = NarFeatureGenerator(history_windows=[1, 2, 3, 4, 5])
df = generator.generate_features(raw_df)

df = df.dropna(subset=['rank']).copy()
df['date'] = pd.to_datetime(df['date'])

# Feature list
baseline_features = [
    'distance', 'venue', 'state', 'frame_number', 'horse_number', 'weight', 'impost',
    'jockey_win_rate', 'jockey_place_rate', 'trainer_win_rate', 'trainer_place_rate',
    'horse_run_count'
] + [col for col in df.columns if 'horse_prev' in col]

advanced_features = [
    'gender', 'age', 'days_since_prev_race', 'weight_diff',
    'horse_jockey_place_rate', 'is_consecutive_jockey',
    'distance_diff', 'horse_venue_place_rate',
    'trainer_30d_win_rate',
    'impost_diff', 'was_accident_prev1', 'weighted_si_momentum', 'weighted_rank_momentum'
]
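# De-duplicate while preserving order; list(set(...)) can reorder columns between kernel restarts
features = list(dict.fromkeys(baseline_features + advanced_features))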

# Preprocessing
categorical_cols = ['venue', 'state', 'gender']
for col in features:
    if col in categorical_cols:
        df[col] = df[col].astype(str).astype('category')
    else:
        df[col] = pd.to_numeric(df[col], errors='coerce')

# Split
split_date = df['date'].quantile(0.8)
train_df = df[df['date'] < split_date].copy()
test_df = df[df['date'] >= split_date].copy()

Removed duplicate rows: 150000 -> 143895
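
The date-based split above can be illustrated on toy data. The sketch below is a minimal example, not the notebook's actual data: the `toy` frame and its values are made up, and only the quantile-then-filter pattern mirrors the cell above.

```python
import pandas as pd

# Toy data: ten rows on consecutive days (illustrative values only).
toy = pd.DataFrame({
    "date": pd.date_range("2025-01-01", periods=10, freq="D"),
    "rank": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
})

# The 0.8 quantile of a datetime column is itself a timestamp.
split_date = toy["date"].quantile(0.8)

# Rows strictly before the cutoff are used for training; the rest are held out.
train = toy[toy["date"] < split_date]
test = toy[toy["date"] >= split_date]

print(split_date, len(train), len(test))
```

Because the cutoff is a timestamp rather than a row index, all rows sharing a date land on the same side, so a race is not divided between the two splits (assuming each race has a single date).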


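## Parameter Search with Optuna

We search for the parameters that minimize RMSE on the validation data, which here is the hold-out split `test_df` (rows on or after the 0.8 date quantile).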
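
For reference, the RMSE being minimized can be written out directly. This is a minimal sketch with made-up numbers; the helper name `rmse` is hypothetical and simply mirrors the `np.sqrt(mean_squared_error(...))` expression used in the notebook.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Illustrative finishing positions and predictions (not from the dataset).
actual = [1, 2, 3, 4]
predicted = [2, 2, 5, 4]

# Mean squared error, i.e. the quantity whose square root is the RMSE.
mse = float(np.mean((np.array(actual) - np.array(predicted)) ** 2))
print(rmse(actual, predicted), mse)
```

LightGBM's `l2` metric is the mean squared error, so an `l2` value in a training log is the square of the corresponding `rmse`.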

In [3]:
def objective(trial):
    params = {
        'objective': 'regression',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'random_state': 42,
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.1),
        'num_leaves': trial.suggest_int('num_leaves', 20, 150),
        'max_depth': trial.suggest_int('max_depth', 5, 15),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 7),
        'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
    }
    
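    # Train and validate. n_estimators is not set, so each trial trains
    # at most 100 trees (the LightGBM default).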
    model = lgb.LGBMRegressor(**params)
    model.fit(
        train_df[features], train_df['rank'],
        eval_set=[(test_df[features], test_df['rank'])],
        callbacks=[lgb.early_stopping(stopping_rounds=50)]
    )
    
    preds = model.predict(test_df[features])
    rmse = np.sqrt(mean_squared_error(test_df['rank'], preds))
    return rmse

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=30)

print("Best Trial:")
print(study.best_trial.params)

[I 2026-01-27 14:37:47,477] A new study created in memory with name: no-name-3d3190e5-975b-4c8c-a80f-35a6a82f4b47


Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:37:56,495] Trial 0 finished with value: 3.3437159590180867 and parameters: {'learning_rate': 0.035093769227807346, 'num_leaves': 106, 'max_depth': 14, 'feature_fraction': 0.5612289521771443, 'bagging_fraction': 0.7653261846783888, 'bagging_freq': 4, 'min_child_samples': 26}. Best is trial 0 with value: 3.3437159590180867.


Did not meet early stopping. Best iteration is:
[100]	valid_0's rmse: 3.34372
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:38:06,850] Trial 1 finished with value: 3.448134301025562 and parameters: {'learning_rate': 0.012194132887044892, 'num_leaves': 70, 'max_depth': 8, 'feature_fraction': 0.8449744984608496, 'bagging_fraction': 0.6814986713716191, 'bagging_freq': 4, 'min_child_samples': 27}. Best is trial 0 with value: 3.3437159590180867.


Did not meet early stopping. Best iteration is:
[100]	valid_0's rmse: 3.44813
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:38:11,790] Trial 2 finished with value: 3.327054741309906 and parameters: {'learning_rate': 0.07687953239796541, 'num_leaves': 70, 'max_depth': 10, 'feature_fraction': 0.5744512942209867, 'bagging_fraction': 0.6786014421745137, 'bagging_freq': 1, 'min_child_samples': 78}. Best is trial 2 with value: 3.327054741309906.


Did not meet early stopping. Best iteration is:
[96]	valid_0's rmse: 3.32705
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:38:26,260] Trial 3 finished with value: 3.3561136696491087 and parameters: {'learning_rate': 0.09501057153265861, 'num_leaves': 145, 'max_depth': 9, 'feature_fraction': 0.5345937438942765, 'bagging_fraction': 0.6025998127824441, 'bagging_freq': 2, 'min_child_samples': 5}. Best is trial 2 with value: 3.327054741309906.


Did not meet early stopping. Best iteration is:
[72]	valid_0's rmse: 3.35611
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:38:34,513] Trial 4 finished with value: 3.3399846629613004 and parameters: {'learning_rate': 0.03374118135108218, 'num_leaves': 108, 'max_depth': 11, 'feature_fraction': 0.8278321743658474, 'bagging_fraction': 0.9760254270044311, 'bagging_freq': 2, 'min_child_samples': 17}. Best is trial 2 with value: 3.327054741309906.


Did not meet early stopping. Best iteration is:
[100]	valid_0's rmse: 3.33998
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:38:41,301] Trial 5 finished with value: 3.333292612251729 and parameters: {'learning_rate': 0.0915177161258977, 'num_leaves': 124, 'max_depth': 7, 'feature_fraction': 0.6464112215911288, 'bagging_fraction': 0.6014556015365428, 'bagging_freq': 4, 'min_child_samples': 86}. Best is trial 2 with value: 3.327054741309906.


Did not meet early stopping. Best iteration is:
[100]	valid_0's rmse: 3.33329
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:38:43,199] Trial 6 finished with value: 3.44480546061119 and parameters: {'learning_rate': 0.016002746452624724, 'num_leaves': 36, 'max_depth': 6, 'feature_fraction': 0.5320225193258639, 'bagging_fraction': 0.6666169609794005, 'bagging_freq': 2, 'min_child_samples': 79}. Best is trial 2 with value: 3.327054741309906.


Did not meet early stopping. Best iteration is:
[100]	valid_0's rmse: 3.44481
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:38:52,495] Trial 7 finished with value: 3.3237570451296317 and parameters: {'learning_rate': 0.08321759781804604, 'num_leaves': 108, 'max_depth': 12, 'feature_fraction': 0.7283249455851352, 'bagging_fraction': 0.9424346324636406, 'bagging_freq': 4, 'min_child_samples': 64}. Best is trial 7 with value: 3.3237570451296317.


Did not meet early stopping. Best iteration is:
[86]	valid_0's rmse: 3.32376
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:39:01,561] Trial 8 finished with value: 3.3374259623438394 and parameters: {'learning_rate': 0.04612112321651181, 'num_leaves': 126, 'max_depth': 11, 'feature_fraction': 0.9891267883746186, 'bagging_fraction': 0.5417414392978301, 'bagging_freq': 2, 'min_child_samples': 41}. Best is trial 7 with value: 3.3237570451296317.


Did not meet early stopping. Best iteration is:
[94]	valid_0's rmse: 3.33743
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:39:07,328] Trial 9 finished with value: 3.32305395752781 and parameters: {'learning_rate': 0.057195855474358744, 'num_leaves': 64, 'max_depth': 8, 'feature_fraction': 0.7723936353072873, 'bagging_fraction': 0.7773239836305461, 'bagging_freq': 5, 'min_child_samples': 46}. Best is trial 9 with value: 3.32305395752781.


Did not meet early stopping. Best iteration is:
[96]	valid_0's rmse: 3.32305
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:39:11,981] Trial 10 finished with value: 3.3386856506225135 and parameters: {'learning_rate': 0.061482118407679316, 'num_leaves': 21, 'max_depth': 5, 'feature_fraction': 0.9584528651292512, 'bagging_fraction': 0.8494031922903686, 'bagging_freq': 7, 'min_child_samples': 53}. Best is trial 9 with value: 3.32305395752781.


Did not meet early stopping. Best iteration is:
[100]	valid_0's rmse: 3.33869
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:39:19,812] Trial 11 finished with value: 3.315868653892015 and parameters: {'learning_rate': 0.06929655065309909, 'num_leaves': 78, 'max_depth': 14, 'feature_fraction': 0.718724093748932, 'bagging_fraction': 0.9931263288182801, 'bagging_freq': 6, 'min_child_samples': 60}. Best is trial 11 with value: 3.315868653892015.


Did not meet early stopping. Best iteration is:
[100]	valid_0's rmse: 3.31587
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:39:30,480] Trial 12 finished with value: 3.327397093996466 and parameters: {'learning_rate': 0.06179615325660124, 'num_leaves': 64, 'max_depth': 15, 'feature_fraction': 0.7172089213996627, 'bagging_fraction': 0.8704033681325001, 'bagging_freq': 6, 'min_child_samples': 57}. Best is trial 11 with value: 3.315868653892015.


Did not meet early stopping. Best iteration is:
[100]	valid_0's rmse: 3.3274
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:39:34,062] Trial 13 finished with value: 3.3222228513480134 and parameters: {'learning_rate': 0.07292782723305677, 'num_leaves': 51, 'max_depth': 13, 'feature_fraction': 0.8265395562466653, 'bagging_fraction': 0.7791487023849764, 'bagging_freq': 6, 'min_child_samples': 38}. Best is trial 11 with value: 3.315868653892015.


Did not meet early stopping. Best iteration is:
[99]	valid_0's rmse: 3.32222
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:39:39,204] Trial 14 finished with value: 3.3209694709935573 and parameters: {'learning_rate': 0.07439499981618328, 'num_leaves': 45, 'max_depth': 13, 'feature_fraction': 0.8946906093295522, 'bagging_fraction': 0.8885346779566712, 'bagging_freq': 7, 'min_child_samples': 37}. Best is trial 11 with value: 3.315868653892015.


Did not meet early stopping. Best iteration is:
[94]	valid_0's rmse: 3.32097
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:39:51,170] Trial 15 finished with value: 3.3287083225683314 and parameters: {'learning_rate': 0.06861915077009152, 'num_leaves': 88, 'max_depth': 15, 'feature_fraction': 0.8975463261054779, 'bagging_fraction': 0.9043417533939105, 'bagging_freq': 7, 'min_child_samples': 67}. Best is trial 11 with value: 3.315868653892015.


Did not meet early stopping. Best iteration is:
[91]	valid_0's rmse: 3.32871
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:39:54,394] Trial 16 finished with value: 3.313837914625291 and parameters: {'learning_rate': 0.08685988063106218, 'num_leaves': 43, 'max_depth': 13, 'feature_fraction': 0.6594684801910744, 'bagging_fraction': 0.9696836279039363, 'bagging_freq': 6, 'min_child_samples': 93}. Best is trial 16 with value: 3.313837914625291.


Did not meet early stopping. Best iteration is:
[92]	valid_0's rmse: 3.31384
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:40:01,216] Trial 17 finished with value: 3.324430302212564 and parameters: {'learning_rate': 0.09916866040073528, 'num_leaves': 86, 'max_depth': 13, 'feature_fraction': 0.6627912646127228, 'bagging_fraction': 0.9766990446852627, 'bagging_freq': 5, 'min_child_samples': 99}. Best is trial 16 with value: 3.313837914625291.


Did not meet early stopping. Best iteration is:
[71]	valid_0's rmse: 3.32443
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:40:04,357] Trial 18 finished with value: 3.3206053495626104 and parameters: {'learning_rate': 0.08399605810822655, 'num_leaves': 35, 'max_depth': 14, 'feature_fraction': 0.6321541752852223, 'bagging_fraction': 0.8150869807313665, 'bagging_freq': 6, 'min_child_samples': 98}. Best is trial 16 with value: 3.313837914625291.


Did not meet early stopping. Best iteration is:
[100]	valid_0's rmse: 3.32061
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:40:06,874] Trial 19 finished with value: 3.331997017193942 and parameters: {'learning_rate': 0.04966767137632412, 'num_leaves': 22, 'max_depth': 12, 'feature_fraction': 0.6975950698982194, 'bagging_fraction': 0.9297536476654876, 'bagging_freq': 5, 'min_child_samples': 88}. Best is trial 16 with value: 3.313837914625291.


Did not meet early stopping. Best iteration is:
[100]	valid_0's rmse: 3.332
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:40:11,809] Trial 20 finished with value: 3.3194820187137046 and parameters: {'learning_rate': 0.08165881768159389, 'num_leaves': 53, 'max_depth': 14, 'feature_fraction': 0.766201508227773, 'bagging_fraction': 0.9800241202723954, 'bagging_freq': 6, 'min_child_samples': 67}. Best is trial 16 with value: 3.313837914625291.


Did not meet early stopping. Best iteration is:
[90]	valid_0's rmse: 3.31948
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:40:19,216] Trial 21 finished with value: 3.322073713957766 and parameters: {'learning_rate': 0.08507630549168727, 'num_leaves': 51, 'max_depth': 14, 'feature_fraction': 0.77369842339292, 'bagging_fraction': 0.9705626924409576, 'bagging_freq': 6, 'min_child_samples': 68}. Best is trial 16 with value: 3.313837914625291.


Did not meet early stopping. Best iteration is:
[99]	valid_0's rmse: 3.32207
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:40:25,917] Trial 22 finished with value: 3.3219082466382153 and parameters: {'learning_rate': 0.08872437010640959, 'num_leaves': 76, 'max_depth': 15, 'feature_fraction': 0.6002403268934764, 'bagging_fraction': 0.9978020533445204, 'bagging_freq': 5, 'min_child_samples': 78}. Best is trial 16 with value: 3.313837914625291.


Did not meet early stopping. Best iteration is:
[76]	valid_0's rmse: 3.32191
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:40:34,106] Trial 23 finished with value: 3.3177206363434775 and parameters: {'learning_rate': 0.06636919231300129, 'num_leaves': 54, 'max_depth': 12, 'feature_fraction': 0.6848739052792757, 'bagging_fraction': 0.9195716218712542, 'bagging_freq': 6, 'min_child_samples': 60}. Best is trial 16 with value: 3.313837914625291.


Did not meet early stopping. Best iteration is:
[100]	valid_0's rmse: 3.31772
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:40:41,397] Trial 24 finished with value: 3.3240112525688974 and parameters: {'learning_rate': 0.06646953558503448, 'num_leaves': 37, 'max_depth': 12, 'feature_fraction': 0.6907504679639227, 'bagging_fraction': 0.9176649708832814, 'bagging_freq': 7, 'min_child_samples': 60}. Best is trial 16 with value: 3.313837914625291.


Did not meet early stopping. Best iteration is:
[99]	valid_0's rmse: 3.32401
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:40:55,280] Trial 25 finished with value: 3.3342065581937836 and parameters: {'learning_rate': 0.05072671707524165, 'num_leaves': 92, 'max_depth': 11, 'feature_fraction': 0.6150406088476599, 'bagging_fraction': 0.8341848660296325, 'bagging_freq': 5, 'min_child_samples': 48}. Best is trial 16 with value: 3.313837914625291.


Did not meet early stopping. Best iteration is:
[100]	valid_0's rmse: 3.33421
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:41:02,079] Trial 26 finished with value: 3.3238888007125844 and parameters: {'learning_rate': 0.06808743305509773, 'num_leaves': 61, 'max_depth': 13, 'feature_fraction': 0.6730075564688703, 'bagging_fraction': 0.9327006428506727, 'bagging_freq': 3, 'min_child_samples': 90}. Best is trial 16 with value: 3.313837914625291.


Did not meet early stopping. Best iteration is:
[100]	valid_0's rmse: 3.32389
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:41:10,211] Trial 27 finished with value: 3.3374114206654064 and parameters: {'learning_rate': 0.04018713686148884, 'num_leaves': 77, 'max_depth': 10, 'feature_fraction': 0.7537099800035326, 'bagging_fraction': 0.8759117039231462, 'bagging_freq': 6, 'min_child_samples': 78}. Best is trial 16 with value: 3.313837914625291.


Did not meet early stopping. Best iteration is:
[100]	valid_0's rmse: 3.33741
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:41:14,298] Trial 28 finished with value: 3.3261896403590208 and parameters: {'learning_rate': 0.07816451685424672, 'num_leaves': 43, 'max_depth': 12, 'feature_fraction': 0.7967517388647005, 'bagging_fraction': 0.9446839107475313, 'bagging_freq': 7, 'min_child_samples': 74}. Best is trial 16 with value: 3.313837914625291.


Did not meet early stopping. Best iteration is:
[90]	valid_0's rmse: 3.32619
Training until validation scores don't improve for 50 rounds


[I 2026-01-27 14:41:24,014] Trial 29 finished with value: 3.3239078931573762 and parameters: {'learning_rate': 0.058672659486744425, 'num_leaves': 95, 'max_depth': 14, 'feature_fraction': 0.5982048785088567, 'bagging_fraction': 0.7403606453262424, 'bagging_freq': 3, 'min_child_samples': 30}. Best is trial 16 with value: 3.313837914625291.


Did not meet early stopping. Best iteration is:
[91]	valid_0's rmse: 3.32391
Best Trial:
{'learning_rate': 0.08685988063106218, 'num_leaves': 43, 'max_depth': 13, 'feature_fraction': 0.6594684801910744, 'bagging_fraction': 0.9696836279039363, 'bagging_freq': 6, 'min_child_samples': 93}


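## Evaluation with the Best Parameters

Using the best parameters found by the search, we compute the final accuracy metrics. The evaluation reuses the hold-out split from the search, so the reported figures are not an independent estimate.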

In [4]:
best_params = study.best_trial.params
best_params['objective'] = 'regression'
# Set n_estimators generously and let early stopping pick the number of trees.
# 'metric' is not set here, so the log shows LightGBM's default for regression (l2, i.e. MSE).
best_model = lgb.LGBMRegressor(n_estimators=2000, **best_params)

best_model.fit(
    train_df[features], train_df['rank'],
    eval_set=[(test_df[features], test_df['rank'])],
    callbacks=[lgb.early_stopping(stopping_rounds=100)]
)

test_df['pred_score'] = best_model.predict(test_df[features])
corr, _ = spearmanr(test_df['pred_score'], test_df['rank'])
print(f"Spearman correlation with best parameters: {corr:.4f}")

# Hit rates of the top-predicted horse in each race (lower pred_score = better)
test_df['pred_rank'] = test_df.groupby('race_id')['pred_score'].rank(method='min')
top1 = test_df[test_df['pred_rank'] == 1]
print(f"Predicted 1st, win rate: {(top1['rank'] == 1).mean():.2%}")
print(f"Predicted 1st, place rate (top 3): {(top1['rank'] <= 3).mean():.2%}")

Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[104]	valid_0's l2: 10.9899
Spearman correlation with best parameters: 0.4829
Predicted 1st, win rate: 24.09%
Predicted 1st, place rate (top 3): 61.55%
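
The top-pick metrics above can be checked on a small hand-made example. This is a sketch under assumed data: the `toy` frame is hypothetical, and a lower `pred_score` is treated as better because the model predicts finishing position.

```python
import pandas as pd

# Toy results for two races (illustrative values only).
toy = pd.DataFrame({
    "race_id": ["A", "A", "A", "A", "B", "B", "B", "B"],
    # Actual finishing position.
    "rank": [1, 2, 3, 4, 1, 2, 3, 4],
    # Predicted finishing position; lower means better.
    "pred_score": [1.8, 2.5, 3.1, 3.9, 2.9, 1.2, 3.0, 3.5],
})

# Rank predictions within each race; the smallest score gets rank 1.
toy["pred_rank"] = toy.groupby("race_id")["pred_score"].rank(method="min")

# Keep only the top pick of each race and score it against the real result.
top1 = toy[toy["pred_rank"] == 1]
win_rate = (top1["rank"] == 1).mean()
place_rate = (top1["rank"] <= 3).mean()
print(f"win rate: {win_rate:.2%}, place rate: {place_rate:.2%}")
```

With `method='min'`, tied scores share the same rank, so a race with a tie for the lowest score contributes more than one row to `top1`.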
