### Model Optimization and Overfitting Management
To manage the overfitting risk highlighted in Selin Hoca's slides and to improve the model's ability to generalize, I ran a hyperparameter search with Optuna.

Using the validation set, I tuned `learning_rate`, `num_leaves`, and the regularization parameters (`lambda_l1`, `lambda_l2`) through a systematic search rather than by hand.

In [38]:
import pandas as pd
import numpy as np
import optuna

In [39]:
import os
import joblib

In [40]:
train_df = pd.read_csv('../data/ranking/train.csv')
val_df = pd.read_csv('../data/ranking/val.csv')

In [41]:
q_train = train_df.groupby('Event_Block', sort=False).size().to_list()
q_val = val_df.groupby('Event_Block', sort=False).size().to_list()
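
The group sizes above are only valid for a ranker if all rows of one `Event_Block` sit next to each other in the file, because the sizes are consumed in row order. A minimal sketch of a check for that, assuming the same `Event_Block` column; the helper name `group_sizes` and the toy frames are hypothetical and not part of the notebook:

```python
import pandas as pd

def group_sizes(df, group_col="Event_Block"):
    """Return per-group row counts in file order, failing if a group is split."""
    ids = df[group_col]
    # A new run starts wherever the id differs from the previous row.
    n_runs = int((ids != ids.shift()).sum())
    # If every group is contiguous, the number of runs equals the number of groups.
    if n_runs != ids.nunique():
        raise ValueError(f"rows of the same {group_col} are not contiguous")
    return df.groupby(group_col, sort=False).size().to_list()

# Toy data (hypothetical): three contiguous blocks, then one interleaved case.
ok = pd.DataFrame({"Event_Block": [10, 10, 20, 20, 20, 30]})
bad = pd.DataFrame({"Event_Block": [10, 20, 10]})

sizes_ok = group_sizes(ok)
try:
    group_sizes(bad)
    bad_raised = False
except ValueError:
    bad_raised = True
```

If the check fails on real data, sorting the frame by `Event_Block` before building `X_train` and `q_train` would be one way to restore the alignment.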

In [53]:
drop_cols = ['Class', 'Time', 'Event_Block']

In [54]:
X_train = train_df.drop(drop_cols, axis=1)
y_train = train_df['Class']

In [55]:
X_val = val_df.drop(drop_cols, axis=1)
y_val = val_df['Class']

In [56]:
print(f"Train groups: {len(q_train)}")
print(f"Val groups: {len(q_val)}")

Train groups: 33
Val groups: 7


In [57]:
import lightgbm as lgb
from sklearn.metrics import ndcg_score

In [58]:
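def objective(trial):
    # Fixed settings: LambdaRank objective, scored by NDCG@5 on the validation set.
    params = {
        'objective': 'lambdarank',
        'metric': 'ndcg',
        'eval_at': 5,
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'random_state': 42,

        # Search space sampled by Optuna on every trial.
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.2),
        'num_leaves': trial.suggest_int('num_leaves', 20, 150),
        'lambda_l1': trial.suggest_float('lambda_l1', 0, 10),
        'lambda_l2': trial.suggest_float('lambda_l2', 0, 10),
        # Note: LightGBM applies bagging_fraction only when bagging_freq > 0;
        # bagging_freq keeps its default of 0 here, so this value is inert.
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
    }

    model = lgb.LGBMRanker(**params)

    # Early stopping monitors validation NDCG@5; n_estimators keeps its default (100).
    model.fit(
        X_train, y_train,
        group=q_train,
        eval_set=[(X_val, y_val)],
        eval_group=[q_val],
        callbacks=[lgb.early_stopping(stopping_rounds=50)]
    )

    # The best validation NDCG@5 seen during training is what Optuna maximizes.
    score = model.best_score_['valid_0']['ndcg@5']
    return score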

print("Starting optimization... (this may take a few minutes)")
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=20)

print("\n------------------------------------------------")
print("BEST PARAMETERS:", study.best_params)
print("BEST SCORE (NDCG@5):", study.best_value)
print("------------------------------------------------")

[I 2025-12-01 23:53:42,044] A new study created in memory with name: no-name-5563f767-0df8-4522-af4a-9c3cc8e5f257


Starting optimization... (this may take a few minutes)
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:42,588] Trial 0 finished with value: 0.9603895103778503 and parameters: {'learning_rate': 0.18220039063807442, 'num_leaves': 41, 'lambda_l1': 9.40875096434723, 'lambda_l2': 1.6564274910709353, 'bagging_fraction': 0.59814878239506, 'feature_fraction': 0.5499314480931168}. Best is trial 0 with value: 0.9603895103778503.


Early stopping, best iteration is:
[29]	valid_0's ndcg@5: 0.96039
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:43,097] Trial 1 finished with value: 0.9603895103778503 and parameters: {'learning_rate': 0.19511587706273342, 'num_leaves': 124, 'lambda_l1': 8.265091392456004, 'lambda_l2': 5.740962831374724, 'bagging_fraction': 0.5454587244276248, 'feature_fraction': 0.7385304986203407}. Best is trial 0 with value: 0.9603895103778503.


Early stopping, best iteration is:
[34]	valid_0's ndcg@5: 0.96039
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:43,574] Trial 2 finished with value: 0.9549073639314979 and parameters: {'learning_rate': 0.05147313455074986, 'num_leaves': 62, 'lambda_l1': 8.389329282231957, 'lambda_l2': 8.414902679201896, 'bagging_fraction': 0.5208331860816878, 'feature_fraction': 0.807106372671865}. Best is trial 0 with value: 0.9603895103778503.


Early stopping, best iteration is:
[20]	valid_0's ndcg@5: 0.954907
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:44,013] Trial 3 finished with value: 0.9570306885501214 and parameters: {'learning_rate': 0.09212249364965762, 'num_leaves': 87, 'lambda_l1': 8.049878246777128, 'lambda_l2': 4.2722198045894775, 'bagging_fraction': 0.5981626891094557, 'feature_fraction': 0.5718639687199591}. Best is trial 0 with value: 0.9603895103778503.


Early stopping, best iteration is:
[9]	valid_0's ndcg@5: 0.957031
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:44,945] Trial 4 finished with value: 1.0 and parameters: {'learning_rate': 0.0868193414184596, 'num_leaves': 26, 'lambda_l1': 0.10660088294024428, 'lambda_l2': 4.157067925555031, 'bagging_fraction': 0.5429073696880456, 'feature_fraction': 0.9326918410538783}. Best is trial 4 with value: 1.0.


Did not meet early stopping. Best iteration is:
[57]	valid_0's ndcg@5: 1
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:45,819] Trial 5 finished with value: 1.0 and parameters: {'learning_rate': 0.12497930413252059, 'num_leaves': 91, 'lambda_l1': 1.8309085467616493, 'lambda_l2': 5.185007587170793, 'bagging_fraction': 0.7392312306713402, 'feature_fraction': 0.8832986397141234}. Best is trial 4 with value: 1.0.


Did not meet early stopping. Best iteration is:
[66]	valid_0's ndcg@5: 1
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:46,397] Trial 6 finished with value: 0.9603895103778503 and parameters: {'learning_rate': 0.10505787894267012, 'num_leaves': 35, 'lambda_l1': 7.463594448355369, 'lambda_l2': 7.433121502437886, 'bagging_fraction': 0.5110336182445496, 'feature_fraction': 0.7914757199514363}. Best is trial 4 with value: 1.0.


Early stopping, best iteration is:
[36]	valid_0's ndcg@5: 0.96039
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:47,420] Trial 7 finished with value: 1.0 and parameters: {'learning_rate': 0.19473733521957765, 'num_leaves': 133, 'lambda_l1': 0.1609020024400487, 'lambda_l2': 7.680459212501468, 'bagging_fraction': 0.8231276274458116, 'feature_fraction': 0.6935894184427549}. Best is trial 4 with value: 1.0.


Early stopping, best iteration is:
[36]	valid_0's ndcg@5: 1
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:48,071] Trial 8 finished with value: 0.9603895103778503 and parameters: {'learning_rate': 0.0745395810234161, 'num_leaves': 28, 'lambda_l1': 7.590498360815719, 'lambda_l2': 2.4303120976229584, 'bagging_fraction': 0.982179108274573, 'feature_fraction': 0.5556864977096287}. Best is trial 4 with value: 1.0.


Did not meet early stopping. Best iteration is:
[60]	valid_0's ndcg@5: 0.96039
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:49,482] Trial 9 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.15788417535253743, 'num_leaves': 131, 'lambda_l1': 0.6040971542314721, 'lambda_l2': 3.895078517959387, 'bagging_fraction': 0.6569606339092432, 'feature_fraction': 0.5840634833037289}. Best is trial 4 with value: 1.0.


Early stopping, best iteration is:
[50]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:49,884] Trial 10 finished with value: 0.9174201989279717 and parameters: {'learning_rate': 0.011893976987290333, 'num_leaves': 64, 'lambda_l1': 4.173577900683506, 'lambda_l2': 9.751865163069542, 'bagging_fraction': 0.8241225102609616, 'feature_fraction': 0.9885177164936169}. Best is trial 4 with value: 1.0.


Early stopping, best iteration is:
[11]	valid_0's ndcg@5: 0.91742
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:50,651] Trial 11 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.1313649715986423, 'num_leaves': 100, 'lambda_l1': 2.390521330170046, 'lambda_l2': 5.649346010190467, 'bagging_fraction': 0.7389868816609154, 'feature_fraction': 0.943461294486278}. Best is trial 4 with value: 1.0.


Early stopping, best iteration is:
[41]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:51,366] Trial 12 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.12490502180132732, 'num_leaves': 100, 'lambda_l1': 2.5447922925693933, 'lambda_l2': 3.047386954249752, 'bagging_fraction': 0.7205734330268613, 'feature_fraction': 0.8895660221349426}. Best is trial 4 with value: 1.0.


Early stopping, best iteration is:
[36]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:52,214] Trial 13 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.05583567660024231, 'num_leaves': 63, 'lambda_l1': 1.6430868951347581, 'lambda_l2': 0.048303945882516075, 'bagging_fraction': 0.9422606753241274, 'feature_fraction': 0.8854686529503876}. Best is trial 4 with value: 1.0.


Early stopping, best iteration is:
[50]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:52,872] Trial 14 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.14647647492653612, 'num_leaves': 110, 'lambda_l1': 4.397182979005675, 'lambda_l2': 6.3520957463036325, 'bagging_fraction': 0.8382604126248034, 'feature_fraction': 0.8736276198154125}. Best is trial 4 with value: 1.0.


Did not meet early stopping. Best iteration is:
[53]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:53,643] Trial 15 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.10017201955070196, 'num_leaves': 73, 'lambda_l1': 3.392504923922034, 'lambda_l2': 4.614667999692069, 'bagging_fraction': 0.6614518928593012, 'feature_fraction': 0.9987537850422921}. Best is trial 4 with value: 1.0.


Did not meet early stopping. Best iteration is:
[71]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:54,106] Trial 16 finished with value: 0.9395226032574636 and parameters: {'learning_rate': 0.038883081278820356, 'num_leaves': 149, 'lambda_l1': 5.990261266592525, 'lambda_l2': 1.4475427742981761, 'bagging_fraction': 0.8983869708226935, 'feature_fraction': 0.849068120488944}. Best is trial 4 with value: 1.0.


Early stopping, best iteration is:
[17]	valid_0's ndcg@5: 0.939523
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:55,117] Trial 17 finished with value: 1.0 and parameters: {'learning_rate': 0.11849255355020938, 'num_leaves': 43, 'lambda_l1': 1.1006164997983676, 'lambda_l2': 3.3814533420430095, 'bagging_fraction': 0.678855851539612, 'feature_fraction': 0.9421597443531575}. Best is trial 4 with value: 1.0.


Did not meet early stopping. Best iteration is:
[62]	valid_0's ndcg@5: 1
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:55,745] Trial 18 finished with value: 0.9603895103778503 and parameters: {'learning_rate': 0.08289941943631444, 'num_leaves': 87, 'lambda_l1': 5.863973702758553, 'lambda_l2': 6.637826250447291, 'bagging_fraction': 0.591995298357778, 'feature_fraction': 0.6913010916892359}. Best is trial 4 with value: 1.0.


Early stopping, best iteration is:
[42]	valid_0's ndcg@5: 0.96039
Training until validation scores don't improve for 50 rounds


[I 2025-12-01 23:53:56,602] Trial 19 finished with value: 1.0 and parameters: {'learning_rate': 0.15947407843120223, 'num_leaves': 22, 'lambda_l1': 1.7510221106881967, 'lambda_l2': 4.714491015151293, 'bagging_fraction': 0.7906168588629652, 'feature_fraction': 0.9519789808898135}. Best is trial 4 with value: 1.0.


Did not meet early stopping. Best iteration is:
[55]	valid_0's ndcg@5: 1

------------------------------------------------
BEST PARAMETERS: {'learning_rate': 0.0868193414184596, 'num_leaves': 26, 'lambda_l1': 0.10660088294024428, 'lambda_l2': 4.157067925555031, 'bagging_fraction': 0.5429073696880456, 'feature_fraction': 0.9326918410538783}
BEST SCORE (NDCG@5): 1.0
------------------------------------------------
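
To make the reported score easier to interpret, here is a small pure-Python sketch of how NDCG@5 can be computed for a single group. It assumes the exponential gain `2**rel - 1` with a `log2` position discount, which I believe matches LightGBM's default label gain; the function names, the toy labels, and the toy scores are hypothetical. Returning 0.0 for a group without any relevant item is a choice made for this sketch, and libraries differ on that convention.

```python
import math

def dcg_at_k(labels_in_ranked_order, k=5):
    """Discounted cumulative gain over the first k positions."""
    return sum(
        (2 ** rel - 1) / math.log2(pos + 2)  # pos is 0-based, so the discount is log2(rank + 1)
        for pos, rel in enumerate(labels_in_ranked_order[:k])
    )

def ndcg_at_k(labels, scores, k=5):
    """NDCG@k for one group: DCG of the predicted order divided by the ideal DCG."""
    # Order the labels by descending model score.
    ranked = [lab for _, lab in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant item in the group (sketch convention)
    return dcg_at_k(ranked, k) / ideal_dcg

# Toy group with two positives out of six rows (hypothetical values).
labels = [0, 1, 0, 0, 1, 0]
perfect_scores = [0.1, 0.9, 0.2, 0.0, 0.8, 0.3]    # both positives ranked on top
imperfect_scores = [0.9, 0.8, 0.2, 0.0, 0.1, 0.3]  # one negative ranked first

ndcg_perfect = ndcg_at_k(labels, perfect_scores)
ndcg_imperfect = ndcg_at_k(labels, imperfect_scores)
```

A value of 1.0 therefore means that, in every validation group, the top five positions are ordered as well as the labels allow. With only 7 validation groups the averaged metric can take few distinct values, which may be why the same scores recur across trials in the log above.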


In [59]:
best_params = study.best_params

In [60]:
best_params['objective'] = 'lambdarank'
best_params['metric'] = 'ndcg'
best_params['eval_at'] = 5
best_params['random_state'] = 42

In [61]:
final_model = lgb.LGBMRanker(**best_params)

In [62]:
final_model.fit(
    X_train, y_train,
    group=q_train,
    eval_set=[(X_val, y_val)],
    eval_group=[q_val],
    callbacks=[lgb.early_stopping(stopping_rounds=50)]
)

Training until validation scores don't improve for 50 rounds




Did not meet early stopping. Best iteration is:
[57]	valid_0's ndcg@5: 1


| Parameter | Value |
|---|---|
| boosting_type | 'gbdt' |
| num_leaves | 26 |
| max_depth | -1 |
| learning_rate | 0.0868193414184596 |
| n_estimators | 100 |
| subsample_for_bin | 200000 |
| objective | 'lambdarank' |
| class_weight | |
| min_split_gain | 0.0 |
| min_child_weight | 0.001 |


In [63]:
os.makedirs('../models', exist_ok=True)
joblib.dump(final_model, '../models/lgbm_ranker.pkl')

['../models/lgbm_ranker.pkl']
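
Scores from a LambdaRank model are only comparable within one group, so after reloading the saved ranker the predictions are typically turned into per-group ranks. A minimal sketch of that step, assuming the scores have already been produced; the helper `rank_within_groups` and the toy numbers are hypothetical:

```python
import pandas as pd

def rank_within_groups(df, scores, group_col="Event_Block"):
    """Attach model scores and a 1-based rank (1 = highest score) per group."""
    out = df[[group_col]].copy()
    out["score"] = list(scores)
    out["rank"] = (
        out.groupby(group_col, sort=False)["score"]
        .rank(method="first", ascending=False)  # ties broken by row order
        .astype(int)
    )
    return out

# Toy stand-in for real predictions. In the notebook the scores would come from
# something like joblib.load('../models/lgbm_ranker.pkl').predict(X_val).
toy = pd.DataFrame({"Event_Block": [1, 1, 1, 2, 2]})
toy_scores = [0.2, 0.9, 0.5, -0.1, 0.3]

ranked = rank_within_groups(toy, toy_scores)
```

Keeping `Event_Block` next to the score makes it straightforward to select the top 5 rows per group, which is the cut-off the NDCG@5 metric evaluates.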