### Model Optimization and Overfitting Management
To manage the 'Overfitting' risk from Selin Hoca's slide and improve the model's ability to generalize, I ran hyperparameter optimization with Optuna.

Using the validation set, I tuned `learning_rate`, `num_leaves`, and the regularization parameters (`lambda_l1`, `lambda_l2`) through a seeded, repeatable search rather than by hand.

In [1]:
import pandas as pd
import numpy as np
import optuna

In [2]:
import os
import joblib

In [3]:
train_df = pd.read_csv('../data/ranking/train.csv')
val_df = pd.read_csv('../data/ranking/val.csv')

In [4]:
q_train = train_df.groupby('Event_Block', sort=False).size().to_list()
q_val = val_df.groupby('Event_Block', sort=False).size().to_list()
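
LightGBM reads the `group` list as the sizes of consecutive row blocks, so the counts above only line up with the data if all rows of an `Event_Block` sit next to each other in the CSV. The sketch below uses plain Python and made-up block IDs (not the real data) to show the difference between counting by first appearance, which mirrors what `groupby(..., sort=False).size()` returns, and counting contiguous runs. The helper names are hypothetical.

```python
from itertools import groupby


def sizes_by_first_appearance(block_ids):
    """Total count per block, ordered by first appearance
    (mirrors groupby(sort=False).size())."""
    counts = {}
    for block in block_ids:
        counts[block] = counts.get(block, 0) + 1
    return list(counts.values())


def contiguous_run_sizes(block_ids):
    """Sizes of consecutive runs of the same block ID,
    which is how a ranker interprets the group list."""
    return [len(list(run)) for _, run in groupby(block_ids)]


def groups_are_aligned(block_ids):
    """True when every block is stored as one contiguous run."""
    return sizes_by_first_appearance(block_ids) == contiguous_run_sizes(block_ids)


# Made-up IDs: the first list is contiguous, the second is interleaved.
contiguous_ids = ["a", "a", "a", "b", "b", "c"]
interleaved_ids = ["a", "b", "a", "a", "c", "b"]

print(sizes_by_first_appearance(contiguous_ids), groups_are_aligned(contiguous_ids))
print(sizes_by_first_appearance(interleaved_ids), groups_are_aligned(interleaved_ids))
```

If the check fails on real data, sorting the frame by the block column before computing the sizes is one way to restore the alignment.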

In [5]:
drop_cols = ['Class', 'Time', 'Event_Block']

In [6]:
X_train = train_df.drop(drop_cols, axis=1)
y_train = train_df['Class']

In [7]:
X_val = val_df.drop(drop_cols, axis=1)
y_val = val_df['Class']

In [8]:
print(f"Train groups: {len(q_train)}")
print(f"Val groups: {len(q_val)}")

Train groups: 33
Val groups: 7


In [9]:
import lightgbm as lgb
from sklearn.metrics import ndcg_score

In [16]:
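def objective(trial):
    params = {
        # Fixed settings shared by every trial.
        'objective': 'lambdarank',
        'metric': 'ndcg',
        'eval_at': 5,  # report NDCG@5 only
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'random_state': 42,

        # Search space sampled by Optuna.
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.2),
        'num_leaves': trial.suggest_int('num_leaves', 20, 150),
        'lambda_l1': trial.suggest_float('lambda_l1', 0, 10),
        'lambda_l2': trial.suggest_float('lambda_l2', 0, 10),
        # Note: LightGBM applies bagging_fraction only when bagging_freq > 0.
        # bagging_freq is left at its default (0) here, so this value has no effect.
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
    }

    # n_estimators stays at its default (100), which caps the boosting rounds.
    model = lgb.LGBMRanker(**params)

    model.fit(
        X_train, y_train,
        group=q_train,
        eval_set=[(X_val, y_val)],
        eval_group=[q_val],
        callbacks=[lgb.early_stopping(stopping_rounds=50)]
    )

    # Validation NDCG@5 at the best iteration is the value Optuna maximizes.
    score = model.best_score_['valid_0']['ndcg@5']
    return score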
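
To make the `ndcg@5` value returned by the objective easier to interpret, here is a minimal plain-Python sketch of NDCG@k for a single group, averaged over groups. It assumes the gain `2^label - 1` and the discount `1 / log2(rank + 1)`, which are LightGBM's documented defaults as far as I know, and it assumes that a group with no relevant rows counts as 1.0. The labels and scores are made up for illustration.

```python
import math


def dcg_at_k(labels_in_ranked_order, k):
    """Discounted cumulative gain over the top k positions."""
    return sum(
        (2 ** label - 1) / math.log2(position + 2)
        for position, label in enumerate(labels_in_ranked_order[:k])
    )


def ndcg_at_k(labels, scores, k):
    """NDCG@k for one group: DCG of the predicted order / DCG of the ideal order."""
    ranked = [
        label
        for _, label in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    ]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    if ideal_dcg == 0:
        return 1.0  # assumed convention for groups without any relevant row
    return dcg_at_k(ranked, k) / ideal_dcg


# Made-up group with a single relevant row (label 1) among six rows.
labels = [0, 0, 1, 0, 0, 0]
top_ranked = ndcg_at_k(labels, [0.1, 0.2, 0.9, 0.3, 0.0, 0.05], k=5)
second_ranked = ndcg_at_k(labels, [0.1, 0.95, 0.9, 0.3, 0.0, 0.05], k=5)
outside_top5 = ndcg_at_k(labels, [0.6, 0.5, 0.0, 0.4, 0.3, 0.2], k=5)

# The reported metric is the mean over all groups.
mean_ndcg = (top_ranked + second_ranked + outside_top5) / 3
print(top_ranked, second_ranked, outside_top5, mean_ndcg)
```

A score of 1.0 therefore means that, in every validation group, the relevant rows occupy the top positions in the ideal order.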

In [31]:
print("Optimization starting... (this may take a few minutes)")
sampler = optuna.samplers.TPESampler(seed=42)  # seed the sampler so the search visits the same sequence on every run
study = optuna.create_study(direction='maximize', sampler=sampler)
study.optimize(objective, n_trials=20)

[I 2025-12-03 20:53:42,555] A new study created in memory with name: no-name-eff94eb2-ae28-49b8-92da-9d79fe87b8cb


Optimization starting... (this may take a few minutes)
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:43,242] Trial 0 finished with value: 0.9603895103778503 and parameters: {'learning_rate': 0.08116262258099888, 'num_leaves': 144, 'lambda_l1': 7.319939418114051, 'lambda_l2': 5.986584841970366, 'bagging_fraction': 0.5780093202212182, 'feature_fraction': 0.5779972601681014}. Best is trial 0 with value: 0.9603895103778503.


Did not meet early stopping. Best iteration is:
[63]	valid_0's ndcg@5: 0.96039
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:43,656] Trial 1 finished with value: 0.9174201989279717 and parameters: {'learning_rate': 0.021035886311957897, 'num_leaves': 133, 'lambda_l1': 6.011150117432088, 'lambda_l2': 7.080725777960454, 'bagging_fraction': 0.5102922471479012, 'feature_fraction': 0.9849549260809971}. Best is trial 0 with value: 0.9603895103778503.


Early stopping, best iteration is:
[11]	valid_0's ndcg@5: 0.91742
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:44,382] Trial 2 finished with value: 1.0 and parameters: {'learning_rate': 0.16816410175208013, 'num_leaves': 47, 'lambda_l1': 1.8182496720710062, 'lambda_l2': 1.8340450985343382, 'bagging_fraction': 0.6521211214797689, 'feature_fraction': 0.762378215816119}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[45]	valid_0's ndcg@5: 1
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:45,056] Trial 3 finished with value: 0.9603895103778503 and parameters: {'learning_rate': 0.092069553542002, 'num_leaves': 58, 'lambda_l1': 6.118528947223795, 'lambda_l2': 1.3949386065204183, 'bagging_fraction': 0.6460723242676091, 'feature_fraction': 0.6831809216468459}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[41]	valid_0's ndcg@5: 0.96039
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:45,581] Trial 4 finished with value: 0.9603895103778503 and parameters: {'learning_rate': 0.09665329700123683, 'num_leaves': 122, 'lambda_l1': 1.9967378215835974, 'lambda_l2': 5.142344384136116, 'bagging_fraction': 0.7962072844310213, 'feature_fraction': 0.5232252063599989}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[7]	valid_0's ndcg@5: 0.96039
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:46,420] Trial 5 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.1254335218612733, 'num_leaves': 42, 'lambda_l1': 0.6505159298527952, 'lambda_l2': 9.488855372533333, 'bagging_fraction': 0.9828160165372797, 'feature_fraction': 0.9041986740582306}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[30]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:47,086] Trial 6 finished with value: 0.9603895103778503 and parameters: {'learning_rate': 0.06787661614294044, 'num_leaves': 32, 'lambda_l1': 6.842330265121569, 'lambda_l2': 4.4015249373960135, 'bagging_fraction': 0.5610191174223894, 'feature_fraction': 0.7475884550556351}. Best is trial 2 with value: 1.0.


Did not meet early stopping. Best iteration is:
[51]	valid_0's ndcg@5: 0.96039
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:47,534] Trial 7 finished with value: 0.9570306885501214 and parameters: {'learning_rate': 0.016533819011891496, 'num_leaves': 139, 'lambda_l1': 2.587799816000169, 'lambda_l2': 6.62522284353982, 'bagging_fraction': 0.6558555380447055, 'feature_fraction': 0.7600340105889054}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[17]	valid_0's ndcg@5: 0.957031
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:48,101] Trial 8 finished with value: 0.9625128349964738 and parameters: {'learning_rate': 0.11387495307522313, 'num_leaves': 44, 'lambda_l1': 9.695846277645586, 'lambda_l2': 7.7513282336111455, 'bagging_fraction': 0.9697494707820946, 'feature_fraction': 0.9474136752138245}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[45]	valid_0's ndcg@5: 0.962513
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:48,866] Trial 9 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.12360099597410618, 'num_leaves': 140, 'lambda_l1': 0.884925020519195, 'lambda_l2': 1.959828624191452, 'bagging_fraction': 0.522613644455269, 'feature_fraction': 0.6626651653816322}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[29]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:49,334] Trial 10 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.19581925407542802, 'num_leaves': 82, 'lambda_l1': 3.6872877272109736, 'lambda_l2': 0.1796187561481961, 'bagging_fraction': 0.8029358617165022, 'feature_fraction': 0.8451235367845726}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[9]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:50,262] Trial 11 finished with value: 1.0 and parameters: {'learning_rate': 0.17068579312142942, 'num_leaves': 21, 'lambda_l1': 0.37248352985317434, 'lambda_l2': 9.957948552553766, 'bagging_fraction': 0.996202058341917, 'feature_fraction': 0.8736970936526044}. Best is trial 2 with value: 1.0.


Did not meet early stopping. Best iteration is:
[61]	valid_0's ndcg@5: 1
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:50,949] Trial 12 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.1769197468967294, 'num_leaves': 23, 'lambda_l1': 0.18278272444875965, 'lambda_l2': 3.3860745211327066, 'bagging_fraction': 0.8780958550259074, 'feature_fraction': 0.8365851505488732}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[17]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:51,565] Trial 13 finished with value: 0.9625128349964738 and parameters: {'learning_rate': 0.16253840674237013, 'num_leaves': 70, 'lambda_l1': 4.041553554315727, 'lambda_l2': 9.710665954291946, 'bagging_fraction': 0.7130662749503884, 'feature_fraction': 0.8263406001124904}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[47]	valid_0's ndcg@5: 0.962513
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:52,328] Trial 14 finished with value: 1.0 and parameters: {'learning_rate': 0.15666556235221354, 'num_leaves': 99, 'lambda_l1': 2.12320586644048, 'lambda_l2': 3.427992882663163, 'bagging_fraction': 0.8876293236567184, 'feature_fraction': 0.7574852070104715}. Best is trial 2 with value: 1.0.


Did not meet early stopping. Best iteration is:
[58]	valid_0's ndcg@5: 1
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:53,097] Trial 15 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.14548079853636997, 'num_leaves': 20, 'lambda_l1': 3.2407882666785666, 'lambda_l2': 8.362491687987395, 'bagging_fraction': 0.7104520545908102, 'feature_fraction': 0.8866498554013109}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[35]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:53,746] Trial 16 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.1970552474174781, 'num_leaves': 57, 'lambda_l1': 1.3881283755258784, 'lambda_l2': 2.104294281386745, 'bagging_fraction': 0.8903345089717967, 'feature_fraction': 0.6788144755558251}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[19]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:54,303] Trial 17 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.13888988334565458, 'num_leaves': 102, 'lambda_l1': 4.723219616672015, 'lambda_l2': 3.3259871020889737, 'bagging_fraction': 0.6462217384063956, 'feature_fraction': 0.8010432208060295}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[38]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:55,158] Trial 18 finished with value: 1.0 and parameters: {'learning_rate': 0.17164924085214237, 'num_leaves': 41, 'lambda_l1': 1.3787401756220243, 'lambda_l2': 5.0902714455019344, 'bagging_fraction': 0.7851232997829733, 'feature_fraction': 0.9003707193018444}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[39]	valid_0's ndcg@5: 1
Training until validation scores don't improve for 50 rounds


[I 2025-12-03 20:53:56,142] Trial 19 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.18170292925343698, 'num_leaves': 56, 'lambda_l1': 0.004795398173113807, 'lambda_l2': 0.28687726743779063, 'bagging_fraction': 0.8404619928673609, 'feature_fraction': 0.606315181465154}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[18]	valid_0's ndcg@5: 0.981256
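
The log alternates between "Early stopping" and "Did not meet early stopping". The sketch below is a simplified model of a patience-based stopping rule, not LightGBM's actual callback code, and the score curves are synthetic. It assumes a cap of 100 rounds (the `n_estimators` default) and a patience of 50, which would explain why every run whose best iteration is above 50 reaches the cap before the patience runs out.

```python
def simulate_early_stopping(scores, stopping_rounds):
    """Return (best_iteration, stopped_early) for a higher-is-better metric.

    Iterations are 1-indexed. Training stops once `stopping_rounds`
    iterations have passed without a new best score.
    """
    best_iter, best_score = 0, float("-inf")
    for iteration, score in enumerate(scores, start=1):
        if score > best_score:
            best_iter, best_score = iteration, score
        elif iteration - best_iter >= stopping_rounds:
            return best_iter, True
    # The round cap was reached before the patience ran out.
    return best_iter, False


def rising_then_flat(best_iteration, total_rounds):
    """Synthetic curve: improves every round up to best_iteration, then plateaus."""
    return [min(i, best_iteration) / best_iteration for i in range(1, total_rounds + 1)]


late_peak = simulate_early_stopping(rising_then_flat(63, 100), stopping_rounds=50)
early_peak = simulate_early_stopping(rising_then_flat(45, 100), stopping_rounds=50)
print(late_peak, early_peak)
```

Under this model, a run that peaks at round 63 cannot be stopped early because only 37 rounds remain, so a larger round cap would be needed to see whether it keeps improving.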


In [32]:
print("\n------------------------------------------------")
print("BEST PARAMETERS:", study.best_params)
print("BEST SCORE (NDCG@5):", study.best_value)
print("------------------------------------------------")


------------------------------------------------
BEST PARAMETERS: {'learning_rate': 0.16816410175208013, 'num_leaves': 47, 'lambda_l1': 1.8182496720710062, 'lambda_l2': 1.8340450985343382, 'bagging_fraction': 0.6521211214797689, 'feature_fraction': 0.762378215816119}
BEST SCORE (NDCG@5): 1.0
------------------------------------------------
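
Several trials reach the same top score, and Optuna keeps the first one it saw. The sketch below copies the 20 trial values from the log above into a list (trial number = list index) and counts the ties and the distinct values. In a live session the same values could be read from `study.trials`. The variable names are illustrative.

```python
# NDCG@5 per trial, copied from the Optuna log (index = trial number).
trial_values = [
    0.9603895103778503, 0.9174201989279717, 1.0, 0.9603895103778503,
    0.9603895103778503, 0.9812564174982369, 0.9603895103778503,
    0.9570306885501214, 0.9625128349964738, 0.9812564174982369,
    0.9812564174982369, 1.0, 0.9812564174982369, 0.9625128349964738,
    1.0, 0.9812564174982369, 0.9812564174982369, 0.9812564174982369,
    1.0, 0.9812564174982369,
]

best_value = max(trial_values)
# Trials that share the best score; only the first is reported as "best".
tied_trials = [i for i, value in enumerate(trial_values) if value == best_value]
# How many different scores the 20 trials produced in total.
distinct_values = sorted(set(trial_values))

print("Best value:", best_value)
print("Trials tied for best:", tied_trials)
print("Distinct scores:", len(distinct_values))
```

With only 7 validation groups the metric lands on a handful of values, so a perfect score on this set is weak evidence on its own that one configuration generalizes better than another.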


In [33]:
best_params = study.best_params

In [34]:
# study.best_params holds only the tuned values, so the fixed settings are added back.
best_params['objective'] = 'lambdarank'
best_params['metric'] = 'ndcg'
best_params['eval_at'] = 5
best_params['random_state'] = 42

In [35]:
final_model = lgb.LGBMRanker(**best_params)

In [36]:
final_model.fit(
    X_train, y_train,
    group=q_train,
    eval_set=[(X_val, y_val)],
    eval_group=[q_val],
    callbacks=[lgb.early_stopping(stopping_rounds=50)]
)

Training until validation scores don't improve for 50 rounds




Early stopping, best iteration is:
[45]	valid_0's ndcg@5: 1


LGBMRanker parameters (display truncated in export):
boosting_type: 'gbdt'
num_leaves: 47
max_depth: -1
learning_rate: 0.16816410175208013
n_estimators: 100
subsample_for_bin: 200000
objective: 'lambdarank'
class_weight: None
min_split_gain: 0.0
min_child_weight: 0.001


In [37]:
os.makedirs('../models', exist_ok=True)
joblib.dump(final_model, '../models/lgbm_ranker.pkl')

['../models/lgbm_ranker.pkl']