### Model Optimization and Overfitting Management
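To manage the overfitting risk raised in Selin Hoca's slides and improve the model's ability to generalize, I tuned the hyperparameters with Optuna.
Using the validation set, I searched over `learning_rate`, `num_leaves`, the L1/L2 regularization terms (`lambda_l1`, `lambda_l2`) and the sampling rates (`bagging_fraction`, `feature_fraction`), keeping the configuration with the best validation NDCG@5. Because the same validation set drives both trial selection and early stopping, that best score is an optimistic estimate of performance on unseen data.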

In [1]:
import pandas as pd
import numpy as np
import optuna

In [2]:
train_df = pd.read_csv('../data/ranking/train.csv')
val_df = pd.read_csv('../data/ranking/val.csv')

In [3]:
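# Group sizes (rows per Event_Block) in order of first appearance.
# LightGBM assumes each query's rows are contiguous; sort=False is only
# correct if every Event_Block occupies one consecutive run of rows.
q_train = train_df.groupby('Event_Block', sort=False).size().to_list()
q_val = val_df.groupby('Event_Block', sort=False).size().to_list()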
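LightGBM's `group` argument is a list of consecutive query sizes, so `groupby(..., sort=False).size()` is only valid when each `Event_Block` forms one unbroken run of rows; otherwise rows from different blocks are silently merged into the wrong queries. A minimal pure-Python sketch of the same computation that also checks this assumption (the helper name `query_group_sizes` is hypothetical):

```python
from itertools import groupby


def query_group_sizes(block_ids):
    """Return run-length group sizes suitable for LightGBM's `group` argument.

    Raises ValueError if a block id appears in more than one run,
    i.e. if the rows of one query are not contiguous.
    """
    sizes = []
    seen = set()
    # itertools.groupby yields one (key, run) pair per consecutive run of equal ids.
    for block_id, run in groupby(block_ids):
        if block_id in seen:
            raise ValueError(f"Event_Block {block_id!r} is not contiguous")
        seen.add(block_id)
        sizes.append(sum(1 for _ in run))
    return sizes


# Contiguous blocks: sizes follow the order of first appearance.
print(query_group_sizes([7, 7, 7, 3, 3, 9]))  # [3, 2, 1]

# Non-contiguous blocks: groupby(sort=False).size() would report [2, 1] here,
# but these rows cannot be split into valid consecutive queries.
try:
    query_group_sizes([7, 3, 7])
except ValueError as err:
    print(err)  # Event_Block 7 is not contiguous
```

Applied to `train_df['Event_Block'].tolist()`, it should reproduce `q_train` whenever the blocks are contiguous. If the check fails, sorting first with `train_df.sort_values('Event_Block', kind='stable')` makes every block contiguous while keeping the original row order inside each block.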

In [4]:
# Label, timestamp and query-id columns are excluded from the features.
drop_cols = ['Class', 'Time', 'Event_Block']

In [5]:
X_train = train_df.drop(drop_cols, axis=1)
y_train = train_df['Class']

In [6]:
X_val = val_df.drop(drop_cols, axis=1)
y_val = val_df['Class']

In [7]:
print(f"Train groups: {len(q_train)}")
print(f"Val groups: {len(q_val)}")

Train groups: 33
Val groups: 7


In [11]:
import lightgbm as lgb
from sklearn.metrics import ndcg_score
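The tuning objective below is LightGBM's NDCG@5 on the validation queries. To make the metric concrete, here is a minimal pure-Python sketch of NDCG@k for a single query, assuming LightGBM's default exponential gain (2^label - 1) and log2 position discount. For a query with no relevant rows it returns 1.0, which is my understanding of LightGBM's convention; `sklearn.metrics.ndcg_score` (imported above) may treat such queries differently, and uses a linear gain that coincides with the exponential one only if `Class` is binary (0/1). The function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    # Exponential gain (2^rel - 1); the row at 0-based position i is
    # discounted by log2(i + 2).
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(labels, scores, k=5):
    """NDCG@k for one query (one Event_Block)."""
    # True labels reordered by the model's scores, highest score first.
    ranked = [lab for _, lab in sorted(zip(scores, labels), key=lambda p: -p[0])]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    if ideal_dcg == 0:
        # No relevant rows in this query: scored as a perfect 1.0 (assumed convention).
        return 1.0
    return dcg_at_k(ranked, k) / ideal_dcg


labels = [0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.05]
print(round(ndcg_at_k(labels, scores, k=5), 4))  # 0.6934
print(ndcg_at_k([0, 0, 0], [0.3, 0.2, 0.1]))    # 1.0
```

The per-query values are then averaged over all queries, so a validation set with only a few queries (and any query without positives) pushes the average toward 1.0 quickly.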

In [12]:
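def objective(trial):
    params = {
        # Fixed settings: LambdaRank objective, evaluated with NDCG@5.
        'objective': 'lambdarank',
        'metric': 'ndcg',
        'eval_at': 5,
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'random_state': 42,

        # Hyperparameters sampled by Optuna in each trial.
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.2),
        'num_leaves': trial.suggest_int('num_leaves', 20, 150),
        'lambda_l1': trial.suggest_float('lambda_l1', 0, 10),
        'lambda_l2': trial.suggest_float('lambda_l2', 0, 10),
        # LightGBM applies bagging_fraction only when bagging_freq > 0; it is
        # left at its default of 0 here, so bagging is inactive in these runs.
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
    }

    model = lgb.LGBMRanker(**params)

    # Train on the training queries; stop if validation NDCG@5 has not
    # improved for 50 rounds (n_estimators defaults to 100).
    model.fit(
        X_train, y_train,
        group=q_train,
        eval_set=[(X_val, y_val)],
        eval_group=[q_val],
        callbacks=[lgb.early_stopping(stopping_rounds=50)]
    )

    # The best validation NDCG@5 reached during training is the trial's score.
    score = model.best_score_['valid_0']['ndcg@5']
    return score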

print("Starting optimization... (this may take a few minutes)")
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=20)

print("\n------------------------------------------------")
print("BEST PARAMETERS:", study.best_params)
print("BEST SCORE (NDCG@5):", study.best_value)
print("------------------------------------------------")

[I 2025-11-30 18:12:52,240] A new study created in memory with name: no-name-bee0d369-a2ca-4ecf-8db9-a0864f808afe


Starting optimization... (this may take a few minutes)
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:12:52,831] Trial 0 finished with value: 0.9603895103778503 and parameters: {'learning_rate': 0.04792845379393924, 'num_leaves': 112, 'lambda_l1': 8.707290590061314, 'lambda_l2': 8.518315023335091, 'bagging_fraction': 0.6913322652438483, 'feature_fraction': 0.8900113205710999}. Best is trial 0 with value: 0.9603895103778503.


Did not meet early stopping. Best iteration is:
[96]	valid_0's ndcg@5: 0.96039
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:12:53,445] Trial 1 finished with value: 0.9603895103778503 and parameters: {'learning_rate': 0.0442149038843821, 'num_leaves': 73, 'lambda_l1': 9.997749019978025, 'lambda_l2': 5.960911384846318, 'bagging_fraction': 0.6331726462735414, 'feature_fraction': 0.7624139626863615}. Best is trial 0 with value: 0.9603895103778503.


Did not meet early stopping. Best iteration is:
[68]	valid_0's ndcg@5: 0.96039
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:12:54,306] Trial 2 finished with value: 1.0 and parameters: {'learning_rate': 0.0995788010026312, 'num_leaves': 24, 'lambda_l1': 0.9057918177073165, 'lambda_l2': 1.1653184276747852, 'bagging_fraction': 0.8420171866482506, 'feature_fraction': 0.6296344145026685}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[47]	valid_0's ndcg@5: 1
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:12:54,876] Trial 3 finished with value: 0.9603895103778503 and parameters: {'learning_rate': 0.09879508507543237, 'num_leaves': 124, 'lambda_l1': 7.621710674805851, 'lambda_l2': 2.704320467114414, 'bagging_fraction': 0.6929275477462401, 'feature_fraction': 0.7103695691176085}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[41]	valid_0's ndcg@5: 0.96039
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:12:55,369] Trial 4 finished with value: 0.9603895103778503 and parameters: {'learning_rate': 0.05639591520322504, 'num_leaves': 101, 'lambda_l1': 3.688010650998188, 'lambda_l2': 8.125418908535684, 'bagging_fraction': 0.523052509521515, 'feature_fraction': 0.5439938544466906}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[15]	valid_0's ndcg@5: 0.96039
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:12:56,230] Trial 5 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.11309069411640713, 'num_leaves': 49, 'lambda_l1': 1.2990610191498253, 'lambda_l2': 4.245640034962977, 'bagging_fraction': 0.5547597909522967, 'feature_fraction': 0.8249891291934126}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[42]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:12:57,027] Trial 6 finished with value: 1.0 and parameters: {'learning_rate': 0.06835353783663363, 'num_leaves': 96, 'lambda_l1': 1.7432147753119054, 'lambda_l2': 5.370394669873724, 'bagging_fraction': 0.5891306248078354, 'feature_fraction': 0.8778451662411421}. Best is trial 2 with value: 1.0.


Did not meet early stopping. Best iteration is:
[99]	valid_0's ndcg@5: 1
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:12:57,567] Trial 7 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.15376374744818114, 'num_leaves': 99, 'lambda_l1': 4.995626286926859, 'lambda_l2': 3.54725328878166, 'bagging_fraction': 0.8530764799089927, 'feature_fraction': 0.6886054119852305}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[36]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:12:58,155] Trial 8 finished with value: 0.9382871060483583 and parameters: {'learning_rate': 0.020337156387812614, 'num_leaves': 86, 'lambda_l1': 3.8055187430484914, 'lambda_l2': 8.238760697762547, 'bagging_fraction': 0.5911130674910432, 'feature_fraction': 0.8890427911639425}. Best is trial 2 with value: 1.0.


Did not meet early stopping. Best iteration is:
[95]	valid_0's ndcg@5: 0.938287
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:12:59,158] Trial 9 finished with value: 1.0 and parameters: {'learning_rate': 0.11118993087773849, 'num_leaves': 48, 'lambda_l1': 0.7691731723076534, 'lambda_l2': 8.123706055290626, 'bagging_fraction': 0.6846778544560612, 'feature_fraction': 0.8934098801563726}. Best is trial 2 with value: 1.0.


Did not meet early stopping. Best iteration is:
[57]	valid_0's ndcg@5: 1
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:12:59,912] Trial 10 finished with value: 1.0 and parameters: {'learning_rate': 0.19056439012303533, 'num_leaves': 20, 'lambda_l1': 0.02679804103667016, 'lambda_l2': 1.7274974382452166, 'bagging_fraction': 0.9876036432968848, 'feature_fraction': 0.526025785295285}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[24]	valid_0's ndcg@5: 1
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:13:00,657] Trial 11 finished with value: 1.0 and parameters: {'learning_rate': 0.08514041379036365, 'num_leaves': 148, 'lambda_l1': 2.663119750810714, 'lambda_l2': 0.07425696805744075, 'bagging_fraction': 0.8144622924340417, 'feature_fraction': 0.9773122479149413}. Best is trial 2 with value: 1.0.


Did not meet early stopping. Best iteration is:
[68]	valid_0's ndcg@5: 1
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:13:01,358] Trial 12 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.13221068923696344, 'num_leaves': 64, 'lambda_l1': 2.2745700524527077, 'lambda_l2': 5.878763822882708, 'bagging_fraction': 0.8970307377083501, 'feature_fraction': 0.6043743446151588}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[34]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:13:01,986] Trial 13 finished with value: 0.9625128349964738 and parameters: {'learning_rate': 0.07473112882426, 'num_leaves': 21, 'lambda_l1': 6.316388832529393, 'lambda_l2': 0.1502126671870334, 'bagging_fraction': 0.7766459665783365, 'feature_fraction': 0.6185149792311752}. Best is trial 2 with value: 1.0.


Did not meet early stopping. Best iteration is:
[78]	valid_0's ndcg@5: 0.962513
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:13:02,806] Trial 14 finished with value: 1.0 and parameters: {'learning_rate': 0.14972724955106326, 'num_leaves': 132, 'lambda_l1': 1.8665495732594302, 'lambda_l2': 5.8916363451545415, 'bagging_fraction': 0.9329098317547211, 'feature_fraction': 0.993361879009312}. Best is trial 2 with value: 1.0.


Did not meet early stopping. Best iteration is:
[60]	valid_0's ndcg@5: 1
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:13:03,205] Trial 15 finished with value: 0.9570306885501214 and parameters: {'learning_rate': 0.019118650104660295, 'num_leaves': 40, 'lambda_l1': 4.060802131417253, 'lambda_l2': 1.887211205286363, 'bagging_fraction': 0.7670232552578146, 'feature_fraction': 0.7883740136679778}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[4]	valid_0's ndcg@5: 0.957031
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:13:04,267] Trial 16 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.07165357184597428, 'num_leaves': 79, 'lambda_l1': 0.055032512376738785, 'lambda_l2': 4.79895349462844, 'bagging_fraction': 0.8629438470196213, 'feature_fraction': 0.6339030520257584}. Best is trial 2 with value: 1.0.


Did not meet early stopping. Best iteration is:
[51]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:13:05,003] Trial 17 finished with value: 0.9812564174982369 and parameters: {'learning_rate': 0.12643838850059658, 'num_leaves': 93, 'lambda_l1': 2.7857948755898243, 'lambda_l2': 6.952453798281119, 'bagging_fraction': 0.6196598766470685, 'feature_fraction': 0.8399158713398436}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[45]	valid_0's ndcg@5: 0.981256
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:13:05,618] Trial 18 finished with value: 0.9603895103778503 and parameters: {'learning_rate': 0.09357110195935393, 'num_leaves': 64, 'lambda_l1': 5.903010360328257, 'lambda_l2': 3.347626816320904, 'bagging_fraction': 0.5038311234770345, 'feature_fraction': 0.6962217575875714}. Best is trial 2 with value: 1.0.


Early stopping, best iteration is:
[43]	valid_0's ndcg@5: 0.96039
Training until validation scores don't improve for 50 rounds


[I 2025-11-30 18:13:06,605] Trial 19 finished with value: 1.0 and parameters: {'learning_rate': 0.19115677497326883, 'num_leaves': 112, 'lambda_l1': 1.2158755007973108, 'lambda_l2': 9.940314941065147, 'bagging_fraction': 0.7252335694073534, 'feature_fraction': 0.9472093216417008}. Best is trial 2 with value: 1.0.


Did not meet early stopping. Best iteration is:
[66]	valid_0's ndcg@5: 1

------------------------------------------------
BEST PARAMETERS: {'learning_rate': 0.0995788010026312, 'num_leaves': 24, 'lambda_l1': 0.9057918177073165, 'lambda_l2': 1.1653184276747852, 'bagging_fraction': 0.8420171866482506, 'feature_fraction': 0.6296344145026685}
BEST SCORE (NDCG@5): 1.0
------------------------------------------------
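In the log above, 7 of the 20 trials reach a validation NDCG@5 of exactly 1.0, so the metric saturates on the 7 validation groups and cannot tell these configurations apart; Optuna keeps the first trial that reached the best value (trial 2, as the repeated "Best is trial 2" lines show). A small sketch for inspecting such ties, using values copied from the log (rounded to 4 decimals) and an assumed tie-break rule that prefers the smallest `num_leaves` as a rough proxy for model complexity:

```python
# (trial number, validation NDCG@5 rounded to 4 decimals, num_leaves),
# copied from the Optuna log above.
trials = [
    (0, 0.9604, 112), (1, 0.9604, 73), (2, 1.0, 24), (3, 0.9604, 124),
    (4, 0.9604, 101), (5, 0.9813, 49), (6, 1.0, 96), (7, 0.9813, 99),
    (8, 0.9383, 86), (9, 1.0, 48), (10, 1.0, 20), (11, 1.0, 148),
    (12, 0.9813, 64), (13, 0.9625, 21), (14, 1.0, 132), (15, 0.957, 40),
    (16, 0.9813, 79), (17, 0.9813, 93), (18, 0.9604, 64), (19, 1.0, 112),
]

best_value = max(value for _, value, _ in trials)
tied = [t for t in trials if t[1] == best_value]
print(f"{len(tied)} of {len(trials)} trials reach NDCG@5 = {best_value}")

# Assumed tie-break (not part of the original workflow): among the tied
# trials, prefer the one with the fewest leaves per tree.
simplest = min(tied, key=lambda t: t[2])
print("Simplest tied trial:", simplest)
```

This prints `7 of 20 trials reach NDCG@5 = 1.0` and `Simplest tied trial: (10, 1.0, 20)`. With a live study the same numbers come from `study.trials` (each trial's `.value` and `.params`); the tie-break rule itself is only one illustrative choice.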


In [14]:
best_params = study.best_params

In [15]:
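# study.best_params holds only the sampled values; re-add the fixed settings.
best_params['objective'] = 'lambdarank'
best_params['metric'] = 'ndcg'
best_params['eval_at'] = 5
best_params['random_state'] = 42
best_params['verbosity'] = -1  # same as in objective(); silences LightGBM warnings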
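Because `study.best_params` contains only the sampled values, the fixed settings from `objective` have to be re-added by hand, which is easy to let drift out of sync. One way to avoid that is a single dictionary of fixed settings merged with the tuned values; `FIXED_PARAMS` and `build_params` are hypothetical names in this sketch, and the fixed values are copied from `objective`:

```python
# Hypothetical single source of truth for the settings that are not tuned.
FIXED_PARAMS = {
    'objective': 'lambdarank',
    'metric': 'ndcg',
    'eval_at': 5,
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'random_state': 42,
}


def build_params(tuned):
    """Merge the fixed settings with tuned ones; tuned values win on conflicts."""
    return {**FIXED_PARAMS, **tuned}


params = build_params({'learning_rate': 0.0995788010026312, 'num_leaves': 24})
print(params['objective'], params['num_leaves'])  # lambdarank 24
```

The final model could then be built as `lgb.LGBMRanker(**build_params(study.best_params))`, and `objective` could pass its sampled values through the same helper, so both always share one configuration.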

In [16]:
final_model = lgb.LGBMRanker(**best_params)

In [17]:
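# Refit with the selected parameters. Early stopping again monitors the
# validation queries, so their NDCG@5 is not an independent test score.
final_model.fit(
    X_train, y_train,
    group=q_train,
    eval_set=[(X_val, y_val)],
    eval_group=[q_val],
    callbacks=[lgb.early_stopping(stopping_rounds=50)]
)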



Training until validation scores don't improve for 50 rounds
Early stopping, best iteration is:
[47]	valid_0's ndcg@5: 1


| Parameter | Value |
|---|---|
| boosting_type | 'gbdt' |
| num_leaves | 24 |
| max_depth | -1 |
| learning_rate | 0.0995788010026312 |
| n_estimators | 100 |
| subsample_for_bin | 200000 |
| objective | 'lambdarank' |
| class_weight | None |
| min_split_gain | 0.0 |
| min_child_weight | 0.001 |


In [18]:
import os
import joblib

In [19]:
os.makedirs('../models', exist_ok=True)
joblib.dump(final_model, '../models/lgbm_ranker.pkl')

['../models/lgbm_ranker.pkl']
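`LGBMRanker.predict` returns one unnormalized score per row, and a score is only meaningful relative to other rows of the same query. A minimal sketch, assuming the rows are ordered exactly as the group sizes (for example `X_val` together with `q_val`), of turning a flat score sequence into a ranking per `Event_Block`; `rank_within_groups` is a hypothetical helper:

```python
def rank_within_groups(scores, group_sizes):
    """Split flat per-row scores into consecutive queries and return, for each
    query, the row positions (0-based, within the query) by descending score."""
    if sum(group_sizes) != len(scores):
        raise ValueError("group sizes must add up to the number of rows")
    rankings, start = [], 0
    for size in group_sizes:
        block = scores[start:start + size]
        # Highest-scored row of this query comes first.
        rankings.append(sorted(range(size), key=lambda i: block[i], reverse=True))
        start += size
    return rankings


print(rank_within_groups([0.2, 1.5, -0.3, 0.7, 0.1], [3, 2]))  # [[1, 0, 2], [0, 1]]
```

For example, `scores = joblib.load('../models/lgbm_ranker.pkl').predict(X_val)` followed by `rank_within_groups(scores, q_val)` would give the model's ordering inside each validation block.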