Tuning LambdaRank Objective with Sklearn's GroupKFold (RandomSearch) #5660

Open
amir-rahnama opened this issue Jan 3, 2023 · 2 comments
amir-rahnama commented Jan 3, 2023

Description

For the lambdarank objective, Scikit-learn's GroupKFold does not work with LGBMRanker. Is there a way to make this combination work? A minimal example is below.

Reproducible example

import numpy as np
from sklearn.model_selection import RandomizedSearchCV, GroupKFold
from sklearn.metrics import make_scorer, ndcg_score
import lightgbm


X = [[0.  , 0.  , 0.01],
    [1.  , 0.  , 1.  ],
    [0.43, 0.  , 0.  ],
    [0.43, 0.  , 0.4 ],
    [0.  , 0.  , 0.01],
    [0.  , 0.  , 0.31],
    [0.  , 0.  , 1.  ],
    [0.  , 0.  , 0.  ],
    [0.  , 0.  , 0.15]]

y = [0, 0, 1, 0, 3, 0, 4, 0, 0]
groups = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1])
flat_group = [7, 2]

# Training on the data works
gbm = lightgbm.LGBMRanker(objective='lambdarank')
gbm.fit(X=X, y=y, group=flat_group)

# Random hyperparameter tuning doesn't work
hyper_params = {
    'n_estimators': [10, 20, 30, 40],
    'num_leaves': [20, 50, 100, 200],
    'max_depth': [5, 10, 15, 20],
    'learning_rate': [0.01, 0.02, 0.03]
}

gkf = GroupKFold(n_splits=2)
folds = gkf.split(X, groups=groups)


grid = RandomizedSearchCV(gbm, hyper_params, n_iter=2,
                          cv=folds, verbose=3, scoring=make_scorer(ndcg_score),
                          error_score='raise')

def group_gen(groups, folds):
    for train, _ in folds:
        yield np.unique(groups[train], return_counts=True)[1]

gen = group_gen(groups, folds)

grid.fit(X, y, group=next(gen))

Running this produces the following error:

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
Cell In[102], line 1
----> 1 grid.fit(X, y, group=next(gen))

StopIteration: 

Environment info

Sklearn: 1.1.3
LightGBM: 3.2.1

Additional Comments

The code I pasted is inspired by the solution given in #1137, which refers to https://github.com/Microsoft/LightGBM/blob/4df7b21dcf2ca173a812f9667e30a21ef827104e/python-package/lightgbm/engine.py#L267-L274. However, that approach does not work in our case.
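One plausible cause of the StopIteration, reasoning from the traceback (an editorial aside, not part of the original report): GroupKFold.split returns a single-use generator, and the snippet hands the same `folds` object both to `cv=` in RandomizedSearchCV and to `group_gen`. Whichever consumer iterates it first exhausts it, so the other immediately hits StopIteration. A minimal demonstration:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.zeros((9, 3))
groups = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1])

gkf = GroupKFold(n_splits=2)
folds = gkf.split(X, groups=groups)  # a generator, not a list

print(len(list(folds)))   # first pass yields both folds
print(next(folds, None))  # None: the generator is now exhausted
```

Materializing the splits once with `folds = list(gkf.split(X, groups=groups))` avoids the exhaustion, though it does not solve the per-fold `group` routing problem.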

Any help would be appreciated.

@replacementAI

Unfortunately, I don't think sklearn supports ranking estimators, though I could be wrong.

@amir-rahnama
Author

@replacementAI Thank you for your feedback. Is there a way to tune the parameters of LightGBM with cross-validation when it comes to ranking models? I tried optuna.integration.lightgbm.LightGBMTuner, but it also doesn't work for ranking scenarios.
