Tuning LambdaRank Objective with Sklearn's GroupKFold (RandomSearch) #5660

Open
amir-rahnama opened this issue Jan 3, 2023 · 2 comments
amir-rahnama commented Jan 3, 2023

Description

For the lambdarank objective, Scikit-learn's GroupKFold does not work with LGBMRanker. Is there a way to make this combination work? A minimal example is below.

Reproducible example

import numpy as np
from sklearn.model_selection import RandomizedSearchCV, GroupKFold
from sklearn.metrics import make_scorer, ndcg_score
import lightgbm


X = [[0.  , 0.  , 0.01],
    [1.  , 0.  , 1.  ],
    [0.43, 0.  , 0.  ],
    [0.43, 0.  , 0.4 ],
    [0.  , 0.  , 0.01],
    [0.  , 0.  , 0.31],
    [0.  , 0.  , 1.  ],
    [0.  , 0.  , 0.  ],
    [0.  , 0.  , 0.15]]

y = [0, 0, 1, 0, 3, 0, 4, 0, 0]
groups = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1])
flat_group = [7, 2]

# Training on the data works
gbm = lightgbm.LGBMRanker(objective='lambdarank')
gbm.fit(X=X, y=y, group=flat_group)

# Random hyperparameter tuning doesn't work
hyper_params = {
    'n_estimators': [10, 20, 30, 40],
    'num_leaves': [20, 50, 100, 200],
    'max_depth': [5, 10, 15, 20],
    'learning_rate': [0.01, 0.02, 0.03]
}

gkf = GroupKFold(n_splits=2)
folds = gkf.split(X, groups=groups)


grid = RandomizedSearchCV(gbm, hyper_params, n_iter=2,
                          cv=folds, verbose=3, scoring=make_scorer(ndcg_score),
                          error_score='raise')

def group_gen(groups, folds):
    for train, _ in folds:
        yield np.unique(groups[train], return_counts=True)[1]

gen = group_gen(groups, folds)

grid.fit(X, y, group=next(gen))

Running this produces the following error:

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
Cell In[102], line 1
----> 1 grid.fit(X, y, group=next(gen))

StopIteration: 

Environment info

Sklearn: 1.1.3
LightGBM: 3.2.1

Additional Comments

The code I pasted is inspired by the solution given in #1137, which refers to https://github.com/Microsoft/LightGBM/blob/4df7b21dcf2ca173a812f9667e30a21ef827104e/python-package/lightgbm/engine.py#L267-L274. However, that approach does not work in our case.
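One plausible cause of the StopIteration, reasoning from the traceback (an editorial aside, not part of the original report): GroupKFold.split returns a single-use generator, and the snippet hands the same `folds` object both to `cv=` in RandomizedSearchCV and to `group_gen`. Whichever consumer iterates it first exhausts it, so the other immediately hits StopIteration. A minimal demonstration:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.zeros((9, 3))
groups = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1])

gkf = GroupKFold(n_splits=2)
folds = gkf.split(X, groups=groups)  # a generator, not a list

print(len(list(folds)))   # first pass yields both folds
print(next(folds, None))  # None: the generator is now exhausted
```

Materializing the splits once with `folds = list(gkf.split(X, groups=groups))` avoids the exhaustion, though it does not solve the per-fold `group` routing problem.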

Any help would be appreciated.

@replacementAI

Unfortunately, I don't think sklearn supports ranking estimators, though I could be wrong.

@amir-rahnama
Author

@replacementAI Thank you for your feedback. Is there a way to tune the parameters of LightGBM with cross-validation when it comes to ranking models? I tried optuna.integration.lightgbm.LightGBMTuner, but it also doesn't work for ranking scenarios.
