[WIP] first cut at LambdaMART #2580

Status: Closed · wants to merge 21 commits

Changes shown below are from 4 of the 21 commits.

Commits (21)
c423210
lambdamart, rebased on master
jwkvam Jan 25, 2014
1367bf9
Fixes bug happening subsample < 1 with non-individual groups in Lambd…
Jan 28, 2014
9f869e5
- hardcode ndcg loss
jwkvam Jan 29, 2014
e7fc6b3
Added application example comparing LambdaMART with gradient boosting
jwkvam Jan 29, 2014
6391439
PEP8 + naming conventions
ogrisel Jan 29, 2014
d5f1d6c
PEP8 and naming conventions
ogrisel Jan 29, 2014
293d346
Merge pull request #1 from ogrisel/lambdamart-ogrisel
jwkvam Jan 31, 2014
0b2dbd2
Added max_rank parameter to LambdaMART.
jwkvam Jan 31, 2014
b9d3da2
Faster inverse permutations. Other minor code cleanup.
jwkvam Feb 5, 2014
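The commit's diff is not shown here, but a "faster inverse permutation" in NumPy is usually the linear-time index-assignment trick rather than a second argsort; a minimal sketch of the likely idiom:

import numpy as np

perm = np.array([2, 0, 3, 1])

# O(n log n): argsort recovers the inverse, but pays for a full sort.
inv_slow = np.argsort(perm)

# O(n): direct index assignment, the usual "faster inverse permutation".
inv_fast = np.empty_like(perm)
inv_fast[perm] = np.arange(perm.size)

assert (inv_slow == inv_fast).all()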
abefeb4
Use default gain of (2**y - 1).
jwkvam Feb 11, 2014
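The 2**y - 1 gain is the conventional NDCG gain for graded relevance labels: it grows exponentially with the grade, so a document labeled 4 contributes fifteen times the gain of one labeled 1 rather than four times. A quick illustration:

import numpy as np

y = np.arange(5)        # graded relevance labels 0..4
gain = 2.0 ** y - 1.0   # the default gain adopted by this commit
print(gain)             # [ 0.  1.  3.  7. 15.]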
b4bd93a
add doc string for gain parameter
jwkvam Feb 11, 2014
3c759c8
Make gain function private. Added test.
jwkvam Feb 12, 2014
7f066fa
Add pessimistic tie breaks. Add groupby to cleanup code. Use ravel to
jwkvam Feb 16, 2014
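A pessimistic tie break in a ranking metric typically means that documents with equal predicted scores are ordered worst-relevance-first, so the reported score is a lower bound and a constant predictor cannot look good by luck. The diff is not shown, so this is only one plausible NumPy realization:

import numpy as np

scores = np.array([0.9, 0.9, 0.5])   # two tied predictions
y = np.array([2.0, 0.0, 1.0])        # their relevance labels

# Primary key: score descending; tie break: relevance ascending (worst first).
order = np.lexsort((y, -scores))
print(order)                          # [1 0 2]: the tied pair is worst-first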
c70bf6f
Added LambdaMART to some existing tests. Moved LambdaMART specific code
jwkvam Feb 16, 2014
bfa2882
- Provide rational for tree leaf updates, leaf = numerator /
jwkvam Feb 25, 2014
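The truncated message appears to refer to the leaf-value update in Burges' LambdaMART formulation: instead of fitting leaves to raw residuals, each leaf takes a one-step Newton value, the sum of its samples' lambda gradients divided by the sum of their second-order weights. A sketch with illustrative names (not the PR's internals):

import numpy as np

def leaf_value(lambdas, weights, eps=1e-12):
    # leaf = numerator / denominator: a diagonal Newton step, where
    # lambdas are first derivatives and weights are second derivatives
    # of the cost with respect to the model scores in this leaf.
    return lambdas.sum() / (weights.sum() + eps)  # eps guards degenerate leaves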
0afad72
Updated docs, added test for invalid max_rank values.
jwkvam Feb 27, 2014
27db889
More doc updates to keep this consistent for the time being.
jwkvam Feb 27, 2014
f209668
rename group to sample_group to be consistent with sample_weight
jwkvam Feb 27, 2014
ee84b3d
It is just as good to use the ZeroEstimator for LambdaMART as using the
jwkvam Feb 27, 2014
12b54ca
Refactor sample_group out of BaseGradientBoosting
jwkvam Feb 28, 2014
86bc5e8
remove **kargs from loss.__call__
jwkvam Feb 28, 2014
48 changes: 48 additions & 0 deletions examples/applications/letor_with_lambdamart.py
@@ -0,0 +1,48 @@
#!/usr/bin/env python
"""
================================
Learning to Rank with LambdaMART
================================

LambdaMART is a state-of-the-art algorithm for learning-to-rank problems.
This example uses the MQ2008 dataset and compares LambdaMART with
linear regression and least squares gradient boosting.

The MQ2008 dataset is available at the links below; download MQ2008.rar
and extract its files into a 'data' folder.

https://research.microsoft.com/en-us/um/beijing/projects/letor/letor4download.aspx
https://research.microsoft.com/en-us/um/beijing/projects/letor/LETOR4.0/Data/MQ2008.rar
"""

# Author: Jacques Kvam <jwkvam@gmail.com>
# License: BSD 3 clause

from sklearn.datasets import load_svmlight_file
from sklearn.ensemble import LambdaMART
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from time import time

X_train, y_train, group_train = load_svmlight_file('data/train.txt',
                                                   query_id=True)
X_valid, y_valid, group_valid = load_svmlight_file('data/vali.txt',
                                                   query_id=True)

X_train = X_train.todense()
X_valid = X_valid.todense()

# transform labels to NDCG relevance values
y_train = 2**y_train - 1
y_valid = 2**y_valid - 1

t0 = time()
lm = LambdaMART().fit(X_train, y_train, group=group_train)
print("LambdaMART fit in %fs" % (time() - t0))
gb = GradientBoostingRegressor().fit(X_train, y_train)
lr = LinearRegression().fit(X_train, y_train)
for reg in [lm, gb, lr]:
    print("%s training score is %f" % (reg.__class__.__name__,
          lm.loss_(y_train, reg.predict(X_train)[:, None], group_train)))
    print("%s validation score is %f" % (reg.__class__.__name__,
          lm.loss_(y_valid, reg.predict(X_valid)[:, None], group_valid)))
6 changes: 4 additions & 2 deletions sklearn/ensemble/__init__.py
@@ -15,6 +15,7 @@
 from .weight_boosting import AdaBoostRegressor
 from .gradient_boosting import GradientBoostingClassifier
 from .gradient_boosting import GradientBoostingRegressor
+from .gradient_boosting import LambdaMART

 from . import bagging
 from . import forest
@@ -27,6 +28,7 @@
"RandomTreesEmbedding", "ExtraTreesClassifier",
"ExtraTreesRegressor", "BaggingClassifier",
"BaggingRegressor", "GradientBoostingClassifier",
"GradientBoostingRegressor", "AdaBoostClassifier",
"AdaBoostRegressor", "bagging", "forest", "gradient_boosting",
"GradientBoostingRegressor", "LambdaMART",
"AdaBoostClassifier", "AdaBoostRegressor", "bagging",
"forest", "gradient_boosting",
"partial_dependence", "weight_boosting"]