Introduce Pairwise Ranking/Scoring in LambdaMART #6147
Comments
FYI, @shiyu1994, @jameslamb.
Exciting! I've added this to #2302 along with other features. I have 2 questions:
Hi @jameslamb,
Linking #6182.
Summary
Introduce an alternative mode for the ranking task (powered by the LambdaMART algorithm) that leverages a pairwise scoring function instead of the standard pointwise one.
Motivation
We expect this to improve the performance of ranking models trained with LightGBM in most cases.
Description
LambdaMART (the learning-to-rank algorithm implemented in LightGBM) is often called pairwise because its loss function is defined over pairs of documents from the ranked list. However, this pairwise loss is computed from a pointwise scoring function s(x) defined over the feature vector of each individual document. We propose to extend LambdaMART to use a pairwise scoring function s(x_i, x_j) defined on any pair of documents with feature vectors x_i and x_j.

The advantage of a pairwise scoring function comes when we additionally let it use the differential (delta) features x_i - x_j, i.e., we learn a pairwise scoring function s(x_i, x_j, x_i - x_j). Delta features can often allow a much smaller model to rank documents accurately. Imagine a document recency feature that is highly correlated with relevance. Ranking documents effectively by this feature with a pointwise scoring function s(x) would require a regression tree/ensemble with many splits, roughly one per distinguishable level of recency. With delta features (delta recency in this case), a single split can be enough to predict the relative relevance of a pair of documents and rank them correctly. As a result, we get smaller and more accurate models with better generalization properties, as illustrated by the sketch below.
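A minimal sketch of this intuition, assuming a toy query group where relevance is driven entirely by a recency feature; the pair construction and variable names here are illustrative only and are not part of LightGBM's API or of the proposed implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# One query group: 8 documents whose relevance is driven entirely by recency (days).
recency = rng.integers(0, 365, size=8)   # pointwise feature x
relevance = -recency                      # fresher document => more relevant

# Pointwise view: a tree-based scoring function s(x) must carve recency into
# many leaves (roughly one split per distinguishable recency level) to order
# all documents within the group.

# Pairwise view with delta features: for each pair (i, j) the model would score
# s(x_i, x_j, x_i - x_j); here the single rule "delta_recency < 0 => i above j"
# already orders every pair correctly.
pairs = [(i, j) for i in range(8) for j in range(8) if i != j]
delta_recency = np.array([recency[i] - recency[j] for i, j in pairs])
pair_label = np.array([relevance[i] > relevance[j] for i, j in pairs])

pred = delta_recency < 0
print("fraction of pairs ordered correctly by one split:", (pred == pair_label).mean())
```

In an actual pairwise learner, each training example would presumably carry the concatenated features (x_i, x_j, x_i - x_j), so a single split on the delta column plays the role that many recency splits play in the pointwise model.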
References