
In the rank objective, lambdas and hessians need to factor sigmoid_ into the computation. #2322

Merged · 4 commits merged into microsoft:master on Aug 17, 2019

Conversation


@sbruch (Contributor) commented Aug 12, 2019

Additionally, the sigmoid function has an arbitrary factor of 2 in the exponent; it is not only non-standard, but the gradients are also not computed correctly.

@sbruch changed the title from "Lambdas and hessians need to factor sigmoid_ into the computation." to "In the rank objective, lambdas and hessians need to factor sigmoid_ into the computation." on Aug 12, 2019
@msftclas commented Aug 12, 2019

CLA assistant check: all CLA requirements met.


sbruch commented Aug 12, 2019

I am also proposing to remove what looks like a heuristic that normalizes the gradients by the difference in document scores. This is not part of the LambdaMART algorithm and at any rate does not help with convergence rate or generalization. An alternative to this change is to make such heuristics optional.

I have also trained models on the MSLR Web30k (Fold 1) and Yahoo! Learning-to-Rank Challenge (Set 1) datasets; the results are neither better nor worse than a model trained from the current HEAD:

NDCG@5 on Web30k before the change is 49.20 (+/-0.07) and after the change it is 49.05 (+/-0.09)
NDCG@5 on Yahoo LTRC before the change is 74.16 (+/-0.14) and after the change it is 74.22 (+/-0.08)

where sigmoid_ is set to 2 (post-change).

  • Note that in my experiments I have removed the queries that have no relevant documents to avoid NDCG inflation -- LightGBM computes an NDCG of 1 for such queries.
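For context, here is a minimal sketch of the convention being described (hypothetical helper functions, not LightGBM's actual implementation): when the ideal DCG of a query is zero because it has no relevant documents, NDCG is defined to be 1, which inflates the average over such queries.

#include <algorithm>
#include <cmath>
#include <functional>
#include <vector>

// Hypothetical sketch, not LightGBM's code: DCG over labels in ranked order.
double DCG(const std::vector<int>& labels) {
  double dcg = 0.0;
  for (size_t i = 0; i < labels.size(); ++i) {
    dcg += (std::pow(2.0, labels[i]) - 1.0) / std::log2(i + 2.0);
  }
  return dcg;
}

// NDCG with the convention discussed above: an ideal DCG of 0 (no relevant
// documents) yields NDCG = 1, inflating the mean over such queries.
double NDCG(std::vector<int> ranked_labels) {
  const double dcg = DCG(ranked_labels);
  std::sort(ranked_labels.begin(), ranked_labels.end(), std::greater<int>());
  const double ideal = DCG(ranked_labels);
  return ideal == 0.0 ? 1.0 : dcg / ideal;
}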


sbruch commented Aug 15, 2019

Update: I wanted to share an updated NDCG@5 after much fine-tuning on validation sets:

Web30K Fold 1: 49.39 (+/-0.08)
Yahoo! Set 1: 74.97 (+/-0.09)

As noted in an earlier comment, queries with no relevant documents were discarded from the test set. Here are the hyperparameters I used, in case you wish to reproduce the results:

For Web30k (Yahoo!): learning rate is 0.05 (0.05), num_leaves is 400 (400), min_data_in_leaf is 50 (100), min_sum_hessian_in_leaf is 200 (10), and sigmoid_ is set to 2.
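For reference, a minimal sketch of a LightGBM CLI config encoding the Web30k settings above (the data paths are placeholders, not from this thread; sigmoid is the config-file name for the sigmoid_ parameter):

# Hypothetical train.conf sketch for the Web30k settings above.
task = train
objective = lambdarank
metric = ndcg
eval_at = 5
learning_rate = 0.05
num_leaves = 400
min_data_in_leaf = 50
min_sum_hessian_in_leaf = 200
sigmoid = 2
data = mslr_web30k_fold1.train    # placeholder path
valid = mslr_web30k_fold1.valid   # placeholder path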

The change under review:

- p_lambda *= -delta_pair_NDCG;
- p_hessian *= 2 * delta_pair_NDCG;
+ p_lambda *= -sigmoid_ * delta_pair_NDCG;
+ p_hessian *= sigmoid_ * sigmoid_ * delta_pair_NDCG;
@guolinke (Collaborator) commented on the diff:

I am thinking about eliminating sigmoid_ here. That is, no sigmoid_ in p_lambda, and only sigmoid_ in p_hessian. What is your opinion?

@sbruch (Contributor, Author) replied:

So the LambdaMART (implicit) cost function has the following form:

log(1 + exp(-sigmoid_ * (s_i - s_j)))

The first derivative of the above contains the sigmoid_ term, so mathematically p_lambda in the code must include sigmoid_ in its computation. The second derivative contains sigmoid_^2, as proposed in the code change.
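Spelled out (a sketch of the standard derivation, with s = s_i - s_j):

C(s) = log(1 + exp(-sigmoid_ * s))
dC/ds_i = -sigmoid_ / (1 + exp(sigmoid_ * s))
d2C/ds_i^2 = sigmoid_^2 * exp(sigmoid_ * s) / (1 + exp(sigmoid_ * s))^2

which matches the diff above: p_lambda picks up a factor of sigmoid_ and p_hessian a factor of sigmoid_ * sigmoid_.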

For more details, please see Section 7 of this paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.180.634&rep=rep1&type=pdf

@guolinke merged commit aee92f6 into microsoft:master on Aug 17, 2019

sbruch commented Aug 19, 2019 via email

@guolinke (Collaborator) commented:

@sbruch refer to #2331; besides that, I added normalization for the lambdas/hessians. However, neither of them is tested.

The lock bot locked this conversation as resolved and limited it to collaborators on Mar 10, 2020.