In the rank objective, lambdas and hessians need to factor sigmoid_ into the computation. #2322
Conversation
… Additionally, the sigmoid function has an arbitrary factor of 2 in the exponent; it is not only non-standard, the gradients are also computed incorrectly.
I am also proposing to remove what looks like a heuristic that normalizes the gradients by the difference in document scores. This is not part of the LambdaMART algorithm and, at any rate, does not help with convergence rate or generalization. An alternative to this change is to make such heuristics optional. I have also trained models on the MSLR Web30k (Fold 1) and Yahoo! Learning-to-Rank Challenge (Set 1) datasets, and the results are neither better nor worse than a model trained from HEAD: NDCG@5 on Web30k before the change is 49.20 (+/- 0.07) and after the change it is 49.05 (+/- 0.09), with sigmoid_ set to 2 post-change.
Update: I wanted to share an updated NDCG@5 after much fine-tuning on validation sets: Web30k Fold 1: 49.39 (+/- 0.08). As noted in an earlier comment, queries with no relevant documents were discarded from the test set. Here are the hyperparameters I used, should you wish to reproduce the results, for Web30k (Yahoo): learning rate is 0.05 (0.05), num_leaves is 400 (400), min_data_in_leaf is 50 (100), min_sum_hessian_in_leaf is 200 (10), and sigmoid_ is set to 2.
- p_lambda *= -delta_pair_NDCG;
- p_hessian *= 2 * delta_pair_NDCG;
+ p_lambda *= -sigmoid_ * delta_pair_NDCG;
+ p_hessian *= sigmoid_ * sigmoid_ * delta_pair_NDCG;
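For context, here is a minimal, self-contained sketch of the full pairwise computation after the change. This is an illustration, not LightGBM's actual code: PairGrad, LambdaMartPair, and the score arguments are hypothetical names, and the rho term is the part that p_lambda and p_hessian already hold before the multiplications above.

```cpp
#include <cmath>

struct PairGrad {
  double lambda;   // first-order term for the pair
  double hessian;  // second-order term for the pair
};

// Hypothetical sketch: LambdaMART pair gradient with sigmoid_ factored in,
// scaled by |delta NDCG| from swapping the two documents.
PairGrad LambdaMartPair(double score_high, double score_low,
                        double delta_pair_ndcg, double sigmoid) {
  // rho = 1 / (1 + exp(sigmoid * (s_i - s_j)))
  const double rho =
      1.0 / (1.0 + std::exp(sigmoid * (score_high - score_low)));
  PairGrad g;
  g.lambda = -sigmoid * rho * delta_pair_ndcg;                          // dC/ds_i
  g.hessian = sigmoid * sigmoid * rho * (1.0 - rho) * delta_pair_ndcg;  // d2C/ds_i^2
  return g;
}
```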
I am thinking about eliminating sigmoid_ here. That is, no sigmoid_ in p_lambda, and only sigmoid_ in p_hessian. What is your opinion?
So the LambdaMART (implicit) cost function has the following form:
log(1 + exp(-sigmoid_ * (s_i - s_j)))
The first derivative of this expression carries a sigmoid_ factor, so mathematically p_lambda in the code must include sigmoid_ in its computation; the second derivative carries sigmoid_^2, as proposed in the code change.
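Spelled out, with sigma standing for sigmoid_ and rho for the pairwise probability term:

```latex
C_{ij} = \log\!\left(1 + e^{-\sigma (s_i - s_j)}\right)

\frac{\partial C_{ij}}{\partial s_i}
  = \frac{-\sigma}{1 + e^{\sigma (s_i - s_j)}}
  = -\sigma \rho,
  \qquad \rho \equiv \frac{1}{1 + e^{\sigma (s_i - s_j)}}

\frac{\partial^2 C_{ij}}{\partial s_i^2} = \sigma^2 \rho \,(1 - \rho)
```

Multiplying both derivatives by the |delta NDCG| of the pair yields exactly the -sigmoid_ and sigmoid_ * sigmoid_ factors in the diff above.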
For more details, please see Section 7 of this paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.180.634&rep=rep1&type=pdf
Thanks for accepting and merging the pull request. I'd certainly be happy to send another PR and add the normalization piece back, conditioned on a boolean; but a couple of questions: (1) Would it make sense to add a flag to the config file (e.g., lambdarank_normalize_gradients_by_score_diff)? (2) Could it be false by default in the public library and set to true in internal configuration?
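For illustration only, a minimal sketch of what gating the heuristic behind such a flag might look like; the flag name, the exact normalization formula, and all identifiers here are assumptions, not LightGBM's actual code:

```cpp
#include <cmath>

// Hypothetical config flag: false by default in the public library,
// flipped to true in an internal configuration.
bool lambdarank_normalize_gradients_by_score_diff = false;

// Assumed form of the heuristic: damp a pair's update by the score gap.
void MaybeNormalizePair(double* p_lambda, double* p_hessian,
                        double score_high, double score_low) {
  if (!lambdarank_normalize_gradients_by_score_diff) return;
  const double scale = 1.0 / (1.0 + std::fabs(score_high - score_low));
  *p_lambda *= scale;
  *p_hessian *= scale;
}
```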
…On Fri, Aug 16, 2019 at 8:48 PM Guolin Ke <***@***.***> wrote:
@SBrush <https://github.com/SBrush> would you mind creating another PR for the normalization? We can add a bool parameter for that function. And I think it is better to have it on by default, for behaviour consistent with before.