
Top k multi error #2178

Merged
merged 12 commits into from May 26, 2019

Conversation

@btrotta (Collaborator) commented May 15, 2019

Implements the feature requested in issue #1139. Adds a new parameter, top_k_threshold. When the metric is set to multi_error, top_k_threshold can be used to obtain the top-k multi-error. By default this parameter is set to 1, which gives the usual multi-error.
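A minimal sketch of what this metric computes, as a hypothetical NumPy helper (not LightGBM's actual C++ implementation, and note the parameter is renamed during review below — check the current docs for the final name). This naive argsort version is also tie-sensitive:

```python
import numpy as np

def top_k_multi_error(scores, labels, k=1):
    # Fraction of samples whose true class is not among the k classes
    # with the highest predicted scores.  With k=1 this reduces to the
    # usual multi-class error (argmax != label).
    scores = np.asarray(scores, dtype=float)
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = [label in row for label, row in zip(labels, top_k)]
    return 1.0 - float(np.mean(hits))

scores = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.5, 0.4],
                   [0.2, 0.3, 0.5]])
labels = [0, 2, 2]
```

With k=1 the second sample is misclassified (argmax is class 1), so the error is 1/3; with k=2 every true class is within the top two scores and the error drops to 0.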

@msftclas commented May 15, 2019

CLA assistant check
All CLA requirements met.

@guolinke requested a review from StrikerRUS, May 15, 2019 09:18
@StrikerRUS (Collaborator)

@btrotta Thank you very much for your contribution!

It seems your new tests fail on Python 2.7. Can you please take a look?

Also, please move them to the existing test_engine.py file. Maybe in the future we should divide our tests in a more elegant way, but for now there is no need to create a new separate file for 2 small tests.

@StrikerRUS (Collaborator) commented May 15, 2019

We already have top_k/topk parameter for voting parallel: https://lightgbm.readthedocs.io/en/latest/Parameters.html#top_k.
I suppose we should choose a less confusing name for this parameter. What about multi_error_top_k_threshold or simply multi_error_top_k? cc @guolinke

@guolinke (Collaborator)

yes, I think it is better to change the parameter name.

@StrikerRUS (Collaborator) left a comment

It seems the new behavior differs from the previous one when the model predicts all classes to be equally likely.
Consider topk@1 and score = [0.33, 0.33, 0.33]
Old:
LossOnPoint => 1.0
New:
LossOnPoint => 0.0
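The discrepancy comes down to how tied scores are counted. A small NumPy illustration (not the actual C++ code) of the two tie-handling conventions on this example:

```python
import numpy as np

scores = np.array([0.33, 0.33, 0.33])  # model is maximally uncertain
true_class, k = 0, 1

# Lenient tie handling: only classes scoring STRICTLY higher than the
# true class push it out of the top k, so a three-way tie counts as a hit.
lenient_error = 0.0 if np.sum(scores > scores[true_class]) < k else 1.0

# Strict tie handling: tied classes also count against the true class,
# so a maximally uncertain prediction is counted as an error.
strict_error = 0.0 if np.sum(scores >= scores[true_class]) <= k else 1.0
```

Here the lenient convention reports 0.0 and the strict one reports 1.0, matching the new/old outputs quoted above.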

@btrotta (Collaborator, author) commented May 17, 2019

@StrikerRUS Good point! I've fixed this in the latest commit. The metric is now defined so that the top-k error on a sample is 0 if there are at least num_classes - k predictions strictly less than the prediction on the true class.
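A plain-Python sketch of the per-sample rule just described (illustrative only, not the actual C++ implementation):

```python
import numpy as np

def top_k_error_point(scores, true_class, k):
    # Error is 0 iff at least (num_classes - k) predictions are
    # strictly less than the prediction for the true class; ties with
    # the true class therefore count against it.
    scores = np.asarray(scores, dtype=float)
    num_strictly_less = int(np.sum(scores < scores[true_class]))
    return 0.0 if num_strictly_less >= len(scores) - k else 1.0
```

On the tied example [0.33, 0.33, 0.33] with k=1, no prediction is strictly below the true class's score, so the sample counts as an error, restoring the old behavior.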

@StrikerRUS (Collaborator) commented May 17, 2019

@btrotta Thank you for hotfixing that!
I think we can enhance the metric name to help users orient themselves in the logs.
Just like map or ndcg:

for (auto k : eval_at_) {
  name_.emplace_back(std::string("map@") + std::to_string(k));
}

The only thing I'm in doubt about is that we cannot change multi_error -> multi_error@1, due to backward compatibility with users' existing code which relies on the previous name. So, it should be a special case.
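The suggested scheme, sketched in Python for illustration (the real logic lives in the C++ metric class):

```python
def multi_error_name(k):
    # Follow the map@k / ndcg@k convention, but keep the plain
    # "multi_error" name for k == 1 so existing user code that parses
    # evaluation logs by metric name keeps working.
    return "multi_error" if k == 1 else "multi_error@{}".format(k)
```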

@btrotta (Collaborator, author) commented May 17, 2019

@StrikerRUS Done

@StrikerRUS (Collaborator) left a comment

Thanks! Now the general concept looks good to me. Just a few nitpicks below for consistency with the existing codebase and formatting style.

Review comments on include/LightGBM/config.h and src/metric/multiclass_metric.hpp (resolved).
@btrotta (Collaborator, author) commented May 18, 2019

Formatting issues fixed in the latest commit.

@StrikerRUS (Collaborator) left a comment

@btrotta Thank you very much for your contribution!
LGTM except two comments below!

Review comments on include/LightGBM/config.h (resolved).
@StrikerRUS requested a review from guolinke, May 18, 2019 10:36
@btrotta (Collaborator, author) commented May 18, 2019

@StrikerRUS Thanks so much for your careful review! Those 2 issues with docs are fixed now.

@StrikerRUS (Collaborator)

@guolinke Can you please give a second review to this PR?

@guolinke (Collaborator)

it looks good to me

@StrikerRUS (Collaborator)

@guolinke Should this PR be in 2.2.4 release?

@guolinke (Collaborator)

@StrikerRUS yeah, it can be.

@StrikerRUS (Collaborator)

@guolinke Then I'm merging 😃

@StrikerRUS merged commit b3db9e9 into microsoft:master, May 26, 2019
The lock bot locked the conversation as resolved and limited it to collaborators, Mar 11, 2020