[R-package] confusing error when using lgb.train() with an unknown metric #3481

Open
jameslamb opened this issue Oct 25, 2020 · 2 comments

@jameslamb
Collaborator

How are you using LightGBM?

LightGBM component: R package

Environment info

Operating System: macOS 10.14

C++ compiler version: gcc 8.1.0

CMake version: 3.17.3

R version: 4.0.2

LightGBM version or commit hash: https://github.com/microsoft/LightGBM/tree/c07644d1d71540204a9b56f26667e8180bd009e2

Reproducible example(s)

Thanks to @Laurae2 for sharing this with me and creating the reproducible example below.

After installing with Rscript build_r.R, code that uses an unrecognized metric, like this:

library(lightgbm)
library(data.table)
set.seed(1)
labels <- sample(2, 100, replace = TRUE) - 1
data <- as.matrix(data.frame(A = runif(100, min = 0, max = 1), B = runif(100, min = 0, max = 1)))
data_train <- data[1:90, ]
data_valid <- data[91:100, ]
labels_train <- labels[1:90]
labels_valid <- labels[91:100]
dtrain_lgb <- lgb.Dataset(data_train, label = labels_train)
dvalid_lgb <- lgb.Dataset.create.valid(dtrain_lgb, data_valid, label = labels_valid)
valids_lgb <- list(valid = dvalid_lgb)

model <- lgb.train(
    obj = "binary",
    params = list(metric = "nonsense"),
    data = dtrain_lgb,
    valids = valids_lgb,
    nrounds = 1,
    verbose = 1,
    num_thread = 1
)

produces this error:

[LightGBM] [Info] Number of positive: 47, number of negative: 43
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000016 seconds.
You can set force_col_wise=true to remove the overhead.
[LightGBM] [Info] Total Bins 62
[LightGBM] [Info] Number of data points in the train set: 90, number of used features: 2
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.522222 -> initscore=0.088947
[LightGBM] [Info] Start training from score 0.088947
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
Error in env$eval_list[[1L]] : subscript out of bound

How to close this issue

I think the ideal fix is to raise an error from the C++ side for unrecognized metrics, so that all wrappers benefit from the fix. That would mean changing this line:

return nullptr;

so that it raises an error instead of returning a null pointer.
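The following minimal, self-contained C++ sketch shows why a null pointer produces a confusing error further downstream, and what raising an error instead would look like. `Metric`, `IsKnownMetric`, `MakeMetricLenient`, `MakeMetricStrict`, and `CollectMetrics` are hypothetical stand-ins, not LightGBM's actual classes or functions, and the three metric names are only an illustrative subset. The sketch assumes the caller silently skips null results, which is consistent with the empty `eval_list` behind the R error above.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical metric type; not LightGBM's Metric class.
struct Metric {
  std::string name;
};

// Illustrative subset of recognized names (not the full list).
bool IsKnownMetric(const std::string& name) {
  return name == "binary_logloss" || name == "auc" || name == "l2";
}

// Behaviour described in the issue: unknown names yield nullptr.
std::unique_ptr<Metric> MakeMetricLenient(const std::string& name) {
  if (IsKnownMetric(name)) {
    return std::make_unique<Metric>(Metric{name});
  }
  return nullptr;
}

// Proposed behaviour: unknown names raise an error immediately.
std::unique_ptr<Metric> MakeMetricStrict(const std::string& name) {
  if (IsKnownMetric(name)) {
    return std::make_unique<Metric>(Metric{name});
  }
  throw std::invalid_argument("Unknown metric: '" + name + "'");
}

// Assumed caller behaviour: null results are skipped, so an unknown
// name silently produces an empty metric list.
std::vector<std::unique_ptr<Metric>> CollectMetrics(
    const std::vector<std::string>& names) {
  std::vector<std::unique_ptr<Metric>> metrics;
  for (const auto& name : names) {
    auto metric = MakeMetricLenient(name);
    if (metric != nullptr) {
      metrics.push_back(std::move(metric));
    }
  }
  return metrics;
}
```

With the lenient factory, `CollectMetrics({"nonsense"})` returns an empty vector, and the failure only surfaces later in code that expects at least one metric. The strict version fails at the point of the mistake, with a message that names the bad input.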

If this change isn't made on the C++ side, I would add a new .METRIC_ALIASES in https://github.com/microsoft/LightGBM/blob/c07644d1d71540204a9b56f26667e8180bd009e2/R-package/R/aliases.R listing all of the valid metrics from https://lightgbm.readthedocs.io/en/latest/Parameters.html#metric-parameters, and then raise an error in lgb.check.eval() whenever params contains an unknown metric.

@Laurae2 @guolinke @StrikerRUS @btrotta what do you think?

@guolinke
Collaborator

@jameslamb we could have a reserved key, like 'na', 'nan', or 'empty', that means "no metric", and return nullptr only in that case.
Otherwise it should throw an error.
The same strategy could be applied to objective functions.
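This suggestion can be sketched in the same way. The following is a minimal C++ sketch that assumes the reserved keys are exactly the ones named in the comment ('na', 'nan', 'empty'). `Metric`, `MakeMetric`, `kNoMetricKeys`, and `kKnownMetrics` are hypothetical placeholders, not LightGBM code, and the set of known names is only an illustrative subset.

```cpp
#include <memory>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical metric type; not LightGBM's Metric class.
struct Metric {
  std::string name;
};

// Reserved keys meaning "no metric", as proposed in the comment.
const std::set<std::string> kNoMetricKeys = {"na", "nan", "empty"};

// Illustrative subset of recognized metric names.
const std::set<std::string> kKnownMetrics = {"binary_logloss", "auc", "l2"};

// Returns nullptr only for an explicit "no metric" key; any other
// unrecognized name is treated as a user error.
std::unique_ptr<Metric> MakeMetric(const std::string& name) {
  if (kNoMetricKeys.count(name) > 0) {
    return nullptr;
  }
  if (kKnownMetrics.count(name) > 0) {
    return std::make_unique<Metric>(Metric{name});
  }
  throw std::invalid_argument("Unknown metric: '" + name +
                              "'. Use 'na' to disable metrics.");
}
```

Under this scheme a null pointer becomes an intentional signal rather than a silent fallback. Callers can keep skipping nullptr, while a typo such as "nonsense" fails immediately. A factory for objective functions could follow the same pattern.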

@jameslamb
Collaborator Author

I like that idea! I can open a pull request so we can see what it would look like.

I think it would help reduce confusion. Even just a few minutes ago, another user ran into this: they used an unsupported metric and got a seemingly unrelated error message. See #3028 (comment)
