
[docs] what should LGBM_BoosterGetEvalNames be used for? #4264

Closed
jameslamb opened this issue May 7, 2021 · 3 comments · Fixed by #4270

Comments

@jameslamb (Collaborator)

Description

While working on #4256, I was confused by the use of LGBM_BoosterGetEvalNames. Based on the documentation, I expected that this function would return the names of validation sets.

* \brief Get names of evaluation datasets.

However, in the places where this function is used in the Python and R packages, it actually appears to return a list of evaluation metric names.

Python

_safe_call(_LIB.LGBM_BoosterGetEvalNames(
    self.handle,
    ctypes.c_int(self.__num_inner_eval),
    ctypes.byref(tmp_out_len),
    ctypes.c_size_t(reserved_string_buffer_size),
    ctypes.byref(required_string_buffer_size),
    ptr_string_buffers))
if self.__num_inner_eval != tmp_out_len.value:
    raise ValueError("Length of eval names doesn't equal with num_evals")
if reserved_string_buffer_size < required_string_buffer_size.value:
    raise BufferError(
        "Allocated eval name buffer size ({}) was inferior to the needed size ({})."
        .format(reserved_string_buffer_size, required_string_buffer_size.value)
    )
self.__name_inner_eval = \
    [string_buffers[i].value.decode('utf-8') for i in range(self.__num_inner_eval)]
self.__higher_better_inner_eval = \
    [name.startswith(('auc', 'ndcg@', 'map@', 'average_precision')) for name in self.__name_inner_eval]

R

        .Call(
            LGBM_BoosterGetEvalNames_R
            , private$handle
            , buf_len
            , act_len
            , buf
        )
    }
    names <- lgb.encode.char(arr = buf, len = act_len)
    # Check names' length
    if (nchar(names) > 0L) {
        # Parse and store privately names
        names <- strsplit(names, "\t")[[1L]]
        private$eval_names <- names
        # some metrics don't map cleanly to metric names, for example "ndcg@1" is just the
        # ndcg metric evaluated at the first "query result" in learning-to-rank
        metric_names <- gsub("@.*", "", names)
        private$higher_better_inner_eval <- .METRICS_HIGHER_BETTER()[metric_names]
    }
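
To make the pattern in those excerpts concrete, here is a minimal ctypes sketch of the same two-step call, assuming `_LIB` is the loaded lib_lightgbm shared library and `booster_handle` is a valid BoosterHandle (error checking omitted):

import ctypes

# 1. Ask how many evaluation metrics the booster tracks.
num_metrics = ctypes.c_int(0)
_LIB.LGBM_BoosterGetEvalCounts(booster_handle, ctypes.byref(num_metrics))

# 2. Reserve one string buffer per metric and fetch the names.
reserved = 255  # bytes per name; retry with a larger buffer if out_buffer_len exceeds this
string_buffers = [ctypes.create_string_buffer(reserved) for _ in range(num_metrics.value)]
ptr_string_buffers = (ctypes.c_char_p * num_metrics.value)(*map(ctypes.addressof, string_buffers))
out_len = ctypes.c_int(0)
out_buffer_len = ctypes.c_size_t(0)
_LIB.LGBM_BoosterGetEvalNames(
    booster_handle,
    ctypes.c_int(num_metrics.value),  # number of slots in ptr_string_buffers
    ctypes.byref(out_len),            # number of names actually written
    ctypes.c_size_t(reserved),        # bytes reserved per string
    ctypes.byref(out_buffer_len),     # bytes actually required per string
    ptr_string_buffers,
)

# Despite "evaluation datasets" in the docs, this prints metric names,
# e.g. ['binary_logloss', 'auc'].
print([string_buffers[i].value.decode('utf-8') for i in range(out_len.value)])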

Is the documentation incorrect? Or have I just misunderstood it?

Reproducible example

The code below exercises the same code paths as the Python and R excerpts above. I expected that the output of LGBM_BoosterGetEvalNames would be ["valid_1"] (the name I used for the one eval set).

Python

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

import lightgbm as lgb

X_train, X_test, y_train, y_test = train_test_split(
    *load_breast_cancer(return_X_y=True),
    test_size=0.1,
    random_state=2
)

train_data = lgb.Dataset(X_train, label=y_train)
valid_data = train_data.create_valid(X_test, label=y_test)

params = {
    "objective": "binary",
    "num_leaves": 31,
    "metric": ["binary_logloss", "auc"]
}
bst = lgb.Booster(params, train_data)
bst.add_valid(valid_data, "valid_1")

# check the result of a call to LGBM_BoosterGetEvalNames()
bst._Booster__get_eval_info()
print(bst._Booster__name_inner_eval)

# ['binary_logloss', 'auc']
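
For contrast, the name "valid_1" does exist on the Booster: the Python wrapper tracks validation set names itself, and eval_valid() pairs each dataset name with each metric name. A short sketch continuing from the example above (name_valid_sets is an internal attribute, shown here only for illustration):

# Validation set names are tracked by the Python wrapper itself,
# not returned by LGBM_BoosterGetEvalNames (internal attribute).
print(bst.name_valid_sets)
# ['valid_1']

# eval_valid() pairs each validation set name with each metric name,
# which makes the distinction between the two concepts visible.
for dataset_name, metric_name, value, is_higher_better in bst.eval_valid():
    print(dataset_name, metric_name, is_higher_better)
# valid_1 binary_logloss False
# valid_1 auc True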

R

library(lightgbm)

data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")
dtrain <- lgb.Dataset(
    agaricus.train$data
    , label = agaricus.train$label
)
bst <- lgb.train(
    params = list(
        objective = "regression"
        , metric = c("l2", "l1")
    )
    , data = dtrain
    , nrounds = 2L
    , valids = list(
        "valid_1" = lgb.Dataset.create.valid(
            dtrain
            , agaricus.test$data
            , label = agaricus.test$label
        )
    )
)

eval_info <- bst$.__enclos_env__$private$get_eval_info()
print(eval_info)

# [1] "l2" "l1"

Environment info

LightGBM version or commit hash: latest master as of May 6, 2021 (0246721)

Command(s) you used to install LightGBM

# python
cd python-package
python setup.py install

# R
sh build-cran-package.sh
R CMD INSTALL lightgbm_*.tar.gz
@StrikerRUS (Collaborator) commented May 7, 2021

Looks like an issue in the docs. Thanks for spotting this!

The incorrect wording was introduced here:
https://github.com/microsoft/LightGBM/pull/2076/files#diff-0b41a7775380c6d2b3321f189fe3fd3412c6621c4075ce00067b74f9312f38efR590
Probably with the "help" of my suggestions for that huge PR. Sorry!

Originally it was

\brief Get name of eval

@jameslamb (Collaborator, Author)

Ah ok, thanks! I'll open a PR updating the docs.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
