
[docs] what should LGBM_BoosterGetEvalNames be used for? #4264

Closed
jameslamb opened this issue May 7, 2021 · 3 comments · Fixed by #4270

Comments

@jameslamb (Collaborator)

Description

While working on #4256, I was confused by the use of LGBM_BoosterGetEvalNames. Based on the documentation, I expected that this function would return the names of validation sets.

* \brief Get names of evaluation datasets.

However, in the places where this function is used in the Python and R packages, it actually appears to return a list of evaluation metric names.

Python

_safe_call(_LIB.LGBM_BoosterGetEvalNames(
    self.handle,
    ctypes.c_int(self.__num_inner_eval),
    ctypes.byref(tmp_out_len),
    ctypes.c_size_t(reserved_string_buffer_size),
    ctypes.byref(required_string_buffer_size),
    ptr_string_buffers))
if self.__num_inner_eval != tmp_out_len.value:
    raise ValueError("Length of eval names doesn't equal with num_evals")
if reserved_string_buffer_size < required_string_buffer_size.value:
    raise BufferError(
        "Allocated eval name buffer size ({}) was inferior to the needed size ({})."
        .format(reserved_string_buffer_size, required_string_buffer_size.value)
    )
self.__name_inner_eval = \
    [string_buffers[i].value.decode('utf-8') for i in range(self.__num_inner_eval)]
self.__higher_better_inner_eval = \
    [name.startswith(('auc', 'ndcg@', 'map@', 'average_precision')) for name in self.__name_inner_eval]

R

        .Call(
            LGBM_BoosterGetEvalNames_R
            , private$handle
            , buf_len
            , act_len
            , buf
        )
    }
    names <- lgb.encode.char(arr = buf, len = act_len)
    # Check names' length
    if (nchar(names) > 0L) {
        # Parse and store privately names
        names <- strsplit(names, "\t")[[1L]]
        private$eval_names <- names
        # some metrics don't map cleanly to metric names, for example "ndcg@1" is just the
        # ndcg metric evaluated at the first "query result" in learning-to-rank
        metric_names <- gsub("@.*", "", names)
        private$higher_better_inner_eval <- .METRICS_HIGHER_BETTER()[metric_names]
    }
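
To make the pattern in those excerpts concrete, here is a minimal ctypes sketch of the same two-step call, assuming `_LIB` is the loaded lib_lightgbm shared library and `booster_handle` is a valid BoosterHandle (error checking omitted):

import ctypes

# 1. Ask how many evaluation metrics the booster tracks.
num_metrics = ctypes.c_int(0)
_LIB.LGBM_BoosterGetEvalCounts(booster_handle, ctypes.byref(num_metrics))

# 2. Reserve one string buffer per metric and fetch the names.
reserved = 255  # bytes per name; retry with a larger buffer if out_buffer_len exceeds this
string_buffers = [ctypes.create_string_buffer(reserved) for _ in range(num_metrics.value)]
ptr_string_buffers = (ctypes.c_char_p * num_metrics.value)(*map(ctypes.addressof, string_buffers))
out_len = ctypes.c_int(0)
out_buffer_len = ctypes.c_size_t(0)
_LIB.LGBM_BoosterGetEvalNames(
    booster_handle,
    ctypes.c_int(num_metrics.value),  # number of slots in ptr_string_buffers
    ctypes.byref(out_len),            # number of names actually written
    ctypes.c_size_t(reserved),        # bytes reserved per string
    ctypes.byref(out_buffer_len),     # bytes actually required per string
    ptr_string_buffers,
)

# Despite "evaluation datasets" in the docs, this prints metric names,
# e.g. ['binary_logloss', 'auc'].
print([string_buffers[i].value.decode('utf-8') for i in range(out_len.value)])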

Is the documentation incorrect? Or have I just misunderstood it?

Reproducible example

The code below exercises the same code paths as the Python and R excerpts above. I expected that the output of LGBM_BoosterGetEvalNames would be ["valid_1"] (the name I used for the one eval set).

Python

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

import lightgbm as lgb

X_train, X_test, y_train, y_test = train_test_split(
    *load_breast_cancer(return_X_y=True),
    test_size=0.1,
    random_state=2
)

train_data = lgb.Dataset(X_train, label=y_train)
valid_data = train_data.create_valid(X_test, label=y_test)

params = {
    "objective": "binary",
    "num_leaves": 31,
    "metric": ["binary_logloss", "auc"]
}
bst = lgb.Booster(params, train_data)
bst.add_valid(valid_data, "valid_1")

# check the result of a call to LGBM_BoosterGetEvalNames()
bst._Booster__get_eval_info()
print(bst._Booster__name_inner_eval)

# ['binary_logloss', 'auc']
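
For contrast, the name "valid_1" does exist on the Booster: the Python wrapper tracks validation set names itself, and eval_valid() pairs each dataset name with each metric name. A short sketch continuing from the example above (name_valid_sets is an internal attribute, shown here only for illustration):

# Validation set names are tracked by the Python wrapper itself,
# not returned by LGBM_BoosterGetEvalNames (internal attribute).
print(bst.name_valid_sets)
# ['valid_1']

# eval_valid() pairs each validation set name with each metric name,
# which makes the distinction between the two concepts visible.
for dataset_name, metric_name, value, is_higher_better in bst.eval_valid():
    print(dataset_name, metric_name, is_higher_better)
# valid_1 binary_logloss False
# valid_1 auc True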

R

library(lightgbm)

data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")
dtrain <- lgb.Dataset(
    agaricus.train$data
    , label = agaricus.train$label
)
bst <- lgb.train(
    params = list(
        objective = "regression"
        , metric = c("l2", "l1")
    )
    , data = dtrain
    , nrounds = 2L
    , valids = list(
        "valid_1" = lgb.Dataset.create.valid(
            dtrain
            , agaricus.test$data
            , label = agaricus.test$label
        )
    )
)

eval_info <- bst$.__enclos_env__$private$get_eval_info()
print(eval_info)

# [1] "l2" "l1"

Environment info

LightGBM version or commit hash: latest master as of May 6, 2021 (0246721)

Command(s) you used to install LightGBM

# python
cd python-package
python setup.py install

# R
sh build-cran-package.sh
R CMD INSTALL lightgbm_*.tar.gz
@StrikerRUS (Collaborator) commented May 7, 2021

Looks like an issue in the docs. Thanks for spotting this!

The incorrect wording was introduced here:
https://github.com/microsoft/LightGBM/pull/2076/files#diff-0b41a7775380c6d2b3321f189fe3fd3412c6621c4075ce00067b74f9312f38efR590
Probably with the "help" of my suggestions for that huge PR. Sorry!

Originally it was

\brief Get name of eval

@jameslamb (Collaborator, Author)

Ah ok, thanks! I'll open a PR updating the docs.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
