
xgboost learner inverts labels #32

Closed
001ben opened this issue Sep 24, 2019 · 7 comments

Comments

001ben commented Sep 24, 2019

label = match(as.character(as.matrix(task$data(cols = task$target_names))), lvls) - 1

The `match` line that extracts labels from the task inverts them, which breaks measures on binary tasks. This causes issues when supplying a watchlist to an xgboost learner for early stopping.

# positive class comes first
lvls = c('1', '0')
labels = c('0', '1', '0')
new_labels = match(labels, lvls) - 1
new_labels == labels # all FALSE: the encoding is inverted

Suggested:

label = length(lvls) - match(as.character(as.matrix(task$data(cols = task$target_names))), lvls)
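A quick check in plain R (mirroring the snippet above, outside of mlr3) that the suggested formula maps the positive class to 1 and the negative class to 0:

```r
lvls <- c('1', '0')                        # positive class first, as in mlr3
labels <- c('0', '1', '0')
old <- match(labels, lvls) - 1             # inverted: '1' -> 0, '0' -> 1
new <- length(lvls) - match(labels, lvls)  # correct:  '1' -> 1, '0' -> 0
old  # 1 0 1
new  # 0 1 0
```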

mllg commented Sep 24, 2019

@berndbischl @pat-s this also affects mlr2.

@mllg mllg closed this as completed in 333f231 Sep 24, 2019

mllg commented Sep 24, 2019

Thanks for reporting.


pat-s commented Sep 24, 2019

Why were the labels inverted in the first place?


mllg commented Sep 24, 2019

xgboost needs the labels encoded as integers in 0:(nclass - 1). Usually it does not matter how you map factor -> integer as long as you translate back correctly (and we did this).

However, xgboost supports stuff like early stopping where it calculates performance measures internally to decide whether to terminate or keep going. And for some binary classification measures it matters which class is the positive class (PPV, precision, recall, ...).
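A minimal toy sketch (plain R, not mlr3 or xgboost code) of why the encoding matters: with the same predictions, precision changes when the 0/1 labeling of the positive class is flipped.

```r
# Toy data: 1 = positive class in the intended encoding
truth <- c(1, 1, 0, 0, 1)
pred  <- c(1, 0, 0, 1, 1)

# Precision = TP / (TP + FP), computed against the class coded as `positive`
precision <- function(truth, pred, positive = 1) {
  sum(pred == positive & truth == positive) / sum(pred == positive)
}

precision(truth, pred)            # intended encoding:  2/3
precision(1 - truth, 1 - pred)    # inverted encoding:  1/2
```

The same asymmetry applies to PPV, recall, and the other class-sensitive measures xgboost may compute internally during early stopping.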

@bmreiniger

This obviously causes problems if one wants to extract the underlying xgboost model (in my case, to convert into PMML), but I don't see an easy way around that on the mlr side. (I've brought it up for r2pmml at jpmml/r2pmml#46 (comment).)


mllg commented Feb 21, 2020

@bmreiniger Are you still encountering problems in mlr3?


bmreiniger commented Feb 21, 2020

@mllg Yes. There's an additional weirdness around column order. Here's the mlr3 adaptation of what I posted over at r2pmml:

library(r2pmml)
library(mlr3)
library(mlr3learners)
library(xgboost)

set.seed(314)

data("iris")
# make binary target
iris$Species <- as.integer(iris$Species)
iris$Species <- as.integer(abs(iris$Species - 2))
iris$Species <- as.factor(iris$Species)

task <- mlr3::TaskClassif$new("bin_iris", iris, "Species")
task
xgb_learner <- lrn("classif.xgboost")
xgb_learner$param_set$values = list(
  objective = 'binary:logistic',
  eval_metric = 'auc',
  nrounds = 10
  )
xgb_learner$predict_type = "prob"

xgb_learner$train(task)

mlr_preds <- xgb_learner$predict(task)

xgb_model <- xgb_learner$model
dmat <- xgb.DMatrix(data = as.matrix(iris[, c(3,4,1,2)]))  # drop Species and reorder columns to match xgb_model$feature_names
xgb_preds <- predict(xgb_model, dmat)

head(mlr_preds$prob)
head(xgb_preds)
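One way to avoid the hard-coded `c(3, 4, 1, 2)` above (a sketch, assuming `xgb_model$feature_names` holds the column order the booster was trained with) is to index the data frame by those names directly:

```r
# Column order the fitted booster expects; hard-coded here for illustration,
# in practice taken from xgb_model$feature_names
feature_names <- c("Petal.Length", "Petal.Width", "Sepal.Length", "Sepal.Width")

data("iris")
# Indexing by name drops Species and reorders in one step
X <- as.matrix(iris[, feature_names])
colnames(X)
```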
