set_threshold sometimes returns NA response for thresholds equal to 0. #452

Closed

pfistfl opened this issue Mar 6, 2020 · 5 comments

pfistfl commented Mar 6, 2020

library(mlr3)  # lrn(), tsk(), and $set_threshold() come from mlr3

learner = lrn("classif.rpart", predict_type = "prob")
t = tsk("iris")
prd = learner$train(t)$predict(t)
# thresholds of 0 for classes that also receive predicted probabilities of 0
prd$set_threshold(c(setosa = 1, versicolor = 0, virginica = 0))
prd

This happens when a predicted probability of 0 occurs together with a threshold of 0:
(1 / 0) * 0 -> NaN -> max.col() returns NA.

This leads to problems when we e.g. tune over thresholds, as it then breaks the prediction's $score() method.
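
A minimal sketch of the arithmetic described above (illustrative only, not mlr3 internals; the probability and threshold values here are made up):

prob = c(setosa = 0, versicolor = 0.3, virginica = 0.7)
threshold = c(setosa = 0, versicolor = 0.5, virginica = 0.5)
weights = prob * (1 / threshold)  # 0 * Inf gives NaN for setosa
max.col(t(as.matrix(weights)))    # the NaN row makes max.col() return NA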

mllg commented Mar 9, 2020

This is somewhat undefined behavior. The learner says probability 0, which then gets rescaled by multiplication with 1 / threshold, and that is Inf for threshold == 0. Is this now a very good label to predict or a very bad one?
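
For illustration (made-up numbers, not mlr3 code): with threshold == 0, a positive probability rescales to Inf, so that class always wins, while a probability of 0 rescales to NaN, which is neither clearly best nor clearly worst.

0.1 * (1 / 0)  # Inf -> class with threshold 0 dominates
0   * (1 / 0)  # NaN -> undefined preference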

pfistfl commented Mar 9, 2020

I understand.
Intuitively, I would have assumed that I could enforce the prediction of any level by setting its threshold to 0. That would mean a probability could never really be 0, I guess.

We need to decide and document what happens here, but returning NA just makes things harder downstream.

berndbischl commented

We had exactly the same issue(s) in mlr2. These edge cases really need to be solved for robust applicability and ROC.

It is also not that undefined? It is simply a tie, and we need/have tie handling anyway?

berndbischl commented

It is simply a tie, and we need/have tie handling anyway?

No, sorry, I didn't read that correctly.
I think this is how it is best:

Case 1:
p = c(0.8, 0.1, 0.1)
t = c(1, 0, 0)
--> preference weights (0.8, Inf, Inf)
This is now a tie between classes 2 and 3.

Case 2:
p = c(1, 0, 0)
t = c(1, 0, 0)

I think the most reasonable thing is to define this as
pref = c(1, 0, 0),
so in cases where p == 0, the weight is always 0.
(Ties could then still occur; in the example here they do not.)

Agreed?
This would NEED to be properly documented and tested.
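
A small sketch of this rule (illustrative only; the helper threshold_weights is hypothetical, not the actual mlr3 implementation): rescale by 1 / threshold, but force the weight to 0 wherever the predicted probability is 0.

threshold_weights = function(p, t) {
  w = p * (1 / t)
  w[p == 0] = 0  # probability 0 always gets weight 0, even for threshold 0
  w
}
threshold_weights(c(0.8, 0.1, 0.1), c(1, 0, 0))  # 0.8 Inf Inf -> tie between classes 2 and 3
threshold_weights(c(1, 0, 0), c(1, 0, 0))        # 1 0 0       -> no tie here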

mllg closed this as completed in 2670210 on Mar 9, 2020
mllg commented Mar 9, 2020

Agreed. Documented and tested now.
