Conversation

@lorentzenchr
Member

@lorentzenchr lorentzenchr commented Apr 25, 2020

Reference Issues/PRs

Resolves #15244.

What does this implement/fix? Explain your changes.

Add d2_tweedie_score metric.
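
For context, the D^2 score is the fraction of Tweedie deviance explained: it generalizes R^2 by replacing squared error with the Tweedie deviance, so a perfect model scores 1.0 and the constant mean prediction scores 0.0. A minimal sketch of the definition in terms of the existing mean_tweedie_deviance (the actual implementation in this PR may differ):

import numpy as np
from sklearn.metrics import mean_tweedie_deviance

def d2_tweedie_score_sketch(y_true, y_pred, power=0):
    # D^2 = 1 - dev(y_true, y_pred) / dev(y_true, mean(y_true)):
    # the fraction of Tweedie deviance explained by the model.
    numerator = mean_tweedie_deviance(y_true, y_pred, power=power)
    # Baseline: always predict the empirical mean of the observed y.
    y_mean = np.full_like(np.asarray(y_true, dtype=float), np.mean(y_true))
    denominator = mean_tweedie_deviance(y_true, y_mean, power=power)
    return 1.0 - numerator / denominator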

Open questions

  • For mean_tweedie_deviance, there are also the two specialized versions mean_poisson_deviance and mean_gamma_deviance. Do we also want the corresponding d2_poisson_score and d2_gamma_score?

@lorentzenchr lorentzenchr changed the title [WIP] ENH add d2_tweedie_score [MRG] ENH add d2_tweedie_score Apr 25, 2020
@cmarmo
Contributor

cmarmo commented Oct 30, 2020

Hi @lorentzenchr, will you be able to synchronize with upstream? I'm wondering if this one could be milestoned for 0.24. Thanks!

@lorentzenchr
Member Author

@cmarmo Thanks for looking into this. The underlying issue #15244 needs a decision before this PR can be merged. But syncing with master doesn't hurt :smirk:

@cmarmo
Contributor

cmarmo commented Oct 30, 2020

The underlying issue #15244 needs a decision before this PR can be merged

Right! Sorry for the noise!

@cmarmo cmarmo added this to the 1.0 milestone Nov 3, 2020
Base automatically changed from master to main January 22, 2021 10:52
Member

@ogrisel ogrisel left a comment

LGTM. Just a few comments/suggestions:

if _num_samples(y_pred) < 2:
    msg = "D^2 score is not well-defined with less than two samples."
    warnings.warn(msg, UndefinedMetricWarning)
    return float('nan')
Member

I would probably raise a ValueError instead.

Member Author

This is the exact same behavior as for r2_score. I think there is nothing wrong with a single sample. So I'll remove that warning.

Member Author

But single sample does not make sense, so ValueError it is.

Member Author

Should we also change from warning to raising a ValueError for r2_score?

Member Author

Raising a ValueError, however, produces a failure in test_single_sample/check_single_sample, which has the following comment:

# Non-regression test: scores should work with a single sample.
# This is important for leave-one-out cross validation.

So I revert to returning nan for both r2_score and d2_score.

Member

Hum... Ok, but then we have a design issue for LOO, no?

Member Author

I'm not sure. r2_score can be computed for a single sample, it just doesn't make sense. Therefore, we'd like to either warn or raise a ValueError. As for LOO, one should use MSE instead of R2. The same applies to D2 and the Tweedie deviances.
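
For illustration, a minimal sketch of scoring leave-one-out with MSE via the existing scoring API, on a toy regression setup (all names here are standard scikit-learn APIs, the dataset is made up):

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_regression(n_samples=30, n_features=3, random_state=0)
# MSE is well-defined on a single held-out sample, unlike R^2 or D^2.
scores = cross_val_score(
    LinearRegression(), X, y,
    cv=LeaveOneOut(), scoring="neg_mean_squared_error",
)
print(-scores.mean())  # average leave-one-out squared error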

Member Author

I can also change test_single_sample and let both R2 and D2 raise ValueError. In either case, a note somewhere in the docstring or user guide concerning LOO would be good.

Member Author

@lorentzenchr lorentzenchr Aug 23, 2021

So, what should we do:

  1. warn (and keep behavior of r2_score)
  2. raise ValueError (and change test_single_sample)

@adrinjalali
Member

pinging @lorentzenchr and @rth, this needs to pick up pace if you want it in 1.0

@lorentzenchr
Member Author

@adrinjalali Picking up pace... But it's not the most pressing PR to merge either :smirk:

@lorentzenchr
Member Author

Just to be sure: Accepting this PR means that we go for individual "d2 scores" for each (meaningful) metric, e.g. the first option mentioned in #15244 (comment).
Further meaningful D^2 scores that could be handy are logloss, absolute error and pinball loss (and maybe more).

@ogrisel
Member

ogrisel commented Aug 23, 2021

Just to be sure: Accepting this PR means that we go for individual "d2 scores" for each (meaningful) metric, e.g. the first option mentioned in #15244 (comment).

I am fine with implementing both options together (overcomplete APIs for the win!). But this can be done in later PRs. No need to put everything in this one.

Further meaningful D^2 scores that could be handy are logloss, absolute error and pinball loss (and maybe more).

I agree.
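
None of these D^2 variants existed at this point. As a hedged sketch of the shared pattern, one minus the model's loss divided by the loss of the best constant predictor, here is what such a score might look like for the pinball loss, whose best constant predictor is the empirical alpha-quantile (the helper name d2_pinball_sketch is hypothetical):

import numpy as np
from sklearn.metrics import mean_pinball_loss

def d2_pinball_sketch(y_true, y_pred, alpha=0.5):
    # Hypothetical helper, not part of scikit-learn at this point.
    # Best constant prediction under pinball loss: the alpha-quantile.
    y_quantile = np.full_like(
        np.asarray(y_true, dtype=float), np.quantile(y_true, alpha)
    )
    numerator = mean_pinball_loss(y_true, y_pred, alpha=alpha)
    denominator = mean_pinball_loss(y_true, y_quantile, alpha=alpha)
    return 1.0 - numerator / denominator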

@lorentzenchr
Member Author

The current status is that we need to decide what R2 and D2 should return for a single sample (n_samples < 2); option 1 is illustrated right after the list below:

  1. Current behavior of R2: return float("nan")
  2. Or raise ValueError("R^2 score is not well-defined with less than two samples.")
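
A quick illustration of option 1, assuming r2_score keeps its current warn-and-return-nan behavior:

import warnings
from sklearn.metrics import r2_score

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    score = r2_score([2.5], [2.4])  # a single sample
print(score)              # nan
print(caught[0].message)  # R^2 score is not well-defined with less than two samples.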

@lorentzenchr lorentzenchr removed this from the 1.0 milestone Sep 3, 2021
@lorentzenchr lorentzenchr added this to the 1.1 milestone Sep 3, 2021
@rth
Member

rth commented Sep 3, 2021

+1 for option 1. I think if a metric is undefined it should return a nan, rather than raise an error and expect the user code to somehow deal with it.

@lorentzenchr
Member Author

@rth I reverted to option 1: return float("nan"), as is done in R2. Some more minor fixes. Good to merge, IMO, if there is a further +1 on the horizon.

Member

@rth rth left a comment

Thanks @lorentzenchr! LGTM. Please add a changelog entry (I guess for 1.0) and feel free to merge. There is one doctest failure that needs fixing, though.

Comment on lines 996 to 997
arbitrarily worse). A model that always predicts a constant value for the expected
value of y, disregarding the input features, would get a D^2 score of 0.0.
Member

What do you mean by "the expected value of y"? Any constant value would work, or does it mean the mean of y_true?

Member Author

Yes, it means a model that constantly uses the empirical mean of the observed y as the prediction. I thought this one would be even clearer than the sentence for R2. I'll make it more precise.
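
A quick check of that statement using the metric as merged; by construction, the constant-mean baseline scores exactly 0 (the data values here are made up):

import numpy as np
from sklearn.metrics import d2_tweedie_score

y_true = np.array([0.5, 1.0, 2.5, 7.0])
# Baseline model: always predict the empirical mean of the observed y.
y_baseline = np.full_like(y_true, y_true.mean())
print(d2_tweedie_score(y_true, y_baseline, power=1))  # 0.0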

@lorentzenchr lorentzenchr modified the milestones: 1.1, 1.0 Sep 4, 2021
@lorentzenchr lorentzenchr changed the title [MRG] ENH add d2_tweedie_score FEA add d2_tweedie_score Sep 4, 2021
@lorentzenchr lorentzenchr merged commit 9061ff9 into scikit-learn:main Sep 4, 2021
@lorentzenchr lorentzenchr deleted the d2_score branch September 4, 2021 21:17
@lorentzenchr lorentzenchr mentioned this pull request Sep 4, 2021
adrinjalali pushed a commit to adrinjalali/scikit-learn that referenced this pull request Sep 5, 2021
adrinjalali pushed a commit to adrinjalali/scikit-learn that referenced this pull request Sep 5, 2021
adrinjalali pushed a commit that referenced this pull request Sep 6, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021