DEP deviance in favor of log_loss for GradientBoostingClassifier #23036
Conversation
Minor comments, otherwise LGTM.
sklearn/ensemble/_gb.py (Outdated)

```
boosting recovers the AdaBoost algorithm.
loss : {'log_loss', 'exponential'}, default='log_loss'
    The loss function to be optimized. 'log_loss' refers to binomial and
    multinomial deviance, the same as used in linear logistic regression.
```
To be consistent with the docstring change above this one, can we remove the reference to "deviance"?
Suggested change:

```diff
- multinomial deviance, the same as used in linear logistic regression.
+ multinomial log loss, the same as used in logistic regression.
```
(Also, I think most people think of logistic regression as a linear model.)
Hmm. How about calling them just binomial and multinomial log-likelihood?
"binomial" and "multinomial" are distributions, have a likelihood and, equivalently, a deviance. Log loss, although derivable from those likelihoods, is a loss that a priori has no distribution. I would speak of binary and multiclass log loss.
sklearn/ensemble/_gb.py (Outdated)

```
deviance (= logistic regression) for classification
with probabilistic outputs. For loss 'exponential' gradient
boosting recovers the AdaBoost algorithm.
loss : {'log_loss', 'exponential'}, default='log_loss'
```
During the deprecation cycle, "deviance" is still valid and should stay in the set of valid losses.
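For illustration, a hypothetical sketch of the rename-and-warn pattern such a deprecation typically uses; `_resolve_loss` is an invented name, not the PR's actual code:

```python
import warnings

def _resolve_loss(loss):
    """Map the deprecated spelling to the new one, warning the user."""
    if loss == "deviance":
        warnings.warn(
            "The loss 'deviance' was deprecated in favor of the equivalent "
            "new name 'log_loss' and will be removed in a future release.",
            FutureWarning,
        )
        return "log_loss"
    return loss
```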
I have seen both. E.g., the deprecated "mae" and "mse" criteria in trees are also no longer listed during the deprecation cycle; they are only mentioned in the deprecation note.
I personally prefer it this way, but will do as reviewers wish.
There's also the default value that is directly changed. The thing is that it's just a renaming that gives the same model, so maybe it's acceptable. I'm not sure if it can be a problem?
> I personally prefer it this way, but will do as reviewers wish.

I don't have a strong opinion, but I'd like us to always do it the same way (I didn't remember that we did it this way in the trees). Maybe we should explain how to update the docs in the contributing guide.
So I re-introduce them.
LGTM.
There's just one thing I'm not sure about. Here the default changes from "deviance" to "log_loss" immediately. Usually when we rename, we make the change effective at the end of the cycle. Here it should be fine since both are equivalent, so I think we can safely move forward, but maybe I'm missing something?
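A quick sketch of why that's safe, assuming a scikit-learn version inside the deprecation window where both spellings are accepted: since the two names denote the same loss, the fitted models should agree.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)

new = GradientBoostingClassifier(loss="log_loss", random_state=0).fit(X, y)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)  # old spelling warns
    old = GradientBoostingClassifier(loss="deviance", random_state=0).fit(X, y)

# Same loss under two names: identical predicted probabilities.
assert np.allclose(new.predict_proba(X), old.predict_proba(X))
```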
LGTM as well.
@jeremiedbb Thanks for finishing.
…kit-learn#23036)

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Reference Issues/PRs
Partially addresses #18248
What does this implement/fix? Explain your changes.
This PR deprecates `loss="deviance"` in favor of `loss="log_loss"` in `GradientBoostingClassifier`. The default is changed accordingly.
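A minimal sketch of the user-facing effect, assuming scikit-learn 1.1 where this change landed: the default is the new name, and the old spelling still fits the same model but emits a FutureWarning during the deprecation cycle.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=100, random_state=0)

# The default is now the new spelling:
print(GradientBoostingClassifier().loss)  # "log_loss"

# The deprecated spelling warns at fit time but still works:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    GradientBoostingClassifier(loss="deviance", random_state=0).fit(X, y)
assert any(issubclass(w.category, FutureWarning) for w in caught)
```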