DEP deprecate multi_class in LogisticRegression #28703

lorentzenchr · 2024-03-26T19:11:26Z

Reference Issues/PRs

Towards #11865.

What does this implement/fix? Explain your changes.

This PR deprecates the multi_class parameter in LogisticRegression. Using that option is equivalent to OneVsRestClassifier(LogisticRegression()), so no functionality is lost. Once the parameter is gone, the logreg code becomes quite a bit simpler and more maintainable.

Any other comments?

This PR starts very simple with only LogisticRegression. In case of positive feedback, I'll extend it to LogisticRegressionCV and adapt all the docstrings and so on.

github-actions · 2024-03-26T19:12:49Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 63a9399.

lorentzenchr · 2024-03-27T22:47:57Z

@scikit-learn/core-devs friendly ping for visibility.

jjerphan · 2024-03-28T06:58:33Z

I have not had time to look at this PR at all, but before deprecating, can we have tests to be sure they are equivalent in results and potentially in UX (e.g. would accessing fitted attributes still be possible)?

agramfort · 2024-03-28T09:58:31Z

would be ok for me. I think it's ok to live with

OneVsRestClassifier(LogisticRegression(..))

when using liblinear.
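As an illustration (not part of the PR), the wrapper pattern mentioned above could look like the sketch below. It assumes the iris dataset as toy data and shows one way to recover the per-class fitted attributes from the meta-estimator, which touches on the UX question raised earlier in the thread. The variable names are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy data chosen for illustration: 3 classes, 4 features.
X, y = load_iris(return_X_y=True)

# Default behavior: a single multinomial model for all classes.
multinomial = LogisticRegression(max_iter=1000).fit(X, y)

# One-vs-rest with liblinear: one binary model per class.
ovr = OneVsRestClassifier(LogisticRegression(solver="liblinear")).fit(X, y)

# Fitted attributes live on the sub-estimators; stacking them gives
# arrays with the same shape as the multinomial model's attributes.
ovr_coef = np.vstack([est.coef_ for est in ovr.estimators_])
ovr_intercept = np.concatenate([est.intercept_ for est in ovr.estimators_])

print(multinomial.coef_.shape, ovr_coef.shape, ovr_intercept.shape)
```

The coefficients themselves are not expected to match between the two models, since they minimize different losses; only the shapes line up.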

GaelVaroquaux · 2024-03-28T11:05:25Z

Just to make sure that I understand things correctly: the plan is that by default multi-class is supported via the multinomial loss, but if the solver is liblinear, multi-class raises an error and the user must use OvR?

If so, that strategy is fine by me, but I think that the deprecation message should first suggest using solvers that support multinomial, or using OvR if liblinear is desired.

lorentzenchr · 2024-03-28T21:39:01Z

Just to make sure that I understand things correctly: the plan is that by default multi-class is supported via the multinomial loss, but if the solver is liblinear, multi-class raises an error and the user must use OvR?

Yes, exactly. And yes, with an informative deprecation warning, later an error advertising solvers that support the multinomial loss.

thomasjpfan

It also looks like solver="newton-cholesky" does not support ovr.

From a high level API point of view, LinearSVC only does OVR with liblinear and LogisticRegression will no longer support it. In the future, should LinearSVC also force users to use OneVsRestClassifier for multi-class problems?

Over the last few years, we have slowly been making meta-estimators more necessary for certain tasks (e.g., the removal of normalize, or this PR). It kind of goes against the history of "let's make estimators easy to use". For example, the classifiers encode string labels to "make this easy". This is my observation and I am undecided on the current path.

thomasjpfan · 2024-03-28T21:48:28Z

sklearn/linear_model/_logistic.py

+                    "'multi_class' was deprecated in version 1.5 and will be removed in"
+                    " 1.7. Use OneVsRestClassifier(LogisticRegression(..)) instead."


If one specifically sets multi_class="multinomial", then this warning seems out of place.

Specifically, the problem is now that this warning message would be off when calling LogisticRegression(multi_class="auto").fit(X, y) on multiclass data.

Maybe we can just use a generic, all-encompassing message such as:

    if self.multi_class != "deprecated":
        warnings.warn(
            (
                "'multi_class' was deprecated in version 1.5 and will be removed in"
                " 1.7. For solvers and penalties that support it, the multinomial"
                " scheme is used automatically when the data has more than two"
                " classes. The one-vs-rest scheme can be implemented with"
                " OneVsRestClassifier(LogisticRegression(...)) instead."
                " See the docstring for more details."
            ),
            FutureWarning,
        )
    else:
        # Set to old default value.
        multi_class = "auto"

Why is it off? The parameter multi_class should best not be used anymore, also not with value "auto".
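To make the point of this thread concrete, here is a minimal, self-contained sketch of the sentinel-default pattern under discussion. ToyLogReg and count_future_warnings are hypothetical names invented for this illustration, and the warning text is abbreviated; this is not the code of the PR. It assumes the intended behavior is that any explicitly passed value, including "auto", triggers the FutureWarning, while the untouched default stays silent.

```python
import warnings


class ToyLogReg:
    """Hypothetical stand-in, not scikit-learn's LogisticRegression."""

    def __init__(self, multi_class="deprecated"):
        self.multi_class = multi_class

    def fit(self):
        # Work on a local copy so fit never mutates a constructor parameter.
        multi_class = self.multi_class
        if multi_class != "deprecated":
            # Any explicit value ("auto", "ovr", "multinomial") warns.
            warnings.warn(
                "'multi_class' was deprecated in version 1.5 and will be"
                " removed in 1.7.",
                FutureWarning,
            )
        else:
            # Sentinel means "not set by the user": use the old default.
            multi_class = "auto"
        self.resolved_multi_class_ = multi_class
        return self


def count_future_warnings(estimator):
    """Fit the estimator and count the FutureWarnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        estimator.fit()
    return sum(issubclass(w.category, FutureWarning) for w in caught)


print(count_future_warnings(ToyLogReg()))
print(count_future_warnings(ToyLogReg(multi_class="auto")))
```

Because the same message is shown for every explicit value, it has to read sensibly for "auto" and "multinomial" as well as for "ovr", which is the argument for a generic wording.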

GaelVaroquaux · 2024-03-28T22:01:36Z

Over the last few years, we have slowly been making meta-estimators more necessary for certain tasks (e.g., the removal of normalize, or this PR). It kind of goes against the history of "let's make estimators easy to use". For example, the classifiers encode string labels to "make this easy". This is my observation and I am undecided on the current path.

Yes, I worry a lot about this trend. Everybody that I talk to greatly values the fact that it's easy to get things going with scikit-learn. It's the number one benefit that people mention. If we lose this, we lose what made scikit-learn.

lorentzenchr · 2024-03-29T07:53:03Z

Over the last few years, we have slowly been making meta-estimators more necessary for certain tasks.

You raise good points. For this particular case, I have 3 answers:

1. Out of the box, logreg will continue to just work. Only if a user wants a specific solver will they be directed to the meta-estimator. (We can even extend newton-cholesky to support multinomial, no big deal. Then only liblinear would be left.)
2. From a statistical point of view, multinomial logreg should clearly be favored over OvR logreg. (In fact, I find it hard to even provide a good statistical reference for OvR logreg.)
3. The logic of LogisticRegression will become significantly less "opaque" and the code more maintainable. (And yes, this is a concern: the code complexity has been a problem in several PRs, making contributing hard.)
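As a rough illustration of the statistical difference between the two schemes (not taken from the PR), the sketch below turns the same made-up per-class scores into probabilities in two ways: with a softmax, as the multinomial model does, and with per-class sigmoids that are renormalized to sum to one, which is how one-vs-rest outputs are commonly combined. The scores are arbitrary numbers chosen for the example; in practice the two models would also learn different scores, because they are fit with different losses.

```python
import math


def softmax(scores):
    """Multinomial-style probabilities: one joint normalization."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ovr_normalized(scores):
    """OvR-style probabilities: independent sigmoids, then renormalized."""
    sigmoids = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    total = sum(sigmoids)
    return [p / total for p in sigmoids]


# Arbitrary decision scores for three classes (illustrative only).
scores = [2.0, 0.5, -1.0]
p_multinomial = softmax(scores)
p_ovr = ovr_normalized(scores)

print(p_multinomial)
print(p_ovr)
```

Both vectors sum to one and rank the classes the same way here, but the OvR-style probabilities are noticeably flatter, since the renormalization is applied after the fact rather than being part of the fitted model.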

lorentzenchr · 2024-04-09T16:43:38Z

I interpret the conversation so far as a decision to deprecate multi_class in LogisticRegression.

I have one open question: shall we raise an error for multiclass liblinear, or internally switch to OvR (and state this in the docs)?

ogrisel · 2024-04-10T12:19:11Z

I am fine with deprecating the multi_class parameter in LogisticRegression, but not in LinearSVC: there is no good default alternative to handle the multiclass case for the latter, so that would be a regression from a UX point of view.

lorentzenchr · 2024-04-12T07:38:29Z

Ready for review.

ogrisel

Some feedback. The most important comments are related to the contents of the docstrings and the warning message to address Thomas' remark.

The other things are more minor and overall LGTM.

Thanks for the PR.

ogrisel · 2024-04-24T08:02:07Z

sklearn/linear_model/_logistic.py

+                ),
+                FutureWarning,
+            )
+        elif self.multi_class != "deprecated":


Suggested change

-        elif self.multi_class != "deprecated":
+        elif self.multi_class != "ovr":

You mean == "ovr", right?

ogrisel · 2024-04-24T08:10:27Z

sklearn/linear_model/_logistic.py

+                    "'multi_class' was deprecated in version 1.5 and will be removed in"
+                    " 1.7. Use OneVsRestClassifier(LogisticRegression(..)) instead."


Specifically, the problem is now that this warning message would be off when calling LogisticRegression(multi_class="auto").fit(X, y) on multiclass data.

Maybe we can just use a generic, all-encompassing message such as:

    if self.multi_class != "deprecated":
        warnings.warn(
            (
                "'multi_class' was deprecated in version 1.5 and will be removed in"
                " 1.7. For solvers and penalties that support it, the multinomial"
                " scheme is used automatically when the data has more than two"
                " classes. The one-vs-rest scheme can be implemented with"
                " OneVsRestClassifier(LogisticRegression(...)) instead."
                " See the docstring for more details."
            ),
            FutureWarning,
        )
    else:
        # Set to old default value.
        multi_class = "auto"

ogrisel · 2024-04-24T08:16:48Z

sklearn/linear_model/_logistic.py

+            )
+        else:
+            # Set to old default value.
+            multi_class = "auto"


Similar comment about the warning messages, especially when multi_class="auto" is passed explicitly.

ogrisel · 2024-04-24T09:13:27Z

sklearn/linear_model/tests/test_logistic.py

@@ -1598,6 +1564,9 @@ def test_LogisticRegressionCV_GridSearchCV_elastic_net(multi_class):
    assert gs.best_params_["C"] == lrcv.C_[0]


+# TODO(1.7): remove filterwarnings after the deprecation of multi_class
+# Maybe remove whole test after removal of the deprecated multi_class.


For the record, I am +1 for this suggestion.

ogrisel · 2024-04-24T09:14:51Z

sklearn/linear_model/tests/test_sag.py

-    tol = 0.00001
-    max_iter = 40
+    tol = 1e-5
+    max_iter = 70


I assume this was needed because the previous max_iter value was too RNG sensitive?

Yes. I checked the new (and old) value with all global random seeds. SAG and SAGA work better for larger datasets.

DEP deprecate multi_class in LogisticRegression

103d9ef

lorentzenchr added this to the 1.5 milestone Mar 26, 2024

github-actions bot added the module:linear_model label Mar 26, 2024

lorentzenchr added the Needs Decision Requires decision label Mar 26, 2024

thomasjpfan reviewed Mar 28, 2024

lorentzenchr added 11 commits April 7, 2024 21:30

DOC improve deprecation messages

c0901a3

DOC improve solver support matrix

2e3605f

FIX do not change self.multi_class in fit

fced0b2

DEP multi_class in LogisticRegressionCV

df4091e

DOC update linear model user guide

c192198

FIX typos

a80142b

DOC / EXA adapt for deprecated ovr

1b67f3e

EXA fix coef in example

b4a6629

FIX develop.rst

398fd01

TST fix linear model tests

100ae1d

DOC / EXA fix examples and docstring with LogisticRegression(multi_cl…

1072da0

…ass)

EXA fix plot_logistic_multinomial

fc8f319

lorentzenchr added 3 commits April 10, 2024 22:32

TST catch all FutureWarnings in test_logistic.py

94a7ec5

Merge branch 'main' into deprecate_ovr_logreg

9f03542

TST make all tests pass with -Werror::FutureWarning

81161e5

lorentzenchr added 2 commits April 11, 2024 18:51

Merge branch 'main' into deprecate_ovr_logreg

8dc3c0d

DOC add whatsnew

a67bb05

lorentzenchr added API and removed Needs Decision Requires decision labels Apr 11, 2024

CLN whatsnew entry text alignment

29f6b52

ogrisel approved these changes Apr 24, 2024

CLN address review comments

63a9399
