
DOC adapt logistic regression objective in user guide #28706

Merged · 4 commits merged into scikit-learn:main from the doc_logreg_objective branch on Apr 5, 2024

Conversation

lorentzenchr (Member) opened the pull request:

Reference Issues/PRs

This popped up in #28700.

What does this implement/fix? Explain your changes.

All solvers, except liblinear, use the formulation where C directly enters the penalty.

Any other comments?

@lorentzenchr (Member, Author) commented:

@ogrisel ping.

github-actions bot commented Mar 27, 2024:

✔️ Linting Passed — all linting checks passed. (Generated for commit ca6edf8.)

Review thread on doc/modules/linear_model.rst (outdated):

Before (C multiplies the data-fit term):

    \min_{w} C \sum_{i=1}^n s_i \left(-y_i \log(\hat{p}(X_i)) - (1 - y_i) \log(1 - \hat{p}(X_i))\right) + r(w)

After (a weighted average of the per-sample losses, where s = \sum_{i=1}^n s_i is the sum of the sample weights):

    \min_{w} \frac{1}{s} \sum_{i=1}^n s_i \left(-y_i \log(\hat{p}(X_i)) - (1 - y_i) \log(1 - \hat{p}(X_i))\right) + \frac{r(w)}{sC}
@ogrisel (Member) commented Mar 27, 2024:

Readers might find it weird to see an objective function that has a fixed constant positive multiplier on all the terms. But I agree it makes it explicit that the user-provided sample weights do not need to sum to 1.

Also, maybe we could add a note stating that scikit-learn uses an unusual inverse parametrization C for the strength of the regularizer. This has historical roots: scikit-learn reused the parametrization of liblinear, a library that provided an implementation of penalized logistic regression and itself reused the mathematical parametrization from the support vector machine literature. Maybe also cross-link to https://scikit-learn.org/stable/modules/svm.html#svc.
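A minimal sketch of that inverse parametrization in action, assuming a toy dataset (make_classification and the coefficient-norm comparison are illustrative, not part of the PR):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# C is the *inverse* of the regularization strength: smaller C means a
# stronger penalty and hence smaller coefficients.
for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    print(f"C={C:>6}: ||coef|| = {np.linalg.norm(clf.coef_):.3f}")
```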

@lorentzenchr (Member, Author) replied:

For some time already, I've been considering deprecating C and adding the same parameters as for elastic net. Not sure what the other core devs think about that.
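A minimal sketch of what that could mean, assuming an elastic-net-style strength parameter \alpha (the name is borrowed from ElasticNet; this is not a decided API): under the averaged objective above,

    \alpha = \frac{1}{sC} \quad\text{so that}\quad \frac{r(w)}{sC} = \alpha\, r(w),

with s again the sum of the sample weights.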

@lorentzenchr (Member, Author):

I opened #28711.

A member replied:

> Readers might find it weird to see an objective function that has a fixed constant positive multiplier on all the terms.

I do.

> But I agree it makes it explicit that the user-provided sample weights do not need to sum to 1.

This is never an assumption in any scikit-learn estimator. What makes you think users might think otherwise here?

@lorentzenchr (Member, Author) replied:

The loss part is just an empirical mean, i.e. a weighted average; maybe writing np.average(loss, weights=weights), or adding a comment, would make it easier to read. Or should we drop the weights from the formula altogether?
@jeremiedbb What is your suggestion?
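A minimal sketch of the equivalence under discussion, assuming made-up predictions and weights (none of the values below come from the PR):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=8)              # binary targets
p_hat = rng.uniform(0.05, 0.95, size=8)     # predicted P(y=1)
s = rng.uniform(0.5, 2.0, size=8)           # sample weights, need not sum to 1

# Per-sample binary cross-entropy, as in the formulas above.
loss = -y * np.log(p_hat) - (1 - y) * np.log(1 - p_hat)

C = 0.5
old_data_term = C * np.sum(s * loss)         # old form: C times weighted sum
new_data_term = np.average(loss, weights=s)  # new form: weighted average

# The two differ only by the constant factor C * s.sum(), so both objectives
# share the same minimizer once the penalty is rescaled to r(w) / (s.sum() * C).
assert np.isclose(old_data_term, C * np.sum(s) * new_data_term)
```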

The same member replied:

The PR multiplies the objective function by a constant term, which can be seen as, quoting Olivier, weird. So I'm trying to understand the arguments for doing it anyway. I don't find the argument "it makes it explicit that the user-provided sample weights do not need to sum to 1" very convincing, because I've never seen this be a source of confusion. This is why I was seeking more information 😄.

The same member, after further discussion:

I had a chat IRL with Olivier, and although I'm still not convinced by the explicitness argument, I'm convinced by the other arguments: the aim to standardize the objective functions throughout the doc, and to make it a per-sample averaged objective so that it is comparable between datasets of different sizes (linked to #28169).
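A small sketch of that last point, with made-up per-sample losses (purely illustrative): replicating a dataset leaves the per-sample average unchanged, while the plain sum grows with the dataset size.

```python
import numpy as np

loss = np.array([0.2, 0.7, 0.4])   # made-up per-sample losses
w = np.ones_like(loss)             # unit sample weights

for reps in (1, 10):
    l, s = np.tile(loss, reps), np.tile(w, reps)
    # The sum scales with dataset size; the weighted average does not,
    # which is what makes averaged objectives comparable across datasets.
    print(reps, l.sum(), np.average(l, weights=s))
```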

@ogrisel changed the title from "DOC adapte logistic regression objective in user guide" to "DOC adapt logistic regression objective in user guide" on Mar 27, 2024.
@ogrisel (Member) left a review:

Even if the new expression seems artificially complex, I agree it makes the lack of any assumption about weight normalization more explicit.

Furthermore, dividing the objective function by C makes the LogReg objective more easily aligned with the other objective functions on the same page.

@jeremiedbb (Member) left a review:

LGTM

@jeremiedbb enabled auto-merge (squash) on April 5, 2024 at 13:21.
@jeremiedbb merged commit 4ebd7ff into scikit-learn:main on Apr 5, 2024.
28 checks passed
@lorentzenchr deleted the doc_logreg_objective branch on April 5, 2024 at 14:08.