DOC adapt logistic regression objective in user guide #28706
Conversation
@ogrisel ping.
doc/modules/linear_model.rst (outdated)

```diff
-    \min_{w} C \sum_{i=1}^n s_i \left(-y_i \log(\hat{p}(X_i)) - (1 - y_i) \log(1 - \hat{p}(X_i))\right) + r(w),
+    \min_{w} \frac{1}{s}\sum_{i=1}^n s_i
+    \left(-y_i \log(\hat{p}(X_i)) - (1 - y_i) \log(1 - \hat{p}(X_i))\right)
+    + \frac{r(w)}{sC}\,,
```
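Here, presumably, $s$ denotes the total sample weight (this definition is inferred from the $\frac{1}{s}$ factor; the diff itself does not state it):

```math
s = \sum_{i=1}^n s_i,
```

so the loss term is the weighted average of the per-sample log losses.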
Readers might find it weird to see an objective function that has a fixed constant positive multiplier on all the terms. But I agree it makes it explicit that the user-provided sample weights do not need to sum to 1.
Also, maybe we could add a note to state that scikit-learn uses a weird inverse parametrization C for the strength of the regularizer. This has historical roots in the fact that scikit-learn reused the parametrization of liblinear, a library that provided an implementation of penalized logistic regression and reused the mathematical parametrization from the support vector machine literature. Maybe also cross-link to https://scikit-learn.org/stable/modules/svm.html#svc.
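A minimal sketch of that inverse relationship on toy data (the dataset and values are illustrative, not from this PR): smaller C means stronger regularization and hence smaller coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)

# C is the *inverse* of the regularization strength: decreasing C
# penalizes the weights more heavily and shrinks the coefficients.
for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C).fit(X, y)
    print(f"C={C:>6}: ||coef|| = {np.linalg.norm(clf.coef_):.3f}")
```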
For some time already, I have been considering deprecating C and adding the same parameters as for elastic net. Not sure what other core devs think about that.
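Reading this off the new formulation above, an elastic-net-style strength parameter would presumably correspond to (this mapping is inferred from the diff, not stated in the PR):

```math
\alpha = \frac{1}{sC},
```

so that the penalty term can be written as $\alpha\, r(w)$.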
I opened #28711.
> Readers might find it weird to see an objective function that has a fixed constant positive multiplier on all the terms.

I do.

> But I agree it makes it explicit that the user-provided sample weights do not need to sum to 1.

This is never an assumption in any scikit-learn estimator. What makes you think users might think otherwise here?
The loss part is just an empirical mean, or weighted average; maybe writing `np.average(loss, weights=weights)` or adding a comment would make it easier to read. Or throwing away the weights in the formula?
@jeremiedbb What is your suggestion?
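A concrete sketch of that `np.average` reading (the arrays are illustrative placeholders, not code from the PR):

```python
import numpy as np

# Per-sample log losses and user-provided sample weights (illustrative).
loss = np.array([0.2, 0.7, 0.1])
weights = np.array([1.0, 2.0, 0.5])

# The loss term of the new objective is just a weighted average ...
avg = np.average(loss, weights=weights)
# ... i.e. (weights * loss).sum() / weights.sum(), so the weights
# need not sum to 1.
assert np.isclose(avg, (weights * loss).sum() / weights.sum())
```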
The PR multiplies the objective function by a constant term, which can be seen as, quoting Olivier, weird. So I'm trying to understand the arguments for doing it anyway. I don't find the argument "it makes it explicit that the user-provided sample weights do not need to sum to 1" very convincing, because I've never seen this be a source of confusion. This is why I was asking for more information 😄.
I had a chat IRL with Olivier, and although I'm still not convinced by the explicitness argument, I'm convinced by others, like the aim to standardize the objective functions throughout the docs and to make them per-sample averages, so that they are comparable between datasets of different sizes (linked to #28169).
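A small sketch of that comparability argument (illustrative; `log_loss` only stands in for the data term of the objective): duplicating every sample doubles a summed objective but leaves the per-sample average unchanged.

```python
import numpy as np
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)          # binary labels
p = rng.uniform(0.05, 0.95, size=100)     # predicted probabilities

summed = log_loss(y, p, normalize=False)  # scales with n_samples
averaged = log_loss(y, p)                 # per-sample average

# Duplicate the dataset: the sum doubles, the average stays the same.
y2, p2 = np.tile(y, 2), np.tile(p, 2)
assert np.isclose(log_loss(y2, p2, normalize=False), 2 * summed)
assert np.isclose(log_loss(y2, p2), averaged)
```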
Even if the new expression seems artificially complex, I agree it makes the lack of any assumption about weight normalization more explicit. Furthermore, dividing the objective function by C makes the LogReg objective more easily aligned with the other objective functions on the same page.
LGTM
Reference Issues/PRs
This popped up in #28700.
What does this implement/fix? Explain your changes.
All solvers, except liblinear, use the formulation where C directly enters the penalty.
Any other comments?