Suggestion: Add support for unpenalized logistic regression #6738

Closed
Kodiologist opened this issue Apr 30, 2016 · 34 comments · Fixed by #12860

Comments


@Kodiologist Kodiologist commented Apr 30, 2016

LinearRegression provides unpenalized OLS, and SGDClassifier, which supports loss="log", also supports penalty="none". But if you want plain old unpenalized logistic regression, you have to fake it by setting C in LogisticRegression to a large number, or use Logit from statsmodels instead.
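
A sketch of the two workarounds, for concreteness (the data and the value 1e9 here are only illustrative):

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + rng.normal(size=100) > 0).astype(int)

# Workaround 1: make the penalty negligible by choosing a very large C.
approx = LogisticRegression(C=1e9).fit(X, y)

# Workaround 2: statsmodels' Logit fits the unpenalized model directly
# (add_constant supplies the intercept column that sklearn adds implicitly).
exact = sm.Logit(y, sm.add_constant(X)).fit(disp=0)

print(approx.intercept_, approx.coef_)
print(exact.params)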

@Kodiologist Kodiologist changed the title Suggestion: Add support for unpenalized linear regression Suggestion: Add support for unpenalized logistic regression Apr 30, 2016

@mblondel mblondel commented May 2, 2016

you have to fake it by setting C in LogisticRegression to a large number

What's the problem with that approach?


@Kodiologist Kodiologist commented May 2, 2016

I assumed that it's inexact and slower than a direct implementation of unpenalized logistic regression. Am I wrong?

I notice that setting C too high, as in the following, will cause LogisticRegression.fit to hang. But I don't know if this is a bug or just an inherent property of the algorithm and its implementation on a 64-bit computer.

import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.array([0, 0, 0, 0,  1, 1, 1, 1]).reshape(-1, 1)
y =          [1, 0, 0, 0,  1, 1, 1, 0]

m = LogisticRegression(C=1e200)
m.fit(x, y)
print(m.intercept_, m.coef_)

@mblondel mblondel commented May 2, 2016

I notice that setting C too high, as in the following, will cause LogisticRegression.fit to hang

Yes, this is to be expected, as the problem becomes ill-posed when C is large. Iterative solvers are slow on ill-posed problems.

In your example, the algorithm takes forever to reach the desired tolerance. You either need to increase tol or hardcode max_iter.
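
A sketch of those two adjustments applied to the toy data above (the particular values of tol and max_iter are only illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.array([0, 0, 0, 0,  1, 1, 1, 1]).reshape(-1, 1)
y =          [1, 0, 0, 0,  1, 1, 1, 0]

# Either loosen the stopping tolerance or cap the number of iterations
# so that fit() terminates on the nearly unpenalized, ill-posed problem.
m = LogisticRegression(C=1e200, tol=1e-2)
# or: m = LogisticRegression(C=1e200, max_iter=100)
m.fit(x, y)
print(m.intercept_, m.coef_)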


@amueller amueller commented Oct 11, 2016

@mblondel is there an alternative to "iterative solvers"?
You won't get exactly the unregularized option, right?

@Kodiologist why do you want this?


@Kodiologist Kodiologist commented Oct 11, 2016

You're asking why would I want to do logistic regression without regularization? Because (1) sometimes the sample is large enough in proportion to the number of features that regularization won't buy one anything and (2) sometimes the best-fitting coefficients are of interest, as opposed to maximizing predictive accuracy.


@amueller amueller commented Oct 11, 2016

Yes, that was my question.

(1) is not true. It will always buy you a faster solver.

(2) is more in the realms of statistical analysis, which is not really the focus of scikit-learn. I guess we could add this but I don't know what solver we would use. As a non-statistician, I wonder what good any coefficients are that change with a bit of regularization.


@Kodiologist Kodiologist commented Oct 11, 2016

I can't say much about (1) since computation isn't my forte. For (2), I am a data analyst with a background in statistics. I know that scikit-learn focuses on traditional machine learning, but it is in my opinion the best Python package for data analysis right now, and I think it will benefit from not limiting itself too much. (I also think, following Larry Wasserman and Andrew Gelman, that statistics and machine learning would mutually benefit from intermingling more, but I guess that's its own can of worms.) All coefficients will change with regularization; that's what regularization does.


@amueller amueller commented Oct 13, 2016

I'm not opposed to adding a solver without regularization. We can check what would be good, or just bail and use l-bfgs and check beforehand whether it's ill-conditioned?

Yes, all coefficients change with regularization. I'm just honestly curious what you want to do with them afterwards.


@alexcombessie alexcombessie commented Feb 9, 2018

Hey,
What is the status on this topic? I'd be really interested in an unpenalized Logistic Regression. This way p-values will mean something statistically speaking. Otherwise I will have to continue using R 😢 for such use cases...
Thanks,
Alex


@jnothman jnothman commented Feb 10, 2018


@mblondel mblondel commented Feb 12, 2018

What solvers do you suggest to implement? How would that be different from the solvers we already have with C -> infty ?


@Kodiologist Kodiologist commented Feb 13, 2018

What solvers do you suggest to implement? How would that be different from the solvers we already have with C -> infty ?

You could try looking at R or statsmodels for ideas. I'm not familiar with their methods, but they're reasonably fast and use no regularization at all.


@alexcombessie alexcombessie commented Feb 13, 2018

Yeah statsmodels does the job too if you use the QR algorithm for matrix inversion. My use case is around model interpretability. For performance, I would definitely use regularization.


@mblondel mblondel commented Feb 14, 2018

I don't think we need to add any new solver... Logistic regression doesn't enjoy a closed-form solution, which means that statsmodels must use an iterative solver of some kind too (my guess would be iteratively reweighted least squares, but I haven't checked). Setting C=np.inf (or equivalently alpha=0) should in principle work with our current solvers. My recommendation would be to switch to the L-BFGS or Newton-CG solver, since liblinear can indeed be very slow in this setting. Perhaps we can add a solver="auto" option and automatically switch to one of these when C=np.inf or equivalently penalty="none"?


@jnothman jnothman commented Feb 14, 2018


@arose13 arose13 commented Apr 20, 2018

For the folks that really want unregularised logistic regression (like myself): I've had to settle for using statsmodels and writing a wrapper class that mimics the scikit-learn API.
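
A minimal sketch of what such a wrapper could look like (the class name and details are illustrative, not a polished implementation):

import numpy as np
import statsmodels.api as sm
from sklearn.base import BaseEstimator, ClassifierMixin

class UnpenalizedLogisticRegression(BaseEstimator, ClassifierMixin):
    """Unpenalized logistic regression via statsmodels, with a scikit-learn-like API."""

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept

    def _design(self, X):
        # statsmodels does not add an intercept column automatically.
        return sm.add_constant(X) if self.fit_intercept else X

    def fit(self, X, y):
        self.result_ = sm.Logit(y, self._design(X)).fit(disp=0)
        return self

    def predict_proba(self, X):
        p = self.result_.predict(self._design(X))
        return np.column_stack([1 - p, p])

    def predict(self, X):
        return (self.predict_proba(X)[:, 1] >= 0.5).astype(int)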


@shermstats shermstats commented Nov 11, 2018

Any updates on this? This is a big blocker for my willingness to recommend scikit-learn to people. It's also not at all obvious to people coming from other libraries that scikit-learn does regularization by default and that there's no way to disable it.


@amueller amueller commented Nov 12, 2018

@shermstats any suggestions for how to improve the documentation on that? I agree that it might not be very obvious.
Does l-bfgs allow C=np.inf?


@amueller amueller commented Nov 12, 2018

You can specify C=np.inf, though it'll give you the same result as a very large C. On the example I tried, it gave a better fit than statsmodels, and statsmodels failed to converge with most other random seeds:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
import statsmodels.api as sm

X, y = make_classification(random_state=2)
lr = LogisticRegression(C=np.inf, solver='lbfgs').fit(X, y)

logit = sm.Logit(y, X)
res = logit.fit()
# Optimization terminated successfully.
#          Current function value: 0.167162
#          Iterations 10

print(log_loss(y, lr.predict_proba(X)))  # 0.16197793224715606
print(log_loss(y, res.predict(X)))       # 0.16716164149746823

So I would argue we should just document that you can get an unpenalized model by setting C large or to np.inf.


@amueller amueller commented Nov 12, 2018

I'd suggest adding to the docstring and the user guide:
"The LogisticRegression model is penalized by default. You can obtain an unpenalized model by setting C=np.inf and solver='lbfgs'."


@Kodiologist Kodiologist commented Nov 12, 2018

it gave a better fit than statsmodel and statsmodel failed to converge with most other random seeds

R's glm is more mature and may make for a better comparison.

I'd suggest adding to the docstring and the user guide:
"The LogisticRegression model is penalized by default. You can obtain an unpenalized model by setting C=np.inf and solver='lbfgs'."

Why not allow penalty="none" à la SGDClassifier?


@amueller amueller commented Nov 12, 2018

@Kodiologist I'm not opposed to adding penalty="none" but I'm not sure what the benefit is to adding a redundant option.
And I think we'd welcome comparisons to glm. I'm not very familiar with glm so I'm probably not a good person to perform the comparison. However, we are optimizing the log-loss so there should really be no difference. Maybe they implement different solvers so having a benchmark would be nice.


@Kodiologist Kodiologist commented Nov 12, 2018

I'm not opposed to adding penalty="none" but I'm not sure what the benefit is to adding a redundant option.

  1. It becomes clearer how to get an unpenalized model.
  2. It becomes clearer to the reader what code that's using an unpenalized model is trying to do.
  3. It allows sklearn to change its implementation of unregularized models in the future without breaking people's code.

@amueller amueller commented Nov 12, 2018

If you feel it adds to discoverability then we can add it, and 3 is a valid point (though in practice we probably couldn't change the implementation without deprecations; see the current change of the default solver).
Do you want to send a PR?


@Kodiologist Kodiologist commented Nov 12, 2018

I don't have the round tuits for it; sorry.


@amueller amueller commented Nov 13, 2018

@Kodiologist at least you taught me an idiom I didn't know about ;)


@amueller amueller commented Nov 13, 2018

So open for contributors: add penalty='none' as an option. Also possibly check what solvers support this / are efficient with this (liblinear is probably not) and restrict to those solvers.
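
If that option were added, usage would presumably look like this (hypothetical at the time of writing; penalty='none' is the proposal, not released behaviour):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)
# Proposed spelling: drop the penalty term entirely and pair it with a solver
# that handles the unregularized problem well (e.g. lbfgs rather than liblinear).
clf = LogisticRegression(penalty='none', solver='lbfgs').fit(X, y)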


@shermstats shermstats commented Nov 13, 2018

I'd suggest adding to the docstring and the user guide:
"The LogisticRegression model is penalized by default. You can obtain an unpenalized model by setting C=np.inf and solver='lbfgs'."

This sounds reasonable to me. I'd also suggest bolding the first sentence because it's legitimately that surprising for people coming from other machine learning or data analysis environments.


@amueller amueller commented Nov 13, 2018

@shermstats So @Kodiologist suggested adding penalty="none" to make it more explicit, which would just be an alias for C=np.inf. It makes sense to me to make this more explicit in this way. Do you have thoughts on that?
Then that would be what's in the documentation. And I agree that bold might be a good idea.
I think for someone with an ML background this is (maybe?) expected; for someone with a stats background, it seems very surprising.


@shermstats shermstats commented Nov 13, 2018

Exactly! I have a stats background and have worked with many statistics people coming from R or even point and click interfaces, and this behavior is very surprising to us. I think for now that penalty=None (not sure about "none" vs. None) is a good solution. In the future, we should have a separate solver that's called automatically for unpenalized logistic regression to prevent the issues that @mblondel described.


@amueller amueller commented Nov 13, 2018

Sorry, which issue do you mean? We're switching to l-bfgs by default, and we can also internally switch the solver to l-bfgs automatically if someone specifies penalty='none' (often None is a special token we use for deprecated parameters, but we have stopped doing that; still, 'none' would be more consistent with the rest of the library).
We need solver="auto" anyway so changing the solver based on the penalty shouldn't be an issue.


@shermstats shermstats commented Nov 14, 2018

The issue @mblondel described, where the iterative algorithm becomes very slow for large C. I'm not a numerical analysis expert, but if l-bfgs prevents it from slowing down then that sounds like the right solution. penalty='none' also sounds like the right way to handle this.


@amueller amueller commented Nov 14, 2018

@shermstats yes, with l-bfgs this doesn't seem to be an issue. I haven't run extensive benchmarks, though, and won't have time to. If anyone wants to run benchmarks, that would be a great help.


@lorentzenchr lorentzenchr commented Jan 2, 2019

If penalty='none' is to be included, I suggest adding to the user guide the same warning about collinear X as for OLS (in particular for one-hot encoded features).
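
A sketch of the pitfall such a warning would cover, using pandas for the encoding (the data here are made up):

import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
color = pd.Series(rng.choice(["red", "green", "blue"], size=200))

# Full one-hot encoding: the three dummy columns sum to 1 in every row, so together
# with the model's intercept the design matrix is perfectly collinear and the
# unpenalized maximum-likelihood coefficients are not uniquely determined.
X_full = pd.get_dummies(color)

# Reference coding (drop one level) removes the collinearity.
X_ref = pd.get_dummies(color, drop_first=True)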
