
Different output than scikit-learn's LASSO on a weird example #173

Closed
rpetit opened this issue Dec 9, 2020 · 5 comments
rpetit commented Dec 9, 2020

Hi!

I am not sure this is the best place to report this, but I noticed a difference between the output produced by your solver and scikit-learn's. I had a really hard time coming up with a minimal example, so sorry if the one below is not very informative. I am using Python 3.7.4 and the celer development version on Ubuntu.

import numpy as np
import sklearn.linear_model
import celer


X = np.array([[0, 0, 0, 0, 0, 0, 0, 0.001, 0, 0, 0.015, 0, 0, 0.046, 0, 0, 0.061, 0, 0, 0.062]]).T
y = np.array([0.008, 0, 0.001, 0.02, 0, 0.001, 0.024, 0.001, 0.001, 0.023,
              0.006, 0, 0.011, 0.032, 0, 0.002, 0.056, 0.001, 0.001, 0.062])

lasso_sklearn = sklearn.linear_model.Lasso(alpha=1e-4)
lasso_celer = celer.Lasso(alpha=1e-4)

lasso_sklearn.fit(X, y)
lasso_celer.fit(X, y)

The coefficient I get from scikit-learn's solver is approximately 0.55 (the dual gap is 0), while the one I get from your solver is 0 (the dual gap is approximately 1e-4).

I know this is a really degenerate use case, so maybe there is no need to worry about it, but I wanted to report it just in case, and to ask whether there is any reason celer should not be used in such situations.

Thanks in advance for your help!

mathurinm (Owner) commented Dec 9, 2020

Hi Romain,
It's the correct place! I'm very happy to get your feedback.

I think both solvers are right: each returns a coef_ whose objective value is within some tolerance (a rescaled version of clf.tol) of the optimal one. sklearn performs one iteration, while celer performs none (it evaluates the duality gap before doing any update).

With
true_sol = sklearn.linear_model.Lasso(alpha=1e-4, tol=1e-14).fit(X, y)

In [44]: p_celer = norm(y - X @ celer_sol.coef_ - celer_sol.intercept_) ** 2 / (2 * len(y)) + celer_sol.alpha * norm(celer_sol.coef_, ord=1)

In [45]: p_celer
Out[45]: 0.000165375

In [46]: p_star = norm(y - X @ true_sol.coef_ - true_sol.intercept_) ** 2 / (2 * len(y)) + true_sol.alpha * norm(true_sol.coef_, ord=1)

In [47]: p_star
Out[47]: 0.00010331658481530062

In [48]: p_sklearn = norm(y - X @ sklearn_sol.coef_ - sklearn_sol.intercept_) ** 2 / (2 * len(y)) + sklearn_sol.alpha * norm(sklearn_sol.coef_, ord=1)

In [49]: p_sklearn
Out[49]: 0.00010331658481530062

In [50]: p_celer - p_star
Out[50]: 6.205841518469937e-05

In [51]: p_sklearn - p_star
Out[51]: 0.0

Notice that your objective at 0 is quite low (and within the default tolerance, 1e-4, of the optimal value):

In [16]: from numpy.linalg import norm

In [17]: norm(y - y.mean()) ** 2 / (2 * len(y))  # primal at w = 0
Out[17]: 0.000165375

If you decrease tol you get the same results:

In [18]: lasso_celer = celer.Lasso(alpha=1e-4, tol=1e-10).fit(X, y)

In [19]: lasso_sklearn = sklearn.linear_model.Lasso(alpha=1e-4, tol=1e-10).fit(X, y)

In [20]: lasso_celer.coef_, lasso_celer.intercept_
Out[20]: (array([0.55034622]), 0.007409297501753958)

In [21]: lasso_sklearn.coef_, lasso_sklearn.intercept_
Out[21]: (array([0.55034622]), 0.0074092975017539565)

mathurinm (Owner) commented:

Also be aware that both celer and sklearn fit an intercept by default.


rpetit commented Dec 9, 2020

Thank you very much for your quick reply! I'll make sure to set the tolerance carefully from now on.

> Also be aware that both celer and sklearn fit an intercept by default.

Thanks for pointing this out!

rpetit closed this as completed Dec 9, 2020
mathurinm (Owner) commented:

The tolerance should be scaled with respect to norm(y) ** 2 or norm(y) ** 2 / n_samples (as is done in sklearn) to make it easier to set. This has been on my to-do list for a while: #125

Keep me posted if there's any feature you're missing. This solver should be much faster than sklearn's if your data is high-dimensional.
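To illustrate the scaling point on the data from this thread: the natural scale of the objective here is the primal value at w = 0, which is itself of order 1e-4, so an absolute tolerance of 1e-4 is very coarse on this data. A sketch (the rescaling convention at the end is hypothetical, not celer's current behavior):

```python
import numpy as np

y = np.array([0.008, 0, 0.001, 0.02, 0, 0.001, 0.024, 0.001, 0.001, 0.023,
              0.006, 0, 0.011, 0.032, 0, 0.002, 0.056, 0.001, 0.001, 0.062])
n_samples = len(y)

# Primal objective at w = 0 (with an intercept, the best constant model is
# y.mean()): this is the natural scale of the problem on this data.
scale = np.linalg.norm(y - y.mean()) ** 2 / (2 * n_samples)
print(scale)  # ≈ 0.000165375, the same order of magnitude as the default tol of 1e-4

# A tolerance chosen relative to this scale is data-independent, e.g.:
tol = 1e-6 * scale
```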


rpetit commented Dec 9, 2020

> The tolerance should be scaled with respect to norm(y) ** 2 or norm(y) ** 2 / n_samples (as is done in sklearn) to make it easier to set. This has been on my to-do list for a while: #125

Good to know!

> Keep me posted if there's any feature you're missing. This solver should be much faster than sklearn's if your data is high-dimensional.

I actually want to solve problems with a large number of samples in a small dimension. However, I have to use a weighted l1 norm in the objective, which is why I am interested in celer; it turns out there are not many packages that allow doing this easily!
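As an aside on the weighted l1 penalty: the problem min ||y - Xw||² / (2n) + alpha · Σⱼ cⱼ|wⱼ| reduces to a plain Lasso by dividing each column of X by its weight and mapping the solution back, so results can be sanity-checked with scikit-learn alone. A minimal sketch with made-up data and weights (variable names are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Made-up data: 50 samples, 3 features, sparse ground truth.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + 0.01 * rng.standard_normal(50)

alpha = 0.01
weights = np.array([1.0, 5.0, 0.5])  # hypothetical per-feature l1 weights

# min ||y - Xw||^2 / (2n) + alpha * sum_j weights[j] * |w[j]| is equivalent
# to a plain Lasso on X / weights; the solution maps back as w = v / weights.
clf = Lasso(alpha=alpha, fit_intercept=False, tol=1e-12).fit(X / weights, y)
coef_weighted = clf.coef_ / weights
print(coef_weighted)
```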

Thanks again for your help and for maintaining this package; it's really pleasant to use!
