ENH: reflexion about tol #125

Closed
mathurinm opened this issue May 23, 2020 · 7 comments · Fixed by #180
Comments

@mathurinm
Owner

  1. our behaviour disagrees with sklearn's, because the latter scales tol by norm(y) ** 2 (or is it norm(y) ** 2 / n_samples?)

  2. using tol < 1e-7 with float32 causes precision issues (found out via check_estimator). MCVE:

    import numpy as np
    from celer import MultiTaskLassoCV  # assumed import; this MCVE comes from check_estimator

    X = np.array([  # NOTE: the first row of X was truncated in the original snippet
                  [1.9376824, 1.3127615, 2.675319, 2.8909883, 1.1503246],
                  [2.375175, 1.5866847, 1.7041336, 2.77679, 0.21310817],
                  [0.2613879, 0.06065519, 2.4978595, 2.3344703, 2.6100364],
                  [2.935855, 2.3974757, 1.384438, 2.3415875, 0.3548233],
                  [1.9197631, 0.43005985, 2.8340068, 1.565545, 1.2439859],
                  [0.79366684, 2.322701, 1.368451, 1.7053018, 0.0563694],
                  [1.8529065, 1.8362871, 1.850802, 2.8312442, 2.0454607],
                  [1.0785236, 1.3110958, 2.0928936, 0.18067642, 2.0003002],
                  [2.0119135, 0.6311477, 0.3867789, 0.946285, 1.0911323]],
                 dtype=np.float32)

    y = np.array([[1.],
                  [1.],
                  [2.],
                  [0.],
                  [2.],
                  [1.],
                  [0.],
                  [1.],
                  [1.],
                  [2.]], dtype=np.float32)

    params = dict(eps=1e-2, n_alphas=10, tol=1e-10, cv=2, n_jobs=1,
                  fit_intercept=False, verbose=2)

    clf = MultiTaskLassoCV(**params)
    clf.fit(X, y)

(casting X to float64 fixes it)

So maybe we can raise a warning if tol is low and X.dtype == np.float32.
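
A minimal sketch of such a guard (the 1e-7 threshold is roughly float32 machine epsilon; the helper name and message are illustrative, not an existing API):

import warnings

import numpy as np


def _warn_low_tol_float32(X, tol):
    """Warn when tol is likely unreachable given the dtype of X."""
    if X.dtype == np.float32 and tol < 1e-7:
        warnings.warn(
            "tol=%.1e may be unreachable with float32 data (machine epsilon "
            "is about 1.2e-7); consider casting X to float64 or increasing "
            "tol." % tol)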

@agramfort
Collaborator

agramfort commented May 23, 2020 via email

@mathurinm
Owner Author

@agramfort In my opinion, in sklearn the scaling should be by norm(y) ** 2 / n_samples (i.e. the primal objective corresponding to a zero coef), not by norm(y) ** 2.

If I fit with a taller and taller y, the tolerance used in practice increases while, because of the scaling by n_samples, the objective value does not.

I think that for usability it's best to follow sklearn's way, but IMO it is not the correct one.

@QB3 as we discussed

@agramfort
Collaborator

Are you sure? The Cython code where the scaling is done takes an alpha that is already scaled by n_samples.

Did you write a tiny script to demonstrate the claim?

@mathurinm
Owner Author

mathurinm commented Jul 19, 2020

Summary: when you increase n_samples, p_obj and alpha_max remain similar but the effective tolerance tol * norm(y) ** 2 increases, so you optimize less.
Script:

import numpy as np

from numpy.linalg import norm
from sklearn.linear_model import Lasso
from scipy.linalg import toeplitz

# Choose a large n_samples so that norm(y) ** 2 / (2 * n_samples) < tol = 1e-4
n_samples = 20000
n_features = 100


rho = 0.5
np.random.seed(24)
cov = toeplitz(rho ** np.arange(n_features))
X = np.random.multivariate_normal(
    mean=np.zeros(n_features), cov=cov, size=n_samples)
y = X[:, :50] @ (-1) ** np.arange(50)
snr = 10
noise = np.random.randn(n_samples)
y += noise / norm(noise) * norm(y) / snr
y /= norm(y)
tol = 1e-4

print("1st setup: full dataset")
alpha_max = norm(X.T @ y, ord=np.inf) / n_samples
print("Primal at 0 iteration = %.2e" % (norm(y) ** 2 / (2 * n_samples)))
print("Effective tol used in solver: %.2e" % (tol * norm(y) ** 2))
print(">>> Primal < effective_tol, we exit quickly")
clf1 = Lasso(alpha=alpha_max / 50, fit_intercept=False, tol=tol)
clf1.fit(X, y)
print("Iterations on first problem: %d" % clf1.n_iter_)


print("-" * 80)
print("2nd setup: only 100 first rows")
X_less, y_less = X[:100], y[:100]
alpha_max_less = norm(X_less.T @ y_less, ord=np.inf) / len(y_less)
print("alpha_max is roughly invariant: %.2e vs %.2e" %
      (alpha_max, alpha_max_less))
print("Primal at 0 iteration = %.2e" % (norm(y_less) ** 2 / (2 * len(y_less))))
print("Effective tol used in solver: %.2e" % (tol * norm(y_less) ** 2))
print(">>> Primal > effective_tol, we DO NOT exit quickly")
clf2 = Lasso(alpha=alpha_max / 50, fit_intercept=False, tol=tol)
clf2.fit(X_less, y_less)
print("Iterations on 2nd problem: %d" % clf2.n_iter_)

Output:

1st setup: full dataset
Primal at 0 iteration = 2.50e-05
Effective tol used in solver: 1.00e-04
>>> Primal < effective_tol, we exit quickly
Iterations on first problem: 14
--------------------------------------------------------------------------------
2nd setup: only 100 first rows
alpha_max is roughly invariant: 1.11e-03 vs 1.82e-03
Primal at 0 iteration = 2.17e-05
Effective tol used in solver: 4.34e-07
>>> Primal > effective_tol, we DO NOT exit quickly
Iterations on 2nd problem: 238

@josephsalmon
Contributor

josephsalmon commented Jul 20, 2020

I agree with @mathurinm on that point.
See for instance a better (scaled) way to stop the algorithm in https://web.stanford.edu/~boyd/papers/pdf/l1_ls.pdf, page 5.
The main idea is to upper bound the ideal ratio (P(w^t) - P(w^*)) / P(w^*) by DualGap(w^t, \theta^t) / D(\theta^t), where P and D are the primal and dual objectives, w^t and \theta^t are the primal and dual iterates, and w^* is a primal optimal point.
The choice proposed by @mathurinm consists in simply using D(y / \lambda) instead of the more refined quantity D(\theta^t).
So ideally this should be changed in sklearn.
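
For concreteness, a minimal sketch of that criterion for the unscaled Lasso objective P(w) = 0.5 * ||y - X w||^2 + lambda * ||w||_1 (the residual rescaling used to build a feasible dual point is the standard trick; the function name is ours):

import numpy as np
from numpy.linalg import norm


def relative_suboptimality_bound(X, y, w, lmbd):
    """Upper bound on (P(w) - P(w^*)) / P(w^*) via the duality gap."""
    r = y - X @ w                                     # residual
    p_obj = 0.5 * r @ r + lmbd * np.abs(w).sum()      # primal objective P(w)
    # rescale the residual so that ||X^T theta||_inf <= lmbd (dual feasibility)
    theta = r * min(1.0, lmbd / norm(X.T @ r, ord=np.inf))
    d_obj = 0.5 * y @ y - 0.5 * norm(y - theta) ** 2  # dual objective D(theta)
    gap = p_obj - d_obj
    return gap / d_obj

A solver would then stop once this bound drops below tol, which makes the criterion relative to the objective value instead of relying on an explicit norm(y) ** 2 factor.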

@QB3

QB3 commented Jul 20, 2020

From the user's point of view, I would say that choosing tol as in sklearn is best. If one uses a Lasso as a block in a pipeline, one does not want to change parameters one probably does not understand well. IMO the user should only have to change the import of the Lasso solver, not the tol.

@agramfort
Collaborator

agramfort commented Jul 24, 2020 via email
