ENH: reflexion about tol #125

Closed
mathurinm opened this issue May 23, 2020 · 7 comments · Fixed by #180
Comments

@mathurinm
Owner

  1. our behaviour disagrees with sklearn's, because the latter scales tol by norm(y) ** 2 (or is it norm(y) ** 2 / n_samples?)

  2. using tol < 1e-7 with float32 causes precision issues (found out via check_estimator). MCVE:

    import numpy as np
    from celer import MultiTaskLassoCV  # assumed import; this MCVE comes from check_estimator

    X = np.array([  # NOTE: the first row of X was truncated in the original snippet
                  [1.9376824, 1.3127615, 2.675319, 2.8909883, 1.1503246],
                  [2.375175, 1.5866847, 1.7041336, 2.77679, 0.21310817],
                  [0.2613879, 0.06065519, 2.4978595, 2.3344703, 2.6100364],
                  [2.935855, 2.3974757, 1.384438, 2.3415875, 0.3548233],
                  [1.9197631, 0.43005985, 2.8340068, 1.565545, 1.2439859],
                  [0.79366684, 2.322701, 1.368451, 1.7053018, 0.0563694],
                  [1.8529065, 1.8362871, 1.850802, 2.8312442, 2.0454607],
                  [1.0785236, 1.3110958, 2.0928936, 0.18067642, 2.0003002],
                  [2.0119135, 0.6311477, 0.3867789, 0.946285, 1.0911323]],
                 dtype=np.float32)

    y = np.array([[1.],
                  [1.],
                  [2.],
                  [0.],
                  [2.],
                  [1.],
                  [0.],
                  [1.],
                  [1.],
                  [2.]], dtype=np.float32)

    params = dict(eps=1e-2, n_alphas=10, tol=1e-10, cv=2, n_jobs=1,
                  fit_intercept=False, verbose=2)

    clf = MultiTaskLassoCV(**params)
    clf.fit(X, y)

(casting X to float64 fixes it)

So maybe we can raise a warning if tol is low and X.dtype == np.float32.
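
A minimal sketch of such a guard (the 1e-7 threshold is roughly float32 machine epsilon; the helper name and message are illustrative, not an existing API):

import warnings

import numpy as np


def _warn_low_tol_float32(X, tol):
    """Warn when tol is likely unreachable given the dtype of X."""
    if X.dtype == np.float32 and tol < 1e-7:
        warnings.warn(
            "tol=%.1e may be unreachable with float32 data (machine epsilon "
            "is about 1.2e-7); consider casting X to float64 or increasing "
            "tol." % tol)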

@agramfort
Collaborator

agramfort commented May 23, 2020 via email

@mathurinm
Owner Author

@agramfort In my opinion, in sklearn the scaling should be by norm(y) ** 2 / n_samples (i.e. the primal objective corresponding to a zero coef), not by norm(y) ** 2.

If I fit with a taller and taller y, the tolerance used in practice increases while, because of the scaling by n_samples, the objective value does not.

I think that for usability it's best to follow sklearn's way, but IMO it is not the correct one.

@QB3 as we discussed

@agramfort
Collaborator

Are you sure? The Cython code where the scaling is done takes an alpha that is already scaled by n_samples.

Did you write a tiny script to demonstrate the claim?

@mathurinm
Owner Author

mathurinm commented Jul 19, 2020

Summary: when you increase n_samples, p_obj and alpha_max remain similar but the effective tolerance tol * norm(y) ** 2 increases, so you optimize less.
Script:

import numpy as np

from numpy.linalg import norm
from sklearn.linear_model import Lasso
from scipy.linalg import toeplitz

# Choose a large n_samples so that norm(y) ** 2 / (2 * n_samples) < tol = 1e-4
n_samples = 20000
n_features = 100


rho = 0.5
np.random.seed(24)
cov = toeplitz(rho ** np.arange(n_features))
X = np.random.multivariate_normal(
    mean=np.zeros(n_features), cov=cov, size=n_samples)
y = X[:, :50] @ (-1) ** np.arange(50)
snr = 10
noise = np.random.randn(n_samples)
y += noise / norm(noise) * norm(y) / snr
y /= norm(y)
tol = 1e-4

print("1st setup: full dataset")
alpha_max = norm(X.T @ y, ord=np.inf) / n_samples
print("Primal at 0 iteration = %.2e" % (norm(y) ** 2 / (2 * n_samples)))
print("Effective tol used in solver: %.2e" % (tol * norm(y) ** 2))
print(">>> Primal < effective_tol, we exit quickly")
clf1 = Lasso(alpha=alpha_max / 50, fit_intercept=False, tol=tol)
clf1.fit(X, y)
print("Iterations on first problem: %d" % clf1.n_iter_)


print("-" * 80)
print("2nd setup: only 100 first rows")
X_less, y_less = X[:100], y[:100]
alpha_max_less = norm(X_less.T @ y_less, ord=np.inf) / len(y_less)
print("alpha_max is roughly invariant: %.2e vs %.2e" %
      (alpha_max, alpha_max_less))
print("Primal at 0 iteration = %.2e" % (norm(y_less) ** 2 / (2 * len(y_less))))
print("Effective tol used in solver: %.2e" % (tol * norm(y_less) ** 2))
print(">>> Primal > effective_tol, we DO NOT exit quickly")
clf2 = Lasso(alpha=alpha_max / 50, fit_intercept=False, tol=tol)
clf2.fit(X_less, y_less)
print("Iterations on 2nd problem: %d" % clf2.n_iter_)

Output:

1st setup: full dataset
Primal at 0 iteration = 2.50e-05
Effective tol used in solver: 1.00e-04
>>> Primal < effective_tol, we exit quickly
Iterations on first problem: 14
--------------------------------------------------------------------------------
2nd setup: only 100 first rows
alpha_max is roughly invariant: 1.11e-03 vs 1.82e-03
Primal at 0 iteration = 2.17e-05
Effective tol used in solver: 4.34e-07
>>> Primal > effective_tol, we DO NOT exit quickly
Iterations on 2nd problem: 238

@josephsalmon
Contributor

josephsalmon commented Jul 20, 2020

I agree with @mathurinm on that point.
See for instance a better (scaled) way to stop the algorithm in https://web.stanford.edu/~boyd/papers/pdf/l1_ls.pdf, page 5.
The main idea is to upper bound the ideal ratio (P(w^t) - P(w^*)) / P(w^*) by DualGap(w^t, \theta^t) / D(\theta^t), where P and D are the primal and dual objectives, w^t and \theta^t are the primal and dual iterates, and w^* is a primal optimal point.
The choice proposed by @mathurinm consists in simply using D(y / \lambda) instead of the more refined quantity D(\theta^t).
So ideally this should be changed in sklearn.
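
For concreteness, a minimal sketch of that criterion for the unscaled Lasso objective P(w) = 0.5 * ||y - X w||^2 + lambda * ||w||_1 (the residual rescaling used to build a feasible dual point is the standard trick; the function name is ours):

import numpy as np
from numpy.linalg import norm


def relative_suboptimality_bound(X, y, w, lmbd):
    """Upper bound on (P(w) - P(w^*)) / P(w^*) via the duality gap."""
    r = y - X @ w                                     # residual
    p_obj = 0.5 * r @ r + lmbd * np.abs(w).sum()      # primal objective P(w)
    # rescale the residual so that ||X^T theta||_inf <= lmbd (dual feasibility)
    theta = r * min(1.0, lmbd / norm(X.T @ r, ord=np.inf))
    d_obj = 0.5 * y @ y - 0.5 * norm(y - theta) ** 2  # dual objective D(theta)
    gap = p_obj - d_obj
    return gap / d_obj

A solver would then stop once this bound drops below tol, which makes the criterion relative to the objective value instead of relying on an explicit norm(y) ** 2 factor.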

@QB3

QB3 commented Jul 20, 2020

From the user's point of view, I would say that choosing tol as in sklearn is best. If one uses a Lasso as a block in a pipeline, one does not want to change parameters one probably does not understand well. IMO the user should only have to change the import of the Lasso solver, not the tol.

@agramfort
Collaborator

agramfort commented Jul 24, 2020 via email
