[WIP] Implementation of logistic_regression_path and LogisticRegressionCV #2327

Closed
wants to merge 1 commit into
from


6 participants

@fabianp
scikit-learn member

This code implements a function, logistic_regression_path, and an estimator, LogisticRegressionCV.

This is mostly the code from @GaelVaroquaux's logistic_cv branch. My work is only some minor cleanup, some added tests, and benchmarking.

Rationale

We currently use LIBLINEAR for LogisticRegression. While it is very fast on most problems, it cannot be warm-restarted, which means that fitting a LogisticRegression over a grid of regularization parameters takes as much time as fitting each model independently from scratch. This pull request implements an efficient fit of logistic regression models across a sequence of regularization parameters, using the solution for the previous parameter as the initialization point for the next optimization problem. As a result, we also get (almost for free) an efficient cross-validated logistic regression estimator, LogisticRegressionCV.
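To make the warm-start idea concrete, here is a minimal sketch of the pattern (not the PR's actual logistic_regression_path code; the loss_and_grad callable stands in for the private helpers discussed below):

```python
import numpy as np
from scipy import optimize

def fit_path(X, y, Cs, loss_and_grad):
    # Fit one model per C, warm-starting each fit from the previous solution.
    # loss_and_grad(w, X, y, alpha) must return (loss, gradient).
    w = np.zeros(X.shape[1])            # cold start only for the first C
    coefs = []
    for C in Cs:
        w, loss, info = optimize.fmin_l_bfgs_b(
            loss_and_grad, w, args=(X, y, 1. / C))
        coefs.append(w.copy())          # w is the warm start for the next C
    return coefs
```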

Code

The important functions are logistic_regression_path and LogisticRegressionCV, both in linear_model/logistic.py. Our benchmarks show that the most competitive method is either lbfgs or the newton-cg solver adapted from the SciPy codebase, depending on the problem. The change to the newton-cg code is essentially to let the objective function return the gradient and the Hessian in the same call, so that several intermediate computations can be reused. This is very important for this model, but it is incompatible with the SciPy API, and extending the SciPy API to support it is not straightforward. That is why I have added the solver to sklearn.utils.optimize.
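As a schematic illustration of why this matters (not the actual code in sklearn.utils.optimize): the loss, the gradient and the Hessian-vector product of the L2-penalized logistic loss all share the same sigmoid term, so returning them from one call avoids recomputing X.dot(w) and the exponentials:

```python
import numpy as np

def loss_grad_hess(w, X, y, alpha):
    # Shared intermediates are computed once...
    yz = y * X.dot(w)
    s = 1. / (1. + np.exp(-yz))        # sigmoid; the PR uses a stable variant
    # ...and reused for the loss, the gradient and the Hessian matvec.
    loss = np.sum(np.log(1. + np.exp(-yz))) + .5 * alpha * np.dot(w, w)
    grad = X.T.dot((s - 1.) * y) + alpha * w
    d = s * (1. - s)                   # Hessian weights

    def hess_vec(v):
        # matvec with H = X.T * diag(d) * X + alpha * I
        return X.T.dot(d * X.dot(v)) + alpha * v

    return loss, grad, hess_vec
```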

The available solvers are (I have tried to stick to the vocabulary of scipy.optimize as much as possible):

  • lbfgs: this uses scipy.optimize.fmin_l_bfgs_b

  • newton-cg: this uses our modified version of SciPy's newton-cg implementation

  • liblinear: this uses the LIBLINEAR implementation, which is a trust-region Newton-type algorithm.

The solvers make use of the private functions _logistic* defined in logistic.py.
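For illustration, here is roughly how the new estimator is meant to be used (a minimal sketch; the Cs parameter name is my assumption from the path code, and the exact signature may still change):

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.RandomState(0)
X = rng.randn(200, 20)
y = np.sign(X[:, 0] + .5 * rng.randn(200))

# Fit the full regularization path on each CV fold with warm restarts,
# then pick the best C by cross-validation.
clf = LogisticRegressionCV(Cs=np.logspace(-3, 3, 10))
clf.fit(X, y)
print(clf.score(X, y))
```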

Benchmarks

I have run benchmarks on real and synthetic datasets.

Each vertical line corresponds to the convergence of the method for a particular parameter, and each subplot corresponds to a different method. The important quantity is the timing of the last line, i.e. when the solver has finished optimizing all models in the path (total time).

Real datasets

This is the Haxby dataset from https://github.com/nilearn/nilearn: 216 samples x 6222 features. I'll upload the data somewhere as soon as I have a decent wifi connection. The notebook to run the benchmarks can be found here.

[figure: benchmark timings on the Haxby dataset, one subplot per solver]

Synthetic dataset

Generated from a linear model with Gaussian noise (see notebook): 10,000 samples x 100 features.
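For reference, a minimal way to generate a dataset of this kind (my guess at the notebook's setup; the exact coefficients and noise level are assumptions):

```python
import numpy as np

rng = np.random.RandomState(42)
n_samples, n_features = 10000, 100
X = rng.randn(n_samples, n_features)
w_true = rng.randn(n_features)
# Labels come from a linear model plus Gaussian noise, thresholded at zero.
y = np.sign(X.dot(w_true) + rng.randn(n_samples))
```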

[figure: benchmark timings on the synthetic dataset, one subplot per solver]

TODO

  • I think some of the private functions in logistic.py can be simplified. In particular, my intuition is that we could merge some of the _logistic* and _logistic_*_intercept functions without sacrificing a significant amount of performance.

  • Compatibility with old SciPy versions whose fmin_l_bfgs_b does not accept the maxiter keyword (Travis build failure)

  • Sparse matrix support (any help from people who actually use sparse matrices would be great here)

  • Implement multinomial logistic regression, or raise a meaningful error for multiclass problems pointing to the OvA/OvO meta-estimators

  • More tests for the LogisticRegressionCV object (some are currently failing)

  • Document the tolerance and stopping criteria

@fabianp fabianp Implementation of logistic_regression_path.
This is mostly the code from Gael's logistic_cv branch. My work is only some
minor cleanup, some added tests, and benchmarking.
66678b2
@GaelVaroquaux
scikit-learn member

Thanks @fabianp . I updated the todo list.

@fabianp
scikit-learn member

@GaelVaroquaux I added some compatibility for sparse matrices, at least there's a test for sparse matrices that passes.

@agramfort agramfort commented on the diff Jul 30, 2013
sklearn/linear_model/logistic.py
+def _phi(t, copy=True):
+    # helper function: return 1. / (1 + np.exp(-t))
+    if copy:
+        t = np.copy(t)
+    t *= -1.
+    t = np.exp(t, t)
+    t += 1
+    t = np.reciprocal(t, t)
+    return t
+
+
+def _logistic_loss_and_grad(w, X, y, alpha):
+    # the logistic loss and its gradient
+    z = X.dot(w)
+    yz = y * z
+    out = np.empty(yz.shape, yz.dtype)
@agramfort
agramfort Jul 30, 2013

out = np.empty_like(yz)

@agramfort agramfort commented on the diff Jul 30, 2013
sklearn/linear_model/logistic.py
+    idx = yz > 0
+    out[idx] = np.log(1 + np.exp(-yz[idx]))
+    out[~idx] = (-yz[~idx] + np.log(1 + np.exp(yz[~idx])))
+    out = out.sum() + .5 * alpha * w.dot(w)
+
+    z = _phi(yz, copy=False)
+    z0 = (z - 1) * y
+    grad = X.T.dot(z0) + alpha * w
+    return out, grad
+
+
+def _logistic_loss(w, X, y, alpha):
+    # the logistic loss and
+    z = X.dot(w)
+    yz = y * z
+    out = np.empty(yz.shape, yz.dtype)
@agramfort agramfort commented on the diff Jul 30, 2013
sklearn/linear_model/logistic.py
+    yz = y * z
+    out = np.empty(yz.shape, yz.dtype)
+    idx = yz > 0
+    out[idx] = np.log(1 + np.exp(-yz[idx]))
+    out[~idx] = (-yz[~idx] + np.log(1 + np.exp(yz[~idx])))
+    out = out.sum() + .5 * alpha * w.dot(w)
+    #print 'Loss %r' % out
+    return out
+
+
+def _logistic_loss_grad_hess(w, X, y, alpha):
+    # the logistic loss, its gradient, and the matvec application of the
+    # Hessian
+    z = X.dot(w)
+    yz = y * z
+    out = np.empty(yz.shape, yz.dtype)
@agramfort agramfort commented on the diff Jul 30, 2013
sklearn/linear_model/logistic.py
+    out = out.sum() + .5 * alpha * w.dot(w)
+
+    z = _phi(yz, copy=False)
+    z0 = (z - 1) * y
+    grad = X.T.dot(z0) + alpha * w
+
+    # The mat-vec product of the Hessian
+    d = z * (1 - z)
+    if sparse.issparse(X):
+        def Hs(s):
+            ret = d * X.dot(s)
+            return X.T.dot(ret) + alpha * s
+    else:
+        # Precompute as much as possible
+        d = np.sqrt(d, d)
+        # XXX: how to do this with sparse matrices?
@agramfort
agramfort Jul 30, 2013

do a dot with a diagonal sparse matrix.
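A minimal sketch of that suggestion (the spdiags usage here is my illustration, not the PR's code):

```python
import numpy as np
from scipy import sparse

def scale_rows_sparse(X, d):
    # Multiply row i of a sparse X by sqrt(d[i]) via a diagonal sparse matrix,
    # so that dX.T.dot(dX) equals X.T * diag(d) * X while staying sparse.
    D = sparse.spdiags(np.sqrt(d), 0, d.size, d.size)
    return D * X
```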

@agramfort agramfort commented on the diff Jul 30, 2013
sklearn/linear_model/logistic.py
+        def Hs(s):
+            ret = dX.T.dot(dX.dot(s))
+            ret += alpha * s
+            return ret
+    #print 'Loss/grad/hess %r, %r' % (out, grad.dot(grad))
+    return out, grad, Hs
+
+
+def _logistic_loss_and_grad_intercept(w_c, X, y, alpha):
+    w = w_c[:-1]
+    c = w_c[-1]
+
+    z = X.dot(w)
+    z += c
+    yz = y * z
+    out = np.empty(yz.shape, yz.dtype)
@agramfort agramfort commented on the diff Jul 30, 2013
sklearn/linear_model/logistic.py
+        Coefficient of the features in the decision function.
+
+        `coef_` is readonly property derived from `raw_coef_` that \
+        follows the internal memory layout of liblinear.
+
+    `intercept_` : array, shape = [n_classes-1]
+        Intercept (a.k.a. bias) added to the decision function.
+        It is available only when parameter intercept is set to True.
+
+    See also
+    --------
+    LogisticRegression
+
+    """
+
+
@agramfort
agramfort Jul 30, 2013

too many blank lines

@agramfort agramfort commented on the diff Jul 30, 2013
sklearn/linear_model/tests/test_logistic.py
+    n_features = X_ref.shape[1]
+
+    X_sp = X_ref.copy()
+    X_sp[X_sp < .1] = 0
+    X_sp = sp.csr_matrix(X_sp)
+    for X in (X_ref, X_sp):
+        w = np.zeros(n_features)
+
+        # First check that our derivation of the grad is correct
+        loss, grad = logistic._logistic_loss_and_grad(w, X, y, alpha=1.)
+        approx_grad = optimize.approx_fprime(w,
+            lambda w: logistic._logistic_loss_and_grad(w, X, y,
+                                                       alpha=1.)[0],
+            1e-3
+            )
+        np.testing.assert_array_almost_equal(grad, approx_grad, decimal=2)
@agramfort
agramfort Jul 30, 2013

I would import assert_array_almost_equal from np.testing for consistency

@agramfort
scikit-learn member

Almost there @fabianp! Really nice work!

@mblondel
scikit-learn member

@fabianp your plots never cease to impress me!

@fabianp
scikit-learn member

Thanks @mblondel !

@fabianp
scikit-learn member

Thanks @agramfort for your comments, I've taken them all into account and updated my branch.

@mblondel
scikit-learn member

Any plan to implement the squared hinge loss? (a.k.a L2-SVM loss)

@GaelVaroquaux
scikit-learn member
@mblondel
scikit-learn member

It might be interesting, but I suspect that it is a different optimization problem: it is not strictly convex, and thus the optimizers would behave differently.

The L2-regularizer should make the problem strongly convex.

@larsmans larsmans commented on the diff Aug 12, 2013
sklearn/linear_model/logistic.py
+    if fit_intercept:
+        w0 = np.zeros(X.shape[1] + 1)
+        func = _logistic_loss_and_grad_intercept
+    else:
+        w0 = np.zeros(X.shape[1])
+        func = _logistic_loss_and_grad
+    coefs = list()
+
+    for C in Cs:
+        if callback is not None:
+            callback(w0, X, y, 1. / C)
+        if method == 'lbfgs':
+            out = optimize.fmin_l_bfgs_b(
+                func, w0, fprime=None,
+                args=(X, y, 1./C),
+                iprint=verbose > 0, pgtol=gtol, maxiter=max_iter)
@larsmans
larsmans Aug 12, 2013

fmin_l_bfgs_b has the annoying habit of leaving iterate.dat files around when iprint>0. Also, iprint=0 is not the silent mode, that's iprint=-1, so iprint=(verbose > 0) - 1 would be a better idea.
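A self-contained illustration of the iprint point on a toy objective (the (verbose > 0) - 1 mapping is the suggestion above):

```python
import numpy as np
from scipy import optimize

def f_and_g(w):
    # toy quadratic objective returning (value, gradient)
    return 0.5 * np.dot(w, w), w

verbose = 0
# iprint=0 is not silent and leaves an iterate.dat file behind;
# iprint=-1 is the silent mode, hence (verbose > 0) - 1.
w_opt, f_opt, info = optimize.fmin_l_bfgs_b(
    f_and_g, np.ones(3), iprint=(verbose > 0) - 1)
```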

@larsmans larsmans and 1 other commented on an outdated diff Aug 23, 2013
sklearn/linear_model/logistic.py
+    out[~idx] = (-yz[~idx] + np.log(1 + np.exp(yz[~idx])))
+    out = out.sum() + .5 * alpha * w.dot(w)
+
+    z = _phi(yz, copy=False)
+    z0 = (z - 1) * y
+    grad = X.T.dot(z0) + alpha * w
+    return out, grad
+
+
+def _logistic_loss(w, X, y, alpha):
+    # the logistic loss and
+    z = X.dot(w)
+    yz = y * z
+    out = np.empty_like(yz)
+    idx = yz > 0
+    out[idx] = np.log(1 + np.exp(-yz[idx]))
@larsmans
larsmans Aug 23, 2013

np.log1p should be more stable.
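A quick numerical illustration of the point (my example, not from the PR):

```python
import numpy as np

yz = 40.
# log1p keeps precision when exp(-yz) is tiny: 1 + 4.2e-18 rounds to 1.0
naive = np.log(1. + np.exp(-yz))     # -> 0.0
stable = np.log1p(np.exp(-yz))       # -> ~4.25e-18
```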

@GaelVaroquaux
GaelVaroquaux Jun 17, 2014
@larsmans
larsmans Jun 17, 2014

Actually we now have sklearn.utils.fixes.expit for the logistic sigmoid, so this comment is outdated.

@MechCoder
scikit-learn member

@fabianp Can we close this?

@agramfort agramfort closed this Jul 23, 2014