[MRG+2] Multinomial Logistic Regression #3490
Conversation
@larsmans I have done a few benchmarks, timing the newsgroup dataset. It takes around 195 seconds, of which around 80 seconds are spent in the _loss_grad function. A huge fraction of those 80 seconds (64.8%) is spent in these two operations.
I tried splitting the loss function into functions that separately calculate the loss and the gradient, but that slows it down even more, probably because p gets calculated repeatedly. Do you have any tips on how to proceed? cc: @agramfort
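(Editor's note: not the PR's actual code, but a minimal sketch of why a fused loss-and-gradient callback, e.g. for scipy.optimize.fmin_l_bfgs_b, avoids recomputing p; the names multinomial_loss_grad, Y as a dense label-indicator matrix, and alpha are illustrative.)

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.utils.extmath import safe_sparse_dot, squared_norm


def multinomial_loss_grad(w, X, Y, alpha):
    """Return the multinomial loss and its gradient in one pass.

    Computing both together lets the softmax probabilities ``p`` be
    reused, which is the expensive part when X is large.
    """
    n_classes = Y.shape[1]
    W = w.reshape(n_classes, -1)                     # (n_classes, n_features)
    scores = safe_sparse_dot(X, W.T)                 # (n_samples, n_classes)
    log_p = scores - logsumexp(scores, axis=1)[:, np.newaxis]
    p = np.exp(log_p)
    loss = -np.sum(Y * log_p) + 0.5 * alpha * squared_norm(W)
    diff = p - Y
    grad = safe_sparse_dot(diff.T, X) + alpha * W    # (n_classes, n_features)
    return loss, np.asarray(grad).ravel()
```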
I think I actually already optimized those two lines of code. Similarly, you could try XT = X.tocsc().T and grad = safe_sparse_dot(XT, diff).T, but I'm not convinced that would actually be faster. Ping @ogrisel, who knows a lot about sparse matmul as well.
def _sqnorm(x):
    x = x.ravel()
    return np.dot(x, x)
We have this in sklearn.utils.extmath.squared_norm now.
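(Editor's note: a tiny illustrative check, not from the PR, of how the shared utility relates to the local helper above.)

```python
import numpy as np
from sklearn.utils.extmath import squared_norm

w = np.arange(6.0).reshape(2, 3)
# squared_norm ravels its argument, so it matches the _sqnorm helper above
assert squared_norm(w) == np.dot(w.ravel(), w.ravel())
```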
Hold on: scipy.sparse doesn't care very much about contiguous arrays, so maybe making …
There was a difference in the tol and max_iter parameters between fmin_l_bfgs_b and the LogisticRegression solver, which I have fixed (it had created the impression of it being much slower). I played around with a few datasets, and it seems that though MultinomialLR is slower in sparse settings, it is much better than the LogisticRegression solver, especially for multi-class problems.
Also, I tried this example, http://scikit-learn.org/stable/auto_examples/linear_model/plot_sgd_weighted_samples.html, with multinomial and logistic under the same settings, but I get different lines. I suppose this is not expected (the dashed line is multinomial).
I don't know if this is OK; the buildbots don't seem to like the code... I didn't test the sample weights very well. Re: contiguity, that doesn't actually involve NumPy in the sparse case: scipy.sparse has its own matmul routines. These accept (AFAIK) arbitrarily strided NumPy arrays, but even then they may be slow if the strides are too big. As with NumPy, and in fact BLAS, performance is best when data are packed tightly and presented in the order in which they are processed, because that minimizes the number of cache misses. (Also, if you inspect the hairy beast that is …
Thanks for the info. Yes, I think sample weights are broken; I will fix it.
For csr_matvecs, it will copy if the data isn't Fortran-contiguous, as far as I can see.
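(Editor's note: a minimal sketch of the kind of layout experiment being discussed; the shapes and the explicit copy are illustrative, and, as noted above, only timing can tell whether it actually helps.)

```python
import numpy as np
import scipy.sparse as sp
from sklearn.utils.extmath import safe_sparse_dot

rng = np.random.RandomState(0)
X = sp.random(10000, 500, density=0.01, format='csr', random_state=rng)
diff = rng.randn(10000, 20)

XT = X.T.tocsr()
# Hand the sparse matmul a Fortran-contiguous dense operand explicitly,
# instead of leaving any copy to scipy.sparse itself.
grad = np.asarray(safe_sparse_dot(XT, np.asfortranarray(diff))).T   # (20, 500)
```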
I tested the scores on the vectorized 20 newsgroups dataset; I get a score of 0.81704726500265534 for Logistic Regression (OvA) and 0.79660116834838024 for the multinomial model, setting C=10000.
What value of C? Did you cross-validate? You should report the results for a grid of C.
Yes, that was the best C that I got by using the Logistic Regression CV model.
I shall do that; I'll plot scores on the y-axis and C on the x-axis in a bit.
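(Editor's note: a sketch of such a comparison using the multi_class option this PR eventually grew into; the C grid, CV folds, and max_iter are arbitrary illustrative choices.)

```python
import numpy as np
from sklearn.datasets import fetch_20newsgroups_vectorized
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

data = fetch_20newsgroups_vectorized(subset='train')
X, y = data.data, data.target

# Report scores over a grid of C for both the OvR and multinomial models.
for C in np.logspace(-2, 4, 7):
    for multi_class in ('ovr', 'multinomial'):
        clf = LogisticRegression(C=C, solver='lbfgs',
                                 multi_class=multi_class, max_iter=200)
        score = cross_val_score(clf, X, y, cv=3).mean()
        print("C=%g  %-11s  accuracy=%.3f" % (C, multi_class, score))
```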
I've added support for class weights. Tests pass now.
Nice!
grad *= C
grad += w
if fit_intercept:
    grad = np.hstack([grad, diff.sum(axis=0).reshape(-1, 1)])
@larsmans I suppose this should be
grad = np.hstack([grad, C * diff.sum(axis=0).reshape(-1, 1)])
or, even better, we could just do alpha = 1. / C and multiply it with the penalty term to make it less confusing.
i.e.
grad += alpha * w
and
loss = ... + 0.5 * alpha * squared_norm(w)
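(Editor's note: an illustrative sketch, not the PR's code, of the alpha = 1 / C form being suggested here: the penalty applies to the weights only, and the intercept part of the gradient is left untouched. The function and argument names are hypothetical.)

```python
import numpy as np
from sklearn.utils.extmath import squared_norm


def penalize(data_loss, grad_w, grad_intercept, w, C):
    """Add the L2 term in the alpha = 1 / C parameterization.

    data_loss      : unpenalized data loss, summed over samples
    grad_w         : gradient of the data loss w.r.t. the weights
    grad_intercept : gradient of the data loss w.r.t. the intercept
    """
    alpha = 1.0 / C
    loss = data_loss + 0.5 * alpha * squared_norm(w)
    grad_w = grad_w + alpha * w
    # the intercept is not penalized, so no alpha term is added to its gradient
    return loss, np.hstack([grad_w.ravel(), grad_intercept.ravel()])
```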
I specifically chose not to penalize the intercept, since I don't typically center my data. My reasoning was that a flat estimator would still learn the class distribution, not just be all zeros.
I guess I've got the math wrong.
I thought we were not multiplying C with the grad term corresponding to the intercept, and it's not the penalty term corresponding to the intercept.
Does multiplying the grad term by C count as penalising the intercept too?
I'm not sure I get you. C is multiplied into grad a few lines up, before the intercept is stacked in.
I was just asking if the total loss was
1. C * (entire loss) + penalty term (without the intercept), or
2. C * (loss without the intercept) + (loss due to intercept) + penalty term (without the intercept).
I was thinking it was 1, since the loss is -C * (sample_weight * Y * p) and p includes the intercept here. If it were 1, then when I take the derivative w.r.t. the intercept, the C would remain, right? Sorry if my questions are dumb.
That's why I was suggesting it would be better if we wrote it this way:
Loss = total loss + alpha * penalty (without intercept)
It would be a bit clearer, since alpha = 1. / C.
@larsmans Does this make sense or am I crazy?
No, you're making perfect sense. Using alpha is more in line with most of the LR literature and makes it easier to check the derivations, as I just noticed.
I ran a few benchmarks for the multinomial vs logistic regression code.
How many classes did you use?
@agramfort I used all 20 classes of the 20 newsgroups data.
@larsmans @agramfort I could not make the newsgroup example any faster by making memory-related changes. What do you think would be a good argument to take this PR ahead?
Parameters
----------
X : array-like, shape = [n_samples, n_features]
shape (n_samples, n_features)
What we really miss is an example that uses MLR. Also, I am not sure about the name MultinomialLR... or even about the need for an extra class. What would it take to have this as an option in LogisticRegression (inspired by LinearSVC)? Thoughts?
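(Editor's note: that option is the shape the code ended up taking, as the multi_class docstring later in this thread shows; a minimal usage sketch, with the dataset chosen purely for illustration.)

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

clf = LogisticRegression(solver='lbfgs', multi_class='multinomial', C=1.0)
clf.fit(X, y)
print(clf.coef_.shape)   # (3, 4): one row of coefficients per class
```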
@jnothman Should I go ahead and replace it?
I agree. I think that we should merge this guy (I wanted to give it a …
@GaelVaroquaux Ok just a second, I will address your last comment.
@GaelVaroquaux Please merge if the last commit is ok with you.
@@ -308,6 +372,13 @@ def logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True,
    To lessen the effect of regularization on synthetic feature weight
    (and therefore on the intercept) intercept_scaling has to be increased.

    multi_class : str, optional default 'ovr'
Here, the usual standard is to write, instead of "str", "{'ovr', 'multinomial'}". I think I prefer the latter option.
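(Editor's note: for illustration only, the parameter description rewritten in that convention; the wording is a sketch, not the merged docstring.)

```python
    multi_class : {'ovr', 'multinomial'}, optional (default='ovr')
        If 'ovr', a one-vs-rest problem is fit for each label; if
        'multinomial', the multinomial loss is minimized over the whole
        probability distribution (lbfgs solver only).
```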
@GaelVaroquaux Are we good to go now?
+1 for merge on my side.
@GaelVaroquaux I suppose we can move cleaning up to another PR. So I can haz merge?
shall I rebase and merge?
Yeah. I am currently in a criss-cross lock-in of urgent matters (papers, …
@agramfort Sure. But is there a necessity? 13 commits always look better than a single one :p
rebase != squash :) I'll do it now.
merged by rebase. nice work @MechCoder!
Thanks everyone for the reviews.
Benchmarks for the haxby dataset.
When X is dense and across 9 classes (972, 577) (removing most of the zero features)
When X is sparse and across 9 classes (972, 163839)
Benchmarks for a synthetic dataset, using make_classification (50000, 2000)
For the arcene data (100 * 10000)