[MRG] Just in time SAGA. #40
Conversation
double* w,
int* indices,
double stepsize,
double* w_scale,
w_scale isn't used. Does it mean that you don't support elastic net?
It does, the scale is used through scale_cum. The w_scale is maintained in case some prox wants to overwrite it.
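For readers following along, the just-in-time scaling idea discussed here can be sketched as follows. This is a hypothetical helper (the names `scale_cum`, `last_seen`, and `lazy_scaled_update` are illustrative, not lightning's actual code): rather than rescaling the whole dense vector `w` at every iteration, a cumulative scale product is recorded and only the coordinates touched by the current sparse sample are caught up.

```python
import numpy as np

def lazy_scaled_update(w, scale_cum, t, last_seen, indices, grad, stepsize):
    """Sketch of just-in-time scaling (hypothetical, not lightning's code).

    scale_cum[t] holds the cumulative product of all multiplicative decay
    factors applied up to iteration t; last_seen[j] records the iteration
    at which coordinate j was last brought up to date.
    """
    for k, j in enumerate(indices):
        # Apply, in one multiplication, all the decays coordinate j missed.
        w[j] *= scale_cum[t] / scale_cum[last_seen[j]]
        last_seen[j] = t
        # Then take the usual gradient step on the active coordinate.
        w[j] -= stepsize * grad[k]
```

Untouched coordinates stay stale on purpose; they are only corrected the next time their index appears in a sample, which is what makes the per-iteration cost proportional to the sample's sparsity.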
Thanks for the clarification. I would remove w_scale from the function signature until we have an actual use case. Also, I don't see any test for elastic net so this means that scale_cum is not tested.
This might also be of interest to @adefazio @agramfort @TomDLT
thx how does it compare in terms of perf with sklearn SAG version?
@agramfort We have updated this gist with sklearn's
I have assumed that lightning does OVR for multi-class problems (@mblondel correct me if I misunderstood). The score discrepancy seems to be caused by the different stopping criteria. We have not (yet) rigorously compared the convergence speed (to the optimum) of SAG vs SAGA, the advantage of SAGA over SAG essentially being the possibility to specify an arbitrary proximity operator. In terms of code speed, adding
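To make the SAG/SAGA distinction concrete for readers of this thread: the extra step SAGA takes is a proximal operator applied after each variance-reduced gradient step. A minimal dense sketch (illustrative only, not lightning's Cython implementation) on a least-squares objective:

```python
import numpy as np

def saga_epoch(X, y, w, memory, stepsize, prox, rng):
    """One epoch of plain SAGA on 0.5 * (x_i . w - y_i)^2 per sample.

    memory[i] stores the last gradient seen for sample i; prox is an
    arbitrary proximal operator, which is the advantage SAGA has over
    SAG. Sketch only, assuming dense data.
    """
    n = X.shape[0]
    g_avg = memory.mean(axis=0)
    for _ in range(n):
        i = rng.randint(n)
        g_new = (X[i] @ w - y[i]) * X[i]           # per-sample gradient
        w = w - stepsize * (g_new - memory[i] + g_avg)
        w = prox(w, stepsize)                      # SAGA's extra prox step
        g_avg += (g_new - memory[i]) / n           # keep the average in sync
        memory[i] = g_new
    return w

def soft_threshold(w, t, lam=0.1):
    # Proximal operator of lam * ||w||_1 with step t (for an L1 penalty).
    return np.sign(w) * np.maximum(np.abs(w) - t * lam, 0.0)
```

With `prox` set to the identity this reduces to the unregularized SAGA update; plugging in `soft_threshold` gives L1-regularized estimates instead.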
ok for the benefit of SAGA. in terms of computation time what I read is that it's pretty much the same. thx
@agramfort Yes, that is if we don't use any prox in SAGA as benchmarked above. Otherwise the jit updates associated with the prox will slow the computation down.
The interest of SAGA (for me) is in the support for composite loss functions. For smooth problems they perform more or less the same in my experience.
Also, comparing them is tricky since it ends up depending on how you choose the step size. Once we have an adaptive step size (as described in the Schmidt paper) we could do more meaningful benchmarks.
Also IIRC sklearn's implementation of SAG uses a line search.
No, sklearn's implementation of SAG uses a constant step size computed using the maximum Lipschitz constant over all samples (cf. get_auto_step_size).
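The idea behind that constant step size can be sketched in a few lines. This is in the spirit of sklearn's `get_auto_step_size`, not its exact implementation: take the largest per-sample Lipschitz constant and step at its inverse (for log loss, the curvature of the logistic function is bounded by 1/4).

```python
import numpy as np

def auto_step_size(X, alpha, loss="log"):
    """Constant SAG-style step size from the maximum per-sample Lipschitz
    constant (sketch, assuming dense X and L2 regularization alpha).

    Squared loss: L_i = ||x_i||^2 + alpha
    Log loss:     L_i = 0.25 * ||x_i||^2 + alpha
    """
    max_sq_norm = np.max(np.sum(X ** 2, axis=1))
    if loss == "log":
        lipschitz = 0.25 * max_sq_norm + alpha
    else:  # squared loss
        lipschitz = max_sq_norm + alpha
    return 1.0 / lipschitz
```

This is cheap (one pass over the data) but conservative, which is part of why an adaptive scheme, as mentioned above, could make benchmarks more meaningful.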
One trick that might help you speed SAGA up is to remove the random sampling of data points. Instead, at the beginning of the algorithm make a copy of the data set with the rows permuted at random (i.e. shuffled). Then every odd epoch access the datapoints in the original dataset in order, and every even epoch access them from the shuffled copy of the dataset, also in order. This greatly reduces the number of TLB misses at the expense of requiring twice as much memory.
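The access pattern described above can be sketched as follows (hypothetical helper names; in lightning the permuted copy would live alongside the Cython solver, not in Python):

```python
import numpy as np

def make_access_pattern(X, seed=0):
    """Sketch of the trick above: keep the original data plus one
    row-shuffled copy, then sweep each of them sequentially on
    alternating epochs. Sequential reads reduce TLB misses versus
    uniform random sampling, at the cost of 2x memory for the copy."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(X.shape[0])
    X_shuffled = X[perm]  # one-off copy, paid once at startup

    def rows_for_epoch(epoch):
        # Odd epochs: original order; even epochs: the shuffled copy.
        return X if epoch % 2 == 1 else X_shuffled

    return rows_for_epoch, perm
```

Each epoch still visits every sample exactly once, and alternating between two fixed orders keeps enough randomness for the stochastic analysis to be plausible while the memory traffic stays sequential.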
Thanks for the information @adefazio, I did not know about it. @zermelozf can you take into account Mathieu's comments?
Interesting trick, but intuitively it should work only for dense data. There is also the cost of copying the data, which could take a few dozen seconds for very large data. So not copying the data actually gives a head start.
Force-pushed from 2ac44ae to f93fa0c
I just removed
Can you remove the w_scale from projection_lagged as Mathieu suggested?
Also, you need to add a test for elastic-net (where alpha != 0 AND beta != 0).
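The elastic-net case the review asks about exercises both penalty terms at once; its proximal operator is soft-thresholding (the L1 part) followed by shrinkage (the L2 part). A pure-numpy sketch of what such a test could assert (hypothetical helper, not lightning's test code, with `alpha` the L2 weight and `beta` the L1 weight as in this thread):

```python
import numpy as np

def prox_elastic_net(w, stepsize, alpha, beta):
    """Proximal operator of stepsize * (alpha/2 * ||w||^2 + beta * ||w||_1).

    Sketch only: soft-threshold at level stepsize * beta, then divide by
    (1 + stepsize * alpha). With alpha != 0 AND beta != 0 both terms are
    exercised, which is what an elastic-net test needs to cover.
    """
    st = np.sign(w) * np.maximum(np.abs(w) - stepsize * beta, 0.0)
    return st / (1.0 + stepsize * alpha)
```

A useful sanity check is that setting `beta = 0` recovers pure L2 shrinkage and `alpha = 0` recovers plain soft-thresholding, so a test with both nonzero genuinely covers the code path neither special case reaches.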
Force-pushed from b098612 to 8738319
Elastic net test added and
Sounds great. Thanks for the awesome contrib. Are we good to go? By the way, do you plan to add group lasso later?
Good for me. Just needs to add SAGAClassifier and SAGARegression into
I do have a prox for the group lasso penalty that I'm using. I will contribute it once this is merged if you think it's worth it (I do think it would be a nice addition).
+1
Force-pushed from 8738319 to 93a2f97
Alright, pressing the green button :)
If you've got time, a SAGA vs. SDCA vs. Adagrad comparison for elastic net would be nice. You can build an example based on this: FYI SDCA doesn't work well without L2 regularization (e.g. L1 regularization only).
Yes, I was planning to do that as suggested by @fabianp as well. Thanks for the link, it seems like you have done 99% of the work already. I'll have a look at it next week after I finish a couple of things.
Great work @zermelozf
A squashed version of #38 containing:
- a Penalty base class.