This repository has been archived by the owner on Dec 6, 2023. It is now read-only.

FIX for SAG with sparse samples. #36

Merged
merged 8 commits into from
Sep 18, 2015

Conversation

fabianp
Member

fabianp commented Sep 7, 2015

The problem was that when the solution was updated just in time, the accumulated scaling factors were not taken into account: they were treated as if they had been constant over the last iterations.

This should fix issue #33, although because of a Python 3 incompatibility I have not yet run the full test suite.
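The bug can be illustrated with a toy model of the lazy-update scheme (an illustrative sketch, not the Cython code touched by this PR; all names are hypothetical). With sparse samples, the weight vector is scaled implicitly and only the nonzero coordinates of each sample are caught up "just in time"; the catch-up must apply all the scalings accumulated since the coordinate was last touched, not just the latest one:

```python
import numpy as np

def eager_updates(updates, decay, n_features):
    """Reference version: apply the decay to every coordinate at every step."""
    w = np.zeros(n_features)
    for j, g in updates:
        w *= decay            # full-vector scaling each iteration
        w[j] += g
    return w

def lazy_updates(updates, decay, n_features):
    """Lazy version: track the cumulative scale, catch coordinates up on demand."""
    w = np.zeros(n_features)
    scale = 1.0                          # cumulative product of decays
    last_scale = np.ones(n_features)     # cumulative scale when j was last touched
    for j, g in updates:
        scale *= decay
        # just-in-time catch-up: apply *all* scalings missed since the last touch
        w[j] *= scale / last_scale[j]
        last_scale[j] = scale
        w[j] += g
    # finalize: bring every coordinate up to date (see the finalize-block fix below)
    w *= scale / last_scale
    return w
```

Treating the missed scalings as if they had been constant amounts to using a single `decay` factor in the catch-up instead of the ratio `scale / last_scale[j]`, which is wrong whenever a coordinate skips more than one iteration.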

@fabianp
Member Author

fabianp commented Sep 7, 2015

The drawback of this solution is that it needs an extra array of size n_samples, but I don't see any way around it. Note that the scikit-learn version uses a similar strategy.

@mblondel
Member

mblondel commented Sep 7, 2015

The drawback of this solution is that there is a need for an extra n_samples array

This is not a big deal.

Thanks for the fix. Let me know when you've run the tests.

@mblondel
Member

mblondel commented Sep 7, 2015

Could you add a small test that does the following: run the algorithm on the same data, once with a numpy array and once with a sparse matrix, and check that the learned coefficients are the same. The data should include some zero features.
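Such a test might look like the following sketch. It does not call lightning's actual SAG estimator; a tiny deterministic full-gradient ridge solver stands in, since `X @ w` behaves identically for numpy arrays and scipy sparse matrices, which is exactly the property the real test would rely on:

```python
import numpy as np
import scipy.sparse as sp

def fit_ridge(X, y, alpha=1.0, lr=0.01, n_iter=500):
    """Stand-in solver: plain gradient descent on ridge regression."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_iter):
        # X @ w and X.T @ r work for both dense arrays and CSR matrices
        grad = X.T @ (X @ w - y) / n_samples + alpha * w
        w -= lr * grad
    return w

rng = np.random.RandomState(0)
X = rng.randn(20, 5)
X[:, 2] = 0.0                        # include an all-zero feature
y = rng.randn(20)

w_dense = fit_ridge(X, y)
w_sparse = fit_ridge(sp.csr_matrix(X), y)
assert np.allclose(w_dense, w_sparse)
```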

@fabianp
Member Author

fabianp commented Sep 8, 2015

This still needs some work: I realised while writing the tests that the problem persists for large alphas (alpha > 10).

@mblondel
Member

mblondel commented Sep 9, 2015

@TomDLT any idea?

@fabianp
Member Author

fabianp commented Sep 9, 2015

Just realized that the same change needs to be made in the finalize block. I'll try it ASAP.

@TomDLT
Member

TomDLT commented Sep 9, 2015

I mentioned this problem in scikit-learn's SAG PR, and this PR seems to fix it.

Just realized that the same change needs to be done in the finalize block

Correct.

@mblondel
Member

mblondel commented Sep 9, 2015

@TomDLT Fabian will add SAGA to lightning, which is why we need to fix SAG first.

@fabianp
Member Author

fabianp commented Sep 9, 2015

Changed that, and now the tests pass. It should be fixed.

On a related note, I noticed that violation_init can be zero, so violation_ratio = violation / violation_init is undefined. There's no exception because of the cython: cdivision=True directive. I don't know the logic well enough to see the best way to deal with this. @mblondel ?

@mblondel
Member

mblondel commented Sep 9, 2015

The computation of the violation measure could be wrong too, but it's difficult for me to check right now since I am at a conference. Basically, the violation measure is just the l2 norm of the gradient. Could you check whether this is actually what is being computed?

@fabianp
Member Author

fabianp commented Sep 9, 2015

No problem, I'll look into it.

@fabianp
Member Author

fabianp commented Sep 18, 2015

I solved this by handling the case violation == 0 separately, so there is no division by zero.

It now looks good to me.
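A minimal sketch of such a case distinction (function and variable names are hypothetical, not the PR's actual code): when the initial violation is already zero, the solver is at a stationary point and can report convergence instead of computing an undefined ratio.

```python
def check_convergence(violation, violation_init, tol):
    """Return True if the relative violation is below tol.

    Guards against violation_init == 0, which would otherwise make the
    ratio violation / violation_init undefined (silently so under
    cython: cdivision=True).
    """
    if violation_init == 0.0:
        return True            # initial gradient already vanished
    return violation / violation_init <= tol
```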

This was failing ~1 out of 10 times
@mblondel
Member

Thanks, merging.

mblondel added a commit that referenced this pull request Sep 18, 2015
FIX for SAG with sparse samples.
@mblondel mblondel merged commit fc443f5 into scikit-learn-contrib:master Sep 18, 2015
@mblondel
Member

I made a small cosmit in master.
