Skip to content
This repository has been archived by the owner on Dec 6, 2023. It is now read-only.

[MRG] Parallelize OvR method in primal_cd #13

Merged
merged 3 commits into master from the ovr branch on Nov 18, 2014

Conversation

MechCoder
Contributor

@mblondel I was trying to get some speed gains by parallelizing the OvR method. However, when I set n_jobs > 1 it keeps failing with this error: TypeError: __cinit__() takes exactly 1 positional argument (0 given). Note that it works as expected for n_jobs=1.

@mblondel
Member

I think the function to parallelize must be pure (no side effects).

@MechCoder
Contributor Author

@mblondel Do you mean that somehow X gets deleted between __init__ and __cinit__? Or is it because we are modifying coef_ in place across all processes? Or is it something else?

@mblondel
Member

Because coef_ is modified in place.

@mblondel
Member

I'd love to release the GIL in the CD code in lightning. This would allow using threading, as is done for random forests in scikit-learn.

@mblondel
Member

I think the function to parallelize must be pure (no side effects).

Please confirm with Gael.

@MechCoder
Contributor Author

If the GIL is disabled, it will allow shared memory, right? Then we would not run into such problems?
Also, the error message is highly non-intuitive to me.

@mblondel
Member

Can you make your contributions to lightning first? We can discuss merging to scikit-learn eventually.

@MechCoder
Contributor Author

@ogrisel @GaelVaroquaux Can you please have a quick look?

@MechCoder changed the title from "Parallelize OvR method in primal_cd" to "[WIP] Parallelize OvR method in primal_cd" on Sep 18, 2014
@ogrisel

ogrisel commented Sep 18, 2014

The TypeError message is strange, but I am not familiar with this code.

If the GIL is disabled, it will allow shared memory, right? Then we would not run into such problems?

The GIL is never "disabled". When the GIL is released, it means that you can use threading efficiently on multiple CPUs (Parallel with backend='threading') instead of the default multiprocessing. And yes, the main difference between threading and multiprocessing is that memory is shared by default between the threads of a process, while it is not shared by default between different processes.

Releasing the GIL will not change anything if you don't use multiple threads.
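To make the distinction concrete, here is a minimal sketch of how the two joblib backends are selected (not lightning code; math.sqrt is just a stand-in for the per-class work):

```python
from joblib import Parallel, delayed
import math

# Default backend: each call runs in a separate worker process, so the
# arguments are pickled and nothing is shared with the parent process.
out = Parallel(n_jobs=2)(delayed(math.sqrt)(i) for i in range(8))

# Threading backend: the calls run as threads of the current process and see
# the same memory, but this only pays off when the work releases the GIL
# (e.g. nogil Cython loops or NumPy/BLAS calls).
out = Parallel(n_jobs=2, backend="threading")(delayed(math.sqrt)(i) for i in range(8))
```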

@MechCoder
Contributor Author

@ogrisel Thanks for your explanation (as usual).

My question was: when we use a multiprocessing backend, we should make sure that the data shared among all processes is not modified in place, right? For instance, here coef is modified across all processes, so I wanted to clarify whether this could be the source of the error.

@ogrisel

ogrisel commented Sep 19, 2014

As I said, when you use the multiprocessing backend of joblib (the default backend), nothing is shared, unless you use memory mapping for the input data (usually read-only, so not a problem) or for some fitted parameters such as coef. It's better to use the threading backend to concurrently update fitted parameters. Concurrent writes on memory-mapped data are likely to cause platform-specific technical issues (although I don't have much experience with that).

Having concurrent parameter updates can break theoretical convergence. It depends on the algorithm. For coordinate descent you typically have many more coordinates than CPUs. If you select the coordinate to update randomly, the likelihood of two threads or processes updating the same coordinate concurrently is very low, so it should not cause many problems in practice.

Hogwild!, which runs concurrent SGD (not CD) on very sparse data, demonstrates that in practice this can be very beneficial. My intuition is that lock-free concurrent stochastic coordinate descent should also work very well, as long as the number of features is significantly larger than the number of threads.
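As a toy sketch of that intuition (not lightning code; in pure Python the GIL still serializes the bytecode, so this only illustrates the memory-sharing structure, not an actual speed-up): several threads update coordinates of one shared weight vector without any locking, and collisions are rare because there are far more features than threads.

```python
import threading
import numpy as np

n_features, n_threads, n_updates = 1000, 4, 10000
w = np.zeros(n_features)  # shared parameter vector, updated without locks

def worker(seed):
    rng = np.random.RandomState(seed)
    for _ in range(n_updates):
        j = rng.randint(n_features)    # pick a coordinate at random
        w[j] -= 0.01 * rng.randn()     # unsynchronized, Hogwild-style update

threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```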

@ogrisel

ogrisel commented Sep 19, 2014

Also, the speed-up might be harmed by false sharing, but it would require extensive experiments to see how detrimental that is in practice for CD on non-toy problems.

@MechCoder
Contributor Author

Okay, I was just confused because of this comment by Mathieu:

Because coef_ is modified in place.

Having concurrent parameter updates can break theoretical convergence.

Well, here the parameter updates are technically independent across jobs, since it is an OvA implementation, and the update done by each job should be independent of the others.

So the solution now is to release the GIL and use a threading-based backend? But I still do not understand the error with the multiprocessing backend. What could be the possible reason, given that no memory is shared among the processes?

My intuition, based on my work on the cd_fast solver, was that the initialized coefs supplied to each job (and modified in place) are different for each job. But in this code the same coef is supplied to all jobs and modified simultaneously, so I think this would be a problem, right?

@ogrisel

ogrisel commented Sep 19, 2014

What could be the possible reason, given that no memory is shared among the processes?

Again: processes do not share any memory by default. Only threads do. This is the main difference between a thread and a process.

Sorry, I did not read the sentence correctly. I don't know about the Cython error, but it seems completely unrelated to whether memory is shared or not.

@MechCoder
Contributor Author

That's what I meant! :)


@MechCoder
Contributor Author

I see you have edited your comment. Sorry.

@mblondel
Member

Well, I've never seen joblib.Parallel used like that in scikit-learn with the multiprocessing backend. So perhaps the error message is unrelated, but in any case what you've tried won't work, as memory is not shared (and the fact that the code works with n_jobs=1 but not with n_jobs > 1 does seem to indicate that the problem is with the way you parallelize!). What we usually do in scikit-learn is to allocate memory to be written from within the function to be parallelized and return the array (not write it in place). Then we agglomerate the arrays to form a 2d array.

@ogrisel What @MechCoder is doing is embarrassingly parallel.
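As a rough sketch of that pattern (the _fit_binary helper and the data here are hypothetical, not the code in this PR): each job allocates and returns its own coefficient row, and the rows are stacked afterwards, so nothing is written in place across processes.

```python
import numpy as np
from joblib import Parallel, delayed

def _fit_binary(X, y, k):
    # Allocate the row locally, fill it by solving the "class k vs. rest"
    # problem, and return it instead of writing into a shared coef_ array.
    coef_k = np.zeros(X.shape[1])
    # ... run the binary coordinate descent here ...
    return coef_k

X = np.random.rand(30, 4)
y = np.random.randint(0, 3, size=30)

rows = Parallel(n_jobs=2)(delayed(_fit_binary)(X, y, k) for k in np.unique(y))
coef_ = np.vstack(rows)  # agglomerate into the (n_classes, n_features) array
```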

@MechCoder
Contributor Author

What we usually do in scikit-learn is to allocate memory to be written from within the function to be parallelized and return the array

Yes, I know that; I will give it a shot now.

@MechCoder
Contributor Author

I thought "embarassingly" was an adjective, till I googled both together :)

@mblondel
Member

Technically, it's an adverb :)

@MechCoder
Contributor Author

@mblondel I'm sorry that "now" turned out to mean six days later, but I believe I have made the data that each OvR loop writes in place independent of the others. Can you have a quick look? If what I have done is right, then the problem is not with the parallelization but with something deeper.

@MechCoder
Contributor Author

@mblondel This seems to be a bug in joblib; I filed a bug report here: joblib/joblib#169

@MechCoder
Contributor Author

@mblondel The only solution is to use a threading-based backend to overcome this error. WDYT?

@mblondel
Member

I would start by trying to find why the dataset classes don't pickle.

@MechCoder
Contributor Author

This is because of the __cinit__ constructor. I have provided a minimal example in the joblib issue report.
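For context, here is a pure-Python sketch of the picklability requirement (illustrative only; ToyDataset is not the actual dataset class and this is not necessarily the actual fix): joblib's multiprocessing backend pickles every argument, and a type whose constructor requires arguments has to tell pickle how to rebuild an instance, for example via __reduce__.

```python
import pickle
import numpy as np

class ToyDataset(object):
    def __init__(self, X):
        self.X = np.asarray(X)

    def __reduce__(self):
        # Rebuild the object by calling ToyDataset(X) again on unpickling.
        # A Cython extension type with a non-trivial __cinit__ needs an
        # explicit hook like this, because the default reconstruction creates
        # the instance without constructor arguments (hence the
        # "__cinit__() takes exactly 1 positional argument (0 given)" error).
        return (ToyDataset, (self.X,))

ds = pickle.loads(pickle.dumps(ToyDataset(np.eye(3))))
assert np.array_equal(ds.X, np.eye(3))
```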


@mblondel
Member

How can this be fixed?

@MechCoder
Contributor Author

How can this be fixed?

I'm not sure if this question is rhetorical, but I think I figured it out: https://github.com/mblondel/lightning/pull/15

@mblondel
Member

No, that was an actual question ;-)

Return the coefs and errors from primal_cd modified inplace.
@MechCoder changed the title from "[WIP] Parallelize OvR method in primal_cd" to "[MRG] Parallelize OvR method in primal_cd" on Oct 6, 2014
@MechCoder
Contributor Author

@mblondel Finally done with this. All tests pass, so I assume it works. I did not benchmark rigorously, but it does cut the running time of this example http://www.mblondel.org/lightning/auto_examples/document_classification_news20.html#document-classification-news20-py from roughly 10 seconds down to about 5.

@@ -194,7 +200,7 @@ def __init__(self, loss="squared_hinge", penalty="l2", multiclass=False,
warm_debiasing=False,
selection="cyclic", permute=True,
callback=None, n_calls=100,
- random_state=None, verbose=0):
+ random_state=None, verbose=0, n_jobs=-1):
Member

I would prefer n_jobs=1 by default.

Contributor Author

Is there any reason for this?

Contributor Author

Is this because the overhead of parallel computation would be much more than the benefits on small datasets?

Member

  • we don't want to use all cores silently
  • multiprocessing has some issues on Windows
  • for consistency with scikit-learn

Perhaps @ogrisel or @GaelVaroquaux know other more compelling arguments.

@MechCoder
Contributor Author

@mblondel Done. Merge?

@mblondel
Member

mblondel commented Oct 8, 2014

Give me some time. I want to carefully review this.

@MechCoder
Contributor Author

Give me some time.

Sure :)

@MechCoder
Contributor Author

@mblondel Any news on this? ;)

@mblondel merged commit 1239b8f into scikit-learn-contrib:master on Nov 18, 2014
@mblondel
Member

Finally had the time to review and merge. Sorry it took so long! Thank you @MechCoder :)

@MechCoder
Contributor Author

Great, I thought there were merge conflicts. Did you resolve them manually?

@MechCoder deleted the ovr branch on November 18, 2014 at 13:54
@mblondel
Member

The conflicts were in the generated cpp so I just regenerated it.

@MechCoder
Contributor Author

Thanks!

