[MRG] Parallelize OvR method in primal_cd #13
Conversation
I think the function to parallelize must be pure (no side effects). |
@mblondel Do you mean somehow X gets deleted in between? |
Because |
I'd love to remove the gil in the CD code in lightning. This would allow to use threading like for random forests in scikit-learn. |
Please confirm with Gael. |
If the GIL is disabled, it will allow to have shared memory right then we would not run into such problems? |
Can you make your contributions to lightning first? We can discuss merging to scikit-learn eventually. |
@ogrisel @GaelVaroquaux Can you please have a quick look? |
The type error message is strange, but I am not familiar with this code.
The GIL is never "disabled". When the GIL is released, it means that you can use threading efficiently on multiple CPUs (e.g. Parallel with a thread-based backend). Releasing the GIL will not change anything if you don't use multiple threads. |
@ogrisel Thanks for your explanation (as usual). My question was: when we use a multiprocessing backend, should we make sure that the data shared among all processes is not modified in place? For instance, here |
As I said, when you use the multiprocessing backend of joblib (the default backend), nothing is shared, unless you use memory mapping for the input data (usually read-only, so not a problem) or some fitted parameters. Having concurrent parameter updates can break theoretical convergence; it depends on the algorithm. For coordinate descent you typically have many more coordinates than CPUs. If you select the coordinate to update randomly, the likelihood of having 2 threads or processes update the same coordinate concurrently is very low, so it should not cause much trouble in practice. HogWild!, which runs concurrent SGD (not CD) on very sparse data, demonstrates that in practice this can be very beneficial. My intuition is that lock-free concurrent stochastic coordinate descent should work very well too, as long as the number of features is significantly larger than the number of threads. |
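The shared-memory point above can be illustrated with a small sketch. This uses the standard-library `ThreadPoolExecutor` as a stand-in for joblib's thread-based backend (the `update_coordinate` function and the one-coordinate-per-worker scheme are hypothetical, not lightning's actual code): with threads, all workers see and update the same array in place, and as long as each worker owns distinct coordinates the updates never collide.

```python
# Sketch: with a thread-based backend, workers share memory and can
# update disjoint coordinates of the same vector in place.
# Hypothetical names; this only mirrors what releasing the GIL plus a
# threading backend would enable, it is not lightning's CD code.
from concurrent.futures import ThreadPoolExecutor

def update_coordinate(w, j):
    # Each worker owns coordinate j, so writes never collide.
    w[j] = j * 2.0

w = [0.0] * 8
with ThreadPoolExecutor(max_workers=4) as ex:
    # All threads mutate the same list object `w`.
    list(ex.map(lambda j: update_coordinate(w, j), range(len(w))))

print(w)  # every coordinate of the shared vector has been updated
```

With a process-based backend, each worker would instead receive a pickled copy of `w`, and the in-place writes would be lost in the parent, which is exactly the distinction discussed above.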
Also the speed up might be harmed by the problem of false sharing but it would require extensive experiments to see how detrimental it is in practice for CD on non-toy problems. |
Okay, I was just confused because of this comment by Mathieu. |
Well, here the parameter updates are technically independent across jobs, since this is an OvR implementation, so the update done by each job should be independent of the others. So the solution for now is a threading-based backend with the GIL released? But I still do not understand the error with the multiprocessing backend. What could be the possible reason, given that no memory is shared among the processes? My intuition according to my work in the |
Sorry I did not read the sentence correctly. I don't know about the cython error but it seems completely unrelated to sharing memory or not. |
That's what I had meant! :) |
I see you have edited your comment. sorry |
Well, I've never seen |
@ogrisel What @MechCoder is doing is embarrassingly parallel. |
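"Embarrassingly parallel" here means each one-vs-rest subproblem can be solved with no communication at all: every class gets its own independent binary problem, and no job writes to state another job reads. A toy sketch of that structure (`fit_binary` is a hypothetical stand-in for the real per-class solver, and the stdlib thread pool stands in for joblib's `Parallel`):

```python
# Hypothetical sketch of an embarrassingly parallel one-vs-rest loop.
# fit_binary is NOT lightning's actual function; a real solver would
# return a coefficient vector for the class-vs-rest problem.
from concurrent.futures import ThreadPoolExecutor

def fit_binary(X, y, cls):
    # Toy "fit": relabel as cls-vs-rest and count the positives.
    target = [1 if label == cls else -1 for label in y]
    return sum(1 for t in target if t == 1)

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 1, 1, 2]
classes = sorted(set(y))

# Each class is an independent job; results are gathered at the end.
with ThreadPoolExecutor() as ex:
    results = list(ex.map(lambda c: fit_binary(X, y, c), classes))

print(results)  # → [1, 2, 1]
```

Since the jobs only read `X` and `y` and return their own result, both threading and multiprocessing backends are semantically correct here; the choice only affects pickling and overhead.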
Yes I do know that, I will give it a shot now. |
I thought "embarrassingly" was an adjective, till I googled both together :) |
Technically, it's an adverb :) |
@mblondel I'm sorry that "now" meant 6 days later, but I believe I have made the data that each OvR loop writes in place independent of the others. Can you have a quick look? If what I have done is right, then the problem is not with the parallelization but with something deeper. |
@mblondel This seems to be a bug with joblib, I filed a bug report here, joblib/joblib#169 |
@mblondel The only solution is to have a threading based backend to overcome this error. wdyt? |
I would start by trying to find why the dataset classes don't pickle. |
This is because of the __cinit__ constructor. I have provided a minimal example. |
How can this be fixed? |
I'm not sure if this question is rhetorical, but I think I figured it out: https://github.com/mblondel/lightning/pull/15 |
No that was an actual question ;-) |
Force-pushed from cbbc2b9 to ae80b66.
Return the coefs and errors from primal_cd modified inplace.
@mblondel Finally done with this. All tests pass, so I assume it works. I did not benchmark rigorously, but it cuts the running time of this example http://www.mblondel.org/lightning/auto_examples/document_classification_news20.html#document-classification-news20-py from about 10 seconds to about 5. |
@@ -194,7 +200,7 @@ def __init__(self, loss="squared_hinge", penalty="l2", multiclass=False,
                  warm_debiasing=False,
                  selection="cyclic", permute=True,
                  callback=None, n_calls=100,
-                 random_state=None, verbose=0):
+                 random_state=None, verbose=0, n_jobs=-1):
I would prefer n_jobs=1 by default.
Is there any reason for this?
Is this because the overhead of parallel computation would outweigh the benefits on small datasets?
- we don't want to use all cores silently
- multiprocessing has some issues on Windows
- for consistency with scikit-learn
Perhaps @ogrisel or @GaelVaroquaux know other more compelling arguments.
@mblondel done. merge? |
Give me some time. I want to carefully review this. |
Sure :) |
@mblondel Any news on this? ;) |
Finally had the time to review and merge. Sorry it took so long! Thank you @MechCoder :) |
Great, I thought there were merge conflicts. Did you resolve them manually? |
The conflicts were in the generated cpp so I just regenerated it. |
Thanks! |
@mblondel I was trying to get some speed gains by parallelizing the OvR method. However, when I set n_jobs > 1, it keeps failing with this error: TypeError: __cinit__() takes exactly 1 positional argument (0 given). Note that it works as expected for n_jobs=1.