Improved Parallelization for Training and Inference #129
Conversation
Right, we would have to add a check for the sklearn version to ensure it is available. You could move the creation of self.pool into the SSVM base object. I think even just using a pre-instantiated Pool would work; you just have to do more of the annoying error handling required when you use multiprocessing directly. You could also conditionally instantiate and terminate self.pool in the fit method. |
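For illustration, a minimal sketch of the "conditionally instantiate and terminate self.pool in the fit method" option; `SSVMSketch` and `_inference_pass` are placeholder names, not actual pystruct code:

```python
from multiprocessing import Pool


class SSVMSketch(object):
    """Toy stand-in for an SSVM learner; only the pool handling is shown."""

    def __init__(self, n_jobs=1, max_iter=10):
        self.n_jobs = n_jobs
        self.max_iter = max_iter
        self.pool = None

    def _inference_pass(self, X, Y):
        # placeholder for the per-iteration (loss-augmented) inference,
        # which would dispatch work through self.pool when it exists
        pass

    def fit(self, X, Y):
        if self.n_jobs != 1:
            # spawn once and reuse across iterations instead of paying the
            # process start-up cost on every call
            self.pool = Pool(None if self.n_jobs == -1 else self.n_jobs)
        try:
            for _ in range(self.max_iter):
                self._inference_pass(X, Y)
        finally:
            if self.pool is not None:
                self.pool.terminate()  # tear the pool down when fit ends
                self.pool = None
        return self
```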
Thanks for the PR. |
Hey, sure! So currently I am playing around with this on an r3.4xlarge ec2 box that I have running for other tasks, so 16 vCPUs and 122 GB of RAM; not exactly everyday capacity. I can't really share the data that I am using, but it is text data and I am currently just looking at the MultiLabelClf and the GridCRF models. I will mock up a more detailed example to show you what I'm talking about on data that looks more similar to mine. But just using your multi_label.py example, adding `n_jobs=-1`, running first with my current master branch, using vanilla Pool not MemmapingPool:
and then with your master, using Parallel:
I'll run a couple other examples to get similar comparisons for other datasets and models. |
Sorry, did you mean |
Yes |
These are the changes to the multi_label.py example:

```python
full_ssvm = OneSlackSSVM(full_model, inference_cache=50, C=.1, tol=0.01, n_jobs=-1)
tree_ssvm = OneSlackSSVM(tree_model, inference_cache=50, C=.1, tol=0.01, n_jobs=-1)
independent_ssvm = OneSlackSSVM(independent_model, C=.1, tol=0.01, n_jobs=-1)
``` |
That looks pretty promising. I want to run a couple more tests on the larger graphs (or you can, if you like). I'd like to understand what is happening a bit better; maybe @ogrisel can help. I did not think |
No, it's even worse on larger models. The models I'm playing around with are much larger than these, and the problem gets worse with larger input matrices and actually worse with each iteration. I have run into this before in other contexts. I think it is just because Parallel re-instantiates the pool with every call, so everything has to get copied to the subprocesses again on each loop (or whatever the correct way of saying that is :)). Yeah, I'd be happy to run them; did you have any specific examples in mind? Btw, here are the benchmarks for multilabel.py on the yeast dataset: My master, using Pool:
and current master using Parallel:
Sidenote: kudos on this package, it's very exciting! |
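Regarding the pool re-creation overhead described above, a rough illustration of the two patterns being compared (a toy benchmark, written under the assumption that Parallel sets up and tears down its workers on every call, as the joblib of this era does):

```python
from multiprocessing import Pool
from joblib import Parallel, delayed


def work(x):
    return x * x


if __name__ == '__main__':
    args = range(1000)

    # one Parallel call per iteration, as in the current Parallel-based code path:
    # the worker start-up / argument-copy cost is paid on every iteration
    for iteration in range(50):
        results = Parallel(n_jobs=4)(delayed(work)(x) for x in args)

    # one long-lived pool reused across iterations, as in this PR:
    # the start-up cost is paid once
    pool = Pool(4)
    for iteration in range(50):
        results = pool.map(work, args)
    pool.terminate()
```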
Thanks. I hope to improve it a lot in the near future, with respect to documentation, flexibility and sparse matrix support. |
OK, those two are running now. Taking the approach I have here, with self.pool, I'd have to use https://docs.python.org/2/library/copy_reg.html to register a pickler that is aware of the pool, ignores it when dumping, and recreates it if necessary when loading. I've done that before for very similar use cases; it's not that painful. I think that's why the checks are failing? |
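For reference, the copy_reg pattern being described looks roughly like this (a toy sketch; `PoolHolder` just stands in for an object that carries a live pool):

```python
import copy_reg  # spelled `copyreg` on Python 3
from multiprocessing import Pool


class PoolHolder(object):
    """Stand-in for a learner that keeps a live Pool as an attribute."""

    def __init__(self, n_jobs=2):
        self.n_jobs = n_jobs
        self.pool = Pool(self.n_jobs)


def _rebuild(state):
    obj = PoolHolder.__new__(PoolHolder)
    obj.__dict__.update(state)
    obj.pool = Pool(obj.n_jobs)  # recreate the pool when loading
    return obj


def _reduce(obj):
    # drop the unpicklable pool from the state before dumping
    state = dict((k, v) for k, v in obj.__dict__.items() if k != 'pool')
    return _rebuild, (state,)


copy_reg.pickle(PoolHolder, _reduce)
```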
Not seeing as much of a difference in these examples: image segmentation benchmarks using Pool:
and using Parallel:
and plot_snakes.py, using Pool:
and using Parallel:
|
Also, following up on the sidenote: I have done some work to get sparse matrix support. Have you already made lots of progress on that? If so, maybe a feature branch would make sense, so no one reinvents your wheels? |
Yes I would like to experiment with a version of joblib that would be able to reuse an existing instance of a pool of worker processes. Typically I would like joblib to create a single pool with |
The default multiprocessing Pool is very restricted. The
|
@ogrisel Yes, my intention definitely would be to use MemmapingPool in any real implementation; I have reverted to Pool just because one of the Travis checks is using a version of scikit-learn without the
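The kind of version guard being discussed might look something like this (a hedged sketch; the exact import path and constructor arguments of MemmapingPool depend on the joblib / scikit-learn version in use):

```python
import numpy as np

try:
    # recent joblib versions ship a Pool subclass that hands large numpy
    # arrays to the workers as memory-mapped files instead of copying them
    from joblib.pool import MemmapingPool as PoolClass
    pool_kwargs = {'max_nbytes': 1e6, 'mmap_mode': 'r'}
except ImportError:
    from multiprocessing import Pool as PoolClass
    pool_kwargs = {}


def row_norms(chunk):
    return np.sqrt((chunk ** 2).sum(axis=1))


if __name__ == '__main__':
    X = np.random.rand(2000, 500)
    pool = PoolClass(processes=4, **pool_kwargs)
    try:
        results = pool.map(row_norms, np.array_split(X, 4))
    finally:
        pool.terminate()
```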
@ogrisel How were you thinking of referencing the pool, namespace wise? I have had difficulty figuring out a way to do this in other applications that didn't involve ugly global hacks, if that makes sense, but I may be missing some key piece. |
@amueller So if I did some further work to try and implement this approach more fully, would you consider inclusion in the project (obviously after review, approval and whatever else), or would you want to build it out yourself? |
If it does interest you, I'm happy to take marching orders before undertaking it. I will probably do some version of it for use internally at my job, but would rather do it in a way that can be pushed back into the project and conform to your standards and practices. |
I enabled use of the learner pool for inference as well; it shaved a few more seconds off the multilabel examples and gave some noticeable improvement on the other examples, rerunning those now. Still failing four tests in the crammer-singer test file, can't figure out why. multilabel, scenes dataset:
multilabel, yeast dataset:
|
|
@amueller I still can't figure out why those four crammer-singer tests are failing. Probably something simple I just don't know about. |
multilabel.py benchmarks, for
respectively:

```
(nlp-venv)nlp@ip-10-234-187-58:~/benchmarks$ time python multi_label.py
fitting independent model...
fitting full model...
fitting tree model...
Training loss independent model: 0.066612
Test loss independent model: 0.111204
Training loss tree model: 0.059868
Test loss tree model: 0.106048
Training loss full model: 0.049408
Test loss full model: 0.097408

real    1m32.018s
user    2m1.699s
sys     0m13.477s

(nlp-venv)nlp@ip-10-234-187-58:~/benchmarks$ time python multi_label.py
fitting independent model...
fitting full model...
fitting tree model...
Training loss independent model: 0.066612
Test loss independent model: 0.111204
Training loss tree model: 0.059868
Test loss tree model: 0.106048
Training loss full model: 0.049408
Test loss full model: 0.097408

real    0m49.551s
user    1m29.923s
sys     0m2.540s

(nlp-venv)nlp@ip-10-234-187-58:~/benchmarks$ time python multi_label.py
fitting independent model...
fitting full model...
fitting tree model...
Training loss independent model: 0.066612
Test loss independent model: 0.111204
Training loss tree model: 0.059868
Test loss tree model: 0.106048
Training loss full model: 0.049408
Test loss full model: 0.097408

real    1m46.923s
user    2m53.827s
sys     0m16.664s

(nlp-venv)nlp@ip-10-234-187-58:~/benchmarks$
``` |
After some extensive testing, I'm actually leaning towards having vanilla |
Hey. Just wanted to say this is not forgotten ;) I'll try to work on it next week. I had a bunch of scikit-learn stuff to do as we want to release soon. Btw, have you looked into sparse matrix support any more? |
No worries, I follow scikit-learn repo, so I figured that was it, given the crazy number of times you've popped up in the newsfeed. :) RE: sparse matrix support, I think I am missing some fundamental pieces of how to get there from here, honestly. The stuff I came up with is pretty janky and probably not a good starting point. I have been using LSA (TruncatedSVD) in my prototype, to temporarily get around the sparse support issue. I am definitely willing and able to work on it but I think I would need a bit of guidance from you to make real progress. If you have a bit of time, maybe you could do a high level brain dump of how you envision the best way to implement it and I could work off of that. |
If you don't have a working prototype, I think I'll just try and work it out. I wouldn't want to point you in the wrong direction. |
Yes, I think that's probably best, if you have the time. |
Conflicts:
    pystruct/learners/subgradient_latent_ssvm.py
    pystruct/utils/inference.py
Hey, so I am trying to get my master (with the parallelization changes) up to date with your changes, and I am failing some tests in the docs and in python3. I am currently using my fork as the dependency for stuff I am working on. Totally willing to do the work to make those tests pass, but I didn't want to waste time if you think my implementation is a nonstarter. Again, no worries if so; I am actually reusing the parallel utils I created here in other stuff, so it wasn't a waste of time on my end regardless. :) |
I'm sorry I still haven't looked at your contribution.
|
It looks like the doctest failures are actually caused by your changes. Cheers,
|
Sorry, yeah, my comment wasn't all that clear. |
I think your approach is good, and it would be great if you could try to
|
Ok will do. I actually have made a bit more progress on sparse matrix support, still pretty hacky but I will put it up in a feature branch if you want to take a look. |
Cheers, |
It would be interesting to bench your approach against the improved joblib here: It still allocates many more pools than necessary, so it might not be as good. |
Also, pystruct has not gotten enough attention lately, sorry :-/ |
The branch I mentioned above was merged into scikit-learn. Can you check your patch against current pystruct with the dev branch of scikit-learn, please? |
Also there is a new context manager API in the work at: joblib/joblib#221. Feel free to test that branch with the monkey-patch mentioned in the first comment. |
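For reference, the context-manager API being discussed in joblib/joblib#221 has roughly this shape (hedged sketch; exact behaviour depends on the branch): every call made inside the `with` block reuses the same Parallel instance and hence its worker processes.

```python
from math import sqrt
from joblib import Parallel, delayed

# the pool of workers lives for the whole `with` block instead of being
# re-created on every Parallel(...) call
with Parallel(n_jobs=2) as parallel:
    for iteration in range(10):  # e.g. one call per cutting-plane iteration
        results = parallel(delayed(sqrt)(i) for i in range(100))
```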
I'll try and do this by the end of this weekend. Bit swamped at the moment. |
Hey a long time later, is this still an optimization you're interested in pursuing? If you've come up with a better solution that I missed or this isn't the direction you want to go, happy to close this PR. |
Oh, I see that I said I would try against the latest joblib. I can do that if you haven't, if this is still something worth looking into. |
Hey. Yeah, so the latest joblib has reusable pools, so that might be a simpler solution than yours. I haven't benchmarked it, and I think it would be worth a try. I'd love to have the better of the two in ;) |
Jan 8, 2015 EDIT: Adding TL;DR to this original comment to make it easier to review, since this PR has become something of a monster.
Summary of changes:
- Added mixin class `ParallelMixin` in new `pystruct.utils.parallel` that does the following:
  - `_spawn_pool` to spawn a pool as an attribute: a `ThreadPool`, `MemmapingPool` or `Pool`, depending on ecosystem and parameters
  - `parallel` with the same API as the standard library `map`, taking as args the function to be mapped and a mappable args iterable
  - `__getstate__` and `__setstate__` handle the pool attribute
  - the `parallel` method handles `KeyboardInterrupt` somewhat gracefully, allowing for user interruption that stops training but preserves the model object and pool in usable forms
- Removed `Parallel` calls and any parallelization logic from everywhere in the module other than `utils.parallel`
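A rough sketch of the shape of the mixin (heavily simplified; the actual `pystruct.utils.parallel` code also handles the `ThreadPool` case and more careful interrupt recovery, as listed above):

```python
from multiprocessing import Pool, cpu_count

try:
    from joblib.pool import MemmapingPool  # may be unavailable in old setups
except ImportError:
    MemmapingPool = None


class ParallelMixin(object):
    """Simplified sketch of the mixin described in the summary above."""

    def _spawn_pool(self):
        n_jobs = cpu_count() if self.n_jobs == -1 else self.n_jobs
        pool_cls = MemmapingPool if MemmapingPool is not None else Pool
        self.pool = pool_cls(processes=n_jobs)

    def parallel(self, func, args):
        # same call signature as the built-in map
        if getattr(self, 'n_jobs', 1) == 1:
            return list(map(func, args))
        if getattr(self, 'pool', None) is None:
            self._spawn_pool()
        try:
            return self.pool.map(func, args)
        except KeyboardInterrupt:
            # stop training but leave the learner usable: discard the
            # interrupted pool so the next call spawns a fresh one
            self.pool.terminate()
            self.pool = None
            raise

    def __getstate__(self):
        # a live pool is not picklable, so drop it when dumping
        state = self.__dict__.copy()
        state.pop('pool', None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.pool = None  # respawned lazily by parallel()
```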
Original comment: