Switch to parallel gem #43
Yeah, I'd rather see this against master, otherwise there are random changes that make it hard to see what's actually changing here. Is file descriptor contention an issue you're seeing with the current multi-process scheme, rather than oversubscription of the processors?
|
I'm pretty sure the issue is what the current master code does:
I remember getting a sluggish machine, and it would sometimes start throwing "Too many open file descriptors" errors out of this. I don't know if it's CPU contention, SSD thrashing or whatever, but docurium was definitely mostly unusable here, as there is no guarantee that any run will be a complete success, which sometimes causes complete versions to be missing from the final repository.
|
Right, I was curious as to how you pinpointed file descriptor contention as the source of the issue. I'm pretty sure it's due to CPU oversaturation, but I'd be impressed if we managed to thrash on an SSD ;) The issue with too many open files is most likely due to lack of garbage collection on the repository, unless your ulimits are low. This gem seems really useful, but I'd still want the ability to set how many processes I want to run.
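For anyone wanting to rule out the low-ulimit case mentioned above, here is a minimal sketch that reads the open-file limit of the current process from Ruby. It only uses the standard `Process.getrlimit` call, and it assumes a Unix-like system where `RLIMIT_NOFILE` is defined.

```ruby
# Read the soft and hard limits on open file descriptors for this process.
# Assumes a Unix-like platform; RLIMIT_NOFILE is not available everywhere.
soft_limit, hard_limit = Process.getrlimit(Process::RLIMIT_NOFILE)

puts "open files: soft limit #{soft_limit}, hard limit #{hard_limit}"
```

If the soft limit turns out to be small compared to the number of worker processes times the files each one keeps open, that would point at ulimits rather than at the repository itself.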
|
It's possible, via either |
|
It looks like we had some mismerges into master. As I'm currently restricted to my work laptop, and the master branch makes it really unhappy, a patch is going to land here to fix things.
Due to the `if` we'd return `nil` if the message was not an array.
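To illustrate the pitfall described in that commit message: a method whose last expression is guarded by a modifier `if` evaluates to `nil` when the condition is false. The sketch below uses hypothetical method names (`first_entry_buggy`, `first_entry_fixed`) and is not the actual docurium code or the actual fix.

```ruby
# Hypothetical helper: the trailing `if` makes the whole method return nil
# whenever `message` is not an Array.
def first_entry_buggy(message)
  message.first if message.is_a?(Array)
end

# One possible fix (an assumption, not the real patch): handle the
# non-array case explicitly so the caller never gets a surprise nil.
def first_entry_fixed(message)
  return message unless message.is_a?(Array)
  message.first
end

buggy_result = first_entry_buggy("plain string")
fixed_result = first_entry_fixed("plain string")
```

Here `buggy_result` is `nil`, while `fixed_result` is the original string.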
#26 originally removed all traces of parallelism in favor of less file-descriptor contention. It turns out Ruby has gems, and some of those provide ready-made, easy-to-use wrappers for parallel processing.
Use https://github.com/grosser/parallel instead of rolling our own, or removing parallelization altogether.
(Note that this is based on #39, for historic reasons. I can rebase that on top of master if it makes it easier to go in).
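As a rough sketch of what using the gem looks like, and of how the number of worker processes can be capped through its `in_processes` option: `generate_docs`, `generate_all` and the version list below are hypothetical stand-ins, not docurium's real code. The sketch falls back to a plain sequential `map` when the gem is not installed, purely so that it runs anywhere.

```ruby
# Returns true when the parallel gem (https://github.com/grosser/parallel)
# can be loaded, false otherwise.
def parallel_available?
  require 'parallel'
  true
rescue LoadError
  false
end

# Hypothetical stand-in for the per-version documentation work.
def generate_docs(version)
  "docs for #{version}"
end

# Hypothetical driver: run the work across at most `processes` workers.
# Parallel.map returns the results in the same order as the input.
def generate_all(versions, processes: 2)
  if parallel_available?
    Parallel.map(versions, in_processes: processes) { |v| generate_docs(v) }
  else
    versions.map { |v| generate_docs(v) }
  end
end

results = generate_all(%w[v0.1.0 v0.2.0 v0.3.0], processes: 2)
puts results
```

Capping `processes` explicitly is what keeps both CPU oversubscription and the number of simultaneously open files bounded, whichever of the two turns out to be the real culprit.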