Implement a clustermq backend with persistent workers #425

Closed
wlandau opened this issue Jun 19, 2018 · 6 comments

Comments

@wlandau
Member

wlandau commented Jun 19, 2018

https://github.com/mschubert/clustermq. Looks promising for fast transient workers.

@wlandau
Member Author

wlandau commented Jun 26, 2018

I just tried clustermq on SGE, and workers spin up super fast! I am eager to develop clustermq-based backends for both persistent and transient workers. It may be a while before I can make the time, though.

@wlandau
Member Author

wlandau commented Jun 27, 2018

From #431, just as I thought, clustermq does not solve the target-level overhead problems for persistent workers. The best way to use clustermq is transient workers with caching on the master process.

@kendonB
Contributor

kendonB commented Jun 29, 2018

As I've mentioned in another issue, caching on the master process is going to be slow for I/O heavy jobs. I've seen speed improvements on I/O heavy jobs on a GPFS file system using up to about 200 simultaneous workers. If those jobs were caching using a single process, it wouldn't be much better than doing the compute stage with a single process.

This is not to discourage you from pursuing clustermq, just to suggest considering an option to have caching done by the workers.
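
To make the trade-off concrete, here is a rough back-of-the-envelope model, not a benchmark. Every number and function name in it is hypothetical. It also assumes the file system scales perfectly with the number of workers, which is optimistic; the speedups mentioned above were seen up to about 200 workers.

```python
def master_caching_seconds(n_targets, compute_s, io_s, n_workers):
    """Optimistic estimate when one master process writes every result.

    Workers compute in parallel, but all cache writes are serialized on the
    master. Even if compute and I/O overlap perfectly, the run cannot finish
    faster than the slower of the two stages.
    """
    parallel_compute = n_targets * compute_s / n_workers
    serialized_io = n_targets * io_s
    return max(parallel_compute, serialized_io)


def worker_caching_seconds(n_targets, compute_s, io_s, n_workers):
    """Estimate when each worker caches its own results.

    Assumes (hypothetically) that the file system handles n_workers
    simultaneous writers with no slowdown.
    """
    return n_targets * (compute_s + io_s) / n_workers


if __name__ == "__main__":
    # Hypothetical I/O-heavy workload: caching costs more than computing.
    n_targets, compute_s, io_s, n_workers = 1000, 1.0, 2.0, 200
    single_process_compute = n_targets * compute_s
    print("single-process compute:", single_process_compute)
    print("master caching:", master_caching_seconds(n_targets, compute_s, io_s, n_workers))
    print("worker caching:", worker_caching_seconds(n_targets, compute_s, io_s, n_workers))
```

In this model, whenever caching a target costs at least as much as computing it, master-side caching is bounded below by the time a single process would need for the compute stage alone, however many workers are added.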

@wlandau
Member Author

wlandau commented Jun 29, 2018

Sure, I understand. I am thinking clustermq would be ideal for situations with large numbers of small targets, where the overhead of launching remote jobs and accessing NFS is the bottleneck.

@kendonB
Contributor

kendonB commented Jul 4, 2018

One thing for the to-do list here would be to expose the template argument to clustermq::workers.

@wlandau
Member Author

wlandau commented Aug 8, 2018

Re #425 (comment): great idea, @kendonB. I did not have enough bandwidth to address this detail back in July, but I just added it in 3288dc5.

@wlandau wlandau changed the title Consider a new clustermq backend Implement a clustermq backend with persistent workers Aug 11, 2018