Add a nonblocking addprocs_sge (and others) #80

Closed
cako opened this issue Dec 11, 2017 · 5 comments

cako (Contributor) commented Dec 11, 2017

Currently, the use of addprocs_sge (and the others) is blocking: it must obtain the node before it returns. For example, applying @async to a command that calls addprocs_sge is useless.

I would like to know whether it is possible to add a non-blocking version of addprocs_sge. I am happy to help with the coding, but I am not overly familiar with the addprocs structure.

I suggested a workaround in a question on StackOverflow, but I am not sure my approach to the issue is recommended.
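
To illustrate (worker counts arbitrary, assuming ClusterManagers.jl and Distributed are loaded), this is the kind of non-blocking pattern I have in mind, which does not currently work:

using Distributed, ClusterManagers

# What I would like: queue the requests and keep working while SGE schedules the jobs.
# Today addprocs_sge must obtain the node before it returns, so wrapping it in @async
# like this does not help.
tasks = [@async addprocs_sge(1) for _ in 1:4]
# ... other work would go here while the jobs sit in the queue ...
ids = reduce(vcat, fetch.(tasks))   # worker ids, once the jobs have actually started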

bjarthur (Collaborator) commented

If your SGE cluster supports qrsh, you might try addprocs_qrsh instead. IIRC, that does not block.
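
For example (worker count arbitrary):

using Distributed, ClusterManagers

addprocs_qrsh(4)   # qrsh-based workers instead of qsub-based ones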

cako (Contributor, Author) commented Dec 13, 2017

Thanks @bjarthur, this works for me. Unfortunately, addprocs_qrsh cannot take res_list like the others, but that should be easy enough to change. If I have time in the next few days, I will try to sort it out.

I should add that since addprocs_qrsh creates an SSH tunnel to the worker, it is wise to disable .bash_profile on the worker; otherwise, commands defined inside it can interfere with the communication between master and worker.

EDIT: Added pull request #82 to include res_list support in addprocs_qrsh
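
For reference, a sketch of the kind of call #82 is meant to enable (the resource string is only an example and depends on the cluster):

using Distributed, ClusterManagers

addprocs_qrsh(2, res_list="h_vmem=4G")   # pass an SGE resource list, as addprocs_sge already allows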

cako (Contributor, Author) commented Dec 13, 2017

Actually, upon trying this with more workers, it still does not submit the jobs asynchronously: in the gist I linked, each processor must be added before the next job can be queued. You can test it by running the script with, say, 10 jobs and executing watch qstat in another window; you will notice that new jobs are only submitted after the previous one has started running. The only way I have managed to remove that restriction is by replacing

id = addprocs_qrsh(1)

by

qrsh = QRSHManager(1, "");
Distributed.cluster_mgmt_from_master_check()
id = Distributed.addprocs_locked(qrsh; qsub_env="")

In this case, all jobs will be queued at the same time. For some reason SGEManager is a bit more stable for me when running many jobs, so for now I am sticking to that.
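
For context, here is a rough, untested sketch of how the substitution above can be used to queue several workers at once; it relies on Distributed internals, so treat it as an illustration rather than a supported API:

using Distributed, ClusterManagers

# Queue several single-worker qrsh requests concurrently by bypassing the lock
# that addprocs normally takes around addprocs_locked.
tasks = map(1:10) do _
    @async begin
        qrsh = QRSHManager(1, "")                     # one worker, default queue
        Distributed.cluster_mgmt_from_master_check()  # same sanity check addprocs performs
        Distributed.addprocs_locked(qrsh; qsub_env="")
    end
end
ids = reduce(vcat, fetch.(tasks))   # worker ids, once all jobs have started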

juliohm (Collaborator) commented Oct 6, 2020

@cako a non-blocking addprocs_* would be a nice feature to have. We are trying to get this package back in shape, so please submit a PR if it is not yet available. I am closing this issue as too old; feel free to open another one or submit the PR directly.

juliohm closed this as completed on Oct 6, 2020
oameye commented Feb 1, 2023

Has a non-blocking addprocs ever been added? Is there a solution?
