Is it possible to use "immediate-submit" and "job dependencies"? #10

Open
kimin0402 opened this issue Apr 5, 2020 · 7 comments

@kimin0402

Hi,

Thank you for this wonderful profile. I was wondering whether it is possible to set "immediate-submit: true" in the config.yaml file and still use job dependencies with the LSF profile.

This kind of setting worked well with the PBS profile, but it causes trouble with the LSF profile because lsf-submit.py cannot handle the -w option of LSF.

How can I use this profile so that my Snakemake workflow submits all of its jobs at once, each with the appropriate dependency options?

@mbhall88
Member

mbhall88 commented Apr 6, 2020

Hi @kimin0402 ,

If I understand your enquiry correctly, you are wondering whether we could support job dependencies i.e. only submit a job if a given dependency expression is true (described in the documentation for -w here)?

Snakemake is effectively handling job dependencies already for you, isn't it? I am just wondering what an example use-case for this is?

@kimin0402
Author

Hi @mbhall88,

Yes, I was wondering whether the LSF profile could support job dependencies by using the -w option of LSF. Snakemake does handle job dependencies itself, but not when "immediate-submit" is set to true in the config file, in which case the scheduler has to enforce them (see this link for the --immediate-submit option: https://snakemake.readthedocs.io/en/stable/executing/cli.html).

With this option, you can execute a Snakemake workflow and all of the job scripts it generates are submitted to the cluster at once; those that have dependencies are submitted by the submit.py wrapper via bsub with '-w {dependencies}'. Right now, when I run Snakemake with this LSF profile, the shell in which I started it stays occupied by Snakemake, which waits until each upstream job has finished. If immediate-submit is set to true, all scripts are submitted to the cluster and I can do other work as soon as Snakemake has finished submitting them. The scripts with dependencies are submitted with the -w argument and shown as "PEND" (or "H" on a PBS cluster) in the queue list.

@mbhall88
Member

mbhall88 commented Apr 6, 2020

I see. And how would you describe the dependencies?

Regarding your use case where your shell is constantly running snakemake I would strongly recommend submitting the "master" snakemake process as a job rather than letting it run on the login node. See an example script I use for exactly this with all of my snakemake pipelines.

@kimin0402
Author

Thank you for your example script. When 'immediate submit' is not activated, this seems to be the best way to run snakemake with dependencies. However, if you run this master script and it has rules (A, B, C) whose dependencies require the order A -> B -> C, my understanding is that snakemake does not submit the job script for B until the job for A has finished.

In this case, if other people submit job scripts while job A is running, there will be lots of scripts queuing between job A and job B. What I want to do is submit job scripts A and B together, and specify the dependency of job B with the -w argument of LSF. So my jobs would appear like this:
RUN job.A.sh
PEND job.B.sh

and then
DONE job.A.sh
RUN job.B.sh
RUN other_script_1.sh
RUN other_script_2.sh
RUN other_script_3.sh

whereas with a master script submitted, the job queue would first look like this:
RUN master_script.sh
RUN job.A.sh

and then
RUN master_script.sh
DONE job.A.sh
RUN other_script_1.sh
RUN other_script_2.sh
RUN other_script_3.sh
RUN job.B.sh

I hope this explanation clarifies the question a little. This behaviour is possible with the pbs-torque profile, so I was wondering whether the LSF profile could do it too.
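
To make the A -> B -> C scenario concrete, here is a minimal sketch of the commands that an up-front submission would produce. It does not call bsub; the function name plan_submissions, the job ids (1001, 1002, ...) and the script names are made up for illustration, and the explicit done(...) condition is just one way to write the -w expression.

```python
from typing import Dict, List


def plan_submissions(order: List[str], deps: Dict[str, List[str]]) -> List[str]:
    """Return one bsub command line per job, in submission order.

    Job ids are invented (1001, 1002, ...) so the sketch runs without a cluster.
    """
    job_ids: Dict[str, int] = {}
    commands: List[str] = []
    for offset, name in enumerate(order):
        # Upstream jobs must already have an id, i.e. be submitted earlier.
        upstream = [f"done({job_ids[dep]})" for dep in deps.get(name, [])]
        expression = " && ".join(upstream)
        wait = f" -w '{expression}'" if expression else ""
        commands.append(f"bsub{wait} job.{name}.sh")
        job_ids[name] = 1001 + offset
    return commands


if __name__ == "__main__":
    for line in plan_submissions(["A", "B", "C"], {"B": ["A"], "C": ["B"]}):
        print(line)
```

All three jobs enter the queue immediately; B and C stay pending until the job they wait on is done, so they keep their place ahead of scripts submitted later by other users.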

@mbhall88
Member

mbhall88 commented Apr 7, 2020

Yes, I understand your use-case. Would something like what I am proposing in #7 and #13 meet your needs? I think it should, but I am just not sure how the dependency specification would work in that case.

@kimin0402
Author

kimin0402 commented Apr 10, 2020

Hey @mbhall88, sorry for the late reply. Posts #7 and #13 are indeed good ideas. Post #7 in particular helped me figure out how to assign different -q options to different rules. Thanks a lot.

By the way, I think I figured out this dependency issue, mainly by adapting scripts from the pbs-torque profile. This is what I changed:

1) Open ~/.config/snakemake/lsf/config.yaml,
1-1) change the cluster specification from
cluster: "lsf-submit.py"
to
cluster: "lsf-submit.py --depend \"{dependencies}\""

1-2) add:

immediate-submit: true
notemp: true

2) Edit ~/.config/snakemake/lsf/lsf-submit.py,
2-1) Add following lines:

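import argparse

# Read the job ids that Snakemake substitutes for {dependencies}
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("--depend", help="Space separated list of ids for jobs this job should depend on.")
parser.add_argument("positional", action="append", nargs="?")
args = parser.parse_args()

# Turn "id1 id2" into " -w 'id1 && id2' "; stays empty without dependencies
depend = ""
if args.depend:
    # split() tolerates repeated, leading, or trailing spaces
    depend = " && ".join(args.depend.split())
if depend:
    depend = f" -w '{depend}' "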

2-2) change the variable "submit_cmd" so that it includes the string variable "depend", e.g.:

submit_cmd = "bsub {resources} {job_info} {queue} {dep} {jobscript}".format(
    resources=resources_cmd,
    job_info=jobinfo_cmd,
    queue=queue_cmd,
    dep=depend,
    jobscript=jobscript,
)

I removed the cluster_cmd variable because it kept picking up the arguments passed via the --depend option of argparse as system arguments.

This seems to submit all job scripts at once with their dependencies specified. There might be some problems in the future, but it has worked fine for me so far.
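
One detail this approach relies on: per the Snakemake documentation for --immediate-submit, {dependencies} is filled from the job id that the submit script prints on its first stdout line. Below is a minimal sketch of pulling that id out of the usual bsub message; the helper name extract_job_id and the sample message are illustrative assumptions, and the profile's lsf-submit.py may already do something equivalent.

```python
import re
from typing import Optional

# bsub normally reports e.g. "Job <12345> is submitted to queue <normal>."
JOB_ID_PATTERN = re.compile(r"Job <(\d+)>")


def extract_job_id(bsub_stdout: str) -> Optional[str]:
    """Return the numeric job id from bsub's message, or None if absent."""
    match = JOB_ID_PATTERN.search(bsub_stdout)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical bsub output, used here instead of a real submission.
    message = "Job <12345> is submitted to queue <normal>."
    # Print only the id, so the caller can pass it on as a dependency.
    print(extract_job_id(message))
```

If the id cannot be found, returning None lets the wrapper fail loudly rather than hand an empty dependency to the next job.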

@mbhall88
Member

@kimin0402 this is super cool!! Really neat solution.

Would you be interested in creating a PR on the development branch to add this support? If you start a PR I can help with tweaking this a little to work in combination with the lsf.yaml and add some testing.
