
Advanced DAG partitioning #232

Merged: 5 commits merged into master from group-components on Sep 16, 2020

Conversation

johanneskoester (Contributor) commented on Feb 14, 2020

  • Allow defining groups via the command line (`--group rulename1=groupname rulename2=groupname`)
  • Allow groups to span multiple connected components (only 1 by default) (`--group-components groupname=3 groupname2=5`), as sketched below

This solves all remaining use cases for DAG partitioning.
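
For illustration, a rough sketch of how the two flags could be combined in a cluster invocation (the rule names `map_reads` and `sort_bam`, the group name `mapping`, and the surrounding cluster options are placeholders; the flag names follow this PR's description):

```sh
# Hypothetical example: put two rules into the same group "mapping" and let
# that group span 5 connected components, so 5 independent subgraphs of the
# DAG are submitted together as a single cluster job.
snakemake --cluster qsub --jobs 100 \
    --group map_reads=mapping sort_bam=mapping \
    --group-components mapping=5
```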

cc @cmeesters

github-actions (bot) commented

Please format your code with black: `black snakemake tests/*.py`.

sonarcloud (bot) commented on Feb 14, 2020

Kudos, SonarCloud Quality Gate passed!

0 Bugs, 0 Vulnerabilities (0 Security Hotspots to review), 0 Code Smells. No coverage information; 0.0% duplication.

mdshw5 (Contributor) commented on Apr 13, 2020

Hey @johanneskoester, I really like this idea. I have some pretty large DAGs that I currently manage by using `--batch`. One thing I've noticed is that the snakemake process has trouble tracking the outputs from a large (>10k) number of targets, so "finishing" jobs becomes rate-limiting. Would this PR address that? Otherwise, I was wondering about adding an "automatic batching" flag, where the user can decide how many times to split the DAG and then have snakemake automatically run each batch sequentially.

@johanneskoester johanneskoester marked this pull request as ready for review September 16, 2020 14:13
sonarcloud (bot) commented on Sep 16, 2020

Kudos, SonarCloud Quality Gate passed!

0 Bugs, 0 Vulnerabilities (0 Security Hotspots to review), 1 Code Smell. No coverage information; 0.0% duplication.

johanneskoester (Contributor, Author) commented

> Hey @johanneskoester, I really like this idea. I have some pretty large DAGs that I currently manage by using `--batch`. One thing I've noticed is that the snakemake process has trouble tracking the outputs from a large (>10k) number of targets, so "finishing" jobs becomes rate-limiting. Would this PR address that? Otherwise, I was wondering about adding an "automatic batching" flag, where the user can decide how many times to split the DAG and then have snakemake automatically run each batch sequentially.

Hi @mdshw5. This is rather meant for submitting jobs together to one physical node (which already works vertically, while this PR enables doing the same for disconnected parts of the DAG). However, an automatic batching option is surely a good idea. In principle, you can already get that by running snakemake in a for loop with `--batch`, as sketched below. Nevertheless, it doesn't hurt to add this internally.
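
A rough sketch of that workaround, assuming the usual `--batch RULE=i/N` form and a placeholder aggregating rule named `aggregate`:

```sh
# Hypothetical manual batching: split the inputs of rule "aggregate" into 10
# batches and run them sequentially, one snakemake invocation per batch.
N=10
for i in $(seq 1 $N); do
    snakemake --cores 8 --batch aggregate=$i/$N
done
```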

@johanneskoester johanneskoester merged commit aff0b57 into master Sep 16, 2020
@johanneskoester johanneskoester deleted the group-components branch September 16, 2020 15:34
lpla (Contributor) commented on Sep 25, 2020

Hi. Would this feature allow submitting jobs as a "job array" instead of as a job group on a single node? #301

Job arrays would be able to run many small grouped jobs at a larger scale (e.g., on all available nodes instead of only one) with only one submission to the scheduler.
