Snakemake 8 not usable with Slurm clusters offering GPUs #2701

Closed
w8jcik opened this issue Feb 15, 2024 · 10 comments
Labels
bug Something isn't working

Comments

@w8jcik

w8jcik commented Feb 15, 2024

In order to use GPUs one needs to be able to ask Slurm for a GPU.

The usual way to do this is --gres gpu:1 or --gpus 1 switches, which both go to slurm_extra in the Snakemake profile.

slurm_extra does not work in Snakemake 8, it is rejected.

You might be aware of the issue, but maybe you think it is already fixed.

I am creating this issue as a reminder.

Prior discussion snakemake/snakemake-executor-plugin-slurm#18

Some workflows really need GPUs, otherwise they do not finish in a feasible time.

w8jcik added the bug label Feb 15, 2024
@cmeesters
Contributor

As far as I can see, a profile with

slurm_extra: "'--gres gpu:1'" 

gets accepted on submission, but fails in a job context, right?

@cmeesters
Contributor

I will prepare a simple PR.

cmeesters added a commit that referenced this issue Feb 16, 2024
The issue is caused by the attempt to parse CLI strings like

'--set-resources "<rule>:slurm_extra='--flag=arg'"'

The solution is simply to ignore all values starting with
a hyphen.

Will also fix issues #18 and #19 of the SLURM executor plugin.
@w8jcik
Author

w8jcik commented Feb 16, 2024

> As far as I can see, a profile with
>
> slurm_extra: "'--gres gpu:1'"
>
> gets accepted on submission, but fails in a job context, right?

That is correct.

cmeesters added a commit that referenced this issue Feb 16, 2024
As workflow users may put a space before a hyphen, the previous
fix might fail! As hyphens may not occur in any Python variable name,
checking for hyphens is more stable.
cmeesters added a commit that referenced this issue Feb 16, 2024
As users may enter a space before the hyphen, we just lstrip the
value.
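
The approach described in these commit messages (strip leading whitespace, then treat anything that starts with a hyphen as a literal string instead of evaluating it) might look roughly like the sketch below. This is an illustration only, not Snakemake's actual code: `parse_resource_value` is a hypothetical name, and `ast.literal_eval` is an assumed stand-in for whatever evaluation Snakemake performs.

```python
import ast


def parse_resource_value(raw: str):
    """Hypothetical helper: decide whether a resource value is a flag or an expression."""
    stripped = raw.lstrip()  # users may put a space before the hyphen
    if stripped.startswith("-"):
        # Looks like a command-line flag such as --gres=gpu:1: keep it verbatim.
        # Caveat of this sketch: a negative number like "-1" also stays a string.
        return stripped
    try:
        # Assumed stand-in for evaluating the value as a Python expression.
        return ast.literal_eval(stripped)
    except (ValueError, SyntaxError):
        # Anything that is not a valid literal is passed through unchanged.
        return stripped


flag = parse_resource_value(" --gres=gpu:1")
threads = parse_resource_value("4")
tmpdir = parse_resource_value("'/home/user/tmp'")
```

With this sketch, the flag survives as the string `--gres=gpu:1`, while `4` becomes an integer and the quoted path becomes a plain string.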
@w8jcik
Author

w8jcik commented Feb 21, 2024

Should PR #2711 ("various bug fixes for resource parsing") fix the issue?

I have tried the change by updating Snakemake to 8.4.12.

  1. With the following profile:
default-resources:
  ...
  slurm_extra: "'--gres=gpu:1'"

it fails when it gets a slot in the Slurm queue

Error:
  WorkflowError:
    Failed to evaluate resources value '--gres=gpu:1'.
        String arguments may need additional quoting. E.g.: --default-resources "tmpdir='/home/user/tmp'" or --set-resources "compute1:slurm_extra='--nice=100'".
    SyntaxError: invalid syntax (<string>, line 1)
  2. With the following profile:
default-resources:
  ...
  slurm_extra: "--gres=gpu:1"

it fails right away

Error:
  WorkflowError:
    Failed to evaluate resources value '--gres=gpu:1'.
        String arguments may need additional quoting. E.g.: --default-resources "tmpdir='/home/user/tmp'" or --set-resources "compute1:slurm_extra='--nice=100'".
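
Both error messages are consistent with the resource value being evaluated as a Python expression, which is what the `SyntaxError` and the quoting hint suggest. A minimal sketch of that behaviour follows; `evaluate` is a hypothetical helper and `ast.literal_eval` is an assumed stand-in, not Snakemake's real evaluation code.

```python
import ast


def evaluate(value: str):
    """Hypothetical stand-in for evaluating a resource value as a Python expression."""
    try:
        return ast.literal_eval(value)
    except SyntaxError as err:
        return f"SyntaxError: {err.msg}"


# What remains after YAML has removed the outer double quotes:
unquoted = "--gres=gpu:1"    # from slurm_extra: "--gres=gpu:1"
quoted = "'--gres=gpu:1'"    # from slurm_extra: "'--gres=gpu:1'"

result_unquoted = evaluate(unquoted)  # not a valid Python expression
result_quoted = evaluate(quoted)      # a valid string literal

# Evaluating the already-unquoted result a second time fails like the unquoted form.
result_second_pass = evaluate(result_quoted)
```

If the value loses its inner quotes on the way into the submitted job and is evaluated again there, it would fail in the same way as the unquoted form. Whether that is what happens in case 1 is an assumption, not something this thread confirms.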

@cmeesters
Contributor

cmeesters commented Feb 21, 2024

Please try: slurm_extra: "'--gres=gpu:1'". Why? Because slurm_extra: "'--gres=gpu:1 --nice=100'" ought to work, too.

However, it seems to me that this is only a partial fix. Human-readable runtime specs in a workflow profile still raise an error.
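
To illustrate why a multi-flag value such as `--gres=gpu:1 --nice=100` can be carried in a single `slurm_extra` string, here is a small sketch that splits such a string into separate `sbatch` arguments. `build_sbatch_args` is a hypothetical helper for illustration; how the SLURM executor plugin actually assembles its command line is not shown in this thread.

```python
import shlex


def build_sbatch_args(slurm_extra: str) -> list[str]:
    """Hypothetical helper: split an extra-options string into sbatch arguments."""
    # shlex.split honours shell-style quoting, so quoted values stay together.
    return ["sbatch", *shlex.split(slurm_extra)]


args = build_sbatch_args("--gres=gpu:1 --nice=100")
```

Each flag ends up as its own list element, so the scheduler would see `--gres=gpu:1` and `--nice=100` as two independent options.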

@w8jcik
Author

w8jcik commented Feb 21, 2024

> Please try: slurm_extra: "'--gres=gpu:1'".

I am not sure if I understand. This is exactly what I used and described in the previous message. Am I missing some detail?

@cmeesters
Contributor

In these threads there might be someone overlooking something, and this someone might be me. Such things just happen. I only looked at point two when writing. Sorry.

I am, however, confused altogether regarding the parsing state right now. Therefore, I will submit a few additional test cases.

@aryarm
Member

aryarm commented Mar 12, 2024

Does this issue still exist? I'm still running into it with a profile like this:

default-resources:
    slurm_partition: condo
    slurm_extra: "'--qos=condo'"

Downgrading to Snakemake 7 seems to fix the issue.

@cmeesters
Contributor

No, the issue is fixed. I do not run into any issues in this regard. If you run into any, please report them in the slurm-executor repo, and be sure to be running Snakemake v8.6 or later.

@sahuno

sahuno commented May 18, 2024

This should work!
In your config.yaml file, request GPUs for individual rules by setting slurm_extra: "'--gpus=2'" under set-resources:

set-resources:
    rule_name:
        #....
        slurm_extra: "'--gpus=2'"
