Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow the definition of process resources using a custom category annotation #623

Closed
lucacozzuto opened this issue Feb 26, 2018 · 28 comments
Milestone

Comments

@lucacozzuto
Copy link

It would be nice to have a way to classify processes according to their resources allocation for reducing the size of config file.
For example:
processes 1, 2,3,4,5 can be grouped as huge_demanding_ processes and processes 6,7,8,9,10 as little_processes. In this way you need to specify only two configurations within the config.file

@maxulysse
Copy link
Contributor

Very good idea!

@lucacozzuto
Copy link
Author

This could also be extended to containers... imagine that I have a container for processes 1,2,3 and another for 4,5,6 etc...

@pditommaso
Copy link
Member

pditommaso commented Feb 26, 2018

Yes, the point is to decouple the resource annotations from the process to make it more flexible and less verbose. What about this:

  • add the ability to annotate a process with one or more labels directive;

  • add a new label selector in the config file, eg:

      process {
          withLabel: big {
              cpus = 8 
              memory = 10.GB
              queue = 'long-queue'
          }
          withLabel: small {
              cpus = 1 
              memory = 2.GB    
              queue = 'short-queue'
          }
          withName: YourProcessName {
              cpus = 4 
              container = 'image/name'
          }
      }
    
  • the withLabel category allows the assigned of the grouped resources to all process
    declaring the specified label;

  • the withName category allows the assignment of the grouped resources/directives to the process having the named specified;

  • the withName category overrides the withLabel assignment;

  • each process defines an implicit label as the process name;

  • deprecated the current process.$name.xxx syntax.

This should address would also address the issue #621.

@pditommaso pditommaso changed the title Allow categories for processes Allow the definition of process resources using a custom category annotation Feb 26, 2018
@pditommaso
Copy link
Member

A first implementation of new process configuration via label annotations is ready to tested.

This version allow a process to define one or more labels eg:

process foo {
  label 'gpu'

  """
  your_command
  """
}

Then in the configuration you can use the the withLabel or withName categories eg:

process {
  withLabel: gpu {
     queue = 'the-gpu-queue'
     memory = 10.GB
  }
}

See also the previous comment for other details. You can give a try using the version 0.29.0-SNAPSHOT, eg:

NXF_VER=0.29.0-SNAPSHOT nextflow run .. etc

@ewels @maxulysse @lucacozzuto @msmootsgi You feedback is welcome.

@maxulysse
Copy link
Contributor

Great, I'll look into it ASAP

@apeltzer
Copy link
Contributor

Same here :-)

@lucacozzuto
Copy link
Author

lucacozzuto commented Mar 19, 2018

Nice! A simple question: why not to use the same tag: (i.e. label) for both process definitions (in both main and configuration file)?

process foo {
  label 'gpu'

  """
  your_command
  """
}


process {
  label: gpu {
     queue = 'the-gpu-queue'
     memory = 10.GB
  }
}

This will be easier to remember.

@pditommaso
Copy link
Member

pditommaso commented Mar 19, 2018

That's how it is supposed to work.

@pditommaso
Copy link
Member

Ahh, no sorry, I've misread.

Actually I've tried also that, but at the I've found more readable withLabel and withName.

@lucacozzuto
Copy link
Author

Well, let's ask also to the other guys :)

@maxulysse
Copy link
Contributor

Don't really mind.
It should probably be easier with both name being the same tag.
But as one is defining the label and the other one using it, it really make sense not to use the same tag.
I'm for withLabel

@pditommaso
Copy link
Member

pditommaso commented Mar 19, 2018

Things to do to complete this issue:

  • apply declared tag to executed K8s pod/jobs
  • include the process labels in the workflow execution report and trace file
  • should withLabel: setting overwrite specific directives defined in the main process script?
  • when printing configuration in flat format, category should be wrapped by quotes eg.
    process.'withLabel:foo'.xxx = 1
    process.'withName:bar'.yyy = 2
    
  • add integration tests

@ewels
Copy link
Member

ewels commented Mar 19, 2018

Great initiative!

deprecated the current process.$name.xxx syntax.

Are we talking about breaking backwards compatibility here? For the record, I also really like the current setup - I think it's very clear. The above would be a great addition, but personally I'd like to continue using the current syntax as well if possible.

@pditommaso
Copy link
Member

No, no breaking existing pipeline, but give a window to allow users to migrate to the new syntax eg

process {
  withName: foo { xxx = 1 }
}

The problem with process.$name.xxx syntax is that it's incredible confusing for newcomers, because people tend to confuse the $ to escape the process name with $ to interpolate config variables.

@ewels
Copy link
Member

ewels commented Mar 19, 2018

Ok, gotcha 👍 Sounds good!

@pditommaso
Copy link
Member

Nice!

@msmootsgi
Copy link
Contributor

Question about the new syntax. Say I've got a label defined like this in nextflow.config:

process {
  withName: normal {
     cpus 1
     queue 'normal'
     ...
  }
}

And then I use the label in a process. Will I be able to override values set in the label in the process itself? For example, override the number of CPUs requested like this:

process somethingHard {
    label 'normal'
    cpus 4
    input:
    ...
}

@pditommaso
Copy link
Member

pditommaso commented Mar 19, 2018

Currently it overrides the cpus setting defined in the process (main script). Do you think it should not?

@msmootsgi
Copy link
Contributor

This could just be how I write pipelines, but I put default values (and labels in the future) in nextflow.config and then process specific setting in the process in main.nf. That makes me think label values should be overridden by the process.

I find these override hierarchies kind of tricky to get right, so I think it's worth some thought. And I make no claims here - I haven't given this a ton of thought!

@pditommaso
Copy link
Member

pditommaso commented Mar 19, 2018

I tend to agree, that it could be useful it can be quite tricky to remember all these rules. Maybe just keep it simple, that setting defined in the config just overwrites the ones in the process is the a better choice. The withName category would be give a chance to specify settings in a punctual manner.

Anyhow I will keep that as an open topic.

@msmootsgi
Copy link
Contributor

As long as the rules are clearly stated, I'm OK with this.

@emi80
Copy link
Contributor

emi80 commented Mar 20, 2018

I am not sure whether we might also think about more specific settings overwriting general ones. I mean that configuration values defined at process level (e.g. withName) would overwrite settings defined in labels, irrespective of where they have been set (main pipeline script vs nextflow.config). Maybe it is a more linear approach and not so hard to remember.

Also, what about processes with multiple labels that touch common options? I think it would be better to avoid it, but in case it won't be possible what would be the order of precedence?

@pditommaso
Copy link
Member

I am not sure whether we might also think about more specific settings overwriting general ones.

If I have understood well, that's already possible. Take in consideration the following config:

process {
  cpus = 2 
  withLabel: big {
    cpus = 8 
  }
  withName: foo {
    cpus = 4 
  }
}

The above snippet defines:

  • 2 as the default number of cpus for all process (unless something else is declared in the main script).
  • 8 cpus for all processes annotated with the big label (overriding any setting in the main script)
  • 4 cpus for the process named foo

Also, what about processes with multiple labels that touch common options?

Currently the last won.

@pditommaso
Copy link
Member

pditommaso commented Apr 3, 2018

OK, another round adding the support for regulation expression and not operator! This allows to define label/process selectors with regex patterns, eg:

process {
  withLabel: 'foo.*' { cpus = 8 }
  withName: '!bar' { memory = 32.GB }
}

Please give it a try using the following command

CAPSULE_RESET=1 NXF_VER=0.29.0-SNAPSHOT nextflow info # ONLY THE FIRST TIME
NXF_VER=0.29.0-SNAPSHOT nextflow run ...etc

You can find the documentation at this links:
https://github.com/nextflow-io/nextflow/blob/master/docs/process.rst#label
https://github.com/nextflow-io/nextflow/blob/master/docs/config.rst#process-selectors

pditommaso added a commit that referenced this issue Apr 8, 2018
pditommaso added a commit that referenced this issue Apr 8, 2018
pditommaso added a commit that referenced this issue Apr 8, 2018
@pditommaso
Copy link
Member

Available in version 0.29.0-RC1. Please give a try to existing configuration files.

Note: release 0.29.x will support both .$ and the new notation to allow the migration to the new system.
Release 0.30.x will deprecated the .$ showing a warning message, and it will be removed in a future release.

@IdoBar
Copy link

IdoBar commented Mar 21, 2021

@pditommaso is it possible to define a new label and after that assign a job to the new label?
I tried that with the following, but it didn't take affect.

process {
        withLabel: 'mc_medium_long' {
          cpus = { check_max( 4 * task.attempt, 'cpus' ) }
          memory = { check_max( 8.GB * task.attempt, 'memory' ) }
          time = { check_max( 16.h * task.attempt, 'time' ) }
        }
	withName: 'genotyping*|bwamem|bowtie2|makeBT2Index' {
	  label = 'mc_medium_long'
	}

Thanks, Ido

@pditommaso
Copy link
Member

Think it's related to #1786

@IdoBar
Copy link

IdoBar commented Mar 22, 2021

@pditommaso but I'm not defining the label in a variable, that's plain text.
I'm not getting any errors, it's just that the processes are not getting assigned the new label

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants