Option to limit process requirements instead of exiting with error #640

Closed · ewels opened this issue Mar 21, 2018 · 56 comments · Fixed by #2911

@ewels (Member) commented Mar 21, 2018

If Nextflow encounters a process that requests resources that are not available on the current system, it fails. For example, if 10 processors are requested and the system only has 8, we exit with an error. This is A Good Thing 👍

However, when writing config files for a pipeline that aim to provide generic, works-for-most-situations requests, it can be nice to be able to throttle them, for example for small genomes, test datasets, or small machines.

In most of our pipelines, we now have this little helper function in nextflow.config:

// Function to ensure that resource requirements don't go beyond
// a maximum limit
def check_max(obj, type) {
  if(type == 'memory'){
    try {
      if(obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1)
        return params.max_memory as nextflow.util.MemoryUnit
      else
        return obj
    } catch (all) {
      println "   ### ERROR ###   Max memory '${params.max_memory}' is not valid! Using default value: $obj"
      return obj
    }
  } else if(type == 'time'){
    try {
      if(obj.compareTo(params.max_time as nextflow.util.Duration) == 1)
        return params.max_time as nextflow.util.Duration
      else
        return obj
    } catch (all) {
      println "   ### ERROR ###   Max time '${params.max_time}' is not valid! Using default value: $obj"
      return obj
    }
  } else if(type == 'cpus'){
    try {
      return Math.min( obj, params.max_cpus as int )
    } catch (all) {
      println "   ### ERROR ###   Max cpus '${params.max_cpus}' is not valid! Using default value: $obj"
      return obj
    }
  }
}
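
For reference, the params this function reads need to be declared somewhere in nextflow.config; a minimal sketch (the default values here are illustrative, not prescriptive):

params.max_memory = 128.GB
params.max_time   = 240.h
params.max_cpus   = 16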

Then, in each config we wrap the request in the above function, like so:

cpus = { check_max( 1 * task.attempt, 'cpus') }
memory = { check_max( 8.GB * task.attempt, 'memory') }
time = { check_max( 2.h * task.attempt, 'time') }

This way, the defaults work most of the time. But if you pass --max_memory "8.GB" then all processes will use a maximum of 8.GB and the pipeline will run on your tiny machine. This is really helpful for test datasets and allows us to have a default "base" config file that is throttled by other system-specific configs.

Is this something that could be built into Nextflow as a new feature? Keep the current default behaviour, but add new command-line arguments and config scopes to be able to throttle the maximum resources without failing?

Phil

@ewels (Member, Author) commented Mar 21, 2018

As well as / instead of these manually specified limits, there could perhaps be something similar to errorStrategy:

process {
    notEnoughResources: 'terminate' // default behaviour
    notEnoughResources: 'limit'     // throttle to available resources
    notEnoughResources: {           // limit to specified values
        cpus: 4
        memory: 8.GB
        time: 16.h
    }
}

@pditommaso (Member):

So, as a first thing I would suggest using this simplified version of the check_max function, which infers the argument type and takes the max value as the second argument.

def check_max(obj, max) {
    if( obj instanceof nextflow.util.MemoryUnit ) {
        try {
            def other = max as nextflow.util.MemoryUnit
            return obj.compareTo(other) == 1 ? other : obj
        }   
        catch( all ) {
            println "   ### ERROR ###   Max memory '${max}' is not valid! Using default value: $obj"
            return obj
        }   
    }
    if( obj instanceof nextflow.util.Duration ) {
        try {
            def other = max as nextflow.util.Duration
            return obj.compareTo(other) == 1 ? other : obj
        }   
        catch( all ) {
            println "   ### ERROR ###   Max time '${max}' is not valid! Using default value: $obj"
            return obj
        }         
    }
    if( obj instanceof Integer ) {
        try {
            return Math.min( obj, max as int )
        }   
        catch( all ) {
            println "   ### ERROR ###   Max cpus '${max}' is not valid! Using default value: $obj"
            return obj
        }         
    }    
}
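
Usage would then pass the max value explicitly, e.g. (adapting the earlier wrapping, with the same params.max_* values as before):

cpus = { check_max( 1 * task.attempt, params.max_cpus ) }
memory = { check_max( 8.GB * task.attempt, params.max_memory ) }
time = { check_max( 2.h * task.attempt, params.max_time ) }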

Still not sure if the best thing is to add a built-in function like the above or implement a more advanced feature as you are suggesting in the second comment.

@ewels (Member, Author) commented Mar 27, 2018

Ace! Thanks for the improved function, I will use that in the short term at least 👍

Still think that the core feature would be better though 😉

@pditommaso (Member) commented Mar 27, 2018

Also, the try/catch should be unnecessary, therefore it could be simplified as:

def check_max(obj, max) {
    if( obj instanceof nextflow.util.MemoryUnit ) {
          def other = max as nextflow.util.MemoryUnit
          return obj.compareTo(other) == 1 ? other : obj
    }
    if( obj instanceof nextflow.util.Duration ) {
          def other = max as nextflow.util.Duration
          return obj.compareTo(other) == 1 ? other : obj     
    }
    if( obj instanceof Integer ) {
          return Math.min( obj, max as int )  
    }    
    
    println "   ### ERROR ###  invalid check_max value=$obj"
    return obj
}

Still think that the core feature would be better though

I agree !

@ewels (Member, Author) commented Mar 27, 2018

Aha, because we're now testing the type - nice! But if supplied on the command line, will these not always be strings? I think that's why I was trying to coerce the type (and catching the exception if it failed)..

@pditommaso (Member):

I was a bit too optimistic; yes, the try/catch is needed to handle an invalid max value (I was thinking of obj instead).

@ewels (Member, Author) commented Mar 27, 2018

Could probably just have one try/catch statement wrapping the whole function, though, at least...

@pditommaso (Member):

Yep, but then we won't be able to have specific error messages anymore.

@ewels (Member, Author) commented Mar 27, 2018

You could check the type of obj to customise it.. ;) But it may end up not saving many lines of code..

@pditommaso (Member):

lol

@ewels (Member, Author) commented Mar 28, 2018

I can't resist a challenge 😆 How about this?

def check_max(obj, max) {
    def obj_types = [
        'memory': nextflow.util.MemoryUnit,
        'time': nextflow.util.Duration,
        'cpus': Integer
    ]
    // plain for loop: a 'return' inside .each would only exit the closure;
    // also, instanceof/as need a class literal, so use isInstance()/asType()
    for( entry in obj_types ) {
        if( !entry.value.isInstance(obj) )
            continue
        try {
            def other = max.asType(entry.value)
            return obj.compareTo(other) == 1 ? other : obj
        }
        catch( all ) {
            println "   ### ERROR ###   Max ${entry.key} '${max}' is not valid! Using default value: $obj"
            return obj
        }
    }
    println "   ### ERROR ###   Object type not recognised! Using default value: $obj"
    return obj
}

@pditommaso (Member):

if it works, it's good :)

@ewels (Member, Author) commented Mar 29, 2018

Haven't tested it yet.. ;)

Copying my main request back down here as we got a bit carried away by function refactoring 😁

process {
    notEnoughResources: 'terminate' // default behaviour
    notEnoughResources: 'limit'     // throttle to available resources
    notEnoughResources: {           // limit to specified values
        cpus: 4
        memory: 8.GB
        time: 16.h
    }
}

@pditommaso (Member):

I was thinking about this, however I'm not understanding how the resources are modified. Do you still need to provide a dynamic (i.e. closure) rule to increase the request when the task is re-submitted, or were you thinking of something different?

@ewels (Member, Author) commented Apr 9, 2018

Not sure I totally understand the question. But I was thinking that the syntax could remain the same as it is currently: static values would be limited (modified) at runtime by Nextflow if needed to make them smaller; dynamic rules could be used with task.attempt etc. and then limited if needed.

@pditommaso (Member):

I mean, basically your proposal is to provide a mechanism that allows setting max limits for some resources.

Whenever the resource request overcomes this limit (whether statically or dynamically defined), it falls back to this limit.

Is that right?

@ewels (Member, Author) commented Apr 9, 2018

Yup!

@ewels (Member, Author) commented Oct 2, 2018

Proposed syntax after discussion, with a default value and also a range (minimum and maximum values).

process {
    withName:fastqc {
        memory: { 1.GB * input.size(), range: 8.GB..24.GB },
        time: { 2.4.h * input.size(), range: 16.h..48.h },
        cpus: { 2  * input.size(), range: 4..16 }
    }
}

The input.size() dynamic computation here is just an example of when we could have a variable default. Here the maximum of the range would prevent the requested resources from going above a sensible default.

Another scenario is when people want to limit the requested resources:

process {
    memory: 4.GB
    withName:fastqc {
        memory: 16.GB, range: 8.GB..24.GB
    }
    withName:star {
        memory: 4.GB, range: 2.GB..6.GB
    }
}

Then have nextflow run --max_memory 8.GB which would work - using 8.GB memory for fastqc and 4.GB memory for star.

However, doing nextflow run --max_memory 2.GB would exit with an error because this is outside the range of fastqc.

Phil

@stale bot commented Apr 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Apr 27, 2020

@ewels (Member, Author) commented Apr 27, 2020

This is still probably my number-1 feature request issue 😄 The @nf-core check_max function is a constant thorn in the side, but it's too useful to remove. Would be great if we could move this functionality into Nextflow 👍

@pditommaso (Member):

I agree, we definitely need to do better here; I need to tackle this problem soon. Pinning this issue.

@ewels (Member, Author) commented May 27, 2022

Are the additional attributes (sorry, I forget the name in Groovy) inherited separately from the main value? For example, would my process 'foo' here also have a max of 10.GB?

process.memory = 3.GB, max: 10.GB
process.withName['foo'].memory = 20.GB

@bentsherman (Member):

I don't plan to implement that kind of behavior at the moment. Are there any other directives that can inherit individual attributes like that? For example I don't think publishDir does that.

@ewels (Member, Author) commented May 27, 2022

Not that I know of. Would it be acceptable to use a distinct namespace instead to achieve this? e.g. something along the lines of my original suggestion way up there 👆🏻

#640 (comment)

process {
    notEnoughResources: 'terminate' // default behaviour
    notEnoughResources: 'limit'     // throttle to available resources
    notEnoughResources: {           // limit to specified values
        cpus: 4
        memory: 8.GB
        time: 16.h
    }
}

Then it can be defined once to work across an entire workflow (desired behaviour 99% of the time I suspect) but also at process level if more granular control is needed.

@bentsherman (Member) commented May 27, 2022

I like this idea. No need to refactor the resource directives, Nextflow just needs to apply these limits before submitting a task. I would make it even simpler:

process {
    resourceLimits = null   // default behaviour
    resourceLimits = [      // limit to specified values
        cpus: 4,
        memory: 8.GB,
        time: 16.h
    ]
}

@ewels (Member, Author) commented May 27, 2022

Yeah, I think the 'limit' / throttle-to-available-resources thing was me being hopeful about Nextflow auto-detecting available cpus, memory etc. But that's probably super problematic, as many people will be running Nextflow on a different node from where tasks execute. So yeah, let's forget about that (could probably do it with closures anyway..?)

So I'm fully behind your suggested syntax 👍🏻 (was [] instead of {} intentional?)

@pditommaso (Member):

This is interesting, but I don't fully understand the semantics of this. Is it meant to throttle the resources on retry?

@bentsherman (Member):

Yes. If you specify the resource limits, Nextflow will throttle any resource requests that exceed the limits. If you don't specify a limit, then such a request would fail.

However it just occurred to me that on some systems, an unschedulable task will just sit in the queue forever, so we might want a way to specify whether Nextflow should throttle or fail, like the example Phil gave.

bentsherman self-assigned this Jul 12, 2022

@aydemiro commented Oct 26, 2023

Hi,

Would the code below in a config file currently work as an alternative to defining a check_max function?

params.max_memory = 32.GB
memory = { min( 6.GB * task.attempt, params.max_memory ) }

Best,
Ozkan

@bentsherman (Member):

That should work. I believe check_max() just adds a bunch of extra logic for type checking and error messages, but your example implements the core logic.
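
For the record, a variant using Groovy's Collection.min() would look like this (a sketch; the explicit cast is defensive, keeping the comparison unit-aware if max_memory arrives as a string from the command line; min() works because MemoryUnit is Comparable):

params.max_memory = 32.GB
memory = { [ 6.GB * task.attempt, params.max_memory as nextflow.util.MemoryUnit ].min() }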

@pditommaso (Member):

@bentsherman If I'm understanding correctly #2911 does not implement any throttling, and it's a mere config attribute that should be managed by the user via a dynamic rule (closure). Is that correct?

@bentsherman (Member):

It does throttle the task resources, see TaskConfig

@pditommaso (Member):

Maybe we are using throttling with different meanings. Quoting the docs:

Resource limits are a useful way to specify environment-specific limits alongside tasks with dynamic resources. Normally, if a task requests more resources than can be provisioned (e.g. a task requests 32 cores but the largest node in the cluster has 24), the task will either fail or cause the pipeline to hang forever as it will never be scheduled. If the resourceLimits directive is defined with these limits, the task resources will be automatically reduced to comply with these limits before the job is submitted.

I understand this as a constraint mechanism. Instead, I was expecting throttling as a mechanism to automatically increase the resources depending on the retry.

@bentsherman (Member):

It is a constraint mechanism. You could also see it as throttling, though a different form from what you are talking about.

@pditommaso (Member):

What is preventing adding a third parameter to the check_max function to specify the upper bound? This can be done now and does not require any support at the Nextflow level.

Nextflow should aim instead to provide a built-in replacement for the check_max thing.

@ewels (Member, Author) commented Mar 11, 2024

Nextflow should aim instead to provide a built-in replacement for the check_max thing.

That's what this issue and PR #2911 are about... I'd like to remove that config function entirely if possible.

@pditommaso (Member):

Not really; #2911 adds a directive to allow specifying the upper bound. check_max is meant to stay alongside it.

@bentsherman (Member):

Paolo, I think you are misunderstanding. #2911 adds a directive to replace check_max, so that nf-core pipelines are not defining functions in the config file. The upper bound is provided by params.max_cpus, params.max_memory, etc.

@pditommaso (Member):

Can you show me an example of how a process is expected to be written using #2911 to automatically increase the memory on a failure?

@bentsherman (Member) commented Mar 11, 2024

Just the normal dynamic resource pattern:

process {
    memory = { 8.GB * task.attempt }
    resourceLimits = [
        memory: 64.GB
    ]
}

@pditommaso (Member):

How is the increment determined? I don't see anything obvious in the PR.

@bentsherman (Member):

The increment is independent of the resource limit. The example I just gave means "increase the task memory by 8 GB on each attempt until the system-wide limit of 64 GB is reached"
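
Spelled out with the retry pieces included (a sketch; the errorStrategy/maxRetries settings are the usual dynamic-retry pattern, not something added by #2911 itself):

process {
    errorStrategy  = 'retry'                  // resubmit failed tasks
    maxRetries     = 3
    memory         = { 8.GB * task.attempt }  // 8 GB, 16 GB, 24 GB, 32 GB
    resourceLimits = [ memory: 64.GB ]        // requests above this are capped
}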

@ewels (Member, Author) commented Mar 12, 2024

Edit: I had an old browser tab open and didn't see the more recent replies, sorry. I wrote the below in response to this comment.


Gotcha, sorry it took @bentsherman explaining your comment to me.

I'd prefer a global-level setting rather than continued use of the check_max function because it's less verbose. Currently the function has to be called in every single resource specification (see here for an example 🤢). We can clean up that syntax with the upper bound, but we'd still need to call the function every time.

I don't think that there's ever a time when you don't want these limits applied, as they're there to prevent Nextflow from crashing due to unsatisfiable requests.

There's also no time when you want the limits to vary at the process level. They are there to meet computational environment limits, so they don't vary by process. If you want to change that, you do it with the regular process directives.

So because of this, to me it makes more sense to have a workflow-level setting vs. a function that can be used within multiple process-level directives.

@ewels (Member, Author) commented Mar 12, 2024

Basically instead of this:

process {
    withName:'MYPROCESS_1' {
        memory = { check_max( 72.GB * task.attempt, 'memory'  ) }
        time   = { check_max( 24.h  * task.attempt, 'time'    ) }
        cpus   = { check_max( 4 * task.attempt, 'cpus'  ) }
    }
    withName:'MYPROCESS_2' {
        memory = { check_max( 72.GB, 'memory'  ) }
        time   = { check_max( 24.h, 'time'    ) }
        cpus   = { check_max( 4, 'cpus'  ) }
    }
    withName:'MYPROCESS_3' {
        memory = { check_max( 72.GB * task.attempt, 'memory'  ) }
        time   = { check_max( 24.h  * task.attempt, 'time'    ) }
        cpus   = { check_max( 4 * task.attempt, 'cpus'  ) }
    }
}

I'd much rather have this:

process {
    resourceLimits = [ memory: 64.GB, time: 24.h, cpus: 12 ]
    withName:'MYPROCESS_1' {
        memory = { 72.GB * task.attempt }
        time   = { 24.h  * task.attempt }
        cpus   = { 4 * task.attempt }
    }
    withName:'MYPROCESS_2' {
        memory = 72.GB
        time   = 24.h
        cpus   = 4
    }
    withName:'MYPROCESS_3' {
        memory = { 72.GB * task.attempt }
        time   = { 24.h  * task.attempt }
        cpus   = { 4 * task.attempt }
    }
}

Especially as the resourceLimits statement will nearly always be in a separate config file.
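
For example (a sketch; the file name is hypothetical, and this could equally be an institutional profile supplied with -c):

// conf/my_cluster.config
process {
    resourceLimits = [ cpus: 12, memory: 64.GB, time: 24.h ]
}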
