# Blog: Tweak config examples #115

**Merged** · 3 commits · May 25, 2023 · Changes from all commits
@@ -39,7 +39,7 @@ Depending on the workloads a cluster is designed to support, compute hosts may b

HPC workload managers have been around for decades. Initial efforts date back to the original [Portable Batch System](https://www.chpc.utah.edu/documentation/software/pbs-scheduler.php) (PBS) developed for NASA in the early 1990s. While modern workload managers have become enormously sophisticated, many of their core principles remain unchanged.

Workload managers are designed to share resources efficiently between users and groups. Modern workload managers support many different scheduling policies and workload types — from parallel jobs to array jobs to interactive jobs to affinity/NUMA-aware scheduling. As a result, schedulers have many "knobs and dials" to support various applications and use cases. While complicated, all of this configurability makes them extremely powerful and flexible in the hands of a skilled cluster administrator.

### Some notes on terminology

@@ -63,7 +63,7 @@ To ensure that pipelines are portable across clouds and HPC clusters, Nextflow u

You can specify the executor to use in the [nextflow.config](https://nextflow.io/docs/latest/config.html?highlight=queuesize#configuration-file) file, inline in your pipeline code, or by setting the shell variable `NXF_EXECUTOR` before running a pipeline.

```groovy
process.executor = 'slurm'
```
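
As noted above, the executor can also be set inline in the pipeline code or through the `NXF_EXECUTOR` shell variable. A minimal sketch of the inline form, using a hypothetical `align_reads` process (the directive then applies only to that process):

```groovy
// Hypothetical process showing the executor set inline as a process directive;
// the same choice can also come from nextflow.config or the NXF_EXECUTOR variable.
process align_reads {
    executor 'slurm'

    """
    echo "task script goes here"
    """
}
```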

@@ -238,17 +238,12 @@ Most HPC workload managers support the notion of queues. In a small cluster with

Workload managers typically have default queues. For example, `normal` is the default queue in LSF, while `all.q` is the default queue in Grid Engine. Slurm supports the notion of partitions that are essentially the same as queues, so Slurm partitions are referred to as queues within Nextflow. You should ask your HPC cluster administrator what queue to use when submitting Nextflow jobs.

Like the executor, queues are part of the process scope. The queue to dispatch jobs to is usually defined once in the `nextflow.config` file and applied to all processes in the workflow as shown below, or it can be set per-process.

```groovy
process {
    queue = 'myqueue'
    executor = 'sge'
}
```
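
The per-process form mentioned above uses a process selector within the same `process` scope. A minimal sketch, assuming a hypothetical process named `big_sort` and queue names (`short`, `long`) that your cluster may or may not define:

```groovy
process {
    executor = 'sge'
    queue = 'short'            // default queue for every process

    // Only the (hypothetical) big_sort process is sent to the long queue.
    withName: big_sort {
        queue = 'long'
    }
}
```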

@@ -266,9 +261,10 @@ Depending on the executor, you can pass various resource requirements for each p

When writing pipelines, it is good practice to consolidate per-process resource requirements in the `nextflow.config` file and use process selectors to indicate which resource requirements apply to which process steps. In the example below, processes are dispatched to the Slurm cluster by default, and each process requests two cores, 4 GB of memory, and a runtime of no more than 10 minutes. For the `foo` and long-running `bar` jobs, process-specific selectors can override these default settings as shown below:

```groovy
process {
    executor = 'slurm'
    queue = 'general'
    cpus = 2
    memory = '4 GB'
    time = '10m'

    // ... (withName: foo settings collapsed in this diff)

    withName: bar {
        queue = 'long'
        cpus = 32
        memory = '8 GB'
        time = '1h 30m'
    }
}
```
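
Selectors can also match groups of processes rather than individual names: if processes in the pipeline carry a `label` directive, a `withLabel:` selector applies one set of requirements to all of them. A hedged sketch, assuming a hypothetical `big_mem` label and a `highmem` queue that may not exist on your cluster:

```groovy
process {
    executor = 'slurm'

    // Applies to every process declared with `label 'big_mem'` in the pipeline.
    withLabel: big_mem {
        queue  = 'highmem'
        cpus   = 16
        memory = '128 GB'
    }
}
```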

@@ -294,19 +291,16 @@ Sometimes, organizations may want to take advantage of syntax specific to a part

These scheduler-specific commands can get very detailed and granular. They can apply to all processes in a workflow or only to specific processes. As an LSF-specific example, suppose a deep learning model training workload is a step in a Nextflow pipeline. The deep learning framework used may be GPU-aware and have specific topology requirements.

In this example, we specify a job consisting of two tasks where each task runs on a separate host and requires exclusive use of two GPUs. We also impose a resource requirement that we want to schedule the CPU portion of each CUDA job in physical proximity to the GPU to improve performance (on a processor core close to the same PCIe or NVLink connection, for example).

```groovy
process {
    withName: dl_workload {
        executor = 'lsf'
        queue = 'gpu_hosts'
        memory = '16 GB'
        clusterOptions = '-gpu "num=2:mode=exclusive_process" -n2 -R "span[ptile=1] affinity[core(1)]"'
    }
}
```
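
`clusterOptions` is not LSF-specific; it passes native options through to whichever scheduler the executor targets. As a rough Slurm-flavoured sketch (the queue name and flags are illustrative assumptions, and the exact GPU syntax depends on how the cluster is configured):

```groovy
process {
    withName: dl_workload {
        executor = 'slurm'
        queue = 'gpu'
        memory = '16 GB'
        // Passed through to sbatch: request two GPUs and exclusive use of the node.
        clusterOptions = '--gres=gpu:2 --exclusive'
    }
}
```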

@@ -327,8 +321,7 @@

```bash
$ cat submit_pipeline.sh
#BSUB -o out.%J
#BSUB -e err.%J
#BSUB -J headjob
#BSUB -R "rusage[mem=16GB]"
nextflow run nextflow-io/hello -c my.config -ansi-log false

$ bsub < submit_pipeline.sh
```

> **Member Author** (on the removed `export NFX_OPTS="-Xms=512m -Xmx=8g"` line): Removed as this isn't explained until the following section. Alternatively could reorder the sections, or just leave it here without explanation is probably fine too.
@@ -344,8 +337,8 @@ Setting the JVM’s max heap size is another good practice when running on an HP

These can be specified using the `NXF_OPTS` environment variable.

```bash
export NXF_OPTS="-Xms512m -Xmx8g"
```

The `-Xms` flag specifies the minimum heap size, and `-Xmx` specifies the maximum heap size. In the example above, the minimum heap size is set to 512 MB, which can grow to a maximum of 8 GB. You will need to experiment with appropriate values for each pipeline to determine how many concurrent head jobs you can run on the same host.
@@ -358,15 +351,15 @@ Nextflow requires a shared file system path as a working directory to allow the

Nextflow implements this best practice, which can be enabled by adding the following setting to your `nextflow.config` file.

```groovy
process.scratch = true
```

By default, if you enable `process.scratch`, Nextflow will use the directory pointed to by `$TMPDIR` as a scratch directory on the execution host.

You can optionally specify a specific path for the scratch directory as shown:

```groovy
process.scratch = '/ssd_drive/scratch_dir'
```

@@ -385,8 +378,8 @@ To learn more about Nextflow and how it works with various storage architectures

If you are launching your pipeline from a login node or cluster head node, it is useful to run pipelines in the background without losing the execution output reported by Nextflow. You can accomplish this by using the `-bg` switch in Nextflow and redirecting *stdout* to a log file as shown:

```bash
nextflow run <pipeline> -bg > my-file.log
```

This frees up the interactive command line to run commands such as [squeue](https://slurm.schedmd.com/squeue.html) (Slurm) or [qstat](https://gridscheduler.sourceforge.net/htmlman/htmlman1/qstat.html) (Grid Engine) to monitor job execution on the cluster. It is also beneficial because it prevents network connection issues from interfering with pipeline execution.
@@ -399,22 +392,15 @@ Getting resource requirements such as cpu, memory, and time is often challenging

To address this problem, Nextflow provides a mechanism that lets you modify the requested compute resources on the fly when a process fails, and re-execute it with a higher limit. For example:

```groovy
process {
    withName: foo {
        memory = { 2.GB * task.attempt }
        time = { 1.hour * task.attempt }

        errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
        maxRetries = 3
    }
}
```

@@ -463,7 +449,7 @@ There are several additional Nextflow configuration options that are important t

`submitRateLimit` – Depending on the scheduler, having many users simultaneously submitting large numbers of jobs to a cluster can overwhelm the scheduler on the head node and cause it to become unresponsive to commands. To mitigate this, if your pipeline submits a large number of jobs, it is good practice to throttle the rate at which jobs are dispatched from Nextflow. By default, the job submission rate is unlimited. If you want to allow no more than 50 jobs to be submitted every two minutes, set this parameter as shown:

```groovy
executor.submitRateLimit = '50/2min'
executor.queueSize = 50
```
@@ -472,7 +458,7 @@ executor.queueSize = 50

When using these tools, it is helpful to associate a meaningful name with each job. Remember, a job in the context of the workload manager maps to a process or task in Nextflow. Use the `jobName` property associated with the executor to give your job a name. You can construct these names dynamically as illustrated below so the job reported by the workload manager reflects the name of our Nextflow process step and its unique ID.

```groovy
executor.jobName = { "$task.name - $task.hash" }
```

Expand Down