Merge branch 'master' into 4871-azure-managed-identity
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
adamrtalbot committed May 2, 2024
2 parents ebdcc5b + e0e9422 commit 127376a
Showing 56 changed files with 478 additions and 197 deletions.
4 changes: 4 additions & 0 deletions build.gradle
@@ -69,6 +69,10 @@ allprojects {
java {
toolchain {
languageVersion = JavaLanguageVersion.of(19)
// note: the use of Java 21 causes the error "NoClassDefFoundError: java/util/SequencedCollection"
// see also
// https://aphyr.com/posts/369-classnotfoundexception-java-util-sequencedcollection
// https://www.baeldung.com/java-21-sequenced-collections
}
}

31 changes: 28 additions & 3 deletions docs/azure.md
@@ -157,19 +157,43 @@ See the [Batch documentation](https://docs.microsoft.com/en-us/azure/batch/quick

### Pools configuration

When using the `autoPoolMode` option, Nextflow automatically creates a `pool` of compute nodes to execute the jobs in your pipeline. By default, it only uses one compute node of the type `Standard_D4_v3`.
When using the `autoPoolMode` option, Nextflow automatically creates a `pool` of compute nodes appropriate for your pipeline.

By default, the `cpus` and `memory` directives are used to find the smallest machine type that fits the requested resources in the Azure machine family, specified by `machineType`. If `memory` is not specified, 1 GB of memory is allocated per CPU. When no options are specified, it only uses one compute node of the type `Standard_D4_v3`.

To specify multiple Azure machine families, use a comma-separated list with glob (`*`) values in the `machineType` directive. For example, the following will select any machine size from D or E v5 machines with an additional data disk, denoted by the `d` suffix:

```config
process.machineType = "Standard_D*d_v5,Standard_E*d_v5"
```

For example, the following process will create a pool of `Standard_E4d_v5` machines when using `autoPoolMode`:

```nextflow
process EXAMPLE_PROCESS {
    machineType "Standard_E*d_v5"
    cpus 16
    memory 8.GB
    script:
    """
    echo "cpus: ${task.cpus}"
    """
}
```

Note that when creating tasks that use fewer than 4 CPUs, Nextflow will create a pool with machines that have 4 times the number of CPUs required, in order to pack more tasks onto each machine. This means the pipeline spends less time waiting for machines to be created, start up, and join the Azure Batch pool. Similarly, if a process requires fewer than 8 CPUs, Nextflow will use a machine with double the number of CPUs required. To override this behaviour, use a specific `machineType` directive; for example, a `machineType` of `Standard_E2d_v5` will always use a `Standard_E2d_v5` machine.
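
As a hedged illustration of the sizing rule above, the following configuration sketch requests 2 CPUs for a process named `SMALL_TASK`. The process name and resource values are assumptions chosen for the example, not taken from the Nextflow sources. Under the rule as described, the pool would be expected to use 8-CPU machines from the matching family rather than 2-CPU machines.

```groovy
process {
    // SMALL_TASK is a hypothetical process name used only for illustration
    withName: SMALL_TASK {
        machineType = 'Standard_E*d_v5'
        cpus = 2        // fewer than 4 CPUs: machines are sized for 2 x 4 = 8 CPUs
        memory = 4.GB
    }
}
```

Replacing the glob with a fixed value such as `machineType = 'Standard_E2d_v5'` would, per the text above, pin the pool to that exact machine size.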

The pool is not removed when the pipeline terminates, unless the configuration setting `deletePoolsOnCompletion = true` is added in your Nextflow configuration file.
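
A minimal configuration sketch combining the two settings mentioned above, assuming both are placed under the `azure.batch` scope:

```groovy
azure {
    batch {
        autoPoolMode = true
        deletePoolsOnCompletion = true   // remove the pools when the pipeline completes
    }
}
```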

Pool specific settings, such as VM type and count, should be provided in the `auto` pool configuration scope, for example:
Pool-specific settings should be provided in the `auto` pool configuration scope. If you wish to specify a single machine size for all processes, you can specify a fixed `vmType` for the `auto` pool.

```groovy
azure {
    batch {
        pools {
            auto {
                vmType = 'Standard_D2_v2'
                vmCount = 10
            }
        }
    }
@@ -248,6 +272,7 @@ When Nextflow is configured to use a pool already available in the Batch account

1. The pool must be declared as `dockerCompatible` (`Container Type` property).
2. The task slots per node must match the number of cores for the selected VM. Otherwise, Nextflow will return an error like "Azure Batch pool 'ID' slots per node does not match the VM num cores (slots: N, cores: Y)".
3. Unless you are using [Fusion](./fusion.md), all tasks must have [AzCopy](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10) available in the path. If `azure.batch.copyToolInstallMode = 'node'`, every node must have the `azcopy` binary located at `$AZ_BATCH_NODE_SHARED_DIR/bin/`.

### Pool autoscaling

45 changes: 26 additions & 19 deletions docs/config.md
@@ -4,20 +4,20 @@

## Configuration file

When a pipeline script is launched, Nextflow looks for configuration files in multiple locations. Since each configuration file can contain conflicting settings, the sources are ranked to determine which settings are applied. Possible configuration sources, in order of priority:
When a pipeline script is launched, Nextflow looks for configuration files in multiple locations. Since each configuration file may contain conflicting settings, they are applied in the following order (from lowest to highest priority):

1. Parameters specified on the command line (`--something value`)
2. Parameters provided using the `-params-file` option
3. Config file specified using the `-c my_config` option
4. The config file named `nextflow.config` in the current directory
5. The config file named `nextflow.config` in the workflow project directory
6. The config file `$HOME/.nextflow/config`
7. Values defined within the pipeline script itself (e.g. `main.nf`)
1. Parameters defined in pipeline scripts (e.g. `main.nf`)
2. The config file `$HOME/.nextflow/config`
3. The config file `nextflow.config` in the project directory
4. The config file `nextflow.config` in the launch directory
5. Config file specified using the `-c <config-file>` option
6. Parameters specified in a params file (`-params-file` option)
7. Parameters specified on the command line (`--something value`)

When more than one of these configuration sources is used, they are merged, so that a setting from a higher-priority source overrides the same setting from a lower-priority source.
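
As a small sketch of how this merging plays out, consider a parameter defined both in a config file and on the command line. The parameter value shown here is hypothetical:

```groovy
// nextflow.config in the launch directory (hypothetical value)
params.something = 'from-config'

// Launching with:  nextflow run main.nf --something from-cli
// The command-line value has the highest priority, so params.something
// resolves to 'from-cli' instead of 'from-config'.
```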

:::{tip}
If you want to ignore any default configuration files and use only a custom one, use `-C <config file>`.
You can use the `-C <config-file>` option to use a single configuration file and ignore all other files.
:::

### Config syntax
@@ -1383,9 +1383,9 @@ process {
}
```

:::{note}
The `withName` selector applies to a process even when it is included from a module under an alias. For example, `withName: hello` will apply to any process originally defined as `hello`, regardless of whether it is included under an alias. Similarly, it will not apply to any process not originally defined as `hello`, even if it is included under the alias `hello`.
:::
The `withName` selector applies both to processes defined with the same name and processes included under the same alias. For example, `withName: hello` will apply to any process originally defined as `hello`, as well as any process included under the alias `hello`.

Furthermore, selectors for the alias of an included process take priority over selectors for the original name of the process. For example, given a process defined as `foo` and included as `bar`, the selectors `withName: foo` and `withName: bar` will both be applied to the process, with the second selector taking priority over the first.
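
The following sketch illustrates that behaviour for a process defined as `foo` and included as `bar`. The resource values are assumptions chosen for the example:

```groovy
process {
    withName: foo {
        cpus = 2
        memory = 4.GB
    }
    withName: bar {
        cpus = 8    // takes priority over the cpus value from the `foo` selector
    }
}
```

With this configuration, the process included as `bar` would be expected to run with 8 cpus (from the alias selector) and 4 GB of memory (from the original-name selector, which the alias selector does not override).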

:::{tip}
Label and process names do not need to be enclosed with quotes, provided the name does not include special characters (`-`, `!`, etc) and is not a keyword or a built-in type identifier. When in doubt, you can enclose the label name or process name with single or double quotes.
@@ -1424,24 +1424,31 @@ The above configuration snippet sets 2 cpus for the processes annotated with the

#### Selector priority

When mixing generic process configuration and selectors the following priority rules are applied (from lower to higher):
Process configuration settings are applied to a process in the following order (from lowest to highest priority):

1. Process generic configuration.
2. Process specific directive defined in the workflow script.
3. `withLabel` selector definition.
4. `withName` selector definition.
1. Process configuration settings (without a selector)
2. Process directives in the process definition
3. `withLabel` selectors matching any of the process labels
4. `withName` selectors matching the process name
5. `withName` selectors matching the process included alias
6. `withName` selectors matching the process fully qualified name

For example:

```groovy
process {
    cpus = 4
    withLabel: foo { cpus = 8 }
    withName: bar { cpus = 32 }
    withName: bar { cpus = 16 }
    withName: 'baz:bar' { cpus = 32 }
}
```

Using the above configuration snippet, all workflow processes use 4 cpus if not otherwise specified in the workflow script. Moreover processes annotated with the `foo` label use 8 cpus. Finally the process named `bar` uses 32 cpus.
With the above configuration:
- All processes will use 4 cpus (unless otherwise specified in their process definition).
- Processes annotated with the `foo` label will use 8 cpus.
- Any process named `bar` (or imported as `bar`) will use 16 cpus.
- Any process named `bar` (or imported as `bar`) invoked by a workflow named `baz` will use 32 cpus.

(config-report)=

10 changes: 10 additions & 0 deletions docs/developer/diagram.md
@@ -0,0 +1,10 @@
(diagram-page)=

# Workflow Diagram

The following diagram is a high-level overview of the Nextflow source code in a similar style as the {ref}`workflow diagram <dag-visualisation>` visualization for Nextflow pipelines. Each node and subgraph is a class. Arrows depict the flow of data and/or communication between classes.

In general, nodes with sharp corners are "record" classes that simply hold information, while nodes with rounded edges are "function" classes that transform some input into an output. Subgraphs are either long-running classes, i.e. "places where things happen", or one of the other two types for which it was useful to expand and show internal details.

```{mermaid} diagrams/overview.mmd
```
2 changes: 1 addition & 1 deletion docs/developer/diagrams/merge-diagrams.sh
@@ -9,7 +9,7 @@ packages+=("nextflow.cli")
# packages+=("nextflow.cloud.aws.nio")
# packages+=("nextflow.cloud.azure")
# packages+=("nextflow.cloud.google")
# packages+=("nextflow.config")
packages+=("nextflow.config")
# packages+=("nextflow.container")
packages+=("nextflow.dag")
# packages+=("nextflow.executor")
1 change: 0 additions & 1 deletion docs/developer/diagrams/nextflow.cache.mmd
@@ -3,7 +3,6 @@ classDiagram
%% nextflow.cache
%%
Session --* CacheDB
%% CacheFactory --> CacheDB : createInstance

CacheDB --* CacheStore

2 changes: 2 additions & 0 deletions docs/developer/diagrams/nextflow.cloud.aws.nio.mmd
@@ -2,6 +2,8 @@ classDiagram
%%
%% nextflow.cloud.aws.nio
%%
FileSystemProvider <|-- S3FileSystemProvider

S3FileSystemProvider --> S3FileSystem : newFileSystem

class S3FileSystem {
5 changes: 1 addition & 4 deletions docs/developer/diagrams/nextflow.config.mmd
@@ -2,9 +2,6 @@ classDiagram
%%
%% nextflow.config
%%
CmdRun --> ConfigMap : run
Session --* ConfigMap

ConfigBuilder --> ConfigParser : build
CmdRun --> ConfigBuilder : run
ConfigBuilder --> ConfigMap : build
ConfigParser --> ConfigBase : parse
40 changes: 12 additions & 28 deletions docs/developer/diagrams/nextflow.executor.mmd
@@ -2,19 +2,16 @@ classDiagram
%%
%% nextflow.executor
%%
ProcessDef --> Executor : run
%% ExecutorFactory --> Executor : getExecutor
ExecutorFactory --> Executor : getExecutor

TaskProcessor --* Executor

%% class Executor {
%% name : String
%% monitor : TaskMonitor
%% }
%% Executor --* TaskMonitor
%% Executor --> TaskHandler : submit
class Executor {
name : String
monitor : TaskMonitor
}
Executor --* TaskMonitor
Executor --> TaskHandler : submit

%% TaskMonitor <|-- TaskPollingMonitor
TaskMonitor <|-- TaskPollingMonitor

class TaskPollingMonitor {
capacity : int
@@ -23,17 +20,13 @@ classDiagram
dumpInterval : Duration
}

%% TaskPollingMonitor <|-- LocalPollingMonitor
TaskPollingMonitor <|-- LocalPollingMonitor

class LocalPollingMonitor {
maxCpus : int
maxMemory : long
}

%% class TaskHandler {
%% task : TaskRun
%% }

Executor <|-- AbstractGridExecutor
Executor <|-- LocalExecutor
%% Executor <|-- NopeExecutor
@@ -49,19 +42,10 @@ classDiagram
%% PbsExecutor <|-- PbsProExecutor
%% SgeExecutor <|-- CrgExecutor

LocalExecutor --> LocalPollingMonitor : init
LocalExecutor --> LocalTaskHandler : submit
LocalExecutor --> NativeTaskHandler : submit
LocalTaskHandler --> BashWrapperBuilder : submit

AbstractGridExecutor --> TaskPollingMonitor : init
AbstractGridExecutor --> GridTaskHandler : submit
GridTaskHandler --> BashWrapperBuilder : submit

%% TaskHandler <|-- CachedTaskHandler
%% TaskHandler <|-- GridTaskHandler
%% TaskHandler <|-- LocalTaskHandler
%% TaskHandler <|-- NativeTaskHandler
TaskHandler <|-- GridTaskHandler
TaskHandler <|-- LocalTaskHandler
TaskHandler <|-- NativeTaskHandler
%% TaskHandler <|-- NopeTaskHandler
%% TaskHandler <|-- StoredTaskHandler

2 changes: 2 additions & 0 deletions docs/developer/diagrams/nextflow.extension.mmd
@@ -26,3 +26,5 @@ classDiagram
OperatorImpl --> ToListOp : toList, toSortedList
OperatorImpl --> TransposeOp : transpose
OperatorImpl --> UntilOp : until

WorkflowBinding --> OpCall : invokeMethod
34 changes: 14 additions & 20 deletions docs/developer/diagrams/nextflow.script.mmd
@@ -9,7 +9,6 @@ classDiagram
session : Session
}
ScriptRunner --* ScriptFile
ScriptRunner --* Session
ScriptRunner --> ScriptParser : execute
ScriptParser --> BaseScript : parse

@@ -22,8 +21,15 @@ classDiagram
projectName : String
}

Session --* BaseScript
Session --* ScriptBinding
class BaseScript {
meta : ScriptMeta
entryFlow : WorkflowDef
}
BaseScript --* ScriptBinding
BaseScript --* ScriptMeta
BaseScript --> IncludeDef : include

IncludeDef --> ScriptParser : load0

class ScriptBinding {
scriptPath : Path
@@ -33,16 +39,6 @@ classDiagram
entryName : String
}

IncludeDef --> BaseScript : load0

class BaseScript {
meta : ScriptMeta
entryFlow : WorkflowDef
}
BaseScript --* ScriptMeta
%% BaseScript --> ProcessDef : process
%% BaseScript --> WorkflowDef : workflow

class ScriptMeta {
scriptPath : Path
definitions : Map
@@ -68,11 +64,9 @@ classDiagram
baseName : String
rawBody : Closure~BodyDef~
}
ProcessDef --> ProcessConfig : run
ProcessDef --> BodyDef : run
ProcessDef --> Executor : run
ProcessDef --> TaskProcessor : run
ProcessDef --> ChannelOut : run
ProcessDef --* ProcessConfig
ProcessDef --* BodyDef
ProcessDef --* ChannelOut

class WorkflowDef {
name : String
@@ -82,8 +76,8 @@ classDiagram
variableNames : Set~String~
}
WorkflowDef --* BodyDef
WorkflowDef --> WorkflowBinding : run
WorkflowDef --> ChannelOut : run
WorkflowDef --* WorkflowBinding
WorkflowDef --* ChannelOut

class ProcessConfig {
configProperties : Map
4 changes: 3 additions & 1 deletion docs/developer/diagrams/nextflow.secret.mmd
@@ -2,7 +2,9 @@ classDiagram
%%
%% nextflow.secret
%%
CmdRun --> SecretsProvider : run
ConfigBuilder --> SecretsLoader : build
BaseScript --> SecretsLoader : run
BashWrapperBuilder --> SecretsLoader : build

SecretsLoader --> SecretsProvider : load
SecretsProvider --> Secret : getSecret
25 changes: 8 additions & 17 deletions docs/developer/diagrams/nextflow.trace.mmd
@@ -2,21 +2,12 @@ classDiagram
%%
%% nextflow.trace
%%
direction LR
Session --> TraceObserverFactory : init

%% TraceObserverFactory "1" --> "*" TraceObserver : create
%% TraceObserver <|-- AnsiLogObserver
%% TraceObserver <|-- GraphObserver
%% TraceObserver <|-- ReportObserver
%% TraceObserver <|-- TimelineObserver
%% TraceObserver <|-- TraceFileObserver
%% TraceObserver <|-- WebLogObserver
%% TraceObserver <|-- WorkflowStatsObserver

Session --> AnsiLogObserver : init
Session --> GraphObserver : init
Session --> ReportObserver : init
Session --> TimelineObserver : init
Session --> TraceFileObserver : init
Session --> WebLogObserver : init
Session --> WorkflowStatsObserver : init
TraceObserverFactory "1" --> "*" TraceObserver : create
TraceObserver <|-- AnsiLogObserver
TraceObserver <|-- GraphObserver
TraceObserver <|-- ReportObserver
TraceObserver <|-- TimelineObserver
TraceObserver <|-- TraceFileObserver
TraceObserver <|-- WorkflowStatsObserver
