Skip to content

Commit

Permalink
Update developer diagrams (#4922)
Browse files Browse the repository at this point in the history
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
  • Loading branch information
bentsherman committed Apr 17, 2024
1 parent f1cffd1 commit dcff41a
Show file tree
Hide file tree
Showing 17 changed files with 132 additions and 79 deletions.
10 changes: 10 additions & 0 deletions docs/developer/diagram.md
@@ -0,0 +1,10 @@
(diagram-page)=

# Workflow Diagram

The following diagram is a high-level overview of the Nextflow source code in a similar style as the {ref}`workflow diagram <dag-visualisation>` visualization for Nextflow pipelines. Each node and subgraph is a class. Arrows depict the flow of data and/or communication between classes.

In general, nodes with sharp corners are "record" classes that simply hold information, while nodes with rounded edges are "function" classes that transform some input into an output. Subgraphs are either long-running classes, i.e. "places where things happen", or one of the other two types for which it was useful to expand and show internal details.

```{mermaid} diagrams/overview.mmd
```
2 changes: 1 addition & 1 deletion docs/developer/diagrams/merge-diagrams.sh
Expand Up @@ -9,7 +9,7 @@ packages+=("nextflow.cli")
# packages+=("nextflow.cloud.aws.nio")
# packages+=("nextflow.cloud.azure")
# packages+=("nextflow.cloud.google")
# packages+=("nextflow.config")
packages+=("nextflow.config")
# packages+=("nextflow.container")
packages+=("nextflow.dag")
# packages+=("nextflow.executor")
Expand Down
1 change: 0 additions & 1 deletion docs/developer/diagrams/nextflow.cache.mmd
Expand Up @@ -3,7 +3,6 @@ classDiagram
%% nextflow.cache
%%
Session --* CacheDB
%% CacheFactory --> CacheDB : createInstance

CacheDB --* CacheStore

Expand Down
2 changes: 2 additions & 0 deletions docs/developer/diagrams/nextflow.cloud.aws.nio.mmd
Expand Up @@ -2,6 +2,8 @@ classDiagram
%%
%% nextflow.cloud.aws.nio
%%
FileSystemProvider <|-- S3FileSystemProvider

S3FileSystemProvider --> S3FileSystem : newFileSystem

class S3FileSystem {
Expand Down
5 changes: 1 addition & 4 deletions docs/developer/diagrams/nextflow.config.mmd
Expand Up @@ -2,9 +2,6 @@ classDiagram
%%
%% nextflow.config
%%
CmdRun --> ConfigMap : run
Session --* ConfigMap

ConfigBuilder --> ConfigParser : build
CmdRun --> ConfigBuilder : run
ConfigBuilder --> ConfigMap : build
ConfigParser --> ConfigBase : parse
40 changes: 12 additions & 28 deletions docs/developer/diagrams/nextflow.executor.mmd
Expand Up @@ -2,19 +2,16 @@ classDiagram
%%
%% nextflow.executor
%%
ProcessDef --> Executor : run
%% ExecutorFactory --> Executor : getExecutor
ExecutorFactory --> Executor : getExecutor

TaskProcessor --* Executor

%% class Executor {
%% name : String
%% monitor : TaskMonitor
%% }
%% Executor --* TaskMonitor
%% Executor --> TaskHandler : submit
class Executor {
name : String
monitor : TaskMonitor
}
Executor --* TaskMonitor
Executor --> TaskHandler : submit

%% TaskMonitor <|-- TaskPollingMonitor
TaskMonitor <|-- TaskPollingMonitor

class TaskPollingMonitor {
capacity : int
Expand All @@ -23,17 +20,13 @@ classDiagram
dumpInterval : Duration
}

%% TaskPollingMonitor <|-- LocalPollingMonitor
TaskPollingMonitor <|-- LocalPollingMonitor

class LocalPollingMonitor {
maxCpus : int
maxMemory : long
}

%% class TaskHandler {
%% task : TaskRun
%% }

Executor <|-- AbstractGridExecutor
Executor <|-- LocalExecutor
%% Executor <|-- NopeExecutor
Expand All @@ -49,19 +42,10 @@ classDiagram
%% PbsExecutor <|-- PbsProExecutor
%% SgeExecutor <|-- CrgExecutor

LocalExecutor --> LocalPollingMonitor : init
LocalExecutor --> LocalTaskHandler : submit
LocalExecutor --> NativeTaskHandler : submit
LocalTaskHandler --> BashWrapperBuilder : submit

AbstractGridExecutor --> TaskPollingMonitor : init
AbstractGridExecutor --> GridTaskHandler : submit
GridTaskHandler --> BashWrapperBuilder : submit

%% TaskHandler <|-- CachedTaskHandler
%% TaskHandler <|-- GridTaskHandler
%% TaskHandler <|-- LocalTaskHandler
%% TaskHandler <|-- NativeTaskHandler
TaskHandler <|-- GridTaskHandler
TaskHandler <|-- LocalTaskHandler
TaskHandler <|-- NativeTaskHandler
%% TaskHandler <|-- NopeTaskHandler
%% TaskHandler <|-- StoredTaskHandler

Expand Down
2 changes: 2 additions & 0 deletions docs/developer/diagrams/nextflow.extension.mmd
Expand Up @@ -26,3 +26,5 @@ classDiagram
OperatorImpl --> ToListOp : toList, toSortedList
OperatorImpl --> TransposeOp : transpose
OperatorImpl --> UntilOp : until

WorkflowBinding --> OpCall : invokeMethod
34 changes: 14 additions & 20 deletions docs/developer/diagrams/nextflow.script.mmd
Expand Up @@ -9,7 +9,6 @@ classDiagram
session : Session
}
ScriptRunner --* ScriptFile
ScriptRunner --* Session
ScriptRunner --> ScriptParser : execute
ScriptParser --> BaseScript : parse

Expand All @@ -22,8 +21,15 @@ classDiagram
projectName : String
}

Session --* BaseScript
Session --* ScriptBinding
class BaseScript {
meta : ScriptMeta
entryFlow : WorkflowDef
}
BaseScript --* ScriptBinding
BaseScript --* ScriptMeta
BaseScript --> IncludeDef : include

IncludeDef --> ScriptParser : load0

class ScriptBinding {
scriptPath : Path
Expand All @@ -33,16 +39,6 @@ classDiagram
entryName : String
}

IncludeDef --> BaseScript : load0

class BaseScript {
meta : ScriptMeta
entryFlow : WorkflowDef
}
BaseScript --* ScriptMeta
%% BaseScript --> ProcessDef : process
%% BaseScript --> WorkflowDef : workflow

class ScriptMeta {
scriptPath : Path
definitions : Map
Expand All @@ -68,11 +64,9 @@ classDiagram
baseName : String
rawBody : Closure~BodyDef~
}
ProcessDef --> ProcessConfig : run
ProcessDef --> BodyDef : run
ProcessDef --> Executor : run
ProcessDef --> TaskProcessor : run
ProcessDef --> ChannelOut : run
ProcessDef --* ProcessConfig
ProcessDef --* BodyDef
ProcessDef --* ChannelOut

class WorkflowDef {
name : String
Expand All @@ -82,8 +76,8 @@ classDiagram
variableNames : Set~String~
}
WorkflowDef --* BodyDef
WorkflowDef --> WorkflowBinding : run
WorkflowDef --> ChannelOut : run
WorkflowDef --* WorkflowBinding
WorkflowDef --* ChannelOut

class ProcessConfig {
configProperties : Map
Expand Down
4 changes: 3 additions & 1 deletion docs/developer/diagrams/nextflow.secret.mmd
Expand Up @@ -2,7 +2,9 @@ classDiagram
%%
%% nextflow.secret
%%
CmdRun --> SecretsProvider : run
ConfigBuilder --> SecretsLoader : build
BaseScript --> SecretsLoader : run
BashWrapperBuilder --> SecretsLoader : build

SecretsLoader --> SecretsProvider : load
SecretsProvider --> Secret : getSecret
Expand Down
25 changes: 8 additions & 17 deletions docs/developer/diagrams/nextflow.trace.mmd
Expand Up @@ -2,21 +2,12 @@ classDiagram
%%
%% nextflow.trace
%%
direction LR
Session --> TraceObserverFactory : init

%% TraceObserverFactory "1" --> "*" TraceObserver : create
%% TraceObserver <|-- AnsiLogObserver
%% TraceObserver <|-- GraphObserver
%% TraceObserver <|-- ReportObserver
%% TraceObserver <|-- TimelineObserver
%% TraceObserver <|-- TraceFileObserver
%% TraceObserver <|-- WebLogObserver
%% TraceObserver <|-- WorkflowStatsObserver

Session --> AnsiLogObserver : init
Session --> GraphObserver : init
Session --> ReportObserver : init
Session --> TimelineObserver : init
Session --> TraceFileObserver : init
Session --> WebLogObserver : init
Session --> WorkflowStatsObserver : init
TraceObserverFactory "1" --> "*" TraceObserver : create
TraceObserver <|-- AnsiLogObserver
TraceObserver <|-- GraphObserver
TraceObserver <|-- ReportObserver
TraceObserver <|-- TimelineObserver
TraceObserver <|-- TraceFileObserver
TraceObserver <|-- WorkflowStatsObserver
69 changes: 69 additions & 0 deletions docs/developer/diagrams/overview.mmd
@@ -0,0 +1,69 @@
flowchart TB
subgraph Launcher
subgraph CmdRun
subgraph AssetManager
ScriptFile
end
subgraph ConfigBuilder
ConfigParser([ConfigParser])
ConfigBase([ConfigBase])
end
subgraph ScriptRunner
subgraph Session
ConfigMap
DAG
ExecutorFactory([ExecutorFactory])
subgraph TaskProcessor
TaskRun
end
subgraph Executor
subgraph TaskMonitor
TaskHandler
end
TaskBean
BashWrapperBuilder([BashWrapperBuilder])
end
TraceRecord
CacheFactory([CacheFactory])
CacheDB
TraceObserver([TraceObserver])
end
ScriptParser([ScriptParser])
BaseScript([BaseScript])
subgraph ScriptMeta
WorkflowDef([WorkflowDef])
ProcessDef([ProcessDef])
FunctionDef([FunctionDef])
end
IncludeDef([IncludeDef])
OpCall([OpCall])
end
ConfigParser --> ConfigBase
ConfigBase --> ConfigMap
ScriptFile --> ScriptParser
ScriptParser --> BaseScript
BaseScript --> WorkflowDef
BaseScript --> ProcessDef
BaseScript --> FunctionDef
BaseScript --> IncludeDef
IncludeDef --> ScriptParser
WorkflowDef --> OpCall
OpCall --> DAG
ProcessDef --> DAG
DAG --> TaskRun
TaskRun --> DAG
ExecutorFactory --> Executor
ConfigMap --> Executor
ProcessDef --> TaskProcessor
ConfigMap --> TaskProcessor
TaskRun --> TaskHandler
TaskRun --> TaskBean
TaskBean --> BashWrapperBuilder
BashWrapperBuilder --> TaskHandler
CacheFactory --> CacheDB
TaskHandler --> CacheDB
TaskHandler --> TraceRecord
TraceRecord --> CacheDB
TaskHandler --> TraceObserver
end
end
4 changes: 2 additions & 2 deletions docs/developer/nextflow.executor.md
Expand Up @@ -14,8 +14,8 @@ Some classes may be excluded from the above diagram for brevity.

## Notes

The `Executor` class is the base class for all Nextflow executors. The main purpose of an `Executor` is to submit tasks to an underlying compute environment, such as an HPC scheduler or cloud batch executor. It uses a `TaskMonitor` to manage the lifecycle of all tasks and `TaskHandler`s to manage each individual task. Most executors use the same polling monitor, but each executor implements its own task handler to customize it for a particular compute environment. See [nextflow.processor](nextflow.processor.md) for more details about these classes.
The `Executor` class is the base class for all Nextflow executors. The main purpose of an `Executor` is to submit tasks to an underlying compute environment, such as an HPC scheduler or cloud batch executor. It uses a `TaskMonitor` to manage the lifecycle of all tasks and a `TaskHandler` to manage each individual task. Most executors use the same polling monitor, but each executor implements its own task handler to customize it for a particular compute environment. See [nextflow.processor](nextflow.processor.md) for more details about these classes.

The built-in executors include the local executor (`LocalExecutor`) and the various grid executors (SLURM, PBS, LSF, etc), all of which extend `AbstractGridExecutor`. The `LocalExecutor` implements both "local" tasks (processes with a `script` or `shell` block) and "native" tasks (processes with an `exec` block).
The built-in executors include the local executor (`LocalExecutor`) and the various grid executors (SLURM, PBS, LSF, etc), all of which extend `AbstractGridExecutor`. The `LocalExecutor` implements both "script" tasks (processes with a `script` or `shell` block) and "native" tasks (processes with an `exec` block).

The `BashWrapperBuilder` is used by executors to generate the wrapper script (`.command.run`) for a task, from a template script called `command-run.txt`, as well as the task configuration and the execution environment.
2 changes: 1 addition & 1 deletion docs/developer/nextflow.k8s.md
Expand Up @@ -14,4 +14,4 @@ Some classes may be excluded from the above diagram for brevity.

## Notes

The Kubernetes integration uses the K8s REST API to interact with K8s clusters, and relies on the `kubectl` command and `~/.kube/config` file for authentication.
The Kubernetes integration uses the K8s HTTP API to interact with K8s clusters, and relies on the `kubectl` command and `~/.kube/config` file for authentication.
6 changes: 4 additions & 2 deletions docs/developer/nextflow.processor.md
Expand Up @@ -14,8 +14,10 @@ Some classes may be excluded from the above diagram for brevity.

## Notes

While the [`executor`](nextflow.executor.md) package defines how tasks are submitted to a particular execution environment (such as an HPC scheduler), the `processor` package defines how tasks are created and executed. As such, these packages work closely together, and in fact several components of the `Executor` interface, specifically the `TaskHandler` and `TaskMonitor`, are defined in this package.
While the [`executor`](nextflow.executor.md) package defines how tasks are submitted to a particular execution backend (such as an HPC scheduler), the `processor` package defines how tasks are created and executed. As such, these packages work closely together, and in fact several components of the `Executor` interface, specifically the `TaskHandler` and `TaskMonitor`, are defined in this package.

The `TaskProcessor` is by far the largest and most complex class in this package. It implements both the dataflow operator for a given process as well as the task execution logic. In other words, it defines the mapping from an abstract process definition with concrete channel inputs into concrete task executions.
The `TaskProcessor` is by far the largest and most complex class in this package. It implements both the dataflow operator for a given process as well as the task execution logic. In other words, it defines the mapping from an abstract process definition with input and output channels into concrete task executions.

A `TaskRun` represents a particular task execution. There is also `TaskBean`, which is a serializable representation of a task. Legends say that `TaskBean` was originally created to support a "daemon" mode in which Nextflow would run on both the head node and the worker nodes, so the Nextflow "head" would need to send tasks to the Nextflow "workers". This daemon mode was never completed, but echoes of it remain (see `CmdNode`, `DaemonLauncher`, and the `nf-ignite` plugin).

When a `TaskProcessor` receives a set of input values, it creates a `TaskRun` and submits it to an `Executor`, which in turn submits the task to a underlying execution backend. The executor's `TaskMonitor` then monitors the status of the task, and when it is completed, returns it to the task processor for finalization. If the task completed successfully, the task processor collects the task outputs and emits them on the corresponding output channels. If the task failed, the task processor will retry it if possible, or else return a task error to the workflow run.
2 changes: 1 addition & 1 deletion docs/developer/nextflow.script.md
Expand Up @@ -14,7 +14,7 @@ Some classes may be excluded from the above diagram for brevity.

## Notes

The execution of a Nextflow pipeline occurs in two phases. In the first phase, Nextflow parses and runs the script (using the language extensions in [nextflow.ast](nextflow.ast.md) and [nextflow.extension](nextflow.extension.md)), which constructs the workflow DAG. In the second phase, Nextflow executes the workflow.
The execution of a Nextflow pipeline occurs in two phases. In the first phase, Nextflow parses and runs the script (using the language extensions in [nextflow.ast](nextflow.ast.md) and [nextflow.extension](nextflow.extension.md)), which produces the workflow DAG. In the second phase, Nextflow executes the workflow.

```{note}
In DSL1, there was no separation between workflow construction and execution -- dataflow operators were executed as soon as they were constructed. DSL2 introduced lazy execution in order to separate process definition from execution, and thereby facilitate subworkflows and modules.
Expand Down
2 changes: 1 addition & 1 deletion docs/developer/nextflow.trace.md
Expand Up @@ -14,4 +14,4 @@ Some classes may be excluded from the above diagram for brevity.

## Notes

The `TraceObserver` interface defines a set of hooks into the workflow execution, such as when a workflow starts and completes, when a task starts and completes, and when an output file is published. The `Session` maintains a list of all observers and triggers each hook when the corresponding event occurs. Implementing classes can use these hooks to perform custom behaviors. In fact, this interface is used to implemented several core features, including the various execution reports, DAG renderer, and the integration with Seqera Platform.
The `TraceObserver` interface defines a set of hooks into the workflow execution, such as when a workflow starts and completes, when a task starts and completes, and when an output file is published. The `Session` maintains a list of all observers and triggers each hook when the corresponding event occurs. Implementing classes can use these hooks to perform custom behaviors. In fact, this interface is used to implement several core features, including the various execution reports, DAG renderer, and the integration with Seqera Platform.
1 change: 1 addition & 0 deletions docs/index.md
Expand Up @@ -121,6 +121,7 @@ flux
:maxdepth: 1
developer/index
developer/diagram
developer/packages
developer/plugins
```

0 comments on commit dcff41a

Please sign in to comment.