
Workflow definition & composition [RFC] #1276

Closed
pditommaso opened this issue Aug 13, 2019 · 10 comments

@pditommaso
Member

pditommaso commented Aug 13, 2019

Workflow components

This feature adds support for workflow component definitions: a sequence of processes (and other workflows) can be defined as a reusable, composable component with its own inputs and outputs.

Workflow definition

The workflow keyword allows the definition of sub-workflow components that enclose the
invocation of one or more processes, operators and other workflows. For example:

    workflow my_pipeline {
        foo()
        bar( foo.out.collect() )
    }

Once defined, it can be invoked from another (sub)workflow definition like any other function or process.
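
For instance, a sketch of invoking the my_pipeline component from an enclosing workflow (the outer name is hypothetical):

    workflow outer {
        // a sub-workflow is invoked just like a process or function
        my_pipeline()
    }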

Workflow parameters

A workflow component can access any variable and parameter defined in the outer scope:

        params.data = '/some/data/file'

        workflow my_pipeline  {
            if( params.data )
                bar(params.data)
            else
                bar(foo())
        }

Workflow inputs

A workflow component can declare one or more input channels using the get keyword. For example:

        workflow my_pipeline {
            get: data
            main:
            foo(data)
            bar(foo.out)
        }

WARN: When get is used, the beginning of the workflow body must be identified with the main keyword.

Then, the input can be specified as an argument in the workflow invocation statement:

    my_pipeline( Channel.from('/some/data') )

NOTE: Workflow inputs are channels by definition. If a basic data type is provided instead, i.e. a number, string, list, etc., it's implicitly converted to a value channel (i.e. non-consumable).
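
For example, passing a plain string and passing an explicit value channel should be equivalent (a sketch, reusing my_pipeline from above):

    // the plain string is implicitly wrapped in a value channel
    my_pipeline( '/some/data' )
    // equivalent explicit form
    my_pipeline( Channel.value('/some/data') )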

Workflow outputs

A workflow component can declare one or more output channels using the emit keyword. For example:

        workflow my_pipeline {
            main:
              foo(data)
              bar(foo.out)
            emit:
              bar.out
        }

Then, the result of the my_pipeline execution can be accessed using the out property, i.e.
my_pipeline.out. When more than one output channel is declared, use the array bracket notation to access each output component, as described for processes.
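
For instance, a sketch with two output channels (the second emit is added here purely for illustration):

    workflow my_pipeline {
        main:
          foo(data)
          bar(foo.out)
        emit:
          foo.out
          bar.out
    }

    // access each output channel by index, as with process outputs
    my_pipeline.out[0].view()
    my_pipeline.out[1].view()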

Alternatively, the output channel can be accessed using the identifier name to which it's assigned in the emit declaration:

         workflow my_pipeline {
            main:
              foo(data)
              bar(foo.out)
            emit:
              my_data = bar.out
        }

Then, the result of the above snippet can be accessed using my_pipeline.out.my_data.

Implicit workflow

A workflow definition that does not declare a name is assumed to be the main workflow and is implicitly executed. It is therefore the entry point of the workflow application.
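
A minimal sketch combining a named component with the implicit entry point:

    workflow my_pipeline {
        get: data
        main: foo(data)
        emit: foo.out
    }

    // unnamed workflow: implicitly executed as the application entry point
    workflow {
        my_pipeline( Channel.from('/some/data') )
    }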

Example

An example script is available at this link and here.

Request for comments

This feature is available as an experimental feature in the latest snapshot, 19.08.0-SNAPSHOT build 5148.

Any feedback and comments are welcome.

@mes5k
Contributor

mes5k commented Aug 28, 2019

Awesome! DSL 2 is excellent and this just helps clarify things further.

@pditommaso
Member Author

@mes5k it took some time, but it's turning out awesome 😎

@haqle314

haqle314 commented Sep 6, 2019

Hi, I don't know if I should submit a bug report for this, but I noticed some inconsistencies with identifiers under emit:. Unless they are part of a pipeline, the names of channels/processes/workflows are not recognized properly. Here's an example:

nextflow.preview.dsl = 2

process sayHello {
    output: stdout()
    shell: "echo Hello world"
}
process greet {
    input: val person
    output: val greeting
    exec:
    greeting = "Hi, ${person}."
}

workflow foo {
    emit: sayHello | view
}

workflow bar {
    emit: foo
}

workflow greet1 {
    get: data
    emit: greet(data)
}
workflow greet2 {
    get: data
    emit: data | greet
}

workflow {
    greet1(Channel.from("Alice", "Bob")) | view
    greet2(Channel.from("Alice", "Bob")) | view
    foo | view
}
  • foo executes as expected but trying to execute bar results in the error
    Missing workflow output parameter: foo. It doesn't matter whether foo is a
    workflow or a process, the same error message will appear.
  • Trying to execute greet1 will result in the error message No such variable: data.

@pditommaso
Member Author

pditommaso commented Sep 9, 2019

Thanks for trying this out! I've spotted a couple of problems in your script:

  1. the emit in workflow bar should explicitly reference foo.out
  2. the same problem applies to workflow greet1

I'm copying below the working script for your convenience.

nextflow.preview.dsl = 2

process sayHello {
    output: stdout()
    shell: "echo Hello world"
}
process greet {
    input: val person
    output: val greeting
    exec:
    greeting = "Hi, ${person}."
}

workflow foo {
    emit: sayHello | view
}

workflow bar {
    emit: foo.out
}

workflow greet1 {
    get: data
    main: greet(data)
    emit: greet.out
}
workflow greet2 {
    get: data
    emit: data | greet
}

workflow {
    greet1(Channel.from("Alice", "Bob")) | view
    greet2(Channel.from("Alice", "Bob")) | view
    foo | view
}

@haqle314

haqle314 commented Sep 9, 2019

Thanks for the script, @pditommaso. Now that I think about it, using foo.out
instead of foo makes sense. I just initially felt it wasn't very intuitive that
emit: foo | view works but emit: foo doesn't.

One thing about the script you posted is that I still have to explicitly execute
foo inside main

workflow bar {
    foo()                      // Need this line, bar.out emits null without it
    emit: foo.out
}

I'm bugging you about this because I think it's a lot more succinct to be able
to skip the workflow body, like test2 in
subworkflow-dsl2.nf

On an unrelated note, is there a particular reason why you picked get, main,
and emit as the keywords? I'd think it would be easier for users to pick up if
the workflow definition mimicked the structure of a process definition, something
like this:

workflow foo {
    input: data
    output: result
    exec:
    /* data processing steps */
}

@pditommaso
Member Author

On an unrelated note, is there a particular reason why you picked get, main,
and emit as the keywords?

Sorry for the late reply. get and emit were chosen because the workflow declaration has a syntax different from the process input and output statements. Reusing the same keywords in a different context with a different syntax could create ambiguity for the developer. However, others have expressed doubts, in particular about the get keyword. I'll open a separate thread to discuss other proposals.

@bobamess

Could you use take as a keyword instead of get? It's short, begins with a t as a counterpart to emit, and has a similar meaning to get.

@pditommaso
Member Author

Yes, also @evanfloden was proposing this. Next edge release will use take instead of get.

@darcyabjones

Hi there!

I've been looking at using multiple related workflows defined in the same package using DSL2.
I can see a few ways to do it, but my current stumbling point is when publishing results.

What I'd like to be able to do is define multiple workflows and be able to run a subset of the total pipeline using the -entry flag.
But because this won't be the main/implicit workflow, I can't use the publish block.

Here's a contrived example:

nextflow.preview.dsl = 2

process greet {
    input:
    val person

    output:
    path "greeting_${person}.txt"

    script:
    """
    echo "Hi, ${person}." > "greeting_${person}.txt"
    """
}

process reply {
    input:
    val person1
    val person2
    path "in.txt"

    output:
    path "reply_${person1}_${person2}.txt"

    script:
    """
    echo "Oh hey ${person2}." | cat in.txt - > "reply_${person1}_${person2}.txt"
    """
}

workflow greet1 {
    get: data
    main: greet(data)
    emit: greet.out
}

workflow greet2 {
    get:
        data1
        data2
    main:
        greet(data1)
        reply(data1, data2, greet.out)
    emit: reply.out
}

workflow alternate {
    main:
        greet2(Channel.from("Darcy", "Paolo"), Channel.from("Alice", "Bob"))
}

workflow {
    main:
        greet1(Channel.from("Alice", "Bob"))
        greet2(Channel.from("Alice", "Bob"), Channel.from("Darcy", "Paolo"))

    publish:
        greet1.out to: "results"
        greet2.out to: "results"
}

Running nextflow run main.nf publishes results as expected, but nextflow run main.nf -entry alternate cannot.

What do you imagine would be the best way to do this in dsl2?
The current workaround would be to use a number of conditional statements in the main workflow.
But even that would involve lots of boilerplate to do publishing (empty channels etc).

Thanks Darcy.

PS. Really loving DSL2 so far.
I can see a lot of potential for code re-use, modularity, and maybe even unit/integration testing.
Thanks for your great work!

@pditommaso
Member Author

What I'd like to be able to do is define multiple workflows and be able to run a subset of the total pipeline using the -entry flag.

That's expected. The idea of sub-workflows is to make them composable and therefore not bound to any result path. You should create a separate script with its own main workflow and include the modularised sub-workflows.
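
For example, a separate entry script along these lines (a sketch; the file name and module path are assumptions):

    // publish_greet2.nf -- hypothetical entry-point script
    nextflow.preview.dsl = 2

    include greet2 from './main.nf'   // module path is an assumption

    workflow {
        main:
            greet2(Channel.from("Darcy", "Paolo"), Channel.from("Alice", "Bob"))
        publish:
            greet2.out to: "results"
    }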
