Proposal: Beyond DSL2 #309

Draft · wants to merge 10 commits into dev
Conversation

@bentsherman commented May 1, 2024

This PR is a showcase of many language improvements we are working on. The changes vary widely from things that can be done today, to things that will be possible in upcoming releases, to things that are still being designed. I wanted to lay out a comprehensive vision for where we're going, even for things potentially far in the future, to help explain how we are thinking about new features right now.

View only the changes proposed for DSL2+: #312

New features / changes:

  • Use static types and record types: under development (Static types for process inputs/outputs, nextflow-io/nextflow#4553). Specify process inputs and outputs as regular variable declarations with any type, including user-defined record types. See the sketch after this list item.

    • Paths are automatically detected and staged.
    • Inputs can have default values and can be passed by name (not shown in this PR).
    • Use Optional<T> (or possibly T?) to denote an optional output (not shown in this PR).
    • Use Path and List<Path> to distinguish between a single file and a list of files.
    • Use the topic: section to send values to topics (e.g. tool versions).
    • Record types can be imported from modules, like processes.
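
    To make this concrete, here is a rough sketch of a typed process combining the points above. It is illustrative only: the record syntax and the path() helper follow the examples elsewhere in this PR, and details may change as nextflow-io/nextflow#4553 evolves.

    record Sample {
        Map<String,?> meta
        List<Path> files
    }

    process FASTQC {
        input:
        Sample sample    // paths inside the record are detected and staged

        output:
        Path html = path("*.html")    // single-file output

        topic:
        [ task.process, 'fastqc', eval('fastqc --version') ] >> 'versions'

        script:
        """
        fastqc ${sample.files.join(' ')}
        """
    }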
  • Replace the pipe operator | with |>, which works not only with channels / processes / workflows but with any function call:

    // x, y can be any value
    // f can be function, process, operator, workflow
    x |> f == f(x)
    
    // even a closure!
    x |> { x -> f(x, y) } == f(x, y)
  • Formalize the Channel type, which is a queue channel. Treat value channels as regular values that can be used without dataflow logic, for example:

    // convert a list into a queue channel and back into a list
    vals = 1..10 |> Channel.fromList |> collect
    // `vals` is just a value, so just use it!
    println "${vals.size()}"

    Any operator that currently returns a value channel will just return a regular value, which can be used directly, without e.g. a map operator.

  • Treat process as a regular function. You can call the process in the workflow body with regular values, which is like calling it with all value channels (i.e. it will execute once). Or you can call it in an operator closure, also with regular values.

    Calling a process in a map operator is like calling it with a queue channel:

    Channel.fromPath( "inputs/*.fastq" )
      |> map { fastq -> FASTQC( fastq ) }

    You can call a process in a reduce operator to do process iteration:

    Channel.fromPath( "inputs/*.txt" )
      |> reduce { result, file -> ACCUMULATE( result, file ) }

    This way, you never call a process directly with channels, only with regular values. The way you call a process is exactly the way it looks in the definition (now with static types).

    Like before, a process can only be called once in a workflow, unless you use import aliases (see the example below).
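
    For reference, import aliases work as they already do in DSL2 (the module path and channel names below are hypothetical):

    include { FASTQC as FASTQC_RAW     } from './modules/fastqc'
    include { FASTQC as FASTQC_TRIMMED } from './modules/fastqc'

    workflow {
        raw_reads     |> map { fastq -> FASTQC_RAW( fastq ) }
        trimmed_reads |> map { fastq -> FASTQC_TRIMMED( fastq ) }
    }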

Deprecations:

  • Use of params outside the top-level workflow -- subworkflows should receive params as explicit inputs

  • params() and addParams() methods with include statement -- pass params as process / workflow inputs

  • -entry command-line option -- use params to select different subworkflows from the top-level workflow instead

  • Object-method syntax for operators e.g. foo.collect() -- operators are just standalone functions, you can do either collect(foo) or foo |> collect

  • Value channels -- everything that was a value channel in DSL2 will appear to the user as a regular value, even though Nextflow might represent them as value channels "under the hood"

  • Process when: section -- use conditional logic in the workflow instead

  • Accessing process outputs via PROCESS_NAME.out -- just assign the return value of the process to a variable

  • Experimental process recursion -- invoke process in a reduce or scan operator instead

  • Many operators and some channel factories can be removed, simplified, or replaced with regular functions. For example, the splitCsv operator is equivalent to a splitCsv function used with flatMap, and collectFile can be replaced by a mergeText function, which can be combined with groupTuple and sort to group and sort entries as before (see the sketch below). The operator library can be much smaller and simpler, and it also won't be needed as much because of the other improvements around value channels and processes.
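
    A minimal sketch of the splitCsv case, assuming the function form proposed above (mergeText is only a proposed name, so it is shown as a comment):

    Channel.fromPath( 'samplesheets/*.csv' )
      |> flatMap { csv -> splitCsv(csv, header: true) }    // Channel<Map>

    // proposed collectFile replacement, per the text above:
    // entries |> groupTuple |> map { group -> mergeText(group) }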

Extra

Some other improvements which are needed "behind the scenes" to make everything work:

  • New script parser (Formal grammar and parser nextflow-io/nextflow#4613) will simplify the Nextflow syntax, improve error reporting, form the basis of a language server, and enable custom syntax like |> and record

  • With static types and the params schema, Nextflow can infer the type of every variable at compile-time instead of run-time, and the language server can use this to display type hints in the IDE. The type of each line is commented in this PR to demonstrate what the hint-on-hover would show.

  • Similarly, the config parser and Config schema nextflow-io/nextflow#4201 will make the config syntax strict and type-checked, for better error-reporting and IDE tooling (i.e. code completion)

  • The DAG will be constructed at compile-time instead of run-time, which will allow the DAG to be more comprehensive -- include params and how they connect to processes, include conditional pipeline code (e.g. if-else statements), allow nextflow inspect to list every container that might possibly be used, etc

Comment on lines 114 to 117
|> map { meta ->
def sample = new Sample( meta, meta.fastq_aspera.tokenize(';').take(2).collect( name -> file(name) ) )
ASPERA_CLI ( sample, 'era-fasp', aspera_cli_args )
} // fastq: Channel<Sample>, md5: Channel<Sample>
@bentsherman (Author):
@mahesh-panchal to your point about dynamic args, I think we can do even better in DSL3:

|> map { meta ->
  def sample = new Sample( /* ... */ )
  ASPERA_CLI ( sample, 'era-fasp', "${meta.key}" )
}

Because we call the process in a map operator explicitly (currently it is implied), we can control how the process is invoked for each task within the operator closure, instead of passing multiple queue channels.

Member:
Yes, this specifically is much better. Treating processes like functions is soooo much better, since there's no implicit transformation going on with all the singleton/queue channel stuff. It has to be formed and then mapped. And tuples disappear too, except in channels (?).

Actually, I think what's worrying me about this syntax is the mixing of input types. An input could be a channel (e.g. MULTIQC_MAPPINGS_CONFIG ( mappings ) lower down) or it could be an input set (e.g. this dynamically defined Sample). This is already confusing to newcomers, where we commonly see people trying to use channels inside map, branch, etc.
I guess one could explain the second option as passing dynamically defined singleton channels.

@bentsherman (Author):
In this proposal, there are no "value" channels, only queue channels and regular values. So the MULTIQC_MAPPINGS_CONFIG ( mappings ) is no different because mappings is just a value. It may be an async value, and Nextflow might represent it as a value channel under the hood, but to the user it should be indistinguishable from a regular value

In other words, you cannot call a process with a channel, only with values.
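
A small sketch of that distinction, using hypothetical names (and remembering that a given process can still only be called once per workflow):

    // `mappings` is a regular value (possibly asynchronous under the hood):
    config = MULTIQC_MAPPINGS_CONFIG( mappings )         // direct call, executes once

    // per-item execution over a queue channel goes through an operator closure:
    samples |> map { sample -> SOME_PROCESS( sample ) }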

@bentsherman changed the title from "DSL2+ / DSL3 preview" to "DSL2+ / DSL3 proof-of-concept" on May 1, 2024
@bentsherman changed the title from "DSL2+ / DSL3 proof-of-concept" to "Preview: DSL2+ (and beyond)" on May 1, 2024
Comment on lines +44 to +58
SRA (
ids,
params.ena_metadata_fields ?: '',
params.sample_mapping_fields,
params.nf_core_pipeline ?: '',
params.nf_core_rnaseq_strandedness ?: 'auto',
params.download_method,
params.skip_fastq_download,
params.dbgap_key,
params.aspera_cli_args,
params.sra_fastq_ftp_args,
params.sratools_fasterqdump_args,
params.sratools_pigz_args,
params.outdir
)
Contributor:
My immediate thought is to make a map or an SraParams object to handle all these params as a single object. A map is simpler, but an object would give you the typing I'd like.

Suggested change
SRA (
ids,
params.ena_metadata_fields ?: '',
params.sample_mapping_fields,
params.nf_core_pipeline ?: '',
params.nf_core_rnaseq_strandedness ?: 'auto',
params.download_method,
params.skip_fastq_download,
params.dbgap_key,
params.aspera_cli_args,
params.sra_fastq_ftp_args,
params.sratools_fasterqdump_args,
params.sratools_pigz_args,
params.outdir
)
mySraParams = SraParams(
params.ena_metadata_fields ?: '',
params.sample_mapping_fields,
params.nf_core_pipeline ?: '',
params.nf_core_rnaseq_strandedness ?: 'auto',
params.download_method,
params.skip_fastq_download,
params.dbgap_key,
params.aspera_cli_args,
params.sra_fastq_ftp_args,
params.sratools_fasterqdump_args,
params.sratools_pigz_args,
params.outdir
)
SRA (
ids,
mySraParams
)

Contributor:
Although I imagine someone will pass the whole dang params object in 🤔 .

@bentsherman (Author):
You could make a record type 😄

Member:
An option I'd considered was to have those params as part of the record that supplied meta and files to stage.

Comment on lines 18 to 19
Sample fastq = new Sample(meta, path("*fastq.gz"))
Sample md5 = new Sample(meta, path("*md5"))
Contributor:
Can we simplify this or is it important to be explicit? Feels like a lot of boilerplate to set some outputs? I think implicit looks a bit nicer and I can't really see the downside?

Suggested change
Sample fastq = new Sample(meta, path("*fastq.gz"))
Sample md5 = new Sample(meta, path("*md5"))
fastq = Sample(meta, path("*fastq.gz"))
md5 = Sample(meta, path("*md5"))

@bentsherman (Author):
I've considered it. The proposed syntax is the most basic form that matches how variables are declared in general.

I guess you don't really need the output type on the left if it can always be inferred from the right-hand side.

Alternatively, since with this proposed syntax we always call a process with a single input and single output, instead of channels, we could also specify each record element as a separate input/output and bundle them into records in the workflow as needed. But that might be unwieldy for records with many elements.

@bentsherman (Author):
Did some refactoring today. Since process calls are so much more flexible now, I think we can simplify a few things here:

  • the output type can be omitted, inferred from the right-hand side
  • if there is only one output, the output name can be omitted because the process will just return that value directly
  • if there are multiple outputs, the process will return an "implicit" record type, similar to the process .out but with single values instead of channels.
  • I removed the use of Sample from all processes since it doesn't add much value. Instead I only bundle some things into Samples at the workflow level where it makes sense. Of course for larger record types it might be different.

Hopefully that makes the types less daunting as well. Basically the only place where they need to be explicitly declared is for function/process/workflow inputs.
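
A hypothetical before/after of those simplifications, based on the Sample outputs quoted above (illustrative only):

    // before: explicit output types
    output:
    Sample fastq = new Sample(meta, path("*fastq.gz"))
    Sample md5   = new Sample(meta, path("*md5"))

    // after: types inferred from the right-hand side
    output:
    fastq = new Sample(meta, path("*fastq.gz"))
    md5   = new Sample(meta, path("*md5"))

    // with multiple outputs, the call would return an implicit record:
    //   result = ASPERA_CLI( sample, ... )
    //   result.fastq, result.md5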

val pipeline
val strandedness
val mapping_fields
List<Map> sra_metadata
@adamrtalbot (Contributor), May 1, 2024:
Does this mean a sample of 1 no longer has to be explicitly tested if you want to use it? https://github.com/nf-core/modules/blob/5f12fc2128f419a8750c5b0620e4b54d7aa33fec/modules/nf-core/ashlar/main.nf#L27-L29

@bentsherman (Author):
yes exactly

Member:
List<Map> syntax is a bit hard/complex

Contributor:
I agree, it's a little clunky, but it's inherited directly from Groovy. Would [Map] or List(Map) look nicer?

@bentsherman (Author):
also, it's a list of maps, so you can't get much simpler than List<Map>

types/types.nf Outdated
Comment on lines 2 to 5
record Sample {
Map<String,?> meta
List<Path> files
}
Contributor: [image attachment]

Comment on lines +56 to +75
//
// MODULE: Get SRA run information for public database ids
//
|> map { id ->
SRA_IDS_TO_RUNINFO ( id, ena_metadata_fields )
} // Channel<Path>
//
// MODULE: Parse SRA run information, create file containing FTP links and read into workflow as [ meta, [reads] ]
//
|> map(SRA_RUNINFO_TO_FTP) // Channel<Path>
|> set { runinfo_ftp } // Channel<Path>
|> flatMap { tsv ->
splitCsv(tsv, header:true, sep:'\t')
} // Channel<Map>
|> map { meta ->
meta + [single_end: meta.single_end.toBoolean()]
} // Channel<Map>
|> unique // Channel<Map>
|> set { sra_metadata } // Channel<Map>

Contributor:
This actually might have the biggest impact. Piping has never been more popular, especially with biologists, because of R. Being able to write the pipeline in this functional way might help express the mental model of channels. Sorry, too much language there, but I like this a lot.

@bentsherman (Author):
Indeed. A lot of moving pieces have to come together to make this work. Does it make sense to you how I am calling the process like a function in the map operator?

Member:
It took a second to process, but this is soooo much better. This deals with what I wanted much better than how I initially thought about it from the current syntax.

Member:
what is |> used for?

Member:
ooh, I see, sorry

@mahesh-panchal (Member) left a comment:
I really like the pipe replacement: |> is easier to see, and provides some visual directionality, helping readability.

Something extra: How about also shifting:

workflow {
    workflow.onComplete {
    }
}

to

workflow {
    onStart:
    ...

    take:
    ...

    main:
    ...
    
    onComplete:
    ...
}

There are some things I really like here, but I have reservations about other things, like how channels are obfuscated with their values, and process outputs.


@@ -86,6 +102,11 @@ workflow {
)
}

publish {
directory params.outdir
Member:
Can the syntax take an = or : here, for readability? Or is this a function?

directory(params.outdir)

@bentsherman (Author):

it is a function call under the hood, so you could use parentheses here. it is the same syntax as process directives.

personally I would rather put these settings in the config file because they seem more like config, but Paolo prefers this form for now


Comment on lines 23 to 25
topic:
[ task.process, 'sratools', eval("fasterq-dump --version 2>&1 | grep -Eo '[0-9.]+'") ] >> 'versions'
[ task.process, 'pigz', eval("pigz --version 2>&1 | sed 's/pigz //g'") ] >> 'versions'
Member:
I kind of like this, but I dislike the name topic. I don't feel like the word communicates what its function is.

It would also be nice if we could supply a regex to validate what the eval should return, for some fail-fast behavior when there's extra stuff being emitted. Where would one define a global pattern variable? E.g.

def SOFTWARE_VERSION = /\d+.../
def SHASUM = /\w{16}/ 

Or maybe this should be a class? like you can filter { Number }.

@bentsherman (Author):
topic is a term from stream processing, used to collect related events from many different sources. in this case we are sending the tool version info to a custom "versions" topic, then the workflow reads from that topic to build the versions yaml file.

eval is just a function defined in the output / topic scope, so you could wrap it in a custom validation function:

def validate( pattern, text ) {
  // ...
}

// ...
  topic:
  validate( /foo/, eval('...') ) >> 'versions'

sraCheckENAMetadataFields(ena_metadata_fields)
} else {
input = file(input)
if (!isSraId(input))
error('Ids provided via --input not recognised please make sure they are either SRA / ENA / GEO / DDBJ ids!')
Member:
Can we have a set of functions for reporting errors, tips, warnings, etc to the user without reporting script line number? As in, there should be a distinction between error messages generated for the user, and error messages generated for the developer.

And ideally something that doesn't put something into a channel if the channel is empty.

@bentsherman (Author):
I think there is an ongoing discussion for that here: nextflow-io/nextflow#4937

in any case, it can be done independently of these language improvements

//
// Prefetch sequencing reads in SRA format.
//
input = SRATOOLS_PREFETCH ( input, ncbi_settings, dbgap_key )
Member:
I don't like this.

|> operator { var -> 
    var = process1( var, ... )
    process2( var, ... )
}

This mixing is confusing.
It should be:

|> operator { var -> process1(var, ...) }
|> operator { var -> process2(var, ...) }

@bentsherman (Author):
you can keep them separate if you want. I combined them here mainly to show that it's possible. they are just functions after all, so why not be able to compose them?

.set { ch_mappings }
sra_metadata // Channel<Map>
|> collect // List<Map>
|> { sra_metadata ->
Member:
So we don't need map if we can just supply closures? Does map have a purpose then?

@bentsherman (Author):
the pipe into closure is a shorthand for this:

    index_files = SRA_TO_SAMPLESHEET (
        sra_metadata |> collect, // a.k.a. collect(sra_metadata)
        nf_core_pipeline,
        nf_core_rnaseq_strandedness,
        sample_mapping_fields
    )

it's a convenient way to keep the pipeline going when you can't express the step as a curried function call. in this case, I want to supply some extra arguments to SRA_TO_SAMPLESHEET, so I can use the closure to customize that function call instead of breaking up the pipeline.

actually I'd like to be able to curry the process call just like an operator:

    sra_metadata
        |> collect
        |> SRA_TO_SAMPLESHEET (
                nf_core_pipeline,
                nf_core_rnaseq_strandedness,
                sample_mapping_fields
        )
        |> set { index_files }

in any case, it's important to understand that the result of sra_metadata |> collect is a value (a list of meta maps), not a value channel. you can't use operators like map on a value here, only on channels. there are no more value channels, only queue channels

Member:
OK. So my general confusion is around whether it's either a stream or a value, vs it always being a stream.

Does this mean operators like transpose will be redefined, since it'll be problematic to distinguish between a stream and a value, and what follows |> could be either a channel operator or a Collection function?

@bentsherman (Author):
in order for |> to work with anything without causing ambiguities, everything needs to be typed. I've gone back and forth on whether to allow operators to accept list inputs and "cast" them to channels, but ultimately I think I would prefer to force the user to be explicit. it's also not that hard:

1..10 
  |> Channel.of // it's just an extra line
  |> map { /* ... */ }

I think this makes it perfectly clear what can go into an operator: only channels (i.e. queue channels). Anything else can be converted into a channel beforehand using Channel.of() or Channel.fromList(). So no, we won't need to change operators like transpose, and any operator that currently returns a value channel like collect will just return a regular value.

You can also clearly distinguish between the List::transpose() method and the transpose operator:

[ 1, 2, 3 ].transpose()
[ 1, 2, 3 ] |> Channel.of |> transpose

Note that operators can no longer be called using the dot syntax, and you can't use |> to call an object method, only standalone functions.

Member:
Note that operators can no longer be called using the dot syntax, and you can't use |> to call an object method, only standalone functions.

This helps with transparency and readability a lot.

@bentsherman (Author):
Also, I commented the expected type of each line off to the right so that you can tell whether something is a channel or value. Ideally the IDE tooling will be able to show these type hints in the editor

Member:
My issue with understanding the comments was that my mental model was still incorrect at the time of reading, so they just caused confusion. It makes much more sense now that my mental model is corrected.

Comment on lines 98 to 104
|> map { meta ->
new Tuple2<Map,String>( meta, meta.run_accession )
} // Channel<Tuple2<Map,String>>
|> FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS (
dbgap_key ? file(dbgap_key, checkIfExists: true) : [],
sratools_fasterqdump_args,
sratools_pigz_args ) // Channel<Sample>
@mahesh-panchal (Member), May 2, 2024:
So how does this compose inputs? Anything that's piped in is taken as the first channel, otherwise we need to use map?

This would otherwise be:

|> map { meta ->
                def tuple2 = new Tuple2<Map,String>( meta, meta.run_accession )
                FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS (
                    tuple2,
                    dbgap_key ? file(dbgap_key, checkIfExists: true) : [],
                    sratools_fasterqdump_args,
                    sratools_pigz_args )
    }

I'm not sure I'm liking the flexibility in the new syntax. This makes readability harder in my opinion.

@bentsherman (Author):
yes, the pipe source becomes the first argument in the call. this is already how it works for operators, so just extending it to functions / processes / workflows in general

but you can't call a workflow in an operator because a workflow itself contains dataflow logic. a process is more like a regular function, which is why it can be called anywhere within a workflow.

I suspect that this syntax is easier to understand for someone new to Nextflow, but possibly harder for someone used to DSL2. People have learned a lot of things in order to cope with the complexity of dataflow logic, which will need to be unlearned.

@bentsherman (Author):
this example is actually the inverse of the other one, so it can also be written as:

    sra_metadata
        // ...
        |> { sra_metadata ->
            FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS (
                sra_metadata,
                dbgap_key ? file(dbgap_key, checkIfExists: true) : [],
                sratools_fasterqdump_args,
                sratools_pigz_args )
        }

Member:
but you can't call a workflow in an operator because a workflow itself contains dataflow logic. a process is more like a regular function, which is why it can be called anywhere within a workflow.

Is there a reason this has to be the case? At the moment workflows seem just to be channel zip-ties, and I think many would really like them to be more like functions

@bentsherman (Author):
That's how I used to feel as well, but it doesn't really make sense in general. Calling a workflow in a map operator implies that you want the workflow to independently process each value from the input channel, like a process. But what if the workflow has dataflow logic like groupTuple and reduce? Then it needs to operate on the channel as a whole, not just the individual values.

Now there is a special case, which is a workflow that only calls processes and certain operators like map and filter, for example:

workflow FOO {
  take: input
  main: input |> map(PROC1) |> map(PROC2) |> set { out }
  emit: out
}

This workflow could in theory be called within an operator because it never needs the entire channel, each value is processed independently. But in that case, it could just be an operator closure!

workflow {
  // closure equivalent to workflow FOO
  input |> map { val ->
    PROC2(PROC1(val))

    // can use pipes here btw
    // val |> PROC1 |> PROC2
  }
}

Member:
But what if the workflow has dataflow logic like groupTuple and reduce ? Then it needs to operate on the channel as a whole, not just the individual values.

I'm not convinced of this. If I knew that the subworkflow acted more like a function, then I would expect these operators only to work on the subset I passed as input and not everything. I have to admit though I don't have much experience with scatter/gather implementations so outside of the naive implementation I haven't thought about it a lot.

@bentsherman (Author):
Thinking about this more, I guess you could treat workflows like functions and execute each workflow on an independent set of inputs. For example you could have a channel of channels, map it with a workflow, then each workflow invocation operates on one channel. This is actually something that has been requested before.

But I suspect it would add a lot of complexity without much benefit over what is already possible. I'll have to think on it though, maybe it'll become clearer after the first round of development
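
A purely illustrative sketch of that idea, not part of this proposal (the batches channel and workflow FOO are hypothetical):

    batches                               // Channel<Channel<Sample>>
      |> map { batch -> FOO( batch ) }    // each FOO invocation would operate on one inner channel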

@samuell commented May 2, 2024:

FWIW, I have one tiny piece of feedback that came up after the previous discussion (which I wasn't aware of):

The fair keyword describes what is, to my knowledge, very often called "FIFO" (First-In, First-Out) in other contexts, and FIFO might have been a clearer name? (That said, perhaps not worth the change...)

@bentsherman (Author):
@samuell I would say, just submit an issue for that, it is more of an API change than a syntax change

@samuell commented May 3, 2024:

Reading through the suggestion in more detail now, I'm a little concerned about this one:

The DAG will be constructed at compile-time instead of run-time, which will allow the DAG to be more comprehensive -- include params and how they connect to processes, include conditional pipeline code (e.g. if-else statements), allow nextflow inspect to list every container that might possibly be used, etc

In my experience, there are some use cases that require run-time generated DAGs, for example when initiating pipeline structure based on values extracted as part of the workflow.

This is common e.g. in machine learning, where you might run hyper-parameter tuning, which generates values that are sent to initialize downstream processes but that might also influence how the DAG is generated downstream.

I've been writing about it before: https://bionics.it/posts/dynamic-workflow-scheduling

Not sure how well this applies here, but want to raise the flag about it, since it is a real limitation we have been running into with other pipeline systems (Luigi).

EDIT: Actually, I guess since we are almost definitely talking about the DAG of processes and not the DAG of tasks, a compile-time DAG would still not rule out all of dynamic scheduling (Since the dataflow paradigm of Nextflow does dynamic task scheduling inherently). Still, it seems some cases of dynamic scheduling might be affected; those that require the process DAG structure to be defined based on outcomes of previous computations.

Sample md5 = new Sample(meta, path("*md5"))

topic:
[ task.process, 'aspera_cli', eval('ascli --version') ] >> 'versions'
Member:
why brackets?

Member:
My guess is that this could be any object but a List is easy to process

@bentsherman (Author):
yes it's just a list literal, could be any expression

@mahesh-panchal (Member), May 3, 2024:
Are there reasons for the above over

    'versions' << [ task.process, 'aspera_cli', eval('ascli --version') ]

This is more consistent with add/append, isn't it?

@bentsherman (Author):
that could also work

@bentsherman (Author):
@samuell yes, I'm talking about the process (i.e. "abstract") DAG, which Nextflow already constructs before executing the pipeline. But it has to execute the script in order to do this, which limits its usefulness.

@mahesh-panchal (Member):
How come there are new keywords (let, fn, etc.)? What's the difference from def?

@bentsherman (Author):
Just another idea to consider. With a formal grammar, we don't have to adhere so closely to Groovy; we can make whatever syntax we want, as long as it can be translated to Groovy AST. So as a demonstration I have replaced def with more specific keywords: fn for a function definition, let for a variable that can't be reassigned, and var for a variable that can be reassigned (essentially final vs def in Groovy).

Notice I also changed how types are specified: <name>: <type> instead of <type> <name>, which I personally like because it emphasizes the semantic name over the type, which is optional. See the sketch below.
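
A small illustrative sketch of those keywords and annotations (exploratory syntax, not final):

    fn double(x: Integer) {
        let base = x          // `let`: cannot be reassigned (like Groovy `final`)
        var result = base     // `var`: can be reassigned (like Groovy `def`)
        result = result * 2
        return result
    }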

@mahesh-panchal (Member):
Is it going to be problematic if people combine Groovy and this grammar? For example in exec: blocks.

@bentsherman (Author):
It would apply to all Nextflow code, including exec: blocks

@mahesh-panchal (Member):
I understood it would apply to all, but my question was really whether people could mix grammars, and if so, what would happen, e.g.:

exec:
let some_var = do_stuff
def another_thing = do_other_stuff

@bentsherman (Author):
We would either drop def in the next DSL version (a hard cut-off) or support it temporarily with a compiler warning

@bentsherman changed the title from "Preview: DSL2+ (and beyond)" to "Proposal: Beyond DSL2" on May 21, 2024