Rnaseq module version #162

pditommaso · 2019-03-04T22:56:04Z

This is a draft pull request, intended only to experiment with how to improve nf-core pipelines via NF modules.

WORK IN PROGRESS, but IMO this represents a huge step forward.

ewels · 2019-03-05T10:16:23Z

This is brilliant 😀 Much simpler, and it will be even better when we get around to refactoring all of that boilerplate code.

A couple of minor thoughts:

I don't really like the .first .second thing very much. Could output just be an array instead? eg:
```
 trim_galore.output[0].set { trimmed_reads }
 trim_galore.output[1].set { trimgalore_results }
 trim_galore.output[2].set { trimgalore_fastqc_reports }
```
- This feels more natural to me and would presumably scale better.

Could there also be a second syntax to set channels in one line? Most of the time we're setting one output to one channel, e.g. something like:
```
 trim_galore.output.setAll { trimmed_reads, trimgalore_results, trimgalore_fastqc_reports }
```
- We could still have some kind of syntax to set one output into multiple channels, e.g.:
```
 trim_galore.output.setAll { ch1, [ch2, ch3], ch4 }
```

With this, can we chain the .output on to the process call? eg:

 markDuplicates( ch_bam ).output.setAll { bam_md, picard_results }

This is of course all just syntactic sugar. The core functionality that you've introduced here is great! Especially once we get rid of the cruft, this will make the pipelines super simple. And it will be nice to split the processes into multiple files for clarity too (e.g. common.nf, star.nf, hisat.nf in this example).

Nice work!

drpatelh · 2019-03-05T11:13:24Z

> I don't really like the .first .second thing very much. Could output just be an array instead? eg:
>
>     trim_galore.output[0].set { trimmed_reads }
>     trim_galore.output[1].set { trimgalore_results }
>     trim_galore.output[2].set { trimgalore_fastqc_reports }

Would a map be better for this? Maybe something like:

trim_galore.output['fastq'].set { trimmed_reads }
trim_galore.output['results'].set { trimgalore_results }
trim_galore.output['reports'].set { trimgalore_fastqc_reports }

It would be easier to read, if it's at all possible.

pditommaso · 2019-03-10T10:10:28Z

Oops, I missed this.

> I don't really like the .first .second thing very much. Could output just be an array instead?

You already can: the output object is a list.

> Could there also be a second syntax to set channels in one line?

That's something I was thinking about as well, though I would prefer not to add too many magic syntax extensions. Currently it's also possible to assign an output list using the equals operator, e.g.

(trimmed_reads, trimgalore_results, trimgalore_fastqc_reports) = trim_galore.output

> With this, can we chain the .output on to the process call?

Yes.

> Would a map be better for this? Maybe something like?

That's an interesting point. Actually, I'm starting to think about some form of input/output data model definition, i.e. the ability to define a custom data structure to be used in place of unnamed tuples. At that point it would be possible.

However, I would close this first version using the current tuple-based approach.
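As a language-neutral illustration of the trade-off between positional and named outputs, here is a minimal Python sketch. It is only an analogy, not Nextflow syntax: the `TrimGaloreOutput` class and its field names are hypothetical, and plain strings stand in for channels. It shows that a named structure can still be indexed and unpacked like the current tuple.

```python
from typing import NamedTuple


class TrimGaloreOutput(NamedTuple):
    """Hypothetical named output record; strings stand in for channels."""
    fastq: str
    results: str
    reports: str


out = TrimGaloreOutput(
    fastq="trimmed_reads",
    results="trimgalore_results",
    reports="trimgalore_fastqc_reports",
)

# Positional access, as with the current list-like output object.
first_by_index = out[0]

# Named access, in the spirit of the map-style proposal.
first_by_name = out.fastq

# Multiple assignment still works, as with the equals-operator form.
trimmed_reads, trimgalore_results, trimgalore_fastqc_reports = out

print(first_by_index == first_by_name)
```

A named record of this kind keeps the tuple behaviour while making call sites self-describing, which is roughly what a custom data model would offer over unnamed tuples.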

It would also be nice to investigate how to remove all those ifs in the nf-core pipelines, which make the pipeline logic very difficult to follow. The last commit shows how to wrap the conditional channel creation in a custom function. That should also be possible for conditional pipeline chunks.

PS: put my GitHub handle in your reply, otherwise the mail gets buried in the nf-core notifications.

apeltzer · 2019-03-10T11:38:23Z

I agree that removing all of the ifs in the pipeline logic would be nice. I tried using `when` where possible, and `mix`, but having to define channels beforehand was the problem until now (?). I think once we can use the same channel for multiple downstream processes, it gets easier to do that 👍

drpatelh · 2019-03-10T20:01:45Z

> Would a map be better for this? Maybe something like?

> That's an interesting point. Actually, I'm starting to think about some form of input/output data model definition, i.e. the ability to define a custom data structure to be used in place of unnamed tuples. At that point it would be possible.

@pditommaso Sounds good. I've also been thinking it would be good to include the version command within the module file, e.g. for fastqc we could have an additional command:
fastqc --version > v_fastqc.txt

This can then be passed into a channel to be used later in the pipeline for documentation purposes.

At present we use regexes to strip the output of the command to just contain the actual version:
https://github.com/nf-core/chipseq/blob/5f67d82e330c33eb2186c3210043ddbb17d8b5f2/bin/scrape_software_versions.py#L6-L19

Maybe we can implement the regex parsing here too? It just means all of the modules are shipped with version tracking by default, and it only has to be done once!
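A minimal sketch of what per-module regex parsing could look like, in Python to match the linked script. The `VERSION_PATTERNS` table, the `parse_version` helper and the sample output string are assumptions made for illustration; they are not copied from `scrape_software_versions.py`, and the sample assumes `fastqc --version` prints a line of the form `FastQC v<version>`.

```python
import re

# Hypothetical table: tool name -> regex whose first group captures the version.
VERSION_PATTERNS = {
    "FastQC": r"FastQC v(\S+)",
}


def parse_version(tool, raw_output):
    """Return the version string found in raw_output, or None if no match."""
    match = re.search(VERSION_PATTERNS[tool], raw_output)
    return match.group(1) if match else None


# Assumed content of v_fastqc.txt, written by `fastqc --version > v_fastqc.txt`.
sample_output = "FastQC v0.11.8\n"

fastqc_version = parse_version("FastQC", sample_output)
print(fastqc_version)
```

If each module shipped its own pattern alongside its version command, the pipeline would only need to collect the parsed strings rather than maintain one central list of regexes.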

ewels · 2019-03-11T10:38:36Z

@drpatelh - see an issue along similar lines that @pditommaso and I discussed a little while back: nextflow-io/nextflow#879

drpatelh · 2020-08-20T12:03:21Z

This is really outdated now compared to the latest stable DSL2 release, so I will close it. Thanks all!

pditommaso added 2 commits March 4, 2019 22:35

wip

acd13fb

Update refactoring

c07df24

ewels added the WIP Work in progress label Mar 5, 2019

Update syntax as required by draft-9

fbd6573

drpatelh mentioned this pull request Apr 14, 2019

Add parameters.settings.json to template and linting nf-core/tools#267

Closed

drpatelh closed this Aug 20, 2020
