Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rnaseq module version #162

Closed
wants to merge 3 commits into from
Closed

Conversation

pditommaso
Copy link
Contributor

This is a draft pull request only for the sake to experiment how improve nf-core pipelines via NF modules.

WORK IN PROGRESS, but IMO this represents a huge step forward.

@ewels
Copy link
Member

ewels commented Mar 5, 2019

This is brilliant 😀 Much simpler, and will be even better when we get around to refactoring all of that boilerplate code..

A couple of minor thoughts:

  • I don't really like the .first .second thing very much. Could output just be an array instead? eg:
     trim_galore.output[0].set { trimmed_reads }
     trim_galore.output[1].set { trimgalore_results }
     trim_galore.output[2].set { trimgalore_fastqc_reports }
    • This feels more natural to me and would presumably scale better
  • Could there also be a second syntax set channels in one line? Most of the time we're setting one output to one channel. eg. something like:
     trim_galore.output.setAll { trimmed_reads, trimgalore_results, trimgalore_fastqc_reports }
    • Could still have some kind of syntax to set one output in to multiple channels? eg:
     trim_galore.output.setAll { ch1, [ch2, ch3], ch4 }
  • With this, can we chain the .output on to the process call? eg:
     markDuplicates( ch_bam ).output.setAll { bam_md, picard_results }

This is of call all just syntactic sugar.. The core functionality that you've introduced here is great! Especially once we get rid of the cruft this will make the pipelines super simple. And it will be nice to split the processes in to multiple files for clarity too (eg: common.nf, star.nf, hisat.nf in this example).

Nice work!

@ewels ewels added the WIP Work in progress label Mar 5, 2019
@drpatelh
Copy link
Member

drpatelh commented Mar 5, 2019

  • I don't really like the .first .second thing very much. Could output just be an array instead? eg:
     trim_galore.output[0].set { trimmed_reads }
     trim_galore.output[1].set { trimgalore_results }
     trim_galore.output[2].set { trimgalore_fastqc_reports }

Would a map be better for this? Maybe something like?

trim_galore.output['fastq'].set { trimmed_reads }
trim_galore.output['results'].set { trimgalore_results }
trim_galore.output['reports'].set { trimgalore_fastqc_reports }

Easier to read if its at all possible?

@pditommaso
Copy link
Contributor Author

Oops .. I was missing this.

I don't really like the .first .second thing very much. Could output just be an array instead?

You can already, the output object is a list.

Could there also be a second syntax set channels in one line?

That's something I was thinking as well, tho I would prefer not to add to many magic syntax extension. Currently it's also possible to assign an output list using the equals operator, e.g.

(trimmed_reads, trimgalore_results, trimgalore_fastqc_reports) = trim_galore.output

With this, can we chain the .output on to the process call?

Yes.

Would a map be better for this? Maybe something like?

That's an interesting point. Actually I'm starting think to some format of input/output data model definition i.e. the ability to define a custom data structure to be used in place of unnamed tuples. At that point it would be possible.

However I would close this first version using the current approach tuple based.

Also it would be nice to investigate how to remove all that ifs in the nf-core pipelines, that makes very difficult to follow the pipeline logic. In the last commit it's shown how to wrap the conditional channel creation into a custom function. That should be possible also for condition pipeline chunks.

ps. put my github handle in your reply otherwise the mail get buried in the nf-core notifications.

@apeltzer
Copy link
Member

I agree that removing all of the ifs in pipeline logic would be nice - I tried using when if possible and mix but having to define channels beforehand was the problem until now (?). I think once we can use the same channel for multiple downstream processes, it gets easier to do that 👍

@drpatelh
Copy link
Member

Would a map be better for this? Maybe something like?

That's an interesting point. Actually I'm starting think to some format of input/output data model definition i.e. the ability to define a custom data structure to be used in place of unnamed tuples. At that point it would be possible.

@pditommaso Sounds good. Ive also been thinking it would be good to include the version command within the module file e.g. for fastqc we could have an additional command:
fastqc --version > v_fastqc.txt

This can then be passed into an channel to be used later in the pipeline for documentation purposes.

At present we use regexes to strip the output of the command to just contain the actual version:
https://github.com/nf-core/chipseq/blob/5f67d82e330c33eb2186c3210043ddbb17d8b5f2/bin/scrape_software_versions.py#L6-L19

Maybe we can implement the regex parsing here too? It just means all of the modules are shipped with version tracking by default, and it only has to be done once!

@ewels
Copy link
Member

ewels commented Mar 11, 2019

@drpatelh - see an issue along a similar lines that @pditommaso and I discussed a little while back: nextflow-io/nextflow#879

@drpatelh
Copy link
Member

This is really outdated now compared to the latest stable DSL2 release so will close. Thanks all!

@drpatelh drpatelh closed this Aug 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WIP Work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants