Module documentation format #1

drpatelh · 2019-07-28T07:15:29Z

We need to decide how best to document each individual module itself, e.g. what the module does, keywords for findability, links to the homepage of each tool used in the process, etc. @sven and I came up with a rudimentary version of this, but I think we will need more discussion to get it right.

/*
* Description:
*     Run FastQC on sequenced reads
* Keywords:
*     read qc
*     adapter
* Tools:
*     FastQC:
*         homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
*         documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/
*         description: FastQC gives general quality metrics about your reads.
*                      It provides information about the quality score distribution
*                      across your reads, the per base sequence content (%A/C/G/T).
*                      You get information about adapter contamination and other
*                      overrepresented sequences.
*/

It would also be good to generate automated docs for each module's input: and output: declarations (the types of objects they require), its script: section, and any other information that may be useful. @sven suggested we may be able to get this directly by plugging into NF.

This is all still open for discussion so please chime in if you have some ideas.


ewels · 2019-07-29T16:57:26Z

Suggestions: don’t prefix each line with * (there is no need for it inside a comment block, and it makes the block harder to write and parse); use valid YAML 😉, i.e. prefix each keyword with - to make keywords an array, and write description: > so the description can span multiple lines; and maybe don’t use capitalisation in keys?

sven1103 · 2019-07-29T17:04:15Z

After some cross-talk with @ewels, let's try simple YAML. It takes no effort at all to parse in most languages, and with a regex like \/\*(\*(?!\/)|[^*])*\*\/ everything within a comment block can be fetched:

/*
My process description.
*/

Everything that does not look like YAML can be easily ignored (it is probably an ordinary code comment).
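A minimal Python sketch of the extraction step, using the regex above with the standard re module; the function name comment_blocks is just for illustration.

```python
import re

# The comment-block regex proposed above: "/*", then any run of characters that
# is either a non-"*" or a "*" not followed by "/", then the closing "*/".
COMMENT_BLOCK = re.compile(r"\/\*(\*(?!\/)|[^*])*\*\/")


def comment_blocks(source: str) -> list[str]:
    """Return the inner text of every /* ... */ block in `source`."""
    blocks = []
    for match in COMMENT_BLOCK.finditer(source):
        # Drop the "/*" and "*/" delimiters and surrounding whitespace.
        blocks.append(match.group(0)[2:-2].strip())
    return blocks


example = '''
/*
My process description.
*/
process fastqc {
    script:
    """
    fastqc -q $reads
    """
}
'''

print(comment_blocks(example))  # ['My process description.']
```

Note that the regex knows nothing about string literals, so a path such as "results/*.txt" elsewhere in the script would be taken as the start of a comment block; a real linter would need to account for that.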

What information do we want to display? I will start with a list:

Description
Keywords
Tools
Input
Output
Authors

Description

Just a general description about the purpose of the process / function.

Keywords

One or more keywords to be able to group processes by keyword.

Tools

A list of tool objects used in a process. A tool object can contain fields like

description
url
doi

Input

Input is a list of Nextflow input definitions, which follow the format

 <input qualifier> <input name> [from <source channel>] [attributes]

Maybe two fields here: the definition and a description?
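As a rough illustration of the format above, a minimal Python sketch that splits a single input definition into its four parts with a regular expression. It only handles definitions with one qualifier; composite ones such as set val(sample_id), file(reads) in the example below would need Nextflow's own parser. The names INPUT_DEF and parse_input are hypothetical.

```python
import re

# One input definition: "<qualifier> <name> [from <channel>] [attributes]".
# Composite definitions (e.g. "set val(x), file(y)") are NOT handled correctly.
INPUT_DEF = re.compile(
    r"^\s*(?P<qualifier>\w+)\s+(?P<name>\S+)"
    r"(?:\s+from\s+(?P<source>\S+))?"
    r"(?:\s+(?P<attributes>.+?))?\s*$"
)


def parse_input(definition: str) -> dict:
    """Return qualifier, name, source and attributes (None when absent)."""
    match = INPUT_DEF.match(definition)
    if match is None:
        raise ValueError(f"not an input definition: {definition!r}")
    return match.groupdict()


print(parse_input("file reads from ch_reads"))
# {'qualifier': 'file', 'name': 'reads', 'source': 'ch_reads', 'attributes': None}
```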

Output

Same as input.

Authors

A list of GitHub users who contributed to the process.
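To make the field list concrete, here is a minimal Python sketch of a linter check on an already-parsed metadata block. Treating all six fields as required, and requiring all but description to be lists, are assumptions for illustration; the keys are lower-cased as suggested earlier in the thread, and check_fields is a hypothetical name.

```python
# Hypothetical rules derived from the field list above.
REQUIRED_FIELDS = ("description", "keywords", "tools", "input", "output", "authors")
LIST_FIELDS = ("keywords", "tools", "input", "output", "authors")


def check_fields(meta: dict) -> list[str]:
    """Return human-readable problems with a parsed metadata block."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in meta:
            problems.append(f"missing field: {field}")
    for field in LIST_FIELDS:
        if field in meta and not isinstance(meta[field], list):
            problems.append(f"field '{field}' should be a list")
    if "description" in meta and not isinstance(meta["description"], str):
        problems.append("field 'description' should be a string")
    return problems
```

For example, {"description": "Simply FASTQC", "keywords": "QC", "authors": ["@sven1103"]} reports the missing tools, input and output fields plus "field 'keywords' should be a list".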

Example

What would this look like?

/*
description: Simply FASTQC
keywords:
    - Quality Control
    - QC
tools:
    - fastqc:
        description: <description here>
        homepage: https://superhomepage.edu
        doi: <doi here>
input:
    - reads:
        type: file
        description: <description here>
    - sample_id:
        type: string
        description: <description here>
output:
    - report:
        type: file
        schema: "*_fastqc.{zip,html}"
authors:
    - "@sven1103"
    - "@drharshil"
*/
process fastqc {
    tag "$sample_id"
    publishDir "${params.outdir}/fastqc", mode: 'copy',
        saveAs: {filename -> filename.indexOf(".zip") > 0 ? "zips/$filename" : "$filename"}

    input:
    set val(sample_id), file(reads)

    output:
    file "*_fastqc.{zip,html}"

    script:
    """
    fastqc -q $reads
    fastqc --version &> fastqc.version.txt
    """
}

This is just an example, we can work out the details. But seeing the code makes it easier to communicate what we are talking about :D

ewels · 2019-07-30T13:22:39Z

> Everything that does not look like YAML can be easily ignored (it is probably an ordinary code comment).

I think we should try to parse everything inside the comment block as YAML. Guessing which bits are YAML and which bits are comment is a bit of a faff (there can always be yaml comments!).

Otherwise, I think this all looks great! The only thing I notice is that input should be a list of lists, as there can be multiple input channels, each of which can have multiple definitions. So more like:

input:
  - - reads:
        type: file
        description: <description here>
    - sample_id:
        type: string
        description: <description here>

Then you can have, for example:

input:
  -
    - reads:
        type: file
        description: <description here>
    - sample_id:
        type: string
        description: <description here>
  -
    - index:
        type: file
        description: Second input channel for a reference or whatever

This YAML syntax is a bit confusing to look at, so will definitely need some linting with nice helpful error messages 😉
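A minimal Python sketch of such a lint check, operating on what a YAML parser would return for the list-of-lists layout (one list per input channel, each element a single name: {...} mapping as in the example from @sven1103). The function name and messages are hypothetical. One gotcha it catches: if type: and description: sit at the same indentation as the element name instead of deeper, YAML reads them as sibling keys, so the element becomes a multi-key mapping rather than name: {...}.

```python
def check_inputs(inputs) -> list[str]:
    """Validate the list-of-lists input layout and return readable errors."""
    if not isinstance(inputs, list):
        return ["'input' should be a list of channels"]
    errors = []
    for c, channel in enumerate(inputs, start=1):
        if not isinstance(channel, list):
            errors.append(f"input channel {c}: expected a list of elements, "
                          f"got {type(channel).__name__}")
            continue
        for e, element in enumerate(channel, start=1):
            # Each element should be a single "name: {type, description}" mapping.
            if not (isinstance(element, dict) and len(element) == 1):
                errors.append(f"input channel {c}, element {e}: "
                              "expected a single 'name: {...}' mapping")
                continue
            name, fields = next(iter(element.items()))
            if not isinstance(fields, dict) or "type" not in fields:
                errors.append(f"input channel {c}, element {e} ({name}): missing 'type'")
    return errors


# The two-channel layout above, with type/description nested under each name,
# as a YAML parser would return it.
inputs = [
    [
        {"reads": {"type": "file", "description": "<description here>"}},
        {"sample_id": {"type": "string", "description": "<description here>"}},
    ],
    [
        {"index": {"type": "file", "description": "Second input channel"}},
    ],
]
print(check_inputs(inputs))  # []
```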

sven1103 · 2019-08-19T14:40:29Z

OK, I agree: all-or-nothing parsing :) But people could still have ordinary comment blocks, and we should not prevent them from doing so.

So I suggest letting the linting throw warnings if a comment block cannot be parsed as YAML.
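A minimal Python sketch of that behaviour: each comment block is parsed as a whole, and blocks that fail are reported with warnings.warn instead of failing the lint. The parser is passed in as a function; json.loads stands in here so the sketch needs only the standard library, whereas real use would pass something like PyYAML's yaml.safe_load. Because plain prose such as "just a note" is itself valid YAML (safe_load returns it as a string), the sketch also warns when a block does not produce a mapping. The function name is hypothetical.

```python
import json
import re
import warnings
from typing import Any, Callable

# Same comment-block regex as proposed earlier in the thread.
COMMENT_BLOCK = re.compile(r"\/\*(\*(?!\/)|[^*])*\*\/")


def lint_comment_blocks(source: str, parse: Callable[[str], Any]) -> list[dict]:
    """Parse each comment block as a whole (all-or-nothing).

    Blocks that do not yield a mapping are treated as ordinary comments:
    they produce a warning rather than an error.
    """
    documents = []
    for match in COMMENT_BLOCK.finditer(source):
        body = match.group(0)[2:-2]
        try:
            document = parse(body)
        except Exception as err:  # any parser failure means "not metadata"
            warnings.warn(f"comment block at offset {match.start()} "
                          f"is not valid metadata: {err}")
            continue
        if isinstance(document, dict):
            documents.append(document)
        else:
            # Parsed, but as a scalar or list rather than key/value metadata.
            warnings.warn(f"comment block at offset {match.start()} is not a mapping")
    return documents


source = '/* {"description": "Simply FASTQC"} */\nprocess fastqc {}\n/* just a note */'
docs = lint_comment_blocks(source, json.loads)  # warns once, for "just a note"
```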

ewels · 2019-12-05T09:14:20Z

Discussed at the hackathon: the suggestion is to keep this meta information in a separate file so that it is easier for other tools (including Nextflow itself) to parse. If it lives in a comment, it will be very difficult to get into Nextflow.

We could copy bioconda and have a meta.yml for each module.

Note that we need things to be organised in directories for this. But we should probably have that anyway.
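Assuming a hypothetical layout with one directory per module, each holding a main.nf next to its meta.yml, a linter could check that every module ships its metadata file. A minimal Python sketch using pathlib (the function name is hypothetical):

```python
from pathlib import Path


def modules_missing_meta(root: Path) -> list[str]:
    """List module directories under `root` that have a main.nf but no meta.yml.

    Assumes a hypothetical layout such as modules/fastqc/main.nf alongside
    modules/fastqc/meta.yml; paths are returned relative to `root`.
    """
    missing = []
    for main_nf in sorted(root.rglob("main.nf")):
        if not (main_nf.parent / "meta.yml").is_file():
            missing.append(main_nf.parent.relative_to(root).as_posix())
    return missing
```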

ewels · 2019-12-06T07:20:43Z

Addressed in #9

grst · 2020-02-12T07:49:04Z

In the context of the discussion in #8, I was wondering if the meta.yml could become a valid conda build recipe.

Name, description, etc. are already standard fields in a recipe, and the rest could go into the extra section (https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#extra-section).
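A rough Python sketch of that mapping, assuming the proposed metadata has already been parsed into a dict. Only the package, about and extra sections are filled in; name and version are passed separately because a conda recipe needs a package name and version, which the proposed module metadata does not carry. The function name and the version string in the usage are hypothetical.

```python
def to_conda_recipe(name: str, version: str, meta: dict) -> dict:
    """Sketch: place module metadata into conda-build recipe sections."""
    return {
        "package": {"name": name, "version": version},
        # "description" has a standard home in the recipe's about section.
        "about": {"description": meta.get("description", "")},
        # Everything without a standard recipe field goes under "extra".
        "extra": {k: v for k, v in meta.items() if k != "description"},
    }


recipe = to_conda_recipe("fastqc", "0.0.1", {"description": "Simply FASTQC", "keywords": ["QC"]})
```

The resulting dict could then be written out as a meta.yaml with any YAML emitter.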

ewels · 2020-07-16T12:00:45Z

Discussed at another hackathon: the general consensus was that the current system of using separate YAML files is probably best. I think that we can close this issue now.


drpatelh added the documentation, help wanted and question labels Jul 28, 2019

sven1103 mentioned this issue Jul 29, 2019 in Nextflow process documentation nextflow-io/nextflow#1250 (closed)

grst mentioned this issue Feb 12, 2020 in Handle module / process imports #8 (closed)

ggabernet added this to Modules & DSL2 in hackathon-tasks Jul 12, 2020

ewels closed this as completed Jul 16, 2020

hackathon-tasks automation moved this from Modules & DSL2 to Done Jul 16, 2020

maxulysse pushed a commit to maxulysse/nf-core_modules that referenced this issue Nov 25, 2021: Merge pull request nf-core#1 from grst/update (f651901), Update broken modules
