Module documentation format #1

Closed
drpatelh opened this issue Jul 28, 2019 · 8 comments
Labels
documentation Improvements or additions to documentation help wanted Extra attention is needed question Further information is requested

Comments

@drpatelh
Member

drpatelh commented Jul 28, 2019

We need to decide how best to document each individual module, e.g. what the module does, keywords for findability, links to the homepage of each tool used in the process, etc. @sven and I came up with a rudimentary version of this, but I think we will need more discussion to get this right.

/*
* Description:
*     Run FastQC on sequenced reads
* Keywords:
*     read qc
*     adapter
* Tools:
*     FastQC:
*         homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
*         documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/
*         description: FastQC gives general quality metrics about your reads.
*                      It provides information about the quality score distribution
*                      across your reads, the per base sequence content (%A/C/G/T).
*                      You get information about adapter contamination and other
*                      overrepresented sequences.
*/

It would also be good to generate automated docs for the types of objects required as input: and output: for each module, the script: section, and any other information that may be useful. @sven suggested we may be able to get this directly by plugging into NF.

This is all still open for discussion so please chime in if you have some ideas.

@drpatelh drpatelh added documentation Improvements or additions to documentation help wanted Extra attention is needed question Further information is requested labels Jul 28, 2019
@ewels
Member

ewels commented Jul 29, 2019

Suggestions: don’t prefix each line with * (no need for comment and makes it harder to write & parse); use valid YAML 😉 - keywords should be prefixed with - to make it an array, description should start with : > to make it multi-line; don’t use capitalisation in keys maybe?

@sven1103
Member

sven1103 commented Jul 29, 2019

After some cross-talk with @ewels, let's try simple YAML. It takes almost no effort to parse in most languages, and with a regex like \/\*(\*(?!\/)|[^*])*\*\/ everything within a comment block can be fetched:

/*
My process description.
*/

Everything that does not look like YAML can be easily ignored (it is probably a usual code comment).
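As a rough illustration of the extraction step, here is a minimal Python sketch that applies the regex above to a Nextflow script held in a string. The function name `extract_comment_blocks` and the sample source are made up for this example; deciding whether a block is actually YAML is left to whatever parser runs afterwards.

```python
import re

# The regex from the comment above, written as a Python raw string.
# The group is non-capturing so that each match is the whole comment block.
COMMENT_BLOCK = re.compile(r"/\*(?:\*(?!/)|[^*])*\*/")


def extract_comment_blocks(source: str) -> list[str]:
    """Return the text inside every /* ... */ block, without the delimiters."""
    return [m.group(0)[2:-2].strip() for m in COMMENT_BLOCK.finditer(source)]


# Hypothetical module source, used only for illustration.
example_source = """\
/*
description: Simply FASTQC
*/
process fastqc {
    /* a usual code comment */
    script:
    "fastqc -q reads.fq"
}
"""

blocks = extract_comment_blocks(example_source)
print(blocks)  # ['description: Simply FASTQC', 'a usual code comment']
```

Both blocks are returned, so the YAML header and the ordinary comment still have to be told apart in a later step.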

What information do we want to display? I will start with a list:

  • Description
  • Keywords
  • Tools
  • Input
  • Output
  • Authors

Description

A general description of the purpose of the process / function.

Keywords

One or more keywords to be able to group processes by keyword.

Tools

A list of tool objects used in a process. A tool object can contain fields like

  • description
  • url
  • doi

Input

Input is a list of Nextflow input definitions, each following the format

 <input qualifier> <input name> [from <source channel>] [attributes]

Maybe two fields here: the definition and a description?

Output

Same as input.

Authors

A list of GitHub users who contributed to the process.

Example

What this could look like:

/*
description: Simply FASTQC
keywords:
    - Quality Control
    - QC
tools:
    - fastqc:
        description: <description here>
        homepage: https://superhomepage.edu
        doi: <doi here>
input:
    - reads:
        type: file
        description: <description here>
    - sample_id:
        type: string
        description: <description here>
output:
    - report:
        type: file
        schema: "*_fastqc.{zip,html}"
authors:
    - "@sven1103"
    - "@drharshil"
*/
process fastqc {
    tag "$sample_id"
    publishDir "${params.outdir}/fastqc", mode: 'copy',
        saveAs: {filename -> filename.indexOf(".zip") > 0 ? "zips/$filename" : "$filename"}

    input:
    set val(sample_id), file(reads)

    output:
    file "*_fastqc.{zip,html}"

    script:
    """
    fastqc -q $reads
    fastqc --version &> fastqc.version.txt
    """
}

This is just an example, we can work out the details. But seeing the code makes it easier to communicate what we are talking about :D

@ewels
Member

ewels commented Jul 30, 2019

Everything that does not look like YAML can be easily ignored (it is probably a usual code comment).

I think we should try to parse everything inside the comment block as YAML. Guessing which bits are YAML and which bits are comment is a bit of a faff (there can always be yaml comments!).

Otherwise, I think this all looks great! The only thing I notice is that the inputs should be a list of lists, as there can be multiple input channels, each of which can have multiple definitions. So more like:

input:
  - - reads:
        type: file
        description: <description here>
    - sample_id:
        type: string
        description: <description here>

Then you can have, for example:

input:
  -
    - reads:
        type: file
        description: <description here>
    - sample_id:
        type: string
        description: <description here>
  -
    - index:
        type: file
        description: Second input channel for a reference or whatever

This YAML syntax is a bit confusing to look at, so will definitely need some linting with nice helpful error messages 😉
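To make the linting idea a bit more concrete, here is a minimal Python sketch of a check for the list-of-lists layout. It works on the already parsed structure (plain lists and dicts, as a YAML parser would return them). The function name `lint_inputs`, the message wording, and the choice to require both `type` and `description` are assumptions for illustration, not an agreed rule set.

```python
def lint_inputs(inputs) -> list[str]:
    """Check that `inputs` is a list of channels, each a list of one-key mappings.

    Returns human-readable messages; an empty list means no problems were found.
    """
    if not isinstance(inputs, list):
        return ["input: expected a list of channels, got " + type(inputs).__name__]

    errors = []
    for c, channel in enumerate(inputs, start=1):
        if not isinstance(channel, list):
            errors.append(
                f"input channel {c}: expected a list of definitions "
                f"(is the second '-' missing?), got {type(channel).__name__}"
            )
            continue
        for d, definition in enumerate(channel, start=1):
            where = f"input channel {c}, definition {d}"
            if not isinstance(definition, dict) or len(definition) != 1:
                errors.append(f"{where}: expected a single 'name:' mapping")
                continue
            (name, fields), = definition.items()
            if not isinstance(fields, dict):
                errors.append(f"{where} ('{name}'): fields must be nested under the name")
                continue
            # Assumed required keys; the thread has not fixed these yet.
            for key in ("type", "description"):
                if key not in fields:
                    errors.append(f"{where} ('{name}'): missing '{key}'")
    return errors


# Two channels: (reads, sample_id) and (index).
good = [
    [
        {"reads": {"type": "file", "description": "..."}},
        {"sample_id": {"type": "string", "description": "..."}},
    ],
    [{"index": {"type": "file", "description": "..."}}],
]
# A flat list of definitions, i.e. the layout before the list-of-lists change.
flat = [{"reads": {"type": "file", "description": "..."}}]

print(lint_inputs(good))
print(lint_inputs(flat))
```

Returning messages instead of raising on the first problem lets a linter report every issue in a module in one pass.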

@sven1103
Member

OK, I agree. All-or-nothing parsing :) But people could still have ordinary comment blocks, and we should not stop them from doing so.

So I suggest letting the linter throw a warning if a comment block cannot be parsed as YAML.
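A minimal Python sketch of this warn-instead-of-fail behaviour could look as follows. The parser is passed in as `parse`; with PyYAML installed that could be `yaml.safe_load`. The demo uses `json.loads` purely as a stand-in so the sketch runs without third-party packages. The function name and message wording are hypothetical. One extra assumption: a block that parses to something other than a mapping also gets a warning, because a plain sentence is itself a valid YAML scalar and would otherwise pass silently.

```python
import json
import re
from typing import Any, Callable

# Same comment-block regex as earlier in the thread, as a Python raw string.
COMMENT_BLOCK = re.compile(r"/\*(?:\*(?!/)|[^*])*\*/")


def lint_comment_blocks(
    source: str, parse: Callable[[str], Any]
) -> tuple[list[dict], list[str]]:
    """Parse each /* ... */ block as a whole; collect warnings for the rest."""
    parsed, warnings = [], []
    for match in COMMENT_BLOCK.finditer(source):
        line = source.count("\n", 0, match.start()) + 1
        try:
            result = parse(match.group(0)[2:-2])
        except Exception as exc:  # error types depend on the parser used
            warnings.append(f"line {line}: comment block could not be parsed ({exc})")
            continue
        if isinstance(result, dict):
            parsed.append(result)
        else:
            warnings.append(f"line {line}: comment block is not a mapping, ignored")
    return parsed, warnings


# Hypothetical source; the first block uses JSON syntax only because the
# stand-in parser is json.loads.
example_source = (
    '/* {"description": "Simply FASTQC"} */\n'
    "process fastqc {\n"
    "    /* a usual code comment */\n"
    "}\n"
)

docs, warns = lint_comment_blocks(example_source, json.loads)
print(docs)
for w in warns:
    print("WARNING:", w)
```

Ordinary comments stay allowed: they only produce a warning, and the metadata block is still picked up.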

@ewels
Member

ewels commented Dec 5, 2019

Discussing at the hackathon: the suggestion is that we should keep this meta information in a separate file so that it is easier for other tools (including Nextflow itself) to parse. If it's in a comment, it will be very difficult to get into Nextflow.

We could copy bioconda and have a meta.yml for each module.

Note that we need things to be organised in directories for this. But we should probably have that anyway.
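With one directory per module, checking that every module ships its metadata becomes a simple file-system walk. The Python sketch below assumes each module directory holds a script called `main.nf` next to its `meta.yml`; that script name, the function name and the demo directory names are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def modules_missing_meta(root: Path) -> list[Path]:
    """Return module directories under `root` that have main.nf but no meta.yml."""
    return sorted(
        nf.parent
        for nf in root.rglob("main.nf")
        if not (nf.parent / "meta.yml").is_file()
    )


# Build a throwaway directory tree to demonstrate the check.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "fastqc").mkdir()
    (root / "fastqc" / "main.nf").touch()
    (root / "fastqc" / "meta.yml").touch()
    (root / "some_tool").mkdir()  # hypothetical module without metadata
    (root / "some_tool" / "main.nf").touch()

    missing = [p.name for p in modules_missing_meta(root)]
    print(missing)  # ['some_tool']
```

A check like this could run in CI so that a module without a `meta.yml` is flagged before it is merged.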

@ewels
Member

ewels commented Dec 6, 2019

Addressed in #9

@grst
Member

grst commented Feb 12, 2020

In the context of the discussion in #8, I was wondering if the meta.yml could become a valid conda build recipe.

Name, description etc. are standard fields in a recipe already, and the rest could go into the extra section. (https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#extra-section)

@ggabernet ggabernet added this to Modules & DSL2 in hackathon-tasks Jul 12, 2020
@ewels
Member

ewels commented Jul 16, 2020

Discussion at another hackathon - general consensus was that the current system of using separate YAML files is probably best. I think that we can close this issue now.

@ewels ewels closed this as completed Jul 16, 2020
hackathon-tasks automation moved this from Modules & DSL2 to Done Jul 16, 2020

4 participants