title | subtitle |
---|---|
Test-data Specifications | Specifications for writing nf-core test dataset files |
The key words "MUST", "MUST NOT", "SHOULD", etc. are to be interpreted as described in RFC 2119.
New test data files within a branch (modules or pipelines) SHOULD NOT replicate existing test data unless absolutely necessary
- If the file you need can be generated from an upstream file, derive it from a file already on the test-datasets branch rather than adding new source data
- For example, if you need a particular bioinformatic index file for a tool, index an existing FASTA file on the test-datasets branch
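
As an illustration of deriving a file from existing test data, the sketch below builds a faidx-style index for a FASTA file in plain Python. It is only a simplified sketch: the function names `build_fasta_index` and `write_fasta_index` are hypothetical, it assumes a well-formed, uncompressed FASTA whose sequence lines have a uniform width within each record, and in practice an established tool such as `samtools faidx` would normally be used.

```python
from pathlib import Path


def build_fasta_index(fasta_path):
    """Return faidx-style rows: (name, length, offset, line_bases, line_width).

    Simplified sketch: assumes a well-formed FASTA whose sequence lines
    have a uniform width within each record (the last line may be shorter).
    """
    rows = []
    current = None  # mutable row for the record currently being read
    offset = 0      # byte offset of the line about to be processed
    with open(fasta_path, "rb") as handle:
        for raw in handle:
            if raw.startswith(b">"):
                if current is not None:
                    rows.append(tuple(current))
                # The sequence name is the header text up to the first space.
                name = raw[1:].split()[0].decode()
                # The first base starts right after the header line.
                current = [name, 0, offset + len(raw), 0, 0]
            elif current is not None and raw.strip():
                bases = len(raw.rstrip(b"\r\n"))
                if current[3] == 0:  # first sequence line sets the widths
                    current[3] = bases
                    current[4] = len(raw)
                current[1] += bases
            offset += len(raw)
    if current is not None:
        rows.append(tuple(current))
    return rows


def write_fasta_index(fasta_path, index_path):
    """Write the rows as tab-separated text, one sequence per line."""
    rows = build_fasta_index(fasta_path)
    text = "".join("\t".join(str(v) for v in row) + "\n" for row in rows)
    Path(index_path).write_text(text)
    return rows
```

Because the index is computed entirely from the upstream FASTA, no additional source data has to be added to the branch.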
Test data SHOULD be as small as possible
- It cannot exceed GitHub's maximum file size (GitHub blocks files larger than 100 MiB)
- Data should be sub-sampled as aggressively as possible
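
One way to sub-sample sequencing reads reproducibly is sketched below. The function name `subsample_fastq`, the `fraction` and `seed` parameters, and the assumption of an uncompressed FASTQ file with exactly four lines per record are all illustrative choices, not part of this specification.

```python
import random


def subsample_fastq(in_path, out_path, fraction, seed=0):
    """Copy roughly `fraction` of the 4-line FASTQ records to out_path.

    Returns (kept, total). A fixed seed keeps the selection reproducible.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    kept = total = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            # Read one record: header, sequence, separator, qualities.
            record = [src.readline() for _ in range(4)]
            if not record[0]:
                break  # end of file
            total += 1
            if rng.random() < fraction:
                dst.writelines(record)
                kept += 1
    return kept, total
```

Since one random number is drawn per record, running the function with the same seed on two paired-end files that contain the same number of records selects the same record positions in both, which keeps mates together.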
Test data MUST be publicly available and have licenses that allow public reuse
Test data files SHOULD be described in the given branch's README file, including their source, how they were generated, their licenses, etc.
Files SHOULD be generally organised based on existing structure, typically (for bioinformatics pipelines) by discipline, organism, platform or format
Downstream or related test-data files SHOULD be named based on the upstream file name
- For example, if you used `genome.fasta` as the upstream file, your output file should be called `genome.<new_extension>`.
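
The naming rule can be sketched with the standard-library `pathlib` module. The helper name `derived_name` and the `dict` extension used in the example are hypothetical and only serve as an illustration.

```python
from pathlib import Path


def derived_name(upstream, new_extension):
    """Return the upstream file name with its final extension replaced."""
    suffix = "." + new_extension.lstrip(".")
    return Path(upstream).with_suffix(suffix).name


print(derived_name("genome.fasta", "dict"))  # genome.dict
```

Note that `with_suffix` replaces only the last extension, so a compressed upstream file such as `genome.fasta.gz` would yield `genome.fasta.dict` under this sketch.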
Test data files MUST have an entry in the nf-core/test-datasets repo README