Update modules and fix bugs for v2.5 release #314

Merged · 22 commits · Jul 12, 2022
4 changes: 4 additions & 0 deletions .nf-core.yml
@@ -4,3 +4,7 @@ lint:
- assets/email_template.html
- assets/email_template.txt
- lib/NfcoreTemplate.groovy
files_exist:
- assets/multiqc_config.yml
- conf/igenomes.config
- lib/WorkflowViralrecon.groovy
2 changes: 1 addition & 1 deletion .prettierignore
@@ -6,4 +6,4 @@ results/
.DS_Store
testing/
testing*
*.pyc
*.pyc
37 changes: 33 additions & 4 deletions CHANGELOG.md
@@ -3,14 +3,43 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [[2.5](https://github.com/nf-core/viralrecon/releases/tag/2.5)] - 2022-07-13

### Enhancements & fixes

- Default Nextclade dataset shipped with the pipeline has been bumped from `2022-01-18T12:00:00Z` -> `2022-06-14T12:00:00Z`
- [[#234](https://github.com/nf-core/viralrecon/issues/234)] - Remove replacement of dashes in sample name with underscores
- [[#292](https://github.com/nf-core/viralrecon/issues/292)] - Filter empty FastQ files after adapter trimming
- [[#303](https://github.com/nf-core/viralrecon/pull/303)] - New Pangolin databases (4.0.x) were not assigning lineages to SARS-CoV-2 samples correctly in the MultiQC report
- [[#304](https://github.com/nf-core/viralrecon/pull/304)] - Re-factor code of `ivar_variants_to_vcf` script
- [[#306](https://github.com/nf-core/viralrecon/issues/306)] - Add contig field information to the VCF header in `ivar_variants_to_vcf` and use `bcftools sort`
- [[#311](https://github.com/nf-core/viralrecon/issues/311)] - Invalid declaration `val medaka_model_string`
- [[nf-core/rnaseq#764](https://github.com/nf-core/rnaseq/issues/764)] - Test fails when using GCP due to missing tools in the basic biocontainer
- Updated pipeline template to [nf-core/tools 2.4.1](https://github.com/nf-core/tools/releases/tag/2.4.1)

### Software dependencies

Note that, since the pipeline is now using Nextflow DSL2, each process will be run with its own [Biocontainer](https://biocontainers.pro/#/registry). This means that on occasion the pipeline may use different versions of the same tool. However, the overall software dependency changes compared to the last release are listed below for reference.

| Dependency | Old version | New version |
| ----------- | ----------- | ----------- |
| `artic` | 1.2.1 | 1.2.2 |
| `bcftools` | 1.14 | 1.15.1 |
| `multiqc` | 1.11 | 1.13a |
| `nanoplot` | 1.39.0 | 1.40.0 |
| `nextclade` | 1.10.2 | 2.2.0 |
| `pangolin` | 3.1.20 | 4.1.1 |
| `picard` | 2.26.10 | 2.27.4 |
| `quast` | 5.0.2 | 5.2.0 |
| `samtools` | 1.14 | 1.15.1 |
| `spades` | 3.15.3 | 3.15.4 |
| `vcflib` | 1.0.2 | 1.0.3 |

> **NB:** Dependency has been **updated** if both old and new version information is present.
>
> **NB:** Dependency has been **added** if just the new version information is present.
>
> **NB:** Dependency has been **removed** if new version information isn't present.

### Parameters

1 change: 1 addition & 0 deletions assets/multiqc_config_illumina.yml
@@ -283,6 +283,7 @@ extra_fn_clean_exts:
- ".markduplicates"
- ".unclassified"
- "_MN908947.3"
- " MN908947.3"

extra_fn_clean_trim:
- "Consensus_"
5 changes: 0 additions & 5 deletions bin/check_samplesheet.py
@@ -99,11 +99,6 @@ def check_illumina_samplesheet(file_in, file_out):
f"WARNING: Spaces have been replaced by underscores for sample: {sample}"
)
sample = sample.replace(" ", "_")
if sample.find("-") != -1:
print(
f"WARNING: Dashes have been replaced by underscores for sample: {sample}"
)
sample = sample.replace("-", "_")
if not sample:
print_error("Sample entry has not been specified!", "Line", line)
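With the dash-replacement block removed (issue #234), only spaces are sanitized in sample names. A minimal Python sketch of the retained behaviour — `sanitize_sample` is an illustrative helper name, not a function in the pipeline:

```python
def sanitize_sample(sample: str) -> str:
    """Replace spaces with underscores; dashes are now preserved as-is."""
    if " " in sample:
        print(f"WARNING: Spaces have been replaced by underscores for sample: {sample}")
        sample = sample.replace(" ", "_")
    if not sample:
        raise ValueError("Sample entry has not been specified!")
    return sample
```

For example, `SAMPLE 1` becomes `SAMPLE_1`, while `SAMPLE-1` now passes through unchanged.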

4 changes: 2 additions & 2 deletions bin/ivar_variants_to_vcf.py
@@ -569,8 +569,8 @@ def main(args=None):
## variant counts to pass to MultiQC ##
#############################################
var_count_list = [(k, str(v)) for k, v in sorted(var_count_dict.items())]
("\t".join(["sample"] + [x[0] for x in var_count_list]))
("\t".join([filename] + [x[1] for x in var_count_list]))
print("\t".join(["sample"] + [x[0] for x in var_count_list]))
print("\t".join([filename] + [x[1] for x in var_count_list]))


if __name__ == "__main__":
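The fix above restores the `print` calls so the variant-count table actually reaches stdout for MultiQC. A self-contained sketch of that output format (function name and sample data are illustrative):

```python
def variant_count_lines(filename, var_count_dict):
    """Render a header row and one data row as tab-separated lines for MultiQC."""
    items = [(k, str(v)) for k, v in sorted(var_count_dict.items())]
    header = "\t".join(["sample"] + [k for k, _ in items])
    row = "\t".join([filename] + [v for _, v in items])
    return header, row

header, row = variant_count_lines("sample1", {"SNP": 10, "INS": 2, "DEL": 1})
print(header)  # sample  DEL  INS  SNP (tab-separated, keys sorted)
print(row)     # sample1  1  2  10
```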
4 changes: 2 additions & 2 deletions bin/multiqc_to_custom_csv.py
@@ -239,7 +239,7 @@ def main(args=None):
"multiqc_pangolin.yaml",
[("Pangolin lineage", ["lineage"])],
),
("multiqc_nextclade_clade.yaml", [("Nextclade clade", ["clade"])]),
("multiqc_nextclade_clade-plot.yaml", [("Nextclade clade", ["clade"])]),
]

illumina_assembly_files = [
@@ -308,7 +308,7 @@ def main(args=None):
("multiqc_snpeff.yaml", [("# Missense variants", ["MISSENSE"])]),
("multiqc_quast.yaml", [("# Ns per 100kb consensus", ["# N's per 100 kbp"])]),
("multiqc_pangolin.yaml", [("Pangolin lineage", ["lineage"])]),
("multiqc_nextclade_clade.yaml", [("Nextclade clade", ["clade"])]),
("multiqc_nextclade_clade-plot.yaml", [("Nextclade clade", ["clade"])]),
]

if args.PLATFORM == "illumina":
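Each tuple in these lists maps a MultiQC intermediate YAML file to an output column name and the keys to look up, which is why only the filename needed to change here. A hedged sketch of how such a mapping can be resolved against already-parsed YAML data — `extract_fields` is a stand-in, not the script's real helper:

```python
def extract_fields(yaml_data, field_mapping):
    """Pick named keys out of a parsed MultiQC YAML section per the mapping."""
    result = {}
    for column, keys in field_mapping:
        # Take the first key that is actually present in the parsed data
        for key in keys:
            if key in yaml_data:
                result[column] = yaml_data[key]
                break
    return result

mapping = [("Nextclade clade", ["clade"])]
print(extract_fields({"clade": "21L (Omicron)"}, mapping))
```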
25 changes: 15 additions & 10 deletions conf/modules_illumina.config
@@ -122,7 +122,7 @@ if (!params.skip_kraken2) {
publishDir = [
path: { "${params.outdir}/kraken2" },
mode: params.publish_dir_mode,
pattern: "*.txt"
pattern: "*report.txt"
]
}
}
@@ -146,7 +146,7 @@

withName: 'BOWTIE2_ALIGN' {
ext.args = '--local --very-sensitive-local --seed 1'
ext.args2 = '-F4'
ext.args2 = '-F4 -bhS'
publishDir = [
[
path: { "${params.outdir}/variants/bowtie2/log" },
@@ -180,6 +180,7 @@ }
}

withName: '.*:.*:ALIGN_BOWTIE2:.*:BAM_STATS_SAMTOOLS:.*' {
ext.prefix = { "${meta.id}.sorted.bam" }
publishDir = [
path: { "${params.outdir}/variants/bowtie2/samtools_stats" },
mode: params.publish_dir_mode,
@@ -244,6 +245,7 @@ }
}

withName: '.*:.*:PRIMER_TRIM_IVAR:.*:BAM_STATS_SAMTOOLS:.*' {
ext.prefix = { "${meta.id}.ivar_trim.sorted.bam" }
publishDir = [
path: { "${params.outdir}/variants/bowtie2/samtools_stats" },
mode: params.publish_dir_mode,
@@ -257,7 +259,7 @@ process {
process {
withName: 'PICARD_MARKDUPLICATES' {
ext.args = [
'ASSUME_SORTED=true VALIDATION_STRINGENCY=LENIENT TMP_DIR=tmp',
'--ASSUME_SORTED true --VALIDATION_STRINGENCY LENIENT --TMP_DIR tmp',
params.filter_duplicates ? '--REMOVE_DUPLICATES true' : ''
].join(' ').trim()
ext.prefix = { "${meta.id}.markduplicates.sorted" }
@@ -276,7 +278,6 @@ }
}

withName: '.*:MARK_DUPLICATES_PICARD:SAMTOOLS_INDEX' {
ext.prefix = { "${meta.id}.markduplicates.sorted" }
publishDir = [
path: { "${params.outdir}/variants/bowtie2" },
mode: params.publish_dir_mode,
@@ -285,6 +286,7 @@ }
}

withName: '.*:MARK_DUPLICATES_PICARD:BAM_STATS_SAMTOOLS:.*' {
ext.prefix = { "${meta.id}.markduplicates.sorted.bam" }
publishDir = [
path: { "${params.outdir}/variants/bowtie2/samtools_stats" },
mode: params.publish_dir_mode,
@@ -297,7 +299,7 @@ if (!params.skip_picard_metrics) {
if (!params.skip_picard_metrics) {
process {
withName: 'PICARD_COLLECTMULTIPLEMETRICS' {
ext.args = 'VALIDATION_STRINGENCY=LENIENT TMP_DIR=tmp'
ext.args = '--VALIDATION_STRINGENCY LENIENT --TMP_DIR tmp'
publishDir = [
[
path: { "${params.outdir}/variants/bowtie2/picard_metrics" },
@@ -317,7 +319,7 @@ if (!params.skip_mosdepth) {
if (!params.skip_mosdepth) {
process {
withName: 'MOSDEPTH_GENOME' {
ext.args = '--fast-mode'
ext.args = '--fast-mode --by 200'
publishDir = [
path: { "${params.outdir}/variants/bowtie2/mosdepth/genome" },
mode: params.publish_dir_mode,
@@ -396,7 +398,7 @@ if (!params.skip_variants) {
]
}

withName: '.*:.*:VARIANTS_IVAR:.*:.*:TABIX_TABIX' {
withName: '.*:.*:VARIANTS_IVAR:.*:TABIX_TABIX' {
ext.args = '-p vcf -f'
publishDir = [
path: { "${params.outdir}/variants/ivar" },
@@ -405,7 +407,7 @@ }
]
}

withName: '.*:.*:VARIANTS_IVAR:.*:.*:BCFTOOLS_STATS' {
withName: '.*:.*:VARIANTS_IVAR:.*:BCFTOOLS_STATS' {
publishDir = [
path: { "${params.outdir}/variants/ivar/bcftools_stats" },
mode: params.publish_dir_mode,
@@ -665,7 +667,7 @@ if (!params.skip_variants) {
publishDir = [
path: { "${params.outdir}/variants/${variant_caller}/consensus/${params.consensus_caller}/nextclade" },
mode: params.publish_dir_mode,
pattern: "*.csv"
saveAs: { filename -> filename.endsWith(".csv") && !filename.endsWith("errors.csv") && !filename.endsWith("insertions.csv") ? filename : null }
]
}
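The new `saveAs` closure publishes only the main Nextclade CSV and drops the `errors.csv` and `insertions.csv` side files that a `pattern: "*.csv"` glob would also have matched. The same predicate rewritten in Python for clarity (the function name is ours):

```python
def keep_nextclade_csv(filename: str):
    """Return the filename to publish, or None to skip it (mirrors the saveAs closure)."""
    if (filename.endswith(".csv")
            and not filename.endswith("errors.csv")
            and not filename.endswith("insertions.csv")):
        return filename
    return None

for name in ["run.csv", "run.errors.csv", "run.insertions.csv", "run.tsv"]:
    print(name, "->", keep_nextclade_csv(name))
```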

@@ -1048,7 +1050,10 @@ if (!params.skip_assembly) {
if (!params.skip_multiqc) {
process {
withName: 'MULTIQC' {
ext.args = params.multiqc_title ? "--title \"$params.multiqc_title\"" : ''
ext.args = [
'-k yaml',
params.multiqc_title ? "--title \"$params.multiqc_title\"" : ''
].join(' ').trim()
publishDir = [
[
path: { "${params.outdir}/multiqc" },
12 changes: 7 additions & 5 deletions conf/modules_nanopore.config
@@ -91,7 +91,6 @@ process {
}

withName: '.*:.*:.*:SAMTOOLS_INDEX' {
ext.prefix = { "${meta.id}.mapped.sorted" }
publishDir = [
path: { "${params.outdir}/${params.artic_minion_caller}" },
mode: params.publish_dir_mode,
@@ -100,7 +99,7 @@ }
}

withName: '.*:.*:.*:BAM_STATS_SAMTOOLS:.*' {
ext.prefix = { "${meta.id}.mapped.sorted" }
ext.prefix = { "${meta.id}.mapped.sorted.bam" }
publishDir = [
path: { "${params.outdir}/${params.artic_minion_caller}/samtools_stats" },
mode: params.publish_dir_mode,
@@ -168,7 +167,7 @@ }
}

withName: 'MOSDEPTH_GENOME' {
ext.args = '--fast-mode'
ext.args = '--fast-mode --by 200'
publishDir = [
path: { "${params.outdir}/${params.artic_minion_caller}/mosdepth/genome" },
mode: params.publish_dir_mode,
@@ -241,7 +240,7 @@ if (!params.skip_nextclade) {
publishDir = [
path: { "${params.outdir}/${params.artic_minion_caller}/nextclade" },
mode: params.publish_dir_mode,
pattern: "*.csv"
saveAs: { filename -> filename.endsWith(".csv") && !filename.endsWith("errors.csv") && !filename.endsWith("insertions.csv") ? filename : null }
]
}

Expand Down Expand Up @@ -362,7 +361,10 @@ if (!params.skip_asciigenome) {
if (!params.skip_multiqc) {
process {
withName: 'MULTIQC' {
ext.args = params.multiqc_title ? "--title \"$params.multiqc_title\"" : ''
ext.args = [
'-k yaml',
params.multiqc_title ? "--title \"$params.multiqc_title\"" : ''
].join(' ').trim()
publishDir = [
path: { "${params.outdir}/multiqc/${params.artic_minion_caller}" },
mode: params.publish_dir_mode,
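Both platform configs now assemble the MultiQC `ext.args` by joining a list of fragments and trimming, so conditional options drop out cleanly when unset. The same pattern sketched in Python (names and values are illustrative):

```python
def build_args(multiqc_title=None):
    """Join fixed and conditional CLI fragments, dropping empty entries."""
    parts = [
        "-k yaml",  # always passed
        f'--title "{multiqc_title}"' if multiqc_title else "",  # optional
    ]
    # join(' ') then strip() mirrors the Groovy join(' ').trim()
    return " ".join(parts).strip()

print(build_args("My run"))  # -k yaml --title "My run"
print(build_args())          # -k yaml
```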
2 changes: 1 addition & 1 deletion docs/usage.md
@@ -391,7 +391,7 @@ You can use a similar approach to update the version of Nextclade used by the pi

##### Nextclade datasets

A [`nextclade dataset`](https://docs.nextstrain.org/projects/nextclade/en/latest/user/datasets.html#nextclade-datasets) feature was introduced in [Nextclade CLI v1.3.0](https://github.com/nextstrain/nextclade/releases/tag/1.3.0) that fetches input genome files such as reference sequences and trees from a central dataset repository. We have uploaded Nextclade dataset [v2022-06-14](https://github.com/nextstrain/nextclade_data/releases/tag/2022-06-16--16-03-24--UTC) to [nf-core/test-datasets](https://github.com/nf-core/test-datasets/blob/viralrecon/genome/MN908947.3/nextclade_sars-cov-2_MN908947_2022-06-14T12_00_00Z.tar.gz?raw=true), and for reproducibility, this will be used by default if you specify `--genome 'MN908947.3'` when running the pipeline. However, there are a number of ways you can use a more recent version of the dataset:

- Supply your own by setting: `--nextclade_dataset <PATH_TO_DATASET>`
- Let the pipeline create and use the latest version by setting: `--nextclade_dataset false --nextclade_dataset_tag false`