Update modules and fix bugs for v2.5 release #314

Merged · 22 commits · Jul 12, 2022
4 changes: 4 additions & 0 deletions .nf-core.yml
@@ -4,3 +4,7 @@ lint:
- assets/email_template.html
- assets/email_template.txt
- lib/NfcoreTemplate.groovy
files_exist:
- assets/multiqc_config.yml
- conf/igenomes.config
- lib/WorkflowViralrecon.groovy
2 changes: 1 addition & 1 deletion .prettierignore
@@ -6,4 +6,4 @@ results/
.DS_Store
testing/
testing*
*.pyc
*.pyc
37 changes: 33 additions & 4 deletions CHANGELOG.md
@@ -3,14 +3,43 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [[2.5](https://github.com/nf-core/viralrecon/releases/tag/2.5)] - 2022-07-13

### Enhancements & fixes

- Default Nextclade dataset shipped with the pipeline has been bumped from `2022-01-18T12:00:00Z` -> `2022-06-14T12:00:00Z`
- [[#234](https://github.com/nf-core/viralrecon/issues/234)] - Remove replacement of dashes in sample name with underscores
- [[#292](https://github.com/nf-core/viralrecon/issues/292)] - Filter empty FastQ files after adapter trimming
- [[#303](https://github.com/nf-core/viralrecon/pull/303)] - New Pangolin databases (4.0.x) were not assigning lineages to SARS-CoV-2 samples correctly in the MultiQC report
- [[#304](https://github.com/nf-core/viralrecon/pull/304)] - Re-factor code of `ivar_variants_to_vcf` script
- [[#306](https://github.com/nf-core/viralrecon/issues/306)] - Add contig field information to the VCF header in `ivar_variants_to_vcf` and use `bcftools sort`
- [[#311](https://github.com/nf-core/viralrecon/issues/311)] - Invalid declaration `val medaka_model_string`
- [[nf-core/rnaseq#764](https://github.com/nf-core/rnaseq/issues/764)] - Test fails when using GCP due to missing tools in the basic biocontainer
- Updated pipeline template to [nf-core/tools 2.4.1](https://github.com/nf-core/tools/releases/tag/2.4.1)

### Software dependencies

Note that, since the pipeline is now using Nextflow DSL2, each process will be run with its own [Biocontainer](https://biocontainers.pro/#/registry). This means that on occasion the pipeline may use different versions of the same tool. However, the overall software dependency changes compared to the last release are listed below for reference.

| Dependency | Old version | New version |
| ----------- | ----------- | ----------- |
| `artic` | 1.2.1 | 1.2.2 |
| `bcftools` | 1.14 | 1.15.1 |
| `multiqc` | 1.11 | 1.13a |
| `nanoplot` | 1.39.0 | 1.40.0 |
| `nextclade` | 1.10.2 | 2.2.0 |
| `pangolin` | 3.1.20 | 4.1.1 |
| `picard` | 2.26.10 | 2.27.4 |
| `quast` | 5.0.2 | 5.2.0 |
| `samtools` | 1.14 | 1.15.1 |
| `spades` | 3.15.3 | 3.15.4 |
| `vcflib` | 1.0.2 | 1.0.3 |

> **NB:** Dependency has been **updated** if both old and new version information is present.
>
> **NB:** Dependency has been **added** if just the new version information is present.
>
> **NB:** Dependency has been **removed** if new version information isn't present.

### Parameters

1 change: 1 addition & 0 deletions assets/multiqc_config_illumina.yml
@@ -283,6 +283,7 @@ extra_fn_clean_exts:
- ".markduplicates"
- ".unclassified"
- "_MN908947.3"
- " MN908947.3"

extra_fn_clean_trim:
- "Consensus_"
5 changes: 0 additions & 5 deletions bin/check_samplesheet.py
@@ -99,11 +99,6 @@ def check_illumina_samplesheet(file_in, file_out):
f"WARNING: Spaces have been replaced by underscores for sample: {sample}"
)
sample = sample.replace(" ", "_")
if sample.find("-") != -1:
print(
f"WARNING: Dashes have been replaced by underscores for sample: {sample}"
)
sample = sample.replace("-", "_")
if not sample:
print_error("Sample entry has not been specified!", "Line", line)
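With the dash-replacement block removed (issue #234), only spaces are sanitized in sample names. A minimal Python sketch of the retained behaviour — `sanitize_sample` is an illustrative helper name, not a function in the pipeline:

```python
def sanitize_sample(sample: str) -> str:
    """Replace spaces with underscores; dashes are now preserved as-is."""
    if " " in sample:
        print(f"WARNING: Spaces have been replaced by underscores for sample: {sample}")
        sample = sample.replace(" ", "_")
    if not sample:
        raise ValueError("Sample entry has not been specified!")
    return sample
```

For example, `SAMPLE 1` becomes `SAMPLE_1`, while `SAMPLE-1` now passes through unchanged.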

4 changes: 2 additions & 2 deletions bin/ivar_variants_to_vcf.py
@@ -569,8 +569,8 @@ def main(args=None):
## variant counts to pass to MultiQC ##
#############################################
var_count_list = [(k, str(v)) for k, v in sorted(var_count_dict.items())]
("\t".join(["sample"] + [x[0] for x in var_count_list]))
("\t".join([filename] + [x[1] for x in var_count_list]))
print("\t".join(["sample"] + [x[0] for x in var_count_list]))
print("\t".join([filename] + [x[1] for x in var_count_list]))


if __name__ == "__main__":
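The fix above restores the `print` calls so the variant-count table actually reaches stdout for MultiQC. A self-contained sketch of that output format (function name and sample data are illustrative):

```python
def variant_count_lines(filename, var_count_dict):
    """Render a header row and one data row as tab-separated lines for MultiQC."""
    items = [(k, str(v)) for k, v in sorted(var_count_dict.items())]
    header = "\t".join(["sample"] + [k for k, _ in items])
    row = "\t".join([filename] + [v for _, v in items])
    return header, row

header, row = variant_count_lines("sample1", {"SNP": 10, "INS": 2, "DEL": 1})
print(header)  # sample  DEL  INS  SNP (tab-separated, keys sorted)
print(row)     # sample1  1  2  10
```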
4 changes: 2 additions & 2 deletions bin/multiqc_to_custom_csv.py
@@ -239,7 +239,7 @@ def main(args=None):
"multiqc_pangolin.yaml",
[("Pangolin lineage", ["lineage"])],
),
("multiqc_nextclade_clade.yaml", [("Nextclade clade", ["clade"])]),
("multiqc_nextclade_clade-plot.yaml", [("Nextclade clade", ["clade"])]),
]

illumina_assembly_files = [
@@ -308,7 +308,7 @@ def main(args=None):
("multiqc_snpeff.yaml", [("# Missense variants", ["MISSENSE"])]),
("multiqc_quast.yaml", [("# Ns per 100kb consensus", ["# N's per 100 kbp"])]),
("multiqc_pangolin.yaml", [("Pangolin lineage", ["lineage"])]),
("multiqc_nextclade_clade.yaml", [("Nextclade clade", ["clade"])]),
("multiqc_nextclade_clade-plot.yaml", [("Nextclade clade", ["clade"])]),
]

if args.PLATFORM == "illumina":
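Each tuple in these lists maps a MultiQC intermediate YAML file to an output column name and the keys to look up, which is why only the filename needed to change here. A hedged sketch of how such a mapping can be resolved against already-parsed YAML data — `extract_fields` is a stand-in, not the script's real helper:

```python
def extract_fields(yaml_data, field_mapping):
    """Pick named keys out of a parsed MultiQC YAML section per the mapping."""
    result = {}
    for column, keys in field_mapping:
        # Take the first key that is actually present in the parsed data
        for key in keys:
            if key in yaml_data:
                result[column] = yaml_data[key]
                break
    return result

mapping = [("Nextclade clade", ["clade"])]
print(extract_fields({"clade": "21L (Omicron)"}, mapping))
```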
25 changes: 15 additions & 10 deletions conf/modules_illumina.config
@@ -122,7 +122,7 @@ if (!params.skip_kraken2) {
publishDir = [
path: { "${params.outdir}/kraken2" },
mode: params.publish_dir_mode,
pattern: "*.txt"
pattern: "*report.txt"
]
}
}
@@ -146,7 +146,7 @@

withName: 'BOWTIE2_ALIGN' {
ext.args = '--local --very-sensitive-local --seed 1'
ext.args2 = '-F4'
ext.args2 = '-F4 -bhS'
publishDir = [
[
path: { "${params.outdir}/variants/bowtie2/log" },
@@ -180,6 +180,7 @@ }
}

withName: '.*:.*:ALIGN_BOWTIE2:.*:BAM_STATS_SAMTOOLS:.*' {
ext.prefix = { "${meta.id}.sorted.bam" }
publishDir = [
path: { "${params.outdir}/variants/bowtie2/samtools_stats" },
mode: params.publish_dir_mode,
@@ -244,6 +245,7 @@ }
}

withName: '.*:.*:PRIMER_TRIM_IVAR:.*:BAM_STATS_SAMTOOLS:.*' {
ext.prefix = { "${meta.id}.ivar_trim.sorted.bam" }
publishDir = [
path: { "${params.outdir}/variants/bowtie2/samtools_stats" },
mode: params.publish_dir_mode,
@@ -257,7 +259,7 @@ process {
process {
withName: 'PICARD_MARKDUPLICATES' {
ext.args = [
'ASSUME_SORTED=true VALIDATION_STRINGENCY=LENIENT TMP_DIR=tmp',
'--ASSUME_SORTED true --VALIDATION_STRINGENCY LENIENT --TMP_DIR tmp',
params.filter_duplicates ? '--REMOVE_DUPLICATES true' : ''
].join(' ').trim()
ext.prefix = { "${meta.id}.markduplicates.sorted" }
@@ -276,7 +278,6 @@ }
}

withName: '.*:MARK_DUPLICATES_PICARD:SAMTOOLS_INDEX' {
ext.prefix = { "${meta.id}.markduplicates.sorted" }
publishDir = [
path: { "${params.outdir}/variants/bowtie2" },
mode: params.publish_dir_mode,
@@ -285,6 +286,7 @@ }
}

withName: '.*:MARK_DUPLICATES_PICARD:BAM_STATS_SAMTOOLS:.*' {
ext.prefix = { "${meta.id}.markduplicates.sorted.bam" }
publishDir = [
path: { "${params.outdir}/variants/bowtie2/samtools_stats" },
mode: params.publish_dir_mode,
@@ -297,7 +299,7 @@ if (!params.skip_picard_metrics) {
if (!params.skip_picard_metrics) {
process {
withName: 'PICARD_COLLECTMULTIPLEMETRICS' {
ext.args = 'VALIDATION_STRINGENCY=LENIENT TMP_DIR=tmp'
ext.args = '--VALIDATION_STRINGENCY LENIENT --TMP_DIR tmp'
publishDir = [
[
path: { "${params.outdir}/variants/bowtie2/picard_metrics" },
@@ -317,7 +319,7 @@ if (!params.skip_mosdepth) {
if (!params.skip_mosdepth) {
process {
withName: 'MOSDEPTH_GENOME' {
ext.args = '--fast-mode'
ext.args = '--fast-mode --by 200'
publishDir = [
path: { "${params.outdir}/variants/bowtie2/mosdepth/genome" },
mode: params.publish_dir_mode,
@@ -396,7 +398,7 @@ if (!params.skip_variants) {
]
}

withName: '.*:.*:VARIANTS_IVAR:.*:.*:TABIX_TABIX' {
withName: '.*:.*:VARIANTS_IVAR:.*:TABIX_TABIX' {
ext.args = '-p vcf -f'
publishDir = [
path: { "${params.outdir}/variants/ivar" },
@@ -405,7 +407,7 @@ }
]
}

withName: '.*:.*:VARIANTS_IVAR:.*:.*:BCFTOOLS_STATS' {
withName: '.*:.*:VARIANTS_IVAR:.*:BCFTOOLS_STATS' {
publishDir = [
path: { "${params.outdir}/variants/ivar/bcftools_stats" },
mode: params.publish_dir_mode,
@@ -665,7 +667,7 @@ if (!params.skip_variants) {
publishDir = [
path: { "${params.outdir}/variants/${variant_caller}/consensus/${params.consensus_caller}/nextclade" },
mode: params.publish_dir_mode,
pattern: "*.csv"
saveAs: { filename -> filename.endsWith(".csv") && !filename.endsWith("errors.csv") && !filename.endsWith("insertions.csv") ? filename : null }
]
}
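The new `saveAs` closure publishes only the main Nextclade CSV and drops the `errors.csv` and `insertions.csv` side files that a `pattern: "*.csv"` glob would also have matched. The same predicate rewritten in Python for clarity (the function name is ours):

```python
def keep_nextclade_csv(filename: str):
    """Return the filename to publish, or None to skip it (mirrors the saveAs closure)."""
    if (filename.endswith(".csv")
            and not filename.endswith("errors.csv")
            and not filename.endswith("insertions.csv")):
        return filename
    return None

for name in ["run.csv", "run.errors.csv", "run.insertions.csv", "run.tsv"]:
    print(name, "->", keep_nextclade_csv(name))
```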

@@ -1048,7 +1050,10 @@ if (!params.skip_assembly) {
if (!params.skip_multiqc) {
process {
withName: 'MULTIQC' {
ext.args = params.multiqc_title ? "--title \"$params.multiqc_title\"" : ''
ext.args = [
'-k yaml',
params.multiqc_title ? "--title \"$params.multiqc_title\"" : ''
].join(' ').trim()
publishDir = [
[
path: { "${params.outdir}/multiqc" },
12 changes: 7 additions & 5 deletions conf/modules_nanopore.config
@@ -91,7 +91,6 @@ process {
}

withName: '.*:.*:.*:SAMTOOLS_INDEX' {
ext.prefix = { "${meta.id}.mapped.sorted" }
publishDir = [
path: { "${params.outdir}/${params.artic_minion_caller}" },
mode: params.publish_dir_mode,
@@ -100,7 +99,7 @@ }
}

withName: '.*:.*:.*:BAM_STATS_SAMTOOLS:.*' {
ext.prefix = { "${meta.id}.mapped.sorted" }
ext.prefix = { "${meta.id}.mapped.sorted.bam" }
publishDir = [
path: { "${params.outdir}/${params.artic_minion_caller}/samtools_stats" },
mode: params.publish_dir_mode,
@@ -168,7 +167,7 @@ }
}

withName: 'MOSDEPTH_GENOME' {
ext.args = '--fast-mode'
ext.args = '--fast-mode --by 200'
publishDir = [
path: { "${params.outdir}/${params.artic_minion_caller}/mosdepth/genome" },
mode: params.publish_dir_mode,
@@ -241,7 +240,7 @@ if (!params.skip_nextclade) {
publishDir = [
path: { "${params.outdir}/${params.artic_minion_caller}/nextclade" },
mode: params.publish_dir_mode,
pattern: "*.csv"
saveAs: { filename -> filename.endsWith(".csv") && !filename.endsWith("errors.csv") && !filename.endsWith("insertions.csv") ? filename : null }
]
}

Expand Down Expand Up @@ -362,7 +361,10 @@ if (!params.skip_asciigenome) {
if (!params.skip_multiqc) {
process {
withName: 'MULTIQC' {
ext.args = params.multiqc_title ? "--title \"$params.multiqc_title\"" : ''
ext.args = [
'-k yaml',
params.multiqc_title ? "--title \"$params.multiqc_title\"" : ''
].join(' ').trim()
publishDir = [
path: { "${params.outdir}/multiqc/${params.artic_minion_caller}" },
mode: params.publish_dir_mode,
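Both platform configs now assemble the MultiQC `ext.args` by joining a list of fragments and trimming, so conditional options drop out cleanly when unset. The same pattern sketched in Python (names and values are illustrative):

```python
def build_args(multiqc_title=None):
    """Join fixed and conditional CLI fragments, dropping empty entries."""
    parts = [
        "-k yaml",  # always passed
        f'--title "{multiqc_title}"' if multiqc_title else "",  # optional
    ]
    # join(' ') then strip() mirrors the Groovy join(' ').trim()
    return " ".join(parts).strip()

print(build_args("My run"))  # -k yaml --title "My run"
print(build_args())          # -k yaml
```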
2 changes: 1 addition & 1 deletion docs/usage.md
@@ -391,7 +391,7 @@ You can use a similar approach to update the version of Nextclade used by the pi

##### Nextclade datasets

A [`nextclade dataset`](https://docs.nextstrain.org/projects/nextclade/en/latest/user/datasets.html#nextclade-datasets) feature was introduced in [Nextclade CLI v1.3.0](https://github.com/nextstrain/nextclade/releases/tag/1.3.0) that fetches input genome files such as reference sequences and trees from a central dataset repository. We have uploaded Nextclade dataset [v2022-06-14](https://github.com/nextstrain/nextclade_data/releases/tag/2022-06-16--16-03-24--UTC) to [nf-core/test-datasets](https://github.com/nf-core/test-datasets/blob/viralrecon/genome/MN908947.3/nextclade_sars-cov-2_MN908947_2022-06-14T12_00_00Z.tar.gz?raw=true), and for reproducibility, this will be used by default if you specify `--genome 'MN908947.3'` when running the pipeline. However, there are a number of ways you can use a more recent version of the dataset:

- Supply your own by setting: `--nextclade_dataset <PATH_TO_DATASET>`
- Let the pipeline create and use the latest version by setting: `--nextclade_dataset false --nextclade_dataset_tag false`