Concatenating germline vcfs #792

Merged on Dec 7, 2022 (36 commits)
Commits
1fd779d
WIP. Just concatenating germline-vcfs from strelka and haplotypecaller
asp8200 Oct 11, 2022
22fe29e
Merge branch 'dev' into concatenating_vcfs
asp8200 Oct 11, 2022
5ee59a8
Adding the germline vcf-file from manta to the list of germline vcf-f…
asp8200 Oct 12, 2022
624b6eb
Making sure the channel manta_vcf_tbi is defined even if manta isnt run
asp8200 Oct 12, 2022
5a5fb17
merge from dev
asp8200 Nov 10, 2022
b89f088
Adding support for concatenation of germline vcf-files. Now also for …
asp8200 Nov 13, 2022
a936722
Adding CLI-option concatenate_vcf to the schema-json
asp8200 Nov 13, 2022
f91d40b
WIP: Adding support for concatenation of germline vcf-files. Now also…
asp8200 Nov 14, 2022
d859e04
WIP: Adding support for concatenation of germline vcf-files. Now also…
asp8200 Nov 14, 2022
54d6c43
Merge branch 'dev' into concatenating_vcfs
asp8200 Nov 14, 2022
ad2b5a7
Merge branch 'dev' into concatenating_vcfs
asp8200 Nov 25, 2022
38ac53d
Adding support for concatenation of vcf from mpileup
asp8200 Nov 28, 2022
dba9993
Changing CLI-option concatenate_vcf to concatenate_vcfs.
asp8200 Nov 28, 2022
f476da7
Merge branch 'dev' into concatenating_vcfs
asp8200 Nov 28, 2022
34baf9a
Initializing CLI-option concatenate_vcfs to false.
asp8200 Nov 28, 2022
d3a4578
Sorting concatenated germline-vcf-file and adding tbi.
asp8200 Nov 28, 2022
9e21631
Updating schema. Grouping the CLI-option concatenate_vcfs together wi…
asp8200 Nov 28, 2022
f8edc00
prettier
asp8200 Nov 28, 2022
e447602
Moving some config to new config-file for post-processing of vcfs
asp8200 Dec 1, 2022
00c5a9d
renaming postprocessing_vcfs.config to post_variant_calling.config
asp8200 Dec 1, 2022
70b1027
Adding INFO-field SOURCE=<input-vcf> to germline-vcf-files before con…
asp8200 Dec 1, 2022
c999a8f
cleaner
asp8200 Dec 1, 2022
24ad87a
Fixed typo in INFO-field SOURCE in concatenated germline-vcf
asp8200 Dec 1, 2022
04da3de
Temporary and fixed copy of mapped_joint_bam.csv in which sample-id a…
asp8200 Dec 5, 2022
8257243
WIP: Adding test of the concatenation of germline-vcfs
asp8200 Dec 5, 2022
498db83
Trying to add new tests
asp8200 Dec 5, 2022
27826a8
Trying to get new test running
asp8200 Dec 5, 2022
bd8f2be
Avoiding publishing files from GERMLINE_VCFS_CONCAT
asp8200 Dec 5, 2022
f910c82
Skip CI-test concatenate_vcfs in conda test-env
asp8200 Dec 6, 2022
ea9d925
prettier
asp8200 Dec 6, 2022
812f6d0
Adding synonym for module BCFTOOLS_CONCAT in order to disable publish…
asp8200 Dec 6, 2022
c733593
Updating changelog
asp8200 Dec 6, 2022
439246d
Moving config from modules.config to post_variant_calling.config
asp8200 Dec 6, 2022
4d15e40
fixing comment
asp8200 Dec 6, 2022
07fb548
Remove code to pass back tbi-files to sarek.nf
asp8200 Dec 6, 2022
b32b4cf
Comments added
asp8200 Dec 6, 2022
2 changes: 2 additions & 0 deletions .github/workflows/pytest-workflow.yml
@@ -51,6 +51,8 @@ jobs:
tags: snpeff
- profile: "conda"
tags: vep
- profile: "conda"
tags: concatenate_vcfs
- profile: "singularity"
tags: merge
env:
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added

- [#864](https://github.com/nf-core/sarek/pull/864) - Added possibilities to export assembled haplotypes and locally realigned reads
- [#792](https://github.com/nf-core/sarek/pull/792) - Added the option `--concatenate_vcfs` for concatenating the germline vcf-files. By default, the resulting vcf-files will be placed under `<outDir>/variant_calling/concat`.

### Changed

44 changes: 44 additions & 0 deletions conf/modules/post_variant_calling.config
@@ -0,0 +1,44 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Config file for defining DSL2 per module options and publishing paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Available keys to override module options:
ext.args = Additional arguments appended to command in module.
ext.args2 = Second set of arguments appended to command in module (multi-tool modules).
ext.args3 = Third set of arguments appended to command in module (multi-tool modules).
ext.prefix = File name prefix for output files.
ext.when = When to run the module.
----------------------------------------------------------------------------------------
*/

// POSTPROCESSING VCFS
// Like, for instance, concatenating the unannotated, germline vcf-files

process {
    withName: 'GERMLINE_VCFS_CONCAT'{
        publishDir = [
            //specify to avoid publishing, overwritten otherwise
            enabled: false
        ]
    }

    withName: 'GERMLINE_VCFS_CONCAT_SORT'{
        ext.prefix = { "${meta.id}.germline" }
        publishDir = [
            mode: params.publish_dir_mode,
            path: { "${params.outdir}/variant_calling/concat/${meta.id}/" }
        ]
    }

    withName: 'TABIX_EXT_VCF_.*' {
        ext.prefix = { "${input.baseName}" }
    }

    withName: 'TABIX_GERMLINE_VCFS_CONCAT_SORT'{
        ext.prefix = { "${meta.id}.germline" }
        publishDir = [
            mode: params.publish_dir_mode,
            path: { "${params.outdir}/variant_calling/concat/${meta.id}/" }
        ]
    }
}
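As a rough illustration of where the sorted, concatenated file ends up, the sketch below mirrors the `publishDir` path and `ext.prefix` closures from the config above in plain shell. The helper name `concat_publish_prefix` and the values `results` and `testN` are hypothetical and only used for illustration. The file extension added by the downstream modules is left out because this diff does not state it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline): combine the publishDir path
# "${params.outdir}/variant_calling/concat/${meta.id}/" with the
# ext.prefix "${meta.id}.germline" for a given outdir and sample id.
concat_publish_prefix() {
    outdir=$1
    sample_id=$2
    printf '%s/variant_calling/concat/%s/%s.germline\n' "$outdir" "$sample_id" "$sample_id"
}

# Example values are assumptions, not taken from the PR.
concat_publish_prefix results testN
```

With these example values the helper prints `results/variant_calling/concat/testN/testN.germline`. Each sample gets its own sub-directory, which is why the config builds the path from `meta.id`.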
4 changes: 4 additions & 0 deletions modules.json
@@ -15,6 +15,10 @@
"git_sha": "6301e29d77e7ec7ce98b55b8a361b316a9a91bfe",
"installed_by": ["modules"]
},
"bcftools/concat": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905"
},
"bcftools/sort": {
"branch": "master",
"git_sha": "78cf39939fbe160a1410c44a6c5946f9a4c56e7e",
40 changes: 40 additions & 0 deletions modules/local/add_info_to_vcf/main.nf
@@ -0,0 +1,40 @@
process ADD_INFO_TO_VCF {
    tag "$meta.id"

    conda (params.enable_conda ? "anaconda::gawk=5.1.0" : null)
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/gawk:5.1.0' :
        'quay.io/biocontainers/gawk:5.1.0' }"

    input:
    tuple val(meta), path(vcf_gz)

    output:
    tuple val(meta), path("*.added_info.vcf"), emit: vcf
    path "versions.yml"                      , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    """
    input="input.vcf"
    output="${vcf_gz.baseName.minus(".vcf")}.added_info.vcf"
    zcat $vcf_gz > \$input
    ## Copy the meta-information header lines
    grep -E "^##" \$input > \$output
    ## Add description of new INFO value
    echo '##INFO=<ID=SOURCE,Number=1,Type=String,Description="Name of vcf-file from whence the variant came">' >> \$output
    ## Add column header
    grep -E "^#CHROM" \$input >> \$output
    ## Add SOURCE value to INFO column of variant calls
    ## (-q: only test whether any variant lines exist, without printing them to stdout)
    if grep -qEv "^#" \$input; then
        grep -Ev "^#" \$input | awk 'BEGIN{FS=OFS="\t"} { \$8=="." ? \$8="SOURCE=$vcf_gz" : \$8=\$8";SOURCE=$vcf_gz"; print }' >> \$output
    fi

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        gawk: \$(awk -Wversion | sed '1!d; s/.*Awk //; s/,.*//')
    END_VERSIONS
    """
}
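To make the INFO-column step in this module easier to follow, here is a minimal standalone sketch of the same idea on a toy, tab-separated input. It is not the module's script: the helper `tag_source`, the toy records and the file name `strelka.vcf.gz` are assumptions made for illustration, and the sketch passes header lines through unchanged instead of inserting the `##INFO=<ID=SOURCE,...>` line.

```shell
#!/bin/sh
# Hypothetical helper: append SOURCE=<name> to the INFO column (column 8)
# of every non-header line read from stdin.
tag_source() {
    awk -v src="$1" 'BEGIN { FS = OFS = "\t" }
        /^#/ { print; next }   # header lines pass through unchanged
        {
            # an empty INFO (".") is replaced, otherwise the tag is appended
            $8 = (($8 == ".") ? ("SOURCE=" src) : ($8 ";SOURCE=" src))
            print
        }'
}

# Toy input (assumed data): a column header, one record with an empty INFO
# field and one record that already carries DP=10.
tagged=$(
    {
        printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
        printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n'
        printf 'chr1\t200\t.\tC\tT\t60\tPASS\tDP=10\n'
    } | tag_source strelka.vcf.gz
)
printf '%s\n' "$tagged"
```

The first record ends up with `SOURCE=strelka.vcf.gz` as its whole INFO field, while the second becomes `DP=10;SOURCE=strelka.vcf.gz`. Handling the `.` case separately avoids producing an INFO value such as `.;SOURCE=...`.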
35 changes: 35 additions & 0 deletions modules/nf-core/bcftools/concat/main.nf
48 changes: 48 additions & 0 deletions modules/nf-core/bcftools/concat/meta.yml
3 changes: 3 additions & 0 deletions nextflow.config
@@ -63,6 +63,7 @@ params {
cf_minqual = 0 // ControlFreec default values
cf_window = null // by default we are not using this in Control-FREEC
cnvkit_reference = null // by default the reference is build from the fasta file
concatenate_vcfs = false // by default we don't concatenate the germline-vcf-files
ignore_soft_clipped_bases = false // no --dont-use-soft-clipped-bases for GATK Mutect2
wes = false // Set to true, if data is exome/targeted sequencing data. Used to use correct models in various variant callers
joint_germline = false // g.vcf & joint germline calling are not run by default if HaplotypeCaller is selected
@@ -316,6 +317,8 @@ includeConfig 'conf/modules/mutect2.config'
includeConfig 'conf/modules/strelka.config'
includeConfig 'conf/modules/tiddit.config'

includeConfig 'conf/modules/post_variant_calling.config'

//annotate
includeConfig 'conf/modules/annotate.config'

6 changes: 6 additions & 0 deletions nextflow_schema.json
@@ -226,6 +226,12 @@
"default": "",
"fa_icon": "fas fa-toolbox",
"properties": {
"concatenate_vcfs": {
"type": "boolean",
"fa_icon": "fas fa-merge",
"description": "Option for concatenating germline vcf-files.",
"help_text": "Concatenating the germline vcf-files from each applied variant-caller into one vcf-file using bcftools concat."
},
"only_paired_variant_calling": {
"type": "boolean",
"fa_icon": "fas fa-forward",
31 changes: 15 additions & 16 deletions subworkflows/local/bam_variant_calling_germline_all/main.nf
@@ -37,7 +37,6 @@ workflow BAM_VARIANT_CALLING_GERMLINE_ALL {
//TODO: Temporary until the if's can be removed and printing to terminal is prevented with "when" in the modules.config
deepvariant_vcf = Channel.empty()
freebayes_vcf = Channel.empty()
genotype_gvcf = Channel.empty()
haplotypecaller_vcf = Channel.empty()
manta_vcf = Channel.empty()
mpileup_vcf = Channel.empty()
@@ -95,7 +94,6 @@ workflow BAM_VARIANT_CALLING_GERMLINE_ALL {
dict
)

mpileup_germline = BAM_VARIANT_CALLING_MPILEUP.out.mpileup
mpileup_vcf = BAM_VARIANT_CALLING_MPILEUP.out.vcf
ch_versions = ch_versions.mix(BAM_VARIANT_CALLING_MPILEUP.out.versions)
}
@@ -116,7 +114,7 @@ workflow BAM_VARIANT_CALLING_GERMLINE_ALL {
[]
)

ch_versions = ch_versions.mix(BAM_VARIANT_CALLING_CNVKIT.out.versions)
ch_versions = ch_versions.mix(BAM_VARIANT_CALLING_CNVKIT.out.versions)
}

// DEEPVARIANT
@@ -128,8 +126,8 @@ workflow BAM_VARIANT_CALLING_GERMLINE_ALL {
fasta_fai
)

deepvariant_vcf = Channel.empty().mix(BAM_VARIANT_CALLING_DEEPVARIANT.out.deepvariant_vcf)
ch_versions = ch_versions.mix(BAM_VARIANT_CALLING_DEEPVARIANT.out.versions)
deepvariant_vcf = Channel.empty().mix(BAM_VARIANT_CALLING_DEEPVARIANT.out.deepvariant_vcf)
ch_versions = ch_versions.mix(BAM_VARIANT_CALLING_DEEPVARIANT.out.versions)
}

// FREEBAYES
@@ -147,8 +145,8 @@ workflow BAM_VARIANT_CALLING_GERMLINE_ALL {
fasta_fai
)

freebayes_vcf = BAM_VARIANT_CALLING_FREEBAYES.out.freebayes_vcf
ch_versions = ch_versions.mix(BAM_VARIANT_CALLING_FREEBAYES.out.versions)
freebayes_vcf = BAM_VARIANT_CALLING_FREEBAYES.out.freebayes_vcf
ch_versions = ch_versions.mix(BAM_VARIANT_CALLING_FREEBAYES.out.versions)
}

// HAPLOTYPECALLER
@@ -184,8 +182,9 @@ workflow BAM_VARIANT_CALLING_GERMLINE_ALL {
known_sites_snps_tbi,
intervals_bed_combined_haplotypec)

haplotypecaller_vcf = BAM_VARIANT_CALLING_HAPLOTYPECALLER.out.filtered_vcf
ch_versions = ch_versions.mix(BAM_VARIANT_CALLING_HAPLOTYPECALLER.out.versions)
haplotypecaller_vcf = BAM_VARIANT_CALLING_HAPLOTYPECALLER.out.filtered_vcf
ch_versions = ch_versions.mix(BAM_VARIANT_CALLING_HAPLOTYPECALLER.out.versions)

}

// MANTA
@@ -197,8 +196,9 @@ workflow BAM_VARIANT_CALLING_GERMLINE_ALL {
fasta_fai
)

manta_vcf = BAM_VARIANT_CALLING_GERMLINE_MANTA.out.manta_vcf
ch_versions = ch_versions.mix(BAM_VARIANT_CALLING_GERMLINE_MANTA.out.versions)

manta_vcf = BAM_VARIANT_CALLING_GERMLINE_MANTA.out.manta_vcf
ch_versions = ch_versions.mix(BAM_VARIANT_CALLING_GERMLINE_MANTA.out.versions)
}

// STRELKA
@@ -210,8 +210,8 @@ workflow BAM_VARIANT_CALLING_GERMLINE_ALL {
fasta_fai
)

strelka_vcf = BAM_VARIANT_CALLING_SINGLE_STRELKA.out.strelka_vcf
ch_versions = ch_versions.mix(BAM_VARIANT_CALLING_SINGLE_STRELKA.out.versions)
strelka_vcf = BAM_VARIANT_CALLING_SINGLE_STRELKA.out.strelka_vcf
ch_versions = ch_versions.mix(BAM_VARIANT_CALLING_SINGLE_STRELKA.out.versions)
}

//TIDDIT
@@ -222,14 +222,13 @@ workflow BAM_VARIANT_CALLING_GERMLINE_ALL {
bwa
)

tiddit_vcf = BAM_VARIANT_CALLING_SINGLE_TIDDIT.out.tiddit_vcf
ch_versions = ch_versions.mix(BAM_VARIANT_CALLING_SINGLE_TIDDIT.out.versions)
tiddit_vcf = BAM_VARIANT_CALLING_SINGLE_TIDDIT.out.tiddit_vcf
ch_versions = ch_versions.mix(BAM_VARIANT_CALLING_SINGLE_TIDDIT.out.versions)
}

emit:
deepvariant_vcf
freebayes_vcf
genotype_gvcf
haplotypecaller_vcf
manta_vcf
mpileup_vcf
15 changes: 14 additions & 1 deletion subworkflows/local/bam_variant_calling_haplotypecaller/main.nf
@@ -134,7 +134,20 @@ workflow BAM_VARIANT_CALLING_HAPLOTYPECALLER {
known_sites_indels.concat(known_sites_snps).flatten().unique().collect(),
known_sites_indels_tbi.concat(known_sites_snps_tbi).flatten().unique().collect())

filtered_vcf = VCF_VARIANT_FILTERING_GATK.out.filtered_vcf.map{ meta, vcf-> [[patient:meta.patient, sample:meta.sample, status:meta.status, sex:meta.sex, id:meta.sample, num_intervals:meta.num_intervals, variantcaller:"haplotypecaller"], vcf]}
filtered_vcf = VCF_VARIANT_FILTERING_GATK.out.filtered_vcf.map{ meta, vcf-> [
        [
            patient:meta.patient,
            sample:meta.sample,
            status:meta.status,
            sex:meta.sex,
            id:meta.sample,
            num_intervals:meta.num_intervals,
            variantcaller:"haplotypecaller"
        ],
        vcf
    ]
}

ch_versions = ch_versions.mix(GATK4_HAPLOTYPECALLER.out.versions)
ch_versions = ch_versions.mix(MERGE_HAPLOTYPECALLER.out.versions)
ch_versions = ch_versions.mix(VCF_VARIANT_FILTERING_GATK.out.versions)
2 changes: 1 addition & 1 deletion subworkflows/local/bam_variant_calling_mpileup/main.nf
@@ -72,5 +72,5 @@ workflow BAM_VARIANT_CALLING_MPILEUP {
emit:
versions = ch_versions
mpileup = Channel.empty().mix(CAT_MPILEUP.out.file_out, mpileup.no_intervals)
vcf = Channel.empty().mix(GATK4_MERGEVCFS.out.vcf,vcfs.no_intervals)
vcf = Channel.empty().mix(GATK4_MERGEVCFS.out.vcf, vcfs.no_intervals)
}
@@ -76,7 +76,7 @@ workflow BAM_VARIANT_CALLING_SINGLE_STRELKA {
sex: meta.sex,
status: meta.status,
variantcaller: "strelka"
],vcf]
], vcf]
}

ch_versions = ch_versions.mix(MERGE_STRELKA.out.versions)
1 change: 0 additions & 1 deletion subworkflows/local/vcf_variant_filtering_gatk/main.nf
@@ -66,4 +66,3 @@ workflow VCF_VARIANT_FILTERING_GATK {
versions = ch_versions
filtered_vcf
}

17 changes: 17 additions & 0 deletions tests/config/pytest_tags.yml
@@ -263,3 +263,20 @@ vep:
- modules/nf-core/ensemblvep/main.nf
- modules/nf-core/tabix/bgziptabix/main.nf
- subworkflows/nf-core/vcf_annotate_ensemblvep/main.nf

## concatenate germline vcfs
concatenate_vcfs:
- conf/modules/post_variant_calling.config
- modules/nf-core/deepvariant/main.nf # deepvariant
- modules/nf-core/tabix/tabix/main.nf
- modules/nf-core/freebayes/main.nf # freebayes
- modules/nf-core/gatk4/haplotypecaller/main.nf # haplotypecaller
- modules/nf-core/manta/germline/main.nf # manta
- modules/nf-core/bcftools/mpileup/main.nf # mpileup/bcftools
- modules/nf-core/bcftools/sort/main.nf
- modules/nf-core/tabix/bgziptabix/main.nf
- modules/nf-core/bcftools/concat/main.nf
- modules/nf-core/samtools/mpileup/main.nf
- modules/nf-core/gatk4/mergevcfs/main.nf # gatk4/mergevcfs
- modules/nf-core/strelka/germline/main.nf # strelka
- modules/nf-core/tiddit/sv/main.nf # tiddit
3 changes: 3 additions & 0 deletions tests/csv/3.0/mapped_joint_bam.fixed.csv
@@ -0,0 +1,3 @@
patient,status,sample,bam,bai
testN,0,testN,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/bam/test.paired_end.sorted.bam,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/bam/test.paired_end.sorted.bam.bai
testT,0,testT,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/bam/test2.paired_end.sorted.bam,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/bam/test2.paired_end.sorted.bam.bai