Add the covid-19 consensus workflow #31

Merged
merged 3 commits into from
May 2, 2021
Changes from 2 commits
@@ -0,0 +1,7 @@
version: 1.2
workflows:
- name: 'COVID-19-CONSENSUS-CONSTRUCTION'
  primaryDescriptorPath: /consensus.ga
  subclass: Galaxy
  testParameterFiles:
  - /consensus-from-variation-test.yml
@@ -0,0 +1,3 @@
0.1
---------
- Initial version of COVID-19: consensus construction
@@ -0,0 +1,28 @@
COVID-19: consensus construction
--------------------------------

This workflow aims at generating reliable consensus sequences from variant
calls according to transparent criteria that capture at least some of the
complexity of variant calling.

It takes a collection of VCFs and a collection of the corresponding
aligned reads (used to calculate genome-wide coverage), such as those
produced by any of the four variant calling workflows in
https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling
and generates a collection of viral consensus sequences and a multisample FASTA
of all these sequences.

Each consensus sequence is guaranteed to capture all called, filter-passing
variants as defined in the VCF of its sample that reach a user-defined
consensus allele frequency threshold.

Filter-failing variants and variants below a second user-defined minimal
allele frequency threshold will be ignored.

Genomic positions of filter-passing variants with an allele frequency in
between the two thresholds will be hard-masked (with N) in the consensus
sequence of their sample.

Genomic positions with a coverage (calculated from the read alignments input)
below another user-defined threshold will be hard-masked, too, unless they are
consensus variant sites.
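
The masking rules described above can be sketched in a few lines of Python. This is only an illustration of the stated criteria, not the workflow's actual implementation (which is a Galaxy workflow): the names `Variant`, `classify_variant` and `masked_positions`, the threshold values, and the handling of values that fall exactly on a threshold are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical, simplified view of one VCF record."""
    pos: int             # 1-based genomic position
    af: float            # allele frequency of the called variant
    passes_filter: bool  # True if the record passed all filters


def classify_variant(v, min_af, consensus_af):
    """Return 'apply', 'mask' or 'ignore' for a single variant.

    Assumption: a variant exactly at consensus_af is applied, and one
    exactly at min_af counts as lying between the two thresholds.
    """
    if not v.passes_filter or v.af < min_af:
        return "ignore"
    if v.af >= consensus_af:
        return "apply"
    return "mask"


def masked_positions(variants, depth, min_af, consensus_af, min_depth):
    """Return the sorted positions that would be hard-masked with N.

    depth maps position -> coverage computed from the aligned reads.
    """
    labels = {v.pos: classify_variant(v, min_af, consensus_af) for v in variants}
    consensus_sites = {pos for pos, label in labels.items() if label == "apply"}
    ambiguous_sites = {pos for pos, label in labels.items() if label == "mask"}
    low_coverage = {pos for pos, d in depth.items() if d < min_depth}
    # Low-coverage positions are masked unless they are consensus variant sites.
    return sorted(ambiguous_sites | (low_coverage - consensus_sites))


# Example with made-up thresholds and data.
variants = [
    Variant(pos=100, af=0.90, passes_filter=True),   # consensus variant
    Variant(pos=200, af=0.40, passes_filter=True),   # between thresholds
    Variant(pos=300, af=0.10, passes_filter=True),   # below minimal AF
    Variant(pos=400, af=0.95, passes_filter=False),  # filter-failing
]
depth = {100: 2, 200: 50, 300: 3, 400: 40, 500: 1}
result = masked_positions(variants, depth, min_af=0.25, consensus_af=0.7, min_depth=5)
print(result)
```

In this example, position 100 has low coverage but is left unmasked because it is a consensus variant site, while position 200 is masked because its allele frequency falls between the two thresholds.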
@@ -0,0 +1,23 @@
- doc: Test consensus building from called variants
  job:
    Reference genome:
      class: File
      location: 'https://zenodo.org/record/4555735/files/NC_045512.2_reference.fasta?download=1'
    aligned reads data for depth calculation:
      class: Collection
      collection_type: 'list'
      elements:
        - identifier: SRR11578257
          class: File
          path: test-data/aligned_reads_for_coverage.bam
    Variant calls:
      class: Collection
      collection_type: 'list'
      elements:
        - identifier: SRR11578257
          class: File
          path: test-data/final_snpeff_annotated_variants.vcf
  outputs:
    multisample_consensus_fasta:
      file: test-data/masked_consensus.fa
