
Add the covid-19 consensus workflow #31

Merged: 3 commits merged into main on May 2, 2021


@wm75 (Contributor) commented Apr 30, 2021

This is the last of the covid19.galaxyproject.org genomics workflows that still isn't deposited anywhere outside Galaxy.

@wm75 (Contributor, Author) commented Apr 30, 2021

The version in this PR actually contains a couple of bugs and quirks, which I've fixed this week.
So once this gets merged, I will open a new PR with the 0.2 release.

This one should be its own version because it has been used, e.g., to construct some of the COG-UK tracking project's consensus sequences, and we should have a proper release to refer to.

@mvdbeek (Member) commented Apr 30, 2021

We need a `.dockstore.yml` file to trigger the tests.

Review comment on the new workflow file (diff hunk `@@ -0,0 +1,1735 @@`):

@mvdbeek (Member):

@wm75 I think this needs to be called `consensus-from-variation.ga` for planemo to pick up the test (or rename the test file to `consensus-test.yml`).

@wm75 (Contributor, Author):

I see, thanks.

@mvdbeek (Member):

Alright, that looks better:

Applying linter structure... WARNING
.. WARNING: Workflow contained output without a label
Applying linter tests... CHECK
.. CHECK: Tests appear structurally correct for workflows/sars-cov-2-variant-calling/sars-cov-2-consensus-from-variation/consensus-from-variation.ga
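For readers unfamiliar with the naming rule mentioned above: planemo finds a workflow's test file from the workflow's own file name, which is why renaming either file fixed the lint result. The minimal Python sketch below models that pairing. The suffix and extension lists are our assumption about planemo's convention (verify against the planemo documentation), and `candidate_test_files`, `find_test_file` and the file name `consensus.ga` are hypothetical, not planemo functions or the PR's actual files.

```python
from pathlib import Path
from typing import List, Optional, Set

# ASSUMPTION: suffixes/extensions modeled on planemo's convention as we
# understand it; verify against the planemo docs before relying on them.
TEST_SUFFIXES = ("-tests", "_tests", "-test", "_test")
TEST_EXTENSIONS = (".yml", ".yaml", ".json")


def candidate_test_files(workflow_path: str) -> List[str]:
    """Hypothetical helper: test-file names paired with a workflow file."""
    base = Path(workflow_path).with_suffix("")  # drop the ".ga" extension
    return [
        str(base.with_name(base.name + suffix + ext))
        for suffix in TEST_SUFFIXES
        for ext in TEST_EXTENSIONS
    ]


def find_test_file(workflow_path: str, existing: Set[str]) -> Optional[str]:
    """Hypothetical helper: first candidate present in `existing`, else None."""
    for candidate in candidate_test_files(workflow_path):
        if candidate in existing:
            return candidate
    return None


if __name__ == "__main__":
    tests = {"consensus-from-variation-test.yml"}
    print(find_test_file("consensus.ga", tests))  # None: basenames differ
    print(find_test_file("consensus-from-variation.ga", tests))  # consensus-from-variation-test.yml
```

The design point is simply that pairing is by basename, so either renaming the workflow or renaming the test file makes the two match.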

@mvdbeek (Member) commented Apr 30, 2021

The missing tools still need to be installed on main, which is one of the TODO items from #29 😆. I'm on it.

@bwlang (Contributor) commented Apr 30, 2021

@wm75 this seems pretty complex... why not use ivar consensus?
https://andersen-lab.github.io/ivar/html/manualpage.html

@wm75 (Contributor, Author) commented May 1, 2021

@bwlang With the collection of sars-cov-2-genomics WFs we already handle variant calling (in a very reliable and sensitive way), and this WF builds a consensus sequence from such a list of called variants.

`ivar consensus` is, of course, a much simpler solution, but it calls variants again internally from its BAM input using `samtools mpileup`. So this is simply a different use case: if you want a fast consensus, use `ivar consensus`; we, however, want a consensus sequence that incorporates exactly the same set of variants that we called upstream.
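By contrast, a pileup-based consensus derives every base from read counts all over again. The toy Python function below illustrates that general idea with a depth cutoff and a frequency threshold. `pileup_consensus`, its parameters and their defaults are hypothetical; this is not ivar's actual algorithm or its defaults.

```python
from collections import Counter
from typing import List


def pileup_consensus(
    columns: List[Counter],
    min_depth: int = 10,
    min_freq: float = 0.7,
    no_call: str = "N",
) -> str:
    """Toy frequency-threshold consensus over per-position base counts.

    Each Counter maps a base (e.g. "A") to the number of reads supporting it
    at one reference position. Illustrative only, NOT ivar's implementation.
    """
    consensus = []
    for counts in columns:
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            consensus.append(no_call)  # too little coverage to call a base
            continue
        base, support = counts.most_common(1)[0]
        # Call the majority base only if it is frequent enough.
        consensus.append(base if support / depth >= min_freq else no_call)
    return "".join(consensus)


if __name__ == "__main__":
    cols = [Counter(A=20), Counter(C=3), Counter(G=12, T=8), Counter(T=15, A=5)]
    print(pileup_consensus(cols))  # ANNT
```

Because each base is re-derived with its own thresholds, the result need not agree with a variant set called separately upstream, which is the mismatch this WF is designed to avoid.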

Moreover, most of the complexity in this WF comes from the intended behavior described in the README; the core task of building the consensus FASTA is a single step here (`bcftools consensus`). Some simplification will also follow in the first update of the WF.
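The core idea behind such a step, applying a list of already-called, VCF-style variants to a reference, can be sketched in Python as below. This is a simplified illustration under our own assumptions: `Variant` and `apply_variants` are hypothetical names, there is no masking or filtering, and the skip-on-overlap policy may differ from what bcftools does. It is not how bcftools or this workflow implement the step.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    pos: int  # 1-based position of the first REF base, as in a VCF record
    ref: str
    alt: str


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Toy version of 'apply called variants to a reference'.

    Variants are applied right to left so that length-changing edits never
    shift the coordinates of variants still waiting to be applied. A variant
    overlapping one already applied is skipped (a simplified policy).
    """
    seq = reference
    applied_start = len(reference)  # 0-based start of the last applied variant
    for v in sorted(variants, key=lambda var: var.pos, reverse=True):
        start = v.pos - 1
        end = start + len(v.ref)
        if start < 0 or end > len(reference):
            raise ValueError(f"variant at {v.pos} lies outside the reference")
        if end > applied_start:
            continue  # overlaps a variant applied earlier in this loop
        if seq[start:end] != v.ref:
            raise ValueError(f"REF mismatch at {v.pos}: {seq[start:end]!r} != {v.ref!r}")
        seq = seq[:start] + v.alt + seq[end:]
        applied_start = start
    return seq


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    calls = [Variant(2, "C", "T"), Variant(5, "AC", "A"), Variant(9, "A", "AGG")]
    print(apply_variants(ref, calls))  # ATGTAGTAGGC
```

Processing right to left is the design choice that matters: an insertion or deletion only shifts sequence downstream of it, so coordinates to its left stay valid.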

That's the other important point: we need proper releases of this particular WF because, just like the ARTIC PE variation WF and the reporting WF, it is being used in our national viral genome surveillance tracking efforts (see https://usegalaxy-eu.github.io/posts/2021/04/29/sars-cov-2-monitoring/plain.html). The version in this PR has already been run on ~35,000 COG-UK samples and a few Estonian samples, and the updated version, which I would like to become iwc release 0.2, will be run on many more.

All this is not to say that we shouldn't have more (and possibly more lightweight) WFs for viral genomic analyses, and such WFs are still on my to-do list. It's just that the ones we're already running on a lot of data are currently the priority for moving to iwc.

@mvdbeek merged commit a595bf0 into main on May 2, 2021.
@mvdbeek deleted the add-covid-consensus-wf branch on May 2, 2021, 10:25.
Labels: none · Projects: none · Linked issues: none · 3 participants