Rerunnable workflow as CWL #5

rvosa · 2020-04-11T12:10:41Z

The goal of the basic workflow is to be able to consume unaligned FASTA, align this (i.e. solve #3) and build a tree with it (by addressing #4). These steps are implemented with tools, scripts, and web service calls that are all provisioned inside a Docker container (whose Dockerfile is in the root of the repo, and whose tag will be the same as the repo name).

Subsequently, these steps will be chained together using CWL, most of which is already scaffolded in PR #1. The essential test is therefore that we should be able to run the whole thing on a clean computer using something like cwl-runner. We will then submit this to covid19.workflowhub.eu.

rvosa · 2020-04-11T12:34:10Z

(For consulting on how to finalize this, we might talk to Tazro and/or Michael Crusoe)

rvosa · 2020-04-11T13:45:48Z

(For consulting on workflow hub registration, we can consult Carole Goble)

mr-c · 2020-04-11T13:57:03Z

@rvosa Happy to help!

rvosa · 2020-04-13T18:34:06Z

Hi @mr-c, thanks! Here's something I'm wondering about. In this repo they built a little workflow that does the alignment and tree building locally, for the purpose of then doing a tree shape analysis that assigns clade identifiers to the different sequences. People running that pipeline might experience some performance issues especially with the alignment step, because MAFFT is kind of expensive.

To address that, I would like to be able to provide our pipeline to them so that the compute steps are done on the CIPRES server instead.

Could you sketch out the steps it would take for our project to be portable enough that this would be as painless as possible? I'm thinking something like:

our docker container is on docker hub
the CWL orchestrates the interaction with the container to do our pipeline
the CWL workflow ends up on workflow hub
...
the conda environment.yml that they're running pulls in our workflow

rvosa · 2020-04-13T18:34:29Z

i.e. what are the ... steps that would need to happen?

mr-c · 2020-04-13T18:42:53Z

Hello @rvosa !

I think it is a great idea both to run the analysis and to provide a portable "take home" version.

1. Make a CWL workflow. Ensure that each application has its own Docker container, preferably from biocontainers.pro.
2. Distribute this workflow. Users can run it from any CWL-compatible system. The workflow should also be registered with the Workflow Hub.
3. No need for a conda environment.yml; their CWL runners will automatically use the Docker containers. If you'd like to have a non-Docker version, then we can add SoftwareRequirement hints, which some CWL runners will translate into conda packages.
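
As a rough illustration of the last point, here is a minimal sketch of what the `hints` section of a single tool description might contain when it names both a Docker image and a software package. It is built in Python and printed as JSON only because JSON is also valid YAML and can be pasted into a CWL document. The image name is a made-up placeholder, not a real BioContainers tag, and `tool_hints` is a hypothetical helper, not part of any CWL library.

```python
import json


def tool_hints(package: str, docker_image: str) -> dict:
    """Build a CWL 'hints' mapping for one command-line tool.

    DockerRequirement tells Docker-capable runners which image to pull.
    SoftwareRequirement names the package so that runners with a
    dependency resolver may look for a non-Docker install instead.
    """
    return {
        "DockerRequirement": {"dockerPull": docker_image},
        "SoftwareRequirement": {"packages": [{"package": package}]},
    }


# Placeholder image name (assumption): substitute the real container tag.
hints = tool_hints("mafft", "example.org/placeholder/mafft:latest")
print(json.dumps({"hints": hints}, indent=2))
```

Whether a given runner actually turns the SoftwareRequirement entry into a conda install depends on that runner and how it is configured, so treat this as a starting point rather than a guarantee.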

rvosa · 2020-04-13T19:01:46Z

Hi @mr-c,

well, for step 3 the issue is not so much that we need an environment.yml (we don't), the issue is that these guys distribute their pipeline with an environment.yml. What I would like to accomplish is that we can contribute our work as a drop-in replacement for some of the steps they've been taking. How would that work?

mr-c · 2020-04-13T19:06:44Z

While I've never packaged a CWL workflow as a single Conda tool, it should be possible. A CWL workflow can start with #!/usr/bin/env cwl-runner and be marked executable. The Conda package could recommend or depend on the CWL reference runner, so everything would be invisible to the user. When using cwltool they would even get a --help output derived from the workflow inputs and doc property.
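
To make the shebang idea concrete, here is a minimal sketch, assuming a trivial `echo` tool as a stand-in for the real workflow. It writes a CWL document that starts with `#!/usr/bin/env cwl-runner` and sets its executable bits, which is roughly what a packaging step would have to do. The file name `hello.cwl`, the `message` input, and the helper `write_executable_cwl` are all hypothetical.

```python
import stat
import tempfile
from pathlib import Path

# A tiny CWL CommandLineTool used as a stand-in for the real workflow.
# The 'doc' fields are what a runner can use to derive --help text.
CWL_TEXT = """\
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
doc: Print a message (placeholder for the real workflow).
baseCommand: echo
inputs:
  message:
    type: string
    doc: Text to print.
    inputBinding:
      position: 1
outputs: []
"""


def write_executable_cwl(path: Path, text: str = CWL_TEXT) -> Path:
    """Write the CWL document and add execute permission for all users."""
    path.write_text(text)
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


workdir = Path(tempfile.mkdtemp())
script = write_executable_cwl(workdir / "hello.cwl")
print(script.read_text().splitlines()[0])  # prints the shebang line
```

With a `cwl-runner` on the PATH, the resulting file could then be invoked directly, e.g. `./hello.cwl --message hi` or `./hello.cwl --help`.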

rvosa · 2020-04-14T00:43:46Z

How would it work the other way around? Like, I make conda recipes for the reusable tools developed here, and now I want to invoke those from CWL. Is there some facility that wraps that?

mr-c · 2020-04-22T16:53:08Z

There is a basic CWL workflow

https://view.commonwl.org/workflows/github.com/common-workflow-lab/2020-covid-19-bh/blob/8fd2d9814a5641a55efd8e63fa65a652b66f9d0b/msa/msa.cwl

It can be run locally:

cwltool https://github.com/common-workflow-lab/2020-covid-19-bh/raw/master/msa/msa.cwl  \
  https://github.com/common-workflow-lab/2020-covid-19-bh/raw/master/msa/msa_test.yaml

or via the Arvados instance at biohackathon.curii.com

arvados-cwl-runner https://github.com/common-workflow-lab/2020-covid-19-bh/raw/master/msa/msa.cwl  \
  https://github.com/common-workflow-lab/2020-covid-19-bh/raw/master/msa/msa_test.yaml
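
For anyone who wants to drive either of the two invocations above from a script, a minimal Python sketch could look like the following. The URLs are the ones given above; `build_command` and `run_workflow` are hypothetical helper names, and the sketch assumes the chosen runner is already installed and on the PATH.

```python
import shutil
import subprocess

BASE = "https://github.com/common-workflow-lab/2020-covid-19-bh/raw/master/msa"


def build_command(runner: str = "cwltool") -> list[str]:
    """Return the argument list: runner, workflow URL, job file URL."""
    return [runner, f"{BASE}/msa.cwl", f"{BASE}/msa_test.yaml"]


def run_workflow(runner: str = "cwltool") -> subprocess.CompletedProcess:
    """Run the workflow, failing early if the runner is not installed."""
    if shutil.which(runner) is None:
        raise FileNotFoundError(f"{runner} not found on PATH")
    return subprocess.run(build_command(runner), check=True)


# Only builds and shows the command; call run_workflow() to execute it.
print(" ".join(build_command("arvados-cwl-runner")))
```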

mr-c · 2020-04-22T16:55:55Z

Throughout this repository I found conflicting command line arguments in use, so please tell me the preferred options.

There are two options for the XSEDE version of IQTree that I was unable to decipher:

vparam.specify_runtype_=2 - Specify the nrun type - 2 for Tree Inference.
and
vparam.specify_numparts_=1 - How many partitions does your data set have.

Is there a source file that shows how http://www.phylo.org/index.php/rest/iqtree_xsede.html is turned into a command line?

rvosa added this to the Full workflow milestone Apr 11, 2020

rvosa changed the title ~~Rerunnable workflow~~ Rerunnable workflow as CWL Apr 11, 2020

rvosa mentioned this issue Apr 11, 2020

Push container to Docker hub #6

Closed

rvosa mentioned this issue Apr 19, 2020

CWL for preprocessing, alnspread, alngather #1

Closed
