rishidev edited this page Feb 15, 2019 · 35 revisions

GA4GH Cloud Work Stream

This is the wiki for the GA4GH Cloud Work Stream. Our group focuses on API standards (and implementations) that make it easier to "send the algorithms to the data". Specifically, we maintain four API standards that let you share tools and workflows (TRS), execute individual jobs on clouds through a standard API (TES), run full CWL/WDL workflows on execution platforms (WES), and read and write data objects across clouds in a cloud-agnostic way (DRS).

These standards are inspired by large-scale, distributed compute projects such as PCAWG. Such efforts are characterized by data living in many different cloud environments, compute that must run across those cloud locations, and a need to work with disparate clouds through common, consistent APIs. The net effect is highly portable analysis code that ultimately enables "FAIR" science, i.e. findable, accessible, interoperable, and reusable tools, workflows, and datasets.


See this presentation for an overview of what we do.


Our API standards are defined in Swagger YAML/OpenAPI 2.0 (with an eye toward OpenAPI 3.0) in repositories within the GA4GH GitHub organization:

Tool Registry Service (TRS)

Share CWL/WDL-described, Docker-based tools as well as CWL/WDL-based workflows. This group is led by Denis Yuen and Susheel Varma.

Repo: https://github.com/ga4gh/tool-registry-schemas

Primary Implementation: Dockstore.org, https://dockstore.org
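
As a rough illustration of how a client might address a shared tool, the sketch below builds the URL for a tool version's descriptor. The path layout (`/tools/{id}/versions/{version_id}/{type}/descriptor`) reflects our reading of the TRS schema and should be checked against the repo above; the base URL, tool ID, and the helper name `trs_descriptor_url` are hypothetical.

```python
from urllib.parse import quote


def trs_descriptor_url(base_url, tool_id, version_id, descriptor_type="CWL"):
    """Build the URL of a tool version's descriptor on a TRS server.

    Assumption: the server exposes the TRS-style path
    /tools/{id}/versions/{version_id}/{type}/descriptor under base_url.
    """
    # Tool IDs often contain slashes (e.g. registry paths), so encode
    # every reserved character instead of leaving "/" untouched.
    tool = quote(tool_id, safe="")
    version = quote(version_id, safe="")
    root = base_url.rstrip("/")
    return f"{root}/tools/{tool}/versions/{version}/{descriptor_type}/descriptor"


# Hypothetical server and tool, for illustration only.
print(trs_descriptor_url("https://trs.example.org/ga4gh/trs/v2", "quay.io/org/tool", "1.0.0"))
```

Encoding the tool ID matters because an unencoded slash would be interpreted as an extra path segment by the server.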

Task Execution Service (TES)

Run CWL/WDL-described, Docker-based tools on compute VMs in a cloud-agnostic way. This group is led by Angel Pizarro and Susheel Varma.

Repo: https://github.com/ga4gh/task-execution-schemas

Primary Implementation: Funnel, https://github.com/ohsu-comp-bio/funnel
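
To make "run a Docker-based tool through a standard API" more concrete, here is a minimal sketch of a task description a client might submit. The field names (`name`, `resources`, `executors`, `image`, `command`, `cpu_cores`, `ram_gb`) follow our understanding of the TES schema and should be verified against the repo above; the helper `build_tes_task`, the image, and the command are illustrative assumptions. No request is sent.

```python
import json


def build_tes_task(name, image, command, cpu_cores=1, ram_gb=1.0):
    """Return a dict shaped like a TES task with a single executor.

    Assumption: field names mirror the TES task schema; check the
    schema repository before relying on them.
    """
    if not command:
        raise ValueError("command must contain at least one argument")
    return {
        "name": name,
        "resources": {"cpu_cores": cpu_cores, "ram_gb": ram_gb},
        # Each executor runs one command inside one container image.
        "executors": [{"image": image, "command": list(command)}],
    }


# Hypothetical task: checksum a file inside a stock Ubuntu container.
task = build_tes_task("md5sum-example", "ubuntu:18.04", ["md5sum", "/etc/hostname"])

# JSON body a client would send to the server's task-creation endpoint.
body = json.dumps(task, indent=2)
print(body)
```

Because the task names only an image, a command, and resource needs, the same body could in principle be sent to any conforming server regardless of the underlying cloud.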

Workflow Execution Service (WES)

Run CWL/WDL-based workflows on workflow execution platforms in a platform-agnostic way. This group is led by James Eddy.

Repo: https://github.com/ga4gh/workflow-execution-schemas

Primary Implementation: https://github.com/common-workflow-language/workflow-service

Other Resources: Core consonance utilities for scheduling, reporting on, and provisioning VMs for workflows https://github.com/Consonance/consonance
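
The following sketch shows what a platform-agnostic workflow submission might look like from the client side. It assumes the WES 1.0 conventions as we understand them: run requests are sent as form fields, `workflow_params` is carried as a JSON-encoded string, and a run ends in one of a small set of terminal states. The helper names, the workflow URL, and the exact state list are assumptions to confirm against the schema repo.

```python
import json

# Assumed terminal states; confirm against the WES schema.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def build_wes_run_request(workflow_url, workflow_type, workflow_type_version, params):
    """Return the form fields for a workflow run request.

    workflow_params is serialized to a JSON string because the request
    is assumed to be sent as a form rather than as a JSON body.
    """
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        "workflow_params": json.dumps(params),
    }


def is_finished(state):
    """Return True when a run state means polling can stop."""
    return state in TERMINAL_STATES


# Hypothetical CWL workflow and input, for illustration only.
fields = build_wes_run_request(
    "https://example.org/workflows/md5sum.cwl",
    "CWL",
    "v1.0",
    {"input_file": {"class": "File", "path": "https://example.org/data/sample.txt"}},
)
print(fields["workflow_type"], is_finished("RUNNING"), is_finished("COMPLETE"))
```

A client would typically submit the fields once, then poll the run's status until `is_finished` returns True.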

Data Repository Service (DRS)

Formerly the Data Object Service (DOS), DRS provides read and write access to data objects across object stores in a cloud-agnostic way. This work is currently coordinated by the Work Stream leads.

Repo: https://github.com/ga4gh/data-repository-service-schemas

Other Implementations: Observe various data stores and provide a DRS interface for access. https://github.com/ohsu-comp-bio/dos_connect

Download DRS data via the command line. https://github.com/david4096/dos-downloader

Provides GA4GH DRS methods for data in NCI GDC https://github.com/david4096/dos-gdc-lambda/ Service available at https://dos-gdc.ucsc-cgp-dev.org

Access Genomic Data Commons data using GA4GH libraries https://github.com/david4096/dos-gdc-lambda/
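
As a sketch of what "cloud-agnostic" access can mean for a client, the function below picks a download location from an object record that lists several access methods. The record shape (`access_methods`, each with a `type` and an optional `access_url` containing a `url`) is an assumption based on later DRS drafts and may differ from the schema version in the repo above; the sample object and the helper name `pick_access_url` are hypothetical.

```python
def pick_access_url(drs_object, preferred_types=("https", "s3", "gs")):
    """Return the first directly usable URL, honoring the caller's preference order.

    Assumption: the object carries an "access_methods" list whose entries
    have a "type" and, when directly fetchable, an "access_url" with a "url".
    """
    methods = drs_object.get("access_methods", [])
    for wanted in preferred_types:
        for method in methods:
            access_url = method.get("access_url")
            # Skip methods that only offer an indirect handle and no URL.
            if method.get("type") == wanted and access_url:
                return access_url["url"]
    return None


# Hypothetical object stored in two clouds, plus one indirect https method.
sample = {
    "id": "obj-1",
    "access_methods": [
        {"type": "s3", "access_url": {"url": "s3://bucket/key"}},
        {"type": "https", "access_id": "abc"},
        {"type": "gs", "access_url": {"url": "gs://bucket/key"}},
    ],
}
print(pick_access_url(sample))  # prints s3://bucket/key
```

The https entry is skipped here because it offers no direct URL, so the client falls back to the next preferred type.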

Standards Maintenance

Security Considerations

If you discover a security issue with any of the above specifications, please send an email to security-notification@ga4gh.org detailing your concerns.

HubFlow and Contributions

Contribute development effort and code to the project through GitHub pull requests. GitHub provides a nice overview of how to create a pull request. See the CONTRIBUTING.md document in each schema repo. We follow HubFlow, which means we use a feature branch strategy: pull requests always go to develop, and releases happen from master.

WES v1.0 / Testbed Voting Procedure

For the current phase of development, there is a procedure in place to approve changes related to WES and its 1.0 release. The issues and PRs are available to view on the WES GitHub site.

Changes for the release must be approved by four developers: Marcus Kinsella (HCA), Jeff Gentry (Broad Institute), James Eddy (Sage Bionetworks), and Peter Amstutz (Veritas Genetics). In addition, they must not be overridden by the Cloud Work Stream Leads, Brian O'Connor and David Glazer.

Long Term Voting Process

GA4GH has a number of Driver Projects. Each Driver Project associated with the Cloud Work Stream will nominate a representative. A proposed change proceeds only if none of these representatives votes against it. In addition, the change must not be overridden by the Cloud Work Stream Leads, Brian O'Connor and David Glazer.

Our Multi-year Plan

See this document, which describes our plans for the next couple of years.

GA4GH-DREAM Infrastructure Challenges

We are working closely with the DREAM Challenges to test our API standards and workflow sharing process. Essentially, we are attempting to demonstrate API and process FAIR-compliance. This is a multi-phase effort with the first two challenges focusing on tool and workflow portability and reproducibility.


Weekly Work Stream Meeting

We have a weekly call on Mondays at 10am Pacific time (7am Pacific time on the first Monday of every month, 2pm Pacific time on the last Monday of every month). We invite anyone interested in these standards and/or the systems that implement them to join us on these calls.

Call-in information:

Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/6971797978

Or join by phone:

 +1 408 638 0968 (US Toll) or +1 646 558 8656 (US Toll) 
 +1 855 880 1246 (US Toll Free) 
 +1 877 369 0926 (US Toll Free) 

 Meeting ID: 697 179 7978
 International numbers available:  https://zoom.us/zoomconference?m=NeOyn9NS9Yq9PWkltnbNQCVIXfGIKOTG 

Past Meetings

GA4GH 5th Plenary in Orlando

This meeting took place October 15-17, and the Cloud Work Stream held a breakout session for most of the 15th.

For our agenda see here

For the general conference agenda see here

If you were not able to attend you can watch the Plenary recording here
