GA4GH Cloud Work Stream
This is a wiki space for the GA4GH Cloud Work Stream. Our group focuses on API standards (and implementations) that make it easier to "send the algorithms to the data". Specifically, we have four API standards that allow you to share tools and workflows (TRS), execute individual jobs on clouds using a standard API (TES), run full CWL/WDL workflows on execution platforms (WES), and read/write data objects across clouds in an agnostic way (DRS).
These standards are inspired by large-scale, distributed compute projects such as PCAWG. In these efforts, data lives in many different cloud environments, compute must run across those cloud locations, and there is a strong motivation to work with disparate clouds through common, consistent APIs. The net effect is highly portable analysis code that ultimately enables "FAIR" science, i.e. findable, accessible, interoperable, and reusable tools, workflows, and datasets.
See this presentation for an overview of what we do.
Our API standards are defined in Swagger YAML/OpenAPI 2.0 (with an eye toward OpenAPI 3.0) in repositories within the GA4GH GitHub organization:
Tool Registry Service (TRS)
Share CWL/WDL-described, Docker-based tools as well as CWL/WDL-based workflows. This group is led by Denis Yuen and Susheel Varma.
Primary Implementation: Dockstore.org, https://dockstore.org
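As a rough illustration of how a client might ask a TRS registry for shared tools, the sketch below only builds the query URL and makes no network call. The base path, the `organization` filter value, and the helper name `build_tools_query` are assumptions for illustration; check the TRS specification and the registry you target for the exact path and supported filters.

```python
from urllib.parse import urlencode, urljoin

# Assumed base path for a TRS v2 registry; real deployments may differ.
TRS_BASE = "https://dockstore.org/api/ga4gh/trs/v2/"


def build_tools_query(base_url, limit=10, offset=0, **filters):
    """Return a URL that lists tools, with paging and optional filters.

    `filters` are passed through as query parameters (hypothetical helper,
    not part of any GA4GH library).
    """
    params = {"limit": limit, "offset": offset}
    params.update(filters)
    return urljoin(base_url, "tools") + "?" + urlencode(params)


# "example-org" is a placeholder organization name.
url = build_tools_query(TRS_BASE, limit=5, organization="example-org")
print(url)
```

The resulting URL can be fetched with any HTTP client; the response is expected to be a JSON list of tool descriptions.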
Task Execution Service (TES)
Run CWL/WDL-described, Docker-based tools on compute VMs in a cloud-agnostic way. This group is led by Angel Pizarro and Susheel Varma.
Primary Implementation: Funnel, https://github.com/ohsu-comp-bio/funnel
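To give a feel for what "running a Docker-based tool through a standard API" looks like, here is a minimal sketch of a TES task document. It only constructs the JSON body; a client would typically POST it to a TES server's task-creation endpoint (commonly `/v1/tasks`, which is an assumption to verify against the server you use). The helper names and the `alpine` image are illustrative, not part of the specification.

```python
import json

# States after which a task will not change again, as named in the TES schema.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def build_tes_task(name, image, command, cpu_cores=1, ram_gb=1.0):
    """Build a minimal TES task: one executor plus a resource request."""
    return {
        "name": name,
        "executors": [{"image": image, "command": list(command)}],
        "resources": {"cpu_cores": cpu_cores, "ram_gb": ram_gb},
    }


def is_finished(state):
    """Return True once a polled task state is terminal."""
    return state in TERMINAL_STATES


task = build_tes_task("hello-tes", "alpine", ["echo", "hello"])
body = json.dumps(task, indent=2)  # JSON body a client would send
print(body)
```

After submission, a client would poll the task until `is_finished` returns true for the reported state.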
Workflow Execution Service (WES)
Run CWL/WDL-based workflows on workflow execution platforms in a platform-agnostic way. This group is led by James Eddy.
Primary Implementation: https://github.com/common-workflow-language/workflow-service
Other Resources: Core consonance utilities for scheduling, reporting on, and provisioning VMs for workflows https://github.com/Consonance/consonance
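A workflow run request in WES is a set of form fields rather than a single JSON document. The sketch below assembles those fields for a CWL workflow without sending anything; a client would submit them as multipart form data to the server's run-creation endpoint. The workflow URL, the input parameters, and the helper name are placeholders for illustration.

```python
import json


def build_wes_run_request(workflow_url, workflow_type, workflow_type_version, params):
    """Assemble the form fields for a WES run request.

    `workflow_params` is carried as a JSON-encoded string, so it is
    serialized here rather than left as a dict.
    """
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        "workflow_params": json.dumps(params),
    }


# Placeholder workflow location and inputs.
fields = build_wes_run_request(
    workflow_url="https://example.org/workflows/md5sum.cwl",
    workflow_type="CWL",
    workflow_type_version="v1.0",
    params={"input_file": {"class": "File", "path": "https://example.org/data/a.txt"}},
)
for key, value in fields.items():
    print(key, "=", value)
```

Because the same fields work for CWL or WDL (only `workflow_type` and its version change), the request shape stays the same across execution platforms.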
Data Repository Service (DRS)
Formerly the Data Object Service (DOS), DRS provides read and write access to data objects across object stores in a cloud-agnostic way. This work is currently coordinated by the Work Stream leads.
Other Implementations:
- Observe various data stores and provide a DRS interface for access: https://github.com/ohsu-comp-bio/dos_connect
- Download DRS data via the command line: https://github.com/david4096/dos-downloader
- Access Genomic Data Commons data using GA4GH libraries: https://github.com/david4096/dos-gdc-lambda/
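The cloud-agnostic part of DRS comes from each data object listing one or more access methods, so a client can pick whichever store it is able to reach. The sketch below shows that selection step on a hand-written object. The field names follow the DRS object layout as we understand it (`access_methods`, `type`, `access_url`), and the sample object, URLs, and helper name are made up for illustration.

```python
def pick_access_url(drs_object, preferred_types=("https", "s3", "gs")):
    """Return the first directly usable URL, honoring the preference order.

    Methods that carry no inline `access_url` are skipped; returns None
    when nothing matches.
    """
    methods = drs_object.get("access_methods", [])
    for wanted in preferred_types:
        for method in methods:
            if method.get("type") == wanted and "access_url" in method:
                return method["access_url"]["url"]
    return None


# Hypothetical object, shaped like a DRS response.
sample = {
    "id": "example-object-1",
    "size": 1024,
    "access_methods": [
        {"type": "s3", "access_url": {"url": "s3://example-bucket/sample.bam"}},
        {"type": "https", "access_url": {"url": "https://example.org/data/sample.bam"}},
    ],
}

print(pick_access_url(sample))
print(pick_access_url(sample, preferred_types=("gs", "s3")))
```

A method that only provides an access identifier instead of an inline URL would need an extra request to resolve it, which this sketch leaves out.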
If you discover a security issue with any of the above specifications, please send an email to email@example.com detailing your concerns.
HubFlow and Contributions
The way to contribute development effort and code to the project is via GitHub pull requests. GitHub provides a nice overview on how to create a pull request. See the CONTRIBUTING.md document in each schema repo. We follow HubFlow, which means we use a feature branch strategy, with pull requests always going to the designated development branch and releases happening from the designated release branch.
WES v1.0 / Testbed Voting Procedure
For the current phase of development, there is a procedure in place to approve changes to WES for its 1.0 release. The issues and PRs are available to view on the WES GitHub site.
Changes for the release are to be approved by four developers: Marcus Kinsella (HCA), Jeff Gentry (Broad Institute), James Eddy (Sage Bionetworks), and Peter Amstutz (Veritas Genetics). In addition, the changes must not be overridden by the Cloud Work Stream Leads, Brian O'Connor and David Glazer.
Long Term Voting Process
GA4GH has a number of Driver Projects. Each of those associated with the Cloud Work Stream will nominate a representative. A proposed change proceeds only if none of these representatives votes against it. In addition, the change must not be overridden by the Cloud Work Stream Leads, Brian O'Connor and David Glazer.
Our Multi-year Plan
See this document, which describes our plans for the next couple of years.
GA4GH-DREAM Infrastructure Challenges
We are working closely with the DREAM Challenges to test our API standards and workflow sharing process. Essentially, we are attempting to demonstrate API and process FAIR-compliance. This is a multi-phase effort with the first two challenges focusing on tool and workflow portability and reproducibility.
Weekly Work Stream Meeting
We have a weekly call on Mondays at 10am Pacific time (7am Pacific time on the first Monday of every month, 2pm Pacific time on the last Monday of every month). We invite anyone interested in these standards and/or the systems that implement them to join us on these calls.
- Google Group for discussion and meeting announcements
- Current Agendas and notes
- 2018 Jan-Oct Agendas and notes, 2017 Agendas and notes, old Agendas and notes (from the Containers and Workflows Task Team)
- Upcoming talks: please sign up here if you would like to lead a discussion!
Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/6971797978
Or join by phone:
+1 408 638 0968 (US Toll) or +1 646 558 8656 (US Toll)
+1 855 880 1246 (US Toll Free)
+1 877 369 0926 (US Toll Free)
Meeting ID: 697 179 7978
International numbers available: https://zoom.us/zoomconference?m=NeOyn9NS9Yq9PWkltnbNQCVIXfGIKOTG
GA4GH 5th Plenary in Orlando
This meeting happened Oct 15-17th and the Cloud Work Stream had a breakout session for most of the 15th.
For our agenda see here
For the general conference agenda see here
If you were not able to attend you can watch the Plenary recording here