# GA4GH Workflow Portability Testbed App

## Summary

The overall goal of the testbed is to demonstrate interoperability across multiple workflows running in multiple Workflow Execution Service (WES)-compatible environments. For Toronto, we intend to demonstrate one **workflow** running in one WES-compatible **environment**. The demonstration workflow should nominally be registered in one **workflow library, i.e., a Tool Registry Service (TRS)**, and operations will be controlled by one **orchestrator** (represented by the `synorchestrator` library used below).

For the testbed app, the orchestrator performs three primary functions:
1. makes a TRS call to identify and fetch the *checker* workflow for a selected workflow
2. makes a WES call to run the checker workflow
3. monitors the run and reports results

For more information on checker workflows, refer to the [tutorial](https://docs.dockstore.org/docs/publisher-tutorials/checker-workflows/) in Dockstore.
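
The three orchestrator functions listed above can be sketched as a single function. This is a minimal illustration and not the `synorchestrator` implementation: `FakeTRS`, `FakeWES`, `run_checker`, and all of their methods are hypothetical stand-ins for real TRS and WES clients.

```python
import time


class FakeTRS:
    """Hypothetical in-memory stand-in for a TRS client."""

    def __init__(self, checkers):
        self._checkers = checkers  # workflow ID -> checker workflow ID

    def get_checker(self, workflow_id):
        return self._checkers[workflow_id]


class FakeWES:
    """Hypothetical stand-in for a WES client; each run finishes on its second poll."""

    def __init__(self):
        self._polls = {}

    def run_workflow(self, workflow_id):
        run_id = "run-{}".format(len(self._polls))
        self._polls[run_id] = 0
        return run_id

    def get_status(self, run_id):
        self._polls[run_id] += 1
        return "COMPLETE" if self._polls[run_id] >= 2 else "RUNNING"


def run_checker(trs, wes, workflow_id, poll_interval=0.0):
    # 1. TRS call: identify the checker workflow for the selected workflow.
    checker_id = trs.get_checker(workflow_id)
    # 2. WES call: submit the checker workflow for execution.
    run_id = wes.run_workflow(checker_id)
    # 3. Monitor until the run leaves the RUNNING state, then report.
    status = wes.get_status(run_id)
    while status == "RUNNING":
        time.sleep(poll_interval)
        status = wes.get_status(run_id)
    return {"checker": checker_id, "run_id": run_id, "run_status": status}


trs = FakeTRS({"md5sum": "md5sum_checker"})
result = run_checker(trs, FakeWES(), "md5sum")
print(result)
```

Keeping the TRS and WES clients behind small interfaces like these is one way to let the same orchestration logic target different registries and execution services.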


## Setup

Start by loading the `orchestrator` and `config` modules from **`synorchestrator`**. **Note:** this notebook assumes that the `synorchestrator` module and its dependencies are already installed; documentation for installing the orchestrator app and registering workflows, TRS endpoints, and WES endpoints will be available soon.

In [1]:
from synorchestrator import orchestrator
from synorchestrator import config

### View available workflows, tool registries, and workflow services

The `config.show()` function will display a slightly abbreviated/redacted version of the stored configurations for workflow evaluation queues, tool registries, and workflow execution services registered with the orchestrator app.

This is intended to give the user a sense for which workflow/WES combinations to check.

In [2]:
config.show()


Orchestrator options:

Workflow Evaluation Queues
(queue ID: workflow ID [workflow type])
---------------------------------------------------------------------------
wflow0: github.com/dockstore-testing/md5sum-checker [CWL]
wflow1: github.com/dockstore-testing/md5sum-checker/wdl [WDL]
wflow2: github.com/DataBiosphere/topmed-workflows/TopMed_Variant_Caller [WDL]
wflow3: github.com/DataBiosphere/topmed-workflows/u_of_Michigan_alignment_pipeline [WDL]

Tool Registries
(TRS ID: host address)
---------------------------------------------------------------------------
dockstore: dockstore.org:8443

Workflow Services
(WES ID: host address)
---------------------------------------------------------------------------
hca-cromwell: g0n2qjnu94.execute-api.us-east-1.amazonaws.com/test
broad-cromwell: 35.226.102.121:9090
arvados-wes: wes.qr1hi.arvadosapi.com
local: 0.0.0.0:8080


#### Some comments on `config.show()`

Based on our experience with workflow orchestration thus far, we plan to provide the following additional details to inform testbed administration:

- workflow evaluation queues:
    - workflow *version*: currently specified in the evaluation queue config but not displayed; this is required for retrieving workflow data from TRS
    - TRS ID: the workflow ID is meaningless without the context of the TRS implementation in which it is registered
    - workflow *type version*: both CWL and WDL (and other languages that might be supported in the future) are under active development; the language version used to produce the workflow of interest will dictate which WES endpoints are compatible for execution
- workflow services:
    - workflow types and versions: a complete list of the workflow types (e.g., CWL, WDL) and respective language versions supported by the WES endpoint will allow the user to select realistic combinations for testing
    - filesystem protocol: protocols such as 'http', 'https', 'sftp', 's3', 'gs', 'file', 'synapse', or others as supported by the service; this is **as important** as workflow type and version for ensuring successful execution of workflow-parameter-WES combinations
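
To illustrate why these details matter, the sketch below checks whether a workflow and a WES endpoint are a realistic combination. It is an assumption-laden example: the `Workflow` and `WesEndpoint` classes, the `is_compatible` function, and every value used are hypothetical and do not describe the stored configuration or any real endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    queue_id: str
    wf_type: str                # e.g. "CWL" or "WDL"
    type_version: str           # language version (illustrative value)
    input_protocols: frozenset  # protocols used by the test parameters


@dataclass(frozen=True)
class WesEndpoint:
    wes_id: str
    supported: frozenset        # (workflow type, type version) pairs
    protocols: frozenset        # filesystem protocols the service can read


def is_compatible(workflow, wes):
    """True if the endpoint supports the language version and every input protocol."""
    language_ok = (workflow.wf_type, workflow.type_version) in wes.supported
    protocols_ok = workflow.input_protocols <= wes.protocols
    return language_ok and protocols_ok


# Invented values, for illustration only.
wf = Workflow("wflow-x", "CWL", "v1.0", frozenset({"https"}))
wes_a = WesEndpoint("wes-a", frozenset({("CWL", "v1.0")}), frozenset({"https", "s3"}))
wes_b = WesEndpoint("wes-b", frozenset({("CWL", "v1.0")}), frozenset({"gs"}))

print(is_compatible(wf, wes_a))  # language and protocol both match
print(is_compatible(wf, wes_b))  # language matches, but "https" inputs are unreadable
```

In this sketch the protocol check carries the same weight as the language check, mirroring the point made in the last bullet.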

## Testbed execution

### Specify workflows and execution service endpoints

`orchestrator.run_all()` is the central function for the testbed app. By supplying a map of workflow evaluation queues to registered WES endpoints, a user can automatically deploy multiple workflows in multiple environments. The `checker` argument instructs the orchestrator to identify and submit the registered checker workflow and test parameters for each workflow.

The logging output from the orchestrator provides a glimpse of what's happening within the application and with API calls to external TRS and WES endpoints.
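
Each entry in the map pairs one evaluation queue with a list of WES endpoints, so the number of checker runs is the total number of (queue, endpoint) pairs. The helper below, `expand_submissions`, is a hypothetical name used only to make that expansion explicit; it is not part of `synorchestrator`. The map is the same one passed to `orchestrator.run_all()` in the next cell.

```python
def expand_submissions(queue_wes_map):
    """List every (queue ID, WES ID) pair implied by the map."""
    return [
        (queue_id, wes_id)
        for queue_id, wes_ids in queue_wes_map.items()
        for wes_id in wes_ids
    ]


pairs = expand_submissions(
    {
        'wflow0': ['arvados-wes', 'broad-cromwell'],
        'wflow2': ['hca-cromwell', 'broad-cromwell'],
        'wflow3': ['hca-cromwell', 'broad-cromwell'],
    }
)
for queue_id, wes_id in pairs:
    print(queue_id, wes_id)
```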

In [3]:
submissions = orchestrator.run_all(
    {
        'wflow0': ['arvados-wes', 'broad-cromwell'],
        'wflow2': ['hca-cromwell', 'broad-cromwell'],
        'wflow3': ['hca-cromwell', 'broad-cromwell']
    },
    checker=True
)

INFO:synorchestrator.orchestrator:Preparing checker workflow run request for 'github.com/DataBiosphere/topmed-workflows/u_of_Michigan_alignment_pipeline' from  'dockstore''
INFO:root:retrieving workflow entry from tools/%23workflow%2Fgithub.com%2FDataBiosphere%2Ftopmed-workflows%2Fu_of_Michigan_alignment_pipeline
INFO:synorchestrator.trs.client:found checker workflow: github.com/DataBiosphere/topmed-workflows/u_of_Michigan_alignment_pipeline_wdl_checker
INFO:root:retrieving workflow entry from tools/%23workflow%2Fgithub.com%2FDataBiosphere%2Ftopmed-workflows%2Fu_of_Michigan_alignment_pipeline_wdl_checker
INFO:synorchestrator.trs.client:getting descriptor from tools/%23workflow%2Fgithub.com%2FDataBiosphere%2Ftopmed-workflows%2Fu_of_Michigan_alignment_pipeline_wdl_checker/versions/1.13.0/WDL/descriptor
INFO:synorchestrator.trs.client:getting descriptor from tools/%23workflow%2Fgithub.com%2FDataBiosphere%2Ftopmed-workflows%2Fu_of_Michigan_alignment_pipeline_wdl_checker/versions/1.13.0/WDL
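
The request paths in the log output above contain a percent-encoded workflow ID: `%23` is `#` and `%2F` is `/`. The sketch below reproduces the shape of the descriptor path using only the Python standard library. The function name `trs_descriptor_path` is hypothetical, and the `#workflow/` prefix and path layout are inferred from the log lines rather than taken from the `synorchestrator` source.

```python
from urllib.parse import quote, unquote


def trs_descriptor_path(workflow_id, version, wf_type):
    """Build a relative descriptor path shaped like the ones in the log output."""
    # The registry entry ID carries a "#workflow/" prefix and is percent-encoded
    # in full, so "#" becomes %23 and "/" becomes %2F.
    encoded_id = quote("#workflow/" + workflow_id, safe="")
    return "tools/{}/versions/{}/{}/descriptor".format(encoded_id, version, wf_type)


path = trs_descriptor_path(
    "github.com/DataBiosphere/topmed-workflows/u_of_Michigan_alignment_pipeline_wdl_checker",
    "1.13.0",
    "WDL",
)
print(path)

# Decoding the second path segment recovers the readable entry ID.
print(unquote(path.split("/")[1]))
```

Passing `safe=""` matters here: by default `quote` leaves `/` unescaped, which would split the ID across several path segments.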

### Monitor workflow runs

The `orchestrator.monitor()` function currently updates and outputs a **pandas** dataframe roughly once per second, displaying the current status of all workflow runs for the specified testbed submissions. The `submission_status` for each checker workflow job should nominally be updated after the corresponding WES run completes, although this has not been fully implemented yet.

In [4]:
orchestrator.monitor(submissions)

|  |  | submission_status | elapsed_time | job | wes_id | queue_id | run_status | run_id | start_time |
|---|---|---|---|---|---|---|---|---|---|
| TopMed_Variant_Caller | 300530130558555810 | SUBMITTED | 0h:1m:12s | checker | hca-cromwell | TopMed_Variant_Caller | COMPLETE | d4b2928f-c0a4-417d-9325-5a0e04f154f0 | Wed May 30 13:06:09 2018 |
| TopMed_Variant_Caller | 300530130600448081 | SUBMITTED | 0h:22m:34s | checker | broad-cromwell | TopMed_Variant_Caller | COMPLETE | a114e235-9d91-46df-9bea-ec6bea7a960c | Wed May 30 13:06:09 2018 |
| md5sum-checker | 300530130604559294 | SUBMITTED | 0 | checker | arvados-wes | md5sum-checker | COMPLETE | qr1hi-xvhdp-wx5tzs1p4cbrfe4 | Wed May 30 13:06:10 2018 |
| md5sum-checker | 300530130607804340 | SUBMITTED | 0h:0m:16s | checker | broad-cromwell | md5sum-checker | EXECUTOR_ERROR | eaf4267d-c928-470e-b53c-74d0cf1b1ca0 | Wed May 30 13:06:10 2018 |
| u_of_Michigan_alignment_pipeline | 300530130554604938 | SUBMITTED | 0h:1m:8s | checker | hca-cromwell | u_of_Michigan_alignment_pipeline | COMPLETE | dfa545e9-b84c-460d-b1f6-2f7af4bcb1c6 | Wed May 30 13:06:09 2018 |
| u_of_Michigan_alignment_pipeline | 300530130556466035 | SUBMITTED | 0h:25m:55s | checker | broad-cromwell | u_of_Michigan_alignment_pipeline | COMPLETE | ec44c275-cac1-4583-8137-7025dc2c7daf | Wed May 30 13:06:07 2018 |


Done
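
The monitoring behavior described in this section amounts to a polling loop. The sketch below shows one possible shape for such a loop and is not the `orchestrator.monitor()` implementation: `poll_until_done` and the `get_status` callable are hypothetical, and the set of terminal states is an assumption based on the run states defined by the WES API (`COMPLETE` and `EXECUTOR_ERROR` both appear in the output above).

```python
import time

# Assumed terminal run states; a run in any other state is still in progress.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def poll_until_done(run_ids, get_status, interval=1.0, max_polls=100):
    """Poll every run until all are in a terminal state or max_polls is reached."""
    statuses = {}
    for _ in range(max_polls):
        statuses = {run_id: get_status(run_id) for run_id in run_ids}
        if all(state in TERMINAL_STATES for state in statuses.values()):
            break
        time.sleep(interval)
    return statuses


# Scripted statuses stand in for real WES responses.
script = {
    "run-a": iter(["RUNNING", "COMPLETE", "COMPLETE"]),
    "run-b": iter(["QUEUED", "RUNNING", "EXECUTOR_ERROR"]),
}
final = poll_until_done(list(script), lambda run_id: next(script[run_id]), interval=0.0)
print(final)
```

The `max_polls` cap is a design choice for the sketch: it keeps the loop from running indefinitely if an endpoint never reports a terminal state.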


## Reporting

WDL-based workflows (TopMed) ran successfully in both Cromwell WES environments. The CWL-based `md5sum` workflow ran in Arvados; currently, the only barrier to running it on the Broad Cromwell endpoint is the lack of HTTP filesystem support for inputs (i.e., both Cromwell environments can only ingest files stored in Google buckets, due to use of the downstream compute engine).
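
The outcome described above can be tallied directly from the monitor output. The sketch below is a hand-written illustration using only the standard library, not a feature of the testbed app; the `runs` list copies the `queue_id`, `wes_id`, and `run_status` values from the table in the previous section.

```python
from collections import Counter

# (queue_id, wes_id, run_status) copied from the monitor output.
runs = [
    ("TopMed_Variant_Caller", "hca-cromwell", "COMPLETE"),
    ("TopMed_Variant_Caller", "broad-cromwell", "COMPLETE"),
    ("md5sum-checker", "arvados-wes", "COMPLETE"),
    ("md5sum-checker", "broad-cromwell", "EXECUTOR_ERROR"),
    ("u_of_Michigan_alignment_pipeline", "hca-cromwell", "COMPLETE"),
    ("u_of_Michigan_alignment_pipeline", "broad-cromwell", "COMPLETE"),
]

# Count runs per (WES endpoint, final status).
by_wes = Counter((wes_id, status) for _, wes_id, status in runs)

# List the workflow/WES combinations that did not complete.
failures = [(queue_id, wes_id) for queue_id, wes_id, status in runs if status != "COMPLETE"]

for (wes_id, status), count in sorted(by_wes.items()):
    print(wes_id, status, count)
print("failed:", failures)
```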

We're working on adding features for summarizing and reporting testbed results, including documents and dashboards.