# GA4GH Workflow Portability Testbed App

## Summary

The overall goal of the testbed is to demonstrate interoperability between multiple workflows running in multiple Workflow Execution Service (WES)-compatible environments. For Toronto, we intend to demonstrate one **workflow** running in one WES-compatible **environment**. The demonstration workflow should nominally be registered in one **workflow library**, i.e., a Tool Registry Service (TRS), and operations will be controlled by one **orchestrator** (represented by the `synorchestrator` library used below).

For the testbed app, the orchestrator performs three primary functions:
1. makes a TRS call to identify and fetch the *checker* workflow for a selected workflow
2. makes a WES call to run the checker workflow
3. monitors and reports results


## Setup

Start by loading the `orchestrator` and `config` modules from **`synorchestrator`**. **Note:** this notebook assumes that the `synorchestrator` module and its dependencies are already installed; documentation for installing the orchestrator app and registering workflows, TRS endpoints, and WES endpoints will be available soon.

In [22]:
from synorchestrator import orchestrator
from synorchestrator import config

### View available workflows, tool registries, and workflow services

The `config.show()` function will display a slightly abbreviated/redacted version of the stored configurations for workflow evaluation queues, tool registries, and workflow execution services registered with the orchestrator app.

This is intended to give the user a sense of which workflow/WES combinations to check.

In [24]:
config.show()


Orchestrator options:

Workflow Evaluation Queues
(queue ID: workflow ID [workflow type])
---------------------------------------------------------------------------
wflow0: github.com/dockstore-testing/md5sum-checker [CWL]
wflow1: github.com/dockstore-testing/md5sum-checker/wdl [WDL]
wflow2: github.com/DataBiosphere/topmed-workflows/TopMed_Variant_Caller [WDL]
wflow3: github.com/DataBiosphere/topmed-workflows/u_of_Michigan_alignment_pipeline [WDL]

Tool Registries
(TRS ID: host address)
---------------------------------------------------------------------------
dockstore: dockstore.org:8443

Workflow Services
(WES ID: host address)
---------------------------------------------------------------------------
hca-cromwell: g0n2qjnu94.execute-api.us-east-1.amazonaws.com/test
arvados-wes: wes.qr1hi.arvadosapi.com
local: 0.0.0.0:8080


#### Some comments on `config.show()`

Based on experiences with workflow orchestration thus far, we plan to provide the following additional details in order to inform testbed administration:

- workflow evaluation queues:
    - workflow *version*: currently specified in the evaluation queue config but not displayed; this is required for retrieving workflow data from TRS
    - TRS ID: the workflow ID is meaningless without the context of the TRS implementation in which it is registered
    - workflow *type version*: both CWL and WDL (and other languages that might be supported in the future) are under active development; the language version used to write the workflow of interest dictates which WES endpoints are compatible for execution
- workflow services:
    - workflow types and versions: a complete list of the workflow types (e.g., CWL, WDL) and respective language versions supported by the WES endpoint will allow the user to select realistic combinations for testing
    - filesystem protocols: protocols such as 'http', 'https', 'sftp', 's3', 'gs', 'file', 'synapse', or others as supported by the service; this is **as important** as workflow type and version for ensuring successful execution of workflow-parameter-WES combinations
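
To illustrate why these details matter, here is a minimal sketch of a compatibility check between a workflow and a WES endpoint. It assumes the endpoint's service info is available as a dictionary with the `workflow_type_versions` and `supported_filesystem_protocols` fields described in the GA4GH WES schema. The `is_compatible` helper, the workflow record layout, and all values are hypothetical and are not part of `synorchestrator`.

```python
def is_compatible(workflow, service_info):
    """Return True if the endpoint advertises the workflow's language version
    and every filesystem protocol used by the workflow's inputs."""
    type_versions = service_info.get("workflow_type_versions", {})
    supported_versions = type_versions.get(workflow["type"], {}).get(
        "workflow_type_version", []
    )
    if workflow["type_version"] not in supported_versions:
        return False
    protocols = set(service_info.get("supported_filesystem_protocols", []))
    return set(workflow["input_protocols"]) <= protocols


# Hypothetical workflow record and service-info responses (values are made up).
md5sum = {"type": "CWL", "type_version": "v1.0", "input_protocols": ["https"]}
wes_a = {
    "workflow_type_versions": {"CWL": {"workflow_type_version": ["v1.0"]}},
    "supported_filesystem_protocols": ["https", "file"],
}
wes_b = {
    "workflow_type_versions": {"WDL": {"workflow_type_version": ["draft-2"]}},
    "supported_filesystem_protocols": ["gs"],
}

print(is_compatible(md5sum, wes_a))
print(is_compatible(md5sum, wes_b))
```

With this information displayed by `config.show()`, a user could rule out unrealistic workflow/WES pairs before submitting anything.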

## Testbed execution

### Specify workflows and execution service endpoints

`orchestrator.run_all()` is the central function for the testbed app. By supplying a map of workflow evaluation queues to registered WES endpoints, a user can automatically deploy multiple workflows in multiple environments. The `checker` argument instructs the orchestrator to identify and submit the registered checker workflow and test parameters for each workflow.
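
As a sketch of how a larger map might be assembled, the hypothetical helper below groups (queue ID, WES ID) pairs into the `{queue ID: [WES IDs]}` shape that `eval_wes_map` takes in this notebook. The helper is not part of `synorchestrator`, and the pairs are illustrative combinations of the IDs listed by `config.show()`, not tested results.

```python
def pairs_to_eval_wes_map(pairs):
    """Group (queue_id, wes_id) pairs into {queue_id: [wes_id, ...]},
    preserving order and skipping duplicate pairs."""
    eval_wes_map = {}
    for queue_id, wes_id in pairs:
        wes_ids = eval_wes_map.setdefault(queue_id, [])
        if wes_id not in wes_ids:
            wes_ids.append(wes_id)
    return eval_wes_map


eval_wes_map = pairs_to_eval_wes_map([
    ("wflow0", "arvados-wes"),
    ("wflow0", "local"),
    ("wflow2", "hca-cromwell"),
])
print(eval_wes_map)
# {'wflow0': ['arvados-wes', 'local'], 'wflow2': ['hca-cromwell']}
```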

In [26]:
submissions = orchestrator.run_all(
    eval_wes_map = {
        'wflow0': ['arvados-wes']
    },
    checker=True
)

INFO:synorchestrator.orchestrator:Preparing checker workflow run request for 'github.com/dockstore-testing/md5sum-checker' from  'dockstore''
INFO:root:retrieving workflow entry from tools/%23workflow%2Fgithub.com%2Fdockstore-testing%2Fmd5sum-checker
INFO:synorchestrator.trs.client:found checker workflow: github.com/dockstore-testing/md5sum-checker/_cwl_checker
INFO:root:retrieving workflow entry from tools/%23workflow%2Fgithub.com%2Fdockstore-testing%2Fmd5sum-checker%2F_cwl_checker
INFO:synorchestrator.trs.client:getting descriptor from tools/%23workflow%2Fgithub.com%2Fdockstore-testing%2Fmd5sum-checker%2F_cwl_checker/versions/develop/CWL/descriptor
INFO:synorchestrator.trs.client:getting descriptor from tools/%23workflow%2Fgithub.com%2Fdockstore-testing%2Fmd5sum-checker%2F_cwl_checker/versions/develop/CWL/descriptor
INFO:synorchestrator.eval:Created new job submission:
 - submission ID: 290529205224692783
INFO:synorchestrator.orchestrator:Submitting job '290529205104526601' for eval 
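
The TRS paths in the log above contain a percent-encoded tool ID. The sketch below reproduces those paths with the standard library, assuming the ID is formed by prefixing the workflow ID with `#workflow/` (inferred from the log, not from the `synorchestrator` source). Both function names are hypothetical.

```python
from urllib.parse import quote


def trs_tool_path(workflow_id):
    """Build the relative TRS path for a workflow entry.
    The '#workflow/' prefix is an assumption based on the logged URLs."""
    tool_id = quote("#workflow/" + workflow_id, safe="")
    return "tools/" + tool_id


def trs_descriptor_path(workflow_id, version, workflow_type):
    """Build the relative TRS path for a workflow descriptor."""
    return "{}/versions/{}/{}/descriptor".format(
        trs_tool_path(workflow_id), version, workflow_type
    )


print(trs_tool_path("github.com/dockstore-testing/md5sum-checker"))
print(trs_descriptor_path(
    "github.com/dockstore-testing/md5sum-checker/_cwl_checker", "develop", "CWL"
))
```

Passing `safe=""` matters here: by default `quote` leaves `/` unescaped, which would split the ID across several path segments.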

### Monitor workflow runs

The `orchestrator.monitor()` function currently updates and outputs a **pandas** dataframe every ~1s, displaying the current status of all workflow runs for the specified testbed submissions.

In [27]:
orchestrator.monitor(submissions)

|  |  | submission_status | elapsed_time | job | wes_id | queue_id | run_status | run_id | start_time |
|---|---|---|---|---|---|---|---|---|---|
| md5sum-checker | 290529205104526601 | SUBMITTED | 0h:0m:41s | checker | arvados-wes | md5sum-checker | COMPLETE | qr1hi-xvhdp-qy3azhblwcetpou | Tue May 29 20:52:25 2018 |
| md5sum-checker | 290529205224692783 | SUBMITTED | 0h:0m:41s | checker | arvados-wes | md5sum-checker | COMPLETE | qr1hi-xvhdp-iu5mvuk9gdnc8mg | Tue May 29 20:52:25 2018 |


Done
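
The polling pattern behind this kind of monitor can be sketched as follows. This is not the `orchestrator.monitor()` implementation: `poll_until_done`, the fake status function, and the run IDs are hypothetical, and the set of terminal states is an assumption based on the state names in the GA4GH WES schema.

```python
import time

# Assumed terminal run states, following the WES state names.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def poll_until_done(get_status, run_ids, interval=1.0, max_polls=60,
                    sleep=time.sleep):
    """Poll every run until all are in a terminal state or max_polls is hit.
    get_status is a caller-supplied function mapping run_id -> state string."""
    statuses = {}
    for _ in range(max_polls):
        statuses = {run_id: get_status(run_id) for run_id in run_ids}
        if all(state in TERMINAL_STATES for state in statuses.values()):
            break
        sleep(interval)
    return statuses


def make_fake_status(states):
    """Stand-in for a real WES status call: each run steps through `states`
    and then keeps reporting the last one."""
    remaining = {}

    def get_status(run_id):
        queue = remaining.setdefault(run_id, list(states))
        return queue.pop(0) if len(queue) > 1 else queue[0]

    return get_status


fake_status = make_fake_status(["RUNNING", "RUNNING", "COMPLETE"])
final = poll_until_done(fake_status, ["run-a", "run-b"], interval=0,
                        sleep=lambda seconds: None)
print(final)
```

Capping the number of polls keeps a stuck run from blocking the notebook indefinitely; a real monitor would also refresh the displayed table on each pass.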


## Reporting

WDL-based workflows (TopMed) ran successfully in both Cromwell WES environments. The CWL-based `md5sum` workflow ran in Arvados; currently, the only barrier to running it on the Broad Cromwell is the lack of HTTP filesystem support for inputs.