## A tour of the revised DNAStack WES Server

This tour works with Thousand Genomes BAM files. Even though those files are publicly available, care was taken in what follows not to show the whole contents of the run details returned by the WES server. The response contains signed URLs which, in other cases, would allow access to controlled-access data, albeit for a limited time.
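One way to keep signed URLs out of printed output is to strip the query string from every URL in the response before displaying it. The following is a minimal sketch of that idea, not part of the fasp client: the helper name redact_signed_urls and the example host are hypothetical, and it assumes the response is a nested structure of dicts, lists and strings, as the run logs shown in this notebook are.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_signed_urls(value):
    """Return a copy of value with the query string of any http(s) URL replaced.

    Hypothetical helper: signed URLs carry their credentials in the query
    string (for example the Signature parameter), so removing the query
    leaves only the object location.
    """
    if isinstance(value, dict):
        return {key: redact_signed_urls(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact_signed_urls(item) for item in value]
    if isinstance(value, str) and value.startswith(('http://', 'https://')):
        parts = urlsplit(value)
        if parts.query:
            # Keep scheme, host and path; replace the query, drop any fragment.
            return urlunsplit((parts.scheme, parts.netloc, parts.path, 'REDACTED', ''))
    return value


# Example with a made-up URL on a placeholder host.
example = {
    'request': {'workflow_params': {
        'md5Sum.inputFile': 'https://storage.example.org/bucket/file.bam?Expires=1&Signature=abc'}},
    'state': 'COMPLETE',
}
print(redact_signed_urls(example))
```

The sketch only handles strings that are themselves URLs. A URL embedded inside a longer string, such as a task's command line, would pass through unchanged and would need separate handling.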

The following function, getMinimalRunLog, shows some details of a run, with the intent of leaving out the possibly sensitive parts.

In [4]:
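def getMinimalRunLog(wesClient, run_id):
    """Print selected details of a WES run."""
    rundetails = wesClient.GetRunLog(run_id)
    # Caution: this prints the full response, including any signed URLs.
    # Drop this line when the inputs are controlled-access data.
    print(rundetails)
    # Summary fields: run identifier, state and outputs.
    print(rundetails['run_id'])
    print(rundetails['state'])
    print(rundetails['outputs'])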

We can now use getMinimalRunLog to show minimal details of a run submitted via FASPScript4.py. That script submits a workflow to run an MD5 checksum on a Thousand Genomes BAM file. Its output should just be that checksum.

In [5]:
from fasp.workflow import DNAStackWESClient
wesClient = DNAStackWESClient('~/.keys/dnastack_wes_credentials.json')
run_id = '1d764abd-5c8f-4c89-ad2b-26a43b2025f1'
getMinimalRunLog(wesClient, run_id)

{'run_id': '1d764abd-5c8f-4c89-ad2b-26a43b2025f1', 'request': {'workflow_params': {'md5Sum.inputFile': 'https://storage.googleapis.com/fc-secure-ff8156a3-ddf3-42e4-9211-0fd89da62108/GTEx_Analysis_2017-06-05_v8_WGS_CRAM_files/GTEX-1B98T-0004-SM-7J38T.cram?GoogleAccessId=forei-598@anvilprod.iam.gserviceaccount.com&Expires=1611352202&Signature=iBZ0w6x2d0bnZ2n9GvX3rK%2F4dSksgm4%2Bl9KxnZ6q9n3Sv3FobysgIuhuT%2FXVfmfc54hVkz4D2K9EFHv0rGIJe9%2BqRPlaHqAnFFtKmmTZmzkR7q7o98wa5OjUDyxxqtwLCcNZHzW1qLph6PfezZAyLuHbrmoBf49jqqpZnmlQyRHg9KawKFEp%2B7OX2QCrRShygJ3bbJg9pKtYqgSmayHDD%2B9jQI4coUTqW23jkjECQQSAaJJdYGUI0FzlGv9kziPoJDARiUIq7DcBcEonz1Y0IwyO92%2BW6psAMGL6Lm6T4ph1%2BiwEEPDIDfZPbJPPoooTtYbWzr64nPlgAqasUsMYBw=='}, 'workflow': 'WDL', 'tags': {'user_id': 'ian-fore-fasp-client', 'cromwell-workflow-id': 'cromwell-1d764abd-5c8f-4c89-ad2b-26a43b2025f1'}, 'workflow_engine_parameters': {'workflow_root': 'gs://workspaces-cromwell-execution/md5Sum/1d764abd-5c8f-4c89-ad2b-26a43b2025f1/'}, 'workflow_url': 'checksu

That indicates an empty response: though the task appeared to complete successfully, we have no result. How can we check what happened? Links to stdout and stderr are available in the response.

In [6]:
rundetails = wesClient.GetRunLog(run_id)
log = rundetails['task_logs'][0]
for stdx in ['stdout','stderr']:
    print('{} is at {}\n'.format(stdx, log[stdx]))


stdout is at https://workspaces-wes.prod.dnastack.com/ga4gh/wes/v1/runs/1d764abd-5c8f-4c89-ad2b-26a43b2025f1/logs/task/md5Sum.calculateMd5Sum/0/stdout

stderr is at https://workspaces-wes.prod.dnastack.com/ga4gh/wes/v1/runs/1d764abd-5c8f-4c89-ad2b-26a43b2025f1/logs/task/md5Sum.calculateMd5Sum/0/stderr



Following those links downloads empty files, which is the same behavior as the previous version of the server. It was suggested that DRS would be a suitable way to make the files available.

Without those files we lack information that would be useful for troubleshooting. Our guess is that this is the same problem as before.
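The manual check of the log links could be automated along the following lines. This is only a sketch: find_empty_logs is a hypothetical helper, and the fetch argument is an assumed callable that takes a URL and returns the downloaded bytes. How to authenticate against the log endpoints is not covered here, so the download step is deliberately left to the caller. The name, stdout and stderr keys are the ones seen in the task_logs entries of the run log.

```python
def find_empty_logs(task_logs, fetch):
    """Return (task name, stream) pairs whose log link downloads as empty.

    task_logs: the 'task_logs' list from a WES run log.
    fetch: assumed callable mapping a URL to the bytes it returns.
    """
    empty = []
    for log in task_logs:
        for stream in ('stdout', 'stderr'):
            url = log.get(stream)
            # Skip streams with no link; record those that download as zero bytes.
            if url and len(fetch(url)) == 0:
                empty.append((log['name'], stream))
    return empty
```

With a suitable fetch function this could be called as find_empty_logs(rundetails['task_logs'], fetch), and a non-empty result would list the tasks for which no log content is retrievable.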

Next, we try another workflow.

The function runGWASWorkflowTest encapsulates the GWAS Workflow from the 2020 GA4GH Plenary. It runs that workflow on files accessed via Google Cloud URIs rather than DRS. This lets us check a known workflow.

In [7]:
wesClient.runGWASWorkflowTest()

'd07bcf66-893a-4151-83ab-339acbf5c5ef'
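A submitted run takes some time to finish, so the run log is only worth inspecting once the run has reached a final state. A polling loop such as the sketch below is one way to wait for that. The helper wait_for_run and its parameters are hypothetical and not part of the fasp client. It relies only on GetRunLog returning a dict with a state key, as seen in this notebook, and on the state names defined by the GA4GH WES specification.

```python
import time

# States after which a WES run will not change again (per the GA4GH WES spec).
TERMINAL_STATES = {'COMPLETE', 'EXECUTOR_ERROR', 'SYSTEM_ERROR', 'CANCELED'}


def wait_for_run(wes_client, run_id, poll_seconds=30, timeout_seconds=3600,
                 sleep=time.sleep):
    """Poll the run log until the run reaches a terminal state, then return it.

    Hypothetical helper. Raises TimeoutError if the run is still in progress
    after roughly timeout_seconds of waiting.
    """
    waited = 0
    while True:
        state = wes_client.GetRunLog(run_id)['state']
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(
                'run {} still in state {} after {} seconds'.format(run_id, state, waited))
        sleep(poll_seconds)
        waited += poll_seconds
```

The sleep argument is injectable only so the loop can be exercised without real delays. In normal use, wait_for_run(wesClient, run_id) would be enough.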

In [8]:
getMinimalRunLog(wesClient, '48749bf2-9d73-447f-8c7e-977229270578')

{'run_id': '48749bf2-9d73-447f-8c7e-977229270578', 'request': {'workflow_params': {'gwas.vcf': 'gs://fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e/CCDG_13607/Project_CCDG_13607_B01_GRM_WGS.JGVariants.2019-04-04/CCDG_13607_B01_GRM_WGS_2019-02-19_chr21.recalibrated_variants.vcf.gz', 'gwas.metadata_csv': 'gs://dnastack-public-bucket/thousand_genomes_meta.csv'}, 'workflow': 'WDL', 'tags': {'user_id': 'ian-fore-fasp-client', 'cromwell-workflow-id': 'cromwell-48749bf2-9d73-447f-8c7e-977229270578'}, 'workflow_engine_parameters': {'workflow_root': 'gs://workspaces-cromwell-execution/gwas/48749bf2-9d73-447f-8c7e-977229270578/'}, 'workflow_url': 'gwas.wdl'}, 'state': 'COMPLETE', 'run_log': {'name': 'gwas', 'start_time': '2021-02-24T00:29:14.958Z[UTC]', 'end_time': '2021-02-24T00:55:10.332Z[UTC]'}, 'task_logs': [{'name': 'gwas.parse_metadata', 'cmd': 'parse_metadata.sh \\\n\t-c /cromwell_root/dnastack-public-bucket/thousand_genomes_meta.csv', 'start_time': '2021-02-24T00:29:17.163Z[UTC]', 'end_time': '
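The run_log entry in the output above records start and end times in a format with a trailing [UTC] marker, for example 2021-02-24T00:29:14.958Z[UTC]. To work out how long a run took, that marker has to be removed before parsing. The sketch below shows one way to do so. The function names are hypothetical, and the timestamp format is assumed from the values shown in this notebook rather than from any server documentation.

```python
from datetime import datetime, timezone


def parse_wes_time(text):
    """Parse a timestamp such as '2021-02-24T00:29:14.958Z[UTC]' (assumed format)."""
    if text.endswith('[UTC]'):
        text = text[:-len('[UTC]')]
    parsed = datetime.strptime(text, '%Y-%m-%dT%H:%M:%S.%fZ')
    # The trailing Z means UTC, so attach that timezone explicitly.
    return parsed.replace(tzinfo=timezone.utc)


def run_duration_seconds(run_log):
    """Return the elapsed seconds between start_time and end_time of a run_log."""
    delta = parse_wes_time(run_log['end_time']) - parse_wes_time(run_log['start_time'])
    return delta.total_seconds()


# Times taken from the GWAS run log shown above.
gwas_run_log = {'name': 'gwas',
                'start_time': '2021-02-24T00:29:14.958Z[UTC]',
                'end_time': '2021-02-24T00:55:10.332Z[UTC]'}
print(run_duration_seconds(gwas_run_log) / 60)  # just under 26 minutes
```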

In [9]:
run_id = '37589a82-536f-4b13-9199-7589d86884b2'
#rundetails = wesClient.getOutputs(run_id)

rundetails = wesClient.GetRunLog(run_id)

rundetails

{'run_id': '37589a82-536f-4b13-9199-7589d86884b2',
 'request': {'workflow_params': {'md5Sum.inputFile': 'https://storage.googleapis.com/genomics-public-data/ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase3/data/HG00556/exome_alignment/HG00556.mapped.ILLUMINA.bwa.CHS.exome.20121211.bam'},
  'workflow': 'WDL',
  'tags': {'user_id': 'ian-fore-fasp-client',
   'cromwell-workflow-id': 'cromwell-37589a82-536f-4b13-9199-7589d86884b2'},
  'workflow_engine_parameters': {'workflow_root': 'gs://workspaces-cromwell-execution/md5Sum/37589a82-536f-4b13-9199-7589d86884b2/'},
  'workflow_url': 'checksum.wdl'},
 'state': 'COMPLETE',
 'run_log': {'name': 'md5Sum',
  'start_time': '2021-02-24T03:09:43.996Z[UTC]',
  'end_time': '2021-02-24T03:17:41.353Z[UTC]'},
 'task_logs': [{'name': 'md5Sum.calculateMd5Sum',
   'cmd': "gsutil hash -m /cromwell_root/storage.googleapis.com/genomics-public-data/ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase3/data/HG00556/exome_alignment/HG00556.mapped.ILLUMINA.bwa.CHS.exome.201

In this case the run has produced some output files, which looks promising. How we access those files is still an open question. See the discussion elsewhere about whether a companion DRS server, or even a DRS server reached via the WES server and accessed under the same credentials, would be a useful addition to WES.