## Checking status and getting results via DRS



In [38]:
#from fasp.workflow import sbWESClient
from fasp.workflow import sbcgcWESClient

cl = sbcgcWESClient('forei/gecco')

The above instantiates a client for the Seven Bridges Cancer Genomics Cloud (CGC), scoped to the project `forei/gecco`.

### Checking a previous run
The getTaskStatus function below is a thin wrapper around a GET request to https://cgc-ga4gh-api.sbgenomics.com/ga4gh/wes/v1/runs/{run_id}. It handles authentication, sends the request and retrieves the response.
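
As a rough illustration of what such a wrapper involves, the sketch below builds the run URL and reads the `state` field that the WES specification defines on the run log. It uses only the standard library. The helper names and the `auth_headers` parameter are hypothetical, the headers CGC actually expects are not shown here, and this is not the fasp implementation.

```python
import json
import urllib.request

WES_BASE = "https://cgc-ga4gh-api.sbgenomics.com/ga4gh/wes/v1"


def run_url(run_id, base=WES_BASE):
    """Build the WES run-log URL for a given run id."""
    return f"{base}/runs/{run_id}"


def fetch_run_state(run_id, auth_headers):
    """Hypothetical helper: GET the run log and return its 'state' field.

    auth_headers is an assumed dict holding whatever authentication
    headers the service requires.
    """
    request = urllib.request.Request(run_url(run_id), headers=auth_headers)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
    with urllib.request.urlopen(request) as response:
        return json.load(response)["state"]
```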

In [41]:
run_id = '4d796341-87c4-4cbc-b7fe-5f7cf2510161'
cl.getTaskStatus(run_id, verbose=True)

Get request sent to: https://cgc-ga4gh-api.sbgenomics.com/ga4gh/wes/v1/runs/4d796341-87c4-4cbc-b7fe-5f7cf2510161
{
  "request": {
    "tags": {},
    "workflow_params": {
      "name": "SAMtools Stats 1.8 run - 02-04-21 12:39:44",
      "project": "forei/gecco",
      "inputs": {
        "total_memory_GB": null,
        "coverage_limit": null,
        "include_only_read_group": null,
        "remove_duplicates": null,
        "max_insert_size": null,
        "reference_file": {
          "path": "drs://cgc-ga4gh-api.sbgenomics.com/5bad6c83e4b0abc138917143",
          "name": "references-hs37d5-hs37d5.fasta",
          "class": "File"
        },
        "output_file_path": "COPDGene_N95128.txt",
        "alignment_file_url": "https://storage.googleapis.com/topmed-irc-share/genomes/NWD224269.b38.irc.v1.cram?GoogleAccessId=forei-968@dcpstage-210518.iam.gserviceaccount.com&Expires=1612445983&Signature=pdei8vSrVhAXDqLMAQrGL6nzZh3G5zYOWxq%2B3%2Fsv4eyy8LCK1Cb%2FiobHJGObhmpubuUwLJYAE3pWAeChwtp

'COMPLETE'

In [43]:
cl.getTaskStatus(run_id)

'COMPLETE'
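
If a run is still in progress, the same status call can be repeated until it finishes. The following is a minimal sketch, assuming the status function returns the WES state string as in the cell above. `wait_for_run` and its parameters are hypothetical names, not part of fasp; the terminal states listed are those defined by the GA4GH WES specification.

```python
import time

# Terminal run states defined by the GA4GH WES specification
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def wait_for_run(get_status, run_id, interval=30, max_checks=120, sleep=time.sleep):
    """Poll get_status(run_id) until it returns a terminal state.

    Raises TimeoutError if no terminal state is seen within max_checks polls.
    """
    for _ in range(max_checks):
        state = get_status(run_id)
        if state in TERMINAL_STATES:
            return state
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"Run {run_id} did not finish after {max_checks} checks")
```

With the client above, this would be called as `wait_for_run(cl.getTaskStatus, run_id)`. Passing the status function in as an argument keeps the loop independent of any particular WES client.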

## Getting the results - via DRS
Once the run is complete, further steps can use DRS to obtain the file output from the workflow.

In [44]:
runLog = cl.GetRunLog(run_id)
runLog['outputs']

{'statistics': {'path': 'drs://cgc-ga4gh-api.sbgenomics.com/601bff910a9d98531cd03715',
  'name': '_3_COPDGene_N95128.txt',
  'class': 'File'}}

In [45]:
resultsDRSID = runLog['outputs']['statistics']['path']
resultsDRSID

'drs://cgc-ga4gh-api.sbgenomics.com/601bff910a9d98531cd03715'

### DRS GetObject
Here's how we then get details of the file. Note that the full DRS URI is passed to the resolver; as the output shows, only the id portion is sent on to the DRS client.


In [46]:
from fasp.loc import DRSMetaResolver
mr = DRSMetaResolver()
fileDetails = mr.getObject2(resultsDRSID)
fileDetails

Searching the GA4GH registry for org.ga4gh:drs services
id:601bff910a9d98531cd03715
sending to: sbcgcDRSClient


{'id': '601bff910a9d98531cd03715',
 'name': '_3_COPDGene_N95128.txt',
 'size': 113472,
 'checksums': [{'type': 'etag',
   'checksum': '8ac7477b148bc880ce74091ad69d5ef6-1'}],
 'self_uri': 'drs://cgc-ga4gh-api.sbgenomics.com/601bff910a9d98531cd03715',
 'created_time': '2021-02-04T14:07:13Z',
 'updated_time': '2021-02-04T14:07:13Z',
 'mime_type': 'application/json',
 'access_methods': [{'type': 's3',
   'region': 'us-east-1',
   'access_id': 'aws-us-east-1'}]}
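
The resolver output above shows the id being separated from the DRS URI. How `DRSMetaResolver` does this internally is not shown here; the sketch below illustrates the hostname-based convention from the GA4GH DRS specification, in which `drs://<hostname>/<id>` maps to `https://<hostname>/ga4gh/drs/v1/objects/<id>`. The function names are hypothetical, and compact-identifier DRS URIs are not handled.

```python
from urllib.parse import urlparse


def parse_drs_uri(drs_uri):
    """Split a hostname-based DRS URI into (host, object_id)."""
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs":
        raise ValueError(f"Not a DRS URI: {drs_uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def drs_object_url(drs_uri):
    """Build the HTTPS GetObject URL that a hostname-based DRS URI maps to."""
    host, object_id = parse_drs_uri(drs_uri)
    return f"https://{host}/ga4gh/drs/v1/objects/{object_id}"
```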

In [47]:
url = mr.getAccessURL2(resultsDRSID,'s3')
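
In the DRS specification, an access URL is requested from `/objects/{object_id}/access/{access_id}`, where the `access_id` comes from one of the object's `access_methods`. The sketch below shows that lookup under those assumptions. The function names are hypothetical and this is not necessarily how `getAccessURL2` is implemented.

```python
def find_access_id(drs_object, access_type):
    """Return the access_id of the first access method of the given type, or None."""
    for method in drs_object.get("access_methods", []):
        if method.get("type") == access_type:
            return method.get("access_id")
    return None


def access_endpoint(host, object_id, access_id):
    """Build the DRS URL from which an access URL can be requested."""
    return f"https://{host}/ga4gh/drs/v1/objects/{object_id}/access/{access_id}"
```

For the object returned earlier, `find_access_id(fileDetails, 's3')` would pick out `'aws-us-east-1'`.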

### Downloading the file
Now we can use the URL obtained to download the file. We'll create a small function to encapsulate the download.

In [48]:
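import os
import requests

def download(url, file_path):
    # Fetch before opening the file so a failed request does not leave an empty file behind
    response = requests.get(url)
    # Raise an HTTPError on 4xx/5xx instead of saving the error body as the result
    response.raise_for_status()
    with open(os.path.expanduser(file_path), "wb") as file:
        file.write(response.content)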

In [49]:
fullPath = '~/Downloads/' + fileDetails['name']
download(url, fullPath)
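
As a light sanity check on the download, the size on disk can be compared with the `size` reported by DRS. This is only a sketch with a hypothetical helper name. It does not verify content: the only checksum in the object details is of type `etag`, and no attempt is made to recompute it here.

```python
import os


def size_matches(file_path, expected_size):
    """Return True if the file on disk has exactly the expected number of bytes."""
    return os.path.getsize(os.path.expanduser(file_path)) == expected_size
```

For the file above, this would be called as `size_matches(fullPath, fileDetails['size'])`.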